Stoa Stack
Active development · private repo
A/B testing tells you averages. Stoa discovers which customers respond differently—and why—using Bayesian causal forests that run 10–60x faster than anything publicly available. Then it compounds that learning across every storefront it powers.
The Loop
The core principle is a segment → experiment → analyze → discover cycle where every pass produces better segments, which produce better experiments, which produce richer discoveries.
┌──────────────────────────────────────────────────┐
│ │
▼ │
SEGMENT ────► EXPERIMENT ────► ANALYZE ────► DISCOVER │
(who are (what should (did it (where │
our we test, work?) does it │
customers?) for whom?) differ?) │
▲ │ │
│ ┌─────────────────┤ │
│ ▼ ▼ │
│ REFINE SEGMENTS ASK WHY │
│ (new boundary (voice of │
│ discovered) customer) │
│ └────────┬────────┘ │
│ ▼ │
└──────────────── RICHER MODEL ────────────────────┘
of customer behavior
The Analysis Engine
Standard A/B testing tells you the probability of seeing your data if there's no effect. That's backwards. Stoa tells you what you actually care about: "82% chance variant B is better, expected lift $0.45/visitor."
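A statement like "82% chance variant B is better" falls out of comparing posterior draws for each arm. A minimal sketch of that idea, using a Beta-Binomial conversion model with made-up counts and a fixed basket value (illustrative only, not Stoa's engine):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trial data -- illustrative numbers, not real results.
visitors = {"A": 10_000, "B": 10_000}
conversions = {"A": 480, "B": 530}
revenue_per_conversion = 42.0  # assume a fixed basket value for simplicity

# Beta(1, 1) prior on each arm's conversion rate -> Beta posterior,
# approximated here by Monte Carlo draws.
post = {
    arm: rng.beta(1 + conversions[arm],
                  1 + visitors[arm] - conversions[arm],
                  size=100_000)
    for arm in visitors
}

# The two numbers a decision-maker actually cares about.
p_b_better = (post["B"] > post["A"]).mean()
expected_lift = ((post["B"] - post["A"]) * revenue_per_conversion).mean()

print(f"P(B > A) = {p_b_better:.2f}")
print(f"Expected lift = ${expected_lift:.2f}/visitor")
```

The same two summaries (probability of superiority, expected lift per visitor) carry over unchanged when the posterior comes from a richer model instead of a conjugate update.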
The core is heterogeneous treatment effect discovery—automatically finding which customer segments respond differently to an intervention. This runs on a custom BCF implementation built on bartz, a JAX-based BART library. At scale, HTE discovery runs 10–60x faster than publicly available BCF methods, with a batched optimization in progress targeting another 2–3x. That speed matters: it makes simulation-based calibration practical—generating thousands of synthetic datasets with known ground truth to verify the inference engine recovers the right answers. SBC is the gold standard for validating models this complex, and it's only feasible when each run is fast.
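The SBC check described above can be illustrated with a toy conjugate model where the posterior is exact: draw a ground truth from the prior, simulate data, fit, and record the rank of the truth among posterior draws. If inference is correct, those ranks are uniform. This is the standard SBC recipe applied to a deliberately simple model, not the BCF pipeline itself:

```python
import numpy as np

rng = np.random.default_rng(1)

# SBC on a toy model: normal mean with known variance, so the
# posterior is exact and each "fit" is instant.
n_sims, n_obs, n_draws = 500, 20, 100
sigma, prior_sd = 1.0, 1.0
ranks = []

for _ in range(n_sims):
    theta = rng.normal(0.0, prior_sd)           # ground truth from the prior
    y = rng.normal(theta, sigma, size=n_obs)    # synthetic dataset
    # Exact conjugate posterior for the mean.
    post_var = 1.0 / (1.0 / prior_sd**2 + n_obs / sigma**2)
    post_mean = post_var * (y.sum() / sigma**2)
    draws = rng.normal(post_mean, np.sqrt(post_var), size=n_draws)
    ranks.append((draws < theta).sum())         # rank of truth among draws

# Correct inference => ranks uniform on {0, ..., n_draws};
# skewed or U-shaped ranks reveal a miscalibrated engine.
ranks = np.array(ranks)
print(ranks.mean())
```

The expensive part in practice is the fit inside the loop, which is why SBC only becomes feasible once each BCF run is fast.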
Revenue effects are decomposed via hurdle models into conversion rate lift and spend-per-converter lift—different problems that require different responses. The full pipeline is protected by BCF prior regularization, SBC validation, and claim-level governance to prevent operator cherry-picking. More on statistical honesty →
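The hurdle decomposition rests on the identity revenue per visitor = P(convert) × E[spend | convert], so a revenue lift splits exactly into a conversion component and a spend component. A sketch with made-up numbers (not fitted estimates):

```python
# Hurdle decomposition sketch -- illustrative numbers, not model output.
control = {"conv_rate": 0.048, "spend_per_converter": 41.0}
variant = {"conv_rate": 0.053, "spend_per_converter": 40.5}

def revenue_per_visitor(arm):
    # revenue per visitor = P(convert) * E[spend | convert]
    return arm["conv_rate"] * arm["spend_per_converter"]

total_lift = revenue_per_visitor(variant) - revenue_per_visitor(control)

# Hold one factor fixed to isolate the other's contribution.
conv_component = ((variant["conv_rate"] - control["conv_rate"])
                  * control["spend_per_converter"])
spend_component = (variant["conv_rate"]
                   * (variant["spend_per_converter"] - control["spend_per_converter"]))

# The split is exact: the two components sum to the total lift.
assert abs(total_lift - (conv_component + spend_component)) < 1e-12
print(f"total {total_lift:+.4f} = "
      f"conversion {conv_component:+.4f} + spend {spend_component:+.4f}")
```

Here the variant wins overall despite a slightly lower basket size; seeing the two components separately is what tells you whether to respond with funnel fixes or with merchandising.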
The Stack
Each storefront shares the same analysis backbone. Insights compound across the portfolio. No per-transaction fees, no vendor lock-in—your data lives in your databases.
┌─────────────────────────────────────────────────────────┐
│ Storefront      SSR, experiment-aware routing,          │
│                 edge deploy (Cloudflare Workers)        │
├─────────────────────────────────────────────────────────┤
│ Commerce        Medusa v2, custom vertical modules      │
├─────────────────────────────────────────────────────────┤
│ Operations      Odoo 17, bidirectional sync             │
├─────────────────────────────────────────────────────────┤
│ Analytics       Umami + dbt → PostgreSQL                │
├─────────────────────────────────────────────────────────┤
│ Inference       bartz-BCF + PyMC, hurdle models,        │
│                 SBC, sequential monitoring              │
├─────────────────────────────────────────────────────────┤
│ Infrastructure  Docker + Caddy / Cloudflare edge        │
│                 €45/mo runs everything                  │
└─────────────────────────────────────────────────────────┘
Deep Dives
- Commerce Intelligence — the virtuous loop, capability layers, enriched assignment architecture
- Statistical Honesty — the two gardens problem, BCF, operator governance
- The Experiment Decision Journey — what questions each phase answers
- Statistical Foundations — annotated bibliography of the underlying literature
I deploy Stoa for clients and run their experimentation programs—compounding learning across engagements. If you're running e-commerce and want experimentation that actually tells you something, or you're building in this space and want to talk architecture, I'd like to hear from you.