Commerce Intelligence
What a Stoa store is, why it's different, and how the pieces compose into an integrated commerce intelligence system.
The Core Idea
A Stoa store is a commerce intelligence platform — not a storefront with analytics bolted on. It encodes an opinionated e-commerce operations playbook into deployable, customizable infrastructure. The playbook comes from real operational experience: Article.com (D2C, 8-figure e-commerce, full-stack data product), Game Data Pros/Warner Brothers (enterprise-scale segmentation and targeted treatment systems), and independent consulting (outdoor retailer discovery audits, review systems, data infrastructure).
The differentiator is not any single capability. It's the virtuous experimentation loop — infrastructure for the full cycle of systematically discovering customer structure through experimentation, refining understanding through qualitative feedback, and compounding improvements over time.
Not "we have A/B testing" (every Shopify app does that). Not "we have RFM segments" (that's a dbt tutorial). The differentiator is the full measure → segment → hypothesize → test → learn → act cycle, with agentic assistance at each step.
The Virtuous Loop
This is the core operating principle. Everything else exists to make this loop turn.
┌─────────────────────────────────────────────────┐
│ │
v │
SEGMENT ----> EXPERIMENT ----> TYCHE ----> DISCOVERY │
(rough cuts, (vary something (Bayesian ("variant │
evolving) for these analysis, B works │
segments) HTE) for X │
^ not Y") │
│ ┌─────────────────┤ │
│ v v │
│ REFINE SEGMENTS ASK WHY │
│ (new boundary (survey │
│ discovered) the │
│ │ group) │
│ └───────┬───────┘ │
│ v │
└──────────────── RICHER MODEL ──────────────────┘
of customer behavior
Each node in the loop is a real system component:
| Loop node | System component | What it does |
|---|---|---|
| Segment | dbt models + assignable_attributes cache | Groups visitors by behavioral/transactional/stated signals |
| Experiment | Experiment assignment framework | Delivers targeted variants to segments via storefront |
| Tyche | Python/PyMC analysis engine | Bayesian inference, heterogeneous treatment effect discovery |
| Discovery | Tyche HTE output | Surfaces segment boundaries you didn't hypothesize |
| Refine Segments | Updated dbt models, new segment definitions | Incorporates discovered boundaries into the segmentation model |
| Ask Why | Survey/VoC triggers | Targets qualitative questions at the surprising group |
| Richer Model | The segmentation model itself | Evolves through loop iterations, not built once |
The loop compounds learning. Iteration 1 uses rough segments (new vs. returning). Iteration 3 adds "took a course" because Tyche's HTE analysis showed this matters. Iteration 5 adds "price-sensitive" because a post-experiment survey explained why a segment bounced. Each pass through the loop produces better segments, which produce better experiments, which produce richer discoveries.
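One pass through the discovery step of the loop can be sketched in miniature. The row shape, the segment names (`took_course`, `no_course`), and the raw difference-in-rates estimator below are illustrative assumptions; Tyche itself fits a Bayesian model rather than comparing raw rates:

```python
from collections import defaultdict

def per_segment_lift(rows):
    """Estimate conversion lift of variant B over A within each segment.

    rows: iterable of (segment, variant, converted) tuples, a stand-in for
    Tyche's input. The real engine does full Bayesian HTE analysis; this
    sketch just computes per-segment rate differences.
    """
    # segment -> variant -> [conversions, visitors]
    counts = defaultdict(lambda: {"A": [0, 0], "B": [0, 0]})
    for segment, variant, converted in rows:
        cell = counts[segment][variant]
        cell[0] += int(converted)
        cell[1] += 1
    lifts = {}
    for segment, by_variant in counts.items():
        rate = {v: conv / n for v, (conv, n) in by_variant.items() if n}
        if "A" in rate and "B" in rate:
            lifts[segment] = rate["B"] - rate["A"]
    return lifts

# A surprising boundary: variant B helps "took_course" visitors but not others,
# which would prompt adding that boundary to the segmentation model.
rows = (
    [("took_course", "A", False)] * 80 + [("took_course", "A", True)] * 20
    + [("took_course", "B", False)] * 70 + [("took_course", "B", True)] * 30
    + [("no_course", "A", False)] * 90 + [("no_course", "A", True)] * 10
    + [("no_course", "B", False)] * 90 + [("no_course", "B", True)] * 10
)
lifts = per_segment_lift(rows)
```

A lift concentrated in one segment is exactly the "variant B works for X not Y" discovery that feeds the Refine Segments and Ask Why nodes.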
Four Capability Layers
Every Stoa store capability belongs to one of four layers. The layers are ordered by the customer journey but interconnected — measurement feeds understanding, understanding shapes acquisition, retention informs the next acquisition experiment.
Understand (know your customer)
The foundation. You can't optimize what you don't understand.
- Journey tracking — first touch → conversion, multi-touch attribution, time-to-purchase by source
- Customer segmentation — RFM, lifecycle stage, category affinity, behavioral cohorts (dbt models + storefront awareness)
- Voice of Customer — survey triggers (post-purchase, NPS, discovery), structured collection, feeds into segmentation
- Review content as structured data — skill level, use patterns, filterable
Acquire & Convert (guide the journey)
Every capability here is both a feature and an experiment surface. The dual-metric principle means we measure both revenue and satisfaction impact.
- Guided selling — finder quiz, persona-based entry points ("I'm new to packrafting")
- Search & discovery — sort, filter, interaction-level tracking, bounce analysis
- Cart & checkout optimization — continuity, abandonment detection, segment-aware recovery flows
- Pricing & promotion — discount display psychology, promotion real estate, bundling
- Cross-sell & recommendations — manual + data-driven, composite cart items
Retain & Grow (keep customers, increase LTV)
- Email lifecycle engine — segmented campaigns, behavioral triggers, win-back, post-purchase sequences
- Post-purchase experience — review solicitation, next-step suggestions, cross-sell based on purchase, reorder prompts
- Customer progression — "you've done X, here's Y" — general pattern that verticals instantiate
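The trigger logic behind these sequences can be sketched as plain rules. The field names (`last_order`, `reviewed_last_order`) and the thresholds (7-day review window, 90-day win-back) are hypothetical, not the product's actual schema:

```python
from datetime import date

def lifecycle_triggers(customer, today):
    """Pick email-lifecycle triggers from simple behavioral signals.

    customer: dict with 'last_order' (date or None) and
    'reviewed_last_order' (bool). Names and thresholds are illustrative.
    """
    if customer["last_order"] is None:
        return ["welcome_series"]
    triggers = []
    days = (today - customer["last_order"]).days
    if days <= 7 and not customer["reviewed_last_order"]:
        triggers.append("review_solicitation")  # post-purchase review ask
    if 7 < days <= 30:
        triggers.append("cross_sell")           # suggest next-step purchase
    if days > 90:
        triggers.append("win_back")             # lapsed-customer campaign
    return triggers
```

In the real system these rules would be segment-aware, so a "price-sensitive" segment might get a different win-back treatment than a "high-engagement" one.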
Measure & Learn (close the loop)
This layer is what makes the other three layers improve over time rather than stagnate.
- Tyche analysis engine — Bayesian inference, sequential testing, HTE discovery
- Funnel & attribution reporting — "where do people drop off, by source?", "what drove last month's sales?"
- Dashboards — weekly cadence, segment-aware, actually used
- The virtuous loop — measure → segment → hypothesize → test → learn → act, agentic assistance at each step
The Dual-Metric Principle
Every experiment measures against both revenue and satisfaction. Not one or the other — both, always.
Revenue-only optimization leads to dark patterns. Satisfaction-only optimization leaves money on the table. The interesting decisions happen when the two metrics diverge: a change that lifts revenue 5% but drops satisfaction signals a dark pattern worth investigating, not a winner to ship.
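A minimal sketch of a dual-metric decision rule, assuming both metrics are binary rates (conversion, "satisfied" survey responses) with Beta(1, 1) priors; Tyche's actual models and thresholds are richer than this:

```python
import random

random.seed(0)

def prob_improvement(success_a, n_a, success_b, n_b, draws=20000):
    """Monte Carlo P(variant B's rate > variant A's rate), Beta(1,1) priors."""
    wins = sum(
        random.betavariate(1 + success_b, 1 + n_b - success_b)
        > random.betavariate(1 + success_a, 1 + n_a - success_a)
        for _ in range(draws)
    )
    return wins / draws

def dual_metric_verdict(revenue, satisfaction, threshold=0.8):
    """revenue / satisfaction: (success_a, n_a, success_b, n_b) tuples."""
    p_rev = prob_improvement(*revenue)
    p_sat = prob_improvement(*satisfaction)
    if p_rev >= threshold and p_sat <= 1 - threshold:
        # Revenue likely up, satisfaction likely down: the divergent case.
        return "investigate: possible dark pattern"
    if p_rev >= threshold and p_sat >= 0.5:
        return "ship"
    return "keep collecting data"

# Revenue conversion rises (20% -> 26%) while satisfaction falls (90% -> 76%).
verdict = dual_metric_verdict(
    revenue=(200, 1000, 260, 1000),
    satisfaction=(450, 500, 380, 500),
)
```

The divergent branch is the point: the rule refuses to declare a winner on revenue alone.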
Segmentation as Practice
Segmentation is a practice, not a model. The model is a snapshot: a particular set of segment boundaries applied to visitors at a point in time. The practice is the ongoing process by which that model improves through the virtuous loop.
This distinction matters for implementation:
- The model is a dbt transformation that computes segment assignments from behavioral/transactional/stated signals and pushes them to a cache
- The practice is the organizational process: run experiments → Tyche discovers meaningful boundaries → incorporate boundaries into the model → run better experiments → repeat
- Infrastructure supports the practice by making model updates low-friction: add a new signal to dbt, recompute assignments, new experiments can target the new segment immediately
Starting signals for a v1 model are deliberately simple — new vs. returning, purchase history, basic engagement level. Sophistication comes from loop iterations, not from over-engineering the initial model.
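A v1 model in this spirit is little more than a lookup. The field names (`orders`, `sessions_30d`) and thresholds are illustrative only; the production version is a dbt transformation over warehouse tables, not application code:

```python
def v1_segments(visitor):
    """Deliberately simple v1 assignments; sophistication comes from
    loop iterations, not from this function growing more clever."""
    orders = visitor.get("orders", 0)
    sessions = visitor.get("sessions_30d", 0)
    return {
        "lifecycle": "returning" if orders > 0
                     else ("engaged" if sessions > 1 else "new"),
        "purchaser": orders > 0,
        "engagement": "high" if sessions >= 4 else "low",
    }
```

A later iteration might add a `took_course` key here once Tyche shows that boundary matters, without changing anything else in the pipeline.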
Enriched Assignment Architecture
The system has two sides with different performance characteristics, connected by a cache contract.
HEAVY SIDE (batch, Python/SQL) FAST SIDE (per-request, TS)
────────────────────────────── ────────────────────────────
┌──────────┐ ┌──────────┐ ┌──────────────────────┐
│ Umami │───►│ dbt │ │ Storefront │
│ events │ │ segment │ │ (RR7 loader) │
└──────────┘ │ models │ │ │
└────┬─────┘ │ visitor_id │
┌──────────┐ │ │ │ │
│ Tyche │ ▼ │ ▼ │
│ HTE disc.│ ┌──────────┐ │ ┌──────────────┐ │
│ → new │───►│ segment │──── push ───►│ │ Redis cache │ │
│ segment │ │ assign- │ (Redis) │ │ assignable_ │ │
│ boundaries │ ments │ │ │ attributes │ │
└──────────┘ │ table │ │ └──────┬───────┘ │
└──────────┘ │ │ │
│ ▼ │
│ experiment │
│ assignment │
│ (segment-aware) │
└──────────────────────┘
The heavy side (Python, dbt, PyMC) does the computationally expensive work in batch. The fast side (TypeScript, Redis) serves per-request experiment assignments in microseconds. The contract between them is assignable_attributes — a key-value store keyed by visitor ID containing segment memberships and other targeting attributes.
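The cache contract can be sketched with an in-memory dict standing in for Redis. The `assignable_attributes:` key prefix, the JSON value encoding, and the fallback defaults are assumptions for illustration, not the actual schema:

```python
import json

# In-memory stand-in for the Redis cache; the real system would use a
# Redis client with the same key/value shape on both sides.
_cache = {}

def push_assignments(visitor_id, attributes):
    """Heavy side: publish recomputed segment assignments for one visitor."""
    _cache[f"assignable_attributes:{visitor_id}"] = json.dumps(attributes)

def read_assignments(visitor_id):
    """Fast side: per-request lookup; unknown visitors get safe defaults
    so experiment assignment never blocks on the heavy side."""
    raw = _cache.get(f"assignable_attributes:{visitor_id}")
    return json.loads(raw) if raw else {"segments": [], "lifecycle": "new"}

push_assignments("v_123", {"segments": ["returning", "took_course"],
                           "lifecycle": "active"})
attrs = read_assignments("v_123")
```

The important property is the one-way flow: the fast side only ever reads, so a batch recompute on the heavy side can never slow down a storefront request.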
Why This Matters for Non-Web-Scale Stores
Most A/B testing tooling assumes millions of visitors. Single stores (30K-300K monthly visits) have limited statistical power, low conversion rates, and high order value variance. Standard frequentist A/B tests often say "inconclusive" after weeks of waiting.
The commerce intelligence approach addresses this through:
- Bayesian inference (Tyche) — produces probability distributions, not p-values. "82% chance variant B is better" is actionable even without classical significance
- Sequential testing / optional stopping — don't wait for a fixed sample size; check continuously with proper statistical controls
- Constrained HTEs — extract meaningful signal by finding stable, targetable segments large enough to act on, even with limited total traffic
- The full loop — even when individual experiment results are noisy, the loop compounds understanding over time. Each iteration refines the model, which produces better-targeted experiments, which produce cleaner signal
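The Bayesian and sequential points above can be illustrated together with a toy simulation, using stdlib Monte Carlo draws in place of PyMC and a simple posterior threshold in place of Tyche's actual stopping controls:

```python
import random

random.seed(42)

def posterior_prob_b_better(conv_a, n_a, conv_b, n_b, draws=4000):
    """Monte Carlo P(rate_B > rate_A) under independent Beta(1, 1) priors."""
    wins = sum(
        random.betavariate(1 + conv_b, 1 + n_b - conv_b)
        > random.betavariate(1 + conv_a, 1 + n_a - conv_a)
        for _ in range(draws)
    )
    return wins / draws

def sequential_test(rate_a, rate_b, batch=200, max_batches=10, stop_at=0.95):
    """Check the posterior after every batch and stop as soon as confident,
    instead of committing to a fixed sample size up front."""
    conv_a = n_a = conv_b = n_b = 0
    for _ in range(max_batches):
        conv_a += sum(random.random() < rate_a for _ in range(batch))
        conv_b += sum(random.random() < rate_b for _ in range(batch))
        n_a += batch
        n_b += batch
        p = posterior_prob_b_better(conv_a, n_a, conv_b, n_b)
        if p >= stop_at:
            return "B", n_a + n_b
        if p <= 1 - stop_at:
            return "A", n_a + n_b
    return "inconclusive", n_a + n_b

# Simulated 5% vs 12% conversion: a gap this size resolves on far less
# traffic than a fixed-horizon frequentist test would demand.
winner, visitors_used = sequential_test(rate_a=0.05, rate_b=0.12)
```

The output is a probability statement ("P(B better) crossed 95%") rather than a p-value, which is what makes small-store results actionable.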
The infrastructure is designed for stores where every visitor matters and every experiment needs to earn its traffic allocation.