Ecommerce Experimentation Infrastructure

A/B testing with experimental integrity.

Check enough metrics and one wins by luck. Peek daily and stop on a good day. Slice by segment until something shines. You can make almost any test look like a winner. These tools protect the internal validity of your experiments, so that when you call something a win, you can defend it in any room.
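The arithmetic behind "check enough metrics and one wins by luck" is short. Below is a minimal sketch. It assumes the metrics are independent and that none has a real effect. Real metrics are usually correlated, which pulls the rate below this figure without removing the inflation. The Bonferroni threshold is shown as the simplest correction. The function names are hypothetical.

```python
def familywise_error_rate(alpha: float, num_metrics: int) -> float:
    """Probability that at least one of num_metrics independent, truly null
    metrics crosses the significance threshold alpha by chance alone."""
    return 1 - (1 - alpha) ** num_metrics


def bonferroni_threshold(alpha: float, num_metrics: int) -> float:
    """Per-metric threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / num_metrics


# Ten metrics, each tested at alpha = 0.05, on a test with no real effect anywhere.
print(round(familywise_error_rate(0.05, 10), 3))  # 0.401
print(bonferroni_threshold(0.05, 10))              # 0.005
```

In this setup, a test with ten metrics and no real effect still shows at least one "significant" metric about 40% of the time.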


Two-proportion z-test

SRM detection

Bonferroni correction

Peeking detection

Survival analysis

Platform calibration

Bayesian shrinkage

// 07 tools · all client-side · no data leaves your browser

The workflow

01 Validate the platform (once per quarter)

→ 02 Register the test (before it runs)

→ ··· Run the test (don't touch it)

→ 03 Check conversion speed (with the results)

→ 04 Deflate the winner (before announcing)

→ 07 Value subscriptions (if the goal is subscribers)

→ 05 Stamp the receipt (attach to readout)

→ 06 Reconcile the ledger (every month)

01 — platform calibration

Platform Validator

Verify that your A/B platform is assigning and tracking users correctly before you trust any results. It surfaces assignment sample ratio mismatch (SRM), asymmetric tracking loss, and consent-driven coverage gaps.

Assignment SRM via chi-squared test

Asymmetric tracking loss detection

Suspicious coverage flag

Metric balance check (Bonferroni-adjusted)

Platform trust score 0–100
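Assignment SRM checks whether the observed arm sizes are consistent with the configured split. Below is a minimal sketch of the chi-squared goodness-of-fit check for a two-arm test. It assumes a configured 50/50 split by default and a p < 0.001 alarm threshold. That threshold is a common convention and not necessarily the one this tool uses. The function name `srm_check` is hypothetical.

```python
import math


def srm_check(n_control: int, n_variant: int, expected_share: float = 0.5,
              threshold: float = 0.001) -> tuple[float, float, bool]:
    """Chi-squared goodness-of-fit test for sample ratio mismatch (SRM).

    expected_share is the configured fraction of traffic sent to control.
    Returns (chi-squared statistic, p-value, SRM flag).
    """
    total = n_control + n_variant
    expected = (total * expected_share, total * (1 - expected_share))
    observed = (n_control, n_variant)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Two arms give one degree of freedom, where the chi-squared upper-tail
    # probability reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value < threshold


# A configured 50/50 split that delivered 50,000 vs 51,500 users.
chi2, p, flagged = srm_check(50_000, 51_500)
print(f"chi2={chi2:.2f}  p={p:.2g}  SRM={flagged}")
```

A 1.5% imbalance looks harmless on a dashboard. At this volume, however, it is flagged, and the readout should not be trusted until the cause is found.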

02 — experiment governance

Lockbox

Pre-register your experiment before it runs. Lock in your hypothesis, sample size, and success metrics. Prevents p-hacking, segment fishing, and post-hoc goalpost moving.
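Locking in a sample size means computing it before launch from three inputs: a baseline conversion rate, a minimum detectable effect (MDE), and a target power. Below is a minimal sketch using the standard normal-approximation formula for a two-sided, two-proportion test with equal arms. The function name and the convention of expressing the MDE as a relative lift are assumptions, not necessarily what Lockbox does.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_cvr: float, relative_mde: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in each arm to detect a relative lift of relative_mde.

    Normal approximation for a two-sided two-proportion z-test with equal arms.
    """
    p1 = baseline_cvr
    p2 = baseline_cvr * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# 3% baseline conversion, aiming to detect a 10% relative lift (3.0% -> 3.3%).
print(sample_size_per_arm(0.03, 0.10))   # roughly 53,000 visitors per arm
```

The required sample scales with 1 / (p2 - p1)², so halving the MDE roughly quadruples it. This is why small expected effects need large tests.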

Sample size calculator with MDE & power

Two-proportion z-test results engine

Peeking & underpowered detection

Causal integrity sidebar

Experiment history in localStorage
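At its core, the results engine compares two conversion rates. Below is a minimal sketch of a pooled two-proportion z-test. It assumes the test is run once, at the pre-registered sample size. Rerunning it every day and stopping at the first p < 0.05 is exactly the peeking problem described above. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference in conversion rates (pooled standard error).

    Returns (relative lift of B over A, z statistic, p-value).
    """
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Under the null hypothesis both arms share one rate, so pool them for the SE.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_b / rate_a - 1, z, p_value


lift, z, p = two_proportion_z_test(1_500, 50_000, 1_650, 50_000)
print(f"lift={lift:+.1%}  z={z:.2f}  p={p:.4f}")
```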

03 — time-to-conversion

Survival Curves

Conversion rate tells you how many visitors convert. Survival curves show how fast they convert, and whether the speed difference between variants is statistically real. A variant that converts faster is worth money even when the final rates look identical.
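Survival curves treat each visitor as a (time, converted?) pair. Visitors who had not converted when the data was pulled are censored, not counted as failures. Below is a minimal sketch of the Kaplan-Meier estimator and a curve-based median. It assumes durations are measured in days since first exposure, and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional


def kaplan_meier(durations: list[float], converted: list[bool]) -> list[tuple[float, float]]:
    """Kaplan-Meier estimate of S(t), the share of visitors not yet converted by time t.

    durations: time from first exposure to conversion, or to the end of observation.
    converted: True if the visitor converted at that time, False if censored.
    Returns (t, S(t)) at each distinct conversion time.
    """
    conversions = Counter(t for t, c in zip(durations, converted) if c)
    exits = Counter(durations)          # everyone leaving the risk set at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        if conversions[t]:
            # Censored visitors at time t still count as at risk at t.
            survival *= 1 - conversions[t] / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def median_time(curve: list[tuple[float, float]]) -> Optional[float]:
    """First time at which at least half of visitors have converted, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


durations = [1, 2, 2, 3, 5, 8]                       # days since first exposure
converted = [True, True, False, True, False, True]
curve = kaplan_meier(durations, converted)
print([(t, round(s, 3)) for t, s in curve])  # [(1, 0.833), (2, 0.667), (3, 0.444), (8, 0.0)]
print(median_time(curve))                    # 3
```

Most ecommerce visitors never convert, so S(t) often never drops to 0.5 and this median is undefined. The sketch does not attempt the alternative of a median among converters only.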

Kaplan-Meier survival curves

Log-rank significance test

Median time-to-conversion per variant

Hazard ratio with confidence interval

CSV cohort data import
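The log-rank test asks whether two survival curves differ by more than chance. At every conversion time, it compares each arm's conversions with the number it would have had if both arms converted at the same rate. Below is a minimal two-arm sketch. It assumes the same (duration, converted) input as a Kaplan-Meier curve. The hazard ratio uses the simple observed-over-expected approximation, without a confidence interval. The function name is hypothetical.

```python
import math
from collections import Counter


def log_rank_test(durations_a, converted_a, durations_b, converted_b):
    """Two-arm log-rank test on time-to-conversion.

    Returns (chi-squared statistic, p-value, approximate hazard ratio of B vs A).
    A hazard ratio above 1 means B converts faster than A.
    """
    events_a = Counter(t for t, c in zip(durations_a, converted_a) if c)
    events_b = Counter(t for t, c in zip(durations_b, converted_b) if c)
    exits_a, exits_b = Counter(durations_a), Counter(durations_b)
    at_risk_a, at_risk_b = len(durations_a), len(durations_b)
    observed_a = observed_b = expected_a = expected_b = variance = 0.0
    for t in sorted(set(exits_a) | set(exits_b)):
        d_a, d_b = events_a[t], events_b[t]
        d, n = d_a + d_b, at_risk_a + at_risk_b
        if d:
            # Conversions B would have had at t if both arms shared one hazard.
            e_b = d * at_risk_b / n
            observed_a += d_a
            observed_b += d_b
            expected_a += d - e_b
            expected_b += e_b
            if n > 1:
                # Hypergeometric variance of B's conversion count at t.
                variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)
        at_risk_a -= exits_a[t]
        at_risk_b -= exits_b[t]
    if variance == 0:
        return 0.0, 1.0, float("nan")
    chi2 = (observed_b - expected_b) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))     # chi-squared, 1 degree of freedom
    hazard_ratio = ((observed_b / expected_b) / (observed_a / expected_a)
                    if observed_a and expected_b else float("nan"))
    return chi2, p_value, hazard_ratio


a_days = [3, 5, 6, 8, 9, 12, 14, 14]
a_conv = [True, True, False, True, True, True, False, False]
b_days = [1, 2, 2, 4, 5, 7, 9, 14]
b_conv = [True, True, True, True, False, True, True, False]
chi2, p, hr = log_rank_test(a_days, a_conv, b_days, b_conv)
print(f"chi2={chi2:.2f}  p={p:.3f}  HR(B vs A)={hr:.2f}")
```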

04 — winner's curse & bayesian analysis

Reality Check

Your test came back a winner, but by how much, really? Underpowered tests that reach significance systematically overstate the effect. Deflate your result before you announce it, and see the probability your variant is actually better.
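One common way to deflate a winner is to shrink its observed lift toward zero under a sceptical prior. Below is a minimal sketch using a standard normal-normal model. It assumes a normal prior centred on zero lift, with a standard deviation you choose. The 5% used below is illustrative, not a benchmark. It also assumes the lift estimate is normally distributed. This is not necessarily the exact estimator the tool uses, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def shrink_lift(observed_lift: float, standard_error: float, prior_sd: float):
    """Normal-normal shrinkage of an observed relative lift toward zero.

    Prior: true lift ~ Normal(0, prior_sd**2), a sceptical prior centred on "no effect".
    Likelihood: observed lift ~ Normal(true lift, standard_error**2).
    Returns (posterior mean, 95% credible interval).
    """
    # The noisier the estimate relative to the prior, the smaller this weight.
    weight = prior_sd ** 2 / (prior_sd ** 2 + standard_error ** 2)
    post_mean = weight * observed_lift
    post_sd = math.sqrt(weight) * standard_error
    z = NormalDist().inv_cdf(0.975)
    return post_mean, (post_mean - z * post_sd, post_mean + z * post_sd)


mean, (low, high) = shrink_lift(observed_lift=0.12, standard_error=0.05, prior_sd=0.05)
print(f"honest lift {mean:+.1%}, 95% credible interval {low:+.1%} to {high:+.1%}")
```

Here the observed +12% lift has a 5% standard error (z = 2.4, nominally significant). Shrinkage halves it to +6%, and the credible interval runs from about -0.9% to +12.9%, crossing zero. An underpowered test has a larger standard error, so it is shrunk harder.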

Winner's curse detection & shrinkage estimation

Bayesian P(B beats A) with sceptical priors

95% credible interval on true lift

Naive vs honest revenue projection

Frequentist vs Bayesian explainer for ecom
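P(B beats A) can be estimated directly from conversion counts. Give each arm's conversion rate a Beta posterior, then count how often a draw for B exceeds a draw for A. Below is a minimal Monte Carlo sketch that assumes one Beta prior shared by both arms. The uniform Beta(1, 1) default is not sceptical. Passing an informative prior centred on your historical conversion rate is one way to make it so. The parameter names are hypothetical.

```python
import random


def prob_b_beats_a(conv_a: int, n_a: int, conv_b: int, n_b: int,
                   prior_alpha: float = 1.0, prior_beta: float = 1.0,
                   draws: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta-Binomial posteriors.

    Both arms share the prior Beta(prior_alpha, prior_beta); each posterior is
    Beta(prior_alpha + conversions, prior_beta + non-conversions).
    """
    rng = random.Random(seed)   # seeded so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(prior_alpha + conv_a, prior_beta + n_a - conv_a)
        rate_b = rng.betavariate(prior_alpha + conv_b, prior_beta + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


print(prob_b_beats_a(1_500, 50_000, 1_650, 50_000))
```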

05 — experiment integrity report

Anyone can screenshot a dashboard and call it a win. A receipt proves the win was earned: a stamped, printable integrity report covering registration, calibration, attestations, and the honest effect estimate.

Imports registered experiments from Lockbox

7 integrity attestations, weighted A–F grade

Honest (shrunk) estimate + P(B beats A)

SHA-256 fingerprint — tamper-evident

Print-ready — attach to any test readout
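A tamper-evident fingerprint only works if the same report always serialises to the same bytes. Below is a minimal sketch that assumes the report is a JSON-serialisable dictionary; the field names are hypothetical. It serialises the report canonically, with sorted keys and fixed separators, and then hashes it with SHA-256, so any later edit to any field changes the fingerprint.

```python
import hashlib
import json


def report_fingerprint(report: dict) -> str:
    """SHA-256 hex digest of a canonical JSON serialisation of the report."""
    canonical = json.dumps(report, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


report = {
    "experiment_id": "checkout-copy-q3",   # hypothetical fields and values
    "grade": "B",
    "honest_lift": 0.031,
    "p_b_beats_a": 0.94,
}
print(report_fingerprint(report))
```

Anyone can recompute a plain hash after editing the report. The fingerprint therefore proves tampering only when it is also recorded somewhere the editor cannot change, such as the original readout.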

06 — claimed vs realized

Ledger

"You announced +40% cumulative lift this year. Why is revenue flat?" Log every shipped winner, enter your actual monthly numbers, and see whether the wins are showing up in reality. No platform builds this view, because it audits them too.

Ledger of shipped winners — claimed & honest lift

Monthly actuals from your analytics

Claimed vs honest vs actual CVR trajectory

Program realization rate

JSON export / import for backup
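The ledger's core comparison fits in a few lines. Below is a minimal sketch built on two assumptions about how the tool computes these figures. First, shipped lifts compound multiplicatively; this is itself optimistic, since wins on the same funnel overlap. Second, the realization rate is defined as actual lift divided by claimed cumulative lift. The function names are hypothetical.

```python
from math import prod


def cumulative_lift(lifts: list[float]) -> float:
    """Compound a list of relative lifts: +10% then +12% is 1.10 * 1.12 - 1."""
    return prod(1 + lift for lift in lifts) - 1


def realization_rate(claimed_lifts: list[float], baseline_cvr: float,
                     actual_cvr: float) -> float:
    """Share of the claimed cumulative lift that shows up in the actual conversion rate."""
    actual_lift = actual_cvr / baseline_cvr - 1
    return actual_lift / cumulative_lift(claimed_lifts)


shipped = [0.10, 0.12, 0.08]   # claimed relative CVR lifts of three shipped winners
print(f"claimed: {cumulative_lift(shipped):+.1%}")                 # claimed: +33.1%
print(f"realized: {realization_rate(shipped, 0.025, 0.026):.0%}")  # realized: 12%
```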

07 — subscription ltv & test valuation

Subscriber Value

When the test goal is subscriptions, your analytics counts a signup as one order and quietly buries your best variant. Model subscriber LTV in scenarios, get the exchange rate against one-off orders, and value the trade-off — even with zero subscription data.
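One simple way to put scenarios on subscriber LTV without any subscription history is to assume a constant monthly churn rate. If the first month is paid and churn applies after each payment, the expected lifetime is 1 / churn months. Below is a minimal sketch that assumes an undiscounted margin per month. The churn values are illustrative, not benchmarks, and the function names are hypothetical.

```python
def subscriber_ltv(monthly_margin: float, monthly_churn: float) -> float:
    """Undiscounted LTV with constant churn: expected paid months = 1 / churn."""
    return monthly_margin / monthly_churn


def orders_per_subscriber(monthly_margin: float, monthly_churn: float,
                          one_off_margin: float) -> float:
    """Exchange rate: how many one-off orders one subscriber is worth."""
    return subscriber_ltv(monthly_margin, monthly_churn) / one_off_margin


# Illustrative churn scenarios, not benchmarks; margins are in any one currency.
scenarios = {"pessimistic": 0.15, "central": 0.10, "optimistic": 0.06}
for name, churn in scenarios.items():
    ltv = subscriber_ltv(8.0, churn)
    rate = orders_per_subscriber(8.0, churn, 12.0)
    print(f"{name:>11}: LTV {ltv:6.2f}, 1 subscriber = {rate:.1f} one-off orders")
```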

LTV in three churn scenarios, not one guess

1 subscriber = X one-off orders exchange rate

Value-per-visitor comparison across arms

Break-even lifetime — no churn data needed

Industry churn benchmarks built in
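The break-even idea turns the question around. Instead of guessing churn, ask how many months a subscriber must stay for the subscription-heavy arm to match the other arm's value per visitor. Below is a minimal sketch built on three assumptions: value comes only from one-off order margin plus a flat monthly subscription margin, subscribers are counted separately from one-off orders, and the function names and figures are hypothetical.

```python
from typing import Optional


def value_per_visitor(visitors: int, one_off_orders: int, subscribers: int,
                      one_off_margin: float, monthly_margin: float,
                      lifetime_months: float) -> float:
    """Margin per visitor when each subscriber pays monthly_margin for lifetime_months."""
    total = one_off_orders * one_off_margin + subscribers * monthly_margin * lifetime_months
    return total / visitors


def break_even_months(arm_a: tuple[int, int, int], arm_b: tuple[int, int, int],
                      one_off_margin: float, monthly_margin: float) -> Optional[float]:
    """Subscriber lifetime (months) at which arm B's value per visitor equals arm A's.

    Each arm is (visitors, one_off_orders, subscribers). Returns None when B earns
    no more subscription margin per visitor than A, so it never catches up.
    Returns 0.0 when B already matches or beats A on one-off margin alone.
    """
    (v_a, o_a, s_a), (v_b, o_b, s_b) = arm_a, arm_b
    one_off_gap = (o_a / v_a - o_b / v_b) * one_off_margin         # A's one-off advantage
    sub_gain_per_month = (s_b / v_b - s_a / v_a) * monthly_margin  # B's monthly advantage
    if sub_gain_per_month <= 0:
        return None
    return max(0.0, one_off_gap / sub_gain_per_month)


months = break_even_months((10_000, 300, 20), (10_000, 260, 45),
                           one_off_margin=12.0, monthly_margin=8.0)
print(f"B wins if subscribers stay longer than {months:.1f} months")
```

Arm B shows fewer orders. If a plausible subscriber stays longer than the break-even lifetime, however, B is worth more per visitor, and that comparison needs no churn data.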

// why this exists

Most ecommerce A/B testing is governance theatre. Teams pick their metric after seeing the results, peek at significance daily, and ship winners produced by platforms nobody ever bothered to calibrate. The math was never the problem.

Each tool here guards a different failure point. The Validator checks whether your testing platform is telling you the truth. Lockbox locks your hypothesis in before any data exists. Reality Check deflates inflated winners before you announce them, and the Ledger asks the uncomfortable year-end question: did any of it actually show up in revenue?

Everything runs in your browser. There are no accounts and nothing gets sent to a server, which also means you can use these on client data without asking anyone's permission.