What is A/B test significance?
Statistical significance tells you whether the difference between your control and variant is likely a real effect or could plausibly be random noise. An A/B test can show a higher conversion rate for the variant purely by chance; significance testing estimates how surprising the observed lift would be if the two versions actually performed the same. The common bar is 95% confidence (p ≤ 0.05): if there were no real difference, a gap this large would appear less than 5% of the time.
How this calculator works
The method is a two-proportion z-test, where c1 and n1 are the control's conversions and visitors, and c2 and n2 are the variant's:
- p1 = c1 ÷ n1 (control conversion rate)
- p2 = c2 ÷ n2 (variant conversion rate)
- pooled p = (c1 + c2) ÷ (n1 + n2)
- standard error = √( pooled p × (1 − pooled p) × (1/n1 + 1/n2) )
- z = (p2 − p1) ÷ standard error
- confidence = 1 − the two-tailed p-value of z under the standard normal distribution
Relative uplift is (variant rate − control rate) ÷ control rate. A result is called significant when confidence reaches 95% or more.
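The calculation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the calculator's actual source: the function name `ab_significance`, its parameter names, and the example traffic numbers are hypothetical, and only the standard library is used.

```python
import math


def ab_significance(control_visitors, control_conversions,
                    variant_visitors, variant_conversions):
    """Two-proportion z-test with a pooled standard error (two-tailed)."""
    if control_visitors <= 0 or variant_visitors <= 0:
        raise ValueError("visitor counts must be positive")

    p1 = control_conversions / control_visitors   # control conversion rate
    p2 = variant_conversions / variant_visitors   # variant conversion rate

    # Pooled rate: conversion rate if both groups are treated as one.
    pooled = (control_conversions + variant_conversions) / (
        control_visitors + variant_visitors
    )
    standard_error = math.sqrt(
        pooled * (1 - pooled) * (1 / control_visitors + 1 / variant_visitors)
    )
    if standard_error == 0 or p1 == 0:
        raise ValueError("need at least some conversions and some non-conversions")

    z = (p2 - p1) / standard_error

    # Two-tailed p-value under the standard normal:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    confidence = 1 - p_value

    return {
        "control_rate": p1,
        "variant_rate": p2,
        "relative_uplift": (p2 - p1) / p1,
        "z": z,
        "confidence": confidence,
        "significant": confidence >= 0.95,
    }


if __name__ == "__main__":
    # Hypothetical traffic: 10,000 visitors per group, 500 vs 570 conversions.
    print(ab_significance(10_000, 500, 10_000, 570))
```

With these made-up inputs the control converts at 5.0% and the variant at 5.7%, a relative uplift of 14%; z comes out near 2.2, which corresponds to roughly 97% confidence and clears the 95% bar.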
How to read the result
| Confidence | Interpretation |
|---|---|
| Under 90% | Not significant; keep the test running |
| 90 to 95% | Promising, not conclusive |
| 95% or higher | Statistically significant at the standard bar |
Common A/B testing mistakes
- Stopping early. Confidence fluctuates; peeking and stopping at the first 95% inflates false positives.
- Too-small samples. Low-conversion B2B pages often need thousands of visitors per variant.
- Ignoring the uplift. A significant but tiny lift may not be worth shipping. Read confidence with the effect size.
- Short windows. Run at least one to two full business cycles to avoid day-of-week bias.
Frequently asked questions
How do you calculate A/B test significance?
With a two-proportion z-test: compute the conversion rate for the control and the variant, the standard error from the pooled rate, and a z-score for the difference, then convert z to a confidence level via the normal distribution (two-tailed). 95% confidence (p ≤ 0.05) is the common significance threshold.
What does statistically significant mean?
The difference is unlikely to be random noise. At 95% confidence, a gap this large would show up less than 5% of the time if the two versions truly performed the same. It does not mean the effect is large, so read it alongside the uplift.
How big a sample size do I need?
It depends on baseline rate and the uplift you want to detect; smaller effects need much larger samples. Low-conversion B2B pages often need thousands per variant, run over full business cycles.
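To make that dependence concrete, here is a rough sketch using the standard normal-approximation sample size formula for comparing two proportions. The function name `visitors_per_variant` is hypothetical, and the 80% power default is an assumption of this example (a common convention); this page does not say which power target to use.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_uplift, alpha=0.05, power=0.80):
    """Approximate visitors needed per group for a two-tailed two-proportion test.

    baseline_rate:   control conversion rate, e.g. 0.02 for 2%
    relative_uplift: smallest relative lift worth detecting, e.g. 0.20 for +20%
    alpha:           significance level (0.05 matches the 95% confidence bar)
    power:           chance of detecting the lift if it is real (assumed 0.80)
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must be strictly between 0 and 1 and must differ")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80

    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical page: 2% baseline, aiming to detect a 20% relative lift.
    print(visitors_per_variant(0.02, 0.20))
```

For a 2% baseline and a 20% relative lift this lands on the order of 21,000 visitors per variant, and halving the target lift roughly quadruples the requirement. Treat the output as a planning estimate, not an exact threshold.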
Why should I not stop a test early?
Confidence fluctuates, so a test can cross 95% by chance then fall back. Decide sample size and duration in advance and let it run to avoid false positives.