What does statistically significant mean in an A/B test?

It means the difference between your control and variant is unlikely to be down to random chance alone. At 95% confidence, a difference at least this large would show up less than 5% of the time if the two versions truly performed the same. Significance does not tell you the result is large or important, only that it is probably not noise, so always read it alongside the uplift and absolute conversion numbers.

How big a sample size do I need for an A/B test?

It depends on your baseline conversion rate and the uplift you want to detect: smaller effects need much larger samples. As a rough guide, low-conversion B2B pages often need thousands of visitors per variant to reach 95% confidence on a modest uplift. Run tests for full business cycles (at least one to two weeks) and avoid stopping the moment significance flickers.

Why should I not stop an A/B test early?

Confidence fluctuates as data comes in, so a test can cross 95% by chance and then fall back. Checking repeatedly and stopping the moment it looks significant (often called peeking) inflates false positives, because every extra look is another chance for noise to cross the line. Decide the sample size and duration in advance, and let the test run its course before acting on the result.

A/B Test Significance Calculator (Free)

What is A/B test significance?

Statistical significance tells you whether the difference between your control and variant is likely to be real or just random noise. An A/B test can show a higher conversion rate for the variant purely by chance; significance testing estimates how surprising the observed lift would be if there were no true difference. The common bar is 95% confidence, meaning a gap at least this large would appear by chance less than 5% of the time if both versions truly converted at the same rate.

How this calculator works

The method (two-proportion z-test):

p1 = c1 ÷ n1 (control conversions ÷ control visitors)
p2 = c2 ÷ n2 (variant conversions ÷ variant visitors)
pooled p = (c1 + c2) ÷ (n1 + n2)
standard error = √( pooled p × (1 − pooled p) × (1/n1 + 1/n2) )
z = (p2 − p1) ÷ standard error
confidence = 1 − two-tailed p-value = 2 × Φ(|z|) − 1, where Φ is the standard normal cumulative distribution function

Relative uplift is (variant rate − control rate) ÷ control rate. A result is called significant when confidence reaches 95% or more.
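
The steps above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not the calculator's actual source: the function name `ab_significance`, its argument names, and the dictionary keys are assumptions made for the example.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ab_significance(c1: int, n1: int, c2: int, n2: int) -> dict:
    """Pooled two-proportion z-test (hypothetical helper).

    c1, n1: control conversions and visitors
    c2, n2: variant conversions and visitors
    """
    p1 = c1 / n1
    p2 = c2 / n2
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; no conversions or all conversions")
    z = (p2 - p1) / se
    p_value = 2 * (1 - normal_cdf(abs(z)))  # two-tailed
    confidence = 1 - p_value
    uplift = (p2 - p1) / p1 if p1 > 0 else float("nan")  # relative uplift
    return {
        "control_rate": p1,
        "variant_rate": p2,
        "uplift": uplift,
        "z": z,
        "confidence": confidence,
        "significant": confidence >= 0.95,
    }


# Example: 200 of 10,000 control visitors convert vs 250 of 10,000 variant visitors.
result = ab_significance(200, 10_000, 250, 10_000)
print(f"uplift={result['uplift']:.1%}  z={result['z']:.2f}  "
      f"confidence={result['confidence']:.1%}")
```

In this example the control converts at 2.0% and the variant at 2.5%, a 25% relative uplift, and the confidence comes out above the 95% bar.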

How to read the result

Confidence	Read
Under 90%	Not significant; keep the test running to its planned sample size
90 to 95%	Promising, not conclusive
95% or higher	Statistically significant at the standard bar

Common A/B testing mistakes

Stopping early. Confidence fluctuates; peeking and stopping at the first 95% inflates false positives.
Too-small samples. Low-conversion B2B pages often need thousands of visitors per variant.
Ignoring the uplift. A significant but tiny lift may not be worth shipping. Read confidence with the effect size.
Short windows. Run at least one to two full business cycles to avoid day-of-week bias.
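
To see why stopping early is a problem, here is a small simulation sketch of A/A tests, where both arms share the same true conversion rate so any "significant" result is a false positive. It compares acting on the first look that reaches 95% with checking once at the end. The function names, the 10% conversion rate, the number of looks, and the seed are all assumptions chosen for illustration.

```python
import math
import random


def confidence(c1: int, n1: int, c2: int, n2: int) -> float:
    """Two-tailed confidence (1 - p-value) from a pooled two-proportion z-test."""
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0
    z = abs(c2 / n2 - c1 / n1) / se
    return math.erf(z / math.sqrt(2))  # equals 2 * Phi(|z|) - 1


def simulate(n_tests=400, visitors=2000, look_every=200, rate=0.10, seed=1):
    """Return (false-positive rate with peeking, false-positive rate at the end).

    Both arms use the same true rate, so there is no real difference to find.
    `visitors` should be a multiple of `look_every` so the last look is the final one.
    """
    rng = random.Random(seed)
    peek_hits = 0
    final_hits = 0
    for _ in range(n_tests):
        c1 = c2 = 0
        crossed = False
        for i in range(1, visitors + 1):
            c1 += rng.random() < rate
            c2 += rng.random() < rate
            # A peeker would stop and ship at the first look that reaches 95%.
            if i % look_every == 0 and confidence(c1, i, c2, i) >= 0.95:
                crossed = True
        peek_hits += crossed
        final_hits += confidence(c1, visitors, c2, visitors) >= 0.95
    return peek_hits / n_tests, final_hits / n_tests


peek_rate, final_rate = simulate()
print(f"false positives with peeking: {peek_rate:.1%}")
print(f"false positives, single look at the end: {final_rate:.1%}")
```

With a single look at the end, the false-positive rate should sit near the nominal 5%. With ten looks, it is typically several times higher, which is the inflation the first mistake describes.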

Frequently asked questions

How do you calculate A/B test significance?

With a two-proportion z-test: compute each variant's rate, the pooled standard error, and a z-score for the difference, then convert to a confidence level via the normal distribution. 95% confidence (p ≤ 0.05) is the common significance threshold.

What does statistically significant mean?

The difference is unlikely to be random. At 95% confidence, a gap at least this large would appear less than 5% of the time if there were no real difference. It does not mean the effect is large, so read it with the uplift.

How big a sample size do I need?

It depends on baseline rate and the uplift you want to detect; smaller effects need much larger samples. Low-conversion B2B pages often need thousands per variant, run over full business cycles.
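
For a rough number, the sketch below uses the standard normal-approximation sample size formula for comparing two proportions. It assumes a two-tailed test at 95% confidence and 80% statistical power; the power target is an assumption added here, not something this page specifies, and the function name `visitors_per_variant` is hypothetical.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_rate: float, relative_uplift: float,
                         confidence: float = 0.95, power: float = 0.80) -> int:
    """Approximate visitors needed in each arm to detect the given uplift.

    baseline_rate: control conversion rate, e.g. 0.02 for 2%
    relative_uplift: smallest uplift worth detecting, e.g. 0.20 for +20%
    power: chance of detecting the uplift if it is real (assumed default 80%)
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie strictly between 0 and 1 and differ")
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Example: 2% baseline, aiming to detect a 20% relative uplift (2.0% -> 2.4%).
print(visitors_per_variant(0.02, 0.20))
# Halving the detectable uplift to 10% needs far more traffic.
print(visitors_per_variant(0.02, 0.10))
```

Under these assumptions, a 2% baseline with a 20% relative uplift works out to roughly 21,000 visitors per variant, and halving the uplift to 10% roughly quadruples that, which is why smaller effects need much larger samples.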

Why should I not stop a test early?

Confidence fluctuates, so a test can cross 95% by chance then fall back. Decide sample size and duration in advance and let it run to avoid false positives.