The Best Sample Sizer (Calculator)

The Best Sample Sizer is a free advanced Sample Size Calculator and Power Analysis tool that takes into account your full experiment design, including:

Limits on maximum sample size (you can't wait forever...)
Peeking [Looking at results multiple times during an experiment].
Multiple treatments in one test.
Both continuous and binary metrics.
Results for various significance and power levels.
Sample sizing based on percentage treatment effects, the kind of treatment effect most commonly reported in industry.
Detecting negative percentage treatment effects.

Treatments

Select what percentage of the "subjects" (visitors to a website, users, patients, etc.) will get each treatment (non-control experience). By default, the Sample Sizer assumes a 50-50 split. Adjust this as needed, and add additional treatments if the experiment has multiple arms.

Enter percentages as numbers between 0 and 100, for example: 25 = 25% or 0.2 = 0.2%.

The percentages should total less than 100. The remaining subjects are assigned to the control group, so you enter only one number for a standard A/B test.
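The allocation rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Sample Sizer's own code; the function name control_share is hypothetical.

```python
def control_share(treatment_percentages):
    """Return the percentage of subjects left over for the control group.

    Hypothetical helper: mirrors the rules described above (1 to 12
    treatments, each a positive percentage, totalling less than 100).
    """
    if not 1 <= len(treatment_percentages) <= 12:
        raise ValueError("expected between 1 and 12 treatments")
    if any(p <= 0 for p in treatment_percentages):
        raise ValueError("each treatment percentage must be positive")
    total = sum(treatment_percentages)
    if total >= 100:
        raise ValueError("treatment percentages must total less than 100")
    return 100 - total


# Standard A/B test: one treatment at 50%, so control gets the other 50%.
print(control_share([50]))

# Three treatments: whatever is not assigned goes to control.
print(control_share([25, 25, 0.2]))
```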

You can include up to 12 treatments.

Percentage of Subjects Receiving Treatment 1:
Percentage of Subjects Receiving Treatment 2:
Percentage of Subjects Receiving Treatment 3:
Percentage of Subjects Receiving Treatment 4:
Percentage of Subjects Receiving Treatment 5:
Percentage of Subjects Receiving Treatment 6:
Percentage of Subjects Receiving Treatment 7:
Percentage of Subjects Receiving Treatment 8:
Percentage of Subjects Receiving Treatment 9:
Percentage of Subjects Receiving Treatment 10:
Percentage of Subjects Receiving Treatment 11:
Percentage of Subjects Receiving Treatment 12:

Metric Information

Fill in the current (or baseline) metric average before this experiment, as well as the standard deviation of the metric.

TIP: If your metric is a "binary metric" (for each subject in the experiment, the metric value is either True or False), like a Conversion Rate or a Retention Rate, then the standard deviation is simply SQRT(Current Conversion Rate x (1 - Current Conversion Rate)), with the rate expressed as a proportion between 0 and 1 (for example, 0.10 for 10%).
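As a quick check of the formula in the TIP, here is a minimal Python sketch. The function name binary_metric_std is hypothetical, and the rate is assumed to be given as a proportion between 0 and 1.

```python
import math


def binary_metric_std(rate):
    """Standard deviation of a True/False metric with the given rate.

    The rate must be a proportion between 0 and 1 (0.10 means 10%).
    """
    if not 0 <= rate <= 1:
        raise ValueError("rate must be a proportion between 0 and 1")
    return math.sqrt(rate * (1 - rate))


# A 10% conversion rate has a standard deviation of about 0.3.
print(round(binary_metric_std(0.10), 4))
```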

Baseline Metric Average:
Metric Standard Deviation:

Design

Peeking

Peeking is when you decide in the middle of the experiment whether to continue or to stop and make a decision now. It is a powerful tool for reducing average experiment duration, but we need to adjust for its effect to size the experiment properly.

The Sample Sizer assumes the peeks are equally spaced. For example, if you have a four-week experiment and two peeks, then you will make a decision about the experiment after the second week and at the end of the fourth week.
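The equal-spacing assumption can be illustrated with a short Python sketch. Following the example above, it assumes the final look at the end of the experiment counts as one of the peeks; the function name peek_times is hypothetical.

```python
def peek_times(duration, number_of_peeks):
    """Return the equally spaced times at which results are examined.

    Assumption: the last peek coincides with the end of the experiment,
    as in the four-week, two-peek example.
    """
    if number_of_peeks < 1:
        raise ValueError("number_of_peeks must be at least 1")
    return [duration * k / number_of_peeks
            for k in range(1, number_of_peeks + 1)]


# Four-week experiment with two peeks: look after week 2 and after week 4.
print(peek_times(4, 2))
```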

Number of Peeks:

Adjust Significance Level and Power

By default, the significance level is set to 5% and the power of the sample sizing is set to 80%. These are "standard" values, but we shouldn't be wedded to them, so you can adjust power and significance level below.

Significance Level
Power
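To get a feel for how these two settings drive sample size, the sketch below computes the multiplier from the textbook two-sided z-test approximation, in which the required sample size is proportional to (z for significance + z for power) squared. This is an assumption made for illustration: the Sample Sizer's actual method, including its adjustments for peeking and multiple treatments, may differ. The function name sizing_multiplier is hypothetical.

```python
from statistics import NormalDist


def sizing_multiplier(alpha=0.05, power=0.80):
    """(z_{1 - alpha/2} + z_{power}) ** 2 for a two-sided z-test.

    Under the textbook approximation, required sample size scales
    linearly with this number.
    """
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must be strictly between 0 and 1")
    normal = NormalDist()  # standard normal distribution
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    z_power = normal.inv_cdf(power)
    return (z_alpha + z_power) ** 2


default = sizing_multiplier()                 # 5% significance, 80% power
stricter = sizing_multiplier(power=0.90)      # 5% significance, 90% power
print(round(default, 2), round(stricter, 2), round(stricter / default, 2))
```

Under this approximation, raising power from 80% to 90% at a 5% significance level requires roughly a third more subjects.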

Minimum Detectable Percentage Effect

The minimum detectable percentage effect is the smallest percentage effect size that you want the experiment to be able to detect. Note that this is different from the effect you expect the treatment to have; it is usually smaller.

A value of "1" means that you want the experiment to be able to detect a 1% increase in the metric.

You can also enter negative values if you want to detect a decrease in the metric.

TIP: Suppose your metric is a conversion rate with a baseline value of 10%. If you enter 1 here, the experiment will be sized to detect at least a 1% relative change, that is, a move from 10% to 10.1%. If you only want to be able to detect a full percentage-point move, enter 10 here, because 10% more than 10% is 11%.
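The difference between a percentage effect and a percentage-point move is easy to get wrong, so here is the arithmetic from the TIP as a small Python sketch. Both function names are hypothetical.

```python
def target_value(baseline, percent_effect):
    """Metric value after applying a percentage (relative) effect."""
    return baseline * (1 + percent_effect / 100)


def percent_effect_for_absolute_move(baseline, absolute_move):
    """Percentage effect to enter in order to detect a given absolute move."""
    return 100 * absolute_move / baseline


# Baseline conversion rate of 10% (0.10 as a proportion).
print(round(target_value(0.10, 1), 4))    # 1% relative effect
print(round(target_value(0.10, 10), 4))   # 10% relative effect

# To detect a one-percentage-point move (0.10 to 0.11), enter this value:
print(round(percent_effect_for_absolute_move(0.10, 0.01), 4))
```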

Minimum Detectable Effect:

Maximum Sample Size [optional]

If there is a limit on how long you can wait for experiment results (which is likely), specify it here and the Sample Sizer will take it into account. If the experiment does not have enough power to detect the minimum detectable effect within the maximum sample size, the results will tell you the smallest effect size you can be confident of detecting given your time constraints.

For example, if your minimum detectable effect implies that you need 1,000,000 users but you only have time to wait for 100,000, the Sample Sizer will return 100,000 and tell you what effect sizes you can expect to detect. The results will also include the unconstrained sample size.

To not limit the maximum sample size, leave it at 0.
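The capping behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Sample Sizer's implementation: it uses the textbook two-sided z-test approximation for a single treatment versus control, ignores peeking and multiple treatments, and all function names and result keys are hypothetical.

```python
import math
from statistics import NormalDist


def _z_sum(alpha, power):
    """Two-sided critical value plus the power quantile."""
    normal = NormalDist()
    return normal.inv_cdf(1 - alpha / 2) + normal.inv_cdf(power)


def _allocation_factor(treatment_share):
    """1/w + 1/(1 - w): variance penalty for the chosen traffic split."""
    return 1 / treatment_share + 1 / (1 - treatment_share)


def required_sample_size(baseline, std, mde_percent,
                         treatment_share=0.5, alpha=0.05, power=0.80):
    """Total subjects (treatment plus control) needed to detect mde_percent."""
    delta = abs(baseline * mde_percent / 100)  # absolute effect size
    if delta == 0:
        raise ValueError("minimum detectable effect must be non-zero")
    n = (_z_sum(alpha, power) * std / delta) ** 2
    return math.ceil(n * _allocation_factor(treatment_share))


def detectable_percent_effect(baseline, std, n_total,
                              treatment_share=0.5, alpha=0.05, power=0.80):
    """Smallest percentage effect detectable with n_total subjects."""
    factor = _allocation_factor(treatment_share)
    delta = _z_sum(alpha, power) * std * math.sqrt(factor / n_total)
    return 100 * delta / abs(baseline)


def size_experiment(baseline, std, mde_percent, max_sample_size=0, **design):
    """Apply the optional cap; max_sample_size=0 means no limit."""
    unconstrained = required_sample_size(baseline, std, mde_percent, **design)
    capped = 0 < max_sample_size < unconstrained
    sample_size = max_sample_size if capped else unconstrained
    return {
        "sample_size": sample_size,
        "unconstrained_sample_size": unconstrained,
        "detectable_percent_effect": detectable_percent_effect(
            baseline, std, sample_size, **design),
    }


# 10% baseline conversion rate, 10% relative effect, at most 10,000 subjects.
print(size_experiment(0.10, 0.3, 10, max_sample_size=10_000))
```

When the cap binds, the returned sample size equals the cap and the detectable effect is larger than the one requested; with the cap left at 0, the two sample sizes coincide.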

Maximum Sample Size: