Optimal Sample Sizes for Testing the Equivalence of Two Means
Abstract
Equivalence tests (also known as similarity or parity tests) have become increasingly popular alongside equality tests. However, when testing the equivalence of two population means, the approximate sample sizes produced by conventional techniques in the literature are usually too small, so the resulting tests have less statistical power than required. In this paper, the authors first explain the reason for this problem and then provide a solution that uses an exhaustive local search algorithm to find the optimal sample size. The proposed method is not only accurate but also flexible: unequal variances or unequal sampling unit costs across groups can be accommodated through different sample size allocations. Figures and a numerical example are presented to demonstrate various configurations. An R Shiny App is also available for easy use (https://optimal-sample-size.shinyapps.io/equivalence-of-means/).
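The idea of checking power directly and searching over sample size allocations can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes known variances, so the two one-sided tests reduce to z-tests whose power has a closed form, whereas an analysis with estimated variances would use t-based tests. The function names (`tost_power`, `approx_equal_n`, `optimal_sizes`), the one-tail starting formula, and the parameter values in the usage note are all assumptions made for illustration. The one-tail formula ignores the second rejection boundary and therefore cannot understate power, which is one way an approximate sample size can end up too small.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def tost_power(n1, n2, delta, margin, sd1, sd2, alpha=0.05):
    """Power of two one-sided z-tests (known variances, an assumption)
    for H1: |mu1 - mu2| < margin, evaluated at true difference delta."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = _STD.inv_cdf(1 - alpha)
    upper = (margin - delta) / se - z
    lower = -(margin + delta) / se + z
    # If the rejection region is empty, the difference is negative: power is 0.
    return max(0.0, _STD.cdf(upper) - _STD.cdf(lower))


def approx_equal_n(delta, margin, sd1, sd2, alpha=0.05, target=0.80):
    """Illustrative one-tail normal approximation for equal group sizes.
    It keeps only the nearer boundary, so it can overstate power."""
    z_a = _STD.inv_cdf(1 - alpha)
    z_b = _STD.inv_cdf(target)
    return ceil((z_a + z_b) ** 2 * (sd1 ** 2 + sd2 ** 2)
                / (margin - abs(delta)) ** 2)


def optimal_sizes(delta, margin, sd1, sd2, cost1=1.0, cost2=1.0,
                  alpha=0.05, target=0.80):
    """Cheapest (n1, n2) with power >= target, found by exhaustive scan."""
    if abs(delta) >= margin:
        raise ValueError("true difference must lie inside the margin")
    args = (delta, margin, sd1, sd2, alpha)

    # Step 1: start from the approximation and grow until truly feasible.
    n = max(2, approx_equal_n(delta, margin, sd1, sd2, alpha, target))
    while tost_power(n, n, *args) < target:
        n += 1
    best, best_cost = (n, n), (cost1 + cost2) * n

    # Step 2: scan every allocation that is strictly cheaper than the
    # current best; for each n1, stop at the first feasible n2.
    n1_max = int((best_cost - 2 * cost2) / cost1)
    for n1 in range(2, n1_max + 1):
        n2 = 2
        while cost1 * n1 + cost2 * n2 < best_cost:
            if tost_power(n1, n2, *args) >= target:
                best, best_cost = (n1, n2), cost1 * n1 + cost2 * n2
                break
            n2 += 1
    return best
```

For example, `optimal_sizes(delta=0.5, margin=2.0, sd1=1.0, sd2=2.0)` returns an allocation that assigns more units to the group with the larger standard deviation, and changing `cost1` or `cost2` shifts the allocation toward the cheaper group. Because power is recomputed at every candidate allocation, the result never relies on the starting approximation being accurate.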