Original Article

Optimal Sample Sizes for Testing the Equivalence of Two Means

Published Online: https://doi.org/10.1027/1614-2241/a000171

Abstract. Equivalence tests (also known as similarity or parity tests) have become increasingly popular as a complement to equality tests. However, when testing the equivalence of two population means, the approximate sample sizes produced by conventional techniques in the literature are usually underestimated, so the resulting tests have less statistical power than required. In this paper, the authors first explain the reason for this problem and then provide a solution that uses an exhaustive local search algorithm to find the optimal sample size. The proposed method is both accurate and flexible: unequal variances or unequal sampling unit costs across groups can be accommodated through different sample size allocations. Figures and a numerical example are presented to demonstrate various configurations. An R Shiny app is also available for easy use (https://optimal-sample-size.shinyapps.io/equivalence-of-means/).
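To make the idea of searching for a sample size concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a symmetric equivalence margin, known standard deviations, and a normal approximation to the power of the two one-sided tests (TOST) procedure, whereas the paper's own method may rely on a different (for example, t-based) power function. The function names `tost_power` and `smallest_sample_sizes`, and the `ratio` parameter for the allocation n2/n1, are hypothetical and introduced only for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def tost_power(n1, n2, delta, margin, sd1, sd2, alpha=0.05):
    """Normal-approximation power of TOST for two independent means.

    delta is the assumed true mean difference, margin is the symmetric
    equivalence bound, so equivalence means -margin < difference < margin.
    """
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = _STD_NORMAL.inv_cdf(1 - alpha)
    # Both one-sided tests reject when the observed difference d satisfies
    # -margin + z * se < d < margin - z * se.
    upper = (margin - delta) / se - z
    lower = (-margin - delta) / se + z
    return max(0.0, _STD_NORMAL.cdf(upper) - _STD_NORMAL.cdf(lower))


def smallest_sample_sizes(delta, margin, sd1, sd2, alpha=0.05,
                          target=0.80, ratio=1.0, n_max=100_000):
    """Step n1 upward and return the first (n1, n2) reaching the target power.

    ratio is the assumed allocation n2 / n1 (an illustrative parameter).
    """
    if abs(delta) >= margin:
        raise ValueError("the true difference must lie inside the margin")
    for n1 in range(2, n_max + 1):
        n2 = max(2, ceil(ratio * n1))
        if tost_power(n1, n2, delta, margin, sd1, sd2, alpha) >= target:
            return n1, n2
    raise ValueError("target power not reached within n_max")


if __name__ == "__main__":
    # Illustrative inputs only: unequal standard deviations, and twice as
    # many units in group 2 as in group 1.
    print(smallest_sample_sizes(delta=0.0, margin=0.5, sd1=1.0, sd2=2.0,
                                ratio=2.0))
```

Because the search evaluates the power at each candidate size directly instead of inverting a closed-form approximation, the returned sizes meet the target power under the stated assumptions. Changing `ratio` shows how an unequal allocation could reflect unequal variances or unit costs.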
