A Meta-Analysis of Test Scores in Proctored and Unproctored Ability Assessments

Published Online: https://doi.org/10.1027/1015-5759/a000494

Abstract. Unproctored, web-based assessments are frequently compromised by a lack of control over participants’ test-taking behavior, and participants are likely to cheat when the personal consequences of the assessment are high. This meta-analysis summarizes findings on context effects in unproctored and proctored ability assessments and examines mean score differences and correlations between the two assessment contexts. As potential moderators, we consider (a) the perceived consequences of the assessment, (b) countermeasures against cheating, (c) the susceptibility to cheating of the measure itself, and (d) the use of different test media. For standardized mean differences, a three-level random-effects meta-analysis based on 109 effect sizes from 49 studies (total N = 100,434) identified a pooled effect of Δ = 0.20, 95% CI [0.10, 0.31], indicating higher scores in unproctored assessments. Moderator analyses revealed significantly smaller effects for measures whose answers are difficult to search for on the Internet. These results demonstrate that unproctored ability assessments are biased by cheating; unproctored assessments may therefore be most suitable for tasks that are difficult to search for on the Internet.
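To illustrate the kind of pooling behind the reported Δ and its confidence interval, the sketch below implements a standard two-level DerSimonian–Laird random-effects model in Python. This is a simplification, not the paper's actual three-level model (which additionally accounts for dependent effect sizes nested within studies); the function name and the example data are hypothetical.

```python
import math

def random_effects_pool(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random-effects model.

    effects   -- standardized mean differences (one per effect size)
    variances -- their sampling variances
    Returns (pooled estimate, 95% CI lower bound, 95% CI upper bound).
    """
    # Fixed-effect weights and the Q statistic for heterogeneity
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * d for wi, d in zip(w, effects)) / sw
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, effects))
    df = len(effects) - 1

    # Method-of-moments estimate of the between-study variance tau^2,
    # truncated at zero when observed heterogeneity is below chance level
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each sampling variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * d for wi, d in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se
```

With homogeneous inputs the estimator reduces to an inverse-variance fixed-effect average; as heterogeneity (tau^2) grows, the weights equalize and the confidence interval widens, which is why a random-effects pooled Δ carries a wider CI than a fixed-effect one over the same 109 effect sizes.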
