Abstract
Unproctored, web-based assessments are frequently compromised by a lack of control over participants’ test-taking behavior, and participants are especially likely to cheat when the personal consequences of an assessment are high. This meta-analysis summarizes findings on context effects in unproctored versus proctored ability assessments and examines both mean score differences and correlations between the two assessment contexts. As potential moderators, we consider (a) the perceived consequences of the assessment, (b) countermeasures against cheating, (c) the susceptibility of the measure itself to cheating, and (d) the use of different test media. For standardized mean differences, a three-level random-effects meta-analysis based on 109 effect sizes from 49 studies (total N = 100,434) identified a pooled effect of Δ = 0.20, 95% CI [0.10, 0.31], indicating higher scores in unproctored assessments. Moderator analyses revealed significantly smaller effects for measures whose answers are difficult to research on the Internet. These results demonstrate that unproctored ability assessments are biased by cheating; such assessments may therefore be most suitable for tasks that are difficult to research on the Internet.
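To make the analytic approach concrete, the following is a minimal sketch in R using the metafor package of the kind of three-level random-effects model described above, in which effect sizes (level 2) are nested within studies (level 3). All data values and variable names (study, es_id, m_unp, and so on) are hypothetical placeholders for illustration, not figures from the meta-analysis.

```r
library(metafor)

## Hypothetical per-comparison summary statistics:
## means, SDs, and ns for unproctored vs. proctored groups
dat <- data.frame(
  study  = c(1, 1, 2, 3),            # study ID (level 3)
  es_id  = 1:4,                      # effect-size ID within study (level 2)
  m_unp  = c(105, 21.0, 0.62, 30),   # unproctored group
  sd_unp = c(15, 5.2, 0.20, 8),
  n_unp  = c(120, 120, 250, 90),
  m_pro  = c(101, 20.1, 0.60, 29),   # proctored group
  sd_pro = c(15, 5.0, 0.21, 8),
  n_pro  = c(110, 110, 240, 85)
)

## Standardized mean differences (Hedges' g) and sampling variances;
## positive values indicate higher scores in the unproctored context
dat <- escalc(measure = "SMD",
              m1i = m_unp, sd1i = sd_unp, n1i = n_unp,
              m2i = m_pro, sd2i = sd_pro, n2i = n_pro,
              data = dat)

## Three-level random-effects model: sampling error (level 1) plus
## random intercepts for effect sizes nested within studies
res <- rma.mv(yi, vi, random = ~ 1 | study/es_id, data = dat)
summary(res)  # pooled SMD with 95% CI
```

A moderator analysis of the kind reported above could then be run by passing a coded study characteristic to the mods argument, for example rma.mv(yi, vi, mods = ~ researchable, random = ~ 1 | study/es_id, data = dat), where researchable is a hypothetical indicator of whether a measure’s answers can be found on the Internet.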