Abstract
Unproctored, web-based assessments are frequently compromised by a lack of control over participants’ test-taking behavior, and participants are especially likely to cheat when the personal consequences of an assessment are high. This meta-analysis summarizes findings on context effects in unproctored versus proctored ability assessments and examines both mean score differences and correlations between the two assessment contexts. As potential moderators, we consider (a) the perceived consequences of the assessment, (b) countermeasures against cheating, (c) the susceptibility of the measure itself to cheating, and (d) the use of different test media. For standardized mean differences, a three-level random-effects meta-analysis based on 109 effect sizes from 49 studies (total N = 100,434) identified a pooled effect of Δ = 0.20, 95% CI [0.10, 0.31], indicating higher scores in unproctored assessments. Moderator analyses revealed significantly smaller effects for measures whose answers are difficult to research on the Internet. These results demonstrate that unproctored ability assessments are biased by cheating; such assessments may therefore be most suitable for tasks that are difficult to research on the Internet.
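To make the analytic approach concrete, the following is a minimal sketch in R using the metafor package of the kind of three-level random-effects model described above, in which effect sizes (level 2) are nested within studies (level 3). All data values and variable names (study, es_id, m_unp, and so on) are hypothetical placeholders for illustration, not figures from the meta-analysis.

```r
library(metafor)

## Hypothetical per-comparison summary statistics:
## means, SDs, and ns for unproctored vs. proctored groups
dat <- data.frame(
  study  = c(1, 1, 2, 3),            # study ID (level 3)
  es_id  = 1:4,                      # effect-size ID within study (level 2)
  m_unp  = c(105, 21.0, 0.62, 30),   # unproctored group
  sd_unp = c(15, 5.2, 0.20, 8),
  n_unp  = c(120, 120, 250, 90),
  m_pro  = c(101, 20.1, 0.60, 29),   # proctored group
  sd_pro = c(15, 5.0, 0.21, 8),
  n_pro  = c(110, 110, 240, 85)
)

## Standardized mean differences (Hedges' g) and sampling variances;
## positive values indicate higher scores in the unproctored context
dat <- escalc(measure = "SMD",
              m1i = m_unp, sd1i = sd_unp, n1i = n_unp,
              m2i = m_pro, sd2i = sd_pro, n2i = n_pro,
              data = dat)

## Three-level random-effects model: sampling error (level 1) plus
## random intercepts for effect sizes nested within studies
res <- rma.mv(yi, vi, random = ~ 1 | study/es_id, data = dat)
summary(res)  # pooled SMD with 95% CI
```

A moderator analysis of the kind reported above could then be run by passing a coded study characteristic to the mods argument, for example rma.mv(yi, vi, mods = ~ researchable, random = ~ 1 | study/es_id, data = dat), where researchable is a hypothetical indicator of whether a measure’s answers can be found on the Internet.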