Published Online: https://doi.org/10.1027/1015-5759/a000250

Abstract. Over the past 20 years, Situational Judgment Tests (SJTs) have developed into a viable tool in personnel selection. Despite their growing popularity, research examining the extent of measurement error in SJT scores is largely lacking. Using reliability generalization, this article had two aims: (1) to establish an estimate of the average coefficient alpha of SJT scores across studies and (2) to examine the influence of essential SJT features and selected study variables on score reliability. To handle potentially dependent observations, a three-level hierarchical linear model was used. The results indicate that the reliability of SJT scores is typically rather low and below recommended levels for high-stakes applications. Additionally, both SJT and study characteristics affect score reliability. Implications for practitioners and researchers are provided to guide the appropriate use of SJTs and to initiate future research.
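
For readers less familiar with the statistic being aggregated, coefficient alpha for a single sample can be sketched as follows. This is only an illustration of the standard formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores); the function name `coefficient_alpha` and the toy response matrices are assumptions made for the example and are not taken from the article or its data.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Coefficient alpha for a respondents-by-items score matrix.

    `scores` is a list of respondents, each a list of k item scores
    (a hypothetical input layout chosen for this sketch).
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the summed (total) scores.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data: three items that rank respondents identically.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
# Toy data: two items that are uncorrelated across respondents.
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]

alpha_consistent = coefficient_alpha(consistent)
alpha_unrelated = coefficient_alpha(unrelated)
```

In a reliability generalization study, one such coefficient per sample would serve as the outcome that is then modeled across studies; the multilevel modeling step itself is not shown here.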
