A Meta-Analytical Multilevel Reliability Generalization of Situational Judgment Tests (SJTs)
Abstract
During the past 20 years, Situational Judgment Tests (SJTs) have developed into a viable tool in personnel selection. Despite their growing popularity, research examining the extent of measurement error in SJT scores is largely lacking. Using reliability generalization, this article had two aims: (1) to establish an estimate of the average coefficient alpha of SJT scores across studies and (2) to examine the influence of essential SJT features and selected study variables on score reliability. To account for potentially dependent observations, a three-level hierarchical linear model was used. The results indicate that the reliability of SJT scores is typically rather low and below the levels recommended for high-stakes applications. Additionally, both SJT characteristics and study characteristics affect score reliability. Implications for practitioners and researchers are provided to guide the appropriate use of SJTs and to stimulate future research.
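Coefficient alpha, the reliability estimate pooled in this reliability generalization, can be computed from a respondents-by-items score matrix as alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The following is a minimal Python sketch that assumes complete data and a simple sum score. The function names `coefficient_alpha` and `weighted_mean_alpha` are hypothetical. The sample-size-weighted mean is only a naive illustration of averaging alphas across studies: it treats every coefficient as independent, which is precisely the assumption the three-level hierarchical linear model in this study is meant to relax.

```python
from statistics import variance


def coefficient_alpha(scores: list[list[float]]) -> float:
    """Coefficient alpha for a respondents x items score matrix.

    Assumes complete data (no missing responses) and that the test score
    is the unweighted sum of the item scores.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    items = list(zip(*scores))                  # transpose rows into item columns
    item_var_sum = sum(variance(col) for col in items)
    totals = [sum(row) for row in scores]       # each respondent's total score
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def weighted_mean_alpha(alphas: list[float], ns: list[int]) -> float:
    """Naive sample-size-weighted mean of alphas from several samples.

    Hypothetical helper: it ignores dependence between coefficients drawn
    from the same study, unlike a three-level model.
    """
    return sum(a * n for a, n in zip(alphas, ns)) / sum(ns)


if __name__ == "__main__":
    demo = [[1, 1], [2, 3], [3, 2]]             # 3 respondents, 2 items
    print(coefficient_alpha(demo))
    print(weighted_mean_alpha([0.6, 0.8], [100, 300]))
```

Both variances use the sample (n - 1) denominator; because the same denominator appears in the numerator and the denominator of the ratio, the choice does not change alpha.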