Original Article

A Common Measurement Scale for Self-Report Instruments in Mental Health Care

T Scores With a Normal Distribution

Published Online: https://doi.org/10.1027/1015-5759/a000740

Abstract: The diversity of measures in clinical psychology hampers a straightforward interpretation of test results, complicates communication with the patient, and constitutes a challenge to the implementation of measurement-based care. In educational research and assessment, it is common practice to convert test scores to a common metric, such as T scores. We recommend applying this also in clinical psychology and propose and test a procedure to arrive at T scores approximating a normal distribution that can be applied to individual test scores. We established formulas to estimate normalized T scores from raw scale scores by regressing IRT-based θ scores on raw scores. With data from a large population and clinical samples, we established crosswalk formulas. Their validity was investigated by comparing calculated T scores with IRT-based T scores. IRT and formulas yielded very similar T scores, supporting the validity of the latter approach. Theoretical and practical advantages and disadvantages of both approaches to convert scores to common metrics and alternative approaches are discussed. Provided that scale characteristics allow for their computation, T scores will help to better understand measurement results, which makes it easier for patients and practitioners to use test results in joint decision-making about the course of treatment.
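The crosswalk idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the cubic polynomial, the function names `fit_crosswalk` and `to_t_score`, and the synthetic data are all assumptions made for the example, and real θ scores would come from a fitted IRT model rather than a random generator. The only fixed ingredient is the conventional T-score definition, T = 50 + 10 × z, applied here to the θ value predicted from the raw score.

```python
import numpy as np


def fit_crosswalk(raw, theta, degree=3):
    """Least-squares polynomial that predicts theta from the raw scale score.

    The polynomial form and its degree are assumptions for this sketch;
    the abstract does not state which functional form was used.
    """
    return np.polynomial.Polynomial.fit(raw, theta, deg=degree)


def to_t_score(raw, crosswalk):
    """Convert raw scores to T scores: T = 50 + 10 * predicted theta."""
    theta_hat = crosswalk(np.asarray(raw, dtype=float))
    return 50.0 + 10.0 * theta_hat


# Synthetic data for illustration only (hypothetical scale, raw range 0-40).
rng = np.random.default_rng(0)
theta = rng.standard_normal(2000)  # stand-in for IRT-based theta estimates
noise = rng.normal(0.0, 1.5, theta.size)
raw = np.clip(np.round(40.0 / (1.0 + np.exp(-1.2 * (theta - 0.5))) + noise), 0, 40)

crosswalk = fit_crosswalk(raw, theta)
t_scores = to_t_score(raw, crosswalk)

# Sanity check on a toy linear case: raw 0, 10, 20 mapped to theta -1, 0, 1.
toy = fit_crosswalk([0, 10, 20], [-1.0, 0.0, 1.0], degree=1)
toy_t = to_t_score([0, 10, 20], toy)
```

Once such a formula is fitted, it can be applied to a single patient's raw score without rerunning the IRT model, which is the practical appeal of the crosswalk approach the abstract describes.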
