Original Article

A Common Measurement Scale for Self-Report Instruments in Mental Health Care

T Scores With a Normal Distribution

Edwin de Beurs, Department of Clinical Psychology, Faculty of Social Sciences, Leiden University, Wassenaarseweg 52, 2333 AK Leiden, The Netherlands


Department of Clinical Psychology, Faculty of Social Sciences, Leiden University, The Netherlands

Arkin Mental Health Institute, Amsterdam, The Netherlands


Suzan Oudejans

Mark Bench, Amsterdam, The Netherlands



Berend Terluin

EMGO Institute, VU Medical Center, Amsterdam, The Netherlands


Published Online: December 16, 2022
https://doi.org/10.1027/1015-5759/a000740

Abstract

The diversity of measures in clinical psychology hampers a straightforward interpretation of test results, complicates communication with the patient, and constitutes a challenge to the implementation of measurement-based care. In educational research and assessment, it is common practice to convert test scores to a common metric, such as T scores. We recommend adopting this practice in clinical psychology as well, and we propose and test a procedure that yields T scores approximating a normal distribution and that can be applied to individual test scores. We established formulas to estimate normalized T scores from raw scale scores by regressing IRT-based θ scores on raw scores. Using data from large population and clinical samples, we derived these crosswalk formulas. Their validity was investigated by comparing the calculated T scores with IRT-based T scores. The IRT-based approach and the formulas yielded very similar T scores, supporting the validity of the latter approach. Theoretical and practical advantages and disadvantages of both approaches for converting scores to a common metric, as well as alternative approaches, are discussed. Provided that scale characteristics allow for their computation, T scores will help patients and practitioners better understand measurement results, which makes it easier to use test results in joint decision-making about the course of treatment.
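The crosswalk procedure summarized in the abstract (regress IRT-based θ scores on raw scores, convert the predicted θ to a T score with T = 50 + 10θ, and check agreement with IRT-based T scores) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R code: the cubic polynomial, the hypothetical helper names `fit_crosswalk` and `agreement`, and the synthetic data are all assumptions. In practice θ would be estimated from item responses with an IRT model and is assumed here to be standardized (mean 0, SD 1) in the reference population. The agreement summary follows the limits-of-agreement idea of Bland and Altman (1986).

```python
"""Minimal sketch of a raw-score-to-T-score crosswalk (hypothetical example).

Assumptions, not taken from the article: a cubic polynomial crosswalk,
synthetic (raw, theta) pairs in place of IRT-based theta estimates, and
theta standardized to mean 0 and SD 1 in the reference population.
"""
from statistics import mean, stdev


def _solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_crosswalk(raw, theta, degree=3):
    """Regress theta on a polynomial in the raw score; return a raw -> T function."""
    scale = max(abs(x) for x in raw) or 1.0  # rescale raw scores for numerical stability
    u = [x / scale for x in raw]
    k = degree + 1
    # Normal equations (X'X) b = X'y for the design matrix [1, u, u^2, ..., u^degree].
    xtx = [[sum(v ** (i + j) for v in u) for j in range(k)] for i in range(k)]
    xty = [sum(t * v ** i for v, t in zip(u, theta)) for i in range(k)]
    coefs = _solve(xtx, xty)

    def raw_to_t(raw_score):
        v = raw_score / scale
        theta_hat = sum(b * v ** i for i, b in enumerate(coefs))
        return 50.0 + 10.0 * theta_hat  # T = 50 + 10 * theta

    return raw_to_t


def agreement(t_formula, t_irt):
    """Bland-Altman style summary: mean difference and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(t_formula, t_irt)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Synthetic data: theta is an exact cubic in the raw score (0-20).
    raw = list(range(21))
    theta = [-2 + 0.3 * x - 0.01 * x**2 + 0.0002 * x**3 for x in raw]
    to_t = fit_crosswalk(raw, theta)
    print(round(to_t(10), 1))  # 52.0 for this synthetic data
    t_irt = [50 + 10 * t for t in theta]
    print(agreement([to_t(x) for x in raw], t_irt))
```

Mapping raw scores onto θ, rather than applying a linear z transformation, is what lets the resulting T scores approach a normal distribution when raw symptom scores are skewed; the polynomial degree is a modeling choice that would need checking on independent data.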

References

Bakker, I. M., Terluin, B., Van Marwijk, H. W., van der Windt, D. A. M., Rijmen, F., van Mechelen, W., & Stalman, W. A. (2007). A cluster-randomised trial evaluating an intervention for patients with stress-related mental disorders and sick leave in primary care. PLoS Clinical Trials, 2(6), Article e26. https://doi.org/10.1371/journal.pctr.0020026
Batterham, P. J., Sunderland, M., Slade, T., Calear, A. L., & Carragher, N. (2018). Assessing distress in the community: Psychometric properties and crosswalk comparison of eight measures of psychological distress. Psychological Medicine, 48(8), 1316–1324. https://doi.org/10.1017/S0033291717002835
Bland, J. M., & Altman, D. G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 327(8476), 307–310. https://doi.org/10.1016/S0140-6736(86)90837-8
Bowman, M. L. (2002). The perfidy of percentiles. Archives of Clinical Neuropsychology, 17(3), 295–303. https://doi.org/10.1016/S0887-6177(01)00116-0
Camstra, A., & Boomsma, A. (1992). Cross-validation in regression and covariance structure analysis: An overview. Sociological Methods & Research, 21(1), 89–115. https://doi.org/10.1177/0049124192021001004
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06
Choi, S. W., Schalet, B., Cook, K. F., & Cella, D. (2014). Establishing a common metric for depressive symptoms: Linking the BDI-II, CES-D, and PHQ-9 to PROMIS Depression. Psychological Assessment, 26(2), 513–527. https://doi.org/10.1037/a0035768
Cook, K. F., Schalet, B. D., Kallen, M. A., Rutsohn, J. P., & Cella, D. (2015). Establishing a common metric for self-reported pain: Linking BPI Pain Interference and SF-36 Bodily Pain Subscale scores to the PROMIS Pain Interference metric. Quality of Life Research, 24(10), 2305–2318. https://doi.org/10.1007/s11136-014-0790-9
Crawford, J. R., & Garthwaite, P. H. (2009). Percentiles please: The case for expressing neuropsychological test scores and accompanying confidence limits as percentile ranks. The Clinical Neuropsychologist, 23(2), 193–204. https://doi.org/10.1080/13854040801968450
Crişan, D. R., Tendeiro, J. N., & Meijer, R. R. (2017). Investigating the practical consequences of model misfit in unidimensional IRT models. Applied Psychological Measurement, 41(6), 439–455. https://doi.org/10.1177/0146621617695522
Cronbach, L. J. (1984). Essentials of psychological testing (4th ed.). Harper & Row.
de Beurs, E., Carlier, I. V., & van Hemert, A. M. (2019). Approaches to denote treatment outcome: Clinical significance and clinical global impression compared. International Journal of Methods in Psychiatric Research, 28(4), Article e1797. https://doi.org/10.1002/mpr.1797
de Beurs, E., den Hollander-Gijsman, M., Buwalda, V., Trijsburg, W., & Zitman, F. G. (2005). De Outcome Questionnaire (OQ-45): Een meetinstrument voor meer dan alleen psychische klachten [The Outcome Questionnaire (OQ-45): A measure for psychiatric symptoms and more]. De Psycholoog, 40(1), 53–63.
de Beurs, E., den Hollander-Gijsman, M. E., van Rood, Y. R., van der Wee, N. J., Giltay, E. J., van Noorden, M. S., van der Lem, R., van Fenema, E., & Zitman, F. G. (2011). Routine outcome monitoring in the Netherlands: Practical experiences with a web-based strategy for the assessment of treatment outcome in clinical practice. Clinical Psychology & Psychotherapy, 18(1), 1–12. https://doi.org/10.1002/cpp.696
de Beurs, E., Fried, E. I., & Boehnke, J. (2022). Common measures or common metrics? A plea to harmonize measurement results. Clinical Psychology & Psychotherapy, 29(5), 1755–1767. https://doi.org/10.1002/cpp.2742
de Beurs, E., Oudejans, S., & Terluin, B. (2022). A common measurement scale for self-report instruments in mental health care: T scores with a normal distribution (supplementary materials). https://www.psycharchives.org/en/item/86e598e9-4828-4127-86ae-5f0d18e9586a
de Beurs, E., & Zitman, F. G. (2006). De Brief Symptom Inventory (BSI): De betrouwbaarheid en validiteit van een handzaam alternatief voor de SCL-90 [The Brief Symptom Inventory: Reliability and validity of a handy alternative for the SCL-90]. Maandblad Geestelijke Volksgezondheid, 61, 120–141.
de Jong, K., Nugter, M. A., Polak, M. G., Wagenborg, J. E. A., Spinhoven, P., & Heiser, W. J. (2007). The Outcome Questionnaire (OQ-45) in a Dutch population: A cross-cultural validation. Clinical Psychology & Psychotherapy, 14(4), 288–301. https://doi.org/10.1002/cpp.529
Derogatis, L. R. (1975). The Brief Symptom Inventory. Clinical Psychometric Research.
Dorans, N. J., Pommerich, M., & Holland, P. W. (Eds.). (2007). Linking and aligning scores and scales. Springer.
Embretson, S. E., & Reise, S. P. (2013). Item response theory for psychologists. Erlbaum.
Fischer, H. F., & Rose, M. (2016). www.common-metrics.org: A web application to estimate scores from different patient-reported outcome measures on a common scale. BMC Medical Research Methodology, 16(1), Article 142. https://doi.org/10.1186/s12874-016-0241-0
Fischer, H. F., Tritt, K., Klapp, B. F., & Fliege, H. (2011). How to compare scores from different depression scales: Equating the Patient Health Questionnaire (PHQ) and the ICD-10-Symptom Rating (ISR) using item response theory. International Journal of Methods in Psychiatric Research, 20(4), 203–214. https://doi.org/10.1002/mpr.350
Friedrich, M., Hinz, A., Kuhnt, S., Schulte, T., Rose, M., & Fischer, F. (2019). Measuring fatigue in cancer patients: A common metric for six fatigue instruments. Quality of Life Research, 28(6), 1615–1626. https://doi.org/10.1007/s11136-019-02147-3
Harding, K. J., Rush, A. J., Arbuckle, M., Trivedi, M. H., & Pincus, H. A. (2011). Measurement-based care in psychiatric practice: A policy framework for implementation. Journal of Clinical Psychiatry, 72(8), 1136–1143. https://doi.org/10.4088/JCP.10r06282whi
Holland, P. W., Dorans, N. J., & Petersen, N. S. (2006). Equating test scores. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics (Vol. 26, pp. 169–203). https://doi.org/10.1016/S0169-7161(06)26006-1
Kilbourne, A. M., Beck, K., Spaeth-Rublee, B., Ramanuj, P., O’Brien, R. W., Tomoyasu, N., & Pincus, H. A. (2018). Measuring and improving the quality of mental health care: A global perspective. World Psychiatry, 17(1), 30–38. https://doi.org/10.1002/wps.20482
Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). Springer Science & Business Media.
Lambert, M. J., Gregersen, A. T., & Burlingame, G. M. (2004). The Outcome Questionnaire-45. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcomes assessment: Volume 3: Instruments for adults (3rd ed., pp. 191–234). Erlbaum. http://search.ebscohost.com/login.aspx?direct=true&db=psyh&AN=2004-14941-006&site=ehost-live
Lambert, M. J., & Harmon, K. L. (2018). The merits of implementing routine outcome monitoring in clinical practice. Clinical Psychology: Science and Practice, 25(4), Article e12268. https://doi.org/10.1111/cpsp.12268
Ley, P. (1972). Quantitative aspects of psychological assessment (Vol. 1). Duckworth.
Lord, F. M., & Wingersky, M. S. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings”. Applied Psychological Measurement, 8(4), 453–461. https://doi.org/10.1177/014662168400800409
McCall, W. A. (1922). How to measure in education. MacMillan.
Mellenbergh, G. J. (2011). A conceptual introduction to psychometrics: Development, analysis and application of psychological and educational tests. Eleven International. https://books.google.es/books?id=jRJJYAAACAAJ
Miller, S. D., Hubble, M. A., Chow, D., & Seidel, J. (2015). Beyond measures and monitoring: Realizing the potential of feedback-informed treatment. Psychotherapy, 52(4), 449–457. https://doi.org/10.1037/pst0000031
Patel, S. R., Bakken, S., & Ruland, C. (2008). Recent advances in shared decision making for mental health. Current Opinion in Psychiatry, 21(6), 606–612. https://doi.org/10.1097/YCO.0b013e32830eb6b4
Rosseel, Y. (2012). Lavaan: An R package for structural equation modeling and more. Version 0.5–12 (BETA). Journal of Statistical Software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02
Schalet, B. D., Cook, K. F., Choi, S. W., & Cella, D. (2014). Establishing a common metric for self-reported anxiety: Linking the MASQ, PANAS, and GAD-7 to PROMIS Anxiety. Journal of Anxiety Disorders, 28(1), 88–96. https://doi.org/10.1016/j.janxdis.2013.11.006
Schalet, B. D., Revicki, D. A., Cook, K. F., Krishnan, E., Fries, J. F., & Cella, D. (2015). Establishing a common metric for physical function: Linking the HAQ-DI and SF-36 PF subscale to PROMIS® Physical Function. Journal of General Internal Medicine, 30(10), 1517–1523. https://doi.org/10.1007/s11606-015-3360-0
Scherpenzeel, A. C. (2018). “True” longitudinal and probability-based Internet panels: Evidence from the Netherlands. In M. Das, P. Ester, & L. Kaczmirek (Eds.), Social and behavioral research and the Internet (pp. 77–104). Routledge. https://doi.org/10.4324/9780203844922-4
Scherpenzeel, A. C., & Bethlehem, J. G. (2011). How representative are online panels? Problems of coverage and selection and possible solutions. In M. Das, P. Ester, & L. Kaczmirek (Eds.), Social and behavioral research and the Internet: Advances in applied methods and research strategies (pp. 105–132). Taylor & Francis.
Schreiber, J. B., Nora, A., Stage, F. K., Barlow, E. A., & King, J. (2006). Reporting structural equation modeling and confirmatory factor analysis results: A review. The Journal of Educational Research, 99(6), 323–338. https://doi.org/10.3200/JOER.99.6.323-338
Slade, M., Amering, M., & Oades, L. (2008). Recovery: An international perspective. Epidemiology and Psychiatric Sciences, 17(2), 128–137. https://doi.org/10.1017/S1121189X00002827
Smits, N. (2016). On the effect of adding clinical samples to validation studies of patient-reported outcome item banks: A simulation study. Quality of Life Research, 25(7), 1635–1644. https://doi.org/10.1007/s11136-015-1199-9
Smits, N., Öğreden, O., Garnier-Villarreal, M., Terwee, C. B., & Chalmers, R. P. (2020). A study of alternative approaches to non-normal latent trait distributions in item response theory models used for health outcome measurement. Statistical Methods in Medical Research, 29(4), 1030–1048. https://doi.org/10.1177/0962280220907625
Stasinopoulos, M. D., Rigby, R. A., & Bastiani, F. D. (2018). GAMLSS: A distributional regression approach. Statistical Modelling, 18(3–4), 248–273. https://doi.org/10.1177/1471082X18759144
ten Klooster, P. M., Oude Voshaar, M. A. H., Gandek, B., Rose, M., Bjorner, J. B., Taal, E., Glas, C. A. W., van Riel, P. L. C. M., & van de Laar, M. A. F. J. (2013). Development and evaluation of a crosswalk between the SF-36 Physical Functioning Scale and Health Assessment Questionnaire Disability Index in rheumatoid arthritis. Health and Quality of Life Outcomes, 11, Article 199. https://doi.org/10.1186/1477-7525-11-199
Terluin, B., Smits, N., Brouwers, E. P. M., & de Vet, H. C. W. (2016). The Four-Dimensional Symptom Questionnaire (4DSQ) in the general population: Scale structure, reliability, measurement invariance and normative data: A cross-sectional survey. Health and Quality of Life Outcomes, 14(1), Article 130. https://doi.org/10.1186/s12955-016-0533-4
Terluin, B., van Marwijk, H. W., Adèr, H. J., de Vet, H. C., Penninx, B. W., Hermens, M. L., van Boeijen, C. A., van Balkom, A. J., van der Klink, J. J., & Stalman, W. A. (2006). The Four-Dimensional Symptom Questionnaire (4DSQ): A validation study of a multidimensional self-report questionnaire to assess distress, depression, anxiety and somatization. BMC Psychiatry, 6(1), Article 34. https://doi.org/10.1186/1471-244X-6-34
Timman, R., de Jong, K., & de Neve-Enthoven, N. (2017). Cut-off scores and clinical change indices for the Dutch Outcome Questionnaire (OQ-45) in a large sample of normal and several psychotherapeutic populations. Clinical Psychology & Psychotherapy, 24(1), 72–81. https://doi.org/10.1002/cpp.1979
Timmerman, M. E., Voncken, L., & Albers, C. J. (2021). A tutorial on regression-based norming of psychological tests with GAMLSS. Psychological Methods, 26(3), 357–373. https://doi.org/10.1037/met0000348
van der Laan, J. (2009). Representativity of the LISS panel. Statistics Netherlands.
Van Hoeck, K. J. M., Lilien, M. R., Brinkman, D. C., & Schroeder, C. H. (2000). Comparing a urea kinetic monitor with Daugirdas formula and dietary records in children. Pediatric Nephrology, 14(4), 280–283. https://doi.org/10.1007/s004670050759
van Stralen, K. J., Dekker, F. W., Zoccali, C., & Jager, K. J. (2012). Measuring agreement, more complicated than it seems. Nephron Clinical Practice, 120(3), c162–c167. https://doi.org/10.1159/000337798
Wahl, I., Löwe, B., Bjorner, J. B., Fischer, F., Langs, G., Voderholzer, U., Aita, S. A., Bergemann, N., Brähler, E., & Rose, M. (2014). Standardization of depression measurement: A common metric was developed for 11 self-report depression measures. Journal of Clinical Epidemiology, 67(1), 73–86. https://doi.org/10.1016/j.jclinepi.2013.04.019
Wall, M. M., Park, J. Y., & Moustaki, I. (2015). IRT modeling in the presence of zero-inflation with application to psychiatric disorder severity. Applied Psychological Measurement, 39(8), 583–597. https://doi.org/10.1177/0146621615588184

Volume 40, Issue 2, March 2024

ISSN: 1015-5759; eISSN: 2151-2426

History

Received: April 19, 2021
Revised: July 6, 2022
Accepted: August 3, 2022
Published online: December 16, 2022


Acknowledgments:

In this paper, we gratefully made use of BSI- and 4DSQ data of the LISS (Longitudinal Internet Studies for the Social sciences) panel administered by CentERdata (Tilburg University, The Netherlands) and OQ-45 data from TNS-NIPO, The Netherlands. We would also like to thank MHC providers in the Netherlands for providing patient data on the BSI and the OQ-45 and the VU Medical Center for providing patient data on the 4DSQ.

Conflict of Interest:

The authors report no conflict of interest.

Publication Ethics:

The data and narrative interpretations of the data/research appearing in the manuscript have not been presented at a conference or meeting, posted on a listserv, shared on a website, including academic social networks like ResearchGate, and so forth.

Open Science:

We report on a reanalysis of data that have been published before. We report and refer to previous publications on how we determined our sample size, all data exclusions, all data inclusion/exclusion criteria, whether inclusion/exclusion criteria were established prior to data analysis, all measures in the study, and all analyses including all tested models. If we use inferential tests, we report exact p values, effect sizes, and 95% confidence or credible intervals.

Open Data: The information needed to reproduce all of the reported results is not openly accessible, but can be requested from the first author.

Open Materials: The information needed to reproduce all of the reported methodology is made available. We provided sufficient information for an independent researcher to reproduce the reported results, including a codebook (https://www.psycharchives.org/en/item/86e598e9-4828-4127-86ae-5f0d18e9586a; de Beurs, Oudejans, et al., 2022). Moreover, we have uploaded an annotated version of our R code with two practice datasets, which allows other researchers to apply it to these data and to their own data.

Preregistration of Studies and Analysis Plans: This study was not preregistered.
