How to Identify Hot Topics in Psychology Using Topic Modeling
Abstract
Latent topics and trends in psychological publications were examined to identify hotspots in psychology. Topic modeling was contrasted with a classification-based scientometric approach to demonstrate the benefits of the former. Specifically, the psychological publication output of the German-speaking countries from 1980 to 2016, comprising German- and English-language publications documented in the PSYNDEX database, was analyzed. Topic modeling based on latent Dirichlet allocation (LDA) was applied to a corpus of 314,573 publications. The input for topic modeling consisted of the publications' controlled terms, that is, keywords from a standardized vocabulary of psychology. Based on these controlled terms, 500 topics were determined and trending topics were identified. Hot topics, indicated by the most strongly increasing trends in these data, were facets of neuropsychology, online therapy, cross-cultural aspects, traumatization, and visual attention. In conclusion, the findings indicate that topics can reveal research trends in more detail than standardized classifications. Possible applications of this method, its limitations, and implications for research synthesis are discussed.
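Hot topics are identified above as the topics with the most strongly increasing trends. The following is a minimal sketch of one common way to operationalize such a trend. It assumes that a fitted LDA model has already produced per-document topic proportions (θ), that each document's publication year is known, and that a topic's trend is the ordinary least-squares slope of its yearly mean proportion. The study's exact trend measure may differ. The function names are hypothetical, and the toy data are illustrative only.

```python
from collections import defaultdict


def yearly_mean_proportions(theta, years):
    """Average each topic's proportion over all documents from the same year.

    theta: list of per-document topic-proportion lists (each row sums to 1).
    years: publication years, aligned with the rows of theta.
    Returns the sorted list of years and a dict year -> mean proportions.
    """
    n_topics = len(theta[0])
    sums = defaultdict(lambda: [0.0] * n_topics)
    counts = defaultdict(int)
    for doc, year in zip(theta, years):
        for k, p in enumerate(doc):
            sums[year][k] += p
        counts[year] += 1
    means = {y: [s / counts[y] for s in sums[y]] for y in sums}
    return sorted(means), means


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def rank_hot_topics(theta, years, top_n=5):
    """Return (topic_index, slope) pairs, steepest increase first."""
    ordered_years, means = yearly_mean_proportions(theta, years)
    if len(ordered_years) < 2:
        raise ValueError("a trend needs at least two distinct years")
    slopes = []
    for k in range(len(theta[0])):
        series = [means[y][k] for y in ordered_years]
        slopes.append((k, ols_slope(ordered_years, series)))
    # Largest positive slope = most strongly increasing ("hot") topic.
    slopes.sort(key=lambda pair: pair[1], reverse=True)
    return slopes[:top_n]


if __name__ == "__main__":
    # Toy example: three documents, three topics, one document per year.
    toy_theta = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.4, 0.5, 0.1]]
    toy_years = [2000, 2001, 2002]
    for topic, slope in rank_hot_topics(toy_theta, toy_years, top_n=3):
        print(f"topic {topic}: slope {slope:+.3f} per year")
```

In the toy data, topic 1 rises from 0.1 to 0.5 and is ranked first, while topic 0 declines and is ranked last. Averaging per year first keeps years with many publications from dominating the slope; weighting years by publication count would be an equally defensible alternative.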