Reliability of MTurk Data From Masters and Workers
Abstract
Previous research has supported the use of Amazon’s Mechanical Turk (MTurk) for online data collection in individual differences research. Although MTurk Masters have earned an elite status through strong approval ratings on previous tasks (and therefore command higher payment for their work), no research has empirically examined whether researchers actually obtain higher-quality data when they require their MTurk Workers to have Master status. In two online survey studies (one using a personality test and one using a cognitive abilities test), the psychometric reliability of MTurk data was compared between a sample that required the Master qualification and a sample that imposed no status-level qualification requirement. In both studies, the Master samples failed to outperform the standard samples.
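The abstract does not specify the statistical machinery behind the reliability comparison. For illustration only, the sketch below assumes Cronbach’s alpha as the reliability coefficient and Feldt’s (1969) F test for comparing two independent alpha coefficients; the function names (cronbach_alpha, feldt_test) and the simulated score matrices are placeholders, not data or methods from the study.

```python
import numpy as np
from scipy import stats

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)      # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def feldt_test(alpha1, n1, alpha2, n2):
    """Feldt (1969) F test for two independent alphas (assumed test choice).
    H0: alpha1 == alpha2; returns (F, approximate two-sided p)."""
    F = (1 - alpha1) / (1 - alpha2)
    df1, df2 = n1 - 1, n2 - 1
    p_one = stats.f.sf(F, df1, df2) if F > 1 else stats.f.cdf(F, df1, df2)
    return F, min(1.0, 2 * p_one)

# Hypothetical example: a 50-item scale, 200 Master vs. 200 standard Workers
rng = np.random.default_rng(0)
masters = rng.integers(1, 6, size=(200, 50))   # simulated 1-5 Likert responses
standard = rng.integers(1, 6, size=(200, 50))
a1, a2 = cronbach_alpha(masters), cronbach_alpha(standard)
F, p = feldt_test(a1, 200, a2, 200)
print(f"alpha(Masters)={a1:.2f}, alpha(standard)={a2:.2f}, F={F:.2f}, p={p:.3f}")
```

With real item-level response data in place of the simulated matrices, a nonsignificant result from such a test would correspond to the reported finding that the Master samples did not yield more reliable data than the standard samples.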