Measurement and Representation Biases in Digital Trace Data-based Studies (ChatGPT's Version)

Note: This webpage is the result of my attempt to get ChatGPT to do a clerical job for me, namely embedding links to a predefined list of papers for my reading seminar. It failed, so I hope it goes without saying that I am not endorsing ChatGPT for this task. Here’s the ‘hand-curated version’: https://indiiigo.github.io/mrb/

Construct Definition

Ruths, Derek, and Jürgen Pfeffer. “Social media for large studies of behavior.” Science 346.6213 (2014): 1063-1064.
Blodgett, Su Lin, et al. “Language (Technology) is Power: A Critical Survey of ‘Bias’ in NLP.” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020.
Wagner, Claudia, et al. “Measuring algorithmically infused societies.” Nature 595.7866 (2021): 197-204.

Platform Effects

Malik, Momin, and Jürgen Pfeffer. “Identifying platform effects in social media data.” Proceedings of the International AAAI Conference on Web and Social Media. Vol. 10. No. 1. 2016.
Gligorić, Kristina, Ashton Anderson, and Robert West. “How constraints affect content: The case of Twitter’s switch from 140 to 280 characters.” Proceedings of the International AAAI Conference on Web and Social Media. Vol. 12. No. 1. 2018.
Arazy, Ofer, et al. “Information quality in Wikipedia: The effects of group composition and task conflict.” Journal of Management Information Systems 27.4 (2011): 71-98.

Data Collection

Zafar, Muhammad Bilal, et al. “Sampling content from online social networks: Comparing random vs. expert sampling of the Twitter stream.” ACM Transactions on the Web (TWEB) 9.3 (2015): 1-33.
Gaffney, Devin, and J. Nathan Matias. “Caveat emptor, computational social science: Large-scale missing data in a widely-published Reddit corpus.” PLoS ONE 13.7 (2018): e0200162.
Pfeffer, Juergen, et al. “This Sample seems to be good enough! Assessing Coverage and Temporal Reliability of Twitter’s Academic API.” Proceedings of the International AAAI Conference on Web and Social Media. Vol. 17. 2023.

Data Preprocessing and Modeling

Zagheni, Emilio, and Ingmar Weber. “Demographic research with non-representative internet data.” International Journal of Manpower 36.1 (2015): 13-25.
Culotta, Aron. “Reducing sampling bias in social media data for county health inference.” Joint Statistical Meetings Proceedings. Citeseer, 2014.
Jurgens, David, et al. “Geolocation prediction in Twitter using social networks: A critical analysis and review of current practice.” Proceedings of the International AAAI Conference on Web and Social Media. Vol. 9. No. 1. 2015.
Cohen, Raviv, and Derek Ruths. “Classifying political orientation on Twitter: It’s not easy!” Proceedings of the International AAAI Conference on Web and Social Media. Vol. 7. No. 1. 2013.
Fleisig, Eve, Rediet Abebe, and Dan Klein. “When the majority is wrong: Leveraging annotator disagreement for subjective tasks.” arXiv preprint arXiv:2305.06626 (2023).
Lucy, Li, and David Bamman. “Gender and representation bias in GPT-3 generated stories.” Proceedings of the Third Workshop on Narrative Understanding. 2021.