Measurement and Representation Biases in Digital Trace Data-based Studies

This reading-based seminar will cover the latest research on using digital trace data from web and social media platforms like Facebook, Instagram, Wikipedia and others to measure social phenomena such as political attitudes and health behaviours. It will be centred around readings and discussions to understand how representation and measurement errors can creep into research studies using this type of data in conjunction with large-scale computational and data-driven models. The course will also cover methods to quantify and mitigate these errors and demonstrate how to design valid and reliable research studies.

Assessment will be based on a presentation and final report on a chosen reading (60%) and on weekly critiques of other papers (40%). A bonus of 10% is available for implementing part of one or more of the discussed papers.

Schedule and Assigned Readings

April 10th Introduction and kickoff

April 24th How to read and review a research paper AND overview of research with digital traces

Keshav, Srinivasan. “How to read a paper.” ACM SIGCOMM Computer Communication Review 37.3 (2007): 83-84.
Pain, Elisabeth. “How to review a paper.”

May 8th + 15th Social data biases

Olteanu, Alexandra, et al. “Social data: Biases, methodological pitfalls, and ethical boundaries.” Frontiers in big data 2 (2019)
Slides from a related tutorial by Olteanu and colleagues

May 22nd Measurement and Representation Errors

Groves, Robert M., and Lars Lyberg. “Total survey error: Past, present, and future.” Public opinion quarterly 74.5 (2010)
Sen, Indira, et al. “A total error framework for digital traces of human behavior on online platforms.” Public Opinion Quarterly 85.S1 (2021)

Jun 5 Guest Presentation by Max Pellert

Pellert, Max, et al. “Validating daily social media macroscopes of emotions.” Scientific reports 12.1 (2022): 11236.

Jun 12 Guest Presentation by Manuel Tonneau

Tonneau, Manuel, et al. “From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets.” Workshop on Online Harms (WOAH), co-located with the North American Chapter of the Association for Computational Linguistics (NAACL) (2024).

Jun 19 Presentation by Leonard Tiedemann

Jaidka, Kokil, Alvin Zhou, and Yphtach Lelkes. “Brevity is the soul of Twitter: The constraint affordance and political discussion.” Journal of Communication 69.4 (2019): 345-372.

Jun 26 Guest presentation by Nils Feldhus

Feldhus, Nils, et al. “InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations.” Findings of the Association for Computational Linguistics: EMNLP 2023.
Schmitt, Vera, et al. “The Role of Explainability in Collaborative Human-AI Disinformation Detection.” The 2024 ACM Conference on Fairness, Accountability, and Transparency. 2024.

Jul 3 Guest presentation by Katrin Weller

Breuer, Johannes, Katrin Weller, and Katharina Kinder-Kurlanda. 2023. The Role of Participants in Online Privacy Research: Ethical and Practical Considerations. In The Routledge Handbook of Privacy and Social Media, edited by Sabine Trepte, and Philipp K. Masur, 314-323. Routledge.
Kinder-Kurlanda, Katharina E., and Katrin Weller. 2020. “Perspective: Acknowledging data work in the social media research lifecycle.” Frontiers in Big Data 3 (509954).

Jul 10 Guest presentation by Giordano de Marzo

De Marzo, Giordano, Luciano Pietronero, and David Garcia. “Emergence of scale-free networks in social interactions among large language models.” arXiv preprint arXiv:2312.06619 (2023).

Background on LLMs and Social Simulations:

Riedl, Mark. A Very Gentle Introduction to Large Language Models without the Hype
Park et al. Social Simulacra: Creating Populated Prototypes for Social Computing Systems

Jul 17 Presentation by Theresa Wagner

Lucy, Li, and David Bamman. “Gender and representation bias in GPT-3 generated stories.” Proceedings of the third workshop on narrative understanding. 2021.

Jul 24 Guest presentation by Jessica Daikeler

Daikeler, Jessica, et al. “Assessing data quality in the age of digital social research: A systematic review.” Social Science Computer Review (2024): 08944393241245395.

Jul 31 Presentation by Peer Saleth

Lasser, Jana, et al. “From alternative conceptions of honesty to alternative facts in communications by US politicians.” Nature human behaviour 7.12 (2023): 2140-2151.

Aug 7 Wrap-up

Aug 15 Final reports due

Suggested Readings

Construct definition

Platform Effects

Data Collection

Data Preprocessing and Modeling