Measurement and Representation Biases in Digital Trace Data-based Studies

This reading-based seminar covers recent research that uses digital trace data from web and social media platforms such as Facebook, Instagram, and Wikipedia to measure social phenomena such as political attitudes and health behaviours. Through readings and discussions, we will examine how representation and measurement errors can creep into studies that combine this type of data with large-scale computational and data-driven models. The course also covers methods to quantify and mitigate these errors and shows how to design valid and reliable research studies.

Assessment is based on a presentation and final report on a chosen reading (60%) and on weekly critiques of other papers (40%). A 10% bonus is available for implementing part of one or more of the discussed papers.

Schedule and Assigned Readings

April 10th Introduction and kickoff

April 24th How to read and review a research paper AND overview of research with digital traces

May 8th and 15th Social data biases

May 22nd Measurement and Representation Errors

Jun 5 Guest presentation by Max Pellert

Jun 12 Guest presentation by Manuel Tonneou

Jun 19 Presentation by Leonard Tiedemann

Jun 26 Guest presentation by Nils Feldhus

Jul 3 Guest presentation by Katrin Weller

Jul 10 Guest presentation by Giordano de Marzo

Background on LLMs and Social Simulations:

Jul 17 Presentation by Theresa Wagner

Jul 24 Guest presentation by Jessica Daikeler

Jul 31 Presentation by Peer Saleth

Aug 7 Wrap-up

Aug 15 Final reports due

Suggested Readings

If you can’t access the full text of any of these, email me for a copy.

Construct definition

Platform Effects

Data Collection

Data Preprocessing and Modeling