In this full-day, fully virtual workshop, we take a collaborative approach to developing existing guidelines for research with web and social media data further. Drawing on principles of participatory design, we hope to involve a range of stakeholders through collaborative brainstorming and the joint design of research documentation. We envision the end goal of this workshop as shared documentation that establishes best practices for research with web and social media data.
This workshop is open to all interested in research with web and social media data. Please check back soon for registration details.
Web and social media data is of increasing interest to scientists since it can be used to study the attitudes, behaviours and characteristics of people and society. The large scale of available data, combined with increasingly sophisticated and powerful computational tools for analysing it, has made several research avenues possible. Despite the great potential of this burgeoning research paradigm, there are also several pitfalls due to data sampling, platform affordances, conceptual confusion in the definition of the constructs to be studied, and more. While not all of these limitations can be mitigated, they can be documented to convey the limits of a particular study, make it more transparent, and spread awareness in the research community. To that end, several guidelines and error reporting frameworks have been developed.
Current guidelines usually include a set of quality criteria, often linked to steps in the research pipeline, that researchers should note and document, since failing to meet these criteria can lead to systematic and/or random errors. Some target specific parts of the research pipeline: Datasheets (Gebru et al. 2018) and Data Statements (Bender and Friedman 2018) for documenting datasets, and Model Cards (Mitchell et al. 2019) for (machine learning) model development and deployment. Other guidelines are inspired by survey methodology error frameworks or quality frameworks, such as the Total Twitter Error Framework (TTE, Hsieh and Murphy, 2017), the Total Error Framework for Big Data (TEF, Amaya et al., 2020), and the Total Error Framework for Digital Traces of Online Behavior (TED-On, Sen et al., 2021). While these guidelines are crucial for increasing the transparency of studies working with web and social media data, current approaches to guideline development are top-down and prescriptive. Furthermore, several of them were designed for Machine Learning or Natural Language Processing practitioners, who form an important subset of the web and social media research community, so they still miss important input and insights from social scientists.
Participatory design is an approach that attempts to actively involve all stakeholders in the design process to help ensure that the result meets their needs and is usable. Participatory design of guidelines and checklists is widely used in domains such as the medical sciences and HCI, and previous research has shown that checklist use increased when stakeholders were included in checklist design and implementation. In this workshop, we first hope to provide an in-depth overview of existing guidelines and best practices for research with web and social media data by demonstrating how the guidelines can be applied to Computational Social Science (CSS) case studies. Second, we invite participants to apply the guidelines to specific vignettes or their own research studies in collaborative group activities, develop specification sheets for their studies using existing guidelines, and develop an understanding of which perspectives are missing from existing guidelines.
Envisioned as the first in a series, this workshop aims to advance a long-term conversation that includes different voices in shaping future guidelines and best practice recommendations in the field of Computational Social Science research.
|Session||Type||Time (tentatively Atlanta Time)|
|Opening and Introductions||Plenary||10:00-10:15|
|Introduction to Existing Guidelines for Research with Web and Social Media Data|| || |
|Brainstorming Research Designs||Breakout Rooms + summary in plenary room||11:10-12:30|
|Interactive Discussion of Case Studies that use Web and Social Media Data through the Lens of Various Guidelines|| || |
|Applying Guidelines for Documenting Limitations of Research Designs||Breakout Rooms + summary in plenary room||15:00-16:30|
|Final Group Discussion, Lessons Learned and Closing||Plenary||16:30-17:00|
Indira Sen is a doctoral candidate in Computational Social Science at GESIS, Leibniz Institute for the Social Sciences in Cologne, Germany. She is interested in understanding biases in inferential studies from digital traces, with a focus on natural language processing.
Dr. Fabian Flöck is a post-doctoral researcher at the Computational Social Science department at GESIS and team leader of the ‘Data Science’ team. He is interested in open and transparent data science, natural language processing, human computation, and collaborative production processes.
Dr. Katrin Weller is an information scientist working at the Computational Social Science department at GESIS and team leader of “Social Analytics and Services”. Her research focuses on social media, new types of research data and data preservation, scholarly communication and altmetrics, web users, and communication structures.