Tutorials

Technology-Assisted Review for High Recall Retrieval 

Eugene Yang (Human Language Technology Center of Excellence, Johns Hopkins University, USA)
Jeremy Pickens (OpenText, USA)
David D. Lewis (Redgrave Data, USA)

Length: Half day (morning)

Human-in-the-loop (HITL) IR workflows are being applied to an increasing range of tasks in law, medicine, social media, and other areas. These tasks differ from ad hoc retrieval in their focus on high recall, and differ from text categorization in their need for extensive human judgment. They also differ from both in their industrial scale and, often, their use of teams of multiple reviewers. In the research literature, these tasks have been variously referred to as review, moderation, annotation, or high recall retrieval (HRR) tasks. Technologies applied to these tasks have also been referred to by many names, but technology-assisted review (TAR) has emerged as a consensus term, so these tasks are also referred to as TAR tasks. The growth in the deployment of TAR systems, combined with the many open research problems in this area, suggests this is an appropriate time for a TAR tutorial at a major IR conference. Such a tutorial would also serve as background for attendees of the TAR workshop that has been approved for ECIR 2022.

Aims and Learning Objectives: This tutorial will introduce students to the key application areas, technologies, and evaluation methods in technology-assisted review. After taking the tutorial, attendees will be able to recognize real-world applications appropriate for TAR technology; apply well-known information retrieval and machine learning approaches to TAR problems; design basic TAR workflows; identify levers for cost minimization in real-world TAR tasks; apply standard TAR evaluation measures; and find publications on TAR technology, evaluation methods, HCI issues, ethical implications, and open problems in a range of literatures.

Website: https://tar-tutorial.github.io/

From Fundamentals to Recent Advances: A Tutorial on Keyphrasification

Rui Meng (Salesforce Research, USA)
Debanjan Mahata (Moody’s Analytics, USA)
Florian Boudin (LS2N, Université de Nantes, France)

Length: Half day (morning, ONLINE ONLY)

Keyphrases represent the most important information in a text as a list of phrases, which often serves as a surrogate for efficiently summarizing text documents. With the advancement of deep neural networks, recent years have witnessed rapid development in the automatic identification of keyphrases. The performance of keyphrase extraction methods has been greatly improved by progress in natural language understanding and by natural language generation techniques that enable models to predict relevant phrases not mentioned in the text. We name the task of summarizing texts with phrases keyphrasification. In this half-day tutorial, we provide a comprehensive overview of keyphrasification as well as hands-on practice with popular models and tools. The tutorial covers important topics ranging from the basics of the task to advanced topics and applications. By the end of the tutorial, participants will have a better understanding of 1) classical and state-of-the-art keyphrasification methods, 2) current evaluation practices and their issues, and 3) current trends and future directions in keyphrasification research.

Website: https://keyphrasification.github.io/

Online Advertising Incrementality Testing: Practical Lessons, Paid Search and Emerging Challenges

Joel Barajas (Amazon, USA)
Narayan Bhamidipati (Yahoo Research, USA)
James G. Shanahan (Church and Duncan Group Inc, and UC Berkeley, USA)

Length: Half day (afternoon)

Online advertising has historically been approached as an ad-to-user matching problem within sophisticated optimization algorithms. As the research and ad tech industries have progressed, advertisers have increasingly emphasized estimating the causal effect of their ads (incrementality) using controlled experiments (A/B testing). With low lift effects and sparse conversions, developing incrementality testing platforms at scale poses tremendous engineering challenges in measurement precision. Similarly, correctly interpreting results to address a business goal requires significant expertise in data science and experimentation research. This is a practical tutorial on the incrementality testing landscape, covering: the business need; literature solutions and industry practices; designs in the development of testing platforms; the testing cycle, case studies, and recommendations; paid search effectiveness in the marketplace; and emerging privacy challenges for incrementality testing and research solutions. We provide first-hand lessons based on the development of such a platform in a major combined DSP and ad network, and on running several tests for up to two months each over recent years. With increasing privacy constraints, we survey literature and current practices. These practices include private set union and differential privacy for conversion modeling, and geo-testing combined with synthetic control techniques.

Website: https://joel-barajas.github.io/ecir2022-incrementality-testing/

Information extraction from social media: A hands-on tutorial on tasks, data, and open source tools

Shubhanshu Mishra (Twitter, Inc., USA)
Rezvaneh (Shadi) Rezapour (Drexel’s College of Computing and Informatics, USA)
Jana Diesner (The iSchool at University of Illinois Urbana-Champaign, USA)

Length: Full day (ONLINE ONLY)

This hands-on tutorial introduces participants to working with social media data, which are an example of Digital Social Trace Data (DSTD). The DSTD abstraction allows us to model social media data with the rich information associated with social media text, such as authors, topics, and time stamps. We introduce participants to several Python-based, open-source tools for performing Information Extraction (IE) on social media data. Furthermore, participants will be familiarized with a catalogue of more than 30 publicly available social media corpora for various IE tasks such as named entity recognition (NER), part-of-speech (POS) tagging, chunking, supersense tagging, entity linking, sentiment classification, and hate speech identification. We will also show how these approaches can be extended to work in a multilingual setting. Finally, participants will be introduced to the following applications of extracted information: combining network analysis and text-based signals to rank accounts, and examining the correlation between sentiment and user-level attributes in existing corpora.

Website: https://socialmediaie.github.io/tutorials/ECIR2022/