Keynote talks

Keynote 1 (Mon)

Isabelle Augenstein

University of Copenhagen

Accountable and Robust Automatic Fact Checking [slides]

Abstract: 

The past decade has seen a substantial rise in the amount of mis- and disinformation online, from targeted disinformation campaigns to influence politics, to the unintentional spreading of misinformation about public health. This development has spurred research in the area of automatic fact checking, a knowledge-intensive and complex reasoning task. Most existing fact checking models predict a claim’s veracity with black-box models, which often lack explanations of the reasons behind their predictions and contain hidden vulnerabilities. The lack of transparency in fact checking systems, and in ML models in general, has been exacerbated by increased model size and by “the right…to obtain an explanation of the decision reached” enshrined in European law. This talk presents some first solutions to generating explanations for fact checking models. It then examines how to assess the generated explanations using diagnostic properties, and how further optimising for these properties can improve the quality of the generated explanations. Finally, the talk examines how to systematically reveal vulnerabilities of black-box fact checking models.

Isabelle Augenstein is an Associate Professor at the University of Copenhagen, Department of Computer Science, where she heads the Copenhagen Natural Language Understanding research group as well as the Natural Language Processing section. Her main research interests are fact checking, low-resource learning, and explainability. Prior to starting a faculty position, she was a postdoctoral researcher at University College London, and before that a PhD student at the University of Sheffield. She currently holds a DFF Sapere Aude Research Leader fellowship on ‘Learning to Explain Attitudes on Social Media’, and is a member of the Young Royal Danish Academy of Sciences and Letters.

Website

Keynote 2 (Tue)

Peter A. Flach

University of Bristol

Empirical evaluation of predictive models: A matter of scales and means [slides]

Abstract:

Evaluation of predictive models is of primary importance in machine learning and information retrieval. Empirical evaluation is an act of measurement, and as such is very common but perhaps not as straightforward as one might expect. For example, there is almost always a discrepancy between what is of interest (e.g., the mathematical ability of a student, or the population performance of a model) and what is directly measurable (the performance of the student or model on a specific set of questions or labelled data points, with many contextual aspects influencing performance). One would also often like a causal account of what is observed: e.g., an explanation of why one student or algorithm outperforms another, possibly in counterfactual form (if the test were manipulated in a particular way, the performance difference would disappear).

Even more fundamental issues arise when one considers measurement scales and how to combine different quantities into aggregate measurements. We are all familiar with this: e.g., classification accuracy can be seen as a weighted arithmetic mean of the true positive and true negative rates, with class prevalences as weights; the F1-score is commonly defined as the harmonic mean of precision and recall; and some people prefer the geometric mean to aggregate precision and recall. Choosing a different mean amounts to a change of scale (e.g., the log of the geometric mean is the arithmetic mean of the logs) and is as such admissible, even if the change of scale requires justification. But mixing different means (and hence scales) can easily lead to incoherence: taking expectations involves the arithmetic mean, which implies, for example, that the area under the precision-recall curve bears no direct relationship to an aggregate F1-measure (while the area under the ROC curve does relate to aggregate accuracy by traversing the operating points on the curve in a particular way).
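
To make the scale distinctions concrete, here is a minimal editorial sketch (not from the talk, with made-up numbers) of the three aggregation rules mentioned above and of the change-of-scale identity for the geometric mean:

```python
import math

# Hypothetical confusion-matrix rates for a binary classifier.
tpr, tnr = 0.80, 0.90            # true positive / true negative rate
pi = 0.30                        # prevalence of the positive class
precision, recall = 0.75, 0.80

# Accuracy as a weighted arithmetic mean of TPR and TNR,
# with the class prevalences as weights.
accuracy = pi * tpr + (1 - pi) * tnr

# F1 as the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Geometric mean of precision and recall (the alternative some prefer).
g = math.sqrt(precision * recall)

# Change of scale: the log of the geometric mean equals
# the arithmetic mean of the logs.
assert math.isclose(math.log(g),
                    (math.log(precision) + math.log(recall)) / 2)

print(accuracy, f1, g)
```

The assertion passes precisely because choosing the geometric mean is just the arithmetic mean applied on a log scale, which is the sense in which a different mean amounts to a different scale.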
In previous work (Flach & Kull, NIPS 2015) we have discussed this in more detail and proposed Precision-Recall-Gain curves which are almost entirely “ROC-like”. In particular, the area under the PRG curve can be interpreted as an aggregate F1-score. In this talk we will revisit this work, and discuss new results that relate AUPRG to a weighted ranking score, thereby providing a well-founded alternative to measures such as normalised discounted cumulative gain.
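
For readers who want to see the construction, here is a minimal sketch of the precision-gain and recall-gain mapping, assuming the definitions from Flach & Kull (NIPS 2015), with π denoting the positive class prevalence; the variable names are illustrative:

```python
def precision_gain(prec: float, pi: float) -> float:
    """Precision gain as defined in Flach & Kull (NIPS 2015):
    maps precision in [pi, 1] onto [0, 1] on a harmonic scale."""
    return (prec - pi) / ((1 - pi) * prec)

def recall_gain(rec: float, pi: float) -> float:
    """Recall gain: the same transformation applied to recall."""
    return (rec - pi) / ((1 - pi) * rec)

# The always-positive classifier (precision = pi, recall = 1)
# lands at precision gain 0 and recall gain 1: the PRG baseline.
pi = 0.3
print(precision_gain(pi, pi), recall_gain(1.0, pi))   # 0.0 1.0
```

Plotting a model’s operating points in these gain coordinates yields the PRG curve, whose area admits the aggregate-F1 interpretation mentioned above.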

Peter Flach has been Professor of Artificial Intelligence at the University of Bristol since 2003. An internationally leading scholar in the evaluation and improvement of machine learning models using ROC analysis and calibration, he has also published on mining highly structured data, and on the methodology of data science. His books include Simply Logical: Intelligent Reasoning by Example (John Wiley, 1994) and Machine Learning: the Art and Science of Algorithms that Make Sense of Data (Cambridge University Press, 2012).

From 2010 until 2020, Prof Flach was Editor-in-Chief of the Machine Learning journal, one of the two top journals in the field, published for over 25 years first by Kluwer and now by Springer. He was Programme Co-Chair of the 1999 International Conference on Inductive Logic Programming, the 2001 European Conference on Machine Learning, the 2009 ACM Conference on Knowledge Discovery and Data Mining, and the 2012 European Conference on Machine Learning and Knowledge Discovery in Databases in Bristol. He is a founding board member and the current President of the European Association for Data Science, and a Fellow of the Alan Turing Institute for Data Science and Artificial Intelligence.

Website

Keynote 3 (Wed)

Ivan Vulić

University of Cambridge & PolyAI
2021 Karen Spärck Jones Award winner

Towards Language Technology for a Truly Multilingual World? [slides]

Abstract:

Language technology tools such as Google Translate or virtual assistants (Siri, Alexa) were part of a collective sci-fi-inspired imagination not many years ago. Today, they are an essential driver of the digital AI transformation, used by hundreds of millions of people. A key challenge in multilingual NLP and IR is developing general language-independent architectures that are equally applicable to any language. However, this ambition is hindered by 1) the large variation in structural and semantic properties across the world’s languages, and 2) the scarcity of raw and task data for many languages, tasks, and application domains. As a consequence, existing language technology is still largely limited to a handful of resource-rich languages, leaving the vast majority of the world’s 7,000+ languages and their speakers behind and amplifying the problem of the “digital language divide”. In this talk, I will introduce and discuss the importance of addressing multilingualism and of bringing language technology to minority and low-resource languages and communities as well. I will introduce a range of recent techniques, breakthroughs, and lessons learned that aim to deal with such large cross-language variation and low-data learning regimes. I will also demonstrate that low-resource languages, despite the very positive research trends and results of recent years, still lag behind major languages in performance, resources, overall representation in NLP/IR research, and other key aspects, and I will outline several crucial challenges for future research in this area.

Ivan Vulić is a Senior Research Associate in the Language Technology Lab, University of Cambridge, and a Senior Scientist at PolyAI. He holds a PhD in Computer Science from KU Leuven, awarded summa cum laude. His core expertise is in representation learning, cross-lingual learning, human language understanding, distributional, lexical, multi-modal, and knowledge-enhanced semantics in monolingual and multilingual contexts, transfer learning for enabling cross-lingual NLP applications such as conversational AI in low-resource languages, and machine learning for (cross-lingual) NLP. He has published more than 100 papers in top-tier NLP and IR conferences and journals, and has won several best paper awards (most recently at EACL 2021). Ivan serves as an area chair and regularly reviews for all major NLP and machine learning conferences and journals. He has given numerous invited talks and tutorials in academia and industry.

Website