As a finalist for the Savage Award (Applications) 2022, I was invited to give a talk at the ISBA 2022 World Meeting in Montreal. Since I was unable to travel to Montreal, I have published a pre-recorded video of my talk here.
Title: Statistical Approaches for Entity Resolution under Uncertainty
Abstract: When real-world entities are referenced in data, their identities are often obscured. This presents an obstacle for data cleaning and integration, as references to an entity may be scattered across multiple records or sources, without a means to identify and consolidate them. Entity resolution (ER) seeks to address this problem by linking references to the same entity based on imprecise information. ER has diverse applications, from the construction of knowledge bases in the life sciences, to data sharing between government agencies, to the integration of data silos in the enterprise. In this talk, I explore statistical approaches for managing uncertainty in ER applications. In the first part of the talk, I present work on improving the scalability and flexibility of Bayesian models for ER, which naturally allow for uncertainty quantification of ER predictions. This work was done in collaboration with the Australian Bureau of Statistics and the US Census Bureau. In the second part of the talk, I present work on evaluating ER accuracy via an adaptive importance sampling framework. This allows practitioners to obtain more precise accuracy estimates with a smaller budget of ground-truth labels.
Session: Savage Award (Applications), June 29, 2022, 3:30 pm EDT