Debiasing AI predictions for causal inference without fresh ground truth data
Event date: 2025-06-12
Event location: Online (Zoom)

Welcome to this week’s Learning Machines seminar.
This seminar is a collaboration between RISE and Climate AI Nordics – climateainordics.com.
Title: Debiasing AI predictions for causal inference without fresh ground truth data
Speaker: Markus Pettersson, Chalmers University of Technology
Abstract: Machine learning models trained on Earth observation data, particularly satellite imagery, have recently shown impressive performance in predicting household-level wealth indices, potentially addressing chronic data scarcity in global development research. While these models exhibit strong predictive power, their predictions inherently suffer from shrinkage toward the mean, resulting in attenuated estimates of causal treatment effects and thus limiting their utility in policy evaluations. Existing debiasing methods, such as Prediction-Powered Inference (PPI), require additional fresh ground-truth data at the downstream causal inference stage, severely restricting their applicability in data-poor environments.
In this paper, we introduce and rigorously evaluate two novel correction methods—linear calibration correction and Tweedie's correction—that substantially reduce prediction bias without relying on newly collected labeled data. Our methods operate on out-of-sample predictions from pre-trained models, treating these models as black-box functions. Linear calibration corrects bias through a straightforward linear transformation derived from held-out calibration data, while Tweedie's correction leverages empirical Bayes principles to directly address shrinkage-induced biases by exploiting score functions derived from predicted outcomes.
Through analytical exercises and experiments using Demographic and Health Survey (DHS) data, we demonstrate that both proposed methods outperform existing data-free approaches and achieve significant reductions in attenuation bias, thus providing more accurate, actionable, and policy-relevant estimates. Our approach represents a generalizable, lightweight toolkit that enhances the reliability of causal inference when direct outcome measures are limited or unavailable.
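For readers who want an intuition before the talk, the attenuation problem and the idea behind a linear calibration correction can be sketched with a small simulation. This is an illustrative toy and not the speaker's implementation: the data-generating process, the shrinkage slope k, the offset m, the noise level, and all function names are assumptions made up for this example. It assumes predictions behave like m + k * Y + noise with k < 1, estimates k and m on held-out labeled calibration data, and inverts that line on the downstream predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, tau, k, m, noise_sd, rng):
    """Toy data: binary treatment t, true outcome y, shrunken prediction y_hat.

    Assumption for illustration: the model's prediction is a linear
    shrinkage of the truth, y_hat = m + k * y + noise, with k < 1.
    """
    t = rng.integers(0, 2, size=n)
    y = tau * t + rng.normal(size=n)
    y_hat = m + k * y + rng.normal(scale=noise_sd, size=n)
    return t, y, y_hat

def diff_in_means(outcome, t):
    """Difference-in-means estimate of the average treatment effect."""
    return outcome[t == 1].mean() - outcome[t == 0].mean()

def fit_linear_calibration(y_true, y_pred):
    """Regress predictions on true outcomes; returns (slope, intercept)."""
    k_hat, m_hat = np.polyfit(y_true, y_pred, deg=1)
    return k_hat, m_hat

TAU, K, M, NOISE_SD = 1.0, 0.6, 0.2, 0.3  # hypothetical values

# Held-out calibration data with ground truth (already available, not fresh).
_, y_cal, y_hat_cal = simulate(20_000, TAU, K, M, NOISE_SD, rng)
k_hat, m_hat = fit_linear_calibration(y_cal, y_hat_cal)

# Downstream study: only predictions are observed, no ground truth.
t, _, y_hat = simulate(20_000, TAU, K, M, NOISE_SD, rng)

naive = diff_in_means(y_hat, t)            # attenuated, roughly K * TAU
y_corrected = (y_hat - m_hat) / k_hat      # invert the fitted calibration line
corrected = diff_in_means(y_corrected, t)  # roughly TAU under these assumptions

print(f"true effect: {TAU:.2f}")
print(f"naive estimate on predictions: {naive:.2f}")
print(f"estimate after linear calibration: {corrected:.2f}")
```

In this toy setting the naive estimate is scaled down by roughly the shrinkage slope, and dividing by the fitted slope undoes that scaling. Whether real predictions follow such a linear pattern is an empirical question, and Tweedie's correction, which is not sketched here, takes a different, empirical Bayes route.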
About the speaker: Markus Pettersson is a PhD student in Data Science and AI at Chalmers University of Technology, supervised by Adel Daoud. His research spans machine learning applications in diverse domains, from satellite imagery for poverty estimation to privacy in synthetic data generation. He has co-authored papers presented at leading conferences such as IJCAI and IFL, contributing to advancements in deep learning, remote sensing, and sports analytics. Markus is particularly interested in the intersection of AI and social good, leveraging data-driven methods to address global challenges. His recent work highlights the potential of time-series satellite data for understanding socioeconomic conditions at a neighborhood level in Africa.
Location: This is an online seminar. Connect using Zoom.
Date: 2025-06-12 15:00
Upcoming seminars:
- 2025-06-19: Lily Xu, Columbia University
- 2025-09-11: Georges Le Bellier, Conservatoire National des Arts et Métiers
- 2025-09-25: Sigrid Passano Hellan, NORCE Norwegian Research Centre
- 2025-10-23: Nora Gourmelon, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
- All seminars start at 15:00 CET.
More information and coming seminars: https://ri.se/lm-sem
– The Learning Machines Team