Debiasing AI predictions for causal inference without fresh ground truth data

Event date: 2025-06-12

Event location: Online (Zoom)

Welcome to this week’s Learning Machines seminar.

This seminar is a collaboration between RISE and Climate AI Nordics – climateainordics.com.

Title: Debiasing AI predictions for causal inference without fresh ground truth data

Speaker: Markus Pettersson, Chalmers University of Technology

Abstract: Machine learning models trained on Earth observation data, particularly satellite imagery, have recently shown impressive performance in predicting household-level wealth indices, potentially addressing chronic data scarcity in global development research. While these predictions exhibit strong predictive power, they inherently suffer from shrinkage toward the mean, resulting in attenuated estimates of causal treatment effects and thus limiting their utility in policy evaluations. Existing debiasing methods, such as Prediction-Powered Inference (PPI), require additional fresh ground-truth data at the downstream causal inference stage, severely restricting their applicability in data-poor environments.
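To make the attenuation argument concrete, here is a minimal toy simulation. It is an illustrative sketch, not the speaker's experimental setup: the shrinkage factor `shrink`, the noise scale, and the function name `simulate_attenuation` are all hypothetical. It pulls predictions toward the mean and compares the difference-in-means treatment effect computed on true outcomes with the one computed on predictions.

```python
import numpy as np


def simulate_attenuation(n=20000, true_effect=1.0, shrink=0.6, seed=0):
    """Toy illustration (all parameters are assumptions, not from the paper).

    Returns the difference-in-means effect estimated from true outcomes
    and from shrunk predictions of those outcomes.
    """
    rng = np.random.default_rng(seed)

    # Random binary treatment and an outcome with a constant additive effect.
    treated = rng.integers(0, 2, size=n)
    y = true_effect * treated + rng.normal(size=n)

    # Predictions shrunk toward the overall mean, plus independent noise.
    y_hat = y.mean() + shrink * (y - y.mean()) + rng.normal(scale=0.3, size=n)

    effect_true = y[treated == 1].mean() - y[treated == 0].mean()
    effect_pred = y_hat[treated == 1].mean() - y_hat[treated == 0].mean()
    return effect_true, effect_pred


if __name__ == "__main__":
    effect_true, effect_pred = simulate_attenuation()
    print(f"effect from true outcomes: {effect_true:.3f}")
    print(f"effect from predictions:   {effect_pred:.3f}")
```

In this toy setting the effect estimated from predictions comes out at roughly `shrink` times the effect estimated from true outcomes, which is the attenuation the abstract describes.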

In this paper, we introduce and rigorously evaluate two novel correction methods—linear calibration correction and Tweedie's correction—that substantially reduce prediction bias without relying on newly collected labeled data. Our methods operate on out-of-sample predictions from pre-trained models, treating these models as black-box functions. Linear calibration corrects bias through a straightforward linear transformation derived from held-out calibration data, while Tweedie's correction leverages empirical Bayes principles to directly address shrinkage-induced biases by exploiting score functions derived from predicted outcomes.
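As a rough illustration of the linear calibration idea, the sketch below fits a line relating predictions to true outcomes on held-out calibration data and then inverts it on downstream predictions. This is one plausible reading of the abstract, not the authors' implementation: the function names, the choice to regress predictions on outcomes, and the simulated black-box model `predict` are assumptions. Tweedie's correction is not sketched here because the abstract does not give enough detail to reproduce it faithfully.

```python
import numpy as np


def fit_linear_calibration(y_cal, y_hat_cal):
    """Fit y_hat ~ a + b * y on held-out calibration data; return (a, b)."""
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    b, a = np.polyfit(y_cal, y_hat_cal, deg=1)
    return a, b


def apply_linear_calibration(y_hat, a, b):
    """Invert the fitted line to undo the estimated shrinkage."""
    return (y_hat - a) / b


def demo(n=20000, true_effect=1.0, seed=0):
    """Toy comparison of naive and calibrated effect estimates (hypothetical setup)."""
    rng = np.random.default_rng(seed)

    def predict(y):
        # Stand-in for a pre-trained black-box model: shrunk, offset, noisy.
        return 0.2 + 0.6 * y + rng.normal(scale=0.3, size=y.shape)

    # Calibration stage: labels are available for held-out data.
    y_cal = rng.normal(size=n)
    a, b = fit_linear_calibration(y_cal, predict(y_cal))

    # Downstream stage: y is used only to generate predictions, never observed.
    treated = rng.integers(0, 2, size=n)
    y = true_effect * treated + rng.normal(size=n)
    y_hat = predict(y)

    naive = y_hat[treated == 1].mean() - y_hat[treated == 0].mean()
    y_corr = apply_linear_calibration(y_hat, a, b)
    corrected = y_corr[treated == 1].mean() - y_corr[treated == 0].mean()
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = demo()
    print(f"naive effect:     {naive:.3f}")
    print(f"corrected effect: {corrected:.3f}")
```

In this simulation the naive estimate is attenuated while the calibrated estimate lands close to the true effect. The result depends on the calibration data sharing the same prediction-outcome relationship as the downstream data, which is an assumption of the sketch.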

Through analytical exercises and experiments using Demographic and Health Survey (DHS) data, we demonstrate that both proposed methods outperform existing data-free approaches, achieving significant reductions in attenuation bias and thus providing more accurate, actionable, and policy-relevant estimates. Our approach represents a generalizable, lightweight toolkit that enhances the reliability of causal inference when direct outcome measures are limited or unavailable.

About the speaker: Markus B. Pettersson is a PhD student working at the intersection of machine learning and Earth observation, with a focus on large-scale poverty mapping and its applications in development research. His work explores how satellite imagery and data-driven models can be used to estimate socioeconomic conditions in data-scarce regions, and how these maps can support causal analysis in policy and intervention design.

Location: This is an online seminar. Connect using Zoom.

Date: 2025-06-12 15:00

Upcoming seminars:

2025-06-19: Lily Xu, Columbia University
2025-09-11: Georges Le Bellier, Conservatoire National des Arts et Métiers
2025-09-25: Sigrid Passano Hellan, NORCE Norwegian Research Centre
2025-10-23: Nora Gourmelon, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
All seminars are at 15:00 CET.

More information and coming seminars: https://ri.se/lm-sem

– The Learning Machines Team