LabFam seminar series: Integration of Household Survey Data through Statistical matching: where we stand
Speaker: Marcello D’Orazio, Italian National Institute of Statistics (Istat)
Statistical matching (also known as data fusion or synthetic matching) denotes a wide set of statistical techniques for exploiting data collected in independent (sample) surveys that refer to the same target population; the purpose is to investigate the relationship between variables not jointly observed in the same survey. These techniques date back to the 1960s, but they became very popular around 2000 (Rässler’s monograph was published in 2002; the monograph by D’Orazio et al. in 2006). They are a way to respond to users’ increasing demand for new statistical outputs while avoiding the negative effects that enlarging the questionnaires of existing surveys (to collect a wider set of data) would have on response burden and on the accuracy of the collected data. Most of the statistical techniques proposed for data fusion are adaptations of methods developed to deal with missing values in surveys; they include both parametric and nonparametric methods, which can serve either to estimate a target parameter (e.g. a correlation or regression coefficient) or simply to create a “synthetic” data source at the microdata level.
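As a rough illustration of the nonparametric, microdata-level approach, the sketch below implements a nearest-neighbour distance hot deck, one common donor-based technique. It is not taken from the webinar: the function name `nn_hot_deck`, the variable names and the toy figures are all assumptions made for the example. A recipient survey A observes (X, Y), a donor survey B observes (X, Z), and each record in A receives the Z value of the closest record in B on the common variables X. Used on its own, this kind of matching implicitly relies on Y and Z being conditionally independent given X.

```python
import numpy as np


def nn_hot_deck(x_recipient, x_donor, z_donor):
    """Impute Z for each recipient from its nearest donor on the common variables X.

    Hypothetical helper written for illustration; not part of any library.
    """
    x_rec = np.asarray(x_recipient, dtype=float)
    x_don = np.asarray(x_donor, dtype=float)
    z_don = np.asarray(z_donor)

    # Treat a single matching variable as a one-column matrix.
    if x_rec.ndim == 1:
        x_rec = x_rec[:, None]
    if x_don.ndim == 1:
        x_don = x_don[:, None]

    # Standardise with the donor mean and standard deviation so that
    # no single matching variable dominates the distance.
    mean = x_don.mean(axis=0)
    sd = x_don.std(axis=0)
    sd[sd == 0] = 1.0
    rec_std = (x_rec - mean) / sd
    don_std = (x_don - mean) / sd

    # Squared Euclidean distances, shape (n_recipients, n_donors).
    dist = ((rec_std[:, None, :] - don_std[None, :, :]) ** 2).sum(axis=2)

    # Index of the closest donor for every recipient record.
    donor_idx = dist.argmin(axis=1)
    return z_don[donor_idx], donor_idx


# Toy data, invented for the example.
# Survey A (recipient): age (X) and income (Y).
age_a = [25, 40, 63]
income_a = [1200, 2100, 1800]
# Survey B (donor): age (X) and spending (Z).
age_b = [22, 38, 45, 60, 70]
spending_b = [900, 1500, 1700, 1300, 1100]

z_imputed, donors = nn_hot_deck(age_a, age_b, spending_b)

# Synthetic file: records of A completed with a Z value borrowed from B.
synthetic = list(zip(age_a, income_a, z_imputed.tolist()))
print(synthetic)
```

The resulting synthetic file contains X, Y and Z for every record of A, but any relationship between Y and Z that it shows is driven entirely by X; this is one reason the underlying assumptions matter so much in practice.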
Despite these efforts, many applications of data fusion techniques to independent surveys that were not designed to be integrated a posteriori turned out to be unsuccessful, mainly because many of the underlying assumptions and constraints were not met. In National Statistical Offices, the lessons learnt led to the redesign of some household surveys with integration purposes also in mind.
The webinar will give an overview of the most popular statistical matching techniques and of the underlying assumptions, which play a key role in their application. It will also highlight the critical points in designing a statistical matching application.
About the speaker: Marcello D’Orazio is a senior researcher in Statistics Methodology at the Italian National Institute of Statistics (Istat). His research topics include sampling, imputation of missing values, statistical matching (data fusion and, more generally, data integration), and statistical learning techniques.