Does Regression Produce Representative Estimates of Causal Effects?
It is well known that, with an unrepresentative sample, the estimate of a causal effect may fail to characterize how effects operate in the population of interest. What is less well understood is that conventional estimation practices for observational studies may produce the same problem even with a representative sample. Specifically, causal effects estimated via multiple regression differentially weight each unit's contribution. The "effective sample" that regression uses to generate the causal effect estimate may bear little resemblance to the population of interest. The effects that multiple regression estimates may therefore be nonrepresentative in much the same way as effects produced via quasi-experimental methods such as instrumental variables, matching, or regression discontinuity designs. This implies that there is no representativeness basis for preferring multiple regression on representative samples over quasi-experimental methods. We show how to estimate the implied "multiple regression weights" for each unit, thus allowing researchers to visualize the characteristics of the effective sample. Knowing the effective sample is crucial because it allows one to relate effect estimates to sample characteristics. We then discuss alternative approaches that, under certain conditions, recover representative average causal effects. The requisite conditions cannot always be met.
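To make the idea of an effective sample concrete, here is a minimal sketch in Python. The abstract does not state a formula for the weights, so the sketch assumes one standard construction: a unit's weight is its squared residual from an OLS regression of the treatment on the covariates, so units whose treatment is poorly predicted by their covariates count for more. The simulated data, the function name `regression_weights`, and all variable names are hypothetical and are not taken from the paper.

```python
import numpy as np


def regression_weights(d, X):
    """Squared residuals from an OLS regression of treatment d on covariates X.

    Assumption: this squared-residual form is one common way to estimate
    per-unit regression weights; the abstract does not give the formula.
    """
    Z = np.column_stack([np.ones(len(d)), X])  # add an intercept column
    coef, *_ = np.linalg.lstsq(Z, d, rcond=None)
    resid = d - Z @ coef
    return resid ** 2


# Hypothetical data: treatment varies far more among units with x > 0.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
p_treat = np.where(x > 0, 0.5, 0.05)
d = rng.binomial(1, p_treat).astype(float)

w = regression_weights(d, x.reshape(-1, 1))
w_norm = w / w.sum()  # normalize so the weights sum to one

nominal_mean = x.mean()              # covariate mean in the nominal sample
effective_mean = np.sum(w_norm * x)  # covariate mean in the effective sample

print(f"nominal mean of x:   {nominal_mean:.3f}")
print(f"effective mean of x: {effective_mean:.3f}")
```

In this simulated design the effective sample tilts toward units with positive `x`, where treatment status is least predictable, even though the nominal sample is centered near zero. Comparing weighted and unweighted covariate summaries in this way is one simple route to describing the characteristics of the effective sample.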