Examining disparities by race/ethnicity or by sex/gender essentially means disentangling the difference due solely to race/ethnicity or sex/gender from differences along other dimensions such as age, education, and income. To do so, researchers often control for confounding factors (e.g., education and income) in a regression model, an approach that usually requires a large number of control variables. Although this approach is flexible in the sense that it can capture every possible disparity, it has a significant limitation: including many control variables reduces statistical power, so the model can fail to detect a disparity that actually exists.
To address this limitation, our study will use a recent development in machine learning, double/debiased machine learning (ML), to estimate disparities by race/ethnicity or sex/gender in the presence of high-dimensional controls. Although double/debiased ML has been applied in studies of treatment effects, we are arguably the first to apply it to the study of racial/ethnic (or sex/gender) disparities, on the basis that the effect of a treatment can be reinterpreted as the difference due solely to a single factor.
Specifically, we will use applications and awards for SS(D)I as an example to show how to implement dimension reduction when there are many control variables and thereby estimate disparities by race/ethnicity and sex/gender. We will use the restricted Health and Retirement Study (HRS) data that are linked to SSA benefits. These data show whether HRS respondents applied for SS(D)I benefits and whether those applications were approved.
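To make the idea concrete, the following is a minimal sketch of a cross-fitted, partialling-out estimator of the kind used in double/debiased ML, written under several assumptions that are ours for illustration only. It assumes a partially linear model y = theta * d + g(X) + e, where d is a group indicator and theta is the disparity of interest. It uses ridge regression as a stand-in nuisance learner (any ML learner could be substituted) and runs on synthetic data, not the HRS-SSA data. The function names `ridge_fit_predict` and `dml_plr`, the simulated variables, and the continuous outcome are all hypothetical.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Fit ridge regression with an intercept on one fold, predict on another."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    p = Xc.shape[1]
    # Closed-form ridge solution on centered data.
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p),
                           Xc.T @ (y_train - y_mean))
    return y_mean + (X_test - x_mean) @ beta


def dml_plr(y, d, X, n_folds=5, alpha=1.0, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(X) + e."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Nuisance predictions come from models fit on the other folds only.
        y_res[test_idx] = y[test_idx] - ridge_fit_predict(
            X[train], y[train], X[test_idx], alpha)
        d_res[test_idx] = d[test_idx] - ridge_fit_predict(
            X[train], d[train], X[test_idx], alpha)
    # Regress outcome residuals on group-indicator residuals.
    theta = (d_res @ y_res) / (d_res @ d_res)
    # Heteroskedasticity-robust standard error from the orthogonal score.
    psi = d_res * (y_res - theta * d_res)
    se = np.sqrt(np.mean(psi ** 2) / np.mean(d_res ** 2) ** 2 / n)
    return theta, se


# Synthetic illustration (assumed data-generating process, true theta = 0.5).
rng = np.random.default_rng(42)
n, p = 4000, 50
X = rng.normal(size=(n, p))                                   # many controls
d = (X[:, 0] + rng.normal(size=n) > 0).astype(float)          # group indicator
beta = np.zeros(p)
beta[:5] = 1.0                                                # only a few controls matter
y = 0.5 * d + X @ beta + rng.normal(size=n)                   # simulated outcome

theta_hat, se_hat = dml_plr(y, d, X)
print(f"estimated disparity: {theta_hat:.3f} (se {se_hat:.3f})")
```

In this sketch the group indicator and the outcome are each residualized on the controls using models fit on held-out folds, and the disparity is estimated from the residuals alone. For a binary outcome such as application or award, the same steps would amount to a linear probability specification, which is an additional assumption of the sketch.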