Identifying Bias Without Sensitive Attribute Data: Techniques for Inferring Protected Characteristics

Published

March 4, 2020

Last Edited

July 15, 2025

Marissa Gerchick

Former

Fiddler AI

To evaluate whether decisions in lending, health care, hiring and beyond are made equitably across race or gender groups, organizations must know which individuals belong to each race and gender group. However, as we explored in our last post, the sensitive attribute data needed to conduct analyses of bias and fairness may not always be available [1]. To address this problem, numerous techniques have emerged for inferring individuals’ protected characteristics from available data. This post introduces one such technique and looks at how it has been used to identify bias.

Bayesian Improved Surname Geocoding

One example of such a technique is Bayesian Improved Surname Geocoding (BISG), a methodology that uses last names and geographical information to generate race probability estimates [2],[3]. Using a form of Bayes’ Theorem, BISG computes the probability a person belongs to each race group (e.g., White, Black or African American, Asian, etc.) based on demographic information associated with their last name and then updates this probability using demographic information associated with the census block group they live in [3]. The combined approach taken by BISG has been shown to work better than approaches that rely on just last name or geographic information [2],[3],[4].
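The Bayesian update described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the cited papers: the function name `bisg_posterior`, the three race categories, and every probability below are made up for illustration. The sketch also assumes that surname and location are conditionally independent given race, which is what lets the two sources be multiplied together.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and geography information with Bayes' rule.

    p_race_given_surname: dict mapping race -> P(race | surname)
    p_geo_given_race:     dict mapping race -> P(block group | race),
                          i.e. the share of that race group's population
                          living in the person's census block group
    Returns a dict mapping race -> P(race | surname, block group).
    """
    # Numerator of Bayes' rule for each race group.
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("No race group has nonzero probability.")
    # Normalize so the probabilities sum to 1.
    return {race: value / total for race, value in unnormalized.items()}


# Hypothetical inputs (NOT real Census figures).
surname_prior = {"white": 0.60, "black": 0.30, "asian": 0.10}
geo_likelihood = {"white": 0.00001, "black": 0.00004, "asian": 0.00001}

posterior = bisg_posterior(surname_prior, geo_likelihood)
for race, prob in posterior.items():
    print(f"{race}: {prob:.3f}")
```

In this toy example the surname alone points most strongly to "white", but the block group is one where a comparatively large share of the "black" population lives, so the updated estimate shifts toward "black". The output is a probability for each group rather than a single label, which fits the group-level use the method was designed for.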

Intended and actual use of BISG

BISG was developed to examine racial and ethnic disparities in the domain of health care; importantly, its developer has noted that BISG was developed not to estimate the race of particular individuals, but rather to look at the possibility of larger group disparities [5]. Nonetheless, BISG was prominently used by the Consumer Financial Protection Bureau (CFPB) in a 2013 lawsuit against Ally Financial [6]. In connection with the Department of Justice’s finding that Ally Financial had overcharged hundreds of thousands of minority customers on auto loans, and lacking data on the race of individual borrowers, the CFPB used BISG to identify customers who were likely to be members of minority racial groups [1].

Although BISG is a prominent technique considered among the best known for inferring race and ethnicity in the absence of sensitive attribute data [1], its accuracy has been questioned, with some researchers highlighting the possibility that it can overestimate racial disparities [4],[7],[8],[9]. In our next few posts, we will analyze a BISG-like methodology in the context of mortgage lending, following an approach similar to that of Chen et al. (2019) [9]. Under the Home Mortgage Disclosure Act, the CFPB collects and publishes data annually on mortgage applicants in the U.S., including their race and the area they live in. By matching that geographical information with data from the U.S. Census Bureau, we can infer the race of each applicant and, because the applicants’ reported race is also available, analyze how well such a technique performs.
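As a rough sketch of what such an evaluation could look like, the snippet below assigns each applicant the most common race group in their area and compares that guess with the reported race. Everything here is hypothetical: the tract names, the race shares, the applicant records, and the helper names `infer_race` and `proxy_accuracy` are invented for illustration, and the analysis in the later posts may differ.

```python
# Hypothetical area-level race shares (in practice these would come
# from U.S. Census Bureau data).
tract_shares = {
    "tract_A": {"white": 0.8, "black": 0.2},
    "tract_B": {"white": 0.3, "black": 0.7},
}

# Hypothetical applicants as (area, reported race) pairs (in practice
# these would come from Home Mortgage Disclosure Act data).
applicants = [
    ("tract_A", "white"),
    ("tract_A", "white"),
    ("tract_A", "black"),
    ("tract_B", "black"),
    ("tract_B", "white"),
]


def infer_race(tract):
    """Return the race group with the largest share in the given area."""
    shares = tract_shares[tract]
    return max(shares, key=shares.get)


def proxy_accuracy(records):
    """Fraction of records whose inferred race matches the reported race."""
    correct = sum(infer_race(tract) == race for tract, race in records)
    return correct / len(records)


accuracy = proxy_accuracy(applicants)
print(f"Proxy accuracy: {accuracy:.2f}")
```

Assigning everyone the majority group of their area is a deliberately crude choice made to keep the sketch short. It misclassifies every applicant who is in the minority within their area, which is one way a proxy can distort estimates of group disparities.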

References:
[1]: Bogen, M., Rieke, A., & Ahmed, S. (2020, January). Awareness in practice: tensions in access to sensitive attribute data for antidiscrimination. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 492-500).
[2]: Elliott, M. N., Fremont, A., Morrison, P. A., Pantoja, P., & Lurie, N. (2008). A new method for estimating race/ethnicity and associated disparities where administrative records lack self-reported race/ethnicity. Health Services Research, 43(5p1), 1722-1736.
[3]: Elliott, M. N., Morrison, P. A., Fremont, A., McCaffrey, D. F., Pantoja, P., & Lurie, N. (2009). Using the Census Bureau’s surname list to improve estimates of race/ethnicity and associated disparities. Health Services and Outcomes Research Methodology, 9(2), 69.
[4]: Consumer Financial Protection Bureau. (2014). Using publicly available information to proxy for unidentified race and ethnicity: A methodology and assessment. Washington, DC: CFPB, Summer.
[5]: Koren, J. R. (2016). Feds use Rand formula to spot discrimination. The GOP calls it junk science. Los Angeles Times, 8.
[6]: Andriotis, A., & Ensign, R. L. (2015). US Government Uses Race Test for $80 million in Payments. Wall Street Journal, October, 29.
[7]: Baines, A. P., & Courchane, M. J. (2014). Fair lending: Implications for the indirect auto finance market. Study prepared for the American Financial Services Association.
[8]: Zhang, Y. (2018). Assessing Fair Lending Risks Using Race/Ethnicity Proxies. Management Science, 64(1), 178-197.
[9]: Chen, J., Kallus, N., Mao, X., Svacha, G., & Udell, M. (2019, January). Fairness under unawareness: Assessing disparity when protected class is unobserved. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 339-348).
