ADI compliance consultants innovate to overcome the challenge of missing ethnicity and race data in a Fair Lending risk assessment.

Assessing the Challenge

During a Fair Lending risk analysis, ADI identified an issue with the client’s data that could affect the regression models used to assess risk. ADI’s client, a high-volume mortgage lender with a national footprint, reported missing ethnicity and race data at substantially higher rates than the average mortgage lender, based on public HMDA data. A deeper look traced the issue to a handful of high-volume loan officers who resisted the effort to collect ethnicity and race data. The missing data was both frequent and not randomly distributed across the applicant population, and that combination could distort the regression modeling and the resulting conclusions about Fair Lending risk.

Around the time of this project, the Consumer Financial Protection Bureau (CFPB) released a white paper on its methodology for assigning ethnicity and race to applicants in the third-party auto lending market. Because these lenders were not required to collect ethnicity and race data, the CFPB defined an approach, Bayesian Improved Surname Geocoding (BISG), that combines surname and geographic probabilities to estimate the probability that each applicant belongs to a particular ethnicity and race group. These probabilities are then used to assign an ethnicity and race proxy as a substitute for missing data in assessing Fair Lending risk.
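As a sketch of how the surname and geographic probabilities combine, BISG is commonly written as a Bayes' rule update. The symbols here are illustrative assumptions rather than notation from the project: r is an ethnicity and race group, s is the applicant's surname, and g is the applicant's geographic area.

```latex
\[
P(r \mid s, g) \;=\; \frac{P(r \mid s)\, P(g \mid r)}{\sum_{r'} P(r' \mid s)\, P(g \mid r')}
\]
```

Here P(r | s) comes from the Census surname list, and P(g | r) is the share of group r's population that lives in area g. The formula assumes that surname and geography are independent once the group is known.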

Designing the Approach

To overcome the challenge of missing ethnicity and race information in mortgage data, ADI consultants collaborated to research and test the CFPB’s methodology as a supplement to the analysis of applications with missing information. Although internal and industry analysis raised concerns about the methodology’s accuracy in predicting and assigning ethnicity and race proxies, ADI’s compliance consultants proceeded to test the method in the context of Fair Lending regression modeling. Using ADI Data Connect℠, ADI’s compliance consultants designed a process that would:

  • Import the requisite Census surname database into ADI Data Connect℠;
  • Calculate the ethnicity and race probabilities down to the census block using the 2010 Census data available in ADI Data Connect℠;
  • Process the client’s surname data to handle hyphenated surnames, suffixes, etc.;
  • Append the Census surname and geographic ethnicity and race probabilities to the client’s application data;
  • Calculate the BISG probabilities for each application based on the surname and geographic probabilities; and
  • Test the BISG probabilities at varying thresholds in the Fair Lending regression models.
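
The surname-cleaning and probability steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ADI's implementation: the function names, the suffix list, and the rule of splitting hyphenated surnames into separate lookup candidates are all hypothetical, and the example probabilities are made up.

```python
import re

# Assumed suffix list; the source does not say which suffixes were handled.
SUFFIXES = {"JR", "SR", "II", "III", "IV"}


def surname_candidates(raw):
    """Return cleaned surname parts to look up in the surname table, in order.

    Uppercases the name, strips punctuation other than hyphens, drops
    generational suffixes, joins multi-word names, and splits on hyphens.
    """
    text = re.sub(r"[^A-Z\- ]", "", raw.upper())
    words = [w for w in text.split() if w not in SUFFIXES]
    joined = "".join(words)
    return [part for part in joined.split("-") if part]


def bisg_posterior(p_group_given_surname, p_geo_given_group):
    """Combine surname and geography probabilities with Bayes' rule.

    Both arguments are dicts keyed by group. Returns a dict of posterior
    probabilities that sums to 1, or None when no group has support.
    """
    joint = {
        group: p_group_given_surname[group] * p_geo_given_group[group]
        for group in p_group_given_surname
    }
    total = sum(joint.values())
    if total == 0:
        return None
    return {group: value / total for group, value in joint.items()}


# Illustrative, made-up inputs for a single application.
candidates = surname_candidates("Garcia-Lopez Jr.")
posterior = bisg_posterior(
    {"hispanic": 0.9, "white": 0.1},      # P(group | surname)
    {"hispanic": 0.001, "white": 0.003},  # P(area | group)
)
```

In this sketch, each candidate surname part would be looked up in turn and the first match used; other conventions for hyphenated names are possible.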

Implementing the Solution

ADI’s team executed the plan and conducted extensive testing on the population of applications with missing data, using varying thresholds to assign BISG-derived ethnicity and race proxies. The results of the testing provided a greater level of confidence that the high frequency of missing ethnicity and race data did not adversely affect the assessment of Fair Lending risk. In addition, the client could demonstrate in future examinations its proactive focus on the issue of missing ethnicity and race information and its attempt to account for the problems that may arise from it.
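
Threshold-based proxy assignment of the kind described above might look like the following sketch. The threshold values and function names are assumptions chosen for illustration; the source does not state which thresholds were tested.

```python
def assign_proxy(posterior, threshold):
    """Return the most probable group if it meets the threshold, else None."""
    group, prob = max(posterior.items(), key=lambda item: item[1])
    return group if prob >= threshold else None


def proxies_by_threshold(posteriors, thresholds=(0.5, 0.7, 0.8, 0.9)):
    """Assign proxies to every application at each (hypothetical) threshold."""
    return {
        t: [assign_proxy(p, t) for p in posteriors]
        for t in thresholds
    }


# Made-up posterior probabilities for two applications.
example = [
    {"hispanic": 0.75, "white": 0.25},
    {"hispanic": 0.55, "white": 0.45},
]
results = proxies_by_threshold(example)
```

Each set of proxies could then be fed into the regression models to see whether the conclusions change as the threshold tightens and fewer applications receive a proxy.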

The solution designed and implemented in this case has become the blueprint for later Fair Lending analysis projects. Using ADI Data Connect℠ and industry-leading statistical software, ADI consultants can append BISG probabilities in a matter of minutes and promptly begin testing the effects of missing information on the assessment of Fair Lending risk.


About the Author: Jonathon Neil

Jonathon is a Senior Consultant for ADI with expertise in Fair Lending compliance, CRA compliance, data mining, and geographic information systems. You can contact Jonathon at 703.665.3707.

April 9th, 2015