Compliance How-To · 15 min read · February 23, 2026

How to Conduct an AI Bias Audit: Step-by-Step Guide

Bias audits are now legally required in multiple jurisdictions—and they're more complex than most employers expect. Here's how to do it right.

Devyn Bartell
Founder & CEO, EmployArmor
Published February 23, 2026

If your company uses AI in hiring, you've likely heard that "bias audits" are required. New York City, California, and Colorado all mandate some form of bias testing for AI hiring tools. But what does a bias audit actually entail? What data do you need? What methodologies are acceptable? How do you interpret the results? And critically—what do you do if the audit reveals discrimination?

This guide walks through the complete bias audit process from initial scoping to publication of results, with practical examples, statistical explanations (in plain English), and decision frameworks for what to do with your findings.

Who This Guide Is For:

  • ✓ HR/Talent leaders responsible for AI hiring compliance
  • ✓ Legal/compliance teams evaluating vendor tools
  • ✓ In-house analysts tasked with conducting audits
  • ✓ Anyone trying to understand what bias audits cost and deliver

What Is a Bias Audit? (Legal Definition)

A bias audit is a statistical analysis that evaluates whether an AI hiring tool produces disparate impact—meaning it disproportionately screens out candidates from protected classes (race, ethnicity, sex, age, disability).

The legal framework comes from two sources:

  • Federal precedent: The "four-fifths rule" from the Uniform Guidelines on Employee Selection Procedures (1978), which the EEOC uses to evaluate employment tests
  • State/local laws: Specific requirements in NYC Local Law 144, California AB 2930, and Colorado's AI Act that mandate bias testing for AI tools

Most laws require analyzing selection rates by race/ethnicity and sex at minimum. Some jurisdictions are expanding to include age, disability status, and intersectional categories (e.g., Black women as a distinct group).

Step 1: Scope the Audit (What Tool, What Data, What Period)

Define the Tool Being Audited

Be precise about what you're testing:

  • Tool name and version: "HireVue Video Interview Platform v8.2"
  • What it evaluates: "Analyzes candidate speech patterns, word choice, and verbal communication skills"
  • How it's used: "Scores are used to rank candidates for hiring manager review; top 30% advance to in-person interviews"
  • Job categories covered: "Customer service representatives, sales associates"

Important: If you use the same AI tool across multiple job families with different selection criteria, you may need separate audits for each.

Determine the Audit Period

NYC requires audits based on data from the 12 months preceding the audit. California and Colorado have similar annual windows.

Example: For an audit conducted in February 2026, you'd analyze candidate data from February 2025 - January 2026.

Minimum sample size: NYC requires at least 500 candidates evaluated in the relevant period for robust statistical analysis. If you have fewer, you may need to expand the time window or combine multiple job categories (with caution—combining dissimilar roles can skew results).

Identify Required Demographic Data

You need candidate demographic data to perform the analysis. Required categories:

  • Race/Ethnicity: Typically using EEOC categories (Hispanic/Latino, White, Black/African American, Asian, American Indian/Alaska Native, Native Hawaiian/Pacific Islander, Two or More Races)
  • Sex: Male, female, and (increasingly) non-binary options

The demographic data problem: Most employers don't collect race/ethnicity data from applicants (it's optional under EEOC rules). If you lack this data, you have three options:

  1. Prospective data collection: Start collecting demographic data (via voluntary self-identification) and wait 12 months to conduct the audit
  2. Statistical inference: Use name-based or zip-code-based proxies to estimate demographics (controversial and less reliable)
  3. Vendor-supplied data: If your AI vendor has access to demographic distributions from their broader user base, they may be able to provide pooled analysis (check if your jurisdiction allows this)

Step 2: Collect and Prepare Data

Data Elements You Need

For each candidate in your audit period, collect:

  • Unique candidate identifier (anonymize names for privacy)
  • Job title/category applied for
  • Date of application
  • Whether the AI tool was used to evaluate them (yes/no)
  • AI tool output (score, ranking, pass/fail recommendation)
  • Selection outcome (advanced to next round, hired, rejected)
  • Demographic data (race/ethnicity, sex)

Data Cleaning and Validation

Common data quality issues to address:

  • Missing demographic data: Decide whether to exclude those candidates or use imputation (document your methodology)
  • Inconsistent job categories: Normalize job titles into consistent categories
  • Multiple applications: Determine how to handle candidates who applied multiple times (count once? count each application?)
  • Incomplete hiring outcomes: Track candidates through the entire process to determine final selection
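The cleaning steps above can be sketched in code. This is a minimal illustration, not a production pipeline: the field names (`id`, `applied`, `race`, `sex`, `advanced`) are hypothetical stand-ins for whatever your ATS export actually contains, and the policy shown (keep each candidate's most recent application, exclude rows missing demographics) is one of the documented choices, not the only defensible one.

```python
from datetime import date

# Hypothetical raw ATS export rows; field names are illustrative,
# not taken from any specific ATS schema.
raw = [
    {"id": "c1", "applied": date(2025, 4, 2),  "race": "Black", "sex": "F", "advanced": True},
    {"id": "c1", "applied": date(2025, 9, 15), "race": "Black", "sex": "F", "advanced": False},
    {"id": "c2", "applied": date(2025, 5, 1),  "race": None,    "sex": "M", "advanced": True},
    {"id": "c3", "applied": date(2025, 6, 20), "race": "White", "sex": "M", "advanced": True},
]

def prepare(rows):
    """Keep one record per candidate (most recent application) and
    drop rows missing demographic data, counting each exclusion."""
    latest = {}
    for r in rows:
        prev = latest.get(r["id"])
        if prev is None or r["applied"] > prev["applied"]:
            latest[r["id"]] = r
    kept, excluded = [], 0
    for r in latest.values():
        if r["race"] is None or r["sex"] is None:
            excluded += 1  # document every exclusion in the audit report
        else:
            kept.append(r)
    return kept, excluded

clean, n_excluded = prepare(raw)
# c1 is counted once (the September application), c2 is excluded for
# missing race data, so 2 candidates remain and 1 exclusion is logged.
```

Whatever policy you choose, the point is that it is explicit and repeatable: the same export run through the same code yields the same audit population, which is what an independent auditor will need to verify.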

Step 3: Calculate Selection Rates

Selection rate = (Number of candidates selected from a group) / (Total number of candidates in that group)

Example Calculation

Let's say you evaluated 1,000 candidates for customer service roles using an AI video interview tool:

Sample Data:

  • White candidates: 400 evaluated → 160 advanced (40% selection rate)
  • Black candidates: 250 evaluated → 50 advanced (20% selection rate)
  • Hispanic candidates: 200 evaluated → 60 advanced (30% selection rate)
  • Asian candidates: 150 evaluated → 75 advanced (50% selection rate)

Sex breakdown:

  • Male candidates: 450 evaluated → 180 advanced (40% selection rate)
  • Female candidates: 550 evaluated → 165 advanced (30% selection rate)
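The selection-rate formula is simple enough to compute directly. The sketch below reproduces the worked example's race/ethnicity numbers; the counts are the article's illustrative figures, not real data.

```python
# Counts from the worked example above: (evaluated, advanced) per group.
race_counts = {
    "White":    (400, 160),
    "Black":    (250, 50),
    "Hispanic": (200, 60),
    "Asian":    (150, 75),
}

def selection_rates(counts):
    """Selection rate = candidates advanced / candidates evaluated, per group."""
    return {group: advanced / evaluated
            for group, (evaluated, advanced) in counts.items()}

print(selection_rates(race_counts))
# → {'White': 0.4, 'Black': 0.2, 'Hispanic': 0.3, 'Asian': 0.5}
```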

Step 4: Calculate Impact Ratios

Impact ratio compares the selection rate of each demographic group to the group with the highest selection rate.

Impact ratio = (Selection rate of Group A) / (Selection rate of highest-performing group)

Applying the Four-Fifths Rule

The EEOC's "four-fifths rule" (also called the 80% rule) states that disparate impact is indicated when the selection rate for a protected group is less than 80% of the rate for the highest-performing group.

Using our example above:

  • Highest selection rate: Asian candidates at 50%
  • Black candidates: 20% selection rate → 20% / 50% = 0.40 impact ratio (40%)
  • Hispanic candidates: 30% / 50% = 0.60 impact ratio (60%)
  • White candidates: 40% / 50% = 0.80 impact ratio (80%)

Interpretation:

  • ✅ White candidates: 0.80 ratio = passes the four-fifths rule (exactly at threshold)
  • ❌ Hispanic candidates: 0.60 ratio = fails (below 80%)
  • ❌ Black candidates: 0.40 ratio = severe disparate impact

For sex:

  • Highest: Male candidates at 40%
  • Female candidates: 30% / 40% = 0.75 impact ratio (75%)
  • ❌ Fails the four-fifths rule
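The impact-ratio calculation and the four-fifths screen can be expressed in a few lines. This sketch reuses the selection rates from the worked example; the 0.80 threshold comes straight from the Uniform Guidelines.

```python
def impact_ratios(rates):
    """Impact ratio = group's selection rate / highest group's selection rate."""
    top = max(rates.values())
    return {group: rate / top for group, rate in rates.items()}

def four_fifths_failures(rates, threshold=0.80):
    """Groups whose impact ratio falls below the 80% threshold."""
    return [group for group, ratio in impact_ratios(rates).items()
            if ratio < threshold]

rates = {"White": 0.40, "Black": 0.20, "Hispanic": 0.30, "Asian": 0.50}
print(impact_ratios(rates))
# → {'White': 0.8, 'Black': 0.4, 'Hispanic': 0.6, 'Asian': 1.0}
print(four_fifths_failures(rates))
# → ['Black', 'Hispanic']
```

Note that White candidates sit exactly at 0.80 and therefore pass: the rule flags ratios *below* four-fifths, not at it.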

⚠️ Critical Point

Failing the four-fifths rule doesn't automatically mean the tool is illegal—but it triggers the need for job-relatedness and business necessity analysis. You must demonstrate that the tool is validly predictive of job performance and that no less discriminatory alternative exists.

Step 5: Statistical Significance Testing

Beyond the four-fifths rule, you should test whether observed differences are statistically significant—meaning they're unlikely to have occurred by random chance.

Common Statistical Tests

  • Chi-square test: Tests whether selection rates differ significantly across demographic groups
  • Fisher's exact test: More accurate for small sample sizes
  • Z-test for proportions: Compares two groups' selection rates

What "statistically significant" means:

Typically, a p-value less than 0.05 indicates statistical significance—meaning there's less than a 5% probability the observed difference occurred by chance. If your analysis shows both (1) failure of the four-fifths rule AND (2) statistical significance, you have strong evidence of disparate impact.
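To make the idea concrete, here is a two-proportion z-test in plain Python, applied to the male/female numbers from the Step 3 example. This is an illustrative sketch, not a substitute for a qualified analyst: real audits typically use established statistical software (and tests like chi-square or Fisher's exact) rather than hand-rolled formulas.

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-tailed z-test comparing two groups' selection rates.
    Returns (z statistic, p-value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-tailed p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Male vs. female rates from the Step 3 example: 180/450 vs. 165/550.
z, p = two_proportion_z(180, 450, 165, 550)
print(round(z, 2), p < 0.05)
# The 10-point gap yields z ≈ 3.31, p ≈ 0.001 — statistically
# significant, reinforcing the four-fifths finding for sex.
```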

Note: Unless you have a statistics background, this is where you likely need an industrial-organizational psychologist or external auditor.

Step 6: Intersectional Analysis (Emerging Requirement)

Increasingly, regulators expect analysis of intersectional categories—combinations of race and sex (e.g., Black women, Hispanic men, Asian women).

Why? A tool might show no overall sex-based impact but could discriminate specifically against women of color while favoring white women. Single-axis analysis misses this.

Example intersectional breakdown:

  • White men: 45% selection rate
  • White women: 38% selection rate
  • Black men: 25% selection rate
  • Black women: 15% selection rate ← Most severe impact
  • Hispanic men: 32% selection rate
  • Hispanic women: 28% selection rate

This analysis reveals that Black women face compounded discrimination—worse outcomes than Black men, white women, or any other group.
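Intersectional analysis is the same arithmetic, just keyed by (race, sex) pairs. The counts below are hypothetical figures chosen to reproduce the rates in the example breakdown above.

```python
# Hypothetical counts consistent with the example rates:
# (evaluated, advanced) keyed by (race, sex).
groups = {
    ("White", "M"):    (200, 90),   # 45%
    ("White", "F"):    (200, 76),   # 38%
    ("Black", "M"):    (120, 30),   # 25%
    ("Black", "F"):    (100, 15),   # 15%
    ("Hispanic", "M"): (100, 32),   # 32%
    ("Hispanic", "F"): (100, 28),   # 28%
}

rates = {g: advanced / evaluated for g, (evaluated, advanced) in groups.items()}
top_group = max(rates, key=rates.get)
worst_group = min(rates, key=rates.get)
ratio = rates[worst_group] / rates[top_group]

print(worst_group, round(ratio, 2))
# → ('Black', 'F') 0.33
# An impact ratio this low is invisible to single-axis analysis:
# by race alone or sex alone, the gap is far less severe.
```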

Step 7: Document Findings and Prepare Report

Required Report Elements (NYC LL144 Standard)

Your bias audit report must include:

  • Audit date: When the analysis was performed
  • Selection rates: For each race/ethnicity and sex category
  • Impact ratios: For each category compared to the highest-performing group
  • Sample size and composition: How many candidates were analyzed, demographic breakdown
  • Methodology: Statistical tests used, any data limitations or exclusions
  • Independent auditor certification: Statement that the audit was conducted by an independent party

Optional But Recommended

  • Trend analysis: How do current results compare to previous audits?
  • Context and interpretation: Plain-language explanation of what the numbers mean
  • Recommendations: If disparate impact is found, what mitigation steps are proposed?

Step 8: Decide What to Do With the Results

This is the hardest part. If your audit reveals disparate impact, you have several options:

Option 1: Stop Using the Tool

Pros: Eliminates legal risk immediately

Cons: Loses efficiency gains, may disrupt hiring workflows

When to choose: Impact is severe, tool isn't critical to operations, or vendor can't/won't remediate

Option 2: Modify the Tool to Reduce Impact

What this involves:

  • Work with vendor to adjust algorithms, weightings, or features
  • Remove factors that drive disparate impact (e.g., certain speech pattern analyses)
  • Re-audit after modifications to verify impact reduction

Pros: Retains tool functionality while addressing discrimination

Cons: May reduce tool effectiveness, vendor may not cooperate, costly

Option 3: Validate Job-Relatedness and Business Necessity

Legal standard: Under Title VII, a selection tool that produces disparate impact is lawful if it's demonstrably job-related and consistent with business necessity, AND no less discriminatory alternative exists.

What this requires:

  • Criterion validity study: Statistical evidence that the tool predicts actual job performance (requires collecting performance data on hired employees)
  • Content validity analysis: Demonstration that what the tool measures directly relates to essential job functions
  • Alternative analysis: Evidence that you explored other tools/methods with less impact

Cost: Validation studies can cost $50,000-$250,000+

Outcome: Even with validation, you may face legal challenges. Courts are skeptical of AI validation claims.

Option 4: Accept the Risk and Publish

The scenario: You believe the tool is valuable, impact is moderate, and you're prepared to defend it legally.

Risks:

  • Published audit results can be used as evidence in EEOC complaints or lawsuits
  • Regulatory scrutiny and investigations
  • Reputational damage if media picks up the story

When to choose: Rarely advisable without validation study and strong legal counsel support

Step 9: Publish Results (Where Required)

NYC, California, and some other jurisdictions require public disclosure of bias audit results.

Publication Best Practices

  • Create a dedicated transparency page: yourcompany.com/ai-hiring-transparency
  • Link from careers page and job postings
  • Use clear, accessible language (don't just dump statistical tables)
  • Update whenever new audits are completed
  • Include audit date and next scheduled audit

Sample Publication Format

AI Hiring Tool Bias Audit Results

Tool: HireVue Video Interview Platform

Audit Date: January 15, 2026

Audit Period: February 2025 - January 2026

Independent Auditor: [Auditor Name/Firm]

Summary of Findings:

This audit analyzed 1,247 candidates evaluated for customer service positions. Selection rates and impact ratios are presented below.

[Statistical tables]

Full audit report available upon request: [email]

Step 10: Establish Ongoing Monitoring

A single audit is not sufficient. Best practices for ongoing compliance:

  • Annual re-audits: Required by most laws; schedule 12 months from initial audit
  • Quarterly check-ins: Review selection rate data between audits to catch emerging issues early
  • Trigger-based re-audits: If you make material changes to the AI tool (algorithm updates, new features), conduct a new audit before deploying
  • Vendor monitoring: Require vendors to alert you to any changes that could affect bias audit results

Who Should Conduct the Audit?

In-House vs. External Auditor

Legal requirement: Most laws require an "independent" auditor—someone not directly involved in developing or using the tool.

In-house options:

  • Industrial-organizational psychologist on staff
  • HR analytics team member not involved in day-to-day hiring
  • Legal/compliance team with statistical training

External auditor benefits:

  • Stronger independence claim (better defensibility)
  • Expertise in employment testing validation
  • Awareness of evolving regulatory standards
  • Liability protection (auditor assumes some risk)

Finding a Qualified Auditor

Look for professionals with:

  • Ph.D. in industrial-organizational psychology or related field
  • Experience with EEOC Uniform Guidelines validation
  • Prior AI bias audit experience (ask for references)
  • Professional certification (SIOP member, licensed psychologist)
  • Errors & omissions insurance

Cost Expectations

Budget for bias audits varies widely based on complexity:

  • Simple audit (single tool, one job category, 500-1000 candidates): $15,000-$30,000
  • Moderate complexity (multiple job categories, larger sample): $30,000-$75,000
  • Complex audit (multiple tools, many job categories, validation study): $75,000-$250,000+

Ongoing costs: Annual re-audits are typically 30-50% less expensive than initial audits (methodologies and systems are already established).

Common Pitfalls to Avoid

❌ Using Vendor-Supplied Audits Without Verification

Some vendors provide "bias audit reports" based on pooled data across all their clients. These may not satisfy legal requirements, which typically require audits based on your specific applicant pool.

❌ Conducting Audits on Development/Test Data

Audits must use real-world candidate data from your actual hiring process, not simulated or test datasets.

❌ Ignoring Intersectional Analysis

Single-axis analysis (race only, sex only) can mask severe discrimination against intersectional groups. Include it even if not explicitly required yet.

❌ Failing to Document Data Limitations

If you have missing data, small sample sizes, or other limitations, document them transparently. Trying to hide limitations creates legal risk.

❌ Publishing Without Legal Review

Before publishing audit results showing disparate impact, have employment counsel review. The publication itself can trigger legal exposure.

How EmployArmor Simplifies Bias Audits

EmployArmor streamlines the entire bias audit process:

  • Auditor matching: We connect you with qualified, independent auditors based on your tool and industry
  • Data preparation: Automated extraction and formatting of candidate data from your ATS
  • Audit management: Track audit progress, deadlines, and deliverables
  • Results publication: Generate compliant public disclosure pages from audit reports
  • Ongoing monitoring: Quarterly selection rate dashboards to spot issues between annual audits

Simplify Your Bias Audit Process

Get connected with qualified auditors and manage compliance in one platform

Start Your Audit →

Frequently Asked Questions

How often must bias audits be conducted?

Most laws require annual audits. However, you should also re-audit whenever you make material changes to an AI tool (algorithm updates, new features, expanded use cases).

Can we use the same audit for multiple jurisdictions?

Generally yes, if the audit meets the most stringent requirements across all applicable jurisdictions. For example, an audit that satisfies NYC LL144 will typically also satisfy California and Colorado requirements.

What if we don't have 500+ candidates in a 12-month period?

You can expand the time window (e.g., 18-24 months) or combine similar job categories. Document why you made these choices. Note that very small samples reduce statistical power and make it harder to detect discrimination.

Do we need separate audits for each AI tool we use?

Yes. Each distinct AI tool or algorithm requires its own bias audit. Using the same ATS vendor for multiple job categories may require separate audits if the tools function differently.

What if candidates don't provide demographic data?

If response rates are low, you may need to use statistical inference methods or wait longer to build a sufficient sample. Some jurisdictions allow proxy methods (name-based ethnicity prediction), but these are controversial and less reliable.


Disclaimer: This content is for informational purposes only and does not constitute legal advice. Employment laws vary by jurisdiction and change frequently. Consult a qualified employment attorney for guidance specific to your situation. EmployArmor provides compliance tools and resources but is not a law firm.

Ready to get compliant?

Take our free 2-minute assessment to see where you stand.