If your company uses AI in hiring, you've likely heard that "bias audits" are required. New York City, California, and Colorado all mandate some form of bias testing for AI hiring tools. But what does a bias audit actually entail? What data do you need? What methodologies are acceptable? How do you interpret the results? And critically—what do you do if the audit reveals discrimination?
This guide walks through the complete bias audit process from initial scoping to publication of results, with practical examples, statistical explanations (in plain English), and decision frameworks for what to do with your findings.
Who This Guide Is For:
- ✓ HR/Talent leaders responsible for AI hiring compliance
- ✓ Legal/compliance teams evaluating vendor tools
- ✓ In-house analysts tasked with conducting audits
- ✓ Anyone trying to understand what bias audits cost and deliver
What Is a Bias Audit? (Legal Definition)
A bias audit is a statistical analysis that evaluates whether an AI hiring tool produces disparate impact—meaning it disproportionately screens out candidates from protected classes (race, ethnicity, sex, age, disability).
The legal framework comes from two sources:
- Federal precedent: The "four-fifths rule" from the Uniform Guidelines on Employee Selection Procedures (1978), which the EEOC uses to evaluate employment tests
- State/local laws: Specific requirements in NYC Local Law 144, California AB 2930, and Colorado's AI Act that mandate bias testing for AI tools
Most laws require analyzing selection rates by race/ethnicity and sex at minimum. Some jurisdictions are expanding to include age, disability status, and intersectional categories (e.g., Black women as a distinct group).
Step 1: Scope the Audit (What Tool, What Data, What Period)
Define the Tool Being Audited
Be precise about what you're testing:
- Tool name and version: "HireVue Video Interview Platform v8.2"
- What it evaluates: "Analyzes candidate speech patterns, word choice, and verbal communication skills"
- How it's used: "Scores are used to rank candidates for hiring manager review; top 30% advance to in-person interviews"
- Job categories covered: "Customer service representatives, sales associates"
Important: If you use the same AI tool across multiple job families with different selection criteria, you may need separate audits for each.
Determine the Audit Period
NYC requires audits based on data from the 12 months preceding the audit. California and Colorado have similar annual windows.
Example: For an audit conducted in February 2026, you'd analyze candidate data from February 2025 - January 2026.
Minimum sample size: Robust statistical analysis generally calls for at least 500 candidates evaluated in the relevant period. If you have fewer, you may need to expand the time window or combine multiple job categories (with caution—combining dissimilar roles can skew results).
Identify Required Demographic Data
You need candidate demographic data to perform the analysis. Required categories:
- Race/Ethnicity: Typically using EEOC categories (Hispanic/Latino, White, Black/African American, Asian, American Indian/Alaska Native, Native Hawaiian/Pacific Islander, Two or More Races)
- Sex: Male, female, and (increasingly) non-binary options
The demographic data problem: Most employers don't collect race/ethnicity data from applicants (it's optional under EEOC rules). If you lack this data, you have three options:
- Prospective data collection: Start collecting demographic data (via voluntary self-identification) and wait 12 months to conduct the audit
- Statistical inference: Use name-based or zip-code-based proxies to estimate demographics (controversial and less reliable)
- Vendor-supplied data: If your AI vendor has access to demographic distributions from their broader user base, they may be able to provide pooled analysis (check if your jurisdiction allows this)
Step 2: Collect and Prepare Data
Data Elements You Need
For each candidate in your audit period, collect:
- Unique candidate identifier (anonymize names for privacy)
- Job title/category applied for
- Date of application
- Whether the AI tool was used to evaluate them (yes/no)
- AI tool output (score, ranking, pass/fail recommendation)
- Selection outcome (advanced to next round, hired, rejected)
- Demographic data (race/ethnicity, sex)
Data Cleaning and Validation
Common data quality issues to address:
- Missing demographic data: Decide whether to exclude those candidates or use imputation (document your methodology)
- Inconsistent job categories: Normalize job titles into consistent categories
- Multiple applications: Determine how to handle candidates who applied multiple times (count once? count each application?)
- Incomplete hiring outcomes: Track candidates through the entire process to determine final selection
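As a sketch of the cleaning step (the record fields and the keep-most-recent-application rule here are illustrative choices, not requirements of any statute), deduplication and missing-demographic handling might look like:

```python
from datetime import date

# Hypothetical raw records: one dict per application
applications = [
    {"id": "C001", "job": "Customer Service Rep", "applied": date(2025, 4, 1),
     "race": "Black", "sex": "Female", "advanced": False},
    {"id": "C001", "job": "Customer Service Rep", "applied": date(2025, 9, 15),
     "race": "Black", "sex": "Female", "advanced": True},   # repeat applicant
    {"id": "C002", "job": "Customer Service Rep", "applied": date(2025, 5, 2),
     "race": None, "sex": "Male", "advanced": False},       # missing demographics
]

def prepare(records):
    """Keep each candidate's most recent application; drop records with
    missing demographics, but count them so the exclusion is documented."""
    latest = {}
    for r in records:
        if r["id"] not in latest or r["applied"] > latest[r["id"]]["applied"]:
            latest[r["id"]] = r
    kept, excluded = [], 0
    for r in latest.values():
        if r["race"] is None or r["sex"] is None:
            excluded += 1
        else:
            kept.append(r)
    return kept, excluded

clean, n_excluded = prepare(applications)
```

Whatever rules you choose (count once vs. per application, exclude vs. impute), the point is that they are explicit in code and therefore documentable in the audit report.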
Step 3: Calculate Selection Rates
Selection rate = (Number of candidates selected from a group) / (Total number of candidates in that group)
Example Calculation
Let's say you evaluated 1,000 candidates for customer service roles using an AI video interview tool:
Sample Data:
- White candidates: 400 evaluated → 160 advanced (40% selection rate)
- Black candidates: 250 evaluated → 50 advanced (20% selection rate)
- Hispanic candidates: 200 evaluated → 60 advanced (30% selection rate)
- Asian candidates: 150 evaluated → 75 advanced (50% selection rate)
Sex breakdown:
- Male candidates: 450 evaluated → 180 advanced (40% selection rate)
- Female candidates: 550 evaluated → 165 advanced (30% selection rate)
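The arithmetic above is simple enough to express directly. A minimal sketch using the (advanced, evaluated) counts from this worked example:

```python
# (advanced, evaluated) counts from the worked example above
by_race = {
    "White": (160, 400),
    "Black": (50, 250),
    "Hispanic": (60, 200),
    "Asian": (75, 150),
}

def selection_rates(counts):
    """Selection rate = number selected / number evaluated, per group."""
    return {group: sel / total for group, (sel, total) in counts.items()}

rates = selection_rates(by_race)
# e.g. rates["Black"] -> 0.20, rates["Asian"] -> 0.50
```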
Step 4: Calculate Impact Ratios
Impact ratio compares the selection rate of each demographic group to the group with the highest selection rate.
Impact ratio = (Selection rate of Group A) / (Selection rate of highest-performing group)
Applying the Four-Fifths Rule
The EEOC's "four-fifths rule" (also called the 80% rule) states that disparate impact is indicated when the selection rate for a protected group is less than 80% of the rate for the highest-performing group.
Using our example above:
- Highest selection rate: Asian candidates at 50%
- Black candidates: 20% selection rate → 20% / 50% = 0.40 impact ratio (40%)
- Hispanic candidates: 30% / 50% = 0.60 impact ratio (60%)
- White candidates: 40% / 50% = 0.80 impact ratio (80%)
Interpretation:
- ✅ White candidates: 0.80 ratio = passes the four-fifths rule (exactly at threshold)
- ❌ Hispanic candidates: 0.60 ratio = fails (below 80%)
- ❌ Black candidates: 0.40 ratio = severe disparate impact
For sex:
- Highest: Male candidates at 40%
- Female candidates: 30% / 40% = 0.75 impact ratio (75%)
- ❌ Fails the four-fifths rule
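Continuing the same example, the impact-ratio calculation and the four-fifths check reduce to a few lines (the 0.8 threshold comes from the EEOC Uniform Guidelines; everything else here is illustrative):

```python
FOUR_FIFTHS = 0.8  # EEOC four-fifths (80%) rule threshold

rates = {"White": 0.40, "Black": 0.20, "Hispanic": 0.30, "Asian": 0.50}

def impact_ratios(rates):
    """Each group's selection rate divided by the highest group's rate."""
    top = max(rates.values())
    return {group: rate / top for group, rate in rates.items()}

def passes_four_fifths(ratios, threshold=FOUR_FIFTHS):
    """True means the group's impact ratio meets or exceeds the threshold."""
    return {group: ratio >= threshold for group, ratio in ratios.items()}

ratios = impact_ratios(rates)
passed = passes_four_fifths(ratios)
# ratios["Black"] -> 0.40; passed["Hispanic"] -> False; passed["White"] -> True
```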
⚠️ Critical Point
Failing the four-fifths rule doesn't automatically mean the tool is illegal—but it triggers the need for job-relatedness and business necessity analysis. You must demonstrate that the tool is validly predictive of job performance and that no less discriminatory alternative exists.
Step 5: Statistical Significance Testing
Beyond the four-fifths rule, you should test whether observed differences are statistically significant—meaning they're unlikely to have occurred by random chance.
Common Statistical Tests
- Chi-square test: Tests whether selection rates differ significantly across demographic groups
- Fisher's exact test: More accurate for small sample sizes
- Z-test for proportions: Compares two groups' selection rates
What "statistically significant" means:
Typically, a p-value below 0.05 indicates statistical significance: if there were truly no difference between the groups, a gap this large would be observed less than 5% of the time. If your analysis shows both (1) failure of the four-fifths rule AND (2) statistical significance, you have strong evidence of disparate impact.
Note: Unless you have a statistics background, this is where you likely need an industrial-organizational psychologist or external auditor.
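For the two-group comparison, a pooled two-proportion z-test can be computed with the standard library alone (in practice an auditor would more likely reach for scipy's `chi2_contingency` or `fisher_exact`; this sketch just makes the mechanics concrete). The numbers reuse the White vs. Black counts from the earlier example:

```python
import math

def two_proportion_z(sel1, n1, sel2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = sel1 / n1, sel2 / n2
    pooled = (sel1 + sel2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# White: 160/400 advanced vs. Black: 50/250 advanced
z, p = two_proportion_z(160, 400, 50, 250)
# z is roughly 5.3, far beyond the 1.96 cutoff, so p is well under 0.05
```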
Step 6: Intersectional Analysis (Emerging Requirement)
Increasingly, regulators expect analysis of intersectional categories—combinations of race and sex (e.g., Black women, Hispanic men, Asian women).
Why? A tool might show no overall sex-based impact but could discriminate specifically against women of color while favoring white women. Single-axis analysis misses this.
Example intersectional breakdown:
- White men: 45% selection rate
- White women: 38% selection rate
- Black men: 25% selection rate
- Black women: 15% selection rate ← Most severe impact
- Hispanic men: 32% selection rate
- Hispanic women: 28% selection rate
This analysis reveals that Black women face compounded discrimination—worse outcomes than Black men, white women, or any other group.
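Mechanically, intersectional analysis is the same impact-ratio calculation applied to combined race-and-sex groups. Using the illustrative rates above:

```python
# Selection rates for combined race-and-sex groups (illustrative figures)
rates = {
    ("White", "Men"): 0.45, ("White", "Women"): 0.38,
    ("Black", "Men"): 0.25, ("Black", "Women"): 0.15,
    ("Hispanic", "Men"): 0.32, ("Hispanic", "Women"): 0.28,
}

top = max(rates.values())
ratios = {group: rate / top for group, rate in rates.items()}
worst = min(ratios, key=ratios.get)
# worst is ("Black", "Women"): 0.15 / 0.45, an impact ratio of about 0.33
```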
Step 7: Document Findings and Prepare Report
Required Report Elements (NYC LL144 Standard)
Your bias audit report must include:
- Audit date: When the analysis was performed
- Selection rates: For each race/ethnicity and sex category
- Impact ratios: For each category compared to the highest-performing group
- Sample size and composition: How many candidates were analyzed, demographic breakdown
- Methodology: Statistical tests used, any data limitations or exclusions
- Independent auditor certification: Statement that the audit was conducted by an independent party
Optional But Recommended
- Trend analysis: How do current results compare to previous audits?
- Context and interpretation: Plain-language explanation of what the numbers mean
- Recommendations: If disparate impact is found, what mitigation steps are proposed?
Step 8: Decide What to Do With the Results
This is the hardest part. If your audit reveals disparate impact, you have several options:
Option 1: Stop Using the Tool
Pros: Eliminates legal risk immediately
Cons: Loses efficiency gains, may disrupt hiring workflows
When to choose: Impact is severe, tool isn't critical to operations, or vendor can't/won't remediate
Option 2: Modify the Tool to Reduce Impact
What this involves:
- Work with vendor to adjust algorithms, weightings, or features
- Remove factors that drive disparate impact (e.g., certain speech pattern analyses)
- Re-audit after modifications to verify impact reduction
Pros: Retains tool functionality while addressing discrimination
Cons: May reduce tool effectiveness, vendor may not cooperate, costly
Option 3: Validate Job-Relatedness and Business Necessity
Legal standard: Under Title VII, a selection tool that produces disparate impact is lawful if it's demonstrably job-related and consistent with business necessity, AND no less discriminatory alternative exists.
What this requires:
- Criterion validity study: Statistical evidence that the tool predicts actual job performance (requires collecting performance data on hired employees)
- Content validity analysis: Demonstration that what the tool measures directly relates to essential job functions
- Alternative analysis: Evidence that you explored other tools/methods with less impact
Cost: Validation studies can cost $50,000-$250,000+
Outcome: Even with validation, you may face legal challenges. Courts are skeptical of AI validation claims.
Option 4: Accept the Risk and Publish
The scenario: You believe the tool is valuable, impact is moderate, and you're prepared to defend it legally.
Risks:
- Published audit results can be used as evidence in EEOC complaints or lawsuits
- Regulatory scrutiny and investigations
- Reputational damage if media picks up the story
When to choose: Rarely advisable without validation study and strong legal counsel support
Step 9: Publish Results (Where Required)
NYC, California, and some other jurisdictions require public disclosure of bias audit results.
Publication Best Practices
- Create a dedicated transparency page: yourcompany.com/ai-hiring-transparency
- Link from careers page and job postings
- Use clear, accessible language (don't just dump statistical tables)
- Update whenever new audits are completed
- Include audit date and next scheduled audit
Sample Publication Format
AI Hiring Tool Bias Audit Results
Tool: HireVue Video Interview Platform
Audit Date: January 15, 2026
Audit Period: February 2025 - January 2026
Independent Auditor: [Auditor Name/Firm]
Summary of Findings:
This audit analyzed 1,247 candidates evaluated for customer service positions. Selection rates and impact ratios are presented below.
[Statistical tables]
Full audit report available upon request: [email]
Step 10: Establish Ongoing Monitoring
A single audit is not sufficient. Best practices for ongoing compliance:
- Annual re-audits: Required by most laws; schedule 12 months from initial audit
- Quarterly check-ins: Review selection rate data between audits to catch emerging issues early
- Trigger-based re-audits: If you make material changes to the AI tool (algorithm updates, new features), conduct a new audit before deploying
- Vendor monitoring: Require vendors to alert you to any changes that could affect bias audit results
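A quarterly check-in can be as simple as recomputing impact ratios on the most recent quarter's data and flagging any group drifting toward the four-fifths line. The 0.9 warning margin below is an arbitrary early-warning choice, not a legal standard:

```python
def quarterly_alerts(counts, warn_at=0.9):
    """Flag groups whose impact ratio falls below warn_at, an early-warning
    margin above the 0.8 legal threshold. counts: {group: (selected, total)}."""
    rates = {g: sel / total for g, (sel, total) in counts.items()}
    top = max(rates.values())
    return [g for g, r in rates.items() if r / top < warn_at]

# Hypothetical counts for one quarter
q3 = {"White": (80, 200), "Black": (30, 100), "Hispanic": (35, 100)}
watchlist = quarterly_alerts(q3)
# Black (ratio 0.75) and Hispanic (ratio 0.875) land on the watchlist
```

Groups on the watchlist warrant a closer look well before the annual re-audit is due.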
Who Should Conduct the Audit?
In-House vs. External Auditor
Legal requirement: Most laws require an "independent" auditor—someone not directly involved in developing or using the tool.
In-house options:
- Industrial-organizational psychologist on staff
- HR analytics team member not involved in day-to-day hiring
- Legal/compliance team with statistical training
External auditor benefits:
- Stronger independence claim (better defensibility)
- Expertise in employment testing validation
- Awareness of evolving regulatory standards
- Liability protection (auditor assumes some risk)
Finding a Qualified Auditor
Look for professionals with:
- Ph.D. in industrial-organizational psychology or related field
- Experience with EEOC Uniform Guidelines validation
- Prior AI bias audit experience (ask for references)
- Professional certification (SIOP member, licensed psychologist)
- Errors & omissions insurance
Cost Expectations
Budget for bias audits varies widely based on complexity:
- Simple audit (single tool, one job category, 500-1000 candidates): $15,000-$30,000
- Moderate complexity (multiple job categories, larger sample): $30,000-$75,000
- Complex audit (multiple tools, many job categories, validation study): $75,000-$250,000+
Ongoing costs: Annual re-audits are typically 30-50% less expensive than initial audits (methodologies and systems are already established).
Common Pitfalls to Avoid
❌ Using Vendor-Supplied Audits Without Verification
Some vendors provide "bias audit reports" based on pooled data across all their clients. These may not satisfy legal requirements, which typically require audits based on your specific applicant pool.
❌ Conducting Audits on Development/Test Data
Audits must use real-world candidate data from your actual hiring process, not simulated or test datasets.
❌ Ignoring Intersectional Analysis
Single-axis analysis (race only, sex only) can mask severe discrimination against intersectional groups. Include it even if not explicitly required yet.
❌ Failing to Document Data Limitations
If you have missing data, small sample sizes, or other limitations, document them transparently. Trying to hide limitations creates legal risk.
❌ Publishing Without Legal Review
Before publishing audit results showing disparate impact, have employment counsel review. The publication itself can trigger legal exposure.
How EmployArmor Simplifies Bias Audits
EmployArmor streamlines the entire bias audit process:
- Auditor matching: We connect you with qualified, independent auditors based on your tool and industry
- Data preparation: Automated extraction and formatting of candidate data from your ATS
- Audit management: Track audit progress, deadlines, and deliverables
- Results publication: Generate compliant public disclosure pages from audit reports
- Ongoing monitoring: Quarterly selection rate dashboards to spot issues between annual audits
Frequently Asked Questions
How often must bias audits be conducted?
Most laws require annual audits. However, you should also re-audit whenever you make material changes to an AI tool (algorithm updates, new features, expanded use cases).
Can we use the same audit for multiple jurisdictions?
Generally yes, if the audit meets the most stringent requirements across all applicable jurisdictions. For example, an audit that satisfies NYC LL144 will typically also satisfy California and Colorado requirements.
What if we don't have 500+ candidates in a 12-month period?
You can expand the time window (e.g., 18-24 months) or combine similar job categories. Document why you made these choices. Note that very small samples reduce statistical power and make it harder to detect discrimination.
Do we need separate audits for each AI tool we use?
Yes. Each distinct AI tool or algorithm requires its own bias audit. Using the same ATS vendor for multiple job categories may require separate audits if the tools function differently.
What if candidates don't provide demographic data?
If response rates are low, you may need to use statistical inference methods or wait longer to build a sufficient sample. Some jurisdictions allow proxy methods (name-based ethnicity prediction), but these are controversial and less reliable.
Related Resources
- Complete AI Hiring Compliance Guide 2026
- Do I Need an AI Bias Audit?
- First NYC LL144 Enforcement Actions
- 2026 AI Hiring Laws: What Changed
Disclaimer: This content is for informational purposes only and does not constitute legal advice. Employment laws vary by jurisdiction and change frequently. Consult a qualified employment attorney for guidance specific to your situation. EmployArmor provides compliance tools and resources but is not a law firm.