Module 12: Machine Learning in Society - Ethics and Responsibility
Introduction
In May 2016, a team of journalists at ProPublica published an investigation that would reshape how the world thinks about algorithms. They analyzed a risk assessment tool called COMPAS, used in courtrooms across America to predict whether defendants would commit future crimes. Their finding: the algorithm was biased against Black defendants, wrongly labeling those who did not go on to re-offend as high risk at nearly twice the rate of white defendants—even when controlling for prior crimes.
The COMPAS controversy opened a floodgate of questions that data scientists are still grappling with: Can algorithms be fair? Who is harmed when they’re not? What responsibility do we bear for the systems we build? How do we balance efficiency with equity, innovation with precaution?
This module confronts these questions directly. Machine learning is not just a technical discipline—it’s a force reshaping society, from who gets hired to who gets medical care to who gets surveilled. Data scientists are not neutral technicians; we are making choices, often hidden in code, that affect millions of lives.
Part 1: When Algorithms Discriminate
The Faces of Algorithmic Bias
Bias in machine learning takes many forms:
Historical Bias: When training data reflects past discrimination. If a hiring algorithm learns from historical hires at a company that discriminated against women, it will learn to discriminate against women.
Representation Bias: When certain groups are underrepresented in training data. Facial recognition systems trained primarily on white faces perform poorly on Black faces.
Measurement Bias: When the proxy we measure differs from what we care about. Using arrest records as a proxy for crime rates encodes policing patterns, not crime patterns.
Aggregation Bias: When a single model is used for groups with different characteristics. A medical algorithm calibrated on average patients may fail for specific populations.
Evaluation Bias: When benchmark datasets don’t represent real-world populations. ImageNet’s categories and images reflect Western perspectives.
Case Study: Amazon’s Hiring Algorithm
In 2018, Reuters revealed that Amazon had developed a machine learning tool to screen job applicants—and then scrapped it when they discovered it was biased against women.
The algorithm was trained on resumes submitted to Amazon over 10 years—a period when the tech industry (and Amazon) was overwhelmingly male. The system learned to penalize resumes containing the word “women’s” (as in “women’s chess club”) and downgraded graduates of all-women’s colleges.
Amazon tried to remove the bias by eliminating explicitly gendered terms. It didn’t work. The algorithm found other proxies. They eventually abandoned the project entirely.
The lesson: bias isn’t a bug you can simply debug. It’s baked into data, into features, into the very definition of what we’re optimizing for.
Case Study: Healthcare Allocation
In 2019, a study published in Science revealed that a widely used healthcare algorithm exhibited significant racial bias. The algorithm, used by hospitals to identify patients who need extra care, was much less likely to refer Black patients than equally sick white patients.
The root cause was subtle. The algorithm used healthcare costs as a proxy for healthcare needs. But Black patients in America, facing systemic barriers, historically spent less on healthcare even when equally ill. The algorithm learned that Black patients had lower “need”—when really they had less access.
Replacing the biased label with a better measure of actual health needs reduced the racial disparity by 84%.
Part 2: Fairness - A Moving Target
What Does “Fair” Mean?
There is no single definition of algorithmic fairness. Researchers have proposed dozens of mathematical definitions, and many are mutually incompatible. Here are the major ones:
Demographic Parity (Statistical Parity): Predictions should be independent of protected group membership. Equal rates of positive predictions across groups.
\[P(\hat{Y}=1 | A=0) = P(\hat{Y}=1 | A=1)\]
Equalized Odds: True positive rates and false positive rates should be equal across groups. If the algorithm flags 80% of actual re-offenders in one group, it should flag 80% in all groups.
Predictive Parity: Among those predicted positive, equal proportions should actually be positive across groups (equal positive predictive value).
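In the same notation as the demographic parity condition above (with \(Y\) the true outcome, \(\hat{Y}\) the prediction, and \(A\) the protected attribute), equalized odds and predictive parity correspond to:
\[P(\hat{Y}=1 | Y=y, A=0) = P(\hat{Y}=1 | Y=y, A=1) \text{ for } y \in \{0,1\} \quad \text{(equalized odds)}\]
\[P(Y=1 | \hat{Y}=1, A=0) = P(Y=1 | \hat{Y}=1, A=1) \quad \text{(predictive parity)}\]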
Individual Fairness: Similar individuals should receive similar predictions. A person shouldn’t be treated worse just because of their group membership.
Counterfactual Fairness: A decision is fair if it would have been the same had the individual belonged to a different protected group.
The Impossibility Theorems
A disturbing mathematical result: when base rates differ between groups (when one group actually has higher rates of the outcome), it is mathematically impossible for any imperfect classifier to satisfy several of these fairness criteria at once.
Chouldechova’s Theorem (2017): If a classifier satisfies predictive parity across groups and the base rates differ, then the false positive and false negative rates cannot both be equal across groups, so equalized odds cannot hold.
Kleinberg-Mullainathan-Raghavan (2016): Except in trivial cases, it is impossible to simultaneously satisfy calibration, balance for the positive class, and balance for the negative class.
This means fairness involves trade-offs. We cannot have everything. We must choose which definition matters most for a given application—a fundamentally ethical, not technical, choice.
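To make the trade-off concrete, here is a minimal sketch (illustrative numbers only, not COMPAS data). It uses the identity FPR = p/(1−p) · (1−PPV)/PPV · TPR, which follows directly from the definitions of PPV and TPR: fixing equal PPV and equal TPR for two groups with different base rates forces their false positive rates apart.

```python
# Illustrative numbers only (not the actual COMPAS data): if two groups have
# different base rates but we require equal PPV and equal TPR, the implied
# false positive rates must differ -- Chouldechova's result in miniature.

def implied_fpr(base_rate, ppv, tpr):
    """FPR forced by a given base rate p, PPV, and TPR:
    FPR = p / (1 - p) * (1 - PPV) / PPV * TPR."""
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * tpr

ppv, tpr = 0.6, 0.7  # identical predictive parity and recall for both groups
for group, base_rate in [("Group A", 0.5), ("Group B", 0.3)]:
    fpr = implied_fpr(base_rate, ppv, tpr)
    print(f"{group}: base rate = {base_rate:.2f} -> implied FPR = {fpr:.3f}")

# Group A ends up with FPR ~0.47, Group B with ~0.20: equalized odds cannot
# hold at the same time as predictive parity when base rates differ.
```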
Fairness Through Awareness vs. Fairness Through Blindness
Two opposing approaches:
Fairness Through Blindness: Remove protected attributes (race, gender) from the model. The intuition: if the algorithm doesn’t see race, it can’t discriminate.
The problem: other features (zip code, name, purchasing patterns) can serve as proxies. Removing race doesn’t prevent racial discrimination.
Fairness Through Awareness: Explicitly include protected attributes and constrain the model to be fair with respect to them.
The problem: this requires collecting sensitive data, and some jurisdictions prohibit using protected characteristics in decisions.
Neither approach is a complete solution. Fairness requires thoughtful consideration of context, not a mechanical procedure.
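One simple diagnostic for why blindness fails is to test how well the protected attribute can be predicted from the features that remain. The sketch below is a common audit heuristic rather than a method prescribed in this module; `X` and `protected_attr` are placeholders for your own data.

```python
# Proxy check: can the "removed" protected attribute be recovered from the
# remaining features? High accuracy = strong proxies remain in the data.
# (Illustrative heuristic; X and protected_attr are placeholders.)
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def proxy_strength(X, protected_attr, cv=5):
    """Cross-validated accuracy of predicting the protected attribute
    (a 0/1 array) from the other features. Accuracy far above the
    majority-class rate means dropping the attribute removed little."""
    clf = LogisticRegression(max_iter=1000)
    scores = cross_val_score(clf, X, protected_attr, cv=cv)
    majority_rate = np.bincount(protected_attr).max() / len(protected_attr)
    return scores.mean(), majority_rate

# Example usage (assumes X is a numeric feature matrix without the attribute):
# acc, baseline = proxy_strength(X, protected_attr)
# print(f"Proxy accuracy: {acc:.3f} vs. majority baseline {baseline:.3f}")
```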
Part 3: Privacy in the Age of Data
The Collapse of Anonymity
Data scientists often promise “anonymity”—removing names and identifiers before analysis. But true anonymization is nearly impossible.
Netflix Prize (2006): Netflix released 100 million movie ratings, anonymized by removing user names. Researchers at UT Austin showed they could re-identify users by cross-referencing with public IMDB ratings.
AOL Search Data (2006): AOL released 20 million search queries from 650,000 users, identified only by numbers. New York Times reporters identified “User 4417749” as a 62-year-old widow in Georgia by analyzing her searches.
Location Data: Studies show that just four spatio-temporal points (places and times) are enough to uniquely identify 95% of people in a dataset of 1.5 million.
The lesson: removing obvious identifiers is not enough. Our data is our fingerprint.
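A minimal sketch of how such linkage attacks work, using a handful of invented records: joining a “de-identified” release to a public dataset on a few quasi-identifiers (zip code, birth year, sex) is enough to re-attach names.

```python
# Toy linkage attack: re-identify an "anonymized" release by joining on
# quasi-identifiers. All records here are invented purely for illustration.
import pandas as pd

# "Anonymized" release: names removed, but quasi-identifiers kept
released = pd.DataFrame({
    'zip': ['02139', '02139', '94110'],
    'birth_year': [1961, 1985, 1990],
    'sex': ['F', 'M', 'F'],
    'diagnosis': ['condition_a', 'condition_b', 'condition_c'],
})

# Public auxiliary data (e.g., a voter roll) that does contain names
public = pd.DataFrame({
    'name': ['Alice Example', 'Carol Example'],
    'zip': ['02139', '94110'],
    'birth_year': [1961, 1990],
    'sex': ['F', 'F'],
})

# Joining on the quasi-identifiers re-attaches names to "anonymous" records
reidentified = released.merge(public, on=['zip', 'birth_year', 'sex'])
print(reidentified[['name', 'diagnosis']])
```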
Differential Privacy
Differential Privacy, developed by Cynthia Dwork and colleagues, offers a rigorous mathematical guarantee: the output of an analysis should be nearly the same whether or not any single individual is included in the dataset.
The key mechanism: add carefully calibrated random noise to query results. The noise is enough to mask any individual’s contribution while preserving overall statistical patterns.
Apple uses differential privacy for usage analytics. The US Census uses it for population data. Google uses it for Chrome usage statistics.
The trade-off: more privacy means more noise, which means less accuracy. Differential privacy formalizes this trade-off mathematically.
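A minimal sketch of the classic Laplace mechanism for a counting query: because any one person can change a count by at most 1 (sensitivity 1), adding Laplace noise with scale sensitivity/ε satisfies ε-differential privacy. The data below is synthetic and the function name is chosen only for illustration.

```python
# Differential privacy via the Laplace mechanism (minimal sketch).
# For a counting query, one person changes the count by at most 1
# (sensitivity = 1), so Laplace noise with scale sensitivity/epsilon
# gives epsilon-differential privacy.
import numpy as np

rng = np.random.default_rng(0)

def dp_count(values, epsilon, sensitivity=1.0):
    """Return a noisy count of True values satisfying epsilon-DP."""
    true_count = np.sum(values)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Example: how many people in a synthetic dataset have some property?
has_property = rng.random(10_000) < 0.3   # roughly 3,000 True values
for eps in [0.1, 1.0, 10.0]:
    print(f"epsilon={eps:>4}: noisy count = {dp_count(has_property, eps):.1f}")
# Smaller epsilon -> more noise -> more privacy, less accuracy.
```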
Surveillance and Power
Beyond individual privacy lies systemic surveillance:
Mass Surveillance: Government collection of communications, movements, associations
Corporate Surveillance: Tech companies tracking online behavior, purchases, locations
Workplace Surveillance: Employers monitoring productivity, communications, physical movement
Machine learning amplifies surveillance by making pattern detection automatic. What was once impossible at scale—identifying every face in a crowd, analyzing every phone call—is now trivial.
This creates power asymmetries. Those who control the data and algorithms hold enormous power over those who are observed. Data science can be a tool of liberation or oppression.
Part 4: Accountability and Transparency
The Black Box Problem
Deep learning models can have billions of parameters. Their decision processes are not easily explained—even by their creators. When a medical AI recommends treatment, or a criminal justice AI recommends detention, who understands why?
This opacity creates problems:
- Legal: Many jurisdictions require explanations for consequential decisions
- Trust: People reasonably distrust decisions they don’t understand
- Debugging: How do you fix a biased system you don’t understand?
- Accountability: Who is responsible when opaque systems fail?
Explainable AI (XAI)
Researchers have developed techniques to interpret black-box models:
Feature Importance: Which input features most influenced the prediction?
LIME (Local Interpretable Model-agnostic Explanations): Approximate the complex model locally with a simple, interpretable model.
SHAP (SHapley Additive exPlanations): Use game theory to attribute prediction contributions to features.
Attention Visualization: For transformers, show which parts of the input the model “attends” to.
Counterfactual Explanations: What minimal change to the input would change the prediction?
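As a concrete starting point, the sketch below uses scikit-learn’s permutation importance, one simple model-agnostic way to answer the feature-importance question for any fitted classifier (LIME and SHAP require their own libraries and are not shown). The dataset is a built-in scikit-learn example chosen only for illustration.

```python
# Model-agnostic feature importance via permutation: shuffle one feature at a
# time and measure how much held-out accuracy drops. A large drop means the
# model relies on that feature.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
top = result.importances_mean.argsort()[::-1][:5]
for i in top:
    print(f"{data.feature_names[i]:<25} drop in accuracy: "
          f"{result.importances_mean[i]:.4f} +/- {result.importances_std[i]:.4f}")
```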
The Limits of Explanation
But explanations have limits:
- Faithfulness: Does the explanation accurately reflect the model’s reasoning, or is it a post-hoc rationalization?
- Completeness: Can any explanation capture a model with billions of parameters?
- Comprehension: Can non-experts understand even “simple” explanations?
- Gaming: If explanations are public, can people manipulate inputs to exploit them?
Some researchers argue we should move from “explain this model” to “design models that are inherently interpretable”—even if that means accepting lower accuracy.
Part 5: AI Governance and Regulation
The Regulatory Landscape
Governments worldwide are developing frameworks for AI governance:
EU AI Act (2024): The first comprehensive AI regulation. Categorizes AI systems by risk level. “Unacceptable risk” (like social scoring) is banned. “High risk” (like hiring, credit) requires conformity assessments, documentation, and human oversight.
US Executive Order on AI (2023): Requires safety testing and transparency for powerful AI systems. Establishes standards for government AI use.
China’s AI Regulations: Requires algorithmic recommendations to offer opt-outs, mandates security reviews for generative AI.
GDPR (EU): Restricts decisions based solely on automated processing (Article 22), widely interpreted as implying a “right to explanation.”
Professional Ethics
Beyond law, data scientists have professional responsibilities:
ACM Code of Ethics: “Avoid harm,” “Be honest and trustworthy,” “Be fair,” “Respect privacy”
Data Science Oath: Various proposed oaths modeled on the Hippocratic Oath, emphasizing responsibility and humility
Organizational Standards: Companies like Google, Microsoft, and Meta have published AI ethics principles (with varying degrees of enforcement)
The Challenge of Self-Regulation
Industry self-regulation has a mixed record. Ethics boards have been disbanded (Google’s AI ethics board lasted one week). Principles have been ignored when they conflicted with profit. “Ethics washing”—publicly promoting ethics while continuing harmful practices—is common.
This suggests the need for external regulation, independent audits, and meaningful consequences for harm.
Part 6: Societal Impacts
Labor and Automation
Machine learning enables automation of tasks once thought to require human intelligence:
- Legal document review
- Medical diagnosis
- Customer service
- Content moderation
- Driving vehicles
This creates both opportunities and disruptions:
- Increased productivity and new products
- Job displacement and wage pressure
- Shifts in required skills
- Potential for either greater equality or greater inequality
Filter Bubbles and Polarization
Recommendation algorithms optimize for engagement—keeping you on the platform. This can create:
Filter Bubbles: Showing you content similar to what you’ve already seen, limiting exposure to diverse perspectives
Radicalization Pipelines: Recommendation systems that progressively suggest more extreme content because extreme content drives engagement
Misinformation Spread: Algorithms that amplify false but engaging content
Environmental Impact
Training large AI models consumes enormous energy:
- GPT-3 training: estimated 1,287 MWh, equivalent to ~500 tons of CO2
- Running inference at scale adds continuous energy demand
- Data centers require cooling, often using water
Responsible AI development must consider environmental sustainability.
Part 7: What Data Scientists Can Do
Building Ethical Practice
Before Building:
- Ask: Should this system exist? Who benefits? Who might be harmed?
- Consult affected communities
- Consider alternatives to automated systems
During Development:
- Audit training data for bias
- Test on diverse populations
- Document assumptions and limitations
- Include fairness metrics alongside accuracy
After Deployment:
- Monitor for disparate impact
- Create channels for feedback and complaints
- Maintain ability to correct or disable systems
- Accept responsibility for failures
The Importance of Diversity
Homogeneous teams build blind spots into systems. Diversity of background, experience, and perspective helps identify harms that might otherwise be missed.
The data science field has significant diversity problems—particularly in race, gender, and socioeconomic background. Addressing this is both an ethical imperative and a practical necessity for building better systems.
Resistance and Refusal
Sometimes the ethical choice is not to build. Data scientists have:
- Refused to work on military AI (Google employee protests over Project Maven)
- Leaked information about harmful systems
- Organized collectively for ethical practices
Technical workers have power. Using it responsibly sometimes means saying no.
DEEP DIVE: The COMPAS Controversy - When Algorithms Judge
The Algorithm in the Courtroom
Eric Loomis was arrested in La Crosse, Wisconsin, in 2013. He had been driving a car used in a drive-by shooting—he wasn’t the shooter, but he had a criminal history. At sentencing, the judge consulted a risk assessment score from a proprietary algorithm called COMPAS (Correctional Offender Management Profiling for Alternative Sanctions).
COMPAS, developed by Northpointe (now Equivant), analyzed 137 features about a defendant—criminal history, age, employment, education, housing stability, substance abuse, and more—and produced scores predicting the likelihood of recidivism (committing another crime) and violent recidivism.
Eric Loomis’s COMPAS scores were high. The judge sentenced him to six years in prison, explicitly citing the algorithm’s assessment.
Loomis appealed, arguing that the use of a secret proprietary algorithm violated his due process rights. In 2016, the Wisconsin Supreme Court ruled against him. Judges could use COMPAS as long as it wasn’t the sole factor in sentencing. The algorithm could influence how long someone spent in prison—but neither the defendant nor the court could inspect its workings.
ProPublica’s Investigation
In May 2016, ProPublica published “Machine Bias,” an investigation into COMPAS. Their methodology was straightforward:
- Obtain COMPAS scores for 7,000 defendants in Broward County, Florida
- Track who actually committed new crimes within two years
- Analyze errors across racial groups
Their findings were stark:
Among defendants who did not re-offend:
- Black defendants: 44.9% were labeled high risk (false positives)
- White defendants: 23.5% were labeled high risk
Among defendants who did re-offend:
- Black defendants: 28% were labeled low risk (false negatives)
- White defendants: 47.7% were labeled low risk
In summary: Black defendants who wouldn’t re-offend were nearly twice as likely to be wrongly labeled dangerous. White defendants who would re-offend were nearly twice as likely to be wrongly labeled safe.
Northpointe’s Response
Northpointe pushed back, arguing that their algorithm was fair by a different definition: predictive parity. Among defendants labeled high risk, similar percentages of Black and white defendants actually re-offended. The algorithm was equally accurate in both groups.
Both claims were true. Both definitions of fairness were valid. But they conflicted—and as the impossibility theorems proved, they could not both be satisfied when base rates differed.
The real question wasn’t which definition was “correct.” It was which definition mattered more in this context—and who got to decide.
The Debate Deepens
The COMPAS controversy sparked an explosion of research and debate:
Base Rates and Fairness: Black defendants did have higher re-offense rates in the data. But why? Historic discrimination, differential policing, economic inequality—the data reflected societal injustice. Using such data to predict outcomes can perpetuate the very inequalities it encodes.
What Does “Recidivism” Mean?: COMPAS predicted rearrest, not actual reoffending. Rearrest rates reflect policing patterns as much as behavior. Black communities are more heavily policed, meaning Black individuals are more likely to be rearrested for equivalent behavior.
Accuracy vs. Fairness: COMPAS was only moderately accurate (~65% overall, comparable in later studies to predictions made by untrained people). But accuracy doesn’t guarantee fairness. A system can be accurate on average while causing systematic harm to specific groups.
Transparency: COMPAS was proprietary. Defendants couldn’t challenge scores based on errors in the algorithm. Courts couldn’t audit it. This secrecy undermined due process.
The Human Alternative
What’s the alternative to algorithmic risk assessment? Often, it’s judicial intuition—which is also biased, inconsistent, and opaque. Studies suggest that judges are harsher before lunch, influenced by irrelevant information, and subject to racial bias.
The question isn’t “algorithms vs. humans” but rather: How do we design systems—whether algorithmic or human—that are as fair and accurate as possible?
Some jurisdictions have moved to simpler, transparent tools based on just a few objective factors. Others have abandoned risk assessment entirely for certain decisions. There is no consensus.
Lessons for Data Scientists
The COMPAS story teaches critical lessons:
- Fairness is not a technical property: Different definitions of fairness encode different values. Choosing among them is an ethical and political choice, not a mathematical one.
- Data encodes history: When historical data reflects discrimination, models trained on that data will learn discrimination. You cannot simply “remove bias” after the fact.
- Context matters: A risk score that might be acceptable for allocating social services could be unacceptable for determining prison sentences. The stakes matter.
- Transparency enables accountability: When algorithms are secret, they cannot be meaningfully challenged. Due process requires the ability to contest decisions.
- Deployment is not the end: Systems must be continuously monitored for disparate impact, not just validated once at development.
- Affected communities must have voice: Decisions about algorithmic fairness cannot be made only by developers. Those affected—defendants, communities—must have a say.
LECTURE PLAN: Ethics in Machine Learning - Power, Bias, and Responsibility
Learning Objectives
By the end of this lecture, students will be able to:
- Identify sources and types of algorithmic bias
- Explain multiple definitions of fairness and their trade-offs
- Analyze real-world case studies of algorithmic harm
- Apply ethical frameworks to data science decisions
- Propose interventions to make systems more fair
Lecture Structure (90 minutes)
Opening Hook (10 minutes)
The Sentencing Algorithm
- Present the Eric Loomis case
- Show the COMPAS controversy in headlines
- Ask: “Should an algorithm influence how long someone goes to prison?”
- Poll: Initial reactions
Part 1: Algorithmic Bias (18 minutes)
What is Bias? (5 minutes)
- Technical definition vs. social definition
- Historical, representation, measurement bias
- Key insight: bias isn’t just prejudice—it’s systematic error
Case Studies of Harm (8 minutes)
- Amazon hiring: trained on biased history
- Healthcare allocation: wrong proxy variable
- Facial recognition: performance disparities
- Credit scoring: protected classes and proxies
Where Bias Comes From (5 minutes)
- Training data reflecting historical discrimination
- Proxy variables and redlining
- Optimization objectives that ignore fairness
- Homogeneous development teams
Part 2: Defining Fairness (20 minutes)
The Challenge of Definition (5 minutes)
- Ask students: “What would make an algorithm fair?”
- Collect intuitions, show they conflict
- There is no single definition of fairness
Major Fairness Definitions (10 minutes)
- Demographic parity: equal rates across groups
- Equalized odds: equal error rates
- Predictive parity: equal accuracy among positives
- Individual fairness: similar treatment for similar individuals
- Mathematical formulation of each
The Impossibility Result (5 minutes)
- Present Chouldechova’s theorem
- When base rates differ, definitions conflict
- This means fairness requires choices, not just calculation
- Discuss implications: who decides?
Part 3: The COMPAS Deep Dive (15 minutes)
ProPublica’s Investigation (7 minutes)
- Methodology: obtain scores, track outcomes, analyze by race
- Findings: disparate false positive and false negative rates
- Visualize the data
Northpointe’s Response (5 minutes)
- Their fairness definition: predictive parity
- Why both claims were true
- The impossibility theorem in practice
Lessons (3 minutes)
- Context and stakes matter
- Transparency enables accountability
- Affected communities must have voice
Part 4: Privacy and Power (12 minutes)
The End of Anonymity (4 minutes)
- Netflix, AOL, location data re-identification
- Your data is your fingerprint
- Aggregation enables identification
Differential Privacy (4 minutes)
- The mathematical guarantee
- Adding noise to protect individuals
- Trade-off: privacy vs. accuracy
Surveillance and Power Asymmetries (4 minutes)
- Mass surveillance capabilities
- Corporate data collection
- Who watches whom?
Part 5: What Can We Do? (10 minutes)
Before, During, After (5 minutes)
- Before: Should this exist? Who is affected?
- During: Audit data, test on diverse groups, document
- After: Monitor, accept responsibility, enable feedback
Structural Changes (3 minutes)
- Diversity in development teams
- External audits and regulation
- Affected community involvement
The Power of Refusal (2 minutes)
- Sometimes the answer is no
- Examples of tech worker resistance
Wrap-Up (5 minutes)
- Return to opening: What have we learned?
- Fairness requires choices, not just code
- Data scientists have responsibility
- The technical is political
- Discussion questions for reflection
Materials Needed
- COMPAS data visualizations
- Case study slides with images
- Interactive fairness demo (showing trade-offs)
- Headlines and news clips
Discussion Questions
- If base rates differ between groups, which type of fairness should we prioritize? Who should decide?
- Should companies be required to publish algorithmic impact assessments?
- Is there a meaningful difference between human bias and algorithmic bias?
- What would need to change for you to trust an algorithm with sentencing decisions?
HANDS-ON EXERCISE: Auditing an Algorithm for Fairness
Overview
In this exercise, students will:
- Analyze a dataset for potential sources of bias
- Train a classification model
- Audit the model for disparate impact across groups
- Explore fairness-accuracy trade-offs
- Propose and test interventions
Prerequisites
- Python 3.8+
- Libraries: pandas, numpy, scikit-learn, matplotlib, seaborn, fairlearn
Setup
# Install required packages
# pip install pandas numpy scikit-learn matplotlib seaborn fairlearn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix,
classification_report)
import warnings
warnings.filterwarnings('ignore')
# Set style
plt.style.use('seaborn-v0_8-whitegrid')
Part 1: Loading and Exploring the Data (15 minutes)
# We'll use the Adult Income dataset (predicting >$50K income)
# This dataset has known fairness issues
from sklearn.datasets import fetch_openml
# Load dataset (in the OpenML version of "adult" the target column may be
# named 'class' rather than 'income', so normalize the name here)
adult = fetch_openml(name='adult', version=2, as_frame=True)
df = adult.frame.rename(columns={'class': 'income'})
print("Dataset shape:", df.shape)
print("\nColumns:", df.columns.tolist())
print("\nTarget distribution:")
print(df['income'].value_counts(normalize=True))
Task 1.1: Examine demographic distributions
# Examine key demographic features
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Race distribution
sns.countplot(data=df, x='race', ax=axes[0, 0])
axes[0, 0].set_title('Race Distribution')
axes[0, 0].tick_params(axis='x', rotation=45)
# Sex distribution
sns.countplot(data=df, x='sex', ax=axes[0, 1])
axes[0, 1].set_title('Sex Distribution')
# Income by race
pd.crosstab(df['race'], df['income'], normalize='index').plot(
kind='bar', ax=axes[1, 0]
)
axes[1, 0].set_title('Income by Race')
axes[1, 0].tick_params(axis='x', rotation=45)
# Income by sex
pd.crosstab(df['sex'], df['income'], normalize='index').plot(
kind='bar', ax=axes[1, 1]
)
axes[1, 1].set_title('Income by Sex')
plt.tight_layout()
plt.show()
# Calculate base rates
print("\nBase rate (>$50K) by group:")
print(df.groupby('sex')['income'].apply(lambda x: (x == '>50K').mean()))
print(df.groupby('race')['income'].apply(lambda x: (x == '>50K').mean()))
Part 2: Data Preparation (15 minutes)
# Prepare features
# We'll keep sex and race for fairness analysis but explore whether to use in model
# Select features
feature_cols = ['age', 'workclass', 'education', 'education-num',
'marital-status', 'occupation', 'relationship',
'capital-gain', 'capital-loss', 'hours-per-week',
'native-country']
# Create binary target
df['income_binary'] = (df['income'] == '>50K').astype(int)
# Store protected attributes
protected_race = df['race']
protected_sex = df['sex']
# Encode categorical features
df_model = df[feature_cols].copy()
for col in df_model.select_dtypes(include=['object', 'category']).columns:
df_model[col] = LabelEncoder().fit_transform(df_model[col].astype(str))
X = df_model.values
y = df['income_binary'].values
# Split data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Also split protected attributes (same random_state keeps rows aligned with X, y);
# keep the train portions too -- they are needed for fairness-aware training later
race_train, race_test, sex_train, sex_test = train_test_split(
    protected_race, protected_sex, test_size=0.2, random_state=42
)
print(f"Training set: {len(X_train)}")
print(f"Test set: {len(X_test)}")
# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
Part 3: Training and Evaluating a Model (15 minutes)
# Train a logistic regression model
model = LogisticRegression(max_iter=1000, random_state=42)
model.fit(X_train_scaled, y_train)
# Predictions
y_pred = model.predict(X_test_scaled)
y_prob = model.predict_proba(X_test_scaled)[:, 1]
# Overall accuracy
print("Overall Model Performance:")
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['<=50K', '>50K']))
Part 4: Fairness Audit (30 minutes)
def fairness_audit(y_true, y_pred, protected_attribute, attribute_name):
"""
Compute fairness metrics across groups.
"""
groups = protected_attribute.unique()
results = []
for group in groups:
mask = protected_attribute == group
if mask.sum() < 10: # Skip very small groups
continue
y_true_group = y_true[mask]
y_pred_group = y_pred[mask]
# Calculate metrics
n = mask.sum()
positive_rate = y_pred_group.mean() # Selection/prediction rate
accuracy = accuracy_score(y_true_group, y_pred_group)
# True positive rate (recall)
true_positives = ((y_pred_group == 1) & (y_true_group == 1)).sum()
actual_positives = (y_true_group == 1).sum()
tpr = true_positives / actual_positives if actual_positives > 0 else 0
# False positive rate
false_positives = ((y_pred_group == 1) & (y_true_group == 0)).sum()
actual_negatives = (y_true_group == 0).sum()
fpr = false_positives / actual_negatives if actual_negatives > 0 else 0
results.append({
'Group': group,
'N': n,
'Base Rate': y_true_group.mean(),
'Positive Pred Rate': positive_rate,
'Accuracy': accuracy,
'TPR': tpr,
'FPR': fpr
})
results_df = pd.DataFrame(results)
print(f"\nFairness Audit for {attribute_name}:")
print(results_df.round(4).to_string(index=False))
return results_df
# Audit by sex
sex_audit = fairness_audit(y_test, y_pred, sex_test, "Sex")
# Audit by race
race_audit = fairness_audit(y_test, y_pred, race_test, "Race")
Task 4.1: Calculate fairness ratios
def calculate_fairness_ratios(audit_df, baseline_group):
"""
Calculate disparity ratios relative to a baseline group.
"""
baseline = audit_df[audit_df['Group'] == baseline_group].iloc[0]
ratios = []
for _, row in audit_df.iterrows():
ratios.append({
'Group': row['Group'],
'Prediction Rate Ratio': row['Positive Pred Rate'] / baseline['Positive Pred Rate'],
'TPR Ratio': row['TPR'] / baseline['TPR'],
'FPR Ratio': row['FPR'] / baseline['FPR']
})
ratios_df = pd.DataFrame(ratios)
print("\nFairness Ratios (1.0 = parity):")
print(ratios_df.round(4).to_string(index=False))
# 80% rule (common legal threshold)
print("\n80% Rule Assessment (ratio should be >= 0.8):")
for _, row in ratios_df.iterrows():
pred_ratio = row['Prediction Rate Ratio']
status = "PASS" if pred_ratio >= 0.8 else "FAIL"
print(f" {row['Group']}: {pred_ratio:.4f} ({status})")
return ratios_df
# Calculate ratios
sex_ratios = calculate_fairness_ratios(sex_audit, 'Male')
race_ratios = calculate_fairness_ratios(race_audit, 'White')
Task 4.2: Visualize disparities
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Prediction rates by sex
ax = axes[0, 0]
sex_audit.plot(x='Group', y='Positive Pred Rate', kind='bar', ax=ax, legend=False)
ax.axhline(y=sex_audit['Positive Pred Rate'].mean(), color='red', linestyle='--',
label='Average')
ax.set_title('Positive Prediction Rate by Sex')
ax.set_ylabel('Rate')
ax.tick_params(axis='x', rotation=0)
# TPR and FPR by sex
ax = axes[0, 1]
x = np.arange(len(sex_audit))
width = 0.35
ax.bar(x - width/2, sex_audit['TPR'], width, label='TPR')
ax.bar(x + width/2, sex_audit['FPR'], width, label='FPR')
ax.set_xticks(x)
ax.set_xticklabels(sex_audit['Group'])
ax.legend()
ax.set_title('TPR and FPR by Sex')
# Prediction rates by race
ax = axes[1, 0]
race_audit.plot(x='Group', y='Positive Pred Rate', kind='bar', ax=ax, legend=False)
ax.set_title('Positive Prediction Rate by Race')
ax.set_ylabel('Rate')
ax.tick_params(axis='x', rotation=45)
# TPR and FPR by race
ax = axes[1, 1]
x = np.arange(len(race_audit))
ax.bar(x - width/2, race_audit['TPR'], width, label='TPR')
ax.bar(x + width/2, race_audit['FPR'], width, label='FPR')
ax.set_xticks(x)
ax.set_xticklabels(race_audit['Group'], rotation=45)
ax.legend()
ax.set_title('TPR and FPR by Race')
plt.tight_layout()
plt.show()
Part 5: Fairness Interventions (20 minutes)
# Using Fairlearn library for fairness-aware learning
from fairlearn.reductions import ExponentiatedGradient, DemographicParity
from fairlearn.metrics import MetricFrame, selection_rate
# Fit a fairness-constrained model
# This uses demographic parity constraint
constraint = DemographicParity()
mitigator = ExponentiatedGradient(
LogisticRegression(max_iter=1000),
constraints=constraint
)
# Provide the protected attribute for the *training* rows during fitting
mitigator.fit(X_train_scaled, y_train,
              sensitive_features=sex_train.reset_index(drop=True))
# Predictions from fair model
y_pred_fair = mitigator.predict(X_test_scaled)
# Compare
print("Original Model:")
print(f" Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f" Male selection rate: {y_pred[sex_test.reset_index(drop=True) == 'Male'].mean():.4f}")
print(f" Female selection rate: {y_pred[sex_test.reset_index(drop=True) == 'Female'].mean():.4f}")
print("\nFair Model (Demographic Parity):")
print(f" Accuracy: {accuracy_score(y_test, y_pred_fair):.4f}")
print(f" Male selection rate: {y_pred_fair[sex_test.reset_index(drop=True) == 'Male'].mean():.4f}")
print(f" Female selection rate: {y_pred_fair[sex_test.reset_index(drop=True) == 'Female'].mean():.4f}")
Task 5.1: Explore the fairness-accuracy trade-off
# Vary constraint strength and observe trade-off
# This is conceptual; actual implementation varies
# Post-processing: adjust thresholds per group
def threshold_adjustment(y_prob, protected, thresholds):
"""
Apply different thresholds to different groups.
"""
y_pred_adjusted = np.zeros(len(y_prob))
for group, threshold in thresholds.items():
mask = protected == group
y_pred_adjusted[mask] = (y_prob[mask] >= threshold).astype(int)
return y_pred_adjusted
# Find thresholds that equalize rates
from scipy.optimize import minimize_scalar
def find_equalizing_threshold(y_prob, protected, target_rate):
"""Find threshold that achieves target selection rate."""
results = {}
for group in protected.unique():
mask = protected == group
probs = y_prob[mask]
def diff(t):
return abs((probs >= t).mean() - target_rate)
result = minimize_scalar(diff, bounds=(0, 1), method='bounded')
results[group] = result.x
return results
# Target the overall selection rate
target = y_pred.mean()
thresholds = find_equalizing_threshold(y_prob, sex_test.reset_index(drop=True), target)
print(f"Thresholds to achieve {target:.4f} selection rate:")
print(thresholds)
y_pred_adjusted = threshold_adjustment(y_prob, sex_test.reset_index(drop=True), thresholds)
print(f"\nAdjusted selection rates:")
for group in ['Male', 'Female']:
mask = sex_test.reset_index(drop=True) == group
print(f" {group}: {y_pred_adjusted[mask].mean():.4f}")
Challenge Questions
-
Trade-offs: What was the accuracy cost of achieving demographic parity? Is this trade-off acceptable?
-
Which Definition?: If you had to choose between equalizing prediction rates and equalizing error rates, which would you choose for this application? Why?
-
Feature Decisions: We excluded sex from the model features but the model still shows gender disparities. Why? What should be done?
-
Historical Context: The training data reflects historical inequities. If we train on this data, we perpetuate those inequities. What alternatives exist?
-
Stakeholder Perspectives: How might different stakeholders (employers, job applicants, regulators) view these fairness trade-offs differently?
Expected Outputs
Students should submit:
- Exploratory analysis showing demographic distributions and base rates
- Fairness audit of baseline model with disparities quantified
- Implementation of at least one fairness intervention
- Comparison of original and fair models on accuracy and fairness metrics
- Written reflection on trade-offs and appropriate fairness definitions
Evaluation Rubric
| Criteria | Points |
|---|---|
| Data exploration and bias identification | 15 |
| Correct fairness metric calculation | 20 |
| Fairness audit visualization and interpretation | 20 |
| Intervention implementation | 20 |
| Trade-off analysis and reflection | 15 |
| Code quality and documentation | 10 |
| Total | 100 |
Recommended Resources
Books
Technical
- Fairness and Machine Learning by Barocas, Hardt, Narayanan - The comprehensive textbook (free online)
- Interpretable Machine Learning by Christoph Molnar - Free online, excellent coverage
- The Ethical Algorithm by Kearns and Roth - Accessible introduction
- Weapons of Math Destruction by Cathy O’Neil - Case studies of algorithmic harm
Philosophical and Social
- Algorithms of Oppression by Safiya Noble - Race and search algorithms
- Race After Technology by Ruha Benjamin - How technology perpetuates racism
- Automating Inequality by Virginia Eubanks - Algorithms and poverty
- The Alignment Problem by Brian Christian - AI safety and values
Academic Papers
- Angwin et al. (2016). “Machine Bias” - ProPublica’s COMPAS investigation
- Chouldechova (2017). “Fair Prediction with Disparate Impact” - The impossibility theorem
- Kleinberg et al. (2017). “Inherent Trade-Offs in the Fair Determination of Risk Scores”
- Buolamwini & Gebru (2018). “Gender Shades” - Facial recognition disparities
- Obermeyer et al. (2019). “Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations”
- Bender et al. (2021). “On the Dangers of Stochastic Parrots” - LLM risks
Video Lectures and Talks
- Joy Buolamwini: “How I’m fighting bias in algorithms” - TED Talk
- Cathy O’Neil: “The era of blind faith in big data must end” - TED Talk
- Kate Crawford: “The Trouble with Bias” - NeurIPS keynote
- Timnit Gebru: Various talks on AI ethics and racial justice
- Arvind Narayanan: “21 Fairness Definitions” - Tutorial
Tools and Libraries
- Fairlearn (https://fairlearn.org/) - Microsoft’s fairness toolkit
- AI Fairness 360 (https://aif360.mybluemix.net/) - IBM’s comprehensive toolkit
- What-If Tool (https://pair-code.github.io/what-if-tool/) - Google’s exploration tool
- SHAP (https://shap.readthedocs.io/) - Model explanations
- Aequitas (http://aequitas.dssg.io/) - Bias audit toolkit
Organizations and Resources
- Partnership on AI (https://partnershiponai.org/)
- AI Now Institute (https://ainowinstitute.org/)
- Algorithmic Justice League (https://www.ajl.org/)
- Data & Society (https://datasociety.net/)
- ACM FAccT Conference - Academic fairness, accountability, transparency
References
- Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). “Machine Bias.” ProPublica.
- Chouldechova, A. (2017). “Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments.” Big Data, 5(2), 153-163.
- Kleinberg, J., Mullainathan, S., & Raghavan, M. (2017). “Inherent Trade-Offs in the Fair Determination of Risk Scores.” Proceedings of ITCS.
- Obermeyer, Z., et al. (2019). “Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations.” Science, 366(6464), 447-453.
- Buolamwini, J., & Gebru, T. (2018). “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification.” Conference on Fairness, Accountability and Transparency.
- Dwork, C., et al. (2012). “Fairness Through Awareness.” Proceedings of ITCS.
- Barocas, S., Hardt, M., & Narayanan, A. (2019). Fairness and Machine Learning. fairmlbook.org.
- O’Neil, C. (2016). Weapons of Math Destruction. Crown Publishing.
- Benjamin, R. (2019). Race After Technology. Polity Press.
- Noble, S. U. (2018). Algorithms of Oppression. NYU Press.
- Eubanks, V. (2018). Automating Inequality. St. Martin’s Press.
- Dwork, C., & Roth, A. (2014). “The Algorithmic Foundations of Differential Privacy.” Foundations and Trends in Theoretical Computer Science, 9(3-4).
Module 12 confronts the ethical dimensions of machine learning—the biases, harms, and power dynamics that arise when algorithms make consequential decisions about human lives. Through the COMPAS controversy, we learn that fairness is not a technical property but a contested value, and that data scientists bear responsibility for the systems they build.