CS 4910 - Special Topics in Computer Science: Algorithm Audits
Machine Learning and Artificial Intelligence increasingly drive decisions that influence our daily lives, from advertising to school admissions, hiring, and healthcare. Promises of increased efficiency have come at the cost of transparency, making it difficult to answer some fundamental questions: Are these systems fair and just? Are they less biased and discriminatory than humans, or do they perpetuate societal inequities? Are they being used to entrench sexist and/or racist policies? Do they work equally well for different groups of users? Do they amplify harmful content or confine us in filter bubbles?
What are algorithm audits?
An audit is a tool for investigating whether a complex system behaves and performs according to some standard. An algorithm audit examines a Machine Learning/Artificial Intelligence product to measure the extent to which certain harms are occurring. For example, one can audit a personalized advertising system to verify whether it perpetuates housing segregation by showing different housing opportunities to people of different races.
There are many approaches to algorithm audits, but in this course we will focus on external audits. We will use them to identify problems in deployed ML/AI products without any insider access and without requiring the audited company's permission. See an example of an external algorithm audit done recently by the instructors of this course:
To get a more concrete feel for the programming aspect of this class, feel free to explore this Jupyter Notebook. You can run this code to partially replicate an audit of face detection/classification products, which showed that the commercial algorithms perform significantly worse for Black women than they do for white men.
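As a rough illustration of the kind of per-group comparison such an audit involves, the sketch below computes an error rate for each demographic group. It is not taken from the notebook: the function name `error_rate_by_group`, the group names, and the records are all hypothetical and synthetic, chosen only to show the shape of the calculation.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Return {group: fraction of records where the prediction differs from the label}.

    `records` is an iterable of (group, true_label, predicted_label) tuples.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, label, prediction in records:
        totals[group] += 1
        if prediction != label:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Synthetic toy records (group, true label, predicted label).
# These are made up for illustration and are NOT results from any real audit.
toy_records = [
    ("group_a", "face", "face"),
    ("group_a", "face", "face"),
    ("group_a", "face", "no_face"),
    ("group_a", "face", "face"),
    ("group_b", "face", "no_face"),
    ("group_b", "face", "face"),
    ("group_b", "face", "no_face"),
    ("group_b", "face", "face"),
]

rates = error_rate_by_group(toy_records)
for group, rate in sorted(rates.items()):
    print(f"{group}: error rate {rate:.2f}")
```

In a real audit the records would come from querying the audited product on a labeled benchmark, and a gap between groups would still need to be checked for statistical significance before drawing conclusions.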
Learning objectives
In this course you will learn how to design and implement Algorithm Audits that investigate potential harms of online services. Performing audits requires multiple skills which we will learn during the course:
- Experiment design:
Designing audits that measure the effects of interest and control for sources of noise
Minimizing potential harms of audits to all stakeholders
Legal bounds of algorithm audits
Beyond audits: potential harms that cannot be measured through audits
- Technical skills:
Collecting web data
Reverse engineering web APIs
Simulating user activity through headless and headed automated browsing
- Analytical skills:
Machine Learning performance and error analysis
Rudimentary statistics (significance tests, regression, ranked-list comparisons)
Data visualization
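To give a flavor of the significance tests listed among the analytical skills, here is a minimal sketch of a two-proportion z-test, which asks whether two groups' observed rates (for example, how often an ad was shown to each group) differ by more than chance would explain. The function name and the counts are hypothetical; the test itself is the standard pooled two-sided z-test and uses only the Python standard library. It is one possible test, not necessarily the one used in the course.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided pooled z-test for the difference between two proportions.

    Returns (z, p_value). Assumes samples large enough for the normal
    approximation to be reasonable.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: the outcome occurred 60 times out of 100 trials
# for group A and 40 times out of 100 trials for group B.
z, p_value = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely under the assumption of equal rates, but it says nothing about why the rates differ; controlling for noise sources is the job of the experiment design.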
Course content
In addition to lectures, there will be:
In-class practical exercises (later submitted as homework)
Guest lectures and Q&As by algorithm audit practitioners
Discussions on published auditing work
Final group project during which each team will design and perform an audit
Prerequisites
A good working knowledge of Python is required (at least DS3000 / CS3500), as well as a strong interest in algorithm audits. Some understanding of ML/AI, algorithms, and data structures will be helpful but is not strictly required.
Main instructor
The course will be taught by me, Dr. Piotr Sapiezynski. I am an Associate Research Scientist and I specialize in research on algorithm audits. I strongly believe in the importance of sharing skills, findings, and ideas with audiences outside of academic conferences: I've taught DS2000 and DS3000, I've presented to Congress, and I've collaborated with journalists and non-governmental organizations. The class will also feature guest lectures by incredible researchers and practitioners, and I hope they inspire you as much as they inspire me.