Piotr Sapiezynski - algorithm auditing, fairness, transparency

2024

On the Use of Proxies in Political Advertising

Piotr Sapiezynski*, Levi Kaplan*, Alan Mislove, Aleksandra Korolova. CSCW, 2024

During the 2016 U.S. Presidential campaign, Russian intelligence ran Facebook ads with conflicting messages to sow discord and discourage Democratic participation. To limit advertisers' ability to micro-target audiences, Facebook removed targeting criteria that explicitly referred to race, gender, and other sensitive attributes of users. In this paper we show that major advertisers in the U.S. get around this limitation by using proxy attributes. Our main contribution is a method for objectively measuring how effective different criteria are, without relying on their names.
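To make "without relying on their names" concrete: a criterion can be judged by whom it actually reaches. The sketch below is a hypothetical illustration, not the paper's method. It assumes you already know how many members of a group, and how many people outside it, a criterion reaches (for example, from audience-size estimates), and it scores the criterion by the ratio of the two reach rates. The function name, the metric, and the counts are all made up.

```python
def proxy_skew(reached_in_group, group_size, reached_outside, outside_size):
    """Ratio of a criterion's reach rate inside a group to its rate outside it.

    Hypothetical metric for illustration: 1.0 means the criterion is blind
    to group membership; values far from 1.0 mean it behaves like a proxy.
    """
    if group_size <= 0 or outside_size <= 0:
        raise ValueError("group sizes must be positive")
    rate_in = reached_in_group / group_size
    rate_out = reached_outside / outside_size
    if rate_out == 0:
        return float("inf")
    return rate_in / rate_out


# Made-up counts: (reached in group, group size, reached outside, outside size).
criteria = {
    "criterion A": (3000, 10000, 1000, 10000),
    "criterion B": (1200, 10000, 1100, 10000),
}
for name, counts in criteria.items():
    print(f"{name}: skew = {proxy_skew(*counts):.2f}")
```

Whatever a criterion is called, a skew far from 1.0 (criterion A here) indicates it selects for the group.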

PDF | BibTeX | Short write-up
Fairness in Online Ad Delivery

Joachim Baumann, Piotr Sapiezynski, Christoph Heitz, Anikó Hannák. FAccT, 2024

In a departure from my usual type of work, this paper is entirely based on simulations. In a not-a-departure from my usual work, we simulate an online ad delivery algorithm and measure the influence of enforcing different fairness constraints. We show that achieving statistical parity would often come at a much higher cost than enforcing predictive parity or equality of opportunity. We stress that it should be the platforms that cover this cost, not advertisers or users, as would happen by default.
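The three fairness constraints differ in which per-group rate they equalize. This minimal sketch uses the standard textbook definitions on made-up binary labels; it is not the simulation from the paper, and the function names are hypothetical. A gap of 0 means the corresponding constraint holds exactly.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group rates behind three common fairness criteria (labels are 0/1).

    selection_rate = P(pred=1 | group)       -> statistical parity
    precision      = P(y=1 | pred=1, group)  -> predictive parity
    tpr            = P(pred=1 | y=1, group)  -> equality of opportunity
    """
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "true_pos": 0, "tp": 0})
    for y, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += p
        c["true_pos"] += y
        c["tp"] += y * p  # 1 only when both label and prediction are positive

    def ratio(a, b):
        return a / b if b else float("nan")

    return {
        g: {
            "selection_rate": ratio(c["pred_pos"], c["n"]),
            "precision": ratio(c["tp"], c["pred_pos"]),
            "tpr": ratio(c["tp"], c["true_pos"]),
        }
        for g, c in counts.items()
    }


def parity_gaps(rates):
    """Largest between-group difference for each metric."""
    metrics = next(iter(rates.values())).keys()
    return {
        m: max(r[m] for r in rates.values()) - min(r[m] for r in rates.values())
        for m in metrics
    }


# Made-up data: both groups have two truly positive cases,
# but the model selects group "a" more often.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, groups)
print(parity_gaps(rates))
```

In this toy example none of the gaps is zero, and closing one of them would generally not close the others. The sketch does not handle groups with no positive predictions or labels, where precision or TPR becomes NaN.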
PDF | BibTeX

2023
Problematic Advertising and its Disparate Exposure on Facebook

Muhammad Ali, Angelica Goetzen, Alan Mislove, Elissa M Redmiles, Piotr Sapiezynski. USENIX Security, 2023.

Online ads can be a vehicle through which malicious actors disseminate problematic content, such as scams or clickbait. Ad delivery algorithms may even be helping such advertisers find vulnerable populations. In this paper, we study Facebook (again!) and investigate key gaps in our understanding of problematic online advertising: (a) What categories of ads do people find problematic? (b) Are there disparities in the distribution of problematic ads to viewers? And if so, (c) who is responsible: advertisers or advertising platforms? Based on ad data donated by a representative panel of users, we find that older people and minority groups are especially likely to be shown such ads. Further, given that 22% of problematic ads had no specific targeting from advertisers but were still shown more to vulnerable users, we infer that ad delivery algorithms (advertising platforms themselves) played a significant role in the biased distribution of these ads.

PDF | BibTeX
Detrimental network effects in privacy: A graph-theoretic model for node-based intrusions

Florimond Houssiau, Piotr Sapieżyński, Laura Radaelli, Erez Shmueli, Yves-Alexandre de Montjoye. Cell - Patterns, 2023.

Proportionality (ensuring that the data collected are relevant for the purposes of the processing) is a key tenet of modern data protection laws, such as the EU General Data Protection Regulation. Evaluating proportionality when "small-scale" data are collected can already be difficult. This only becomes harder when entering the realm of "big data" and, in particular, (big) networked data. Indeed, a lot of data collected today intrinsically relate to more than one person. This includes social network data, messaging data, and close proximity data. From a data protection perspective, this means that even though data about only a handful of people are collected, information about many more people might be included in the dataset. This is what happened with Cambridge Analytica: through 270,000 accounts they collected data about 68.0M people. So far, we have not had a tool to estimate the number of people affected by networked data collection. In this paper, we propose and validate such a tool.
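The core of the problem is that one collected account also exposes its contacts. The sketch below only counts one-hop exposure on a made-up graph; the model in the paper is more involved, and the function name and network here are hypothetical.

```python
def affected_people(contacts, collected):
    """Everyone whose data ends up in a dataset built from `collected` accounts.

    `contacts` maps each person to the set of people they are connected to.
    Hypothetical model: collecting an account also captures all of its
    direct contacts (one hop), and nothing further.
    """
    affected = set(collected)
    for person in collected:
        affected |= contacts.get(person, set())
    return affected


# A made-up six-person network.
contacts = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "eve"},
    "cat": {"ann"},
    "dan": {"ann", "eve", "fay"},
    "eve": {"bob", "dan"},
    "fay": {"dan"},
}
print(len(affected_people(contacts, {"ann"})))         # 4: one account, four people
print(len(affected_people(contacts, {"ann", "eve"})))  # 5: two accounts, five people
```

By the Cambridge Analytica figures above, 68.0M people from 270,000 accounts works out to roughly 250 people per collected account.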

PDF | BibTeX

2022
Measurement and Analysis of Implied Identity in Ad Delivery Optimization

Levi Kaplan, Nicole Gerzon, Alan Mislove, Piotr Sapiezynski. Internet Measurement Conference (IMC), 2022. Best Long Paper Award.

Facebook's algorithms classify the race, gender, and age of people in ad images and make delivery decisions based on these predictions. Images of Black folks are shown more to Black users; images of children are delivered more to women; images of young women reach older men. The results hold for stock images of real people, for StyleGAN-generated images where we control for confounding factors, and for realistic job ads (the same job will end up reaching different people depending on the demographics implied by the image). Even if such delivery decisions reflect population-level interests, they can contribute to further stereotyping and could run counter to advertisers' intentions of promoting gender diversity.

PDF | BibTeX | Data | press coverage
Transparency and Targeting of Political Advertising: Public Hearing before the European Parliament's Committee on Internal Market and Consumer Protection (IMCO)

Piotr Sapiezynski, oral testimony transcript. July 11, 2022.

On July 11, 2022, I appeared before the European Parliament's Committee on Internal Market and Consumer Protection (IMCO) to talk about our research on political ad delivery in the context of newly proposed legislation on transparency and targeting of political ads. As you can see, the ad delivery problem is not mentioned in the name of the legislation, so I took the opportunity to argue for the importance of recognizing the distinct roles that advertisers and platforms play and holding both accountable.

PDF | BibTeX | Video
All Things Unequal: Measuring Disparity of Potentially Harmful Ads on Facebook

Muhammad Ali, Angelica Goetzen, Piotr Sapiezynski, Elissa Redmiles, Alan Mislove. Proceedings of the Workshop on Technology and Consumer Protection (ConPro'22), 2022.

This paper is a significant departure from how we had been doing Facebook research. Here we are no longer advertisers running ads and analyzing their delivery. Instead, we hired a diverse panel of Facebook users and collected the ads that they see, along with the targeting criteria that were used to reach them. We show that potentially harmful ads are unevenly distributed among users: a small fraction of users get a diet that disproportionately features potentially harmful content.

PDF | BibTeX
Algorithms that "Don't See Color": Comparing Biases in Lookalike and Special Ad Audiences

Piotr Sapiezynski, Avijit Ghosh, Levi Kaplan, Aaron Rieke, Alan Mislove. AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, AIES 2022

Following accusations of allowing discrimination on its ad platform, Facebook settled with civil rights groups and agreed to introduce a number of changes to the platform. Among them was a tool called Special Ad Audiences, which allows advertisers to reach users "similar" to their customers (or employees) but without considering age, gender, race, etc. In this paper we show that simply not looking at these protected attributes doesn't remove the bias: the created audiences have nearly the same level of bias as the source audience. Following the revelations in the paper, Meta was sued by the Department of Justice, had to remove the Special Ad Audiences functionality altogether, and was handed the highest possible fine for housing discrimination.

PDF | BibTeX | press coverage
From Home Base to Swing States: The Evolution of Digital Advertising Strategies during the 2020 US Presidential Primary

NaLette Brodnax, Piotr Sapiezynski. Political Research Quarterly, 2022

We analyze the advertising strategies of US presidential election campaigns during the 2020 primary cycle. We show that campaigns employed a new strategy of targeting voters in candidates' home states during the "invisible primary": home state targeting is a key strategy for all campaigns, rather than just for politicians with existing political and financial networks. We also find that as the first wave of state caucuses and primary elections approaches, campaigns shift digital ad expenditures to states with early primaries such as Iowa and New Hampshire and, to a lesser extent, to swing states.

PDF | BibTeX

2021
Ad Delivery Algorithms: The Hidden Arbiters of Political Messaging

Muhammad Ali*, Piotr Sapiezynski*, Aleksandra Korolova, Alan Mislove, Aaron Rieke. Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 2021

Political speech is paid, not free. On Facebook, it also costs different amounts to advertise different political opinions to the same people. Showing liberal ads to conservatives (or conservative ads to liberals) can cost three times more than showing an "aligned" ad to the same audience. Further, when a political advertiser tries to show their ad to a broad audience, Facebook will show it predominantly to people who already agree with the message.

PDF | BibTeX | press coverage
Emergence of network effects and predictability in the judicial system

Enys Mones, Piotr Sapieżyński, Simon Thordal, Henrik Palmer Olsen, Sune Lehmann. Nature - Scientific Reports, 2021

We analyze the patterns of citations between cases in the Court of Justice of the European Union using network science methods. We show that over time the complex network of citations evolves in a way which improves our ability to predict new citations. Investigating the factors which enable prediction over time, we find that the content of the case documents plays a decreasing role, whereas both the predictive power and significance of the citation network structure itself show a consistent increase over time. Finally, our analysis enables us to validate existing citations and recommend potential citations for future cases within the court.

PDF | BibTeX

2020
The Fallibility of Contact Tracing Apps

Piotr Sapiezynski, Johanna Pruessing, and Vedran Sekara. arXiv preprint, 2020

As corporations, academics, governments, and civil society discuss the right way to implement contact tracing apps, we noticed recurring implicit assumptions. The proposed solutions are designed for a world where Internet access and smartphone ownership are a given, people are willing and able to install these apps, and those who receive notifications about potential exposure to the virus have access to testing and can isolate safely. In this work we challenge these assumptions. We warn about the potential consequences of over-extending the existing state and corporate surveillance powers and describe a multitude of scenarios where contact tracing apps will not help regardless of access or policy. We call for a comprehensive and equitable policy response that prioritizes the needs of the most vulnerable, protects human rights, and considers long term impact instead of focusing on technology-first fixes.

PDF | BibTeX
Inferring transportation mode from smartphone sensors: Evaluating the potential of Wi-Fi and Bluetooth

Andreas Bjerre-Nielsen, Kelton Minor, Piotr Sapieżyński, Sune Lehmann, David Dreyer Lassen. PLOS ONE, 2020

We show that using information from pervasive Wi-Fi access points and Bluetooth devices can enhance GPS and geographic information to improve transportation detection on smartphones. Wi-Fi information also improves the identification of transportation mode and helps conserve battery since it is already collected by most mobile phones.

PDF | BibTeX

2019
Interaction data from the Copenhagen Networks Study

Piotr Sapiezynski, Arkadiusz Stopczynski, David Dreyer Lassen, Sune Lehmann. Nature Scientific Data, 2019

We released the multi-layer temporal network connecting more than 700 students over a period of four weeks. The dataset was collected via smartphones as part of the Copenhagen Networks Study, and it includes physical proximity, metadata for calls and text messages, a static Facebook friendship graph, and gender information. My collaborators and I have already published multiple papers based on the data; now we're happy to finally share it with the rest of the scientific community!

PDF | BibTeX | data descriptor
Discrimination through Optimization: How Facebook's Ad Delivery Can Lead to Biased Outcomes

Muhammad Ali*, Piotr Sapiezynski*, Miranda Bogen, Aleksandra Korolova, Alan Mislove, Aaron Rieke. CSCW 2019

Most research on discriminatory advertising concerned the abuse of targeting features: excluding Black and Latino people from seeing housing ads, excluding older workers from job ads, etc. In this work, we showed that even if advertisers want to show their ads to a diverse audience, Facebook will preferentially present them to users whom Facebook predicts to be more interested. As a result, women see different job ads than men (supermarket cashiers and janitors vs. AI specialists and lumberjacks), while white and Black people are presented with different housing opportunities.

PDF | BibTeX | press coverage
Auditing Offline Data Brokers via Facebook's Advertising Platform

Giridhari Venkatadri, Piotr Sapiezynski, Elissa M Redmiles, Alan Mislove, Oana Goga, Michelle Mazurek, Krishna P Gummadi. WWW 2019

Facebook knows more than just the information you share in your profile and your browsing history. We find that, on top of that, Facebook has been buying information about more than 90% of its US users from data brokers. At least 40% of this information (including financial information) is not at all accurate, potentially affecting not just the ads you see but also credit decisions, etc.

PDF | BibTeX
Quantifying the Impact of User Attention on Fair Group Representation in Ranked Lists

Piotr Sapiezynski, Wesley Zeng, Ronald E Robertson, Alan Mislove, Christo Wilson. Companion Proceedings of WWW 2019

We interact with ranked lists every day through web search results, job postings, or dating services. Arguably, a fair representation of a group (for example, women among job applicants) requires that this group receives enough attention as a whole. That attention depends both on where its members are in the ranking and on how much of that ranking is actually seen. In this paper we model the interplay between these two factors.
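One way to see the interplay of rank and browsing depth is to give each position an attention weight that decays with rank and drops to zero below the point where users stop reading. The geometric decay, the fixed depth, and the function name are assumptions for illustration, not the model from the paper.

```python
def attention_share(ranking, group, decay=0.7, depth=5):
    """Share of all user attention that lands on members of `group`.

    Hypothetical attention model: the item at 0-based position i gets weight
    decay**i, and users stop reading after `depth` items, so anything ranked
    lower gets no attention at all.
    """
    weights = [decay ** i if i < depth else 0.0 for i in range(len(ranking))]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w for w, item in zip(weights, ranking) if item in group) / total


# Made-up ranking of job applicants: women (w*) are half the list but ranked lower.
ranking = ["m1", "m2", "w1", "m3", "w2", "w3"]
women = {"w1", "w2", "w3"}

print(attention_share(ranking, women, decay=1.0, depth=6))  # everyone reads everything
print(attention_share(ranking, women, decay=0.5, depth=3))  # steep decay, shallow reading
```

With uniform attention over the whole list, women in this example get half the attention; with decay 0.5 and depth 3 they get about 14%, even though the composition of the list is unchanged.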

PDF | BibTeX
Investigating sources of PII used in Facebook’s targeted advertising

Giridhari Venkatadri, Elena Lucherini, Piotr Sapiezynski, Alan Mislove. Proceedings on Privacy Enhancing Technologies, 2019

Facebook nudged its users to enable two-factor authentication by providing their phone numbers "for additional security". In turn, advertisers can now use these phone numbers to target those users with ads. Even if users didn't enable 2FA but went with the default option of using FB Messenger for text messaging, their phone number is now targetable. Worst of all, even if you never gave your phone number to Facebook for any reason, but any one of your friends had your phone number in their phone book and allowed Facebook access to it, your phone number is now targetable.

PDF | BibTeX | press coverage

2018
Academic performance and behavioral patterns

Valentin Kassarnig, Enys Mones, Andreas Bjerre-Nielsen, Piotr Sapiezynski, David Dreyer Lassen, Sune Lehmann. EPJ Data Science, 2018

Based on data collected from smartphones and Facebook, we find that for a large share of students, the academic performance of their friends is more predictive of their own performance than attendance is (but not for all of them; see our other paper). Showing up for classes consistently and not playing with phones during lectures are still the most predictive individual features.

PDF | BibTeX
Evidence for a conserved quantity in human mobility

Laura Alessandretti, Piotr Sapiezynski, Vedran Sekara, Sune Lehmann, Andrea Baronchelli. Nature Human Behaviour, 2018

Humans have been shown to have a fixed maximum capacity for the number of people they can maintain active acquaintanceships with (because of cognitive, not time, constraints), known as the "Dunbar number". In this work we show that such a capacity also exists for the number of actively visited physical locations: for example, when you find a new restaurant, you tend to stop going to one of your previous favorites.

PDF | BibTeX | press coverage

2017
Evidence of Complex Contagion of Information in Social Media: An Experiment Using Twitter Bots

Bjarke Mønsted, Piotr Sapieżyński, Emilio Ferrara, Sune Lehmann. PLOS One, 2017

The spread of diseases follows a simple contagion model: every time you're exposed to a virus or bacterium, there's a certain probability of getting sick. It has been hypothesised that the spread of information and trends follows a complex contagion model, in which you need exposure from multiple sources to pick it up. Using a coordinated group of Twitter bots, we disseminated positive messages to real people and showed that they are more likely to retweet our content if they're exposed to it from multiple sources than if they're exposed to it multiple times from the same source.
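The difference between the two models can be written as two adoption rules. The per-exposure probability p, the threshold of two distinct sources, and the rule that repeat exposures from an already-counted source add nothing are hypothetical choices for illustration; the experiment itself measured retweets, not these formulas.

```python
def p_simple(n_exposures, p=0.1):
    """Simple contagion: every exposure independently 'infects' with probability p."""
    return 1 - (1 - p) ** n_exposures


def p_complex(sources, threshold=2, p=0.1):
    """Toy complex contagion (hypothetical rule): nothing happens until the
    person has seen the content from at least `threshold` distinct sources,
    and repeat exposures from an already-counted source add nothing."""
    distinct = len(set(sources))
    return p_simple(distinct, p) if distinct >= threshold else 0.0


same_source = ["bot1", "bot1", "bot1"]
many_sources = ["bot1", "bot2", "bot3"]

# Under simple contagion only the number of exposures matters...
print(p_simple(len(same_source)), p_simple(len(many_sources)))
# ...under complex contagion the number of distinct sources does.
print(p_complex(same_source), p_complex(many_sources))
```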

PDF | BibTeX | press coverage
Inferring person-to-person proximity using WiFi signals

Piotr Sapiezynski, Arkadiusz Stopczynski, David Kofoed Wind, Jure Leskovec, Sune Lehmann. ACM Interactive, Mobile, Wearable, and Ubiquitous Technologies, 2017

We find it's possible to reliably infer whether two people are in close proximity by comparing which WiFi routers their phones see. At the time of writing, 80% of Android apps had access to the list of nearby WiFi routers at all times, posing a massive privacy risk.
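A simple baseline for comparing which WiFi routers two phones see is the Jaccard similarity of the sets of access points in simultaneous scans. The threshold, the placeholder BSSIDs, and the function names are assumptions; the paper's method may rely on richer signals such as signal strength, so treat this only as the core idea.

```python
def jaccard(a, b):
    """Overlap of two sets of access points: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty scans carry no evidence of co-location
    return len(a & b) / len(a | b)


def likely_colocated(scan_a, scan_b, threshold=0.5):
    """Hypothetical rule: call two phones co-located if their simultaneous
    scans share at least `threshold` of the access points seen by either."""
    return jaccard(scan_a, scan_b) >= threshold


# Placeholder BSSIDs from three phones scanning at the same moment.
phone_a = {"ap1", "ap2", "ap3", "ap4"}
phone_b = {"ap2", "ap3", "ap4", "ap5"}
phone_c = {"ap4", "ap9"}

print(jaccard(phone_a, phone_b), likely_colocated(phone_a, phone_b))  # 0.6 True
print(jaccard(phone_a, phone_c), likely_colocated(phone_a, phone_c))  # 0.2 False
```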

PDF | BibTeX
Academic performance prediction in a gender-imbalanced environment

Piotr Sapiezynski, Valentin Kassarnig, Christo Wilson, Sune Lehmann, Alan Mislove. FATREC Workshop on Responsible Recommendation at RecSys, 2017

Our other paper on predicting academic performance from individual behavior and social networks shows that social ties are predictive of one's grades. In this paper we show that this is mostly the case for men (the majority in the dataset), but not for women (the minority). A typical machine learning algorithm by default optimizes for overall performance, and as a result women get worse predictions than men. We suggest achieving parity by selecting combinations of features that lead to more balanced performance.

PDF | BibTeX
The Role of Gender in Social Network Organization

Ioanna Psylla, Piotr Sapiezynski, Enys Mones, Sune Lehmann. PLOS ONE, 2017

We observe population level differences between men and women in the Copenhagen Networks Study, especially with respect to their social networks: women are much more likely to be friends mostly with other women and, as a minority, are on the periphery of the university network.

PDF | BibTeX
Multi-scale spatio-temporal analysis of human mobility

Laura Alessandretti, Piotr Sapiezynski, Sune Lehmann, Andrea Baronchelli. PLOS ONE, 2017

We show that the distributions of distances and waiting times between consecutive locations in human mobility traces are best described by log-normal and gamma distributions, respectively, and that natural time scales emerge from the regularity of human mobility.
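Fitting the two candidate distributions can be sketched with the Python standard library alone: a maximum-likelihood fit for the log-normal and a method-of-moments fit for the gamma. The data below are synthetic samples, not mobility traces, and the fitting choices are assumptions; the paper's estimation and model-selection procedure may differ. Deciding which family describes the data "best" would additionally need a comparison such as log-likelihoods, which this sketch leaves out.

```python
import math
import random
import statistics


def fit_lognormal(xs):
    """Maximum-likelihood log-normal fit: (mu, sigma) are the mean and the
    population standard deviation of log(x)."""
    logs = [math.log(x) for x in xs]
    return statistics.fmean(logs), statistics.pstdev(logs)


def fit_gamma(xs):
    """Method-of-moments gamma fit: mean = k*theta and var = k*theta**2,
    so shape k = mean**2 / var and scale theta = var / mean."""
    m = statistics.fmean(xs)
    v = statistics.pvariance(xs)
    return m * m / v, v / m


# Synthetic samples standing in for real distances and waiting times.
rng = random.Random(42)
distances = [rng.lognormvariate(0.5, 1.0) for _ in range(20_000)]
waiting_times = [rng.gammavariate(2.0, 3.0) for _ in range(20_000)]  # shape 2, scale 3

print("log-normal (mu, sigma):", fit_lognormal(distances))
print("gamma (shape, scale):", fit_gamma(waiting_times))
```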

PDF | BibTeX

2016
Inferring Stop-Locations from WiFi

David Kofoed Wind, Piotr Sapiezynski, Magdalena Furman, Sune Lehmann. PLOS ONE, 2016

Your smartphone scans for WiFi every couple of seconds, usually even if you disable WiFi. In this paper we show that this data reveals clearly, second by second, whether you're stationary or in motion. While it sounds simplistic, stop-location detection is the basis for many location-related analyses, such as extracting points of interest, transportation modes, and schedules.

PDF | BibTeX

2015
Temporal Fidelity in Dynamic Social Networks

Arkadiusz Stopczynski, Piotr Sapiezynski, Alex "Sandy" Pentland, Sune Lehmann. The European Physical Journal B, 2015

We are moving towards using smartphones to trace the proximity events that drive epidemic spreading. In this work we show that very low-level decisions about how often to detect contacts and how to process the data have an immense impact on the results of epidemic modeling. We use real Bluetooth data about proximity among 500 people.

PDF | BibTeX
Tracking Human Mobility using WiFi signals

Piotr Sapiezynski, Arkadiusz Stopczynski, Radu Gatej, Sune Lehmann. PLOS One, 2015

We find that time series of WiFi scans contain a strong latent location signal. Because humans are very stable and repetitive in their mobility, each person spends the vast majority of their time next to just a few routers; knowing the locations of these few, we can infer a person's location during most of the day. These results reveal a great opportunity for using ubiquitous WiFi routers for high-resolution outdoor positioning, but also the significant privacy implications of such side-channel location tracking.

PDF | BibTeX | press coverage
Opportunities and Challenges in Crowdsourced Wardriving

Piotr Sapiezynski, Radu Gatej, Alan Mislove, Sune Lehmann. Internet Measurement Conference, 2015

Estimates of a phone's physical location are usually obtained using the Global Positioning System (GPS), or calculated based on the proximity of WiFi access points with known locations. Most of the research regarding the creation of databases that hold the locations of WiFi routers was based on data collected both artificially and over short periods of time (e.g., during a one-day drive around a city). In contrast, most in-use databases are collected by mobile devices automatically and are maintained by large mobile OS providers. We address this gap using a deployment of over 800 mobile devices to real users over a 1.5-year period. We identify a number of challenges in using such data to build a WiFi localization database (e.g., mobility of access points) and introduce techniques to mitigate them. We also explore the level of coverage needed to accurately estimate a user's location, showing that only a small subset of the database is needed to achieve high accuracy.
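For the WiFi-based approach, a common baseline is a weighted centroid: average the known coordinates of the access points a phone sees, weighted by how strongly each is observed. The weights, coordinates, and function name below are made up, and this is not presented as the technique from the paper. Averaging latitude and longitude directly is only reasonable over small areas such as a city.

```python
def weighted_centroid(observations, ap_locations):
    """Estimate a phone's position from the access points it currently sees.

    observations: BSSID -> positive weight (e.g. derived from signal strength)
    ap_locations: BSSID -> (lat, lon), i.e. the localization database
    Access points missing from the database are skipped; returns None if
    none of the observed access points has a known location.
    """
    total = lat = lon = 0.0
    for bssid, weight in observations.items():
        if bssid in ap_locations and weight > 0:
            ap_lat, ap_lon = ap_locations[bssid]
            lat += weight * ap_lat
            lon += weight * ap_lon
            total += weight
    if total == 0:
        return None
    return lat / total, lon / total


# Hypothetical database entries and one scan (coordinates and weights are made up).
ap_db = {"ap1": (55.0, 12.0), "ap2": (55.0, 12.2), "ap3": (55.2, 12.0)}
scan = {"ap1": 2.0, "ap2": 1.0, "ap3": 1.0, "ap-not-in-db": 5.0}
print(weighted_centroid(scan, ap_db))
```

Skipping unknown access points also shows why coverage matters: a scan that sees only routers missing from the database yields no estimate at all.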

PDF | BibTeX

2014
Measuring large-scale social networks with high resolution

Arkadiusz Stopczynski, Vedran Sekara, Piotr Sapiezynski, Andrea Cuttone, Mette My Madsen, Jakob Eg Larsen, Sune Lehmann. PLOS ONE, 2014

"The Big Paper" summarizing the setup for the PhD projects of Arek, Vedran, Andrea, Mette, and mine: 1000 phones handed out to 800+ undergrads at DTU in the (still) largest ever deployment of its kind, measuring location traces and offline and online interactions, and gathering rich psychological profiles. Dozens of papers were written based on the data we collected in this study. You can see all of mine on this page, but you should also check these out:
- The scales of human mobility by L Alessandretti, U Aslak, S Lehmann in Nature (2020).
- How Physical Proximity Shapes Complex Social Networks by A Stopczynski, AS Pentland, S Lehmann in Nature Scientific Reports (2018).
- Understanding predictability and exploration in human mobility by A Cuttone, S Lehmann, MC González in EPJ Data Science (2018).
- Optimizing targeted vaccination across cyber–physical networks: an empirically based mathematical simulation study by E Mones, A Stopczynski, AS Pentland, N Hupert, S Lehmann in Journal of The Royal Society Interface (2018).
- Class attendance, peer similarity, and academic performance in a large field study by V Kassarnig, A Bjerre-Nielsen, E Mones, S Lehmann, DD Lassen in PLOS ONE (2017).
- Fundamental structures of dynamic social networks by V Sekara, A Stopczynski, S Lehmann in PNAS (2016).
PDF | BibTeX | press coverage
2013
Measuring personalization of web search

Aniko Hannak, Piotr Sapiezynski, Arash Molavi Kakhki, Balachander Krishnamurthy, David Lazer, Alan Mislove, Christo Wilson. Proceedings of the 22nd International Conference on World Wide Web, 2013

My first "algorithm audit" study, a year before the term "algorithm audits" was coined by Christian Sandvig. We explored the potential for "filter bubbles" in Google Search results and found very little evidence of them beyond location-based personalization. A number of the approaches for controlling noise that we introduced in the paper became standard practice in many studies that followed.
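Personalization audits of this kind compare result lists, and a control account queried at the same time shows how much lists differ from noise alone. The sketch assumes two common list-comparison metrics: the Jaccard index (overlap of results, ignoring order) and edit distance (how many insertions, deletions, or substitutions turn one list into the other). The result identifiers are placeholders and the function names are illustrative.

```python
def jaccard(a, b):
    """Overlap of two result lists treated as sets (ignores order)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0  # two empty lists are identical


def edit_distance(a, b):
    """Levenshtein distance between two result lists (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # drop x from a
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # keep or substitute
        prev = cur
    return prev[-1]


# Placeholder results: two identical control accounts and one test account,
# all issuing the same query at the same time.
control_1 = ["u1", "u2", "u3", "u4"]
control_2 = ["u1", "u3", "u2", "u4"]
treatment = ["u5", "u6", "u1", "u2"]

print("noise:", jaccard(control_1, control_2), edit_distance(control_1, control_2))
print("test: ", jaccard(control_1, treatment), edit_distance(control_1, treatment))
```

Differences between the test and control accounts that exceed the control-versus-control baseline are the candidates for personalization; here the reordering between the two controls counts as noise, while the swapped-in results in the test account do not.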

PDF | BibTeX | press coverage

The full publication list is available on my Google Scholar profile.
