Automata and Computability in Computer Security

For our automata and complexity final, Dhyey Shah, Mina Zakhary, Zachery Grey Davis, and I made a video explaining weird machines and exploits through FORCEDENTRY. We emphasized how concepts we covered in class apply to computer security. Check out the slides here.

Semgrep Rules for Machine Learning

I wrote some Semgrep rules to help people develop more secure ML software and avoid getting into a real pickle. Read this blog post to learn more about dicey practices in the ML ecosystem, featuring Big Pickle and an unreliable RNG.

The New Jim Code in New York City High Schools

I worked with two of my classmates on an ArcGIS storymap that expands upon The Markup and The City’s investigation of algorithmic bias in NYC HS admissions. Click here to open the storymap in a new window.

Fairies, Cybernetics, Plays, and Cyborgs

An Analysis of the Comic Correctives of William Shakespeare’s A Midsummer Night’s Dream and William Gibson’s Neuromancer

This blog post was adapted from my term paper for AP English Literature.

Cyborgs entering sophisticated virtual reality systems in a cyberpunk future would seem to be completely disparate from crafty fairies fooling young lovers in Classical Athens. However, these images are both representations of the incongruities that underlie William Gibson’s Neuromancer and William Shakespeare’s A Midsummer Night’s Dream. Whether the authors are combatting the rigidity of the patriarchy or the inhumanity of cybernetic enhancement, incongruities serve as a vehicle for self-reflection and repairing the status quo, working within the comic corrective frame proposed by Kenneth Burke (Renegar & Dionisopoulos 327). Whereas William Shakespeare develops a comic corrective hinged upon the dialectical tension between the past and the present with A Midsummer Night’s Dream, William Gibson’s Neuromancer is a comic corrective reliant upon the dialectical tension between the present and the future, a divergence that results in contrasting conceptions of humanity, mimesis, and transformation.

Russian literary critic and semiotician Mikhail Bakhtin observed the institution of the carnival as a model for the subversion of authority through its parody of the ruling class. This interpretation of A Midsummer Night’s Dream has been rejected by the majority of scholars (Derrin 425). Shakespeare epitomizes the carnival through the world of the fairies. However, instead of parodying the ruling class as Bakhtin’s theory supposes, the fairies parallel it. When the fairy king, Oberon, battles his wife, Titania, over the changeling child, he parallels Theseus and Egeus’s exertion of patriarchal dominance in requiring Hermia to choose between death and a forced marriage (Shakespeare 2.2.15-25). Resembling the hierarchical, traditional Athens, the carnival world of the fairies is characterized by struggles of greed, power, and jealousy. The carnival is as much in need of repair as Athens itself. Furthermore, Bakhtin’s notion of subversion ignores the context of A Midsummer Night’s Dream. As a playwright writing for royalty and the aristocracy, Shakespeare produces conservative works that seek to repair, but not revolt against, the status quo, reinforced by the resolution of his plays.

Bakhtin’s flawed interpretation was corrected and refined by American literary theorist and philosopher Kenneth Burke (Renegar & Dionisopoulos 325). Bakhtin conceived of meaning as a perpetual mechanism of negotiation between individuals in a certain society, a concept that constitutes dialogism. Meaning is plural and open to interpretation. Heteroglossia, the intertextual nature of novels and other narratives, was espoused by Bakhtin to be a subversive force that resists the unifying agents operating within most cultures. This force fuels Bakhtin’s conceptualized metanarrative. A fellow proponent of metanarrative and dialogism, Burke diverged from Bakhtin in that he clearly acknowledged the emergence of contradictions as a result (Henderson 4). He proposed the comic corrective frame to illustrate the role of contradictions in fostering alternative visions. Contradictions, also referred to as incongruities or unseemliness, highlight dialectical tension while providing distance in order to promote critical reflection. The carnival is merely a technique for the exhibition of contradiction. Moreover, Burke recognizes the comic corrective frame as a tool for both subversion and repair of the status quo, facilitating the application of this frame to narratives written by conservative figures like Shakespeare (Derrin 427).

A Midsummer Night’s Dream “draws upon … [the tradition of] Athens as a way to visualize breakings apart and remakings of the social order structured by fresh ideals that can frustratingly seem only ever to be partially realizable,” clearly examining the dialectical tension between the past and a changing present (426). The values of Athens’s status quo are implied by the contradictions and incongruities of the play. Consider that the sodden fields and fogs described by Titania and the characterizations of the artisans are English in nature, and have no basis in any historical record of Athens (Shakespeare 2.1.66-102). Therefore, these are also the values of Shakespeare’s Elizabethan England.

Humor is produced by instances of failed distinction-making, and is paralleled by the ongoing political negotiations of consistency within the play. For example, Lysander, in attempting to elucidate his rationale for permission to marry Hermia, humorously quips that Egeus should marry Demetrius, the preferred suitor, himself. Lysander goes on to state that he is as “well derived” and “possessed” as Demetrius, but has “the love of Hermia” (Shakespeare 1.1.99-110). In doing so, Lysander calls attention to the arbitrary parental love Egeus has bestowed upon Demetrius, a favoritism that discounts not only empirical assets but also a responsibility to love and beauty within the Athenian authority. After a venture into the world of the fairies, Theseus, the epitome of the Athenian authority, endorses Lysander and Hermia’s marriage, rectifying Athens’s disregard for love and suggesting a rectification in England (Shakespeare 4.1.164-174).

As opposed to relying upon the dialectical tension between the past and a changing present, Gibson fosters corrective criticism by showcasing a world that is simultaneously present and future in Neuromancer (Renegar & Dionisopoulos 339). Reflecting the postmodern times the novel was written in, Neuromancer is a piece of writerly fiction that illustrates the consequences of technological fascination and proliferation, establishing a singular entelechial extension requiring contextual dialogue on behalf of the audience, in contrast to Shakespeare, who presented multiple evolutions contained purely within his version of Athens. The incongruities and contradictions of this novel rely upon the accelerating and counter-cultural nature of technology, displaying both the wildest dreams of the futurists and the most frightening nightmares of the Luddites. Irony and casuistic stretching, introducing new principles while theoretically remaining faithful to old principles, develop these incongruities. For example, Gibson extends the idea of cybernetic enhancements to fabricate cyborgs. The most notable cyborg is Molly, the protagonist’s partner. With razor-sharp retractable blades, vision-enhancing mirrored lenses sealed upon her eyes, and artificially heightened reflexes, Molly is portrayed as a formidable mercenary, emphasizing the power of technology. However, as the narrative develops, the audience learns about Molly’s past as a “meat puppet”, a prostitute preyed upon specifically for her enhancements (Gibson 250). Paralleling the contradictory nature of technology, Molly’s enhancements both amplify and degrade her.

Similar to Shakespeare, Gibson repairs the status quo, preserving the power and authority of technology by elaborating upon Wintermute’s (a powerful artificial intelligence) unifying capabilities, which included reuniting Case, the protagonist, with his dead girlfriend, Linda Lee. Wintermute also distanced Case from Molly, a carnival symbol that can be likened to the fairies in A Midsummer Night’s Dream (Gibson 303). While previous researchers reached a consensus that Neuromancer is a form of Burke’s comic corrective frame, the effectiveness of Gibson’s technique is a contentious issue. Many misinterpretations of Neuromancer have been promulgated, likely due to the removal of comic distance caused by the fruition of Gibson’s ideas, including the concept of virtual reality and the Internet (Csicsery-Ronay 225).

Lauded for their lyricism, Gibson and Shakespeare bolster the functionality of their comic correctives through their language. In A Midsummer Night’s Dream, Titania orchestrates layered naturalistic metaphors in perfect iambic pentameter (Shakespeare 2.1.66-102); meanwhile, the artisans speak in malapropisms and terse statements (Shakespeare 1.2.1-35). Equivalently, in Neuromancer, the decorative kinesthetic prose illuminating the opulence of the elite Tessier-Ashpool family stands in stark contrast to the profane terms describing street hustlers (Gibson 299). These social disparities forge incongruities. Furthermore, both writers impute meaning to the natural world through human-contrived mechanisms. While Shakespeare compares nature to a book, Gibson follows the modern tradition of the Newtonian mechanistic metaphor, ultimately operating to emphasize incongruities (Gough 9).

While both works rely upon fusions of humanity, A Midsummer Night’s Dream merges man and nature through the fairies, while Neuromancer merges man and machine through cyborgs. The magic and enchantment associated with the fairies parallel the technological entelechy of cyborgs. The difference between nature and machine is an extension of the difference between the past and future, developing from the divergence in dialectical tensions. Shakespeare clearly embodies the Romanticism of his time period when elucidating the disposition of the fairies, while Gibson established “Neuromanticism”, a new fascination with technological advancement, through his cyborgs (Gough 14). In the work of Donna Haraway, the cyborg is celebrated as a way of escaping human, and most particularly gender, limitations (15). Similarly, fairies represent the chaotic carnival escape. However, in the spirit of the conservative comic corrective, neither Gibson nor Shakespeare embraces the progressive nature of these creations. The latent subservience to the status quo is present in the fates of Neuromancer’s Molly and A Midsummer Night’s Dream’s Titania. As a cyborg, Molly is forced into prostitution, a gendered patriarchal punishment for her cybernetic enhancements (Gibson 133). Meanwhile, the fairy queen, Titania, not only is requested to submit to Oberon’s authority on multiple occasions, but is humiliated when he doses her with a love potion, causing her to fall in love with a lower-class artisan with the head of a donkey (Shakespeare 3.1.64-70).

Transformation and change are at the heart of Neuromancer and A Midsummer Night’s Dream; uncontrollable, accelerating forces regulate the events of the narratives. Neuromancer pursues entelechial extension with its choice of technology, while A Midsummer Night’s Dream wields love to connect the changes in the present with the past. The carnival exposes more incongruities within these forces in both narratives. Neuromancer’s action sequences in the Atlanta portion of cyberspace (Gibson 87) are as anarchic and turbulent as the mix-ups and altercations between the four young lovers in A Midsummer Night’s Dream (Shakespeare 3.2.65-74). These forces are the subject and object of the comic corrective. From the gestalt of cybernetic networks to virtual reality spectacles of rape, Neuromancer confronts the sublime nature of technology. However, Shakespeare adopts a more positive perspective on love, chiding the authority of Athens for not recognizing its significance. While both narratives conclude with the success of each force, Shakespeare’s play ends on a positive note, implying the inherent goodness of love, which exists in opposition to Neuromancer’s overtones of tragedy when the artificial intelligence acting as the antagonist fulfills its objective.

Shakespeare’s mimesis is expressed through a play within the play, while Gibson, once again, relies upon technology. Shakespeare was a member and shareholder of the Globe theater and had many patrons for his works, so his choice of theater as an art form indicates the necessity to connect to the larger audience and extend a long-lasting tradition. Gibson’s depiction of technology as art is, contextually, memorable and unique, emphasizing his future-focused mindset. Shakespeare creatively situates the moon to portray the tragic play Pyramus and Thisbe as reality, creating a comic distance in order to reveal the true tragedy of love and life. Moreover, the artisans’ foibles in producing the play are essential in the evolution of multiple incongruities, including Theseus’s acceptance of imagination and love as well as Puck’s disposition (Hutton 292). However, “almost every character in Neuromancer is an artist of some kind” (Csicsery-Ronay 233). By treating technology as art, Gibson creates a direct reflection of reality, free from the complex distancing mechanisms employed by Shakespeare, a gauge of the ramifications of technology itself.

Kenneth Burke’s comic corrective frame explains the apparatus utilized by William Shakespeare and William Gibson in Neuromancer and A Midsummer Night’s Dream, respectively. Portrayals of mimesis, humanity, and transformation serve as manifestations of the differing dialectical tensions each comic corrective is hinged upon, ultimately assisting the author in repairing the status quo and cultivating reflection.


Csicsery-Ronay, Istvan. “The Sentimental Futurist: Cybernetics and Art in William Gibson’s Neuromancer.” Critique: Studies in Contemporary Fiction, vol. 33, 1992, pp. 221–240., doi:10.1080/00111619.1992.9937885.

Derrin, Daniel. “The Humorous Unseemly: Value, Contradiction, and Consistency in the Comic Politics of Shakespeare’s A Midsummer Night’s Dream.” Shakespeare, vol. 11, no. 4, 2014, pp. 425–445., doi:10.1080/17450918.2014.925962.

Gibson, William. Neuromancer. Ace Books, 1990. Print.

Gough, Noel. “Neuromancing the Stones: Experience, Intertextuality, and Cyberpunk Science Fiction.” Journal of Experiential Education, 1992, doi:10.1177/105382599301600303.

Henderson, Greig. “Dialogism Versus Monologism: Burke, Bakhtin, and the Languages of Social Change.” KB Journal, vol. 13, 2017.

Hutton, Virgil. “A Midsummer Night’s Dream: Tragedy in Comic Disguise.” Studies in English Literature, 1500-1900, vol. 25, no. 2, 1985, p. 289., doi:10.2307/450724.

Renegar, Valerie R., and George N. Dionisopoulos. “The Dream of a Cyberpunk Future? Entelechy, Dialectical Tension, and the Comic Corrective in William Gibson’s Neuromancer.” Southern Communication Journal, vol. 76, no. 4, 2011, pp. 323–341., doi:10.1080/1041794x.2010.500342.

Shakespeare, William. A Midsummer Night’s Dream. New York: Signet Classic, 1998. Print.

Privacy, Machine Learning, and Monopolies

This blog post was adapted from my term paper for Computer Science Ethics. Private machine learning is a fast-paced field, and my views are likely to have evolved since the writing of this essay. I focused more on problems with the private machine learning space, avoiding discussions about potentially effective solutions, whether human-centered computing approaches or specific policy interventions.

Facebook’s targeted advertising revealed that a user was gay to his parents (Skeba & Baumer, 2020). The NYPD curated an image database of minors to compare with surveillance footage from crime scenes (Levy & Schneier, 2020). Abusive parents have used an app to detect if their child is discussing sexual matters online (Skeba & Baumer, 2020). Each of these scenarios depicts a harmful privacy violation engendered by the rapid adoption of machine learning in virtually every industry. Effectively engaging with these scenarios entails moving beyond the isolated incidents and understanding the widespread lack of privacy in machine learning. This phenomenon can be explained by interrogating the power dynamics that underlie the evolution and development of machine learning. This is justified because the privacy of a system, or lack thereof, is a rearrangement of power and is, therefore, a political decision (Rogaway, 2015). Machine learning has been largely driven by a few key companies considered to be tech monopolies, often referred to as Big Tech (Jurowetzki et al., 2021). The fundamental lack of privacy in machine learning is an expression of the monopolistic nature of these tech conglomerates and their power over the general public, particularly marginalized communities. Current efforts to further machine learning privacy frequently obscure the broader conditions that maintain injustice, empowering these companies rather than the public.

Privacy is largely defined as the capacity for an individual to choose what information is shared about them and how it is used. This definition has evolved over time, and multiple perspectives and notions are subsumed by it. For instance, Nissenbaum contends that privacy must be analyzed within specific contexts and can be measured by conformity to cultural norms governing appropriate flows of information, a concept known as contextual integrity (Nissenbaum, 2004). In contrast, Skeba and Baumer (2020) view privacy as a function of Floridi’s measure of informational friction: the amount of work required for an agent to alter or access the information of another agent. Despite these differences of opinion, there is a consensus that privacy cannot be fulfilled by anonymity alone, a fallacy that appears in many failed or deliberately weak attempts to preserve privacy (Desfontaines, 2020).

Privacy protects the public, but is particularly important for marginalized and otherwise vulnerable communities. The right to privacy is often contextualized as a precursor to more traditional threats to being, including blackmail, abuse, and imprisonment (Levy & Schneier, 2020), or as an integral precondition for the rights to free speech and assembly (Skeba & Baumer, 2020), rights recognized by and given paramount importance in the United States Constitution and similar legislative codes. With the rise of targeted advertising, the erosion of privacy also disrupts what is known as the right to the future tense. To elaborate, mass manipulation through advertisements that target distinct groups can influence the actions of an individual, violating their right to the future tense (Srinivasan, 2018). These negative effects are disproportionately felt by vulnerable and marginalized groups such as people of color and victims of domestic abuse (Skeba & Baumer, 2020; Levy & Schneier, 2020). Furthermore, privacy is predicated on consent. Marginalized individuals are the most likely to be forcibly subjected to the technology, not informed about the implications of its use, or otherwise caught in situations where they cannot freely and authentically provide consent (Madden et al., 2017). Networked privacy problems intensify these harms. Individuals in marginalized groups are linked in networks by association, wherein the actions of one individual have an outsized impact on others in the same group, especially given the utility of aggregated data in machine learning systems, a phenomenon that will be elaborated upon (Madden et al., 2017).

Privacy keeps queer people safe in places where their identities are outlawed or persecuted (Skeba & Baumer, 2020), prevents domestic abusers from tracking their victims (Levy & Schneier, 2020), and protects different classes of people from stigma, surveillance, ostracization, abuse, incarceration, and other forms of harm (Liang et al., 2020). Facial recognition best exemplifies how a lack of privacy in machine learning specifically harms marginalized communities. Initially, critiques of facial recognition focused on discriminatory outcomes where these systems misclassified people of color at much higher rates, resulting in several false arrests (Stevens & Keyes, 2021). Inclusive representation was lauded as a panacea, but this only subjects marginalized groups to increased surveillance, as it encourages collecting more data from them. More recent critiques of facial recognition demonstrate how it can be used to reinforce racist overpolicing and commercial exploitation, as well as the fact that the data collection and algorithmic development establishing facial recognition are rooted in the exploitation and dehumanization of people of color (Stevens & Keyes, 2021). Improving the accuracy of these systems on marginalized groups, namely people of color, continues to harm these individuals by sustaining broader injustices.

Prior to examining the lack of privacy in machine learning, it is pertinent to specify who uses these algorithms. Since machine learning is a pervasive technology that is embedded into applications like facial recognition (as opposed to being directly purchased), not only are consumers not necessarily aware of the inclusion of machine learning, but those subject to this software are members of the public who may not have consented to use of this technology (Knowles & Richards, 2021). In the case of facial recognition, the onus of privacy invasions is primarily on the consumers: law enforcement and relevant commercial entities. Considering the responsibility of privacy and whether those subject to the technology have consented to its use is an analysis of the broader system wherein the technology exists.

Machine learning technologies are infrastructure assemblages, consisting of data, algorithms, and the broader system; analysis should occur upon each constituent part (Stevens & Keyes, 2021). The field of machine learning exhibits a fundamental lack of privacy in every component. Birhane and Prabhu investigated problematic practices in ImageNet, the open-source vision dataset pivotal to the growth and success of machine learning. Amongst other issues, they discovered verifiably pornographic images that can be used to re-identify and blackmail women, highlighted downstream effects in other models and datasets from privacy violations within ImageNet, and identified open datasets built on false conceptions of informed consent and anonymization. More significantly, they posit that the release of ImageNet for machine learning contributed to a culture of surveillance and widespread data collection without accounting for privacy and consent (Birhane & Prabhu, 2021).

Privacy is also intrinsically lacking within the algorithms themselves. Attacks on the privacy of machine learning algorithms include model extraction, model inversion, and membership inference. Model extraction enables an attacker to create a copycat of a model and, therefore, access the inferences of the system (Jagielski et al., 2020). Consider the consequences of attackers obtaining access to a copycat of the NYPD’s facial recognition system; it would be analogous to a leak of confidential, legal data. An individual can launch a model inversion attack to reveal data the system has been trained upon or launch membership inference to identify if a data point was in the training data of a model (Albert et al., 2020). To illustrate, these attacks would allow an attacker to gather images that the NYPD’s facial recognition tool has seen or determine if an individual was inside of the system’s training data. This would potentially reveal information about an individual’s relationship with law enforcement (i.e., whether they were previously arrested, incarcerated, suspected, etc.), disproportionately harming marginalized communities, most saliently people of color (Albert et al., 2020). Efforts to improve the privacy of these algorithms using techniques such as homomorphic encryption and differential privacy often require refashioning these algorithms entirely (Liu et al., 2021). There do not exist robust, well-known, scalable defenses to these attacks. There is also limited tooling to test models for these vulnerabilities, demonstrating the inadequacy of the current state of privacy for machine learning algorithms (Gupta & Galinkin, 2020; Hussain, 2020). Moreover, the right to be forgotten is an integral privacy stipulation in multiple regulations, but ensuring machine learning algorithms can forget specific data is still an open problem (Cao et al., 2015).
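To make one of these attacks concrete, here is a toy, loss-based membership inference sketch. Everything in it (the memorizing "model", the threshold, and the data) is illustrative and not drawn from the cited papers; real attacks target real trained models. The core intuition is simply that an overfit model's loss is systematically lower on its training points, so a loss threshold separates members from non-members:

```python
import random

random.seed(0)

# Toy setup: a deliberately overfit "model" that memorizes its training set.
train = [random.gauss(0, 1) for _ in range(50)]
test = [random.gauss(0, 1) for _ in range(50)]
memorized = set(train)

def loss(x):
    # Zero error on memorized points; noisy, typically larger error elsewhere.
    return 0.0 if x in memorized else abs(random.gauss(0, 1))

# Membership inference: flag any point whose loss falls below a threshold.
THRESHOLD = 0.1
def infer_membership(x):
    return loss(x) < THRESHOLD

flagged_members = sum(infer_membership(x) for x in train)
flagged_nonmembers = sum(infer_membership(x) for x in test)
print(f"members flagged:     {flagged_members}/50")
print(f"non-members flagged: {flagged_nonmembers}/50")
```

In this caricature the attacker flags every training point and only a small fraction of unseen points, which is exactly the asymmetry that lets an adversary learn whether a given individual was in a model's training data.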

One final question remains in analyzing the failure of machine learning privacy: how does a lack of privacy in machine learning express itself in relation to the broader system? As previously stated, the public consists of individuals who did not consent to the use of machine learning, but are still subject to it, violating privacy by virtue of violating the principle of consent. Machine learning inherently complicates privacy by inferring information from data that may seem innocuous, whether it is sexuality from Facebook likes or illness from the sound of a cough (Imran et al., 2020), rendering definitions of explicitly private personal information obsolete. Machine learning’s intrinsic affordance of scale bolsters this erosion of privacy: humans cannot match the speed and volume of its data processing (Stevens & Keyes, 2021). Most consequentially, machine learning actualizes surveillance and tracking technologies that aid human rights violations such as mass incarceration and systematic genocide (Albert et al., 2020).

An analysis of the systemic failures of machine learning privacy is incomplete without mention of surveillance capitalism, conceptualized as the proliferation of violations of the right to the future tense, a consequence of the erosion of privacy. Under surveillance capitalism, machine learning turns Big Tech into an instrument of private (and eventually state) surveillance. The more data these companies collect, the better they are at predicting behavior, which, in turn, allows them to seize more power in the world, cementing their status as monopolies (Zuboff, 2019). Doctorow disagrees with this assessment, emphasizing that believing in the effectiveness of these machine learning algorithms is fallacious. He maintains that monopolies establish surveillance, but that this technology did not form these monopolies. Rather, monopolies were the precondition for surveillance (Doctorow, 2021).

Machine learning’s irresponsible approach to privacy is frequently viewed as a technical problem, ignoring the circumstances of its creation, use, and deployment. This deficiency in machine learning should not be construed independently of its evolution inside of technological monopolies. Machine learning lacks privacy because technological monopolies prioritize profits over the well-being of their users. In fact, the very structure of machine learning reflects the nature of these monopolies (Dotan & Milli, 2020). By focusing on compute-rich and data-rich environments, this technology promotes the centralization of power, the core principle behind monopolies. To elaborate, compute power limits access to those with the financial means to obtain GPUs and similar apparatus and deepens the dependency on companies and organizations able to produce and obtain compute at scale. Similarly, data-rich environments privilege large companies and organizations able to collect sufficient amounts of data, simultaneously encouraging further degradation of the privacy of individuals (Dotan & Milli, 2020). Crucial to machine learning is optimization, a technique that arguably reduces complex questions of society, politics, and governance into economic problems, typically ignoring concerns from the general public and marginalized groups, the primary criticism of a monopoly (Kulynych et al., 2020). The infringement of privacy is seen as an externality of optimization, much as monopolistic corporations treat privacy violations as an expense to be weighed against legal and social repercussions (Swire, 1997).

The connection between privacy deterioration and technological monopolies is substantiated by Swire’s (1997) holistic framework on the forces affecting the commercial protection of personal information, in combination with historical context on the relationship of the monopolistic companies under contention to privacy. Swire defines a market failure as an instance where a company provides less privacy than the consumer desires. According to Swire (1997), this is a product of information asymmetry and bargaining costs. A company will know more about the extent of its data collection and processing than its users. An individual user also does not have the authority and power to effectively negotiate with the company or hold it accountable. Companies are therefore incentivized to profit off of extracting more information, since lawsuits and leaks are less probable and cheaper (Swire, 1997). Materializing this observation, Facebook was fined five billion dollars by the Federal Trade Commission for privacy violations. Although this was the largest fine ever levied against a company by the FTC, it was a trivial cost to Facebook (Federal Trade Commission, 2019). Srinivasan (2018) examines Facebook’s evolution over time into one of the largest monopolies in history and the epitome of surveillance capitalism. She explains that Facebook’s erosion of privacy was made possible by its lack of competitors. Whenever it faced competition, Facebook incorporated privacy concerns into its marketing and listened to users, shedding those commitments as each competitor was eliminated. As its status as a monopoly calcified, its privacy failures became more egregious, implying that monopolistic behavior drives a degradation in privacy (Srinivasan, 2018).
By virtue of their positions as monopolies, Facebook, Google, and other Big Tech companies engaged in broad-scale commercial surveillance (Birhane & Prabhu, 2021), collecting data critical to the evolution and growth of machine learning, resulting in the current insufficient state of private machine learning.

Efforts to improve the privacy of machine learning are manifold. This raises the question: are current efforts beneficial to the greater public and the communities most vulnerable to threats posed by the erosion of privacy? Unfortunately, numerous flaws in these endeavors result in them further empowering large technology companies. First, they are frequently centered at the data and algorithms level, as the systems perspective can incriminate the technology companies that drive machine learning. For instance, with regards to the lack of privacy in open-source datasets (Birhane & Prabhu, 2021), there is a natural inclination to rely on private datasets with fewer controversies around consent and re-identification. This action serves to strengthen companies with the means to create and maintain useful, private datasets. A systems perspective might inquire into the ethics of establishing a machine learning system in and of itself, a line of inquiry a monopoly might hope to quell.

This section on privacy is relatively out of date. It is pessimistic on differential privacy and technical approaches to machine learning privacy, favoring policy solutions. However, I believe that approaches such as differential privacy are vital, just not a panacea.

Second, private machine learning research often ignores specific classes of threats to vulnerable groups in favor of mathematical or purely technical measures of privacy (Rogaway, 2015). These measures mitigate privacy leakage in a limited number of situations. This includes differential privacy, federated learning, and encryption schemes. Differential privacy limits the re-identifiability of an individual within a dataset or machine learning model. However, it decreases model accuracy and is less effective on outliers or minorities in the data (Bagdasaryan et al., 2019). A large technology company like Facebook or Google would stand to benefit from the proliferation and normalization of differential privacy. Once again, the trade-offs of differential privacy can be offset with the vast quantities of data these companies can collect, entrenching their monopoly. The other techniques suffer from similar limitations. Many feasible privacy guarantees rely on schemes and threat models that assume a trustworthy machine learning provider, yet another choice that privileges these companies (Rogaway, 2015). These measures drift from traditional, robust privacy lenses such as contextual integrity and informational friction, making “privacy” more compatible with optimization and easier to integrate into economic problems. This does not significantly or directly benefit marginalized groups. A differentially private facial recognition system may prevent re-identification, but it is still used for surveillance and control. An individual may still have their data leaked from the facial recognition system due to their status as an outlier. The information asymmetry obscuring the information flows of the individual remains, as do the bargaining costs that prevent the individual from holding the company accountable. These definitions of privacy fail to address the sociotechnical and behavioral dimensions of privacy across different social contexts.
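A minimal sketch of the Laplace mechanism for a differentially private mean makes the trade-offs discussed above visible. This is illustrative only: the data, bounds, and ε values are made up, and real deployments should use vetted libraries rather than hand-rolled noise. The noise scale grows as ε shrinks (stronger privacy, lower accuracy), and the clipping step that bounds each individual's influence distorts outliers the most:

```python
import random

random.seed(0)

def laplace_noise(scale):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def dp_mean(data, lo, hi, epsilon):
    """Epsilon-differentially private mean via the Laplace mechanism.
    Clipping each value to [lo, hi] bounds any individual's influence
    (the sensitivity), but distorts outliers the most."""
    n = len(data)
    clipped = [min(max(x, lo), hi) for x in data]
    sensitivity = (hi - lo) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon)

data = [random.uniform(0, 100) for _ in range(1000)]
true_mean = sum(data) / len(data)
for eps in (1.0, 0.1, 0.01):
    # Smaller epsilon = stronger privacy = more noise = lower accuracy.
    print(f"eps={eps}: error = {abs(dp_mean(data, 0, 100, eps) - true_mean):.3f}")
```

Even in this toy, the released statistic protects any one record's presence while the system built on top of it can remain a surveillance tool, which is the distinction the essay draws between technical privacy and privacy in the broader sense.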

Members of the public are subject to pervasive, embedded machine learning. They do not always consent to this usage and do not have the means to audit the privacy of these systems. Hence, they must trust "AI-as-an-institution," the structural assurances society provides about the privacy of this software. The monopolistic nature of tech companies has ensured that distrust in the privacy of machine learning has not been a barrier to adoption (Knowles & Richards, 2021). Nonetheless, both industry self-regulation and government regulation have attempted to improve machine learning privacy. Swire contended that industry self-regulation under monopoly conditions would still result in market failure (1997): monopolies establish the norms that self-regulation efforts abide by. Indeed, the ethics codes set by these tech companies focus on consumer rights, a framing that often neglects the rights of the general public and of vulnerable groups who are subject to the application but are not its consumers (Washington & Kuo, 2020).

These monopolies also stand to gain from government regulation. Swire states that powerful tech companies can lobby for regulations that fortify monopolies by requiring access to privacy infrastructure or by imposing requirements only large companies can feasibly meet (1997). Privacy laws are often over-broad or under-broad; when they are over-broad, privacy is loosely defined, and the companies with the resources to litigate and establish norms benefit (Swire, 1997). The most prominent privacy regulation is the European Union's General Data Protection Regulation (GDPR). GDPR has two major flaws. First, it places the responsibility for detecting and contesting privacy violations onto individuals, maintaining the information asymmetry and bargaining costs that empower Big Tech; this "notice-and-consent" framework ignores situations where individuals cannot authentically provide consent (Skeba & Baumer, 2020). Second, GDPR specifies certain classes of protected information, a protection rendered futile by machine learning's ability to infer private information from supposedly banal data sources (Skeba & Baumer, 2020). Technical privacy metrics, self-regulation initiatives, and government regulations all fail to interrogate the power imbalance between technology companies and the public. In doing so, they help these companies consolidate power while providing little benefit to the individuals most at risk.

Privacy was not built into machine learning from the outset, reflecting the priorities of the large tech conglomerates leading the field, companies that privilege profit over privacy and over the resulting harm to vulnerable groups. Technical and structural measures to improve machine learning privacy risk strengthening these monopolies, because they do not address the power imbalance between companies and the greater public, particularly vulnerable groups, that is core to the erosion of privacy. Effective endeavors to advance machine learning privacy must empower the individuals subject to this technology.

References:

Albert, K., Penney, J., Schneier, B., & Kumar, R. (2020). Politics of Adversarial Machine Learning. In Towards Trustworthy ML: Rethinking Security and Privacy for ML Workshop, Eighth International Conference on Learning Representations (ICLR).

Bagdasaryan, E., Poursaeed, O., & Shmatikov, V. (2019). Differential privacy has disparate impact on model accuracy. Advances in Neural Information Processing Systems, 32, 15479-15488.

Birhane, A., & Prabhu, V. (2021). Large Image Datasets: A Pyrrhic Win for Computer Vision?. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (pp. 1537-1547).

Cao, Y., & Yang, J. (2015). Towards Making Systems Forget with Machine Unlearning. In 2015 IEEE Symposium on Security and Privacy (pp. 463-480).

Desfontaines, D. (2020). Lowering the cost of anonymization (Doctoral dissertation, ETH Zurich).

Doctorow, C. (2021). How to destroy surveillance capitalism.

Dotan, R., & Milli, S. (2020). Value-Laden Disciplinary Shifts in Machine Learning. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (p. 294).

Federal Trade Commission. (2019, July 24). FTC Imposes $5 Billion Penalty and Sweeping New Privacy Restrictions on Facebook [Press release].

Gupta, A., & Galinkin, E. (2020). Green Lighting ML: Confidentiality, Integrity, and Availability of Machine Learning Systems in Deployment. International Conference on Machine Learning Workshop on Challenges in Deploying and Monitoring Machine Learning Systems.

Hussain, S. (2020, October 8). PrivacyRaven Has Left the Nest.

Imran, A., Posokhova, I., Qureshi, H. N., Masood, U., Riaz, M. S., Ali, K., John, C. N., Hussain, M. I., & Nabeel, M. (2020). AI4COVID-19: AI enabled preliminary diagnosis for COVID-19 from cough samples via an app. Informatics in Medicine Unlocked, 20, 100378.

Jagielski, M., Carlini, N., Berthelot, D., Kurakin, A., & Papernot, N. (2020). High accuracy and high fidelity extraction of neural networks. In 29th USENIX Security Symposium (USENIX Security 20) (pp. 1345-1362).

Jurowetzki, R., Hain, D., Mateos-Garcia, J., & Stathoulopoulos, K. (2021). The Privatization of AI Research(-ers): Causes and Potential Consequences – From university-industry interaction to public research brain-drain?.

Knowles, B., & Richards, J. (2021). The Sanction of Authority: Promoting Public Trust in AI. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 262-271).

Kulynych, B., Overdorf, R., Troncoso, C., & Gürses, S. (2020). POTs: Protective Optimization Technologies. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 177-188).

Levy, K., & Schneier, B. (2020). Privacy threats in intimate relationships. Journal of Cybersecurity, 6(1).

Liang, C., Hutson, J. A., & Keyes, O. (2020). Surveillance, stigma & sociotechnical design for HIV. First Monday, 25(10).

Liu, B., Ding, M., Shaham, S., Rahayu, W., Farokhi, F., & Lin, Z. (2021). When Machine Learning Meets Privacy: A Survey and Outlook. ACM Computing Surveys (CSUR), 54(2), 1-36.

Madden, M., Gilman, M., Levy, K., & Marwick, A. (2017). Privacy, poverty, and big data: A matrix of vulnerabilities for poor Americans. Washington Law Review, 95, 53.

Nissenbaum, H. (2004). Privacy as contextual integrity. Washington Law Review, 79, 119.

Rogaway, P. (2015). The Moral Character of Cryptographic Work [Invited talk]. Asiacrypt, Auckland, New Zealand.

Skeba, P., & Baumer, E. (2020). Informational Friction as a Lens for Studying Algorithmic Aspects of Privacy. Proceedings of the ACM on Human-Computer Interaction, 4(CSCW2).

Washington, A., & Kuo, R. (2020). Whose Side Are Ethics Codes on? Power, Responsibility and the Social Good. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 230-240).

Zuboff, S. (2019). Surveillance capitalism and the challenge of collective action. New Labor Forum, 28(1), 10-29.