(Heatmap of users tweeting the N word in the US, from the Geography of Hate project at Humboldt State University)

In "Do Artifacts Have Politics?" Langdon Winner argues that certain technologies are democratic or authoritarian regardless of the intent of their creators. The best-known example Winner uses to illustrate his point is a set of overpasses built on Long Island in the 1930s. These overpasses were deliberately built so low that public buses could not pass under them, with the desired effect that public transportation could not reach certain beaches frequented mostly by affluent white people. The road infrastructure itself excluded people of color and working-class whites. Even though there are now many laws against racial discrimination, it would take millions of dollars to alter the overpasses, so the road infrastructure continues to entrench an aspect of racism and classism.

With the recent racist Google Maps hack, the question of racial bias in big data and racism in algorithms has come to the fore again. The Google Maps example, however, is closer to a case of consciously rigging the data so that it produces certain results, more akin to Google bombing. Consider instead the case of Latanya Sweeney, which triggered the debate about racial bias in algorithms. Dr. Sweeney is an African American Professor at Harvard University and the director of the Data Privacy Lab there. She observed that Google searches for traditionally African American names returned suggestions for looking up the person's criminal record, while this was not the case for traditionally Caucasian names. This caused some ruckus in the media, and Google stepped in and apologized, stating that nothing was intentional on its part. While Google has fixed this problem, the same cannot be said for other search engines like Yahoo! Here is a screen grab of the search results for Latanya Sweeney, with the ad for criminal records highlighted.


Dr. Sweeney is not the only person in her department at Harvard whose name triggers this ad: the ad also appears for the other African American faculty members in the department, but not for anyone else. Here is another example.


If one searches for the other faculty members in her department, one does not get such links in the ads. Much has been written about this issue in the past, and it was supposedly fixed. It seems Google is not the only one that needs to clean up its own backyard. People of color face this issue on a regular basis.

Another simple but telling example is searching for traits of groups of people. Google's auto-suggest function reveals what similar search terms people have searched for in the past. The following two examples, one for a religious group and one for a racial group, speak for themselves.



Even the Daily Show highlighted the issue last year with its segment on racism. What is going on here? These searches reveal more about Google's user population than about the groups being searched for. More often than not, it is by accidental auditing that one discovers these flaws in technological systems. People who argue against any sort of tweaking of the algorithms contend that the algorithms are a mirror of reality. What this aphorism fails to acknowledge is that it is not physical reality but rather social reality that is being mirrored, and social reality is, by its very nature, flawed and biased towards one group or another.

Attributing "objectivity" to any algorithm or systematic analysis can conceal bias, not because the analysis itself or the algorithm used to analyze the data is biased, but because the systems that generate the data (e.g., law enforcement departments or the judicial system) are. Consider a scenario where African Americans are more likely to be incarcerated for a particular offense while other people are less likely to be charged. Over time the data will show higher crime and incarceration rates for African Americans, even though it is the bias in the system itself that produces this state of affairs. Any algorithm or other type of analysis will simply reproduce this observation. The bias will remain until and unless we add explicit conditions to check incarceration rates for the same crime across groups, or segment the data accordingly.
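The scenario above can be made concrete with a small simulation. All numbers here are assumptions chosen purely for illustration, not real statistics: both groups offend at exactly the same true rate, but the probability of being charged differs by group, and the recorded data inherits that disparity.

```python
import random

random.seed(0)

# Assumed, illustrative parameters: identical true offense rates,
# but very different enforcement (charging) probabilities per group.
TRUE_OFFENSE_RATE = 0.05
ENFORCEMENT_RATE = {"group_a": 0.9, "group_b": 0.3}

def observed_incarceration_rate(group, n=100_000):
    """Incarceration rate as it would appear in the recorded data."""
    incarcerated = 0
    for _ in range(n):
        offended = random.random() < TRUE_OFFENSE_RATE
        charged = offended and random.random() < ENFORCEMENT_RATE[group]
        incarcerated += charged
    return incarcerated / n

rate_a = observed_incarceration_rate("group_a")
rate_b = observed_incarceration_rate("group_b")

# The recorded data shows a roughly threefold disparity even though the
# true offense rates are identical by construction.
print(rate_a, rate_b)
```

Any model trained on `rate_a` and `rate_b` would "learn" that group A is more crime-prone, which is exactly the point: the bias enters through the data-generating system, not the analysis.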

Such biases can also carry over to other domains like recommendation systems. Consider the infamous case of admissions decisions at St. George's Hospital Medical School. On the surface, the idea of an unbiased system that uses past admissions data to make decisions about the future makes sense, since it would not share the decision-making biases of human beings. However, the mere fact that we are using past decisions, which could have been made by biased people, does not reduce the bias. It rather perpetuates it: if minorities were left out in the past because of some systematic bias, then even the "unbiased, objective" algorithm will keep making the same biased decisions.
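A toy sketch of how past decisions perpetuate bias even when the protected attribute is never used. The data, the postcodes, and the cutoffs below are entirely hypothetical; the postcode simply acts as a proxy for group membership, much as name and birthplace did in the St. George's case.

```python
# Hypothetical past admissions data: committees admitted "A1" applicants
# from a score of 60 upward, but "X1" applicants only from 75 upward,
# despite identical score distributions.
past = (
    [{"postcode": "A1", "score": s, "admitted": s >= 60} for s in range(50, 91, 5)]
    + [{"postcode": "X1", "score": s, "admitted": s >= 75} for s in range(50, 91, 5)]
)

def learned_cutoffs(records):
    """'Learn' the lowest admitted score per postcode from past decisions."""
    cutoffs = {}
    for r in records:
        if r["admitted"]:
            pc = r["postcode"]
            cutoffs[pc] = min(cutoffs.get(pc, 101), r["score"])
    return cutoffs

cutoffs = learned_cutoffs(past)
# The rule never sees group membership directly, yet it faithfully
# reproduces the old biased thresholds through the postcode proxy.
print(cutoffs)  # {'A1': 60, 'X1': 75}
```

Nothing in `learned_cutoffs` is prejudiced; it simply recovers whatever pattern, fair or unfair, the historical labels contain.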

One way to reduce bias and tackle this issue is algorithmic auditing. Consider the following illustrative example. Based on historical transaction data, a targeted advertisement campaign targets 1,000 users, all of whom happen to be white, even though the algorithm only uses the history of click, usage, and purchase patterns to determine which users should be targeted. Is the algorithm being racist? At a fundamental level, of course not: there is nothing explicit in the algorithm that says it should target or avoid a particular set of people. It is the bias in the surrounding systems (judicial, educational, governmental, etc.) that leads to the production of the data which is then fed into the algorithm.
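A minimal sketch of what such an audit could look like: compare each group's share among the targeted users with its share in the eligible population, and flag disparity ratios far from 1.0. The function name, the attribute, and the example numbers are all assumptions for illustration.

```python
from collections import Counter

def audit_targeting(targeted, population, attribute):
    """Compare each group's share among targeted users with its share in
    the eligible population; a ratio far from 1.0 flags disparity."""
    t_counts = Counter(u[attribute] for u in targeted)
    p_counts = Counter(u[attribute] for u in population)
    report = {}
    for group, p_count in p_counts.items():
        t_share = t_counts.get(group, 0) / len(targeted)
        p_share = p_count / len(population)
        report[group] = {
            "targeted_share": t_share,
            "population_share": p_share,
            "disparity_ratio": t_share / p_share,
        }
    return report

# Hypothetical campaign: everyone targeted is white; population is 80/20.
population = [{"race": "white"}] * 800 + [{"race": "black"}] * 200
targeted = [{"race": "white"}] * 100
report = audit_targeting(targeted, population, "race")
# A disparity_ratio of 0.0 for one group flags its total exclusion.
```

In practice one would also run a significance test on the shares, since small campaigns can deviate from population proportions by chance alone.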

Why stop here? In the future, some people may want to introduce automation even into jury and judicial decisions. Just imagine the result if such a system used historical data to make its decisions without any tweaking or conditions. We may end up with an efficient computing judiciary that is as biased, if not more biased, than its human counterparts. Crime prediction and sentencing thus have the potential to become socially divisive issues. The flip side of racial bias in data is that one can also use big data to point out systematic bias in the system. To sum up the argument: while algorithms and data may not themselves be racist, they can faithfully reproduce, and even entrench, the biases of the social systems that generate the data.


  1. The word technology is used in many ways. Here I use it to mean a class of products or artifacts, and in other instances to refer to the production system which produces those artifacts.

    Winner’s comment that you cite about technologies doesn’t quite work with the overpass example. In that example, the technology, whether as the concrete construction process or as the resulting structures, does not discriminate; rather, the way it was used to produce structures serving a particular purpose was discriminatory. In contrast, some technologies can be discriminatory by their very nature: speech recognition discriminates against those who cannot speak, visual display of text against the blind, sound or speech output against the deaf. A class of products such as smartphones built with these technologies cannot be used by the populations just mentioned (yes, I know phones can be built to provide haptic feedback, but the principle remains), and if only these technologies are used, no phones can be built which serve these populations.

    Data collection by people (or systems) is inherently “discriminatory” because its collection proceeds based on filtering according to certain interests. It is also the case that data collected in one context and then used in another context can produce results that are discriminatory as per some of the examples that you cite.

    The “Why do” search results are based on what users have been searching for; the search system does not appear to be skewing the results. A similar case arises where correlations are used to produce certain results (search results and related ad placement). If the correlations are generated purely statistically, and the elements which are correlated are also identified purely statistically, i.e. via statistical machine learning, it does not seem reasonable to consider that discriminatory unless we begin attributing intelligence and intent to the algorithms themselves, and AI isn’t here yet. It is possible for machine learning algorithms to have bias built into them, but if they are purely data driven, it is the data selection that introduces bias, not the algorithms.

    BTW, I noticed that in the Claudine Gay search results, despite the presence of the background checking ad, most of the images in the search results were those of white women – hardly evidence of deliberate bias.

    All this furor over algorithms is going to lead to more intrusive algorithms and algorithm tweaking which in itself could become discriminatory. Tuning an algorithm to avoid certain results generated from purely statistical searches introduces deliberate bias, though perhaps in a currently socially acceptable way. But who is to say that such tuning cannot be used in a deliberately nefarious way?

    Another point is that in a large software system it is practically impossible to test all possible execution paths in a practically acceptable length of time. Deploying such a system in large scale use (which in itself is a way of testing at scale) will generate some unexpected results. People can live with that or tune the algorithm to remove those results which opens up the problem I describe in the preceding paragraph.

    People need to be a little more reasonable and a little less sensitive and exercise good judgment. If some results need to be eliminated, it would be better to practice independent, post hoc filtration than to build judgmental filtration into an algorithm itself. The former case is more easily reviewed and corrected, and possibly yields less intrusive algorithms, than the latter case.

  2. I’ve never thought about this issue, but as you suggest, algorithms could go beyond reflecting the racism which exists in society to actually perpetuating it into the future. And I think that means they could indeed exacerbate it. One thought that comes to mind is that it is not just that some results are racist, offensive, and hurtful of course. These search results will also lead people to information or opinions that are biased, which creates negative perceptions. The negative perceptions create racist attitudes within people, which then lead to more racist social realities. There’s a harmful cycle at play, much in the way racist attitudes lead to racist policies/systems, which lead to marginalization, which leads to negative perceptions of the marginalized, which leads to racist attitudes on the part of those who are not marginalized (and probably among the marginalized against themselves, i.e. internalization). I think this is what you’re getting at with your example of the overpasses. (Great illustration!)

    I wonder if the same logic, when used well, could be used to reverse racism/hate, not just in algorithms, but consequently in attitudes (social reality). But I’ll leave the ideas for how to make solutions like that work to the brilliant people like you 🙂

    I wonder if this is already happening in some ways: as attitudes shift, the internet actually accelerates both our level of acceptance and our level of racism, depending on which way attitudes are shifting (e.g. LGBT acceptance on the one hand and hatred toward Muslims on the other). The manipulation of algorithms could also be problematic because of the biases brought into that work by the people doing the manipulating.

    You’ve done some really excellent work here that opens up a lot of really important questions.

    1. What is even scarier is that people can infer other things about you that you do not post online. That is going to be the topic of another post.
