Here is Waldo: Anonymity in the Age of Big Data

by Muhammad Aurangzeb Ahmad

The television series Person of Interest posits the existence of a machine that can monitor every person’s daily activities and use this information to predict crimes before they happen. While such a system may be a long way off, a system that can at least identify any person may not be that far away. Anonymity used to be a private affair: if one wished to remain anonymous, all one had to do was lie low and limit one’s interactions with outsiders. It was easy to adopt pseudo-identities, and the nature of the internet facilitated this to an even greater extent. I should know, because I have been blogging as a Chinese Muslim for almost 10 years now. New waves of technology aided by Big Data, however, are changing the nature of anonymity, and ever greater levels of sophistication are needed to be truly anonymous.

Even in the ideal case where John Doe disengages from the digital world, i.e., does not own a smartphone, carries only cash, does not use any online service, etc., others can still leak information about him: his friends might put up pictures of him on social media platforms, post something about him on Facebook, geo-tag him, and so on. How easily one can locate a person or determine their likes and dislikes depends on how much information their family and friends leak about them. In short, you are only as anonymous as your most chatty friend.

Even when we think that we are not giving away any explicit information about ourselves, much can be inferred from the digital traces that we leave. The manner in which we shop online, respond to messages, play video games, etc. can reveal a lot about us even when we do not want to reveal anything. In our previous work we have observed that it is possible to predict a person’s gender, age, personality, marital status and even political affiliation just by studying how they play video games. This is just the tip of the iceberg; a case in point is Target’s data analytics, which were able to infer that a girl was pregnant even though she had managed to hide this from her parents.

The main takeaway is that we always reveal something about ourselves, even when we think that we are role-playing. In our current (unpublished) work we have even observed that it is possible to predict family relationships (parent, sibling, spouse, offspring, etc.) with a high degree of accuracy just by studying texting patterns, with no access to the content of the text messages.

Alternatively, let us consider the massive amounts of data that large corporations and major retailers like Walmart and Target are collecting about their customers. It is now quite easy to buy data about people cheaply from third-party sources, so that one knows not only what items a person is buying but also where they live, their age, gender and household structure. While some organizations have policies in place that restrict them from collecting and using certain types of data without our consent, not every organization imposes such restrictions on itself. It is also true that most people do not have time to read through 100 pages of EULA. Combine this with algorithms that can predict missing information about a person and one has a recipe for a system that can figure out what you are going to do next (within a particular domain) with a high level of accuracy.
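To make the idea of predicting missing information concrete, here is a minimal sketch of one common approach, nearest-neighbor voting. Everything in it is an assumption made for illustration: the purchase counts, the household labels and the function names are invented, and nothing here describes what any actual retailer does.

```python
from collections import Counter

# Entirely made-up records for illustration:
# (diapers, video games, garden tools) bought per month -> household type.
KNOWN = [
    ((5, 0, 1), "family with infant"),
    ((4, 1, 0), "family with infant"),
    ((0, 6, 0), "single adult"),
    ((0, 5, 1), "single adult"),
    ((0, 0, 4), "retired couple"),
    ((1, 0, 5), "retired couple"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two purchase vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def infer_missing(features, known=KNOWN, k=3):
    """Guess an unknown attribute by majority vote among the k most
    similar customers whose attribute is already known."""
    nearest = sorted(known, key=lambda rec: squared_distance(features, rec[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# A customer who never told anyone about their household:
print(infer_missing((4, 0, 0)))
```

The point of the sketch is only that the customer never has to state the missing attribute; people who shop like them have already stated it on their behalf.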

But what does this mean for us as individuals and for society as a whole? It will become increasingly easy to answer the question: Where is Waldo? Not only that, but one could even say, here is Waldo, along with the list of places that he has been in the last 3 years, his eating habits and his likely future purchases. Before we start chanting alarmist slogans about a dystopian post-privacy era, we should also look at the countervailing forces in the privacy debates. Large corporations also have incentives not to violate their customers’ privacy, in order to maintain a certain level of trust with them. Apple’s stance of non-cooperation with the government on issues related to customer privacy is a case in point.

While one should be vigilant, one should not be alarmist. There was an uproar many years ago when Google announced that they would be adding a search feature to Gmail, and it turned out that all the privacy doomsday predictions were unfounded. Some amount of data collection is necessary to offer services like recommendations, whether in music, movies, food, etc. Algorithms can only be as good as the data that is fed to them. Thus, one should not rush to the conclusion that anonymity is over.

The flipside of patterns extracted from Big Data is that these patterns also give one a ready-made recipe for behaving in a certain way and remaining anonymous. Big Data also makes it easier to fake certain personality traits. Even with very crude profile stuffing, Ashley Madison was able to lure thousands of men into buying memberships. This leads us to consider another type of risk to anonymity: data breaches. As the fallout from the Ashley Madison leak suggests, one’s indiscretions on the Internet have a way of following one into the offline world with a single torrent dump. More recently, a service has emerged which uses Tinder’s API to notify its paying customers if their partner is cheating on them. These cases should not be shocking or surprising; after all, information in the wild can rarely be tamed.

If today’s de-anonymization algorithms look impressive, then the future is even more fascinating. Google’s deep learning system can already identify the location of almost any picture with a very high level of accuracy, Facebook’s facial recognition system can already beat humans, gait identification algorithms can identify a person by the way they walk, recovering what was typed from the sound of typing is already an old technology, and the list goes on. Each of these technologies is impressive in its own right, but taken together they have the hallmarks of a system that can de-anonymize almost any person on the planet. If we think that it is bad enough that governments and large corporations have access to these types of technologies, wait till such systems become open source and accessible in the palm of your hand. It is certainly not the stuff of Singularity Sky, but it does open up vistas of a brave new world for which most of us may not have the time to be ready.