Personal Data for Public Good

by Libby Bishop

In a speech in St Petersburg, Alexander von Humboldt called for the first large-scale research project across Russia to investigate the effects of deforestation on climate. The year was 1829. This project embodied the key features of Humboldt's methods: empirical, comparative and collaborative. He was part of the Republic of Letters, an international community of scholars that exchanged scientific ideas. Earlier, he had made a five-year voyage to the Americas where he identified 2000 new species, discovered the magnetic equator, and explored live volcanoes. Hungry for yet more knowledge, he wanted comparative data from the East as well. Despite being nearly 60 years old, Humboldt travelled 10,000 miles in six months into Russia collecting data on the climate effects of deforestation, irrigation, and silver smelting. He was even expert at data visualisation, as his Naturgemälde (“painting of nature”) demonstrates by showing plant names and zones on Chimborazo, a peak in the Andes that he climbed. This diagram exemplified his approach: data-driven, fusing science and art. Doggedly, he sought to unify and connect, resisting the tendency of Enlightenment science to divide and classify (Wulf 2015). By sharing his data in the belief that knowledge should advance public, not private, interests, he was ahead of his time.

Humboldt would have revelled in the volume and variety of big data available to us today. He faced dangers from jaguars, earthquakes, and altitude sickness; our challenges are of another kind. Humboldt collected data primarily about natural phenomena, and thus he did not have to worry about privacy and risks to research subjects from disclosing their data. Today, much of the data needed to further research in key areas, such as health, is about people. Protecting privacy is indeed a hard problem, but we can look to Humboldt's courage and ingenuity as a model for how to approach our data challenges. As he would have done, we must find and promote ways to deploy personal data for public good while protecting privacy. In Europe, this work will go forward under recently announced rules governing the protection of personal data.

The General Data Protection Regulation

On 14 April 2016, the European Parliament approved the nearly final text of the General Data Protection Regulation (GDPR), replacing the existing Data Protection Directive. The GDPR is intended to update, harmonise and strengthen data protection law across Europe and beyond. The impact will be global because entities outside the EU targeting EU consumers will be subject to this new Regulation. Moreover, unlike the previous Data Protection Directive, the GDPR takes effect “as is,” not mediated by national laws of implementation.

Earlier drafts of the Regulation had language that would have dramatically curtailed scientific research, but fortunately, this has been modified. In the current version, there are limitations on using data for new purposes different from those for which it was collected, but, crucially, there are exemptions for “scientific and historical research.” There are other changes that will affect research. The phrase “sensitive personal data” has been replaced with “special categories” of data (e.g., race, ethnicity, religion). Any processing that might reveal these categories is now also covered. Consent remains one key mechanism that can authorise the processing of personal data, so long as that consent is freely given, informed, specific and unambiguous. Consent for future research will be acceptable if it complies with “recognised ethical standards for scientific research.”

The GDPR protects data subjects' rights

The GDPR retains and may even strengthen the rights of data subjects regarding their personal data. These rights remain broadly the same (to access, erasure, etc.) but are more detailed, and stronger restrictions are placed on data controllers for some uses. Subjects must be informed about profiling and any consequences of being profiled, and can object to certain decisions based on automated processing. Subjects can even demand a review by a human if they have been harmed by automated decisions. Currently, controllers' “legitimate interests” can justify processing data for purposes different from the original ones. Under the GDPR, if subjects “do not reasonably expect further processing”, then their rights could prevail over the rights of data controllers. Controllers must conduct data protection impact assessments—similar in spirit to environmental impact assessments—if processing entails a “high risk” to the rights of individuals.

The GDPR – a compromise, or compromised?

While the Regulation trumpets data subjects' fundamental rights, these rights are not absolute. “The right to the protection of personal data is not an absolute right; it must be considered in relation to its function in society and be balanced with other fundamental rights, in accordance with the principle of proportionality.” By harmonising national laws, one of the GDPR's main objectives is to promote economic growth, especially by facilitating “the free flow of personal data within the internal market.” To this end, it adopts an individualistic view of data, namely, by claiming that privacy and trust can be achieved by ensuring that “natural persons should have control of their own personal data.” This is a complex debate (Bishop 2016), but briefly, much personal data, such as social network and genetic data, also discloses the identities of others, making it impossible to draw an unambiguous line between personal and non-personal data (Barocas and Nissenbaum 2014).

An essential alternative

Even if the GDPR does not go as far as some privacy advocates might have wished, it warrants support because it grounds data protection in a fundamental right to privacy, which in turn is based on respect for human dignity. This differentiates it—profoundly—from data protection in the U.S. In fact, it was on the grounds that U.S. law compromises “the essence of the fundamental right to respect for private life” that the Safe Harbour agreement, which enabled U.S.-European data transfers, was invalidated in 2015. The U.S. approach is piecemeal, with different laws for medical, educational, and other data, and relies on industry self-regulation (DeCew 2013). In sharp contrast, the GDPR can impose fines up to 4% of a company's global revenues for some breaches. Even more alarming, there are growing calls in the U.S. to treat personal data as a commodity that individuals ought to be free to sell. Given recent events, we should be sceptical of yet more financialisation as a solution to complex social and economic challenges (Morozov 2015). John Muir, an admirer of Humboldt, warned: “Nothing dollarable is safe, however guarded.”

Innovative methods for sharing data

Biobanks, genetic databases, and data repositories are forging legal, technical and ethical systems for responsibly sharing personal data. The UK Biobank developed an Ethics and Governance Framework with extensive public consultation. The Personal Genome Project: UK takes a different approach, acknowledging that privacy and anonymity cannot be guaranteed. It asks participants to volunteer to open their genetic data based on full knowledge of possible risks. The UK Data Service provides five protocols for people, projects, settings, outputs, and data—the Five Safes—to enable safe and secure access to confidential and sensitive microdata.

Humboldt resisted those who wanted to use science solely for private financial gain. When others sat on their data until their deaths, he shared his. When political leaders commanded his expeditions to seek diamonds, he collected deforestation data. Our challenge is to share his vision—to see data, even personal data, for how it can enhance the public good in crucial areas of health, agriculture, the environment, and urban planning. To achieve this, we must also protect privacy, with vigilance. Failure to do so will erode the already declining trust people have in the institutions handling their data and provoke a backlash against using data for research, even research that would benefit humanity (Royal Statistical Society 2014). This emerging diverse ecosystem of techniques to enable the use of private data for public good honours and extends Humboldt's legacy.

* Quotations are from the GDPR text as of 14 April 2016.