John McQuaid in Undark:
The problems originate in the mundane practices of computer coding. Machine learning reveals patterns in data — such algorithms learn, for example, how to identify common features of “cupness” from processing many, many pictures of cups. The approach is increasingly used by businesses and government agencies; in addition to facial recognition systems, it’s behind Facebook’s news feed and targeting of advertisements, digital assistants such as Siri and Alexa, guidance systems for autonomous vehicles, some medical diagnoses, and more.
To learn, algorithms need massive datasets. But as the applications grow more varied and complex, the rising demand for data is exacting growing social costs. Some of those problems are well known, such as the demographic skew in many facial recognition datasets toward White, male subjects — a bias passed on to the algorithms.
But there is a broader data crisis in machine learning. As machine learning datasets expand, they increasingly infringe on privacy by using images, text, or other material scraped without user consent; recycle toxic content; and are the source of other, more unpredictable biases and misjudgments.