Voice in the Machine – Artificial Intelligence Unravels the Secrets of Language

by Ed Simon

“But then again, what has the whale to say?” wondered Ishmael in Herman Melville’s 1851 novel Moby-Dick, “therefore the whale has no voice.” But this isn’t accurate. Sperm whales – of which Melville’s titular white whale is one – produce an intricate series of clicks and bellows that, if not language per se, certainly seem like communication, albeit in a manner foreign to human experience. Furthermore, not only do sperm whales, humpback whales, blue whales, dolphins, and other members of the cetacean family make noise, they are capable of imitative sounds, of repeating complex strings of noises – songs and clicks of varying pitch and duration – back to each other. That’s an ability that, to varying degrees, is not just the purview of whales, but of seals and birds, bats and humans (but notably not some of our closest primate cousins, who, though capable of understanding us, are unable to vocally repeat what we’ve said). Now, a landmark study from research teams at Carnegie Mellon University and the University of California, Berkeley has deployed complex machine learning technologies for the first time to untangle the genetic basis of language acquisition across tremendously varied types of animals.

The Bill and Melinda Gates Center at CMU doesn’t much feel like Ishmael’s ship, the Pequod, but it turns out to be the perfect place to discuss, if not what the whale has to say, at least how the whale is saying it (and how bats, seals, and people are saying what they have to as well). Appearing more like a tech campus in Palo Alto than one in Pittsburgh, with an impressive central spiraling staircase whose shape is equal parts Frank Lloyd Wright and DNA’s double helix, the Gates Center is where I met team leader and professor of computational biology Dr. Andreas Pfenning, the lead author of the study published in Science, the flagship journal of the American Association for the Advancement of Science. Pfenning explained to me how, beyond traditional anatomical and behavioral approaches, the CMU and Berkeley study “opens up studying animal communication from a genetic perspective.”

Computational biology, the growing interdisciplinary field in which Pfenning is a prominent researcher, applies digital technologies to analyze biological data in ways that would have been impossible even a few years ago. What Pfenning’s team in Pittsburgh and Dr. Michael Yartsev’s team in California did was apply artificial intelligence to analyze – and in some cases predict function across – the genomes of 222 species, including varieties of bats, whales, and primates, all of which are capable of imitating complex sounds. They discovered that even where the genomes are radically different, the portions involved in vocal communication are deployed in similar ways – an example of convergent evolution.
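To give a rough sense of what such a comparative analysis can look like – and this is only a hypothetical illustration, not the published pipeline – the problem can be framed as classification: score candidate genomic regions in each species, label each species by whether it can imitate sound, and ask which regions best separate the two groups. Everything in the toy sketch below (the random “activity” scores, the labels, the region count) is invented.

```python
# Hypothetical sketch, not the study's actual method: associate genomic
# features with a vocal-learning phenotype across species and ask which
# features carry the signal. All data here is randomly generated.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Toy data: 222 species x 500 candidate regulatory regions, each scored for
# predicted activity in a speech-related brain region (random values here).
n_species, n_regions = 222, 500
activity = rng.normal(size=(n_species, n_regions))

# Toy labels: 1 = vocal imitator (e.g. some whales, bats, humans), 0 = not.
vocal_learner = rng.integers(0, 2, size=n_species)

# Fit a classifier that tries to predict vocal learning from regulatory activity.
model = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(model, activity, vocal_learner, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")

# Regions whose activity best separates vocal learners from non-learners are
# candidates for convergent use across distantly related lineages.
model.fit(activity, vocal_learner)
top_regions = np.argsort(model.feature_importances_)[::-1][:10]
print("top candidate regions:", top_regions)
```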

Whales and bats are obviously far apart on the evolutionary tree, and they vocalize for a variety of reasons, whether social cohesion, echolocation, or communication. But despite these variations in why they make such sounds, the segments of their genomes associated with these abilities function in the same way. The implications of this discovery range from eventually understanding in full how language develops in human beings to elucidating conditions related to language acquisition, including autism. For example, Pfenning’s CMU-based tech startup Snail Biosciences is beginning to adapt this research to aid in the development of targeted gene therapies. Yet one of the most innovative aspects of the study wasn’t just what the teams uncovered, but how they uncovered it.

Pfenning noted that machine learning was used to analyze massive amounts of data – not just genetic, but also behavioral, cellular, and neural information about the individual species in question. Most impressively, the teams were able to deploy artificial intelligence to predict the outcomes of experiments that would typically take years to conduct in animals like whales, without having to use the actual remains of such a creature at all. There is, as Pfenning said, an ethical component to being able to “dissect a simulated brain”; he noted that beyond the logistical difficulty of experimenting on whales, this new computational method also allows researchers to access detailed features of the biology of some endangered species for the first time. A good thing, too, for across that chasm of difference between bats and seals, whales and humans, the unlikeliest new intelligence of computers has been able to discover more that’s similar than might once have been supposed – though even Melville could recognize in the whale a kindred creature “both ponderous and profound,” adding of the animal, “Oh! happy that the world is such an excellent listener!” Talker too.
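The idea of “dissecting a simulated brain” rests on models that predict biological measurements from sequence alone. As a purely illustrative sketch – not the study’s actual model, and with an invented placeholder sequence – a small convolutional network can map a stretch of DNA to a predicted regulatory-activity score; once trained on species where measurements exist, such a model can in principle be run on sequences from species that are impractical to study directly.

```python
# Hypothetical sketch, not the published model: a tiny convolutional network
# that maps a DNA sequence to a predicted regulatory-activity score.
import torch
import torch.nn as nn

BASES = "ACGT"

def one_hot(seq: str) -> torch.Tensor:
    """Encode a DNA string as a 4 x length tensor."""
    idx = torch.tensor([BASES.index(b) for b in seq])
    return torch.nn.functional.one_hot(idx, num_classes=4).float().T

class ActivityPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(4, 32, kernel_size=8), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1), nn.Flatten(),
            nn.Linear(32, 1),  # predicted activity score
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Score an invented placeholder sequence; a real use would load a trained
# model and a candidate enhancer from a species of interest.
model = ActivityPredictor()
seq = "ACGT" * 50  # placeholder 200-bp sequence
score = model(one_hot(seq).unsqueeze(0))
print(f"predicted activity (untrained toy model): {score.item():.3f}")
```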

***

Ed Simon is the editor of Belt Magazine, an emeritus staff writer for The Millions, and a columnist at 3 Quarks Daily. He is the author of over a dozen books; his upcoming title Relic will be released by Bloomsbury Academic in January as part of their Object Lessons series, while Devil’s Contract: The History of the Faustian Bargain will be released by Melville House in July of 2024.