by Espen Sommer Eide
There is a tingling sense of presence in the room when I finally press play on the generated audio file and hear my trained deep-learning neural net try to formulate new, never-before-spoken sentences in a language whose last fluent speaker passed away in 2003. When Edison invented the phonograph, it was soon conceived not primarily as a means to play music, but as a way to hear the voices of the dead. The voices recorded on the phonograph were experienced as sounds without bodies, as spirits in space. Listening intently, at first I can hear only static noise, but deep inside it various spectral shapes and pulses begin to make themselves present. I think this is what it must have felt like for Edison when he played back his first ghost-like recording of a human voice.
Two early versions of the experiment:
Recently there have been big breakthroughs in the field of artificial intelligence and machine learning. In just a couple of years, the technology has found novel uses in everything from self-driving cars and medical image processing to automatic translation, speech recognition and natural language processing. Companies such as Google, Facebook, Apple, Amazon, Microsoft and the Chinese firm Baidu are currently competing to hire the best minds in the field, clearing out whole computer science departments at universities around the globe in the process.
One of the technologies driving this revolution goes by names such as deep learning and deep neural networks: in short, a form of computing inspired by the brain and its billions of neurons working in parallel to interpret and act on their surroundings. What has allowed this old idea of neural networks to make such a comeback is the recent availability of big data – the large data sets used to train the networks – together with the speed of parallel processing in modern GPU chipsets.
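For the curious, here is a minimal sketch of what "training a network" amounts to in practice, written with the PyTorch library purely for illustration – it is not the code used in the experiments described here, and the data is random placeholder data standing in for a real training set. The idea is simply layers of artificial neurons whose weights are repeatedly nudged to reduce an error, preferably on a GPU.

```python
import torch
from torch import nn

# A tiny network of artificial "neurons": each Linear layer is a bank of
# weighted sums, with a nonlinearity in between.
model = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

# The parallelism that makes this practical: run on a GPU when available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Random placeholder data standing in for a real (and much larger) data set.
inputs = torch.randn(1024, 16, device=device)
targets = torch.randn(1024, 1, device=device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)  # how wrong is the network right now?
    loss.backward()                         # gradients for every weight, in parallel
    optimizer.step()                        # nudge the weights to reduce the error
```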
As an artist and electronic musician with a keen interest in language and computing, I came across an article published in the fall of 2016 in which a group of Google scientists had turned to the field of audio to try to improve artificial speech[1]. What triggered my imagination was not the fact that they had succeeded in making computer speech sound much more natural, but the weird by-products of trying the technology out on musical material and other sounds. I had to try this myself, and fearlessly installed the necessary software on one of Google's cloud-based computing engines to run the tests. My first experiments were with a collection of water-insect field recordings, and also with my own music, to see if it could learn to "sound" like tracks from my musical projects phonophani or alog (possibly putting me out of work in the process!).
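For readers who want to peek under the hood: as I understand the article, the model is autoregressive – a stack of dilated causal convolutions that learns to predict an audio recording one sample at a time. The sketch below is only my own toy approximation of that idea in PyTorch; the class name TinyWaveNetLikeModel and all the sizes are placeholders of mine, and the real models are vastly larger.

```python
import torch
from torch import nn

class CausalConv1d(nn.Conv1d):
    """1-D convolution that only sees the present and past samples."""
    def __init__(self, channels, dilation):
        super().__init__(channels, channels, kernel_size=2,
                         padding=dilation, dilation=dilation)

    def forward(self, x):
        # Trim the extra right-hand padding so no future samples leak in.
        return super().forward(x)[..., :x.shape[-1]]

class TinyWaveNetLikeModel(nn.Module):
    """Toy stack of dilated causal convolutions over 8-bit quantized audio."""
    def __init__(self, channels=64, levels=256, dilations=(1, 2, 4, 8, 16, 32)):
        super().__init__()
        self.embed = nn.Embedding(levels, channels)          # sample value -> vector
        self.layers = nn.ModuleList(CausalConv1d(channels, d) for d in dilations)
        self.head = nn.Conv1d(channels, levels, kernel_size=1)

    def forward(self, samples):                  # samples: (batch, time) ints in [0, 255]
        x = self.embed(samples).transpose(1, 2)  # -> (batch, channels, time)
        for layer in self.layers:
            x = x + torch.tanh(layer(x))         # residual connection around each layer
        return self.head(x)                      # logits for the *next* sample at each step

# Training: shift the recording by one sample and minimize cross-entropy.
model = TinyWaveNetLikeModel()
clip = torch.randint(0, 256, (1, 16000))         # placeholder for one second of real audio
logits = model(clip[:, :-1])
loss = nn.functional.cross_entropy(logits, clip[:, 1:])
```

Generating new audio then means feeding the model its own predictions back in, one sample at a time.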