The Stealthy Sounds of Cocktail Parties

by Gautam Pemmaraju

Tuning of the World

A month or so ago, at a dinner party, my ears locked onto a conversation between two men a reasonable distance away. Drinks and hors d'oeuvres were still being served; glasses clinked in cheer, and the chinaware made all sorts of bright, transient sounds as pieces were picked up, put down, or met with cutlery. There was soft, unobtrusive music serving as a unifying background, festooned every so often with exclamations, polite laughter, and other speech. On occasion a chair would be moved, possibly the only abrasive sound in this genteel soundscape. Ice cubes were dropped gingerly into my drink, though not gingerly enough to avoid making the slightest of sounds (or did I imagine that?). As snatches of sentences reached me from various sources, I was able to telescopically isolate the conversation that I had quite unintentionally, unwittingly, chosen to eavesdrop upon. They spoke of a fairly prominent public figure and his current whereabouts. Perhaps it was his name that had drawn my attention. His libertine ways, peccadilloes so to speak, were the topic of conversation; in particular, there was mention of some frolicking in a bathtub filled with champagne that had apparently made quite a, well, splash. Some boisterous laughter ensued, filling the room with its force. And just like that, the sonic space had changed, and as I gathered my wits, my ears shifted focus seamlessly to a friend who sat down next to me. “Cheers!” she exclaimed and we clinked glasses. It seems now to me a sensory rupture of sorts: my eavesdropping lasted barely a minute, and yet it seemed oddly long. That it was a relatively ‘hi-fi’ party, marked by a more favourable signal-to-noise ratio, helped matters, for it was easier to pick out individual, pellucid sounds that might otherwise have been masked in a different setting. In hindsight though, it all seems now a trick of the mind, a temporal illusion experienced by a disembodied double and flagged by auditory cues. Much like cinema.

In Radio and TV/Film production, crowd sounds of innumerable varieties have long been created, and in film post-production the recording of these sounds is usually handled as part of ADR (Automated Dialogue Replacement) sessions. Specific recordings would be created for a variety of settings: a small group in the foyer of a building; coffee shops, busy restaurants, or provincial bars; courthouses or train-station lobbies; a hectic emergency room of a city hospital; the animated sounds of a basketball court; a medium-sized wedding gathering; children on a school trip to the zoo; the sharp, snippy ambience of a hair salon; not to mention cocktail parties, small, medium, big or any iteration thereof. Whatever the setting, the appropriate background sound would have been created. These sound effect recordings included murmur, the indistinguishable chatter of people in the background, often ‘acted’ out by a group of actors. The sound effect (and the group) came to be generically known as walla during the radio era, on account of the fact that actors were made to repeat the word “walla” to create the illusion of chatter. The indistinct nature of this background ambience was of specific value in radio: unlike visual media, where attention can be drawn to the foreground character through close-ups, radio needed the voices of the principal characters to remain in the fore, distinct and clearly comprehensible. So no really identifiable speech elements were placed in the sonic background. Interestingly, and perhaps consistent with the cliché of British silliness, in the UK the word ‘rhubarb’ was used in place of ‘walla’ for actors to create background murmur (Rhabarber in Germany, rabarber in the Netherlands).
There is even a meta-narrative screwball tribute to this unique sonic craft in the form of a short film by the legendary British writer and director Eric Sykes in which every bit of dialogue is the word “rhubarb.” Other phrases used to create crowd murmurs include “carrots and peas” and “watermelon, cantaloupe.” These days, however, precise scripted dialogue is performed by actors for background sound, given the technological advancements at all levels, from production to projection. Still, it is singularly curious and thought-provoking that we are able to rein in the absurd in such utilitarian ways. The absurdist artifices we contrive are put to work. But perhaps in our conjuring there exist conspiratorial gaps between what we employ and what we discard. To my mind, there is intriguing phenomenological depth to the sounds we produce in a controlled studio setting in order to produce a simulacrum of realistic background ambience, the purpose of which is to ‘fill in’ the voids of a fabricated sonic space; to aurally bind all the words that are foregrounded in this etheric auditory realm. It is tempting to be drawn towards the spectral nature of disparate, phantom sounds, floating about asynchronously, mostly unheard. Through the multi-channel surround sound systems of movie halls, these desolate fragments inevitably pass into our ‘real’ world, only to slowly decay, leaving phantasmal traces as they fade to an inaudible death.

As always, my takeaway from the cocktail party, aside from the outstanding food, the lovely company and the mildly entertaining gossip, was sheer amazement at how we can very precisely, and selectively, choose to focus on one sound source in a setting of many others. This is referred to as the Cocktail Party Effect. It is a function of ‘binaural’ hearing, whereby our ears tackle multiple sonic stimuli individually and collectively. We are then able to localize the source in focus, pinpointing its direction and position very accurately. We do so primarily by employing two fundamental cues: the difference in amplitude (or loudness, intensity) of the sound coming to each ear, and the difference in the time at which the sound reaches each ear. The ‘ear in shadow’ receives the sound signal at a lower intensity and slightly later than the ear receiving the sound directly, and from these differences we can locate the sound source. A third, fascinating way we are able to localize sound is by factoring in the frequency variations of the sound in each ear. The shape of our heads and our external ears, the pinnae, modify the frequency profile of the sound collected in them according to the origin of the sound. Some frequencies of the sound signal are boosted, others attenuated by each ear. So each pinna acts as an individual equaliser.
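For a rough sense of how small the timing cue is, here is a minimal sketch in Python. It is not drawn from any of the work discussed in this essay; it uses Woodworth's textbook spherical-head approximation, and the head radius, the speed of sound, and the function name are all assumptions chosen for illustration.

```python
import math

# Assumed values, not taken from the essay.
HEAD_RADIUS_M = 0.0875        # a commonly used average head radius
SPEED_OF_SOUND_M_S = 343.0    # speed of sound in air at about 20 degrees C


def interaural_time_difference(azimuth_deg: float) -> float:
    """Approximate the arrival-time gap between the two ears, in seconds.

    Uses Woodworth's spherical-head formula, ITD = (a / c) * (theta + sin(theta)),
    for a distant source at an azimuth between 0 (straight ahead) and
    90 degrees (directly to one side).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("this approximation is only sketched for 0 to 90 degrees")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90):
        itd_us = interaural_time_difference(azimuth) * 1e6
        print(f"{azimuth:>2} degrees: {itd_us:.0f} microseconds")
```

Under these assumptions the gap is zero for a sound straight ahead and stays well under a millisecond even for a sound directly to one side, which gives some idea of how fine a difference our hearing resolves.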

The British electronics engineer and cognitive scientist Colin Cherry first proposed the Cocktail Party Problem in 1953. The study of the problem seems to have arisen from difficulties faced by air traffic controllers in processing the various voices of pilots coming through one loudspeaker. Cherry's seminal experiments concerned a listener's attention and the ability to separate a set of sound inputs. His tests involved ‘shadowing’ of sound recordings and what later came to be termed ‘dichotic listening’. In the first set of experiments, two sound streams were played to the listener, who was tasked with repeating back the content of one of them, firstly by speaking it aloud and, in a variant, by writing the message down. In a second set of experiments, two different sound streams were played into the two ears of the listener through headphones. The first instance proved to be more difficult: listeners were able to recall only certain phrases or parts of the message, despite repeated auditions. Writing it down proved to be easier, and listeners were able to recall the shadowed stream better. In the second instance, listeners were more successful at separating the shadowed stream from the unattended one, easily choosing one over the other, and switching between them as well. In a further variant, the language of the unattended stream was changed to German but listeners failed to notice. However, when the voice changed from male to female it caught the attention of listeners, and some of them noticed something ‘queer’ when the recording was reversed (tape playing backward). Broadly, Cherry's work demonstrated that many variables come into play when listeners separate a chosen sound from background noise, from gender, direction and pitch to the rate of speech.
But as Barry Arons indicates, “the broad statistical properties of the signal in the rejected ear were recognised, but details such as language, individual words, and semantic content were unnoticed.” Discussing the monotonous voice in the unattended ear, Arons, quoting D. A. Norman (Memory and Attention, 1976), points out that aside from being aware of some speech, the listener usually does not recall any more details:

This is what might be called the “what-did-you-say” phenomenon. Often when someone to whom you were not “listening” asks you a question, your first reaction is to say, “uh, what did you say?” But then, before the question is repeated, you can dredge it up yourself from memory. When this experiment was actually tried in my laboratory, the results agreed with our intuitions: there is a temporary memory for items to which we are not attending, but as Cherry, James, and Moray point out, no long-term memory.

The underlying principle behind this pretty remarkable ability that we have, clearly more than just a party trick, is Selective Auditory Attention: basically, selective hearing. It remains a somewhat mystifying phenomenon. We may be able to filter sounds with great sophistication, but scientists have not yet been able to pin down exactly how we do it, and that makes it an exciting area of research. The area has wide scope and there is multidisciplinary interest in it, from neurobiology, physiology, cognitive psychology and biophysics to computer science and engineering. Recent research in the area is said to be the ‘first step’ towards building an auditory brain-computer interface (BCI) to aid those in wheelchairs, amongst other applications. Over the decades since Cherry's pioneering research, a number of theories have emerged and several researchers have contributed to the area. This excellent short video summarises the main theories of selective attention.

The second name in the video is particularly interesting due to some of the more publicly accessible work of the researcher, the British cognitive psychologist Diana Deutsch. An expert in the psychology of music, Deutsch has not just conducted and published scholarly work on auditory illusions (see this video), but has also published sound recordings in the area. In a fascinating description of one such illusion, Deutsch writes on how the mind plays tricks on us. A sequence of two alternating tones spaced an octave apart is presented to both ears simultaneously. But while one ear gets the low tone, the other receives the higher one, and vice versa. So it is essentially a “single continuous two-tone chord” that is presented to the listener through headphones. As Deutsch writes, “One can easily imagine how this pattern should sound if perceived correctly. However, such a percept is very rare. Instead, a variety of paradoxical illusions are obtained.” The listener instead hears “a single intermittent high tone in one ear, which alternates with a single intermittent low tone in the other ear.” (This sort of psychoacoustic oddity is reminiscent of the McGurk-MacDonald Effect and binaural beats, about which I have written before.) Intriguingly, for reasons other than perceptual and cognitive science, Deutsch writes at the outset:

…powerful illusions may readily occur when multiple streams of sound emanate from different regions of space.

She points out further that sounds in our natural environment are “subject to considerable and complex changes before they reach our eardrums.” Consequently, we have developed “special purpose mechanisms” that help us grasp these effects and deploy them to our advantage.

It is this idea of illusion, of trickery, and tomfoolery, that is of great interest to me. It challenges the traditional hierarchies of the perceived ‘real’, and feeds our febrile imagination. This is a most beguiling liminal space in between the real and the imagined: the realm of sonic phantasmagoria. What indeed do we hear there?

The Canadian composer, writer and thinker R. Murray Schafer's project of acoustic ecology, of recovering the “music of the environment”, is a key artistic intervention in our understanding of the sounds that abound, those that surround us and those we create. How do we conduct this relationship; how do we ease the frictions between the two? There is a ‘narcotic’ effect to modern sounds, Schafer writes; the musique concrète (see this fun video) has led to a ‘schizophonia’, a rupture between an “original sound and its electroacoustic transmission or reproduction.” The sound and its maker have been ontologically separated. And it is the fear of silence, and indeed of the ultimate silence, death, that propels man “to surround himself with sounds in order to nourish his fantasy of perpetual life.”

In an interview with me last year, the British composer and writer David Toop spoke of the auditory and visual hallucinations that his mother suffered not long before she passed. She apparently heard children singing and saw the floor turn to water. When asked by the doctor if she was frightened by the visions, she replied that she could accept them if only she knew they were not real. “This”, Toop said, “is how we listen to sound.”

Perhaps amidst the sounds of cocktail parties, in between the mirth and schadenfreude, there exist stealthy ghosts that speak of the tragic limits to our earthly lives.

I end here with these words from David Toop's book, Sinister Resonance:

Sound is a haunting, a ghost, a presence whose location in space is ambiguous and whose existence in time is transitory. The intangibility of sound is uncanny— a phenomenal presence both in the head at its point of source and all around, and never entirely distinct from auditory hallucinations.