“This work can only be useful in the domains of blindness, situational impairment, and accessibility, in that it may be possible to convey limited Web page information spatially, dynamically, and with a high degree of comprehension, up to seven (or nine) times faster, thanks to our ability to comprehend highly parallel speech.” One of my ‘A History of HCI in 15 Papers’ series.
It is unlikely that Colin Cherry realised the significance the community would place on the small five-page paper he sent to the Acoustical Society of America in 1953. Given its tone and disjointed nature, he probably did not anticipate that it would become a landmark paper within the auditory display community, one specifically defining the problem area of multi-talker display systems. Indeed, on first reading the paper seems to be nothing more than a collection of loosely related experiments in auditory cognition, a personal attempt to understand how we perceive sound.
The focus of the paper centres on section 2, “the separation of two simultaneously spoken messages”, in which Cherry first poses the question “how do we recognise what one person is saying when others are speaking at the same time (the ‘cocktail party problem’)?”. For me the important aspect of this paper is not the results, interesting though they are, but the isolation of a question that has since defined an entire area.
In brief, Cherry found that when presented with two simultaneous speech recordings, listeners can repeat one specific narrative word by word, or phrase by phrase, reasonably accurately. When a message is composed of simple clichés strung together, a listener can identify an entire cliché after hearing only a few words of it, although separating the messages proved impossible: listeners picked roughly equal numbers of phrases from each simultaneously presented voice. When a different narrative was played to each ear, listeners found it easy to attend to either one separately, but language in the unattended ear was not comprehended, with some caveats: all speech sounds were identified as speech rather than as auditory tones; changes of voice between male and female were always noticed, as were changes of tone; reversed speech was recognised as having something wrong with it; yet for normal speech, listeners could not identify a single word or phrase, or even decide whether the language was English. When the same message was played to both ears with a variable time delay, listeners realised, as the delay decreased to the order of milliseconds, that both ears were receiving the same message, even while concentrating on just one ear. Finally, when one message was periodically switched between the ears of a listener attending to just one ear, the listener's repetition approached 100% accuracy as the switching became faster.
What does all this mean? Well, we now have between seven and nine people multi-talking intelligibly at the same time, distinguished by spatial location and voicing. This work can only be useful in the domains of blindness, situational impairment, and accessibility, in that it may be possible to convey limited Web page information spatially, dynamically, and with a high degree of comprehension, up to seven (or nine) times faster, thanks to our ability to comprehend highly parallel speech.
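To make the idea of a spatial multi-talker display concrete, here is a minimal sketch, in Python with NumPy, of one way such a display could assign each simultaneous talker a distinct lateral position using constant-power stereo panning. This is my own illustrative assumption, not Cherry's method nor the exact technique of Brungart and Simpson's seven-talker display (which used richer head-related spatialisation); the `pan_talkers` function and the sine-tone "talkers" are hypothetical stand-ins.

```python
import numpy as np

def pan_talkers(signals):
    """Mix N mono signals into one stereo stream, spreading the
    talkers evenly from hard left to hard right so a listener can
    exploit spatial separation (the cocktail party effect)."""
    n = len(signals)
    length = max(len(s) for s in signals)
    stereo = np.zeros((length, 2))
    for i, s in enumerate(signals):
        # pan position in [0, 1]: 0 = fully left, 1 = fully right
        p = 0.5 if n == 1 else i / (n - 1)
        theta = p * np.pi / 2  # constant-power pan law
        stereo[: len(s), 0] += np.cos(theta) * s  # left channel gain
        stereo[: len(s), 1] += np.sin(theta) * s  # right channel gain
    return stereo

# Seven synthetic "talkers" (sine tones standing in for speech),
# one second each at a 16 kHz sample rate.
t = np.linspace(0, 1, 16_000, endpoint=False)
talkers = [np.sin(2 * np.pi * (200 + 50 * k) * t) for k in range(7)]
mix = pan_talkers(talkers)  # shape (16000, 2): a stereo buffer
```

In a real display each stream would be recorded speech and the panning would be replaced by head-related transfer functions, but the principle is the same: distinct spatial positions per talker are what make the parallel streams separable.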
- Cherry, E. C. (1953). Some Experiments on the Recognition of Speech, with One and with Two Ears. The Journal of the Acoustical Society of America, 25(5), 975–979. DOI: 10.1121/1.1907229
- Brungart, D., & Simpson, B. (2005). Optimizing the Spatial Configuration of a Seven-Talker Speech Display. ACM Transactions on Applied Perception, 2(4), 430–436. DOI: 10.1145/1101530.1101538
- Cherry fostered research in the technical and theoretical aspects of telecommunication, including digital signal processing, coding theory, and global communication. His book On Human Communication (1957) was very influential at the time. He focused on auditory attention, specifically the cocktail party problem: how we follow just one conversation while many others go on around us. He conducted many experiments in an attempt to explain this, and his contributions influenced cognitive science; he is often considered a pioneer of that field, even though he would never have described himself as a cognitive scientist.