Specialized Brain Regions Recognize Vocal Cues That Don’t Involve Speech

July 28, 2022

Specific parts of the brain recognize complex cues in human vocal sounds that do not involve speech, such as crying, coughing or gasping, researchers from the University of Pittsburgh have found.

In a paper published in PLOS Biology, scientists showed that two areas of the auditory cortex are specialized to recognize human voice sounds that, unlike speech, do not carry linguistic meaning. Rather, they help us react to sound cues that allow people to instantly identify characteristics of the person making the sound, such as gender, approximate age, mood and even height, all without seeing them.

“Voice perception is similar to how humans recognize different faces,” said senior author Taylor Abel, MD, assistant professor of neurological surgery at Pitt. “Voices that don’t include speech—for example, a baby’s cries, coughing, moaning or exclamations—allow us to gain a lot of information about the person making those vocalizations in the absence of other information about the person.”

Humans live in a world full of sounds, where noises from the environment shape our daily interactions with our surroundings and other people. And even though speech is one of the unique aspects of human communication that does not have direct analogs in the animal world, people do not rely on speech alone to convey auditory information.

Non-speech aspects of voice serve a vital role in our communication toolbox, expanding our ability to express ourselves accurately and dynamically. Part of that expression is subconscious, and part of it may be intentionally modulated by the speaker to convey a wide spectrum of emotion, such as happiness, fear or disgust.

Humans are born with the capacity for voice recognition—in fact, babies can recognize their mother’s voice while still in the womb—but that capacity is dynamic, and it continues to evolve throughout adolescence.

Abel, who is a practicing pediatric neurosurgeon specializing in epilepsy, had a unique opportunity to peek at how the human brain responds to voice.

To identify regions of the brain that are responsible for generating seizures in some people with epilepsy, neurosurgeons may implant temporary electrodes into the brain to carefully record its electrical signals. This practice allows physicians to precisely locate the site of the seizure and eventually remove that part of the brain, while sparing the surrounding healthy tissue.

Eight patients with epilepsy consented to participate in a study where Abel and his team used the implanted electrodes to measure which areas of the auditory cortex responded when voice sounds—grunts, yelps, laughs—were presented to the patients.

Using a combination of direct brain recordings and computational modeling, investigators were able to describe in unprecedented detail how voice representation evolves over time and decode when a voice sound had been played based on patterns of neural activity from the auditory cortex.

Researchers found that most of that activity came from two regions in the auditory cortex: folds of the brain’s gray matter known as the superior temporal gyrus (STG) and the superior temporal sulcus (STS). While prior brain imaging studies showed that the STG and STS are important for voice processing, this study demonstrates that these regions represent voice as a distinct sound category rather than simply representing the physical or acoustic aspects of voice.

This new knowledge about the organization of the voice-recognition system wired in our brains will enable researchers to better understand neurological disorders such as schizophrenia or autism, where voice perception is altered or missing, and even help create better voice assistant devices, which are currently good at recognizing speech but less adept at differentiating between several speakers.

Kyle Rupp, PhD, is lead author on the paper; additional authors are Jasmine Hect, Madison Remick, Avniel Ghuman, PhD, and Bharath Chandrasekaran, PhD, all from Pitt; and Lori Holt, PhD, of Carnegie Mellon University.

This research was supported by the National Institutes of Health (grants R21DC019217-01A1 and 2R01DC013315-07).