
Matthew H. Davis and Ingrid S. Johnsrude (2003)

Hierarchical Processing in Spoken Language Comprehension

In press at The Journal of Neuroscience, 23(7), p.?-??

Abstract:

Understanding spoken language requires a complex series of processing stages to translate speech sounds into meaning. In this study, we use functional magnetic resonance imaging to explore the brain regions that are involved in spoken language comprehension, fractionating this system into sound-based and more abstract higher-level processes. We distorted English sentences in three acoustically different ways, applying each distortion to varying degrees to produce a range of intelligibility (quantified as the number of words that could be reported) and collected whole-brain echo-planar imaging data from 12 listeners using sparse imaging. The blood oxygenation level-dependent signal correlated with intelligibility along the superior and middle temporal gyri in the left hemisphere and in a less extensive homologous area on the right, the left inferior frontal gyrus (LIFG), and the left hippocampus. Regions surrounding auditory cortex, bilaterally, were sensitive to intelligibility but also showed a differential response to the three forms of distortion, consistent with sound-form-based processes. More distant intelligibility-sensitive regions within the superior and middle temporal gyri, hippocampus, and LIFG were insensitive to the acoustic form of sentences, suggesting more abstract nonacoustic processes. The hierarchical organization suggested by these results is consistent with cognitive models and auditory processing in nonhuman primates. Areas that were particularly active for distorted speech conditions and, thus, might be involved in compensating for distortion, were found exclusively in the left hemisphere and partially overlapped with areas sensitive to intelligibility, perhaps reflecting attentional modulation of auditory and linguistic processes.

To request a PDF version of the paper, please email: matt.davis@mrc-cbu.cam.ac.uk

Stimuli:

Three forms of distorted yet still intelligible stimuli were used in the study. The three forms of distortion (and normal speech) are shown in the spectrograms below.

Click on the spectrogram to hear an example sentence in each form of distortion (and as normal speech):

Normal Speech:

Noise-Vocoded Speech:

Segmented Speech:

Speech in Noise:

The distorted example sentences above are all of medium intelligibility. For each form of distortion, three levels of intelligibility were constructed: low intelligibility (~20% of words reported correctly from each sentence), medium intelligibility (~65% of words reported correctly), and high intelligibility (~90% of words reported correctly). The intelligibility of each form of distortion was assessed in a pilot behavioural study in which participants either typed the words heard in each sentence or gave a 9-point rating of intelligibility. In both the pilot study and the fMRI study, report scores and subjective ratings were closely correlated.
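As an illustration of the word report measure, the sketch below scores a typed response as the proportion of words in the target sentence that the listener reported. It is a minimal sketch and not the scoring procedure used in the study: the function name `word_report_score`, the order-insensitive matching, the punctuation handling, and the example sentence are all assumptions made for illustration.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'"


def tokenize(sentence: str) -> list[str]:
    """Lowercase a sentence and strip surrounding punctuation from each word."""
    words = (w.strip(PUNCTUATION).lower() for w in sentence.split())
    return [w for w in words if w]


def word_report_score(target: str, response: str) -> float:
    """Proportion of target words found in the response (hypothetical scoring rule).

    Matching ignores word order, and each response word can be
    credited against at most one target word.
    """
    target_words = tokenize(target)
    if not target_words:
        return 0.0
    remaining = Counter(tokenize(response))
    hits = 0
    for word in target_words:
        if remaining[word] > 0:
            remaining[word] -= 1
            hits += 1
    return hits / len(target_words)


if __name__ == "__main__":
    # Hypothetical target sentence and typed response, not taken from the study.
    score = word_report_score("The cat sat on the mat.", "the cat on mat")
    print(f"{score:.2f}")
```

Averaging such scores over the sentences in a condition would give a percentage comparable to the ~20%, ~65%, and ~90% levels described above, under the stated assumptions.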

In the graph below, you can click on a data point to play an example speech sound with that type and level of distortion.

Or click the speaker in the table below:

All the speech distortions were created using Praat software with the kind assistance of Paul Boersma and Chris Darwin. If you would like to find out more about any of the three forms of distortion, please contact matt.davis@mrc-cbu.cam.ac.uk.

This page was created on 19th March 2003. Comments and suggestions to matt.davis@mrc-cbu.cam.ac.uk.

[Table of example sounds: Segmented Speech, Vocoded Speech, and Speech in Noise at High, Medium, and Low Intelligibility, with Normal Speech and Signal Correlated Noise.]