I will finish off by discussing the TRACE model: a computational system designed to replicate the way humans perceive speech. I will discuss it in more detail in the next post on Connectionism.
Acoustics of speech
Firstly, we need to understand what speech really is. In terms of acoustics (the physics of sound), it is moving air particles, and we measure the frequency of these movements in Hz (hertz = cycles/sec). Human speech is a complex sound, which means numerous frequencies are produced at the same time. The slowest one is called the fundamental frequency, and it determines the pitch of one's voice (80-200 Hz for men, up to 400 Hz for women). The other frequencies are harmonics, multiples of the fundamental: they determine timbre. Thus, what we perceive as speech is numerous waves of different frequencies.
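This can be illustrated with a short simulation. The sketch below uses invented, purely illustrative values (and assumes NumPy is available): it builds a complex tone as the sum of a 120 Hz fundamental and a few harmonics, then checks that the strongest spectral peak sits at the fundamental, just as the pitch of a voice follows its fundamental frequency.

```python
import numpy as np

# Illustrative values: a 120 Hz fundamental (male voice range)
# plus harmonics at 2x, 3x, 4x, whose relative amplitudes shape the timbre.
sr = 16000                      # sample rate (Hz)
t = np.arange(0, 0.5, 1 / sr)   # 0.5 s of time samples
f0 = 120                        # fundamental frequency (Hz)
amps = [1.0, 0.5, 0.25, 0.125]  # amplitude of each harmonic

# The complex sound: several sinusoids produced at the same time
wave = sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t)
           for k, a in enumerate(amps))

# The spectrum peaks at the fundamental and its multiples
spectrum = np.abs(np.fft.rfft(wave))
freqs = np.fft.rfftfreq(len(wave), 1 / sr)
peak = freqs[np.argmax(spectrum)]
print(int(peak))  # -> 120: the fundamental dominates and sets the pitch
```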
Traditionally, psycholinguists organise speech into different levels: phonemes, morphemes, words, phrases and sentences. The question, however, is how our brain can possibly distinguish these levels in a raw frequency signal. Written language can be described as 'beads on a string': each letter and word follows the next and is perceived one by one. Distinguishing language levels in speech is much trickier for many reasons, and one of them is the lack of invariance.
Lack of invariance refers to the idea that there is no reliable correspondence between a phoneme of a language and its acoustic manifestation in speech. The same word, or even a single phoneme, can sound completely different depending on many factors:
1) Individual differences. The acoustic structure of speech depends a lot on the speaker's accent and on their physical and psychological characteristics.
2) Speech conditions, such as background noise and speaking rate.
3) Coarticulation. This is the idea that more than one sound is articulated at once, so each of them is partly shaped by the sounds surrounding it. The articulators (jaw, tongue, lips) move continuously from sound to sound, allowing us to speak faster; thus the acoustic structure of each phoneme depends a lot on its 'neighbours'. Consider the following spectrograms of the sound /d/ in three different positions:
The Cohort Model (Marslen-Wilson & Welsh, 1978)
The Cohort Model raises an important issue of word segmentation: surely, we need to know where a word starts for the appropriate cohort to be activated. In continuous speech, words are not always separated acoustically; have a look at the spectrogram below, which corresponds to the phrase /to stand against/:
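In the absence of acoustic gaps, segmentation can be treated as a lexical parsing problem: which cut points produce a sequence of known words? Here is a toy sketch (letters stand in for the continuous phoneme stream, and the mini-lexicon is invented for illustration) showing that a continuous string can sometimes be cut into words in more than one way:

```python
def segmentations(s, lexicon):
    """All ways to cut a continuous string into words from the lexicon."""
    if not s:
        return [[]]  # empty input: one valid (empty) parse
    results = []
    for i in range(1, len(s) + 1):
        word = s[:i]
        if word in lexicon:  # a known word starts here
            for rest in segmentations(s[i:], lexicon):
                results.append([word] + rest)
    return results

# A tiny invented lexicon: the stream 'theme' is genuinely ambiguous
lexicon = {"the", "me", "them", "theme"}
print(segmentations("theme", lexicon))  # -> [['the', 'me'], ['theme']]
```

Without acoustic word boundaries, the listener must resolve such ambiguities using the lexicon and, crucially, context.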
McQueen et al. (1994) suggested that the larger the activated cohort is, the less activation each word receives, and therefore the harder it is to find the right one as the candidates compete. Frequency also matters: a more frequent word is more likely to win. Here, 'spider' is probably a more frequent word than 'spied', so it is more likely to 'win'. Once again, however, top-down processing helps, as context plays a crucial role in comprehending speech.
Next time, I will discuss connectionism further, and show how - and for what reason - models of speech perception are constructed.