Bibliographic description:

Stepantsova T. A. Tempo and Pause Variation in American Monologue: an Experimental Investigation // Молодой ученый (Young Scientist). 2013. No. 1. Pp. 237-240.

The present paper studies tempo and pause variation in speech and the role this variation plays in the public speaking of American men.

Tempo has been described from different aspects and in different styles and genres of speech. It has been found to depend on the style of speech and on the speaker's age, pragmatic aim and emotional state.

Our task was to find out which parameters affect the impression of fast or slow speech, and how speakers should manage their timing so that their speech is intelligible and easy to comprehend.

The research was based on the TV talk of three well-educated American men belonging to the elite of American society: Bill Gates (co-founder and chairman of Microsoft Corporation), Walt Wolfram (sociolinguist) and Roger Shuy (sociolinguist).

  1. Methods and material of the experimental investigation.

    1.1. The experimental material of the investigation.

The method consisted in testing the perception of tempo by native speakers of American English and correlating the results with the tempo parameters measured in a corpus of three texts produced by three American men. Thus, the methods were: 1) auditory analysis; 2) acoustic analysis; 3) correlation analysis. The data were processed following the practical explanations of the basic techniques of experimental phonetics given by Peter Ladefoged in “Phonetic Data Analysis: An Introduction to Fieldwork and Instrumental Techniques” [4].

The corpus subjected to analysis included two samples from the video film “American Tongues” and a sample from the authentic material “English in Action — Businessmen and Politicians”.

The subjects who carried out the auditory analysis were five native speakers of American English: three men and two women. All of them are educated people: the men are former FBI officials, now businessmen, about 45–50 years old; the women are 22-year-old American students in Russia, studying sociology and political science.

The research was based on the TV talk of three American men: Bill Gates (co-founder and chairman of Microsoft Corporation), Walt Wolfram (sociolinguist) and Roger Shuy (sociolinguist). These informants were selected for the following reasons. All of them are known to be well-educated, well-bred people. They are used to speaking in public: Walt Wolfram and Roger Shuy are lecturers and researchers with a great deal of experience, and Bill Gates is a well-known public figure. All three attended university and belong to the elite of American society (Bill Gates left Harvard University in his junior year to devote his energies to Microsoft, but he is no doubt a man of high intellectual power).

Male informants were chosen in order to reduce the display of emotion, which complicates the analysis of female speech [1], [2], [6].

    1.2. Some facts about the informants: Bill Gates, Roger Shuy and Walt Wolfram.

Bill Gates, IT specialist — Facts

William Henry Gates is an American entrepreneur and the co-founder, chairman, former chief software architect, and former CEO of Microsoft, the world's largest software company. Forbes magazine's list of The World's Billionaires ranked him as the richest person in the world for thirteen consecutive years. Gates is widely respected by people who see his wealth as a product of intelligence and foresight.

Roger Shuy, sociolinguist — Facts

Roger W. Shuy is the world's leading authority on forensic linguistics. He is Distinguished Research Professor of Linguistics, Emeritus, at Georgetown University. He is also president of Roger W. Shuy, Inc., founded in 1982, which specializes in providing forensic linguistic services to legal professionals.

Walt Wolfram, sociolinguist — Facts

Walt Wolfram is the William C. Friday Professor at the College of Humanities and Social Sciences (CHASS) at NC State University.

Over the past three decades, Professor Wolfram has pioneered research on a broad range of vernacular dialects. A prominent concern of Professor Wolfram involves the application of basic research findings to social and educational problems. He has conducted numerous workshops and seminars for school systems and other public and private agencies. He has authored or co-authored a number of textbooks and books profiling the sociolinguistics of diverse communities.

    1.3. The auditory analysis.

Auditory analysis was carried out by:

  1. the author, who marked intonation unit boundaries, pauses and accents in traditional notation;

  2. five educated native speakers of American English, who assessed the overall speaking rate.

The aim of the native-speaker auditory analysis was to correlate acoustic data with perception data. The five listeners were asked to rate the speech of each informant as fast, slow or moderate.

    1.4. The acoustic (computer) analysis of tempo.

The multi-level analysis of the informants' TV speech was carried out by means of auditory and acoustic (instrumental) analyses.

The data obtained were processed on a personal computer with an Intel® Pentium® M processor (1400 MHz). The recordings were digitised from the videotape “American Tongues” and the CD “English in Action — Businessmen and Politicians” using Microsoft Sound Recorder version 5.1.

Each text was normalized to a length of about 40 seconds (one paragraph), and Sony Sound Forge version 8.0 was used to process the data.

The acoustic parameters of the research are the following:

  • for articulation rate:

      • syllable duration

      • accented syllable duration

      • unaccented syllable duration

      • accented/unaccented ratio

      • words per minute

      • syllables per second

  • for overall speaking rate:

      • intonation unit duration

      • pause duration

      • frequency of pause type

      • phonation/pause ratio
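To make the relationship between these parameters concrete, here is a minimal Python sketch of how several of them could be derived from an annotated sample. The `Sample` record and all function names are hypothetical, and computing the rate measures over phonation time only (pauses excluded) is an assumption: the paper does not state the exact formulas that were used.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical summary of one annotated speech sample."""
    n_words: int
    n_syllables: int
    phonation_ms: float  # total time spent speaking, pauses excluded
    pause_ms: float      # total duration of pauses


def syllables_per_second(s: Sample) -> float:
    # Assumption: the rate is measured over phonation time only.
    return s.n_syllables / (s.phonation_ms / 1000.0)


def words_per_minute(s: Sample) -> float:
    # Assumption: the rate is measured over phonation time only.
    return s.n_words / (s.phonation_ms / 60000.0)


def mean_syllable_duration_ms(s: Sample) -> float:
    # Average time taken by one syllable, in milliseconds.
    return s.phonation_ms / s.n_syllables


def phonation_pause_ratio(s: Sample) -> float:
    # How many times longer the speaker talks than pauses.
    return s.phonation_ms / s.pause_ms


# Invented figures for a 40-second sample (not taken from the corpus).
demo = Sample(n_words=100, n_syllables=150,
              phonation_ms=30000.0, pause_ms=10000.0)
print(syllables_per_second(demo), phonation_pause_ratio(demo))
```

The figures in `demo` are invented for illustration and do not describe any of the three informants.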

  2. The results of the correlation analysis.

    2.1. Auditory analysis.

The native speakers’ assessment of the overall speaking rate of the informants was as follows:



Walt Wolfram (sociolinguist): the slowest, slow, slow, slow, slow

Bill Gates (IT specialist): slower, moderate, slow, a little slow, slower

Roger Shuy (sociolinguist): the fastest, fast, moderate, a little fast, fast

Thus, according to the native speakers’ perception, Roger Shuy’s speech is the fastest and Walt Wolfram’s speech is the slowest. What this impression depends on is discussed below.
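The ordering of the three speakers can be illustrated by mapping the verbal labels onto a numeric scale and averaging them. The sketch below is only an illustration: the scale values in `SCALE` are an assumption introduced here (the paper reports the verbal labels only), and a different scale could give different means.

```python
from statistics import mean

# Assumed ordinal scale; negative means slow, positive means fast.
SCALE = {
    "the slowest": -3, "slow": -2, "slower": -1, "a little slow": -1,
    "moderate": 0, "a little fast": 1, "fast": 2, "the fastest": 3,
}

# Labels given by the five listeners, as reported above.
ratings = {
    "Walt Wolfram": ["the slowest", "slow", "slow", "slow", "slow"],
    "Bill Gates": ["slower", "moderate", "slow", "a little slow", "slower"],
    "Roger Shuy": ["the fastest", "fast", "moderate", "a little fast", "fast"],
}


def mean_score(labels):
    """Average the assumed numeric values of a listener panel's labels."""
    return mean(SCALE[label] for label in labels)


scores = {name: mean_score(labels) for name, labels in ratings.items()}
ranking = sorted(scores, key=scores.get)  # slowest speaker first
print(ranking)
```

Under this assumed scale the ranking runs from Walt Wolfram (slowest) through Bill Gates to Roger Shuy (fastest), which agrees with the listeners' overall impression.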

    2.2. Acoustic computer analysis.

      2.2.1. Articulation rate.

In the specialist literature, the most common way of expressing articulation rate is the number of syllables per unit of time: normal tempo is 4–5.3 syllables/second and fast tempo is 5.6–6.7 syllables/second [3]. Our data show that Wolfram’s TV talk is characterized by 3.2 syllables/second, Gates’s by 3.3 syllables/second and Shuy’s by 3.8 syllables/second. Thus, by these parameter values, the tempo of Shuy’s speech is closer to normal, while the tempo of the other two speakers is rather slow (Table 1).
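The comparison with the reference ranges can be written out as a small classifier. This is a sketch under stated assumptions: the "normal" and "fast" ranges are those quoted from [3], while the labels for values below 4, between 5.3 and 5.6, and above 6.7 syllables/second are assumptions added here, since the quoted ranges do not cover them. The function name `classify_tempo` is hypothetical.

```python
def classify_tempo(syll_per_sec: float) -> str:
    """Label an articulation rate using the ranges quoted from [3]."""
    if syll_per_sec < 4.0:
        return "slow"              # assumption: below the normal range
    if syll_per_sec <= 5.3:
        return "normal"            # 4-5.3 syllables/second
    if syll_per_sec < 5.6:
        return "unclassified"      # assumption: gap between the ranges
    if syll_per_sec <= 6.7:
        return "fast"              # 5.6-6.7 syllables/second
    return "above fast range"      # assumption: beyond the quoted ranges


# Rates reported for the three informants (syllables/second).
rates = {"Walt Wolfram": 3.2, "Bill Gates": 3.3, "Roger Shuy": 3.8}

for name, rate in rates.items():
    print(name, rate, classify_tempo(rate))
```

All three reported rates fall below the lower bound of the normal range, with Shuy's 3.8 syllables/second lying nearest to it.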

Table 1

Articulation rate (syllables/sec)

Walt Wolfram: 3.2

Bill Gates: 3.3

Roger Shuy: 3.8


Mean syllable duration in TV talk lends itself to comparison with reading. According to John Laver [5, pp. 539–546], syllable duration in reading ranges from 180 to 200 ms. Tatiana Shevchenko’s data suggest that American non-professional reading is slower, at 237 ms [7]. Against these reference points, the TV speech of educated American male speakers appears to be faster than American non-professional reading: 166/232/173 ms vs. 237 ms (Table 2).

Table 2

Articulation rate (syllable duration in ms)

Walt Wolfram: 166

Bill Gates: 232

Roger Shuy: 173


For American speech, an accented/unaccented syllable duration contrast of 1.6 has been reported for reading and 1.5 for talk [8]. In our corpus the average accented syllable duration is 189/202/171 ms and the average unaccented syllable duration is 150/258/175 ms respectively, so the ratio comes up to 1.27. This result is quite unexpected, as it was previously found that a good speaker makes a high contrast between accented and unaccented syllables, which makes the accentual structure of words, and through it the words themselves, easier to recognize. Walt Wolfram and Roger Shuy are lecturers and researchers with a great deal of experience. They are aware of how well listeners can or cannot follow speech delivered at a high rate. In their speech we find a slightly higher accented/unaccented syllable duration contrast than in Bill Gates’s speech. However, compared with normal American talk, this contrast is still very low (Table 3).
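The contrast for each speaker can be recomputed from the mean durations quoted in the text. The sketch below assumes that each triple of values is listed in the order Walt Wolfram / Bill Gates / Roger Shuy, as in the tables; because it works from rounded means, the results may differ slightly from the figures the author obtained.

```python
# Mean durations in ms as quoted in the text.
# Assumed order: Walt Wolfram, Bill Gates, Roger Shuy.
accented_ms = {"Walt Wolfram": 189, "Bill Gates": 202, "Roger Shuy": 171}
unaccented_ms = {"Walt Wolfram": 150, "Bill Gates": 258, "Roger Shuy": 175}

# Reference contrast reported for American talk in [8].
REFERENCE_TALK = 1.5


def contrast(accented: float, unaccented: float) -> float:
    """Accented/unaccented duration ratio, rounded to two decimals."""
    return round(accented / unaccented, 2)


ratios = {name: contrast(accented_ms[name], unaccented_ms[name])
          for name in accented_ms}

for name, ratio in ratios.items():
    print(f"{name}: {ratio} (reference for talk: {REFERENCE_TALK})")
```

With these rounded inputs the highest ratio is about 1.26, close to the 1.27 quoted above, and every speaker stays well below the reference value of 1.5.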

Table 3

Accented / unaccented duration ratio

Walt Wolfram


Bill Gates


Roger Shuy


      2.2.2. Overall speaking rate.

The first observation concerns the fact that the speech of the informants is broken into rather big chunks of uninterrupted speech. For want of a better term, we will use the conventional terminology of 'intonation unit', but with certain reservations. An intonation unit is generally agreed to be signalled by a terminal pitch movement, other cues being silent or filled pauses, final lengthening, rhythmic cohesion, change of tempo and pitch resetting. In the present study an intonation unit was defined as having a terminal pitch movement and an obligatory pause.

The average intonation unit duration in the material under study is 1713/2155/1374 ms, compared with the usual 1300 ms in an interview (Table 4).

The way people group words into sense groups reflects how their minds work. People of a lower educational and intellectual level produce sense groups of 1 or 2 notional words, while people of a higher intellectual level may produce longer sense groups of up to 4 meaningful words.

Thus, such long stretches of words without pauses can be attributed to the high intellectual power of the informants.

Table 4

Intonation unit duration (ms)

Walt Wolfram: 1713

Bill Gates: 2155

Roger Shuy: 1374


The five types of pause are: very short (up to 200 ms), short (201–500 ms), unit (501–800 ms), long (801–1200 ms) and very long (more than 1200 ms). In most of the TV talks under study we deal mainly with short pauses. However, in Walt Wolfram’s speech, which the native speakers recognized as the slowest, unit pauses prevail (Table 5).

Table 5

Pause distribution

Walt Wolfram: unit, long

Bill Gates: short, unit

Roger Shuy: very short, short
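The pause categories defined above can be applied mechanically to a list of measured pause durations. The following is a minimal sketch: the function names are hypothetical and the durations in `demo_pauses_ms` are invented for illustration, not taken from the corpus.

```python
from collections import Counter


def pause_type(duration_ms: float) -> str:
    """Assign a pause to one of the five duration categories."""
    if duration_ms <= 200:
        return "very short"
    if duration_ms <= 500:
        return "short"
    if duration_ms <= 800:
        return "unit"
    if duration_ms <= 1200:
        return "long"
    return "very long"


def pause_distribution(pauses_ms):
    """Count how often each pause type occurs in a sample."""
    return Counter(pause_type(p) for p in pauses_ms)


# Invented pause durations in ms (illustration only).
demo_pauses_ms = [150, 300, 450, 700]
distribution = pause_distribution(demo_pauses_ms)
prevailing = distribution.most_common(1)[0][0]
print(distribution, prevailing)
```

The prevailing category of a sample is simply the most frequent pause type, which is how entries such as those in Table 5 could be read.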

  3. Conclusions.

In the course of the present research we have processed samples of the speech of three educated American men: two well-known linguists and the world’s top information technology specialist.

The corpus was further subjected to auditory analysis by five native speakers of American English, three men and two women. By applying the method of correlation analysis, we were able to find new facts about the perception of tempo, the acoustic correlates of fast and slow tempo and individual speech style timing.

It was established that the impression of tempo is a complex phenomenon based on the length of sense groups, articulation rate and the length of pauses. Thus, for instance, the listeners rated the three performances as fast, moderate and slow, according to the combination of the following acoustic parameters:

  • average syllable duration;

  • average pause duration;

  • average intonation unit duration.

However, there is a contradiction between one speaker’s long syllable duration and the perception of his speech as “moderate” or “slower, but not very slow”. The reason is that his pauses are of average length, not very long. This is evidence that pause duration has a greater impact on the perception of tempo than the actual articulation rate.

The individual speech style of Bill Gates, a top IT specialist, is characterised by very long uninterrupted speech chunks (intonation units or sense groups) interspersed with unit, i.e. average, pauses. The speaker's slow articulation rate was found to be mainly due to the length of his unaccented syllables, which is a typical feature of American speech. We can also assume that his intellectual power is evidenced by the length of his intonation units, that is, by his ability to create and keep in mind long uninterrupted chunks of speech.

Walt Wolfram, a well-known American linguist with a great deal of experience in lecturing, achieves the effect of slow tempo by making a greater contrast between accented and unaccented syllables, as well as longer pauses.

Roger Shuy, the sociolinguist whose speech was rated as the fastest, makes virtually no contrast between accented and unaccented syllables; moreover, he produces short pauses. Although his uninterrupted speech chunks are the shortest of the three, the shortness of his pauses does not compensate for it, and his talk is difficult to comprehend.

Thus, we can conclude that there ought to be certain proportions in speech/pause delivery, as well as accented/unaccented syllable lengths, but the individual timing depends on the personality and the way one’s mind works.


  1. Bennet, S., Weinberg, B. B. (1979). Sexual characteristics of pre-adolescent children's voices. Journal of the Acoustical Society of America, 65, 179–189.

  2. Brend, P. M. (1971). Male-female differences in American English intonation. 7th ICPhS. Montreal, Canada, Aug. 22–29, 3–4.

  3. Goldman-Eisler, F. (1968). Psycholinguistics: Experiments in Spontaneous Speech. London: Academic Press.

  4. Ladefoged, P. (2003). Phonetic Data Analysis: An Introduction to Fieldwork and Instrumental Techniques. Blackwell Publishing.

  5. Laver, J. (1994). Principles of Phonetics. Cambridge University Press, 539–546.

  6. Pellowe, J., Jones, V. (1978). On intonation variability in Tyneside speech. In: Sociolinguistic Patterns in British English. Ed. P. Trudgill. London: E. Arnold, 101–121.

  7. Sculanova, G.M., Shevchenko, T.I. (1999). Dialect, Accent and Prosody. Moscow: MSLU.

  8. Shevchenko, T.I. (2003). Prosody as a Community Code (Russian, British and American Women in Radio Talk). Proceedings of the 15th International Congress of Phonetic Sciences. Barcelona, 3–9 August. Sole, M.J., Recasens, D., Romero, J. (eds.), 1189–1192.

  9. Shevchenko, T.I., Uglova, N.G. (2006). Timing in News and Weather Forecasts: Implications for Perception. Speech Prosody-2006. Proceedings of the 3rd International Conference, Hoffman, R. and Mixdorff, H. (eds.) Dresden: TUD Press, Vol.1, 25–28.
