If you travel to another part of the world, the richness of a foreign language may be the first thing that strikes you. But a new study by researchers from the University of Lyon suggests that there may be fewer differences between languages than you think.
“Languages vary a lot in terms of the information they contain in a syllable and also in the speed at which they are spoken. But what is interesting is that the two balance each other out, so more information-dense languages are spoken more slowly, and less information-rich ones faster. This means that there is a stable and very similar rate of information across languages,” explains study co-author Dan Dediu, a researcher at the Laboratoire Dynamique du Langage in Lyon.
The battle for a universal constant
Trying to find a “universal” constant for language, Dediu’s team faced quite a battle. There are over 7,000 different languages and very few features link them all. This even extends to basic measurements of how information is encoded into words. For example, the number of syllables per word varies considerably from one language to another, which means that Shannon’s information rate (see gray box) also varies. However, Dediu and his team had the insight to consider not just the words, but the speed at which they are spoken.
Dediu and his colleagues used recordings from 170 native adult speakers of 17 different languages across Europe and Asia. Each speaker was instructed to read a set of 15 pieces of text, consisting of approximately 240,000 syllables.
Claude Shannon, a researcher at Bell Labs, made a huge contribution to information technology when he formulated his theory of information in a seminal paper in the 1940s. The core of Shannon’s work was that information could be expressed as discrete binary values, which he called bits. This meant that the noise introduced by long-distance communication could be eliminated by rounding the distorted signal up or down to 1 or 0. Applying this theory to language, Shannon showed that different languages have their own level of redundancy. English is sometimes noted to have a redundancy level of 50%, which means that half of the letters of a given sentence could be deleted while preserving the meaning.
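Shannon’s measure can be illustrated in a few lines of Python. This is a toy sketch of character-level entropy from letter frequencies, not the method used in the study (which worked at the syllable level with much larger corpora):

```python
from collections import Counter
from math import log2

def entropy_bits(text: str) -> float:
    """Shannon entropy in bits per character, estimated from character frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * log2(n / total) for n in counts.values())

# With 26 equally likely letters the maximum would be log2(26) ≈ 4.70
# bits per character; real English text comes in well below that,
# which is the redundancy Shannon described.
sample = "the quick brown fox jumps over the lazy dog"
print(entropy_bits(sample))
```

The gap between the theoretical maximum and the measured entropy is one way to picture the ~50% redundancy figure quoted above.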
How many syllables in a second?
The researchers chose the syllable as their basic unit of information. This passed over two other options (it’s quite a controversial topic in computational linguistics, it turns out):
- Phonemes – units of sound that distinguish individual words – were excluded because Dediu’s team realized they can easily be omitted in speech.
- Words – these were considered too language-specific for easy comparison.
Armed with a data set and a metric, the scientists examined their results. They revealed some interesting differences between the languages of our world:
- The number of distinct syllables in English is almost 7,000, but only a few hundred in Japanese
- Speech rate ranged from 4.3 syllables to 9.1 syllables per second
- Vowel harmony (a fascinating linguistic feature that requires suffixes to be “in harmony” with the word to which they attach) was present in four of the languages
In short, the languages seemed pretty damn different.
Despite this, Dediu’s team noted that the information rate, which takes into account both the rate of speech and the information density of the text, was roughly consistent across all recorded languages: information-rich texts were read more slowly, while less information-dense languages were spoken faster.
Language is like a gingerbread reindeer: the two black-and-white versions use different resolutions and numbers of gray levels but encode the same image, just as languages trade off different strategies yet are equally efficient. Credit: Dan Dediu, Lumière Lyon 2 University
The researchers were able to settle on a figure – 39.15 bits/s – as the average information rate for the 17 languages. There were some interesting variations – for example, female speakers had lower speech and information rates.
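The trade-off behind that figure is simple arithmetic: information rate is information density (bits per syllable) multiplied by speech rate (syllables per second). The numbers below are purely illustrative, not the study’s measured values; they only show how a denser, slower language and a lighter, faster one can land on the same rate:

```python
def information_rate(bits_per_syllable: float, syllables_per_second: float) -> float:
    """Information rate in bits per second: density times speed."""
    return bits_per_syllable * syllables_per_second

# Illustrative (made-up) figures for two hypothetical languages:
dense_slow = information_rate(8.0, 5.0)   # 8 bits/syllable at 5 syl/s -> 40.0 bits/s
light_fast = information_rate(5.0, 8.0)   # 5 bits/syllable at 8 syl/s -> 40.0 bits/s
print(dense_slow, light_fast)
```

Both hypothetical languages end up near the ~39 bits/s average despite packaging their information very differently.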
The team showed that differences in written text made little difference in information rate, suggesting that the findings could be generalized beyond the text-based study conducted here. Speech rate and syllable count were significantly more variable than information rate, cementing the latter as a valid cross-linguistic connector.
What does this mean for our brain?
The authors suggest that the information rate stabilizes around a narrow average because higher rates would hamper the brain’s ability to process data and articulate speech clearly. On the other hand, a lower information rate would require the brain to hold far too many words in memory before meaning could be extracted.
This highlights the dual role that language has to play, which Dediu summarizes: “There are two sides to the coin when it comes to language – one is cultural and the other biological, and when one changes – say, one language becomes more informationally dense – the other reacts: its speakers begin to speak it more slowly.”