
The Components of Language Systems 101: Sounds of Language Systems - Phonetics and Phonology


This course covers the fundamentals of linguistics (i.e., the scientific study of natural languages) and its subfields. As a discipline, linguistics offers a comprehensive perspective on the internal composition of languages: it ranges from an awareness of the sounds that make up the phonemic system of languages, through insights into the internal composition of words, to an understanding of the syntactic arrangement of meaningful elements in utterances. Consequently, this 101 series, which encompasses theoretical linguistics, phonetics, morphology, syntax, semantics, and language typology, will equip students with language analysis tools that are relevant across related disciplines, whether they are venturing out on their own language acquisition journey with all its related skills (speaking, reading, listening, and writing) or specializing in language-related fields such as historical linguistics and philology. It will also provide useful insights into the fields of literature and cultural studies.

This 101 series is divided into six articles:

1. Components of Language 101: Theoretical Linguistics and Theories of Language

2. Components of Language 101: Sounds of Language Systems—Phonetics and Phonology

3. Components of Language 101: Morphology and the Internal Structure of Words

4. Components of Language 101: Syntax—The Rules Governing the Arrangement of Parts of Speech

5. Components of Language 101: Semantics and Pragmatics—The Meaning of Linguistic Items and the Relevance of Context

6. Components of Language 101: Language Typology—How Languages are Classified into Types

The Components of Language Systems 101: Sounds of Language Systems—Phonetics and Phonology

Languages can be considered as systems because they are composite entities. Systems are made up of different parts and sections, which entails that languages are complex in nature, being, as they are, constructed on solid foundations of intertwined levels: phonology at sound level; morphology at lexical and word formation level; syntax at word order level in utterances or parts of sentences; semantics and pragmatics at meaning level. These levels are not discrete and separate since they do not operate in isolation, but they interact with one another to make up the entire internal composition and architecture of languages.

On the one hand, one of the integral levels of languages stems from the human ability to produce sounds and, on the other, from the organic incorporation of sounds with communicative objectives. Because a sound is first and foremost a physical fact, it is measurable by specific parameters and it is describable: sounds fall within the scope of the branch of linguistics called phonetics, whose objects of study are called “phones” (Berruto & Cerruti, 2017). Our phonetic apparatus is able to produce an enormous quantity of sounds, but only a small part of these sounds ends up being selected and incorporated as distinctive sounds of particular language systems. Once a particular sound is selected as part of the sound inventory of a language, it is considered a “phoneme”. Phonemes are linguistic units equipped with distinctiveness: they serve to distinguish meaning, which entails that their presence in a certain sequence of sounds makes a difference in the overall meaning of that sequence. In the English words ball and tall, for example, the first position is occupied by two different phonemes, /b/ and /t/. Because both words carry a specific meaning and differ from one another by that initial phoneme only, /b/ and /t/ are distinctive phonemes of the English language. Phonology is the branch of linguistics that studies phonemes. Linguistic phenomena such as accents, intonation, and tone (which extend to the entire utterance) are also pertinent to phonology (Graffi & Scalise, 2002).

Figure 1: The variety of producible sounds (Unknown, n.d.).

The label "phonetics" is derived from the Greek word phōnētikós, which relates to “voice” or “sound” (Lombardi Vallauri, 2007). Phonetics is further divided into different sub-branches. Acoustic phonetics is one such sub-branch: it is the study of the physical nature of sounds and their propagation through the air. It considers the sounds of languages in their intrinsic qualities, as vibrations propagated through the air called “sound waves”, and not as linguistic tools used by humans for communicative goals (Lombardi Vallauri, 2007). Auditory phonetics is concerned with the way the recipients of messages perceive sounds. Articulatory phonetics considers sounds, which at this level are called “phones” (Graffi & Scalise, 2002), as produced by the human vocal apparatus, and it analyzes them according to the movements humans need to make in order to produce them. The entire process of sound production is called “phonation”: the airflow is sent from the lungs and reaches the trachea; from there, it gets to the larynx, in the area of the Adam’s apple. At the glottis, the air then reaches the vocal cords, which are folds of membranous tissue that come closer to each other and vibrate in order to produce the voice (Lombardi Vallauri, 2007).

Sound Description

In all linguistic systems, every identified sound unit can be defined in contrast to other units. This perspective can be adopted to present the sounds of languages as well, which are easily comparable to one another through contrast and relations of opposition. As will be seen, two consonant sounds may differ in one trait only while sharing others: thus, [p] and [b] share the place of articulation, since they are both produced by the lips; they share the manner of articulation, in that they both stop the airflow at some point; however, they differ because the former is voiceless (the vocal cords are not involved) and the latter is voiced (the vocal cords are recruited in its production). Phonemes are conventionally represented between slashes (/d/, /ts/, /f/), while phones are written between square brackets.

Figure 2: The human voice (Wells & Colson, 1971).

The three broad categories into which sounds are classified are consonants, vowels, and semi-consonants, alternatively labeled semi-vowels. A preliminary contrast, however, is the distinction between vowel sounds and consonant sounds. This fundamental distinction is grounded in the manner of articulating these two kinds of sounds: while the flow of air never encounters any interruption in the pronunciation of vowels, consonants can only be produced if an obstruction occurs at some point, regardless of the articulatory area where the obstruction happens. As will be seen, this obstruction can take the form of an actual interruption, albeit momentary, or it can be realized as a narrowing of the channel through which the air flows, without ever completely interrupting the airflow. Depending on this, consonants are classified into different types. Three main parameters are used to analyze consonants. The manner of articulation takes into account the type of obstruction interposed to the airflow. The place of articulation is the specific location along the vocal tract (lips, teeth, alveolar ridge, palate, pharynx) where the obstruction is produced. Finally, sonority (or voicing) considers whether the vocal cords are engaged or not (Graffi & Scalise, 2002). Considering the first parameter, the manner of articulation, consonants may be classified into the following major types.

  • Occlusive or plosive consonants are produced by interposing a total obstruction to the airflow; occlusive consonants are therefore characterized by a complete interruption of the flow. After the interruption, the airflow is let out again. Such consonants are, for example, the sounds /t/, /p/, /g/, where the obstruction occurs in the dental area, the labial area, and the velar area, respectively.

  • Fricative consonants are created when the articulatory organs involved come into close proximity: these organs, however, do not touch and, as a result, do not create a total obstruction as they do for plosive consonants; the airflow is not stopped but, rather, is made to flow through a narrower channel. Such consonants are, for example, /f/ and /s/, where the airflow passes between the lower lip and the upper teeth in the first case, and between the tip of the tongue and the alveolar ridge in the second case.

  • Affricate consonants are produced by combining the plosive type and the fricative type. Affricates are, indeed, characterized by two phases: a first, plosive phase, in which the air is completely stopped, and a second phase, in which the air is released through a narrower channel. Such consonants are /ts/, as in the Italian word “stazione”, and /dʒ/, as in the English word “judge”, where this sound occurs twice. As their phonetic transcription shows, affricates are represented by two symbols in order to accurately capture their two-step realization.

  • Nasal consonants occur when the air is made to flow out through the nose. The most obvious representative sounds of this category are /m/ and /n/. Nasals can also have other places of articulation: a velar variety, [ŋ], appears before /g/ in a word such as Congo, and a labiodental variety, [ɱ], appears before /f/ and /v/, as in the Italian word invece (“instead”) (Graffi & Scalise, 2002).

  • /r/ and /l/ are the representatives of the liquid consonants. These are consonants in which the tongue produces a partial closure in the mouth, usually by touching the upper dental arch. Liquid sounds are particularly delicate when learning a new language, because their pronunciation varies greatly across languages: for example, the /r/ sound as pronounced in Standard English and its varieties is not the same as in Romance languages. The Italian language has two main pronunciations of the /r/ sound: one alveolar and one uvular.

Depending on the place of articulation, consonants can be classified into the following most common types:

  • Bilabial consonants are produced when the upper lip and the lower lip touch: /p/ and /b/ are such consonants.

  • Labiodental consonants occur when the obstruction is caused by the lower lip and the dental upper arch, as in the sounds /f/ and /v/.

  • Dental consonants are produced by the tip of the tongue going up to meet the back of the upper dental arch: /t/ and /d/ as pronounced in Italian are examples of such consonants.

  • Alveolar consonants are produced by touching the tip, or blade, of the tongue to the alveolar ridge, which is an area located just behind the upper front dental arch. Examples of such consonants include /t/ and /d/ as pronounced in English, which are both realized by having the tongue briefly touch the alveolar ridge.

  • Palatal consonants are produced when the front part of the tongue makes contact with the hard palate, which is found inside the mouth at the top, just behind the alveolar ridge. Examples of such consonants include the semi-vowel, or semi-consonant, /j/ and the sound /ɲ/ (which can be found in Spanish words such as piña, “pineapple”, and niño, “male child”).

  • Velar consonants are realized by bringing the back of the tongue into contact with the velum, which is a soft area in the roof of the mouth. Common velar consonants include /k/ (found in the English word coconut, where it appears twice, the Italian word caduta, “fall”, and the Japanese word kirei, “beautiful”) and /g/ (which appears in the English word great and the French word gorge, “throat”).

  • Palato-alveolar consonants are a combination of alveolar and palatal articulation. The blade of the tongue makes contact with the alveolar ridge, while the front part of the tongue gets nearer to the hard palate. An example of a palato-alveolar consonant is /ʃ/ (as in "shoe").

The parameter of sonority, as mentioned before, is determined by whether the vocal cords are involved or not. As can be seen in the image below, place of articulation and manner of articulation can give rise to consonant pairs, in which one consonant is voiceless (pronounced with no involvement of the vocal cords) and the other is voiced (the vocal cords are recruited in the production of the sound). Therefore, the members of the pairs /t/ and /d/, /p/ and /b/, and /k/ and /g/ share all other traits, but the first member of each pair is voiceless and the second is voiced.

Figure 3: IPA consonants chart (Unknown, n.d.).
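
The three parameters described above (manner, place, and voicing) can be sketched as a small feature table. This is only an illustrative sketch: the names `Consonant`, `CONSONANTS`, and `voicing_pairs` are hypothetical, the inventory is a small subset chosen for the example, and /t/ and /d/ are given their English (alveolar) place of articulation.

```python
from typing import NamedTuple


class Consonant(NamedTuple):
    manner: str   # type of obstruction
    place: str    # where the obstruction is produced
    voiced: bool  # whether the vocal cords vibrate


# Illustrative subset only, not a full inventory of any language.
CONSONANTS = {
    "p": Consonant("plosive", "bilabial", False),
    "b": Consonant("plosive", "bilabial", True),
    "t": Consonant("plosive", "alveolar", False),
    "d": Consonant("plosive", "alveolar", True),
    "k": Consonant("plosive", "velar", False),
    "g": Consonant("plosive", "velar", True),
    "f": Consonant("fricative", "labiodental", False),
    "v": Consonant("fricative", "labiodental", True),
    "m": Consonant("nasal", "bilabial", True),
}


def voicing_pairs(inventory):
    """Return (voiceless, voiced) pairs that share manner and place."""
    pairs = []
    for a, fa in inventory.items():
        for b, fb in inventory.items():
            same_articulation = (fa.manner, fa.place) == (fb.manner, fb.place)
            if same_articulation and not fa.voiced and fb.voiced:
                pairs.append((a, b))
    return sorted(pairs)


print(voicing_pairs(CONSONANTS))
```

In this sketch /m/ appears in no pair, because the table contains no voiceless consonant with the same manner and place.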

As mentioned earlier, the articulation of vowels does not involve any obstruction or interruption of the airflow. It is determined by three fundamental parameters: the configuration of the oral cavity; the positioning of the tongue, which may come close to the palate and may shift backward or forward; and the shape of the lips, which may be rounded or unrounded. Vowel sounds are also classified according to the anatomical area of articulation and the way the organs involved move in order to produce the sounds. However, because the vocal cords are always involved in the production of vowels, the sonority parameter does not apply: in other words, vowel sounds are always voiced and do not have a voiceless counterpart (Lombardi Vallauri, 2007). Depending on the area of articulation, the most common vowel sounds can be classified as follows:

  • Anterior vowels are produced when the front part of the tongue is positioned towards the front of the oral cavity. In other words, the tongue moves forward. Examples of such vowels are /i/ (as in the word feet) and /ɛ/ (as in the word debt).

  • Central vowels are produced when the tongue is positioned approximately in the middle of the oral cavity. The tongue is neither reaching for the front of the mouth nor retreating toward the back of the mouth. An extremely common central vowel is /ə/ (the schwa sound), which is very frequent in the English language, as it tends to appear in unstressed syllables (as in the second syllable of the word cover).

  • Posterior vowels are produced with the back part of the tongue relatively close to the back of the oral cavity, meaning the tongue is being positioned in the vicinity of the rear of the mouth. Examples of such vowels include /u/, as in the English word root, and /o/, as in the English word phone.

Depending on the height that the tongue reaches inside the oral cavity with respect to the palate, vowels can be classified as high (or closed), medium-high, medium-low, and low (or open) (Lombardi Vallauri, 2007).

In order to produce high (or closed) vowels, the tongue needs to move up towards the palate; it moves towards the soft palate for the production of posterior vowels and towards the hard palate for the production of anterior vowels. The sound /i/, as in the word heat, is an example of such vowels.

Low (or open) vowels are produced when the tongue is pressed down onto the bottom of the mouth, where it tends to assume a flat position and lets as much air as possible flow free. An example of open vowels is /ɑ/ as in the word farther.

Medium-high vowels are produced with the tongue positioned close to the roof of the mouth, but not as high as for the highest vowels. They fall between high and mid vowels in terms of tongue height. An example of a medium-high vowel is /e/, as in the Italian word entrare (“to enter”).

Medium-low vowels are produced with the tongue positioned lower in the oral cavity, but not as low as the lowest vowels. They fall between mid and low vowels in terms of tongue height. An example of a medium-low vowel in English is /ɛ/ (as in "bed"). As mentioned, the shape of the lips also plays a part in the classification of sounds: the rounder they become, the more the vowels will resemble such sounds as /o/ and /u/; the more stretched the lips are, the less rounded the sound is (/i/ is an example).

Figure 4: A Chart of English pure vowels (Unknown, n.d.)
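
The vowel parameters discussed above (tongue height, tongue position, and lip rounding) can be sketched in the same spirit. The table below is a hypothetical illustration, not a full vowel chart: the names `VOWELS` and `describe_vowel` are invented for the example, and the values that the text does not state explicitly (such as the height of /ə/ and /o/, or the position of /ɑ/) are assumptions following the standard IPA vowel chart.

```python
# symbol -> (height, position, rounded); illustrative subset only.
VOWELS = {
    "i": ("high", "front", False),
    "e": ("medium-high", "front", False),
    "ɛ": ("medium-low", "front", False),
    "ə": ("mid", "central", False),       # assumption: standard IPA value
    "ɑ": ("low", "back", False),          # assumption: standard IPA value
    "o": ("medium-high", "back", True),   # assumption: standard IPA value
    "u": ("high", "back", True),
}


def describe_vowel(symbol):
    """Spell out the three classification parameters for one vowel symbol."""
    height, position, rounded = VOWELS[symbol]
    lips = "rounded" if rounded else "unrounded"
    return f"{height} {position} {lips} vowel"


print(describe_vowel("i"))
print(describe_vowel("u"))
```

No voicing field is needed here, since vowels are always voiced.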

Semi-vowels is the label used to refer to the sounds /j/ and /w/. These are alternatively called semi-consonants because they can be seen as occupying an intermediate position between vowel sounds and consonants. At no point in their articulation do they entail a total obstruction of the airflow, but the organs involved come so close together that they almost create one. These sounds are normally found in the initial position of a diphthong, which is the juxtaposition of two vowel sounds. Thus, these semi-vowels can be found in words such as the English word world and the English word yesterday. Having had a look at single sounds and their fundamental classifications, let us turn to sound entities that are more complex in nature, that is, made up of more than one sound.

Further relevant phonetic phenomena are diphthongs, triphthongs, and long vowels in general. Diphthongs and triphthongs are complex sounds made up of the juxtaposition of more than one vocalic sound. Diphthongs start with one vowel sound and glide into another within one single syllable, as in the English word foil. Triphthongs involve a sequence of three distinct vowel sounds within a single syllable, as can be seen in the words hour (/aʊə/) and player (/pleɪə/). Long vowels, on the other hand, are single vowels pronounced for a longer duration than usual; graphically, they are often represented by a double vowel letter, like the double o in root (Lombardi Vallauri, 2007).

Figure 5: Examples of common diphthongs in English (Unknown, n.d.).

International Phonetic Alphabet and Phonetic Transcription

Because systems of graphic representation of languages are conventional and vary significantly across languages, there can never be a perfect one-to-one correspondence between a sound and its written representation. Thus, several inconsistencies between the graphic dimension and the sound dimension occur. Two different graphic symbols may share the same pronunciation: in the Italian language, for example, the voiceless velar plosive sound /k/ may be associated with a number of graphic renditions, as in the Italian words cuore (“heart”) and quando (“when”), where it is graphically represented by two different letters: [c] and [q]. Two different sounds may share the same graphic solution: in the French language, the words certain (“certain”) and cadeau (“present”) share the initial written letter [c], but they have a different pronunciation (/s/ and /k/, respectively).

Furthermore, graphic systems do not tend to take into account the vocalic contraposition of openness and closedness, which is a parameter that may end up not being represented by a written rendition: the symbol [e] in the Italian language, for example, is employed to represent both the open version /ɛ/, as in the word essere (“to be”) and the closed version /e/, as in the word entrare (“to enter”). Further examples of this divergence can be illustrated by the use of the symbol [i] to represent both the vowel /i/ and the semi-vowel /j/, as in the Italian words vino (“wine”) and piano (“piano”) respectively. In English, the single sound /k/ can be graphically rendered as [k] as in kind, [c] as in cure, but also with a more complex graphic solution like [ch] in words such as character (Lombardi Vallauri, 2007).

Figure 6: Phonemic chart of the sounds of English (Unknown, n.d.).

As well as presenting graphic symbols of sounds, the International Phonetic Alphabet (IPA) includes a set of further marks called “diacritics”. These are combined with IPA symbols to convey additional pronunciation details such as word stress, place of articulation, duration of sounds, and syllable separation. The most frequent diacritics are [ˈ] and [ˌ], used for primary stress and secondary stress, respectively. While stress will be expanded on in the paragraphs that pertain to suprasegmental facts, the intuitive notion of word stress will suffice for the purposes of explaining diacritics. So, the word furthermore can be represented as /ˌfɜːrðərˈmɔːr/, where the primary stress is placed on the last syllable and the first syllable is characterized by secondary stress.

A [.] serves to indicate syllable separation: the French word oppressant (“oppressive”) is broken into three syllables: /ɔ.pʀe.sɑ̃/. Because long vowels are signaled with the symbol [ː], through in English can be represented as /ˈθruː/. The diacritical mark [ʰ] usually follows consonants to indicate the aspirated quality of the sound. In English, for example, voiceless plosive consonants are aspirated when they are in the initial position of a word and are immediately followed by certain vowels, meaning they are pronounced with a concomitant forceful outflow of air. Therefore, in English there may be sounds like [pʰ], [tʰ], and [kʰ].
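
The diacritics just described can be read off a transcription mechanically. The following is a minimal sketch under stated assumptions: the function name `describe` and the constant names are hypothetical, the input is assumed to be a transcription written between slashes, and each length or aspiration mark is assumed to modify the single character immediately before it.

```python
PRIMARY = "ˈ"     # primary stress
SECONDARY = "ˌ"   # secondary stress
LONG = "ː"        # length mark
ASPIRATED = "ʰ"   # aspiration
BREAK = "."       # syllable separation


def describe(transcription):
    """Summarise the diacritics found in a transcription written between slashes."""
    body = transcription.strip("/")
    return {
        "syllables": body.split(BREAK),
        "primary_stress": PRIMARY in body,
        "secondary_stress": SECONDARY in body,
        # the symbol just before each mark is the one it modifies
        "long_sounds": [body[i - 1] for i, ch in enumerate(body) if ch == LONG and i > 0],
        "aspirated": [body[i - 1] for i, ch in enumerate(body) if ch == ASPIRATED and i > 0],
    }


print(describe("/ˌfɜːrðərˈmɔːr/"))
print(describe("/ɔ.pʀe.sɑ̃/")["syllables"])
```

A transcription without any [.] is returned as a single unsplit string in `syllables`, because this sketch does not attempt to infer syllable boundaries on its own.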


While phonetics is concerned with the concrete realization of sounds and their description, phonology takes into consideration the function that specific sounds take on within particular language systems or, in other words, the phones that have been selected by a specific language and incorporated into its sound inventory in order to convey meaning in a structured manner. Once a phone has been selected by a language system to belong to its sound catalog, it can be thought of as a “phoneme”, which is the linguistic unit that the field of phonology is concerned with. A phoneme is also a linguistic unit endowed with the function of distinguishing meaning. To put it from another perspective, a phoneme is the abstract idea of a sound and a phone is its concrete realization: this entails that phonemes are located at the level of langue, or competence, whereas phones are located at the level of parole, or realization, the two main components of language identified by the linguist Saussure. In order to verify whether a sound plays a crucial part in determining the meaning of a word, the procedure consists in contrasting minimal pairs, that is, pairs of words that differ by one single sound: if the two words under consideration convey two different meanings and, formally, their difference is limited to that one single sound, those two different sounds are phonemes of the language. It is by virtue of this ability that phonemes are said to possess the value of distinctiveness: a phoneme is a phonic segment equipped with a distinctive function, and it cannot be further broken up into smaller phonic segments possessing this function in their turn.

Figure 7: The differences between phonetics and phonology (Unknown, n.d.).

An example of a minimal pair in the English language is the pair of words hat and fat, which indicates that /f/ and /h/ are distinctive sounds in English: the two words carry different meanings and would not otherwise be distinguishable. This logic is not limited to sounds only, but may concern the modality of articulation as well. Aspiration, for example, which is signaled by [ʰ], is not distinctive in languages such as Italian (where plosive consonants are aspirated in some regional varieties) or English (where voiceless plosives are regularly aspirated in certain positions); however, it is distinctive in other languages such as Hindi, where its presence does make a difference and results in minimal pairs. In Hindi, the following are examples of minimal pairs: pal and pʰal (“to take care of” and “blade”, respectively); tan and tʰan (“song” and “roll of fabric”); kan and kʰan (“ear” and “mime actor”). The difference between English and Hindi is therefore not, strictly speaking, in the sounds they select, nor is it restricted to the phonetic level: rather, it lies at the level of phonemes and their inventory (Lombardi Vallauri, 2007). The words sing, [sɪŋ], and sin, [sɪn], are another example of a minimal pair, as they signal that /ŋ/ and /n/ are phonemes of English.

However, this is not the case for Italian, where the two sounds, despite both being employed, do not have a distinctive function: the words anca (“hip”) and anta (“shutter”) differ by virtue of the sounds /k/ and /t/, but not of [ŋ] and [n]. The two nasal sounds, therefore, are phonemes in English and allophones in Italian. The words thin and thing constitute another minimal pair in English that isolates /ŋ/ and /n/ as phonemes (Lombardi Vallauri, 2007). Furthermore, phonology is also concerned with phonological rules, that is, with which sound combinations are considered acceptable and which are not in specific languages. The linguistic context in which a sound appears indicates the positions that sound can occupy, while excluding others at the same time. For example, the sound /r/ in Italian can appear in intervocalic position as in the word dorato (“golden”), after the plosive consonant /t/ as in the word trota (“trout”), after /p/ as in the word prima (“before”), after /b/ as in the word bravo (“good”), at the beginning of a word as in the word rana (“frog”), or at the end as in the word “radar”, but it cannot be used in sound sequences such as */rtf/ or */mr/ (Graffi & Scalise, 2003).

Figure 8: Examples of minimal pairs in English (Unknown, 2019).
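
The minimal pair test described above can be sketched as a comparison of two sequences of sounds. This is a simplified illustration: the function names `is_minimal_pair` and `contrast` are hypothetical, and each word is assumed to be already segmented into a list of sounds, so that an aspirated consonant such as pʰ counts as one unit.

```python
def is_minimal_pair(a, b):
    """True if two segment lists have equal length and differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def contrast(a, b):
    """Return the pair of contrasting segments, or None if a and b are not a minimal pair."""
    if not is_minimal_pair(a, b):
        return None
    return next((x, y) for x, y in zip(a, b) if x != y)


# English sing / sin isolate the two nasals.
print(contrast(["s", "ɪ", "ŋ"], ["s", "ɪ", "n"]))
# Hindi pal / pʰal isolate plain and aspirated p.
print(contrast(["p", "a", "l"], ["pʰ", "a", "l"]))
```

The check is purely formal: it only establishes that two forms differ in one segment. Whether the two forms also carry different meanings, which is what makes the sounds phonemes, still has to be verified by the analyst.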

Lombardi Vallauri (2007) notes that no phoneme is ever pronounced in exactly the same manner in all of its concrete realizations in speech. While phonemes, with their articulatory features, represent ideas of sounds, their concrete realization may vary not only across languages, but also across dialectal varieties of the same language and even across speakers of the same language. For example, while in Italian the vowel /a/ is pronounced in different manners depending on the dialectal variety and on the specific habits of individual language users, all of these realizations nevertheless represent the phoneme /a/. Therefore, while there is a distinct idea of the qualities that a speech sound should consist of, its realization is heavily influenced by a number of contingent elements, such as, as observed before, the particular manner of pronunciation of individual speakers or the linguistic environment a sound appears in, which may be a determining factor in how a sound is concretely pronounced. For this reason, phonology also analyzes how sounds may exert an influence on one another when they are juxtaposed within the same sound sequence.

Assimilations are a widespread occurrence across all languages. They occur when a sound influences the articulation of another sound in its vicinity by transmitting one or more pronunciation features which the receiving sound would not otherwise possess. Depending on the reciprocal positions of the sounds involved and the extent to which the articulation of a sound is modified by the assimilation process, assimilations are classified as partial or total on the one hand, and progressive or regressive on the other hand. Total assimilation occurs when the affected sound is completely changed by another sound, thus resulting in a sequence of two identical sounds; assimilation is partial when the affected sound is modified only in some of its traits. Assimilation is said to be progressive when the element causing it is located on the left and the affected sound is on the right; conversely, assimilation is regressive when the element causing it is on the right and the affected element is on the left.

An example of total regressive assimilation in the Italian language is represented by the adaptation of the prefix “in-”, which creates the opposite of adjectives and adverbs. In words such as the adjective inappropriato (“inappropriate”, singular masculine), which is composed of the prefix “in-” and the adjective appropriato (“appropriate”, singular masculine), the prefix “in-” is not modified. However, it does change its form when attached to adjectives and adverbs that begin with specific consonant sounds: thus, the opposite of the adjective ragionevole (“reasonable”, singular for both feminine and masculine words) is not, as would be expected, *inragionevole, but irragionevole, because words beginning with /r/ exert a regressive assimilation on the prefix “in-”. The same change happens if the adjective or adverb begins with the sound /l/: the opposite of logico (“logical”, singular masculine) is illogico (“illogical”). However, if the adjective or adverb begins with a labial consonant, /p/ or /b/, something slightly different occurs. The opposite of the adjective probabile (“probable” or “likely”, singular for both masculine and feminine forms) is improbabile: the prefix “in-” has acquired the labiality trait, inherited from the /p/ at the beginning of the word, but has not undergone any further changes; in other words, because it has not completely morphed into the /p/ sound, the assimilation is partial. It is also regressive, as the affected sound is on the left of the sound that causes the assimilation. The same happens for adjectives such as imbevibile, imparziale, imbattuto (“undrinkable”, “impartial”, and “undefeated”, respectively). Another example of partial regressive assimilation in the Italian language is found in the pronunciation of words beginning with the sound /s/ followed by a voiced consonant. In such circumstances, the sound /s/ loses its voiceless character and acquires the voice trait from the sound that follows: this happens in words such as sbattere (“to slam”), which is pronounced /zbattere/, and sragionare (“to talk nonsense”), which is pronounced /zragionare/.
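
The behavior of the Italian prefix “in-” described above can be sketched as a small rule. This is a simplified illustration, not a full account of Italian morphophonology: the function name `negate` is hypothetical, the rule operates on spelling rather than on a phonemic transcription, and it covers only the contexts discussed in the text (/r/, /l/, /p/, /b/, and the unmodified case).

```python
def negate(adjective):
    """Attach the Italian prefix in-, assimilating its n to the sound that follows."""
    first = adjective[0]
    if first in "rl":
        # total regressive assimilation: n becomes a copy of r or l
        return "i" + first + adjective
    if first in "pb":
        # partial regressive assimilation: n only takes on the labial trait
        return "im" + adjective
    # no assimilation, for example before a vowel
    return "in" + adjective


for word in ["appropriato", "ragionevole", "logico", "probabile", "battuto"]:
    print(word, "->", negate(word))
```

In both assimilating branches the trigger is the first sound of the adjective, which sits to the right of the affected /n/: this is what makes the process regressive.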

American English provides an example of total progressive assimilation with the forms wanna and gonna, generated by the contraction of the constructions want to and going to respectively: the particle to is incorporated into the preceding verb and the two now form a new word; the particle is then completely adjusted to the sound that precedes it, losing the sound /t/ and its original vowel quality. An example of partial progressive assimilation is represented by the pronunciation of plural -s in the English language. While the sound /s/ preserves its voiceless trait if it is attached to words ending with a voiceless consonant (for example, in cats), it takes on a voiced feature if the word it is attached to ends with a voiced consonant, which is the sound causing the assimilation: this happens, for example, in dogs, where the plural ending is pronounced /z/. Assimilations can also occur diachronically, that is, along the historical evolution of a language. In the evolution from Latin to Italian, for example, the word factum (“fact”) has transformed into the word fatto, thus completely losing its velar sound /k/ (assuming that this is how the Latin word was pronounced) and transforming it into the voiceless dental plosive /t/. This change is evidently caused by the sound /t/ that followed the /k/ in the original Latin form.

Assimilation does not necessarily occur only between adjacent sounds, but can also concern sounds that are located at a relative distance from one another. This is the case for the linguistic phenomena known as “umlaut” and “vowel harmony”, which are different manifestations motivated by the same principle. An example of the latter is provided by the way the Turkish language creates the plural of nouns: the suffix that indicates the plural form adjusts its vocalic sound to be as much in harmony as possible with the vowel that it is closest to. Therefore, the plural of the noun adam (“man”) is created by attaching to it the variant “-lar” of the plural suffix; on the other hand, the plural of the noun ev (“house”) is created by attaching the variant “-ler” of the plural suffix. In the German language, the plural of some nouns is obtained not only by the addition of the appropriate suffix, but also via the partial modification of the vowel in the noun to make it more similar to the vowel of the plural suffix. Examples of “umlaut” plurals are Buch (“book”), which becomes Bücher (the /u/ sound in the noun has been modified to accommodate the pronunciation of the plural suffix, which contains the front vowel /e/), and Dach (“roof”), which becomes Dächer. In conclusion, in the case of “vowel harmony”, the choice of the suffix depends on the word it will be attached to; in the case of “umlaut”, the vowel in the noun is modified according to the vocalic sound in the suffix.
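
The Turkish plural described above lends itself to a short sketch of vowel harmony. The code below is illustrative only: the function name `plural` is hypothetical, the input is assumed to be a lowercase noun in standard Turkish spelling, and the grouping of the Turkish vowels into back (a, ı, o, u) and front (e, i, ö, ü) is added here as background, since the text itself only gives the examples adam and ev.

```python
BACK_VOWELS = set("aıou")    # background assumption, not stated in the text
FRONT_VOWELS = set("eiöü")   # background assumption, not stated in the text


def plural(noun):
    """Choose -lar or -ler according to the vowel closest to the suffix."""
    for ch in reversed(noun):
        if ch in BACK_VOWELS:
            return noun + "lar"
        if ch in FRONT_VOWELS:
            return noun + "ler"
    raise ValueError(f"no vowel found in {noun!r}")


print(plural("adam"))
print(plural("ev"))
```

The loop scans the noun from right to left because it is the vowel closest to the suffix that determines the variant. This mirrors the conclusion above: in vowel harmony the suffix adapts to the word, whereas in umlaut the word adapts to the suffix.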

Figure 9: Examples of assimilations in American English (Unknown, n.d.).
Phonemic Awareness and Phonological Awareness

In the previous paragraphs, sounds were framed as either phones or phonemes depending on the perspective from which they are considered: either as humanly produced sounds defined by their articulatory characteristics, or as sounds incorporated into specific language systems in order to convey meaning. These two views of sound, while conceptually different, correspond to complementary phases of the human process of acquiring sounds and of learning how to give shape to meaning through sound sequences. The first ability that child learners naturally acquire is called “phonological awareness”, which is associated with linguistic competences connected to sound recognition. Phonological awareness is in turn a collection of subskills: being sensitive to sound patterns such as alliteration, being able to break up a sentence into the words that compose it, and identifying the syllables in a word are all examples of skills that fall into this category. As their language acquisition progresses, children build on this ability and develop the competence called “phonemic awareness”. Phonemic awareness grants children, and learners in general, mastery of phonemes: they are able to notice them, recognize them in different words, and predict the linguistic environments in which they will appear. It is the stage at which language users are able to manipulate the individual sounds they have acquired. Now that the properties of single phonemes and the ways they behave in relation to one another have been examined, the next section considers bigger chunks of language and the phonological phenomena that involve longer sequences of sounds.


Graffi and Scalise (2002) differentiate between a phonetic definition and a phonological definition of the linguistic unit “syllable”. The former frames a syllable as being made up of one or more phones assembled around a single peak of intensity. Phonological definitions conceive of syllables as intrinsically connected to the words they belong to. In this view, the syllable is a unit whose existence is motivated by the need to organize sounds. Syllables possess their own internal structure. The essential component is the “nucleus”, which is usually a vowel sound. It does not, however, need to be a vowel; it can also be a consonant which does not have a voiceless counterpart, such as /l/ or /r/, which can be syllabic nuclei in English words such as little (/ˈlɪtl̩/). Syllables can also be divided into open syllables and closed syllables. Open syllables do not end in a consonant and are classically made up of one initial sound (consonant or semi-consonant) followed by the vowel “nucleus” of the syllable, such as the first syllable of the French word valeur (/va.lœʀ/, “worth”); they may also consist of the “nucleus” vowel only, as in the first syllable of the French word après (/a.pʀɛ/, “after”). Closed syllables, conversely, do end in a consonant after the vocalic nucleus: for example, the first syllable of the German word Ansehen (/ˈan.zeːən/, “reputation”) ends with the consonant /n/. So far, the contents presented have focused on the analysis of words at word level (prison), syllable level (/ˈprɪ.zən/), phoneme or segmental level (/p/, /r/, /ɪ/, /z/, /ə/, /n/), and distinctive feature level (for example, the first phoneme of this word is described as a voiceless bilabial plosive; the second phoneme is an alveolar liquid consonant; and so forth).
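The distinction between open and closed syllables can be illustrated with a short Python sketch. It assumes that syllable boundaries are already marked with a dot in the transcription, and it uses a small, hypothetical vowel inventory chosen only to cover the examples in this section. Syllabic consonants such as the /l̩/ of little are deliberately left out of this simplification.

```python
# Illustrative sketch only: the vowel inventory is a small assumption,
# and syllabic consonants are not handled.
VOWELS = set("aeiouyɛɪœøəɔʊæɑɒʌ")
LENGTH_MARK = "ː"
STRESS_MARKS = "ˈˌ"


def split_syllables(transcription: str) -> list[str]:
    """Split a dotted transcription such as /va.lœʀ/ into syllables."""
    cleaned = transcription.strip("/")
    return [syllable.lstrip(STRESS_MARKS) for syllable in cleaned.split(".")]


def is_open(syllable: str) -> bool:
    """A syllable counts as open here if its last segment is a vowel."""
    core = syllable.rstrip(LENGTH_MARK)  # ignore a final length mark
    return bool(core) and core[-1] in VOWELS


for word in ["/va.lœʀ/", "/a.pʀɛ/", "/ˈan.zeːən/"]:
    first = split_syllables(word)[0]
    kind = "open" if is_open(first) else "closed"
    print(word, "first syllable:", first, "->", kind)
```

Under these assumptions the first syllables of valeur and après are classified as open, and the first syllable of Ansehen as closed, matching the description above.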

Figure 10: Syllable structure of the word "strengths" (Unknown, n.d.).
Suprasegmental Phonological Phenomena

Whereas segmental phonology considers the linguistic phenomena that occur at phoneme level (that is, those that concern single phonemes and the results of their interactions), suprasegmental phonology considers those manifestations which are not restricted to a single phoneme and whose effects span more than one phoneme. The most relevant are the phenomena that follow.

Length concerns the actual temporal duration of a sound. Sounds are not necessarily pronounced with the same duration: some are shorter, some are prolonged. Length can be distinctive: in Italian, the length of vowel sounds is not distinctive, but the length of consonants is. Thus, the word caro means “dear”, but the word carro has a different meaning (“wagon”): the two words differ only in the length of the consonant /r/. In English, vowel length helps to distinguish words such as hit (/hɪt/) and heat (/hiːt/) or ship (/ʃɪp/) and sheep (/ʃiːp/), although in these pairs the vowels differ in quality as well as in length; consonant length, by contrast, does not distinguish one English word from another.
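Whether two words are separated by length alone can be checked mechanically once they are transcribed. The Python sketch below is only an illustration: the function name is hypothetical, and the broad transcriptions (with the long consonant of carro written as rː) are simplifying assumptions.

```python
# Illustrative sketch only: transcriptions are simplified assumptions.
LENGTH_MARK = "ː"


def differ_only_in_length(a: str, b: str) -> bool:
    """True if two transcriptions are identical apart from length marks."""
    same_segments = a.replace(LENGTH_MARK, "") == b.replace(LENGTH_MARK, "")
    return a != b and same_segments


print(differ_only_in_length("karo", "karːo"))  # Italian caro vs carro
print(differ_only_in_length("hɪt", "hiːt"))    # English hit vs heat
```

With these transcriptions the Italian pair differs only in length, while the English pair does not pass the check, because /ɪ/ and /iː/ are also written with different vowel symbols, reflecting a difference in quality.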

Stress is a phonological phenomenon that concerns syllables, which can be stressed or unstressed. Stressed syllables are realized with greater vocal strength and intensity than unstressed syllables. Stress position can be a distinctive feature in some languages, as it is in Italian, where it distinguishes different words: ancora with the stress on the first syllable means “anchor”, whereas ancora with the second syllable stressed means “again”; capitano with the first syllable stressed means “they happen”, capitano with the third syllable stressed means “captain”, and capitano with the last syllable stressed means “he commanded”. Stress creates minimal pairs in English as well, in some cases distinguishing nouns from verbs: import with the first syllable stressed is the noun, while import with the stress on the second syllable is the verb; the same applies to words such as contrast, export, and torment.
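The role of stress position can be made concrete with a short Python sketch that places the IPA stress mark before a chosen syllable. The helper name, the orthographic syllable division, and the small lookup table are assumptions made for illustration; the glosses are the ones given above.

```python
# Illustrative sketch only: syllables are orthographic, not phonemic.
STRESS_MARK = "ˈ"


def mark_stress(syllables: list[str], stressed_index: int) -> str:
    """Join syllables with dots, marking the stressed one with ˈ."""
    if not 0 <= stressed_index < len(syllables):
        raise IndexError("stressed_index is outside the word")
    marked = [
        STRESS_MARK + syllable if i == stressed_index else syllable
        for i, syllable in enumerate(syllables)
    ]
    return ".".join(marked)


# Meanings of Italian capitano by position of the stressed syllable (0-based).
CAPITANO = ["ca", "pi", "ta", "no"]
MEANINGS = {0: "they happen", 2: "captain", 3: "he commanded"}

for index, meaning in MEANINGS.items():
    print(mark_stress(CAPITANO, index), "=", meaning)
```

The same helper applies to the English pairs: mark_stress(["im", "port"], 0) corresponds to the noun and mark_stress(["im", "port"], 1) to the verb.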

Intonation is another suprasegmental phenomenon, also called “melody” or “melodic curve”. It is a property of sound which can be used to attribute a specific meaning to an utterance. In Italian, for example, yes/no questions are not grammatically signaled by any specific marker at word level; rather, they are only orally marked by a rising intonation, which is the single element that conveys the fact that the utterance is framed as a question, as opposed to its affirmative counterpart, which is otherwise formally identical. Intonation constitutes a meaningful linguistic tool in English as well, where it signals a difference in prominence among the parts of an utterance: while some parts of the utterance may be neutral or unmarked, others may be presented by the speaker as more relevant. In the following statement, for example, the meaning changes depending on where the prominence falls: I did not say that. If the prominence falls on the subject I, the additional meaning intended by the speaker is that they did not say it, because someone else did. If the prominence falls on the auxiliary did not, the speaker is emphasizing their innocence of the action, possibly in response to an accusation. If the verb say is the part that is marked by intonation, the implied meaning is that the speaker did not exactly utter “that”, but may have thought it, whispered it, or written it. Finally, if the prominent part is the object that, the speaker may be distancing themselves from having pronounced specific words: they did not say that, but something else.


This article has explored the linguistic branch of phonetics and phonology, shedding light on the intricate science of sound within language. Through an exploration of phonetics, insights have been given as to the physical properties of speech sounds, enabling a deeper understanding of how they are produced and perceived. The study of phonology has revealed the underlying structures and patterns of sounds in language, emphasizing the significance of phonemic awareness in decoding and comprehending spoken and written words. The use of the International Phonetic Alphabet (IPA) as a tool for transcription has been highlighted, facilitating accurate phonetic representation that is valid across languages. Moreover, the consideration of suprasegmental phonological features, such as intonation and stress patterns, has underscored their role in conveying meaning and tone in speech. This comprehensive journey through phonetics and phonology underscores the importance of these fields in linguistics and language education, as they provide a foundation for understanding the intricate nuances of human communication.

Bibliographical References

Berruto, G. (2021). Che cos’è la linguistica. Carocci Editore.

Berruto, G., & Cerruti, M. (2017). La linguistica. Un corso introduttivo. UTET Università.

Carr, P. (2012). English Phonetics and Phonology: An Introduction (2nd ed.). Blackwell Publishing.

Davenport, M., & Hannahs, S. J. (2020). Introducing Phonetics and Phonology (4th ed.). Routledge.

Graffi, G., & Scalise, S. (2002). Le lingue ed il linguaggio. Introduzione alla linguistica. Il Mulino.

Lombardi Vallauri, E. (2013). La linguistica. In pratica. Il Mulino.

Visual Sources

Figure 1: (n.d.) The variety of producible sounds. [Image]. Retrieved from:

Figure 2: Wells & Colson. (1971). The human voice. [Image]. Retrieved from:

Figure 3: (n.d.) IPA Consonants Chart. [Image]. Retrieved from:

Figure 4: (n.d.) A Chart of English pure vowels. [Image]. Retrieved from:

Figure 5: (n.d.) Examples of common Diphthongs in English. [Image]. Retrieved from:

Figure 6: (n.d.) Phonemic chart of the sounds of English. [Image]. Retrieved from:

Figure 7: (n.d.) The differences between phonetics and phonology. [Image]. Retrieved from:

Figure 8: Examples of minimal pairs in English. (2019). [Image]. Retrieved from:

Figure 9: (n.d.) Examples of assimilations in American English. [Image]. Retrieved from:

Figure 10: (n.d.) Syllable structure of the word "strengths". [Image]. Retrieved from:

Nicole Lorenzoni
