Components of Language 101: Linguistics and Theories of Language


This course covers the fundamentals of linguistics (i.e., the scientific study of natural languages) and its related subfields. As a discipline, linguistics offers a comprehensive perspective on the internal composition of languages: it ranges from an awareness of the sounds that make up the phonemic systems of languages, through insights into the internal composition of words, to an understanding of the syntactic arrangement of meaningful elements in utterances. Consequently, this 101 series, which encompasses theoretical linguistics, phonetics, morphology, syntax, semantics, and language typology, will equip students with tools for language analysis that are relevant across related disciplines, whether they are venturing out on their own language acquisition journey with all its related skills (speaking, reading, listening, and writing) or specializing in language-related fields such as historical linguistics and philology. It also provides useful insights into the fields of literature and cultural studies. This 101 series is divided into six articles:

1. Components of Language 101: Theoretical Linguistics and Theories of Language
2. Components of Language 101: Sounds of Language Systems—Phonetics and Phonology
3. Components of Language 101: Morphology and the Internal Structure of Words
4. Components of Language 101: Syntax—The Rules Governing the Arrangement of Parts of Speech
5. Components of Language 101: Semantics and Pragmatics—The Meaning of Linguistic Items and the Relevance of Context
6. Components of Language 101: Language Typology—How Languages are Classified into Types

Components of Language 101: Theoretical Linguistics and Theories of Language

Linguistics is the study of languages (Graffi & Scalise, 2002). As an academic discipline, linguistics can shift its focus depending on its objectives of research: we can study languages from what is called a synchronic perspective if we analyze them exactly the way they are in a specific historical moment, whereas a diachronic perspective would entail considering them in their historical evolutions with the aim to describe their development over time and account for the variations and changes they went through; we can compare them, draw similarities and classify them into types, according to the behavior they exhibit. Linguistics courses share the common core of being mainly concerned with the elements that make the internal composition of linguistic systems: phonetics describes their oral structure; morphology is the study of language vocabulary; syntax takes into account the way languages order elements in linguistic productions, and semantics is the study of meaning. Before delving into each of these components, a few preliminary concepts will be explored: the overall characteristics of languages and the major approaches to language study and language conception.

Object and Objectives of Linguistics

If any field of research is defined by the object it studies, it would appear to be relatively clear what any branch of academic investigation is concerned with. Biology is the scientific study of life and all phenomena encompassed by this label; anatomy is the study of the internal structure of organisms; and literature considers the corpora of written works that belong to and define a specific culture. However, determining the object of study of linguistics is not as straightforward; put differently, it is no easy task to define what linguists do and what they research. As a preliminary measure, what the object of linguistics is not can be established by exclusion: linguistics does not entail fluency in a plurality of languages, which would make its practitioners polyglots rather than linguists; it is not the analysis of the historical evolution of lexical items in a specific language, which is what the field of etymology is interested in; it is not the thorough mastery of one language, whether a mother tongue or a second language, which makes speakers expert users, but not linguists. The linguist, in addition, is not concerned with determining what is linguistically right or wrong, but rather with what is appropriate or inappropriate in a specific linguistic environment. To the linguist, every linguistic fact or realization, however ungrammatical it may be, is valid and enjoys equal status of interest. Linguists approach their objects of study from a descriptive perspective, rather than a prescriptive one: this first opposition is one of the several dichotomies surrounding languages and linguistics (Berruto, 2021).

Figure 1: Languages as Complex Systems (University of Cyprus, Faculty of Humanities, 2023).

The analysis of languages is indeed riddled with dichotomies. Another one is worth mentioning at this introductory stage: languages are made of two complementary aspects. Languages (by which we mean human, or natural, languages) are all endowed with two components: one is a biological component, which is naturally founded; the second is a cultural, societal component. The two aspects, really two sides of the same coin, are not contradictory or mutually exclusive; rather, they can be said to constitute a “synergic co-presence”. It is the biological, neuro-linguistic, psychological component, which belongs to the genetic heritage of Homo sapiens, that accounts for the innate linguistic functioning of humankind, but it is society and its culture that lend this abstract scheme its concreteness (Gobber, 2003).

In Cours de Linguistique Générale (1916), the linguist Ferdinand de Saussure notably discussed the fundamental plural nature of linguistic realities and qualified their essence as fundamentally twofold, a pervasive duality that filters through languages and concerns every last one of their elements. To him, a linguistic identity always originates from the association of “heteroclite” elements. He argues that sciences are endowed with the task of investigating an immediate given, material object, a multiplicity of things that fall within the domain of the senses, are traceable by the senses, and testable through the senses. When it comes to Linguistics, quite the opposite is true: in order to illustrate this point, he uses the French word for “sea”, mer, as an example: a linguistic item can only exist so long as an idea is associated with the word mer. What is thus established is the first fundamental distinction between the vocal phenomenon and the real phenomenon itself. Form and concept are intrinsically related but separate entities nevertheless. Saussure defines language as a “heteroclitic object” and, as a consequence, linguistics is also a “heteroclite” field of study and, as such, needs to avail itself of the contribution of other related disciplines (Berruto, 2021).

Figure 2: The Object of Linguistics (University of Graz, Department of English Studies, 2023).

Dichotomies in the Study of Languages

One of the clearest manifestations of the dual nature of languages is their existence in both oral and written forms. The oral dimension has long been established as the predominant one: languages are orally transmitted and stem from the necessity to communicate human needs immediately and effectively. While a writing system is an apparatus that can be recruited to fix a language, it will inevitably fail to keep up with the development of its oral counterpart: writing records changes and variations relatively late compared with how fast speech is able to produce and incorporate innovations. Furthermore, while the ability to use words and sentences in the phonic-auditory dimension is biologically innate, the ability to write is acquired only through specific training and teaching: historically, biologically, and psychologically, it originates after the ability to speak. The etymology of the word “grammar” lends some support to the argument that fixed rules pertain more to the written dimension of languages than to their oral one: grammar is and has always been intrinsically connected to writing. The word “grammar” originates from Latin, where it is found in the expression “ars grammatica”, which is in turn derived from the Greek “tekhnē grammatikē”, based on the lexical item “grámma, grámmata” (letter), whose meaning combines the actions of etching and writing. Grammar is the analysis of forms and functions. It is different from calligraphy, which is the art of writing graphic signs and is mostly concerned with their aesthetics: in grammar, meanings and forms are interconnected (De Mauro, 2008).

Another preliminary dichotomy concerns the twofold meaning of the word “language”. In English, “language” has two primary meanings. The first has to do with the concrete languages spoken, the many linguistic systems recorded all over the world, such as German, Italian, English, or Japanese. In this sense it is a countable noun, that is, it can be pluralized, as in "There are more than 7,000 languages spoken worldwide". The second meaning is more complex and refers to the fact of communicating; in this sense it is an uncountable noun and cannot be pluralized, as in the sentence "Gorillas have developed their own language". In German, the word Sprache covers the same two meanings. Romance languages make this distinction explicit: they represent the two meanings with two separate, albeit similar, words, so we have lingua and linguaggio in Italian, langue and langage in French (the distinction employed by the linguist Ferdinand de Saussure, whose work still retains a fundamental relevance that will be explored in the following paragraphs), and lengua and lenguaje in Spanish (De Mauro, 2008). What Saussure means by langage is every means of expression and communication made up of a signal code with its own particular characteristics and defined by specifications of use (for instance, in the expression “animal language”). It is the universal human faculty, biologically innate, of communicating by way of signals, especially those made of words and sentences. Historical and natural languages constitute the many realizations of this ability via a specific language system, particular to a society at a circumscribable moment in history and usually equipped with its own tradition: without them, this ability would remain an unrealized potential.
Language in this sense (langage) is also unique; it is one innate ability shared by all humans as a species, whereas languages (langues) are several. Langue is, finally, the social product of langage (Berruto, 2021). Langage is an ability; it is the capacity, shared by all human beings, to develop a communication system founded on the characteristics mentioned above, whereas langue is the specific form that this communication system takes in a given community. Languages are, therefore, natural objects that humans are exposed to and consequently acquire over the course of their growth and development; they are picked up, internalized, and learned. Humans are surrounded from birth by linguistic acts and manifestations; their mother languages are incorporated into the set of their cognitive abilities through interaction with their contexts of use, rather than studied (Graffi & Scalise, 2002).

The potential faculty of language, on the other hand, is a multi-layered system, a “system of systems”; every level is dependent on the others and strictly related to them. These levels correspond to the fundamental components of every language system: phonology is the level of sounds; morphology is the level of words; syntax is the level concerned with sentence structure; and semantics is concerned with meaning. Each of these aspects will be explored more extensively in dedicated pieces. If language indicates communication systems that are specifically designed to transmit information from one individual, the sender, to another individual, the recipient, it is crucial to understand which properties characterize languages and distinguish them from all other communication systems. Graffi and Scalise (2002) identify three characterizing properties: discreteness, recursivity, and dependency on structure. Human language is discrete, while other types of languages (such as those of animal species) are continuous: in human languages, elements are distinguishable from one another by very clearly defined limits. The sounds belonging to a linguistic system are clearly different from one another thanks to a specific set of characteristics: for example, the sound /p/ is labial (produced with the lips), voiceless, and plosive, whereas /g/, though also plosive, is velar and voiced. Animal languages, on the other hand, can modulate the signal to adapt it to different functions and objectives: bees adapt their dancing to their immediate communicative objectives, but their means of communication will still be dancing regardless of the conditions of actualization.
Human languages are also recursive: they enable humans to create new sentences by embedding one sentence inside another, then another one, and so on. It is worth noting that the number of possible sentences in any language is potentially infinite, although it can never become concretely infinite: no one would have the time to produce them all. However, the fact that this can never be carried out in practice does not prevent languages from granting their speakers this competency.
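The recursive embedding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the base sentence, the verb "thinks that", and the speaker names are hypothetical choices, not examples taken from Graffi and Scalise.

```python
# Hypothetical speaker names, used only to vary the embedded clauses.
SPEAKERS = ["Anna", "Luca", "Mia"]


def embed(depth: int) -> str:
    """Return a sentence containing `depth` embedded clauses."""
    if depth == 0:
        # Base case: a plain sentence with no embedding.
        return "the cat is asleep"
    speaker = SPEAKERS[(depth - 1) % len(SPEAKERS)]
    # Recursive step: wrap a smaller sentence inside a new one.
    return f"{speaker} thinks that {embed(depth - 1)}"


for n in range(4):
    print(embed(n))
```

Any finite depth yields a well-formed sentence, and there is no largest depth: the rule is finite, while the set of sentences it can produce is unbounded. In practice only small depths are ever uttered, which mirrors the gap between what the system allows and what speakers actually do.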

Figure 3: Communication is Oral and Written (Clever Clips Studios, n.d.).

Discreteness and recursivity alone, however, do not pinpoint exactly what human languages are: there are other types of languages endowed with these two characteristics, most notably programming languages, that is, systems of notation used to write commands for computers. The third characteristic is exactly what distinguishes human, natural languages from programming languages: structure dependency. Structure dependency describes how tightly bound every element in every utterance is, both from a grammatical perspective and from a semantic perspective: structure and meaning need to be unified. This can be illustrated by the notable example proposed by the linguist Noam Chomsky, who, in Syntactic Structures (1957), famously coined the utterance "Colorless green ideas sleep furiously" (p. 15) as an instance of how a sentence can abide by the rules governing sentence structure in a language and yet be semantically inconsistent: it is by no means transparent what the content of this utterance is. Structure dependency also allows for very complex relations: the form of words is not only determined by their succession in a sentence, but is also strictly dependent on a superstructural linguistic composition. Consequently, the form of words can be affected by other words that are very distant in the syntactic construction. This is evident in long and complex sentences: in "The girl we saw yesterday, who was in my yoga class, is also my brother’s neighbor", the choice of the relative pronoun “who”, which is used for people, is dictated by the subject of the sentence, a human. Chomsky (1977) highlights something similar as well. He defines “competency” (more commonly called “competence”) as all that individuals know about their language in order to be able to understand and speak it; he identifies “execution” (more commonly called “performance”) as everything that language users linguistically and concretely do.
Programming languages, on the other hand, are structure-independent: the added value of each present element is determined exclusively by the adjacent elements. In light of all this, Graffi and Scalise (2002) define linguistics as “the scientific study of human language” (p. 18), thus making it a discipline that is fundamentally descriptive in nature. The meaning of the label “scientific” is twofold. For one, “scientific” is the formulation of general hypotheses that account for a multiplicity of particular facts: the objective is, in other words, to infer and formulate general rules from the observation of a limited, but comprehensive and representative, set of phenomena. In doing so, we will be able to explain other similar phenomena. Secondly, these hypotheses should be formulated in a clear, straightforward way.

Figure 4: The Complexity of Language Systems (Mind Map, n.d.).

Once the distinction between the language function in potentia and the ability to draw on this potential competency to create concrete utterances has been outlined, there is another important distinction to highlight, which, in turn, refers back to the fundamental opposition between a potential and its concretization: langue and parole ("speech"), terms that are traditionally left in French in linguistics articles, as originally intended by Saussure (1911). His Cours de Linguistique Générale is of paramount importance for the development of linguistics as a field of research, as it gives rise to many of the distinctions that are still referenced today. These include the paradigmatic and syntagmatic dimensions, which will be explored below, and the distinction between a language considered diachronically, that is, over the course of its historical existence, and synchronically, meaning an analysis of its characteristics and behavior at one precise moment of its existence. Parole is the linguistic execution by an individual; it is an individual act of communicative production. Langue is a social, abstract entity; it exists outside language users, it predates them, and it survives them. Speakers of a language communicate through parole acts, but the foundation, the origin, and the rationale of those acts is the langue, which is the collective point of linguistic reference. Once again there is a distinction between the abstract and the potential on the one hand, and actualization and concretization on the other (Graffi & Scalise, 2002).

Another important distinction, attributed to Ferdinand de Saussure (1911), once again reflects the inseparable coexistence of the potential and its concretization. Saussure distinguishes between what he named the signifiant and the signifié. Together, they form what he called the signe. The most important unit of this triad is the signe, that is, the “sign”, which is the union of the other two elements. The signifiant, or signifier, is the vehicle by which a word or more complex meaning is expressed, whether through oral enunciation or written production; it is the “phonic body” (De Mauro, 2008), which is constructed by combining and re-ordering in different ways a limited set of sounds. The signifié, or signified, is the concept, the immaterial notion that is being transmitted. The following properties have been attributed to signs (Graffi & Scalise, 2002):

  • Distinctiveness (which has been mentioned in the previous paragraph in the comparison of the sounds /p/ and /g/).

  • Linearity, which indicates extension over time and in space, for the oral and written dimensions, respectively. Linearity implies that word order matters when language behavior is taken into consideration: the utterances "My brother loves Annie" and "Annie loves my brother" are not only made up of the same number of words but are made up of exactly the same words. However, their meaning is different, because the order in which the elements are distributed is altered.

  • Arbitrariness is a crucial property of languages. The association of signifier and signified is arbitrary, in that it is not dictated or motivated by any traceable parameters or logical reasoning: it is entirely up to the discretion of the speakers during the genesis of new lexical items or expressions. There is no natural law that binds speakers to assign to a certain sequence of sounds the representation of specific concepts. For instance, there is nothing ontologically innate in the animals that are labeled as “dogs” that somehow forced humans to select the sounds “d”, “o” and “g” and use them to linguistically represent dogs: the word “dog” is but the product of a social convention. The arbitrariness of signs can be easily demonstrated by pointing to the immense quantity of languages that exist and have always existed: they exhibit a very high degree of lexical variation and may make entirely different choices of vocabulary. In other words, in the absence of arbitrariness, the word for the domestic animal, which is also sometimes raised as a support for hunting, would be “dog” (or something very similar) across all languages, as this would mean there is something intuitive about dogs that calls for humans to use that term exclusively. One apparent exception to arbitrariness is represented by onomatopoeias. These are words whose sounds are intuitively reminiscent of the object in reality that they refer to: so we can have words like “boom” belonging to the semantic field of explosions, “tweet” to represent the chirping of birds, and “popcorn” to represent the way this food is made (corn “popping”). However, it should also be noted that onomatopoeias are, in turn, strictly dependent on the linguistic system they arise in: the cry of the dog is bau in Italian, but it is “woof” in English, ouaf in French, and guau in Spanish. In short, not even onomatopoeias are the same across languages.
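Two of the properties listed above, linearity and arbitrariness, lend themselves to a small illustration in Python. The sketch below is not a linguistic model: it only restates the examples from the list as data (the two "loves" utterances and the four words for a dog's cry), and the variable names are hypothetical.

```python
# Linearity: the same words in a different order form a different utterance.
utterance_a = "My brother loves Annie".lower().split()
utterance_b = "Annie loves my brother".lower().split()

same_words = sorted(utterance_a) == sorted(utterance_b)  # same inventory of words
same_order = utterance_a == utterance_b                  # same linear sequence

# Arbitrariness: one signified (the cry of a dog) is paired with a
# different signifier in each language, as quoted in the list above.
DOG_CRY = {
    "Italian": "bau",
    "English": "woof",
    "French": "ouaf",
    "Spanish": "guau",
}
distinct_forms = set(DOG_CRY.values())

print(same_words, same_order)   # the word inventories match, the sequences do not
print(len(distinct_forms))      # one concept, four different forms
```

The first comparison separates what an utterance contains from how it is ordered; the second shows a single concept mapped to as many forms as there are languages in the sample.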

Figure 5: Language is a Biologically Innate Competency (Keystone Academic, 2016).

Paradigmatic and Syntagmatic Relations

Another fundamental distinction in linguistics, which is owed to Ferdinand de Saussure, is the one between relations that occur on the syntagmatic level and those that occur on the paradigmatic level. The latter has been defined as the level of linguistic relations in absentia: by selecting some items while producing utterances, speakers automatically rule out others. When producing a sentence, speakers are more and more bound by the choice of words they have made so far. Thus, on the level of meaning, a sentence can begin with “My”, followed by “cat” or “dog”, but the verbs will then be bound to what dogs and cats do or how they behave, or, more generally, to situations that are regularly applicable to them, excluding activities that are, for instance, exclusively human. Because the choice of some linguistic elements entails the exclusion of others, paradigmatic relations are said to be vertical. Syntagmatic relations, on the other hand, are horizontal, and they hold between elements in presentia, that is, elements that co-occur in the utterance. They mainly concern grammatical cohesion: suffixes, for example, will be bound by what comes before, as the sentence needs to be grammatically cohesive. This linguistic principle, whereby words take on specific forms depending on the grammatical context they appear in, is called “agreement” or “concord”. It is prominent in the grammar of some languages, while others display it to a significantly lesser degree: Japanese, for example, has no subject-verb agreement, and because English does not classify its lexical items by gender, there is no gender agreement between nouns and adjectives. In the utterance "I’ll grab my blue skirts and my black boots", the possessive adjective “my” and the adjectives “blue” and “black” do not change in form depending on the nouns they are associated with.
They would, however, change in languages that feature grammatical gender for their nouns, such as Italian: in "Prendo le mie gonne blu ed i miei stivali neri", the possessives “mie” and “miei” and the adjective “neri” are inflected for gender and number (in Italian, “skirt” is a feminine noun and “boot” is a masculine noun), while “blu” happens to be an invariable adjective. Any language unit has syntagmatic relations with the adjacent forms at the parole level but has paradigmatic relations with the absent units that had an equal potential to be realized in that utterance.
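The two axes can be made concrete with a toy sketch in Python. Everything here is a simplifying assumption for illustration: the lexicon contains only the two plural nouns from the Italian example, and the tables list only the forms needed for them. Choosing one noun from the lexicon stands for the paradigmatic axis (the other noun is excluded), while the lookup of matching forms stands for the syntagmatic axis (agreement with the chosen noun).

```python
# Hypothetical toy lexicon: each noun carries (gender, number) features.
NOUNS = {"gonne": ("f", "pl"), "stivali": ("m", "pl")}

# Forms of the article, the possessive "my", and the adjective "black",
# keyed by the features of the noun they must agree with.
ARTICLE = {("f", "pl"): "le", ("m", "pl"): "i"}
POSSESSIVE = {("f", "pl"): "mie", ("m", "pl"): "miei"}
BLACK = {("f", "pl"): "nere", ("m", "pl"): "neri"}


def noun_phrase(noun: str) -> str:
    """Build 'article + possessive + noun + adjective' with agreement."""
    # Paradigmatic choice: picking `noun` rules out the other entries in NOUNS.
    features = NOUNS[noun]
    # Syntagmatic constraint: every neighbouring word takes the form
    # that matches the features of the chosen noun.
    return f"{ARTICLE[features]} {POSSESSIVE[features]} {noun} {BLACK[features]}"


print(noun_phrase("stivali"))
print(noun_phrase("gonne"))
```

An English version of the same function would need no feature tables at all, since "my" and "black" keep one form regardless of the noun.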

Figure 6: Paradigmatic and Syntagmatic Relations (Lehmann, 2023).

Study Approaches and Theories of Language

Now that some preliminary notions regarding the theory of languages have been established, it is worth outlining a brief history of the development of linguistics as a field, lingering on the most momentous contributions to the discipline. The specialized vocabulary employed in doing so forms the subcategory of lexis that has been called “metalanguage”: the group of linguistic items (labels, names, dedicated vocabulary) created with the specific purpose of discussing language itself, that is, the language that researchers employ when discussing linguistics. Grammar, phonetics, accent, and pronunciation are all instances of metalanguage (De Mauro, 2008). Considerations on the nature and properties of languages date as far back as ancient Greece and Rome, and research on languages has developed over time, shifting its focus across a variety of aspects, such as lexis, grammar, and the realization that languages evolve and that categories such as language families can be identified. Western tradition is characterized by a continuous philosophical reflection on language. Out of it grew an immense work of description and analysis of single languages, and with it an awareness of linguistic analysis and an appreciation of the grammatical and syntactic structures of languages, their lexis, their phonology, and their variations in space and time, among other relevant properties. Thus, it is in the first half of the 19th century that the use of words such as "Sprachwissenschaft" or "Linguistik" in German, "Linguistique" in French, "Linguistics" in English, and "Linguistica" or "Glottologia" in Italian begins (De Mauro, 2008).

The subfields of linguistic research can be classified according to their approach or the object of study they circumscribe: historical linguistics, concerned with the development of languages over time; comparative linguistics, dedicated to the reconstruction of language stages in previous historical moments; descriptive linguistics, which is about single language systems or language families; linguistic typology, which is concerned with the classification of languages into types according to the parameters or behaviors they exhibit; general linguistics, which seeks to find the constants across language systems; and theoretical linguistics. More recently, subfields such as neurolinguistics, concerned with the relationship between language and the structure of the brain, and psycholinguistics, which deals with the psychological processes that enable humans to use language, have contributed more specialized scopes of research. General linguistics offers its own perspective on the definition of linguistics, narrowing its focus to the study of the constants present in every language and at every point of linguistic reality and, complementarily, the study of the ways in which these constants are linguistically represented (De Mauro, 2008). In this respect, it is worth anticipating the work of the linguist J. Greenberg (1963), whose research has most notably contributed to the identification of the characteristics shared by as many languages as possible. From a group of 30 languages, he identified and listed a set of common traits that he termed “universals”, which are mostly concerned with commonalities in morphology and syntax (lexical items and word order in sentences, respectively).

Figure 7: Language Universals are Traits or Features Shared by Languages (University of Arizona, 2017).

While language study is rich in traditions and inspired investigations, two macro-approaches to language theory have been identified, which are distinguishable by the function they attribute to languages: formalism and functionalism. Most other studies can be said to refer back to either one of these tendencies in that they summarize the two major, distinct conceptions of what language is and what purposes it serves (Gobber, 2003). Formalism is a label that comprises linguists such as Ferdinand de Saussure, the current of structuralism, which holds that everything starts from a structure, i.e., meaning can only exist because languages exist and make sense of reality in the first place, and the current of generative grammar, whose most notable representative is Noam Chomsky. The idea behind generative grammar is that all humans share the same capacity for communication, and it is this very capacity that forms and informs the grammar in languages. Chomsky indicated this faculty as “universal grammar”.

Formalism frames language as the result of a mental function whose structures replicate what originates in thought: language, in other words, derives from internal, neurological factors. In short, form is completely independent of function: the two bear no relation whatsoever to each other. Functionalism, on the other hand, envisages languages primarily as a communication tool. Language is strictly dependent on external factors, forged on every occasion by its users, and strictly dependent on the situations of use: the function determines, or heavily affects, the form according to contingent cognitive and socio-pragmatic necessities. Languages exist only to serve specific purposes, such as reporting information and expressing moods. Michael Halliday, for instance, conceives of linguistic exchanges as episodes of communication that occur within the scope of social purposes. Because language is a tool developed to serve specific objectives, Halliday (1975) identifies seven separate functions that can justify the use of language: instrumental, regulatory, interactional, personal, heuristic, representational, and imaginative.

Similarly, Roman Jakobson (1960) lists six different functions of language, each of which is associated with one of the identified components of the linguistic episode. The speaker is the first component that Jakobson identifies, the prime sender of the message; its corresponding function is, accordingly, the emotive or expressive function, which is realized when speakers act on their will to communicate what is inside them. Reference is the second component in the list, which signifies the object of the communication; this term refers back to the extralinguistic reality that is being linguistically described and corresponds to the referential function, which is informative in nature (for example, utterances such as "The train leaves at 6 pm"). The third component is the message, which is the content itself: its function is the poetic function. The channel is the medium through which communication happens (usually air, but it can also be a telematic medium, for instance). It corresponds to the phatic function, which can be observed in sentences such as "Can you hear me?", which are intended to check the effectiveness of the channel of communication. The code is the fifth element and is endowed with the metalinguistic function: language that aims to talk about itself. Finally, the listener is the recipient of the message, characterized by the conative or directive function, which is at work when the linguistic utterance has the explicit or implicit intent of affecting the recipient’s behavior in some way. For example, rulebooks and guidelines are created with the express purpose of indicating proper behaviors and sanctioning bad conduct, and commands are an explicit imposition of a specific desired behavior on the listener.
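The six pairings described above can be summarized as a simple lookup table. The Python sketch below only restates the paragraph as data; the dictionary name, the helper function, and the hand-assigned labels on the two sample utterances are hypothetical and serve purely as an illustration.

```python
# Component of the linguistic episode -> associated function,
# in the order given in the paragraph above.
JAKOBSON_FUNCTIONS = {
    "speaker": "expressive",
    "reference": "referential",
    "message": "poetic",
    "channel": "phatic",
    "code": "metalinguistic",
    "listener": "conative",
}

# Sample utterances from the text, labelled by hand with the component
# they are oriented toward (the labelling is an assumption of this sketch).
EXAMPLES = [
    ("The train leaves at 6 pm", "reference"),
    ("Can you hear me?", "channel"),
]


def function_of(component: str) -> str:
    """Look up the function associated with a component."""
    return JAKOBSON_FUNCTIONS[component]


for utterance, component in EXAMPLES:
    print(f"{utterance!r}: {function_of(component)} function")
```

A real utterance often serves several functions at once, so a one-to-one table like this should be read as a summary of the taxonomy rather than as a classifier.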


Languages are complex entities, and pinning down their fundamental characteristics in a comprehensive definition is no easy task. Additionally, several conceptions of what languages are have been proposed over the course of the existence of linguistics as a discipline. However, the nature of languages can be said to be fundamentally twofold: language is both the human capacity and predisposition for self-expression and the concrete linguistic systems developed to support this internal need for communication. Languages unfold over oral and written dimensions (although the oral precedes the written and the written necessarily follows the oral) and are massive systems that constitute a multi-layered communicative tool, and this tool serves the purpose of conveying abstract, immaterial, intangible concepts: they are made of content and forms.

