Linguistics in Amsterdam 4-2 (September 2011)

Esperanto: a language made transparent?[1]

Wim Jansen

Amsterdam Center for Language and Communication.


Esperanto is a so-called artificial or planned language, designed for use as a universal L2 in international communication. Common sense tells us that a high level of transparency, i.e. of one-to-one projections between different levels of grammatical analysis, should be beneficial to such a language. The notion of grammatical transparency was unknown to its author, Zamenhof, and if about 75% of the features reviewed in this paper are transparent, this is not due to a deliberate design philosophy but to Zamenhof’s empirical and intuitive labor. The fact that no opacity is found among any of the phonological features testifies to his endeavor to create an easily speakable and understandable language for people with many different mother tongues. Of great interest are certain diachronic developments in the use of the language, which reveal a trend towards a wider exploitation of built-in transparencies. Particular reference is made to the freedom in predicate building, which is not constrained by the semantic categories of the selected lexemes. The price paid for this freedom is an increased level of semantic indeterminacy.

1 Introduction

The Esperanto project was published in Warsaw by Ludwik Zamenhof in 1887 (Zamenhof 2004/1887). It was Zamenhof’s goal to offer people with many different mother tongues an easy tool to communicate with each other on politically and psychologically equal terms. Following the publication of a skeleton language, an international speech community slowly developed, first in the Russian Empire and Eastern Europe, soon followed by organized movements in other European countries. Today, Esperanto is used as a written and spoken language in more than 120 countries around the globe by people with a great many different mother tongues. Probably a few tens of thousands of people are conversant in the language. The centre of gravity of Esperanto is still in Europe, but there are also relatively strong communities in Asia (particularly in China, Japan and Korea) and in South America (e.g. Brazil).

Since Esperanto was meant to be a contact language easily accessible to everybody around the world, a high level of built-in transparency might be expected from its linguistic system. Most speakers learn the language as an L2 (or better: a non-L1). There is a small number of native speakers in the speech community (all of them bi- or trilingual since early childhood), but since they play the same part in defining the standards of the language as the great majority of non-native speakers, their existence does not change the typical L2 status of Esperanto. It is an observable fact that the language has developed freely over the more than 120 years since its birth. Hence, it cannot be excluded that consecutive generations of speakers have developed certain transparency-enhancing strategies or disambiguation techniques that may have become part of the structure of the language. On the other hand, the increase in the overall use of the language may have caused speakers to develop techniques to improve the efficiency of communication, a development that would go hand in hand with a decrease in transparency. A systematic mapping of transparencies in a diachronic context may uncover signs of these two counter-currents.

A problem associated with studying Esperanto is how to describe a language without native speakers. Prescriptive material (‘how to do things correctly, because that’s the way Zamenhof did it’) is abundantly available in Esperanto, and some of it is very detailed (e.g. Kalocsay and Waringhien 1980, Wennergren 2005). In the absence of native informants (at least: authoritative native informants) with their hidden knowledge and intuition, we resort in this paper to (1) the formal baseline documents of the language, in particular Zamenhof (2004/1887 and 1963/1905); (2) contemporary quality grammars like Wennergren (2005); (3) the actual language usage in electronically accessible text corpora of material published over the last century. Since the initial codification of the language in (1) was not very detailed and, in what it did cover, not very explicit, the grammarian’s support from (2) and the user’s interpretations from (3) are indispensable in the research. Still sorely missing are empirical data concerning the usability and interpretability of linguistic structures that do not necessarily exist but could exist, given the formal characteristics of the language. To obtain these, more extensive field work would be required.

2 Interpersonal-Representational

Cross-referencing

Cross-referencing occurs when person marking on the verb, even though sufficient by itself, is optionally expanded by a lexically realized argument, i.e. when a single participant is evoked at the interpersonal level by two referential subacts, one bound to the verb and one free. In Esperanto, however, there is no person marking on the verb at all, and argument positions are obligatorily filled by lexically realized items. In (1), the clause ‘X came’, with ‘X’ as its single argument, reads in Esperanto:[2]

(1) X venis. ‘X came.’

in which X stands for mi ‘I’, vi ‘you’ (sg or pl) or any other nominal or pronominal subject. The finite verbal word venis ‘came’ is invariant with respect to person, number and gender of X. There is no cross-referencing in Esperanto, which is transparent in this respect.

Apposition

Apposition occurs when two referential subacts at the interpersonal level evoke a single individual or entity. The example (2) below, with the appositional elements in bold print, is representative of many such examples quoting the public function and surname of the single individual concerned:

(Monato 2003/10: 9)

The two inputs from the interpersonal level materialize at the morphosyntactic level as two NPs, la usona prezidanto and Bush (the former through the representational level and the latter directly), referring to one single individual and therefore constituting a case of apposition and non-transparency.

In the following instance of apposition, anaphoric reference is made to two participants together, in this case of the third person plural. Such a reference is possible by means of ili ‘they’ or ambaŭ ‘both, the two’, but also by the two of them in apposition: ili ambaŭ ‘both of them’. In (3), the combination ili ambaŭ in bold print refers to ‘a townsman’ and ‘an acquaintance of his’, introduced in the paragraph preceding this clause:

(Zamenhof 1933: 79)

There is indeed apposition in Esperanto, which is not transparent in this respect.

Limitations on which semantic units can be chosen as predicates

In a fully transparent language, only pragmatic and semantic information should determine the choice of formal units. Under such circumstances, all semantic units should be usable to form predicates, no matter whether they concern events, individuals or properties.

A few words of introduction to Esperanto morphology may help the reader understand this subsection and 3.3 below. In Esperanto, lexemes as the basic semantic units are clearly discernible, but their status is still a debated issue.[3] This is to some extent due to the fact that the first Esperanto vocabularies were defined as lexeme sets with word translations in five languages with differentiated parts-of-speech systems, i.e. English, French, German, Polish and Russian (see both Zamenhof 2004/1887 and 1963/1905). Most lexemes are bound lexical stems and cannot be used morphosyntactically without attaching to them the appropriate inflection that marks them for the slot in the clause they are destined to fill, i.e. the head or modifier position in a nominal or verbal phrase. Simply equating nouns, adjectives and verbs in the translations with their corresponding lexemes in Esperanto made it tempting for grammarians to classify these lexemes as specialized or categorial. According to rule 7 of the original grammar in Zamenhof (2004/1887), adverbs make use of the same lexemes as adjectives and differ from these only in their specific inflection, which becomes -e in lieu of -a. In other words, the Esperanto parts-of-speech (p.o.s.) system looks at first sight like a ‘type-3 flexible’ system, using specialized lexemes for nouns, verbs and modifiers (Hengeveld and Mackenzie 2008: 228).

However, this specialization is contradicted as early as in the language’s baseline documents themselves. The apparent verb-class specialization of certain lexemes would suggest that these are the only ones destined to fill predicate positions in a clause. The original presentation of the grammar points in this direction by showing the application of the lexemes far ‘do’ (see 4 below; the infinitive is fari) and kant ‘sing’ (see 5 below; the infinitive is kanti):

(Zamenhof 1994/1887: 37, 38)

Copula support is also evidenced among the first examples as illustrated by:

(Zamenhof 1963/1905: 85)

On the other hand, Zamenhof (1963/1905) shows that predicate positions are not obligatorily filled by lexemes that are translated as verbs. Among the 55 pages of practical exercise material in this baseline document we find reĝ-i, hero-i, sign-i, nom-i with the infinitive marker and nom-is with the past tense marker. They are associated with the lexemes reĝ ‘king’, hero ‘hero’, sign ‘sign’ and nom ‘name’, all given in these noun translations. We also find kuraĝ-as and san-i with the present tense and infinitive marker respectively, associated with the lexemes kuraĝ ‘brave’ and san ‘healthy’, both given in these adjectival translations (hyphens have been added in all cases for greater clarity). The indiscriminate use of lexemes in predicate positions, independent of what their word translations in other languages may be, has increased over the years, as the following examples illustrate. In (7) the lexeme makul, translated as the noun ‘stain’, and in (8) the lexeme verd, translated as the adjective ‘green’, are highlighted in bold print as they appear with the verbal tense endings -as and -is:

(Zamenhof 1925: 64)
(Waringhien 2002: 1224)

This trend extends to other lexical units, e.g. the temporal adverb baldaŭ ‘soon’ (which is a free lexical stem) in (9) below:

(Monato 2003/4: 14)

However, alternative solutions with copula support are often perceived as more normal by many speakers, probably because of their mother tongue habits. Examples of such constructions are provided in (10) with the predicative noun makulo contrasting with (7), and in (11) with the predicative adjective verda contrasting with (8):

(Waringhien 2002: 704)
(Zamenhof 1992: 85)

Frequently, predicates built on lexemes that are not translated as verbal words have acquired a meaning that cannot be inferred from a straightforward conversion to a copula construction. Starting with lexemes that have a nominal translation: the verb akvi from akv (akvo is ‘water’) appears to have the single definition ‘provide plants with the water they need’ (Waringhien 2002: 66). The verb loki from lok (loko is ‘place’) shows a variation on this theme: ‘to find a place for something or somebody’ (in other words ‘provide a place’ (Waringhien 2002: 689)), but it also has a second meaning, which is ‘to put in a place’ (whereas akvi definitely does not mean ‘to put in the water’). The verb patri from patr (patro is ‘father’) likewise has two quite distinct definitions: ‘to be a father’ and ‘to be like a father’ (Waringhien 2002: 849). The three verbs have acquired specific, more elaborate meanings, which are not automatically inferable from a copula construction like ‘to be water/place/father’.

As for lexemes with an adjectival translation: the verb beli from bel (bela is ‘beautiful’) is described as ‘to look beautiful’ (Waringhien 2002: 148). The verb rapidi from rapid (rapida is ‘fast’) shows three options: ‘to try and reach a destination in a short time’, ‘to complete an act in a short time’ and ‘to do something without delay’ (Waringhien 2002: 952). The verb verdi from verd (verda is ‘green’; see also 8 above) means ‘to be intensely green’ (Waringhien 2002: 1224). The definitions of beli and verdi can be seen as intensifications of the more neutral copular constructions to be beautiful and to be green. Rapidi is different in that it displays three very specialized applications, without reference to a generic ‘to be fast’, which can, however, be inferred from the three as a kind of common denominator (though this is not explicitly mentioned in Waringhien 2002).[4]

The observations above illustrate two trends that point in opposite directions. On the one hand, there is the spontaneous and growing use of all kinds of lexemes in predicate positions, which can be seen as an attempt to exploit a transparency that was potentially present in the language but initially hidden because speakers of Esperanto’s Indo-European reference languages extensively calqued their syntactic models. On the other hand, many of the new verbs have been assigned meanings that go far beyond the paraphrase based on the associated copula construction. These meanings are often unpredictable to the point of being idiomatic and can only be explained as semantic calques from one or more reference languages. This increase in semantic indeterminacy is, however, not the issue under debate in this chapter.

In allowing all kinds of semantic units to be used in predicate positions, Esperanto is transparent.

3 Representational-Morphosyntactic

Grammatical relations or pragmatic/semantic alignment only

In Esperanto, the morphosyntactic organization of the clause is not exclusively driven by higher level pragmatic and/or semantic criteria. In what follows, the influence of syntactic functions on the alignment of items in a clause will be demonstrated. Other factors like pragmatic functions or phonological criteria (constituent weight) are not addressed here.

We choose an example from the conjugation system. Esperanto distinguishes between the active voice, the passive voice and the middle voice. Compare the three expressions (12), (13) and (14) below with the different verbal constructions in bold print:

In (12), the syntactic subject S coincides with the semantic actor A, and the object O with the undergoer U; (12) is typical of two-place predicates and often, though not necessarily, of volitional acts. In (13), the original U is promoted to S and the original S either disappears or is demoted to an oblique actor expressed by de mi; (13) is typical of one-place predicates and often, though not necessarily, of volitional acts. Example (14) maintains the S-U coincidence but is hardly capable of expressing a possible actor; it is typical of one-place predicates and often, though not necessarily, of non-volitional acts. The two-place property designated by the lexeme ferm ‘to close’ materializes in (12) as the transitive verb fermis in the past tense. In the characteristic passive construction of (13), it appears as the participle fermita with copula support by estis. In (14), it requires obligatory modification by the intransitivizer suffix -iĝ- before its application as fermiĝis in a one-place predicate.

A semantically determined alignment of the components of the clause would be based on the semantic functions carried by the arguments. The order displayed in (12) through (14) is AVU, UV and UV, hence not consistent. Taking the syntactic functions of the arguments as a basis, we notice the sequences SVO, SV and SV, hence consistently SV(O). In addition, the accusative marker -n only applies to the syntactic object in two-place predicates, whereas actor or undergoer marking as such does not exist. Morphosyntactic conditioning thus plays an important part in determining the linear organization of the components of a clause. In this respect Esperanto is not transparent.
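The comparison of semantic and syntactic orders in (12) through (14) can be made explicit with a small Python sketch. It encodes each clause only as abstract (syntactic function, semantic function) pairs in linear order, since the full Esperanto sentences are not reproduced here; the oblique actor de mi in (13) is left out, matching the UV order given above. The names CLAUSES, order and preverbal are illustrative, not taken from any existing tool.

```python
# Each clause is a list of (syntactic function, semantic function) pairs in
# linear order; "V" stands for the verbal complex at both levels.
# Schematic labels only: the actual example sentences are not encoded.
CLAUSES = {
    "(12) active": [("S", "A"), ("V", "V"), ("O", "U")],
    "(13) passive": [("S", "U"), ("V", "V")],  # oblique actor omitted
    "(14) middle": [("S", "U"), ("V", "V")],
}

SYNTACTIC, SEMANTIC = 0, 1


def order(clause, level):
    """Read off the constituent order of one clause at the given level."""
    return "".join(pair[level] for pair in clause)


def preverbal(level):
    """Collect the labels found in the clause-initial (preverbal) slot."""
    return {clause[0][level] for clause in CLAUSES.values()}


for name, clause in CLAUSES.items():
    print(name, order(clause, SYNTACTIC), order(clause, SEMANTIC))
print("preverbal, syntactic:", preverbal(SYNTACTIC))
print("preverbal, semantic:", preverbal(SEMANTIC))
```

Under the syntactic labelling the preverbal slot is always filled by S, whereas under the semantic labelling it is filled by A in one clause and by U in the others; this mismatch is the non-transparency described above.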

Discontinuity

Since all compounding, affixing and inflection is realized by a concatenation of invariable lexemes and morphemes, fusing of boundaries between items does not occur. Each affix and inflection expresses one single function only and is realized either as a prefix or as a suffix. No a priori discontinuities are identified and the language may be called transparent from this point of view.
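Because every morpheme is invariant and carries a single function, a word can be split back into its parts by simple string matching. The Python sketch below illustrates this with a hypothetical mini-lexicon containing a few roots and the suffixes -iĝ- and -it- mentioned in this paper; it is not a full Esperanto morphological analyser, and the names ROOTS, SUFFIXES, FINALS and segment are assumptions made for the illustration. As a simplification, it allows -j and -n only after the nominal endings -o and -a.

```python
import re

# Hypothetical mini-lexicon: a handful of roots discussed in this paper.
ROOTS = {"ferm": "close", "kant": "sing", "verd": "green",
         "makul": "stain", "san": "healthy"}
# Derivational suffixes, each with one function only.
SUFFIXES = {"iĝ": "intransitivizer", "it": "passive past participle"}
# Final inflections, each marking exactly one category.
FINALS = {"as": "present", "is": "past", "os": "future", "i": "infinitive",
          "o": "noun", "a": "adjective", "e": "adverb",
          "j": "plural", "n": "accusative"}

_ROOT_ALT = "|".join(sorted(ROOTS, key=len, reverse=True))
_SUF_ALT = "|".join(SUFFIXES)
# root + any number of suffixes + one final; -j/-n only after -o/-a.
_WORD = re.compile(
    rf"^(?P<root>{_ROOT_ALT})"
    rf"(?P<sufs>(?:{_SUF_ALT})*)"
    r"(?:(?P<verb>as|is|os|i)|(?P<nom>[oa])(?P<pl>j?)(?P<acc>n?)|(?P<adv>e))$"
)
_GLOSSES = {**ROOTS, **SUFFIXES, **FINALS}


def segment(word):
    """Split a word into (morpheme, gloss) pairs, or return None if unparsable."""
    m = _WORD.match(word)
    if m is None:
        return None
    parts = [m.group("root")] + re.findall(_SUF_ALT, m.group("sufs"))
    for name in ("verb", "nom", "pl", "acc", "adv"):
        if m.group(name):  # skip groups that are unmatched or empty
            parts.append(m.group(name))
    return [(p, _GLOSSES[p]) for p in parts]


print(segment("fermiĝis"))
print(segment("fermita"))
```

Since no boundary fusion occurs, joining the segments always returns the original word, and the forms fermis, fermita and fermiĝis from (12) through (14) all decompose into the same root plus separately identifiable affixes.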

Sensitivity of function marking to the nature of input

As an example we take the Undergoer function U, which, when carried by a noun phrase that coincides with the syntactic object, is morphologically expressed by attaching the inflection -n to the head noun and any adjective-type modifiers, but remains unexpressed when it is carried by a clause. The latter case is illustrated in (15) below, in which the object clause ke mi estas diligenta is in no way marked:

(Wennergren 2005: 453)

Example (16) below is used in the grammar handbook Wennergren (2005) in conjunction with (15) specifically to emphasize the Undergoer role of the object clause. Whereas the clause carries no -n marking, the abbreviated expression below clearly does:

(Wennergren 2005: 453)

It should be clear from these examples that Esperanto is not transparent in this respect.

4 Morphosyntactic

Expletive elements

Esperanto does not use dummy elements in positions for which there is no interpersonal or representational material. For weather conditions we find e.g.:

(Monato 2003/7-8: 22)

In (17) there is no semantic referential argument and Esperanto does not require a dummy subject in zero-place predicates such as the one highlighted in (17) by neĝas in bold print (neĝi is the infinitive verb ‘to snow’).

The existence or presence of an entity may be expressed by a lexical verb in clause-initial position (or, at least, before the constituent corresponding to the referential subact). In (18), the constituent homo tre malbona kaj peka ‘a very bad and sinful man’ is preceded by the lexically expressed ascriptive predicate vivis ‘lived’:

(Zamenhof 1933: 63)

In cases like (19) we notice the use of the copula esti ‘to be’ as a dummy predicate preceding the referential subact:

(Zamenhof 1933: 37)

A dummy element like ‘there’ in the translations of (18) and (19) does not exist in Esperanto.

In (20) below, the complex subject clause ke … registaroj ‘that … governments’ requires no dummy element to fill the vacant S slot before the copula estas, and the predicate (estas) bedaŭrinde ‘is regrettable’ remains impersonal:

(Monato 2003/3: 11)

Esperanto is not entirely transparent in terms of not using expletive elements, in that it does not require dummy arguments, but may require dummy predicates in the form of the copula est.

Tense copying

We refer to tense copying or consecutio temporum, when the tense operator of the main clause is copied to an embedded clause, while the meaning (i.e. the location in time) of the embedded clause is not affected. This does not occur in Esperanto, as is shown in (21a) and (21b) below:

(Zamenhof 1963/1905: 100)
(Zamenhof 1933: 91)

If we replace the subordinate clauses introduced by the conjunction ke in (21a) and (21b) by direct speech expressions, the speakers would use the present and future tense operators -as and -os respectively. In indirect speech this is maintained (see the elements in bold print above), expressing that the act of loving in (21a) coincides with the time frame defined by the moment of telling in the main clause, and that the act of returning in (21b) is supposed to happen (or, rather, not to happen) later than in the time frame defined by the period of thinking in the main clause. Tense operators in subordinate clauses are selected in agreement with the intended meaning and are not subject to any tense copying rule. Esperanto is transparent in terms of not resorting to tense copying in embedded clauses.

Raising

We refer to raising when a constituent that semantically belongs to a subordinate clause appears in the higher level clause. In Esperanto, an example of raising is found in (22):

(Kalocsay & Waringhien 1980: 305)

Here, the manner adverb kiel ‘how’ inquires about agu, the expected ‘way of acting’, and thus belongs semantically to the subclause. Since interrogative clauses with non-polar questions should be introduced by a question word, kiel is raised to the initial position in the main clause. Another example, one level deeper, we find in (23):

(Kalocsay & Waringhien 1980: 305)

In this case, kion ‘what’ inquires about mi faru ‘the thing I should do’, in the second order subclause, but is raised one level up for reasons similar to those in (22) above. In (24) below there is focal contrast between a recently lost opportunity in the preceding clause (La unuan ŝancon mi fuŝis ‘The first opportunity I spoiled’) and the chance of being offered a new one soon, expressed in the subclause of (24) introduced by the conjunction ke ‘that’. Triggered by this focality, the modifier nun ‘now’ appears fronted and raised to the main clause:

(Monato 2003/10: 24)

Raising does occur in Esperanto, which is not transparent in this respect.

Grammatical gender, declination, conjugation

There is no grammatical gender in Esperanto. There are no morphosyntactic processes affecting nouns or verbs in their own right (we do not refer to agreement) that are triggered by other than semantic factors. Esperanto is transparent in this respect.

Agreement

There is no agreement associated with the use of tense and mode markers in the conjugation, which mark the finite verb for tense or mode only, without any reference to the arguments involved.

There is number and case agreement in the Esperanto declination. In (25) the subject vi ‘you’ refers to the vocative sinjoroj ‘gentlemen’ and is therefore to be interpreted as the plural vi (vi is like ‘you’ in English and can refer to one or more addressees). The plural marker j attached to the predicative adjective neĝentila ‘impolite’ in (25) illustrates number agreement:

(Zamenhof 1963/1905: 98)

Number (j) and case (n) agreement between heads and adjective-like modifiers in noun phrases is shown in (26) and (27):

(Zamenhof 1963/1905: 88)

with a plural-accusative noun and adjective in (26) and a plural noun, quantifier and possessive in (27). Concentrating on the number agreement, we find a comprehensive overview in Allée and Kováts (2007), from which we quote the following:

(Allée & Kováts 2007: 4)

in which we have a set of individuals of one class (‘elephants’), consisting of a subset that is qualified ‘grey’ and apparently consisting of one element griza only, and of another subset qualified ‘white’, also consisting of one element blanka. Hence, we are necessarily dealing with a set of two individuals elefantoj with different qualifications.

(Allée & Kováts 2007: 4)

in which we have a set of one kato ‘cat’ and one hundo ‘dog’, but both qualified brunaj ‘brown’.

(Allée & Kováts 2007: 4)

In (30) we are dealing with a number of bestoj ‘animals’, of which one is ruĝa ‘red’ and two are defined as bluaj ‘blue’ (therefore the entire set consists of three individuals). Esperanto features a relatively complex system of number and case agreement and is not transparent in this respect.
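The basic agreement rule, by which an adjective-like modifier copies the plural -j and accusative -n of its head noun, can be sketched in Python as below. This is a deliberately naive check written for illustration (the functions features and agrees are hypothetical); it reads only word endings and does not model the distributive readings discussed by Allée and Kováts (2007). Using words cited in (28) and (30), it shows that such grammatical phrases fall outside the simple copying rule, which is one source of the complexity noted above.

```python
import re

# Nominal ending: class vowel -o/-a, optional plural -j, optional accusative -n.
_ENDING = re.compile(r"(?P<cls>[oa])(?P<pl>j?)(?P<acc>n?)$")


def features(word):
    """Return (plural, accusative) for a word ending in -o/-a, else None."""
    m = _ENDING.search(word)
    if m is None:
        return None
    return (m.group("pl") == "j", m.group("acc") == "n")


def agrees(head, modifiers):
    """Naive rule: every modifier must carry the head noun's -j and -n."""
    target = features(head)
    return all(features(mod) == target for mod in modifiers)


print(agrees("elefantojn", ["grizajn"]))           # plain copying
print(agrees("elefantoj", ["griza", "blanka"]))    # distributive case as in (28)
print(agrees("bestoj", ["ruĝa", "bluaj"]))         # mixed case as in (30)
```

The first call satisfies the copying rule, while the other two fail it even though they reflect usage quoted above: in these cases number marking on the modifiers reflects the size of the subset each modifier describes rather than the number of the head noun.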

Fusional morphology

There is no fusion in Esperanto.[5]

5 Morphosyntactic-Phonological

Parallelism of phonological phrasing and morphosyntactic phrasing

For a language to be completely transparent, every unit at the morphosyntactic level should correspond to one unit at the phonological level. Hence, a morphological word should correspond to a phonological word. This is the case in Esperanto, in which the primary stress is placed on the penultimate syllable of the morphosyntactic word, independently of its build-up or complexity. The following examples of words with two, three and six syllables, with the stressed syllable shown in capitals, are from Wells (2010: 364): KUko ‘cake’, kolBAso ‘sausage’, fromaĝosaLAto ‘cheese salad’.
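Because the stress rule refers only to the word and not to its internal build-up, it can be stated as a few lines of Python. The sketch below assumes lowercase orthographic input in which a, e, i, o, u are the only syllable nuclei and j and ŭ are non-syllabic glides; elided forms (such as a final -o replaced by an apostrophe) are not handled. It marks the stressed vowel in uppercase; the function name stress is illustrative.

```python
# Syllable nuclei in Esperanto spelling; j and ŭ are glides, never nuclei.
VOWELS = set("aeiou")


def stress(word):
    """Uppercase the vowel of the penultimate syllable (assumes lowercase input)."""
    nuclei = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not nuclei:
        return word
    # Penultimate nucleus; a monosyllable stresses its only vowel.
    k = nuclei[-2] if len(nuclei) >= 2 else nuclei[0]
    return word[:k] + word[k].upper() + word[k + 1:]


for w in ["kuko", "kolbaso", "fromaĝosalato", "aŭto", "balais"]:
    print(stress(w))
```

Since every written vowel letter counts as exactly one syllable, aŭto is stressed on its first syllable, while balais, with the two vowels kept apart, has three syllables and is stressed on the second.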

When several words are grouped into a phrase, the location in the phrase that is selected for primary stressing coincides with that of the primary stress on the head of the phrase. Other primary stress accents on words within the phrase, if affected, are not shifted, but only weakened.

In this respect, Esperanto is fully transparent.

Influence of phonological weight on morphosyntactic placement

When the alignment of items in the clause, ideally defined by interpersonal or representational criteria only, is ‘corrected’ by phonological weight criteria, we are dealing with conflicting inputs, disrupting the full transparency of the language in this respect. This is clearly the case in Esperanto, where ‘heavy’ (multisyllabic) items tend to be moved to the end of the sentence, and light-weight items are so mobile that they display a tendency to abandon their designated slots to move into positions more to the left.

When different groups of Esperanto speakers were tested on their preferred placement of nominal and pronominal subjects and objects with respect to the verb (Jansen 2007), the expression ‘The student is reading the book’, with the nominal O la libron ‘the book’, was built up as in (31):

(Jansen 2007: 194, 203)

with a 100% SVO score, whereas ‘The student is reading it’ with the light-weight pronominal O ‘it’ ĝin showed a decrease to 87% SVO, complemented by 13% SOV as in (32):

(Jansen 2007: 194, 203)

Similarly, literature surveys show that sequences of A and N, and of O and IO, are susceptible to being inverted from the default AN to NA and from the default O-IO to IO-O when A or O, respectively, is particularly heavy (Jansen 2007: 171-73, 176-79). Hence, morphosyntactic placement is susceptible to phonological weight, and Esperanto is not transparent in this respect.

6 Phonological

Sandhi

Under the influence of mother tongue habits, speakers of Esperanto may show an inclination to sandhi in this or that sound environment. An example would be Mi vidas ĝin ‘I see it’, in which the final s of vidas ‘see’ is pronounced as [z] due to voice assimilation (ĝ is /dž/). Though tolerated as long as it does not lead to confusion or misunderstanding, sandhi in one’s pronunciation is seen as a deviation from the desired standard. Textbooks recommend that speakers stay as close as possible to the pronunciation rules for individual graphemes, no matter in what environment they occur, and maintain the one-to-one relation between written and spoken signs. We may call Esperanto transparent in this respect.

Degemination

As was the case with sandhi, under the influence of mother tongue habits, speakers of Esperanto may show an inclination to fuse two successive identical consonants into one where they occur on morpheme boundaries (there are no double consonants within simplex words). Examples would be the compound huffero (huf + fero) ‘horseshoe’ and the derivation ekkoni (ek + koni) ‘to get acquainted’. Nevertheless, it is good practice to keep the two components of a geminate apart, either by inserting a short pause between them or by delaying their release beyond the point of release of the single consonant. We may call Esperanto transparent in this respect.
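The standard recommended for sandhi and geminates amounts to a context-free mapping from letters to sounds: each grapheme is pronounced the same way whatever its neighbours. The Python sketch below expresses that idea; it uses IPA symbols (e.g. dʒ rather than the /dž/ notation used in the text) and treats every letter not listed as standing for itself, which is a simplification. The names SPECIAL and transcribe are illustrative.

```python
# One sound per letter; letters not listed are taken to stand for themselves.
SPECIAL = {"c": "ts", "ĉ": "tʃ", "ĝ": "dʒ", "ĥ": "x",
           "ĵ": "ʒ", "ŝ": "ʃ", "ŭ": "w"}


def transcribe(text):
    """Map each letter independently of its neighbours (no sandhi, no degemination)."""
    return "".join(SPECIAL.get(ch, ch) for ch in text.lower())


print(transcribe("Mi vidas ĝin"))
print(transcribe("ekkoni"))
```

Because the mapping never inspects adjacent letters, the s of vidas stays [s] before ĝ, and the double k of ekkoni is kept as two consonants, in line with the textbook recommendations described above.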

Diphthongization

Esperanto has phonologically relevant descending diphthongs, which are sequences of a full vowel followed by a shortened /i/ realized as [j] or a shortened /u/ realized as [w]. The spelling rules prescribe the use of the letters j and ŭ in these positions: ajlo ‘garlic’, aŭto ‘car’, Eŭropo ‘Europe’, etc. Textbooks strongly emphasize the need for separate full vowel realizations of two colliding vowels, e.g. in balais ‘swept’, praulo ‘ancestor’ and neutila ‘useless’. Experience shows that diphthongization of the kind *[ba'lajs] instead of the correct [ba'la-is] does not occur. Esperanto is definitely transparent in this respect.

Nasalization

Nasalization does not occur in Esperanto.

Vowel deletion

Vowel deletion does not occur in Esperanto.

