One of the typologically puzzling things about Arabic, and Semitic languages in general, is that /i/ and /u/ very often contrast with /a/, but hardly ever with each other. This would usually indicate that they are allophones of a single phoneme, but that explanation cannot hold if the two vowels cannot freely interchange and are perceived as separate vowels.
As far as I am aware this issue holds for the whole of Semitic, but I am most familiar with Arabic, so I'll stick to examples from that language.
Of course, there is one extremely productive pattern of 'minimal pairs' of vowels in the form of case endings.
Nom. rajul-un
Gen. rajul-in
Acc. rajul-an
So, sure, they seem quite phonemic in that context. What I find puzzling is that in stem formation u and i do not normally contrast.
To further research this I have made a table of the distribution of Arabic vowels in CVCVC roots. The table looks as follows:
V1 \ V2 | a | i | u | ā | ī | ū |
a | + | + | + | + | + | + |
i | + | - | - | + | - | - |
u | + | - | + | + | - | + |
ā | - | + | - | - | - | - |
ī | - | - | - | - | - | - |
ū | - | - | - | - | - | - |
Several notes can be made about this table. I shaded the entry CaCiC, since it is difficult. The only word I can think of is malik 'king' (although doubtless there are more). Some people will probably know that this word is related to Hebrew mĕlĕḵ, which paradoxically points to a CVCC root. Is malik perhaps from *malk with an epenthetic vowel? It is very reminiscent of Dutch melk 'milk', which many people in fact pronounce [ˈmɛ.lǝk] rather than [ˈmɛlk].
Another strange thing is that, of the long vowels, only ā can occur in V1 position, and only if it is followed by the vowel i. Could it perhaps be that CaCiC is indeed from *CaCC, and that CāCiC represents the original *CaCiC?
If this were true, the table of vowel distribution would look a lot more elegant.
V1 \ V2 | a | i | u | ā | ī | ū |
a | + | + | + | + | + | + |
i | + | - | - | + | - | - |
u | + | - | + | + | - | + |
There is an enormous problem with this reductionist approach though. The vowel pattern CāCiC is associated with the meaning of a nomen agentis. It is quite productive: from the word kataba 'to write' we can form kātib 'writer'. That would be fine, were it not that Hebrew has this exact same pattern. Hebrew has the verb ṣāfăr 'to count' besides ṣôfēr 'scribe, writer' (lit. 'counter') (ô < *ā, ē < *i). If we assume that CāCiC is from *CaCiC, this must have been a common shift for Arabic, Hebrew and, I've been told, also Aramaic. Could someone with knowledge of Akkadian or the Ethiopian languages let me know whether this pattern exists there and whether it has CāCiC or CaCiC?
So, after the discussion on CaCiC, let's continue regarding this vowel table. Maybe not completely surprising, but for allowed vowel distributions, Arabic disregards vowel length. CiCiC isn't allowed, whether the second i is long or not. Same goes for the other disallowed vowel combinations. I wonder what this implies. I have no experience with languages that have long vowels and limitations on their distribution, so I'm not sure what scenario is typologically plausible.
It is good that I made this table, for it has shown me some stuff that I was previously unaware of. I was under the impression that the distribution of u and i was identical, but I have found absolutely no examples of words with CiCiC, while CuCuC is in fact quite a common plural formation. As I knew before writing this, combinations of i and u in one root are impossible, which is mysterious. It almost looks like a sort of 'vowel disharmony', if I may coin that term.
I had written a large post proposing a fourth Proto-Semitic vowel *ǝ that would be affected by its surroundings, but would often simply surface as a or i. But once I put the distribution into a table, I became uncertain whether such a proposal would be feasible, and threw away most of that post.
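To make these observations easy to re-check, here is a minimal sketch in Python that encodes the table above and tests the three claims against it: length-blindness, CiCiC versus CuCuC, and the absence of mixed i/u patterns. The names (`ROWS`, `ATTESTED`, `length_blind`, and so on) are my own for illustration, and the length check is deliberately limited to the short V1 rows, since CāCiC is treated as a special case in the text.

```python
# Vowel co-occurrence table for CVCVC patterns, copied from the post.
# Each row is V1; each column is V2, in the order given by VOWELS.
VOWELS = ["a", "i", "u", "ā", "ī", "ū"]
ROWS = {
    "a": "++++++",
    "i": "+--+--",
    "u": "+-++-+",
    "ā": "-+----",
    "ī": "------",
    "ū": "------",
}

# Set of (V1, V2) pairs marked "+" in the table.
ATTESTED = {
    (v1, v2)
    for v1, row in ROWS.items()
    for v2, mark in zip(VOWELS, row)
    if mark == "+"
}

LONG_TO_SHORT = {"ā": "a", "ī": "i", "ū": "u"}


def quality(vowel):
    """Return the vowel with length stripped (ā -> a, etc.)."""
    return LONG_TO_SHORT.get(vowel, vowel)


def length_blind(v1):
    """True if, for this V1, a short V2 and its long counterpart behave alike."""
    return all(
        ((v1, short) in ATTESTED) == ((v1, long) in ATTESTED)
        for long, short in LONG_TO_SHORT.items()
    )


# Pairs that mix the qualities i and u, in either order and any length.
mixed_high = [
    pair for pair in ATTESTED
    if {quality(pair[0]), quality(pair[1])} == {"i", "u"}
]

print("CiCiC attested:", ("i", "i") in ATTESTED)
print("CuCuC attested:", ("u", "u") in ATTESTED)
print("mixed i/u patterns:", mixed_high)
print("length-blind short V1 rows:", [v for v in "aiu" if length_blind(v)])
```

Keeping the table as plain strings makes it easy to correct a cell if a counterexample turns up, and every check is recomputed from it.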
It is true that i and also u sometimes have schwa-like properties. If malik indeed comes from *malk, that's obviously an example, but there are even more readily available examples in the form of the 'alif al-waṣl. When an Arabic word starts with a CC cluster, a vowel is placed in front of the first consonant to make the cluster pronounceable. For example *sm 'name' becomes (i)sm. When a vowel precedes it, this vowel is lost again: it is purely epenthetic. When the root contains no vowels, or an a or i, the value of the 'alif al-waṣl is i. But if the following vowel is a u, the 'alif al-waṣl is also u, as in *drus > (u)drus 'learn!'. This is in fact an example of vowel harmony. Some nouns violate this rule though, like (i)mru' 'man'. Another strange thing is that the a in the definite article (a)l behaves just like an 'alif al-waṣl, except that it is always a in isolated pronunciation. Nevertheless it is quite obvious that this 'alif al-waṣl must have come from a subphonemic *ǝ.
Another example of a *ǝ is the i that is often used to break up clusters within a sentence. The apocopate verb especially often needs an extra i placed between its final consonant and the following word.
If there was a *ǝ in the middle of words, would that help to explain the distribution of the vowels? It might. If we assume that all i were in fact *ǝ, we would understand why CiCuC and CuCiC do not occur, since the u would have caused the *ǝ to become u. But it still does not explain why CiCiC and CiCīC do not occur, unless we assume that *ǝ and *ī turned a preceding *ǝ into a. Such an explanation is entirely ad hoc. Although it might be true, there is no indication that it was like that, and we would need comparative evidence to prove it.
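The rule for the quality of the 'alif al-waṣl is simple enough to write down as a small function. This is only a sketch of the rule as described above, not a full account of Arabic: the function name `wasl_vowel` is hypothetical, stems are given in the rough transliteration used in this post, and only the short vowels a, i, u are looked at, since the description says nothing about long vowels. The exceptions mentioned in the text (nouns like (i)mru' and the definite article (a)l) are not handled.

```python
def wasl_vowel(stem):
    """Pick the prosthetic vowel for a stem that starts with a CC cluster.

    Rule as described in the text: u if the first (short) vowel of the
    stem is u, otherwise i (including stems with no vowel at all).
    """
    for ch in stem:
        if ch in "aiu":
            return "u" if ch == "u" else "i"
    return "i"  # no vowel in the stem, e.g. *sm


def with_wasl(stem):
    """Return the stem with the prosthetic vowel in parentheses."""
    return f"({wasl_vowel(stem)}){stem}"


print(with_wasl("sm"))    # (i)sm
print(with_wasl("drus"))  # (u)drus
```

Note that for a stem like mru' the function returns (u)mru', which is exactly the point where the exceptional nouns depart from the rule.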
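The *ǝ scenario can be tried out mechanically. The sketch below is restricted to short vowels only, which is my own simplification, and it uses hypothetical names (`surface`, `ATTESTED_SHORT`). It takes every combination of underlying *a, *ǝ, *u in V1 and V2, applies the assimilation to u and then the ad hoc rule that *ǝ becomes a before another *ǝ, and compares the resulting surface patterns with the short-vowel cells marked + in the table.

```python
from itertools import product

SCHWA = "ǝ"
UNDERLYING = ["a", SCHWA, "u"]

# Short-vowel (V1, V2) pairs marked "+" in the table in this post.
ATTESTED_SHORT = {
    ("a", "a"), ("a", "i"), ("a", "u"),
    ("i", "a"),
    ("u", "a"), ("u", "u"),
}


def surface(v1, v2):
    """Derive the surface vowels from an underlying (V1, V2) pair."""
    # Rule 1: a schwa next to u assimilates to u.
    if "u" in (v1, v2):
        v1 = "u" if v1 == SCHWA else v1
        v2 = "u" if v2 == SCHWA else v2
    # Rule 2 (the ad hoc one): a schwa before another schwa becomes a.
    if v1 == SCHWA and v2 == SCHWA:
        v1 = "a"
    # Rule 3: any remaining schwa surfaces as i.
    return tuple("i" if v == SCHWA else v for v in (v1, v2))


predicted = {surface(v1, v2) for v1, v2 in product(UNDERLYING, repeat=2)}

print("predicted:", sorted(predicted))
print("matches table:", predicted == ATTESTED_SHORT)
```

That the rules reproduce the short-vowel part of the table says nothing about whether they are historically right. It only shows that the scenario is internally consistent for those cells, and that the ad hoc rule is needed to get there.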
So to conclude, Arabic gives quite strong indications that i was in fact a *ǝ that was heavily affected by its surroundings, rather than an *i. This does not increase or decrease the number of phonemic vowels, but it may help us understand the vocalic patterns of Arabic better.
There is no conclusive evidence though that i was *ǝ; one would have to look at deeper genetic relations (Afro-Asiatic? Maybe only Berbero-Semitic?). I do feel that one should probably place this *ǝ in Proto-Semitic times, if it exists. Hebrew vowel distribution is, as far as I can see, quite similar to that of Arabic.
I hope to dive soon into correspondences between Arabic and Berber verbal morphology with this hypothesis that i should be interpreted as a *ǝ. But before that I should probably consider the Arabic verbal morphology first, since I've only considered nouns of the type CVCVC so far. The vowel distribution in the verbal morphology becomes quite a bit more difficult though.