A. NORTHERN PHILIPPINE GROUP (70 languages)
I. Northern Luzon Subgroup (54 languages)
i. Alta: Northern Alta, Southern Alta
ii. Arta: Arta
iv. Northern Cordilleran (20 languages)
1. Dumagat (9 languages)
A. Northern Dumagat: Casiguran Dumagat Agta (#9) [DGC], Central
Cagayan Agta (#1) [AGT], Dicamay Agta, Dupaninan Agta,
B. Southern Dumagat: Alabat Island Agta, Camarines Norte Agta,
Umiray (Umirey) Dumagat Agta
A. Gaddang (#10): Gaddang and Ga’dang
B. Ibanag: Villa Viciosa Agta, Faire Agta, Pamplona Atta (#2) or
Agta, Pudtol Agta, Ibanag, Itawit or Itawwis, Yogad
C. Isnag: Adasen, Asnag or Isneg (#16)
v. South-Central Cordilleran (30 languages)
1. Central Cordilleran (22 languages)
A. Isinai: Isinai
B. Kalinga-Itneg (12 languages)
a. Itneg (#18): Binongan, Inlaod, Southern, Masadit.
b. Kalinga (#21): Upper Tanudan, Mabaka Valley, Madukayang,
Limos, Lower Tanudan, Lubuagan, Southern, Butbut.
C. Nuclear Cordilleran (9 languages)
a. Balangao: Balangaw (#3)
□ Bontok (2 languages): Bontoc (#8)
□ Kankanay (#24): Northern Kankanay, Kankanay
c. Ifugao (4 languages): Amganad Ifugao (#11), Batad
Ifugao (#12), Tuwali Ifugao, Mayoyao Ifugao.
2. Southern Cordilleran (8 languages)
A. Ilongot: Ilongot (#14) or Bugkalut
B. Pangasinic (7 languages)
a. Benguet (6 languages)
□ Ibaloi-Karao: Ibaloi or Inibaloi (#15), Karaw
□ Kallahan: Kayapa (#22), Keley-i (#23), and Tinoc.
(#13) is allegedly a dialect of (#23). They are shown differently on our diagram. (#23) is a dialect of (#22).
II. Bashiic–Central Luzon–Northern Mindoro Subgroup (16 languages)
i. Bashiic (or Batanic?), (3 languages):
1. Ivatan: Ivatan (#19), Itbayaten (#17)
2. Yami: Yami
ii. Central Luzon (10 languages)
- Pampangan: Pampangan
2. Sambalic (8 languages)
A. Abenlen Ayta, Ambala Ayta, Bataan Ayta
B. Mag-Anchi Ayta, Mag-Indi Ayta
C. Bolinao, Tina Sambal, Botolan Sambal (#35)
3. Sinauna: Remontado Agta
iii. Northern Mindoro (3 languages): Alagan, Iraya and Tadyawan
B. SOUTHERN PHILIPPINE GROUP (23 languages)
I. Subanun subgroup (5 languages)
i. Eastern: Lapuyan, Northern Subanen,
Central Subanen or Sindangan Subanun (#38)
ii. Kalibugan: Kolibugan Subanon, Western Subanon or Siocon Subanon (#39)
II. Manobo subgroup (15 languages)
i. Central (8 languages)
Agusan Manobo, Dibabawon Manobo (#27), Rajah Kabungsuwan
A. Ata-Tigwa: Ata Manobo (#26), Matigsalug, Tigwa (#31) dialect
B. Obo: Obo Manobo
3. West Ilianen Manobo (#28), Western Bukidnon Manobo (#32)
ii. North (4 languages): Kagayanen, Binukid (#7), Higaonon, Cinamiguin
iii. South (3 languages): Tagabawa Manobo, Sarangani Manobo (#30),
Cotabato Manobo (#29) or Tasaday
III. Danao (3 languages)
i. Magindanao: Magindanao (Magindanaon, Magindanawn, Maguindanaon)
ii. Maranao-Iranon: Maranao, Iranon
C. MESO PHILIPPINE GROUP (60 languages)
I. South Mangyan (4 languages)
i. Buhid-Taubuid: Buhid, Eastern Tawbuid, Western Tawbuid
II. Kalamian: Agutaynen,
Tagbanwa Kalamian or Calamian (#42), Tagbanwa Central
III. Palawano (7 languages): Batak (#4), Molbog, Bonggi, Central Palawano or Palawenyo,
Southwest Palawano, Brooke’s Point Palawano, Tagbanwa (#41)
IV. Central Philippine (46 languages)
i. Tagalog: Tagalog
ii. Bikol (8 languages)
1. Coastal (4 languages)
A. Naga: Isarog Agta, Mount Iraya Agta, Central Bicolano
B. Virac: Southern Catanduanes Bicolano
2. Inland (3 languages): Mount Iriga, Albay Bicolano, Iriga Bicolano
3. Pandan: Northern Catanduanes Bicolano
Mansakan (9 languages)
1. Davawenyo: Davawenyo
2. Eastern (4 languages)
A. Caraga: Karaga
B. Mandayan: Mansaka
(#33), Cataelano Mandaya, Sangab Mandaya
3. Northern: Kamayo
4. Western: Tagakaulu Kalagan, Kagan Kalagan, Kalagan (#20)
Mamanwa: Mamanwa (#25)
v-x: Others, apparently unclassified: Ata, Sorsogon Ayta, Tayabas Ayta,
(Gitnang Negros), Magahat (Southwestern Negros), Sulod (Tapaz, Capiz)
vi. Bisayan (21 languages)
1. South (3 languages)
A. Butuan-Tausug: Butuanon, Tausug (#43)
B. Surigao: Surigaonon
2. West (7 languages)
A. Aklan: Aklanon, Malaynon
C. Kinarayan: Kinaray-a
D. Kuyan: Ratagnon, Cuyonon
E. North Central: Inonhan
3. Cebuan: Cebuano
4. Banton: Bantoanon
5. Central (9 languages)
A. Peripheral: Ati, Capiznon,
Hiligaynon, Masbatenyo, Porohanon
B. Romblon: Romblomanon
a. Masbate Sorsogon
b. Gubat: Waray Sorsogon
c. Samar-Waray: Waray-Waray
D. SOUTH MINDANAO GROUP (5 languages)
I. Bagoboo: Giangan
II. Tiruray: Tiruray
III. Bilic: Koronadal Blaan/Bilaan (#5), Sarangani Blaan/Bilaan (#6), Tboli or
E. SAMA-BADJAW GROUP (9 languages)
I. Abaknon: Abaknon Sama
II. Sulu-Borneo (7 languages)
i. Borneo Coast Bajaw: Indonesian Bajau, Jama Mapun, West Coast Bajau
ii. Inner Sulu Sama: Balangingi Sama, Central Sama or Samal (#34), Southern Sama
iii. Western Sulu Sama: Pangutaran Sama
III. Yakan: Yakan
F. SULAWESI (CELEBES) GROUP (mainly in Indonesia, 113 languages)
I. Sangir-Minahasan (10 languages)
i. Sangiric (7 languages)
(mainly Indonesia, Sulawesi): Sangir in Balut and Sarangani
2. Sangil (#36) in the Philippines, on Sarangani Islands.
Chavacano is a creole. It does not belong to the Austronesian family of languages. Its lexicon is Spanish but its syntax is similar to that of other Philippine languages. Chavacano is spoken in Zamboanga, Basilan, Cavite, Ternate, and Ermita (Manila).
Maguindanao, Tausug, Maranao, and Ibanag complete the first dozen of Philippine languages with the largest numbers of speakers. Four Philippine languages are listed by the Summer Institute of Linguistics, Inc. (SIL) in the “Top 100 languages by Population”
as follows: Tagalog (57th on the list), Cebuano (61st), Ilokano (91st), and Hiligaynon (100th).
It is also worthwhile noting that some of these languages now are on their way to extinction: Agta (Alabat Island, Camarines Norte,
Iraya), Northern Alta (Baler Negrito, Ditaylin Alta, Ditaylin Dumagat), Arta (of Aglipay and Nagtipunan in Quirino Province), Ata (Mabinay, Negros Oriental), Ayta (Sorsogon, Tayabas), Batak (Babuyan, Tinitianes, Palawan Batak), Katabaga (Bondoc
3.2 The list of the languages described by Reid (1971), identification of each language with the proper language and SIL code. It contains important or interesting details about these Philippine
languages, including their geographical distribution and the number of speakers.
The full list of 43 Philippine minor languages is shown here, using the serial numbering
of Reid (1971:1-43). Note that his serial numbers (#) for these languages are identical with his page numbers, from 1 to 43.
(#1) Agta or Central Cagayan Agta, the dialect spoken in the central Cagayan Valley. Code: [AGT]. Region:
Northern Luzon. It has 700-800 speakers (1998 SIL).
(#2) Atta as spoken in Pamplona, also called Northern Cagayan Negrito. Code: [ATT]. Region: Northwestern Cagayan Province, Luzon. It has 1,000 speakers (1998 SIL), 91% lexical
similarity with Ibanag North.
(#3) Balangaw. Also called Balangao Bontoc or Farangao. Code: [BLW]. Region: Eastern Bontoc Province, Luzon. It has 6,560 speakers (1975 census).
(#4) Batak, Palawan.
Code: [BTK]. Region: North Central Palawan, 2,041 speakers (1990 census).
(#5) Bilaan, Koronadal. Code: [BIK]. Region: South Cotabato Province, Mindanao. 100,000 speakers (1981 SIL)
(#6) Bilaan, Sarangani.
Code: [BIS]. Region: South Cotabato Province, Sarangani, Davao del Sur Province, Mindanao. 75,000 to 100,000 speakers (1998 SIL)
(#7) Binukid. Code: [BKD]. Region: North Central Mindanao, Southern Bukidnon, northeast Cotabato, Agusan
del Sur. Also called Binukid Manobo. It has 100,000 speakers (1987 SIL).
(#8) Bontoc as spoken in Guinaang (village), Bontoc (municipality), Mountain Province. Region: Central Mountain province. Code: [BNC]. The Central Bontoc (40,000
speakers) is also called Bontok or Igorot.
(#9) Dumagat or Casiguran Dumagat Agta. Code: [DGC]. Region: East of Luzon, Aurora Province (?). 580 speakers (1998 T. Headland). Reid adds that this is not Umiray Dumagat Agta.
(#10) Gaddang, also called Ga’dang, as spoken in Butigui, Mountain Province. SIL Code: [GDG].
(#11) Ifugao, Amganad. Region: Ifugao Province, Luzon. Code: [IFA], 27,000 speakers (1981 SIL).
(#12) Ifugao, Batad.
Region: Ifugao Province, Luzon. Code: [IFB], 43,000 speakers (1987 SIL).
(#13) Ifugao as spoken in Bayninan, Ifugao. SIL Code [IFY] is uncertain. Region: Napayo, Kiangan Ifugao Province, northwest of Aritao, Nueva Vizcaya, Luzon. Officially (SIL)
this is a dialect of the Keley-i Kalanguya language, also called Antipolo Ifugao or Keleyqiq Ifugao. (Our opinion is different below.) Also see under (#23). It has 5,000 speakers (1980 SIL). Professor Reid (2003) informs me, “The Ethnologue is wrong
when it says Bayninan is a dialect of Kalanguya as spoken in Keley-i. It is similar, but not the same as Batad Ifugao. It is definitely not SIL Code: IFY. You are right, it needs a new code.”
(#14) Ilongot, or Bugkalut. Code:
[ILK]. Region: Kakidugen, in Eastern Nueva Vizcaya, Western Quirino, Luzon. It has 50,786 speakers (1990 census).
(#15) Inibaloi. Also called Ibaloi. Code: [IBL]. Region: Central and Southern Benguet Province, western Nueva Vizcaya Province,
Luzon. It has 111,449 speakers (1990 census).
(#16) Isneg. Also called Isnag. Code: [ISD]. Region: Northern Apayao, Luzon. It has 30,000 speakers (1994 SIL).
(#17) Itbayaten. Region: Itbayaten, Batanes Islands. It has no code, being
a dialect of Ivatan [IVV], the language (#19) of Reid (1971:17-19), according to the SIL classification. Our evaluation gives a lexical similarity of 76% between these two. (Therefore, it has to be settled if they can be called either dialects, or languages.)
The comment of Reid (2003): (#17), Itbayaten should probably be considered a separate language from Ivatan, although the former can understand the latter (because the former go to high school on Batanes Island, and sell their cattle there), the latter cannot
easily understand the former.
(#18) Itneg as spoken in Binongan. SIL Code: [ITB]. Note: the SIL shows four languages called Itneg.
(#19) Ivatan. Code: [IVV]. Region: in Batanes Islands, dialect spoken in Basco, Batanes. Itbayaten
(#17) is one of its dialects. SIL gives “72% lexical similarity” between (#17) and (#19). Our comparison shows a lexical similarity of 76% between them.
(#20) Kalagan. Code: [KLG]. Region: Southern Mindanao, South Cotabato, south
of Kalagan. Related to Mandaya. It has 50,000 to 70,000 speakers (1992 SIL).
(#21) Kalinga as spoken in Guinaang, Lubuagan. SIL Code: [KNB].
(#22) Kallahan. Also called Kallahan Kayapa, Kalanguya, Kalanguyya or Kalkali.
Code: [KAK]. Region: Kayapa Proper Barrio, in Nueva Vizcaya, according to Reid (1971:22). The Ethnologue-SIL report places them in Western Nueva Vizcaya, northeastern Pangasinan, western Ifugao, Luzon, with 15,000 speakers (1991 UBS). The identity
of these regions must be checked.
(#23) Kallahan or Antipolo/Atipolo Ifugao. Code: [IFY]. Region: in Keleyqiq, in Napayo Barrio, Kiangan Ifugao Province, northwest of Aritao, Nueva Vizcaya, Luzon. It has 5,000 speakers. The SIL classification
shows the Ifugao, Bayninan (#13) as a dialect of this language. My opinion is different below. Reid (2003) tells, “(#23), This language and its closely related neighbor spoken in Kallahan proper, have long been recognized as related, contrary
to your statement, and their close relationship to Inibaloi commented on in several papers. The fact that they are all characterized as Pangasinic, Southern Cordilleran indicates this. They have no close genetic connection to Bayninan Ifugao, which
is part of the Nuclear Cordilleran group of the Central Cordilleran family.”
(#24) Kankanay as spoken in Balugan, Sagada. Code: [KAN]. Region: Mountain Province (correct), Luzon. The total of two languages has 218,279
speakers (1990 census), so subtracting 70,000 yields about 148,000 speakers for the Kankanay [KAN].
(#25) Mamanwa. Code: [MMN]. Region: Agusan del Norte and Surigao provinces, Mindanao. It has 5,152 speakers (1990 census). Also called Mamanwa
Negrito or Minamanwa.
(#26) Ata Manobo. Code: [ATD]. Region: Mindanao, northwestern Davao. It has 15,000 to 20,000 speakers (1981 SIL).
(#27) Dibabawon Manobo. Code: [MBD]. Region: Manguagan, Davao del Norte, Mindanao. It has 10,000
speakers (1978 SIL). Also called Dibabaon, Debabaon or Mandaya.
(#28) Ilianen Manobo. Code: [MBI]. Region: Northern Cotabato, Mindanao. It has 12,000 to 15,000 speakers (1996 SIL).
(#29) Kalamansig Cotabato Manobo. Code: [MTA].
Also called Kalamansig Tasaday or Manobo Cotabato. Region: South Cotabato, Limulan Valley, Mindanao. It has 15,000 speakers (1991 SIL). Tasaday is its dialect spoken in the town of Kalamansig.
(#30) Sarangani Manobo. Code: [MBS]. Region: Southern
and eastern Davao, Mindanao. It has 35,000 speakers (1987 SIL).
(#31) Tigwa Manobo. It is a dialect of Matigsalug Manobo, not a separate language. The latter has a code [MBT], numbering 15,000 speakers (1998 SIL). Region:
Davao del Norte, southeast Bukidnon, Mindanao.
(#32) Western Bukidnon Manobo. Code: [MBB]. Region: Mindanao, southern Bukidnon Province. It has 10,000 to 15,000 speakers.
(#33) Mansaka. Code: [MSK]. Region: Eastern Davao and
Davao Oriental Provinces, Mindanao.
(#34) Samal. Region: in Sisangat and Siganggang, Siasi Municipality, according to Reid (1971: 34). However, the Ethnologue’s description locates them in Sulu Province and Sabah, Malaysia. It
is questionable whether this is the same area, or whether they have migrated since 1971. The same name, Samal or Central Sama, has a code [SML]. These numbered 120,000 to 150,000 speakers (1997 SIL). It needs confirmation.
(#35) Sambal. Also called Botolan
Sambal or Aeta Negrito. Code: [SBL]. Region: Central Luzon, Zambales Province. It has 31,500 speakers (1975 census).
(#36) Sangil, spoken in Sarangani Islands (Reid, 1971). In the SIL description, Sangil has a [SNL] code, and it is spoken in
Balut Island, off Mindanao. The language is also called Sangiri or Sanggil; one of its dialects is Sarangani. It would be useful to confirm whether these two descriptions refer to the same language. It has 15,000 speakers (1996 SIL).
(#37) Sangir, spoken in Sarangani
Islands, in Tubukang Municipality (Reid, 1971:37). In the SIL description Sangir, with [SAN] code, has 200,000 speakers in Indonesia, including 50,000 Siau. (Both countries: 255,000 speakers.) Also spoken in the Philippines, according to SIL, but the report does not
say exactly where. Confirmation of the identification would be useful.
(#38) Subanun. Sindangan, or Subanen Central. Code [SUS]. Region: Eastern Zamboanga Peninsula, Mindanao, Sulu Archipelago. It has 120,000 to 150,000 speakers (1998 SIL),
and a lexical similarity of 79% with Siocon. (Our evaluation shows 65.5% only.)
(#39) Subanon, also called Siocon Subanon or Western Subanon. Code: [SUC]. Region: Mindanao, Zamboanga Peninsula. 89% lexical similarity with its dialect called Western
Kalibugan. It has 75,000 speakers (1997 SIL).
(#40) Tagabili. Correctly: Tboli. (They dislike the name Tagabili, formerly used for them, according to Ethnologue.) Code: [TBL]. It has 80,000 to 100,000 speakers (1997 SIL).
(#41) Tagbanwa, Aborlan. Code: [TBW]. Region: Central and northern Palawan, around Lamane. They are also called Apurahuano. 8,000 speakers (1981 SIL) or “13,386, probably including Central Taganwa (1990 census).” This has a typo in the Ethnologue
report for TBW: it is correctly Tagbanwa, with the code [TGT]. It is a relative of (#4). (Our evaluation shows 58.5% lexical similarity between them.)
(#42) Tagbanwa, Kalamian. Also called Calamian. Code: [TBK]. Region: Coron Island,
north of Palawan, northern Palawan, Busuanga, and Baras. It has 8,472 speakers (1990 census).
(#43) Tausug. Code: [TSG]. Region: Jolo, Sulu Archipelago. Also called Sulu or Sulug. 651,808 speakers in the Philippines (1990 census), total all countries
3.3 Notes containing questions and problems, mainly in the identification process. It also includes some conclusions, explanations, and suggestions for further examinations in the linguistic research.
Difficulties in drawing border lines between language and dialect.
We had some difficulty in the correct identification of the languages described by Reid (1971) with the proper SIL code, because sometimes the geographical information
was quite short. In his personal communication (2003) he has helpfully eliminated all possible sources of such confusion. I was also inexperienced in matching the two different types of information, Reid vs. SIL. The commas between the words may mean a continuous
description of only one small location, progressing from the smaller place to the middle-sized, then the greater regional unit. However, this kind of list may also indicate that the language is spoken in all geographical places separated by the commas,
regardless of their size. It is not obvious to us whether Ethnologue means horizontal or vertical geographical divisions, or sometimes a mixture of both.
A classification cannot combine genetic and regional features. Both criteria cannot be satisfied
at the same time. We have applied a genetic historical or diachronic linguistic approach only.
Language (#13) is Ifugao, Bayninan. Region: Napayo, Kiangan Ifugao Province, northwest
of Aritao, Nueva Vizcaya, Luzon. Officially (SIL) this is a dialect of the Keley-i Kalanguya language, also called Antipolo Ifugao or Keleyqiq Ifugao that is numbered (#23). Our evaluation shows that the classification is inaccurate here. We have
got only 43% lexical similarity between (#13) and (#23), so they cannot be each other’s dialects but separated languages. (#13) has 77% lexical similarity with (#11) and 73.5% similarity with (#12). Therefore, in our opinion, the orthodox or official
[IFY] code for language (#13), also accepted in Reid (1971) is wrong. It would require a new code, and Professor Reid (2003) agrees with me.
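The comparison above can be sketched in a few lines of Python. This is only an illustration, assuming that lexical similarity is the percentage of compared basic-vocabulary items judged similar. The helper name `lexical_similarity` and the toy judgments are hypothetical; the three percentages are the ones reported in this section for (#13).

```python
def lexical_similarity(judgments):
    """Percentage of compared word-list items judged similar.

    `judgments` holds one boolean per basic-vocabulary item
    (True = the two languages share a similar form).
    Hypothetical helper, not the procedure used for the paper.
    """
    if not judgments:
        raise ValueError("no items compared")
    return 100.0 * sum(judgments) / len(judgments)


# Toy data only: 3 of 4 items judged similar.
toy_score = lexical_similarity([True, True, False, True])

# Similarities of Bayninan Ifugao (#13) with other languages,
# as reported in the text (percent).
reported = {11: 77.0, 12: 73.5, 23: 43.0}

# Serial number of the language with the highest reported similarity.
closest = max(reported, key=reported.get)
print(toy_score, closest)
```

On these figures, (#13) stands nearest to (#11) and farthest from (#23), which is the pattern the argument relies on.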
Language (#23) is Kallahan or Antipolo (or
Atipolo) Ifugao, with the code [IFY]. The SIL classification shows the Ifugao, Bayninan (#13) as a dialect of this language. In our opinion, it is impractical to give identical codes as [IFY] for two different languages, i.e., (#13) and (#23)
that have only 43% lexical similarity with each other. It would be worthwhile to clarify this controversy. Also, this (#23) language is a close relative of the Inibaloi language (#15), with 63% lexical similarity, a fact that has apparently gone unnoticed so far.
It is possible that language (#23) originated from a group of (#22) speakers as the majority, mixed with a minority group from (#13).
Language (#17) is Itbayaten. It has
no code, being a dialect of Ivatan [IVV], language (#19) of Reid (1971:17-19), according to the SIL classification. Our evaluation gave 76% lexical similarity between these two. (Therefore, it has to be settled if they can be called either dialects,
or rather languages.)
Itneg includes four languages, and one of them is (#18). Kalinga includes eight languages, and one of them is (#21). I had some uncertainty with (#24),
Kankanay, and I learned that Reid (1971:24) had meant the language marked with the SIL code [KAN].
Most of our conclusions agree with those of the Ethnologue, SIL, Thomas and
Healey (1962), Dyen (1965), Walton (1979), Zorc (1986) and Blust (1991). Languages (#1), (#2), (#9), (#10) and (#16) belong to the Northern Cordilleran group. The languages (#5), (#6), and (#40) belong to the Bilic group. Several of them, such as (#3), (#8),
(#11), (#12), (#13) and (#24), belong to the Nuclear Cordilleran, (#18) and (#21) to the Kalinga-Itneg group, (#15), (#22), (#23) to the Benguet, (#17) and (#19) to the Bashiic, (#38) and (#39) to the Subanun, (#4) and (#41) to the Palawano, (#20) and (#33)
to the Mansakan group, (#36) and (#37) to the Sangiric group, (#7), (#26), (#28), (#31) and (#32) to the Manobo group, while the rest form separate languages: (#14) is Ilongot, (#25) is Mamanwa, (#34) is Samal, (#42) is Kalamian, and (#43) is Tausug.
We do not agree totally with the SIL classification of the Manobo Cotabato (#29) and Sarangani Manobo (#30), feeling that these are separate languages, probably mixtures of Manobo with another language. If the official classification considers Mamanwa (#25)
as a separate language, then the inclusion of (#29) and (#30) into the Manobo group would be like using double standards, at least from the lexicostatistical point of view. Mamanwa has higher similarities with many of the Manobo languages than (#29) and (#30) have.
Also, the Mansakan languages show closer relationship with the Manobo. The accepted classification is silent about the high similarity between (#25) and (#43) which is 46.5%.
Reid (2003) wrote me, “The classification of the Manobo languages has been the subject of considerable research, and includes not just the numbers of shared cognates, but also other shared sound changes, specific lexical innovations, and syntactic
devices. Certainly Cotabato and Sarangani are very different Manobo languages, but they group together as against other Manobo languages, and the people of each of these groups consider themselves to be Manobos. Your system of counting of
lexical items is unfortunately uncritical. It does not distinguish inherited lexical items, those that are unique to a small set of languages and represent common innovations, from those that are generally inherited; neither does it distinguish recently
borrowed items which are the result of trade and contact. Each of these sets of forms has a different story to tell. The lexicostatistical numbers that you cite do not necessarily tell us anything about genetic relationships.”
These arguments are quite correct, but I must emphasize that this paper has been based exclusively on lexical items, and does not claim a magic knowledge in establishing genetic relationships.
However, I would add that the comparisons of basic vocabularies tell a different but longer story than other official methods. For example, if a language absorbs another two smaller languages, it forces its grammar and word order on the other two.
The competing grammar and word order eventually die out, but a high percentage of the two subject languages does not die out without a trace: large parts of those may survive for several millennia as dialects, through semantic shifts. This means that, in
my opinion, a carefully selected vocabulary embraces more information than other comparative methods.
The criticism of this paragraph is as follows.
“Your understanding of the meaning of ‘basic vocabulary’ is different from any that I have heard before. The term ‘basic vocabulary’ refers to forms that are supposedly more resistant to change than other ‘cultural’
items. It doesn’t mean that they are older than grammatical structure. Language must have had grammatical structure from the beginning of its use, otherwise there could have been no communication. All parts of a language
appear to have developed in tandem. But then we must be talking about the origins of language, probably hundreds of thousands of years ago. The basic vocabulary items that you are looking at probably go back no more than 5000 to 10,000 years
at most. That is why it is extremely difficult to make claims about the relationships between Austronesian and mainland Austroasiatic languages. The languages have changed sooo much over the 6000 years since their first separation, that
similarities resulting from genetic relationship can no longer be distinguished from coincidental similarities...” (Reid, 2003)
The other problem with the orthodox approach is
that it does not offer objective and measurable results: percentages of similarities, etc. It arbitrarily picks lists of similar linguistic items from an arbitrarily established group whose speakers have never shared a language. (The native language of the
author has the same fate. It ended up in a group or situation by accident, just “like Pontius Pilate in the Credo,” as Hungarians say.) On the other hand, the orthodox approach points out the similarities only, and keeps the high ratio
of the important differences between languages secret. The lexical comparison cannot follow such a convenient and misleading strategy. However, these problems are ideological rather than scientific. Please note that we do not use the word “lexico-statistical.”
Our approach contains the comparison of basic vocabularies, but never claims dates for the separations of languages, avoiding that part of Swadesh’s theory.
The vagueness of boundaries between the meanings of languages and dialects is an issue too complex and large to deal with here. A language is a dialect with an army and a flag. Crystal (1989:25) indicates, “One of the most difficult theoretical
issues in linguistics is how to draw a satisfactory distinction between language and dialect... There is often a ‘chain’ of dialects spoken throughout an area, a geographical dialect continuum... The speakers of the dialects at the two ends of
the chain will not understand each other; but they are nonetheless linked by a chain of mutual intelligibility.”
The SIL materials often mention serious problems in a continuous link of mutual intelligibility in a relatively “small”
area such as the Philippines. Therefore, one may pose a similar question about the supposed existence of an ancestor of the so-called Indo-European languages. (That must have been spoken over a huge area, because seminomadic nations living in primitive economic conditions
could not survive in a small area.) If such a common genetic ancestor ever existed at all, their language(s) may have been a language continuum in which many tribes did not have any mutual intelligibility with each other, but they borrowed some common
grammatical features later. Our position is that the very limited basic vocabulary of any language must have been more ancient than its grammatical structures. It is extremely hard to form a sentence without a vocabulary, using only phonetics, phonology,
morphology, syntax, or grammar. Regarding the definition of “mutual intelligibility” the reader can refer to his/her own logic. This term is very relative and quite flexible.
As far as the author knows, there is no linguistic rule
about the degree (or percentage) of mutual intelligibility and the definitions of language and dialect. For example, Green (1988:123-124) estimates that cognates between Castilian Spanish and Peninsular Portuguese may be around 89 percent. They are quite intelligible
to each other, particularly in writing. They are called languages, not dialects, perhaps based on their separate histories. On the other hand, the Philippine language called Subanen, Central on the Ethnologue web page of
SIL under this entry [SUS] notes its “79% lexical similarity with Siocon” which is [SUC]. These are called languages (#38) and (#39) in our study at (3.1). Only four other pairs can be found in this range on our list: (#5) and (#6): 81.5%,
(#11) and (#13): 77%, (#17) and (#19): 76%, finally (#26) and (#31) with 80% lexical similarities.
Two of our Central Manobo languages, (#27) and (#28), both named as a kind of Manobo, have only a 45% level of intelligibility with each other,
so they are definitely separate languages.
The name of a language deserves some attention as well. A Portuguese speaker knows from tradition that he/she speaks Portuguese and not Spanish, so one cannot call his/her speech Spanish. Based on
these circumstances, the borderline between language and dialect shall be drawn perhaps somewhere between 80% and 95%. Two languages are often actual dialects now, usually one has adopted most of the vocabulary and grammar of the other, but their speakers
have proudly kept the name of their previous ancient language. Perhaps the Catalan–Spanish and the Hindi–Urdu relationship is somewhat similar to this.
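The tentative borderline discussed above can be tried out on the pairs quoted in this section. The sketch below is only an illustration: the function `classify` is hypothetical, and reading the 80% to 95% band as an undecided zone (below it separate languages, above it dialects) is an assumption made for the example, not a rule stated in the text.

```python
# Lexical similarity figures (percent) quoted in the text,
# keyed by the serial numbers of Reid (1971).
pairs = {
    (38, 39): 79.0,  # SIL figure for Central Subanen and Siocon
    (5, 6): 81.5,
    (11, 13): 77.0,
    (17, 19): 76.0,
    (26, 31): 80.0,
}

LOWER, UPPER = 80.0, 95.0  # tentative band suggested in the text


def classify(similarity, lower=LOWER, upper=UPPER):
    """Hypothetical rule: treat the band itself as undecided."""
    if similarity < lower:
        return "separate languages"
    if similarity > upper:
        return "dialects"
    return "borderline"


labels = {pair: classify(score) for pair, score in pairs.items()}
for pair, label in labels.items():
    print(pair, pairs[pair], label)
```

Under this reading, none of the quoted pairs comes out as dialects, and only (#5)/(#6) and (#26)/(#31) reach the undecided band.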
The comparison of the basic vocabulary of forty-three Philippine minor languages with other, not Malayo-Polynesian, languages.
4.1 The comparison of
the basic vocabulary of forty-three Philippine minor languages with those of the Baltic languages, offering a preliminary explanation for the results
We have satisfactorily demonstrated here, in agreement with the SIL classification of the Philippine languages, that these languages have strong relationships with each other. Therefore, it is reasonable to suppose that a few millennia ago their ancestors spoke
a common ancestral proto-language. The evaluations show that their relationship was closer than the ones between many Proto-Indo-European languages. And if linguists are bold enough to assume the existence of such a theoretical Proto-Indo-European language,
even if only a dialect continuum, it is less risky to make the same assumption for the reconstruction of the Philippine proto-language. This means that if two or more Philippine minor languages use certain cognates now, some dialects of that Proto-Philippine
language must have had it, since their modern form could not come out of nowhere.
The Latvian and Lithuanian are Baltic languages in Europe. They are close relatives, both belonging
to the so-called family of Indo-European languages. The following list compares some of their basic words with the corresponding words in the Philippine minor languages. The asterisks (*) here and in comparative philology are used for forms not actually recorded,
but artificially reconstructed by comparative linguistic method (Lord, 1971:15).
ENGLISH | BALTIC (Latvian or Lithuanian) | SOME PHILIPPINE MINOR LANGUAGES | PROTO-WORDS(*)
head | galva | qulu, |
 | gabals | qabal, qaballa |
bone | kauls | tikol |
skin | kailis | kulit |
left (hand) | kaire | |
and foot | kaja | |
woman | kalpone | glibun |
 | | sawa, qasiwa | *siwa
 | | | *don (Celtic river)
thirsty | slapis | lupahan |
to sleep | gulet | qoloq |
sky | dangus | dlangit |
to hold | aizkawet | qawid |
dead, death | nave | napasi |
cemetery, bury | kapakmens | gubuk |
short | mazs | masanaq |
to fly | lekti | layug |
peak, summit | gals | kali'susu |
to learn | macities | macinanaoq, miktuqun | *mactuks
to turn | likums, sukti | likid, ligguh, masugid | *ligu, mas, sugdi
to wait | laukti, glunet | lagad, qilat |
to sit | sedet | |
to climb | kapt | |
to know | zinat | sunan |
to choose | velet | pilik |
to buy | pirkt | 'palit |
to sell | pardouti | barigyaq |
to run | begti | betik, bugkut | *begt
to see | matyti | |
to say, tell | sacit | |
to tell a lie | meluoti | |
 | | duwa (note: dwa in Russian) | *dwa
three | tris, trys | turu, tulu (tri in Russian) | *tru
(to) work | izmantot | qaripan, kiripin | *karpan
 | atbildet | tubal, tobal |
mouse or rat | | |
 | | pino, punuk | *plnoch
 | daugelis | dakiliq, dakel |
road, trail | cels | calan |
who? | kas | katuqu |
carpet, mat | tepikis | tepo, tipihiq | *tepichich
turtle or frog | | |
 | | sugkad, ‘sauk, sagaysay | *saugkas
fat, lard | tauki | tabaq, |
 | nebylus, tylus | nabongol, tuleng | *nebonl, tulans
other, different | cits, kitas | satu, saddi |
These fifty-six similarities do not mean that some of the Philippine languages belong to the Indo-European family, or vice versa. However, if the ancestors
of their speakers were not relatives, they must have had some prehistoric contacts. Maybe they lived next to each other. Who knows? A plausible conclusion: The world is not as big as we thought.
It seems to be quite certain that these similar features in the proto-Baltic and proto-Philippine languages are not coincidental. Random matches in the similarities would result in a scattered pattern, which is not the case. A statistician would observe at
first glance that there are large numbers of matches in 27 of the 43 minor languages, while there is a total lack of matches in the other 16 of them. There are no matches in the similarities between the Baltic words and the following Philippine minor languages
as numbered by Reid (1971): 1, 2, 3, 4, 8, 9, 16, 18, 21, 24, 25, 28, 31, 35, 41, 43. In other words, it appears that the Northern Dumagat languages, the Kalinga-Itneg, the Sambalic, and the North Manobo languages have never got any proto-Baltic loanwords.
The following language groups on Reid’s list are divided, regarding the proto-Baltic similarities, as follows: Ibanagic languages, Central Philippine languages, Central Manobo, Nuclear Cordilleran (in which Balangao and Bontoc-Kankanay do not have similarities,
but the Ifugao languages do), Meso-Philippine group (where Kalamian has similarities, but the Palawano languages do not).
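The counts in this paragraph can be checked with simple set arithmetic. The sketch below is only an illustration; the variable names are hypothetical, and the serial numbers are the ones listed above as having no Baltic matches.

```python
# Serial numbers (Reid 1971) listed in the text as having no Baltic matches.
no_match = {1, 2, 3, 4, 8, 9, 16, 18, 21, 24, 25, 28, 31, 35, 41, 43}

# All 43 minor languages, numbered 1 to 43.
all_languages = set(range(1, 44))

# Languages with at least one match are the complement of the list above.
with_match = sorted(all_languages - no_match)

print(len(no_match), len(with_match))
print(with_match)
```

The two counts agree with the 16 and 27 given in the text.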
It would be helpful to have a source that distinguishes between
dialects of the Latvian and Lithuanian languages, comparing their vocabularies. Since such material is unavailable to us, we simply suppose that some speakers of the proto-Baltic language (or a relative of that ancestral Balto-Slavic language) migrated
or drifted to the area where the ancestors of the proto-Philippine speakers once lived, not necessarily in the Philippines. That proto-Baltic fragment probably became divided and dispersed, and finally mingled with the more numerous local population.
Progressive scholars may attempt to correlate these languages with the frequency with which the O (and A, B, or AB) blood-group gene is distributed in the population. Unfortunately,
this type of information has not been available to us. The presence of certain “marker genes” in human blood has provided some evidence of their common genetic origin, as a Japanese professor concluded two decades ago. (That is not necessarily
the same as the genetic origin of their languages.)
For example, it is accepted that the Malagasy language of Madagascar in Africa is an Austronesian language. The last large westward
expansion of the Austronesian speakers was to Madagascar, which they settled in the first millennium AD (Comrie, 1996:96). This is confirmed in Crystal (1991:18). Lord (1971:28) adds that they came from Sumatra or Malaya. Now, many scholars consider
the Baltic languages the most archaic element of the Indo-European languages. If we suppose that the ancestors of the Balto-Slavic speakers once lived in the fertile lands of modern Bangladesh, and that a pioneer group of the Philippine ancestors reached
those shores during those centuries, then a possible explanation presents itself. The Austronesian ancestors reached Madagascar from modern Singapore by dangerous navigation across the stormy Indian Ocean, some 6,000 kilometers wide. Therefore, covering
half of this distance by comfortable navigation in small canoes, following the shoreline that now belongs to Burma (Myanmar), must have been easy for them. We merely mention this possibility, but believe that the direction of the migration must have been
the other way around. A group of proto-Balto-Slavic speakers may have drifted to the ancient Malay territories and mingled with more than one tribe. (That group of newcomers could not have arrived a very long time ago. Otherwise all Philippine languages
would have this long list of similar words, which is not the case.)
The map in Crystal (1991:318-319) has a typo, since the “Eastern Austronesian languages” are shown
on the western and the “Western Australian languages” on the eastern side of the map, mainly in Polynesia. Crystal (1991:331) shows a family tree of the Caucasian languages, with lexicostatistical percentages added, after J. C. Catford. A similar
one could be drawn for the Philippine languages, based on the splits of numerous basic words, with only very vague dates. A table in Crystal (1991) shows “Percentages of divergence between two languages” and “Minimum number of centuries
of divergence” in two columns side by side. The first heading appears to be a typo for “percentages of similarities between two languages,” since divergence has the opposite meaning.
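The relation between two such columns can be illustrated with the standard glottochronological formula, t = ln(c) / (2 ln(r)), where c is the share of cognates two languages still have in common and r is the retention rate per millennium. The sketch below is not taken from Crystal's table. The retention value of 0.805 (a figure often quoted for the 200-item word list) and the function name are assumptions made only for illustration.

```python
import math


def divergence_centuries(shared_percent, retention=0.805):
    """Estimate centuries of separation from the percentage of shared cognates.

    retention is the assumed fraction of basic words a language keeps
    per millennium (an assumption, not a value from this study).
    """
    if not 0 < shared_percent <= 100:
        raise ValueError("shared_percent must be greater than 0 and at most 100")
    c = shared_percent / 100.0
    millennia = math.log(c) / (2 * math.log(retention))
    return 10 * millennia  # one millennium = ten centuries


# The lower the shared percentage, the longer the estimated separation.
for pct in (80, 60, 40):
    print(pct, round(divergence_centuries(pct), 1))
```

Because the estimated number of centuries grows as the shared percentage falls, a column paired with "minimum centuries of divergence" reads naturally as a percentage of similarity, not of divergence.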
There are two main systems of classification, the genetic and the typological, of which one is predominantly historical, while the other follows the line of descriptive linguistics. A third, non-scientific classification is the geographical (Pei, 1971:19-20).
Robert Lees (1922–) and Morris Swadesh (1909–1967) worked out an approach, called lexicostatistical glottochronology... in which Italian padre and Portuguese pai
would be accepted as cognates (Crystal, 1991). Our criteria for selecting cognates into separate groups were similar to, or perhaps stricter than, this example. According to Swadesh (1952), common cognates up to a threshold of 8% in the lexicostatistical
comparison of two vocabularies may be accidental, or may be borrowings only. However, in the opinion of the author, if those basic vocabularies contain more than 3% common cognates, that cannot be explained by mere coincidence. A similar development
took place in palaeontology about two centuries ago. Serious scholars who did not believe in Noah’s flood argued that the remains of sea shells and the teeth or skeletons of sharks and ocean fishes found on high mountains were caprices of nature
that had nothing to do with those animals. In their logic, those findings were not proof of the presence of an ancient sea in that region. They ridiculed those who explained them that way. Time has proved that realistic logic does not admit an
endless possibility of accidental similarities. Although most linguistics departments offer courses in mathematical linguistics and statistics, perhaps linguistic educators worldwide should include more mathematics and statistics
in their curricula. Or they and their students should pay more attention to it, with more faith and seriousness. In our opinion, the majority of authorities do not seem to understand that the word ‘coincidence’ has some semantic limitations. (The
words ‘coincidental’ and ‘obvious’ do not mean the same, and there are several intermediate possibilities between the two extremes.)
I have no objection to
the 8% similarity or uncertainty threshold suggested by Swadesh. But if one applies it, for example, to the so-called Finno-Ugrian languages, the sensitive indicator of Swadesh shows that something is wrong. The Finnish language has about 5% lexical similarity
with Hungarian. This fact does not mean that the principle of Swadesh is wrong. Rather, it may mean that those authorities are right who have rejected the northern Siberian origin of all Hungarians. (They claim that the latter theory was created under
Austrian and Russian inspiration during the last two centuries, as a tool to oppress and discourage the Hungarians seeking a “more attractive” Central Asian origin.) Just for the sake of comparison, there is 16% lexical similarity between
Finnish and English, 15.1% between Estonian and English, and 68.1% between Finnish and Estonian, according to our count.
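The thresholds discussed above can be made concrete with a small sketch. It computes a similarity percentage from a list of yes/no cognate judgments and places a percentage relative to the 8% limit of Swadesh and the 3% limit proposed by the author. The function names and the wording of the returned labels are illustrative assumptions; the four percentages are the ones quoted in the text, and the 100-item judgment list at the end is purely hypothetical.

```python
def lexical_similarity(judgments):
    """Percentage of compared meanings whose word pair was judged cognate.

    judgments: one boolean per compared meaning (True = judged cognate).
    """
    if not judgments:
        raise ValueError("no word pairs to compare")
    return 100.0 * sum(judgments) / len(judgments)


def classify(percent, swadesh_limit=8.0, author_limit=3.0):
    """Place a similarity percentage relative to the two thresholds."""
    if percent > swadesh_limit:
        return "above the 8% threshold of Swadesh"
    if percent > author_limit:
        return "between the author's 3% and the 8% of Swadesh"
    return "at or below the author's 3% limit"


# Percentages quoted in the text (the author's own counts).
quoted = {
    ("Finnish", "Hungarian"): 5.0,   # given as "about 5%"
    ("Finnish", "English"): 16.0,
    ("Estonian", "English"): 15.1,
    ("Finnish", "Estonian"): 68.1,
}
for pair, pct in quoted.items():
    print(pair, pct, classify(pct))

# Hypothetical list: 5 meanings judged cognate out of 100 compared.
print(lexical_similarity([True] * 5 + [False] * 95))
```

On these figures only the Finnish-Hungarian pair falls into the disputed zone between the two limits, which is the case the paragraph singles out.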
Returning to irregular (i.e., too
high or too low) similarities, they are particular phenomena that would require serious linguistic study and consideration. There are no regular miracles in the science of linguistics. Most mysteries have some kind of explanation.
Lexicostatistical comparison of these Philippine languages with other languages of other continents, offering a possible explanation by early contacts, and defending Swadesh.
It is interesting that some Philippine words sound similar to words in other languages around the globe. For example, the Ilokano numeral one is “maysa,” which is “-moja” in the African Swahili language, “mayaat”
in the Delaware or Algonquin in North America, or “moo-ay” in the Cambodian language. I once heard a lecture at a small international conference of historians in Switzerland claiming that the Hungarian and the Visayan languages have
many similar words. (It was not too convincing.)
At that time, I was somewhat incredulous about those possible contacts. Later I was surprised to see in our Polyglot dictionary of 116 languages (Simon, 1998) that “blood”
(masakit, misakit) in some minor Philippine languages is almost identical to the North American Ojibwe “misque,” also meaning “blood.” Or, the adjective “small” is “molintokon”
in some Philippine languages, “maliit” in Tagalog, and “malenki” in Russian. (So the original archaic ancestor may have been *malentki.) Would all these be mere coincidences? To this question, the comparison
of two Baltic languages with the minor Philippine languages has given us an intriguing answer.
Professor Reid (personal letter in 2003) gives the following criticism:
“A closer look at the forms that you cite, show that the phonetic similarities
that you cite, are just that, phonetic similarities, “mere coincidence” and nothing more. Let’s look first at the number “one”. We know that the Ilokano form is a morphologically complex form, with a prefix ma- plus
the basic numeral for one. It literally means, “in the state of being one”. This is the same ma- that occurs in masakit, about which more below. We also know that the form isa in Ilokano is a borrowed form from Tagalog, because
the number one is reconstructed as esa, not isa, where the vowel e is actually a central schwa vowel whose regular reflex in Tagalog is i, whereas in Ilokano the form would have been essa, before replacement. So then, unless you can show that the m-
in the corresponding words in Swahili and the ma- Delaware etc, have the same function, or meaning as the one in Ilokano, then the similarity is completely coincidental. Next, the word masakit, or misakit is a combination of the ma-
stative prefix, and the root word sakit, which means “pain”. It has extended its meaning in a couple of languages to mean blood, but that change must be very recent, because we can clearly see the derivational source of the word. So
once again, you would need to show that ma- (mi-) in Ojibwe is also stative, and that the word originally painful, before you could begin to draw conclusions. Likewise Tagalog maliit, contains the ma- stative prefix.... in the state
of being small. Is that what ma- means in Russian malenki? Also if the forms are related what about the final sounds in the Tagalog and Russian words which are so dissimilar?? I am sorry your argument is not at all convincing!”
The best answer to these opinions is that ancient people did not know much about grammar, suffixes, and prefixes, even if they are present in many languages. Educated persons, particularly linguists, tend to explain everything in the synthetic way, comparable
to the construction of a building. In their mind, in 100% of the cases, small boulders form a huge wall. They claim that the direction from the small to the larger units is always obvious and logical. They perhaps forget that those small boulders originated
from a huge rock that was gradually broken into tiny pieces by the force of water, wind, earthquakes, etc. (Our studies are directed mainly at the process of breaking up, before the first rows of the different walls.) Each wall may represent a separate
language, but our research is aimed at the original rock and the first quarries. These two approaches differ vertically, in time and in the development of the languages, not horizontally.
Nations often “collected” two or three words starting
with an “m” sound, and all of them meant for them – by a sort of group consensus – something stative. Another group may have observed that they had kept four words ending with –ó, and all four meant a quality. For
them, this suffix started to give the meaning of an adjective for any future word. A third group also had the –ó ending, but mainly for things related to water. Therefore, they began to associate similar future words with water only. A fourth
group may have agreed that the –ó ending is mainly related to action, for water is hardly ever totally quiet, and the same ending gradually became a suffix indicating only verbs. Now, my question is: even if the –ó suffix is demonstrably
common to some of these groups, are their languages related or not? It is almost impossible to draw a sharp line. However, the conclusion is that this kind of selection of the elements of words is almost random. It is a natural selection rather than a procedure
that can be announced ‘ex cathedra’ with certainty.
For example, orthodox linguists are unable and unwilling to explain the similarities between the German word ‘Gebirge’ and the Arab ‘Gebel’
(or ‘Jebel’), or their corresponding – and similar – words for ‘white’. They might tell you that the ‘Gebirge’ has obviously derived from the German ‘Berg’ and has nothing
to do with the Arabic word. But what if they are wrong, and one of the ancient German (and Arab, related to the word ‘akhbar’ or ‘great’) dialects a few millennia ago had a common word ‘Geberg’ that became
shorter, say, ‘Geber’? The scholars who claim with 100% certainty that this situation was impossible do not have better proof than the opposite camp. Although the ge- prefix is typically German, no one can
prove that it has always been German since the beginning of time. This kind of interactive situation within and amongst ancient tribes may have been common in prehistoric languages. Their logic was less artificial and quite different from that of modern man,
but this does not mean that the modern over-sophisticated and analytical logic is better than theirs. No one has found the notion of ‘stative prefix’ in the daily life of any protohistoric nation or culture. It is quite a modern term for
the overall population of any country.
Returning to the similarities between the European languages, there is an additional and very serious argument here. It is the almost total lack of cognates between the English and the Baltic languages
for these 56 words. Why this grave silence in the lexicostatistical similarities? Would Swadesh be “guilty” again? We anticipate the categorical answer of the most orthodox scholars: “This is serious proof that lexicostatistics is
not worth a thing, it is totally unreliable and ridiculous, etc. It only proves that it is incompatible with many of the other approaches of linguistics, like grammar, syntax, inflexional endings, prefixes, internal changes, morphology, or word order. And
the other approaches govern, because they have a majority in our linguistic parliament.” However, one can pose this question the other way around. Is it impossible that in objective science lexicostatistics governs together with semantics, because many other
approaches (like grammar or word order) are incompatible with it? For instance, morphology is non-existent in Chinese, and syntax alone gives meaning to the phrase or sentence (Pei, 1971:10). The Hittite language was exclusively postposing, whereas the majority
of the older Indo-European languages are preposing. Also, Hittite shows no dual number, a category systematically found in some Indo-European languages and residually in many others. The basic word order is not a critical factor either. Finnish has the SVO
word order type: “cows eat grass,” while Hungarian has SOV: “cows grass eat” (Comrie, 1996:19). The expression “my mother” is mamma mia in Italian, minha mãe
in Portuguese, and mi madre or madre mía in Spanish. These three are related languages. These samples indicate that the fashionable modern word order had not been settled two millennia ago, but these two basic words must have existed
before any word order. Our selection of words intends to reflect the ancient, pre-grammar times.
An orthodox professor of linguistics was kind enough to communicate his ideas to me
as follows. My Words-And-Things approach, he wrote to me, was strong in the 19th century, and I am many decades too late to become part of the real linguistic mainstream. (My answer was that I had no such aspiration. I also sent him a long list
of semantic connections between the cognate pairs of unrelated words from different languages. Such as blood and true, anger and anchor, or sad and thirsty in certain languages.) There is a complete explanation
for the semantic relationship between those cognates in the myths and legends of other nations often living on the opposite side of the globe. To my concerns about the general lack of interest of mainstream linguists in mythology, mainly in cosmogonic
myths, he answered: The linguists concerned do study anthropology, including the nature of myth, but for many other linguists, of course, such studies would not be centrally relevant. In the same way, historical linguists, especially those interested in deep-time
reconstruction, do often study probabilistics. He added that Don Ringe at Philadelphia, Jacques Guy and others have shown that glottochronology and other associated techniques are very suspect as a method of determining either the relative or absolute ages
of languages on the one hand or the genetic relationships between them on the other. The same applies to the newer versions of these methods espoused by Ruhlen, Greenberg and the advocates of “Nostratic” and similar deep-time ancient proto-languages.
Essentially, the examination of isolated synonymous or near-synonymous words in many languages cannot, it seems, lead reliably to the establishment of links between languages, because there is a statistically very significant chance of similar forms with
at least loosely related meanings arising by coincidence. This is easily exemplified with many known, actually attested pseudo-cognates. Reliance on similar methods has led Kaulins to see Latvian as the Ursprache, Winters to posit that Olmec and Mandingo
are related, etc, etc. One can more or less “demonstrate” that any two languages are related by this means. This is why the method was replaced, well over a century ago, by the careful examination of phonological and grammatical systems as they
develop over time in a range of languages (the “comparative [linguistic] method” and its theoretical offshoots). This method is very far from perfect, of course. All these are quoted from Newbrook (1999:1-4).
The author agrees with him on
the conclusion that lexicostatistics and glottochronology cannot determine the age of words. It is not another C-14 dating method. However, the comparison of numerous, properly selected, basic old words truly reflects the genetic relationship. Another trap
into which some glottochronologists have apparently fallen originates in ignorance of historic dialects. Without the separation of those dialects of any language by dialectal studies and hard “slave work,” linguists should not compare mixed
languages (such as the English, Spanish, Hungarian, Italian or Rumanian) directly by the formula of Swadesh. As the historic dialects are the last remnants of ancient tribal languages, ignoring them would bias the otherwise sound glottochronologic calculation.
For a comparison, we may press some botanists to classify our salads in dozens of different salad bowls where the bowls represent different languages, and quickly locate them in different rooms. Since we do not give them sufficient time for the research,
most of them rush to separate them somehow. This way the six rooms get labels such as “green salad on large wooden trays,” “red salad in small bottles,” “brown salad in small glass bowls,” “animal
salad in large porcelain bowls,” “orange-coloured salad in metal cups,” and “white salad in large cooking pots.” (Some of the botanists are against this hasty classification but the majority is locking them
up in the washroom.) When we open that little room, like a “Pandora’s box,” we understand the problem. We wanted to separate the food scientifically, not according to its present shape and the colours of its surfaces.
It is a tragic
misunderstanding. However, it has a happy ending. All botanists are free now, and the rooms have new labels with the origin of the food. For example, “roots” (potato, heads of young onion, all of them white, with the orange-coloured
carrots and red radishes) in the first room. The green peas and the brown beans are in a second room, the brown mushrooms in the third room, the red tomato, with the green peppers and cucumbers in the fourth room. The shrimps and crabs go to the fifth room,
and the lettuce, spinach and cabbages (all green) to the sixth room.
A similar situation would arise if you ask a beginner botanist about the root systems of plants. He is perhaps an expert who knows about most of the trees, where smaller and smaller
branches of roots are branching off from the main trunk. However, he could not imagine that the roots of the corn do not obey this kind of rule: the roots are equal in length, branching off simultaneously at the same level. This situation may have happened
at the origin of many languages as well. Nobody can prove the impossibility of this theory. Lord (1971:16) states that one of the most promising developments is the attempt in recent times to apply the correspondence method to the Indo-European, Semitic and
Finno-Ugrian families in an attempt to establish common parenthood and the relatedness of these extraordinarily diverse groups. Fortunately, this avant-garde group does not want to die out. Comrie (1996:38) and his two colleagues state, “The Eurasian
linguistic area, consisting of Europe and northern and central Asia, is home to the Indo-European, Uralic and Altaic language families. It has long been suspected that distant relationships exist between these families. The Nostratic and Eurasiatic hypotheses,
for example, both seek to establish such genetic linkages, the former suggesting that Eurasiatic is itself a branch of a greater Nostratic family.” It may seem to us that even the “Nostratic umbrella” is too small as a scope of a thorough
and properly scientific research.
Mainstream scholars, not necessarily linguists, assert that about the end of the last Ice Age man was unable to, or did not like to, navigate. It is an accepted fact that 70,000 to 10,000 years ago sea levels, all
over the world, were low. Land bridges between continents were exposed (Comrie, 1996:102). It is hard to understand why many authorities prefer the land routes and not the sea routes for the immigration of tribes. Some maps show an imaginary corridor free
of ice through North America during the Ice Age. This is nonsense. Could Amundsen have found a similar one along his route to the South Pole? For comparison, there is a small chance that a reader’s family would accidentally drift by boat from one
continent to another. However, there is zero chance that you would, unintentionally but successfully, cross the endless ice fields of Antarctica with your family.
Mary Ritchie Key is a noted linguist and specialist in South American native languages.
Her bibliography contains 28 items. According to Key (1984), the languages of Polynesia contain elements also found in North and South American languages that suggest distant historical connections. The similarities are either due to borrowings, or to the
same genetic origins. Thor Heyerdahl has also tried to point out early connections between South America and Easter Island. The King List of the latter (Simon, 1984:167-168) explains the connections between this Island’s history and
Peruvian chronology: Tupac (or Topac) Inca Yupanqui was the conqueror of Easter Island and Mangareva, around AD 1485, according to Sarmiento de Gamboa. This Inca ruler led a victorious expedition with 20,000 soldiers and balsa rafts. The chiefs Guaman and
Antarqui went with him. Montesinos adds that Huaman Achachi was Tupac’s brother. The king lists of Mangareva and Easter Island agree on these. The inhabitants of the latter remember their kings Haomoana, Taratahi, and Tupa or Topa Ariki. Tupa built temples
and introduced breadfruit. Part of his expedition continued navigating further to the west and returned many months later, bringing some bronze, the jawbone of a horse (that was unknown in the Americas) and many black captives. After these, they returned to
Peru. The black people must have been the population of Melanesia, perhaps the Philippines. Later (c. 1680) their imported “short ears” till a victorious uprising broke out, and finally (c. 1773) a fratricidal war. The
name “orejones” or “long ears” was used for the Incas in the literature for centuries.
It should be added that Otorongo Achache was also a brother of Inca Tupac. (“Otorongo” is understood in Brazil
as a black panther.) His clan’s ancient name appears on Easter Island in pre-Inca times as well, as Ataranga or Aturanga. The first king of Easter Island, Hotu a Matua (c. AD 400) came from a group of islands towards the rising sun where the
climate was hot. Their boat trip took two months. Heyerdahl (1966) thinks that they came from South America’s eastern side.
One of these groups of immigrants or visitors from South America, or the Spanish navigators, may have imported the potato
to the Philippines. The Spanish name for sweet potato (camote) has been borrowed from the (Central American) Nahuatl camotl or kamotl, just like coyote from coyotl, let alone the axolotl. Bezold (1883-1888)
mentions a word “khamotz” from a very ancient Syriac text, related to the life of the first saints two millennia ago, claiming that a person survived only by eating a certain root called “khamotz.” This word may mean
an interesting coincidence that could be well explained by the sea travels of the Phoenicians 2-3 thousand years ago. Number 42 of our Philippine languages has kamutiq for sweet potato, and some of the Manobo languages have ka’muti
for it. We are unable to tell the direction of borrowing, in the case of the sweet potato, between the Nahuatl and the Philippine languages.
Returning to the Phoenicians, their traditions tell that their ships reached America. This has such a vast
literature that it is easier to refer to Simon (1984:32-83), particularly to page 95. This shows two photographs of statuettes of the naked goddess Ishtar, holding their busts similarly, and even their braided and veiled hair has the same fashion. The first
statuette, dated to the 6th century BCE, is in the Museo Archeologico Nazionale in Cagliari. The second one is in the Museo Nacional in Mexico. This proof of the Phoenician visits to Central America is not the only one. Beautiful Maya paintings and representations
have been found that depict a group of nude, white-skinned and circumcised prisoners resembling Phoenicians.
There are hundreds, or rather thousands, of registered and unexplained cases
of lexical similarities between “unrelated” languages. Some examples: the Ilokano and the American Papago-Pima word for “ready,” the words debt or owe in the Hokkien (Taiwanese) and Hebrew, or for lake
in Quechua and Arab. Also, there are similar pairs of cognates for teardrop, rope or hair in Latin and some Philippine languages, for teardrop in Philippine and Toba (South American), or for heron in Comanche and
Hungarian (cusihcua and kócsag). Furthermore, for cooking pot in Amharic, Toba and many Philippine languages, for towards in Hungarian-Estonian vs. Nagamese-Assamese, for full in Croatian, Tagalog, Tolo,
Tongan, or for morning in Turkish and Ojibwe. The words ‘end’ or ‘to end’ are represented in similar forms, possibly cognates, on every continent. Wilfrid Douglas (1976:58) tells of a place in Australia in
which ‘burlong’ means cave; the Hungarian word is ‘barlang.’ Could a cave be onomatopoeic?
Iron is called seterika or besi
in Indonesian, while it is called sidero in Greek and vas in Hungarian. It is called haearn in Welsh, while its name is haeana in Maori. (Was the Welsh language official on the ship of Captain Cook?) For the orthodox linguists,
these are all only annoying coincidences against their beautiful theories, nothing else.
An interesting parallel can be mentioned from Funk & Wagnalls (1972:175), between the folklore
of some Philippine and Slav nations, regarding the demons called ‘BUSO.’ These horrible-looking creatures exist in the beliefs of the Bagobo tribe (Malaon, Southeast Mindanao). They correspond to the similarly named ‘BUSÓ’ demons
in some of the South Slav traditions, and their appearance is exactly the same. The small Bunjevac and Sokac, two groups of Croatian-Illyrian origin, are still living in northern Serbia and southern Hungary. They celebrate a yearly carnival at the end of the
year. They start it by crossing the river by boats, wearing scary masks, particularly in the town of Mohács. (See more information under “Busójárás” at many Internet websites.) In both customs, these monsters have terrible
teeth, sometimes dig out dead bodies after burial, and make as much noise as possible. Such a long survival of an archaic feature is remarkable, since no other Hungarian or south Slav ethnic group has preserved anything similar. The supposed ancient Balto-Slavic
language and their prehistoric contacts with the ancestors of some Philippine tribes could explain this strange phenomenon. Without considering that, this is an additional unsolved mystery that breaks out from the full chests (or boxes of Pandora) of linguistic
and ethnological coincidences, or rather evidences.
The conclusions of scholars like Mary Ritchie Key, Thor Heyerdahl and Helge Ingstad have been considered seriously and accepted
by many serious scholars and encyclopaedias, but not by the orthodox authorities, or officially. The explanation for this lies in the depths of human nature and not in the sciences. Thus, the prevailing climate in this field of historical study, when
so much evidence about the ancient navigational achievements of the natives is being silenced and disregarded by orthodox mainstream scholarship, does not make further research easy.
The comparison showed that these 43 languages are related, forming two large groups, four smaller ones, and two further
separate languages, Ilongot and Sambal, that are loosely related to the other groups.
We have proved that most of the accepted classification is correct. We are glad to report that
massive numerical information, up to 903 new figures of lexical similarity, can be added to the linguistic publications about the Philippine languages. Or, at least, it would enable professional linguists to double-check the existing figures and percentages.
This study offers a clear diagram and a table demonstrating the similarities and probably the genetic relationships between these Philippine minor languages.
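The figure of 903 corresponds to the number of distinct pairs that can be formed from 43 languages, 43 × 42 / 2. The short sketch below verifies this and enumerates the pairs. Using the numbers 1 to 43 as language identifiers and a dictionary as the empty similarity table are assumptions made for illustration; no similarity values are supplied.

```python
import math
from itertools import combinations

N_LANGUAGES = 43  # number of minor languages compared

# Number of unordered pairs: n * (n - 1) / 2
n_pairs = math.comb(N_LANGUAGES, 2)

# Enumerate every pair (i, j) with i < j, numbering the languages 1..43.
pairs = list(combinations(range(1, N_LANGUAGES + 1), 2))

# Empty table with one slot per pair, to be filled with a similarity percentage.
similarity = {pair: None for pair in pairs}

print(n_pairs, len(pairs))  # prints: 903 903
```

Each slot of such a table holds one pairwise percentage, so a full comparison of the 43 languages yields at most 903 figures.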
Linguists consider that
genetic classification of a language must be based primarily on its grammar, structure, and syntax. Orthodox scholars regard the comparisons of basic vocabularies as unscientific. Except, of course, the conclusions of SIL and Ethnologue. Their editors
do not seem to be ultraconservative, because they have listed several data showing lexicostatistical similarities. They are competent scholars, and they would not refer to unscholarly results. These authorities seem to share the “golden mid-way”
with Crystal, who admits that there is no better tool yet that would substitute lexicostatistics for glottochronology. Please observe that the present study does not make any glottochronological claim, although its first superficial critic confuses lexicostatistics
with glottochronology. (This study does not pretend to offer or provide absolute separation times, etc. It must be added here that comparative lexicostatistics, in the broad meaning, includes the estimation of times at which language splits have occurred, by
means of glottochronology. However, as we have accepted it in the narrower meaning always insisted on by Isidore Dyen and others, lexicostatistics strictly excludes glottochronology.) The same critic feels uncomfortable about the Internet, qualifying
it as the ‘wild west’ of information resources, noting that internet postings are not refereed, and virtually anyone who knows how to push the right buttons can put anything (s)he desires up for public consumption. He claims that this study does not offer
anything new. He adds, “One would normally expect new results, such as those promised at the beginning of the paper, not simply a reiteration of statements which are already available in existing publications. The author speaks of the ‘objective
science lexicostatistics,’ but seems unaware of the many problems that various researchers have encountered in trying to use lexicostatistics for subgrouping purposes.” I have to emphasize once more that the rough and superficial orthodox approach
and the unscholarly urge to classify mixed languages quickly have distorted the original method of lexicostatistics (and glottochronology). Mixed languages cannot be placed in a single drawer. Also, the widespread ignorance among some linguists of dialects as remnants
of sunken languages is still taking a heavy toll. Therefore, “the problems of various researchers” are the logical results of those biased and unscholarly approaches that once had been labeled “scholarly.” Their problems are not
However, if the ultraconservatives are right, my new approach will at least give them an independent control for the correctness of their conclusions.
Should the results of my classification and of their conservative, mainly grammatical and morphological, method correlate highly, it may indicate that both techniques are equally reliable. Also, my proposal to seek relatives of the Philippine
languages abroad is still waiting for their approval.
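One possible way to put a number on the agreement between two classifications is the Rand index: the share of language pairs that both classifications treat the same way (together in both, or apart in both). The sketch below is only an illustration under stated assumptions. The function name `rand_index`, the language labels, and the group labels are hypothetical toy data, not results of this study, and the Rand index is one measure among several that could be chosen.

```python
from itertools import combinations


def rand_index(grouping_a: dict, grouping_b: dict) -> float:
    """Share of language pairs on which two classifications agree.

    A pair counts as an agreement when both classifications put the two
    languages in the same subgroup, or both put them in different subgroups.
    """
    if set(grouping_a) != set(grouping_b):
        raise ValueError("both classifications must cover the same languages")
    pairs = list(combinations(sorted(grouping_a), 2))
    if not pairs:
        raise ValueError("at least two languages are needed")
    agreements = sum(
        (grouping_a[x] == grouping_a[y]) == (grouping_b[x] == grouping_b[y])
        for x, y in pairs
    )
    return agreements / len(pairs)


# Hypothetical toy data: four unnamed languages, two invented classifications.
lexicostatistical = {"lang1": "A", "lang2": "A", "lang3": "B", "lang4": "B"}
grammatical = {"lang1": "X", "lang2": "X", "lang3": "X", "lang4": "Y"}

print(rand_index(lexicostatistical, lexicostatistical))  # 1.0 (identical groupings)
print(rand_index(lexicostatistical, grammatical))        # 0.5 (3 of 6 pairs agree)
```

A value near 1.0 would mean the two methods sort the languages into nearly the same subgroups. The labels of the subgroups themselves do not matter, only which languages end up together.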
There is no doubt that our primary academic sources have based the classification of the Philippine minor languages on complex
and profound studies and comparisons of their grammar, syntax, word order and phonology. Therefore, if those authorities did their homework as they preach, their linguistic classification has been based on those areas, and less than 25% of the comparisons
are related to lexicostatistical similarities. Their long and painstaking studies have yielded exactly the same results as our straightforward and “to the point” lexicostatistical comparisons. This fact means that a properly executed lexicostatistical
comparison is equal in efficiency to all the other methods combined, only less subjective and more measurable. A properly done lexicostatistical comparison can be used as a valid tool for subgrouping languages “without controls.”
For example, in the case of the Hungarian-Finnish lexicostatistical comparison this “valid control” reveals that the grammatical correspondences are later developments, and that the lexical cognates represent a much deeper linguistic layer. Also,
this and similar problems of classification reveal that grammatical and structural similarities can be just as accidental as lexicostatistical coincidences, and that they are valid only in a few dialects and not in the majority of the dialects. As for
the burden of proof, the orthodox camp can rightfully be asked to provide more support for their ideas. They must prove the historicity and the logical foundations of their originally racist obsession called the Proto-Indo-European “language.”
Or, they should prove that word order existed before the existence of some basic words, etc.
Of course, these statements and questions cover a politically sensitive area for those
who still prefer the subjective and authoritative approach and ignore the statistical requirements. Our original aim was not to expose but to hide the split of opinions between two groups of linguists. The first group shouts “unscientific” at anything
lexicostatistical, while others (Summer Institute of Linguistics, Crystal, etc.) accept its usefulness. We do not know how many followers each of these opposing camps has, and hope that sound logic will eventually overcome the prejudices. (All these additional
remarks were added recently, after the first submission of this allegedly boring paper. Now we have added a little salt and spice to it, at the request of the first professional critic.)
5. About the author and the origin of this study
Zoltan A. Simon was born in Budapest, Hungary, on May 26, 1949. His original profession is land surveyor and geologist. As a Canadian Hungarian who was an editor–proofreader of the Encyclopaedia Hungarica (Calgary) for a decade, he is impartial
toward every language of the Philippines. He has been a Canadian citizen since 1976 and now lives in Vitória, Brazil.
He undertook a similar lexicostatistical evaluation of
the 38 Hungarian dialects in 1983. A special computer program compared 100 words from 395 villages with each other. The work was done with punch cards, required 7,781,500 comparisons, and resulted in the publication of the world’s first complex, computer-generated dialect map.
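The comparison count quoted for the 1983 dialect study can be checked with a few lines of arithmetic. The sketch below assumes that every village was compared once with every other village on each of the 100 words. That assumption and the helper name `pairwise_comparisons` are illustrative and are not taken from the original punch-card program.

```python
from math import comb


def pairwise_comparisons(items: int, words_per_pair: int = 1) -> int:
    """Word comparisons needed when every unordered pair of items is compared once per word."""
    return comb(items, 2) * words_per_pair


print(pairwise_comparisons(395))                      # 77815 village pairs
print(pairwise_comparisons(395, words_per_pair=100))  # 7781500 word comparisons
```

Under that assumption the result matches the 7,781,500 comparisons stated in the text.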
As a writer and editor (or publisher) of several books, including a Polyglot dictionary of 116 languages of the world (1998), the author has always been interested in the Philippine languages and in questions of ethnogenesis. The book entitled Philippine
minor languages: Word lists and phonologies, written by Lawrence A. Reid (1971), became one of his treasures when he borrowed it from the Vancouver Public Library in Canada. Apparently it is out of print by now, and probably fewer than one percent of
libraries have it on their shelves, even in relatively wealthy English-speaking countries.
The systematic evaluation of Reid’s dictionary by the author began in the year 2000.
He did the final evaluation in 2001, during his lunch breaks while taking an AutoCAD 2000 course at the British Columbia Institute of Technology in Burnaby, Canada. It was rather like a pastime or hobby. (Scholars in general do not seem to have this amount
of free time for such monotonous, slave-like work. Also, most of them are “too busy with more important scholarly research.”) This way the author has removed a considerable workload from their shoulders, so they can spend more time on writing
their books and theses.
Presently two attitudes exist in linguistics and in many human sciences. To express it allegorically, let us imagine that a group of carpenters is working on
the reconstruction of the famous Leaning Tower of Pisa. They are erecting a temporary wooden frame around it, with a spiral walkway resembling the thread of a huge bolt or screw. Some of them are in positions higher than the rest, working with sophisticated
methods and machines, driving hundreds of medium-size screws into the wood. The carpenters with less seniority and without a licence use long nails, hammers, and physical strength. The ones using the expensive and fast machines constantly disparage the abilities,
techniques, and hammers of the others. However, it may happen that the long nails are stronger than the quick and superficial screws near the top. Similarly, one authority can disparage others by claiming that a theory was superseded a hundred years ago, always looking
down on others. Those looking down can see the past, and an optical illusion shows them the person opposite in a lower position. But if they sometimes look upwards, they may see a person on the opposite side in a higher position than themselves. The
way of advancement is often like a spiral or like the thread of a screw.
I often felt that I was in the position of Toto, Dorothy’s little dog that pulled open the mysterious and respectable
curtain in the movie entitled “The Wizard of Oz.” (Comparing my size to the empire of linguistics. The only difference was the very friendly attitude of the wizard at the end of the story.)
My only wish is to learn more about the works of linguists on complex linguistic comparisons. The c. 4000 languages of the world yield about 8 million pairwise combinations. If each comparison were written up in a paper of only 50 pages, the number of pages would be 400 million. It must
be guarded in a linguistic Fort Knox the size of the Pentagon. Anyone knowing of that library, or interested in these results, may contact me at firstname.lastname@example.org for details. Any comment, observation, or proposed correction on these subjects would be much appreciated.
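The rough figures in this paragraph can be reproduced as follows. This is only a back-of-the-envelope sketch. It assumes one paper per unordered pair of languages, and the constant and variable names are hypothetical.

```python
from math import comb

LANGUAGES = 4000       # "c. 4000 languages", as stated in the text
PAGES_PER_PAPER = 50   # page count per paper, as stated in the text

language_pairs = comb(LANGUAGES, 2)            # unordered pairs of languages
total_pages = language_pairs * PAGES_PER_PAPER

print(f"{language_pairs:,} language pairs")    # 7,998,000 language pairs
print(f"{total_pages:,} pages")                # 399,900,000 pages
```

Both values round to the "about 8 million" combinations and 400 million pages given above.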
Some of my other titles: Forbidden world history (1999), Desert Island (editor, the autobiography of Bob Robinson
Crusoe, 1996), and 6000 years in biblical illustrations and chronologies (1997).
I would like to express my thanks to the librarians of the Vancouver Public Library and its History and
Languages division, for their helpful assistance. I am also grateful to all colleagues, friends and family members who offered invaluable comments during the preparation of this material.
Bibliography and suggested reading, including on-line papers and web sites
Bezold, C[arl], (translator and editor). 1883-1888. Cave of treasures or Me*arrath gazze. (Translated
from Syriac. Also called “The origin of the tribes,” according to the Hungarian edition, c.1984)
Black, Paul. 1997. Lexicostatistics and Australasian languages: Problems and prospects. In Darrell Tryon & Michael
Walsh (Eds.), Boundary rider: Essays in honour of Geoffrey O’Grady, 51-69. (Pacific Linguistics C-136.) Canberra: Research School of Pacific and Asian Studies, Australian National University.
Blust, Robert A. 1991. The Greater Central Philippines hypothesis. Oceanic Linguistics 30(1, 2): 73-129. (Honolulu: University of Hawaii)
Chretien, Douglas. 1962. A classification of twenty-one Philippine languages. In Philippine Journal of Science, 91: 485 - 506.
Comrie, Bernard, Stephen Matthews, and Maria Polinsky (consulting editors). 1996. The atlas of languages. New York: Facts On File, Inc. (A Quarto Book, foreword by Jean Aitchison.)
Comrie, Bernard (ed). 1987. The World’s Major Languages. Sydney: Croom Helm.
Crystal, David. 1989, 1991. The Cambridge encyclopedia of language. New York: Cambridge Univ. Press.
Douglas, Wilfrid. 1976. The Aboriginal languages of the south-west of Australia. Canberra:
Australian Institute of Aboriginal Studies.
Dyen, Isidore. 1971. The Austronesian languages and Proto-Austronesian. In Thomas Sebeok (ed)
Current Trends in Linguistics, Vol. 8, Part 1: Linguistics in Oceania.
Dyen, Isidore. 1965. A lexicostatistical classification of the Austronesian languages. In International
Journal of American Linguistics, Memoir 19. (Indiana University Publications in Anthropology and The Waverly Press)
web pages using the Google.com search: ISO 639 Code: phi. [Also, one can find all SIL classification data of these languages by entering “(any) language” Ethnologue, and hit the search button.]
Funk & Wagnalls, Standard Dictionary
of Folklore, Mythology, and Legend (New York, 1972 reprint)
Green, J. N. 1988. The Romance languages, ed. by M. Harris and N. Vincent. Oxford University Press.
Grimes, Barbara F. (ed). 1996. Ethnologue: Languages of the World,
Dallas, Texas: Summer Institute of Linguistics.
Heyerdahl, Thor. 1966. (Reports of the) Norwegian archaeological expedition to Easter Island and the East Pacific. (In two volumes) London: George Allen & Unwin Ltd.
Key, Mary Ritchie. 1984. Polynesian and American linguistic connections. Jupiter Press.
Kruskal, J. B., I. Dyen, and P. Black. 1973. Some results from the vocabulary method of reconstructing language trees. Pages 30-55
in I. Dyen (Ed.), Lexicostatistics in genetic linguistics. The Hague: Mouton.
Llamzon, Teodoro, S.J. 1978. Handbook of Philippine Language Groups. Quezon City: Ateneo de Manila Press.
Lord, Robert. 1971. Comparative linguistics.
New York: David McKay Company Inc.
McFarland, Curtis D. 1966. Subgroupings and number of Philippine languages or How many
Philippine languages are there? In Maria Lourdes S. Bautista (ed) Readings in Philippines Sociolinguistics.
Manila: De La Salle University Press.
Newbrook, Mark. 1999. Personal communication: e-mail message of 18 May.
Pei, Mario. 1971. Invitation to linguistics. Chicago: Henry Regnery Company.
Reid, Lawrence A. 1971. Philippine minor languages: Word lists and phonologies. Honolulu: University of Hawaii Press.
Renfrew, Colin, April McMahon and Larry Trask (Eds.) 2000. Time depth in historical linguistics. 2 vols. Cambridge, England: The McDonald Institute
for Archaeological Research.
Ruhlen, Merritt. 1987. A Guide to the World’s Languages (Volume 1: Classification). California: Stanford University Press.
Rubino, Carl. c. 2002. Philippine languages. (Internet web
page, by Google search)
Rubrico, Jessie Grace U. (c. 2002) The languages of the Philippines. Internet (by Google.com search)
Simon, Zoltan. 1984. Atlantis, the seven seals. Vancouver: Robinson Expeditions Publishing (same as below).
Simon, Zoltan. 1998. Polyglot dictionary of 116 languages. North Vancouver: Robinson Crusoe Enterprises.
Swadesh, Morris. 1952. Lexicostatistic dating of prehistoric ethnic contacts: with special reference
to North American Indians and Eskimos. Proceedings of the American Philosophical Society 96: 452-463.
Thomas, David and Allan Healy. 1962. Some Philippine language subgroupings and reconstruction: A lexicostatistical study.
Anthropological Linguistics, 4 (1 or 9): 21-33.
Walton, Charles. 1979. A Philippine language tree. Anthropological Linguistics 21(2): 70-98.
Webster’s Illustrated Encyclopedic Dictionary. 1990. Montreal: Tormont Publications
Zorc, David Paul R. 1975. The Bisayan dialects of the Philippines: Subgrouping and reconstruction.
Dissertation (Ph.D), Cornell University. Published in 1977. (Canberra: The Australian National University.)
Zorc, R. David. 1986. The genetic relationships of Philippine languages. In FOCAL II: Papers from the Fourth International Conference on Austronesian Linguistics, ed. by Paul Geraghty, Lois Carrington, and S.A. Wurm. Pacific Linguistics C-94: 147-173.