Using computational criteria to extract large Swadesh lists for lexicostatistics



Document type: Conference paper
Date: 2016-03-02
Language: English
Faculty: 5 Philosophische Fakultät
Department: Allgemeine u. vergleichende Sprachwissenschaft
DDC Classification: 400 - Language and Linguistics
Keywords: Sprachstatistik (statistical linguistics), Phylogenetik (phylogenetics)
Other Keywords:
Swadesh lists
phylogenetic linguistics


We propose a new method for empirically determining lists of basic concepts for the purpose of compiling extensive lexicostatistical databases. The idea is to approximate a notion of “swadeshness” formally and reproducibly, without expert knowledge or bias, and to be able to rank any number of concepts given enough data. Unlike previous approaches, our procedure indirectly measures both the stability of concepts against lexical replacement and their proneness to phenomena such as onomatopoeia and extensive borrowing. The method provides a fully automated way to generate customized Swadesh lists of any desired length, possibly adapted to a given geographical region. We apply the method to a large lexical database of Northern Eurasia, deriving a swadeshness ranking for more than 5,000 concepts expressed by German lemmas. To validate the method, we evaluate this ranking against existing shorter lists of basic concepts, and we give an English version of the top 300 concepts according to this ranking.
