Using computational criteria to extract large Swadesh lists for lexicostatistics




URI: http://hdl.handle.net/10900/68640
http://nbn-resolving.de/urn:nbn:de:bsz:21-dspace-686406
http://dx.doi.org/10.15496/publikation-10058
Document type: InProceedings (conference paper)
Date: 2016-03-02
Language: English
Faculty: 5 Philosophische Fakultät
Department: Allgemeine u. vergleichende Sprachwissenschaft
DDC Classification: 400 - Language and Linguistics
Keywords: Sprachstatistik (linguistic statistics), Phylogenetik (phylogenetics)
Other Keywords:
Lexicostatistics
Swadesh lists
phylogenetic linguistics
License: Publishing license including print on demand

Abstract:

We propose a new method for empirically determining lists of basic concepts for the purpose of compiling extensive lexicostatistical databases. The idea is to approximate a notion of “swadeshness” formally and reproducibly, without expert knowledge or bias, and to be able to rank any number of concepts given enough data. Unlike previous approaches, our procedure indirectly measures both the stability of concepts against lexical replacement and their proneness to phenomena such as onomatopoeia and extensive borrowing. The method provides a fully automated way to generate customized Swadesh lists of any desired length, possibly adapted to a given geographical region. We apply the method to a large lexical database of Northern Eurasia, deriving a swadeshness ranking for more than 5,000 concepts expressed by German lemmas. To validate the method, we evaluate this ranking against existing shorter lists of basic concepts, and we give an English version of the top 300 concepts according to this ranking.
