Factoring lexical and phonetic phylogenetic characters from word lists

Date: 2015-11-04
Language: English
Computational historical linguistics is a young field. Among its major challenges is the collection and preparation of suitable data resources. Here we present an approach that takes lexical data from a large collection of publicly available word lists as input and automatically infers cognacy judgments for words and sounds. We illustrate the workflow and test it by comparing the Maximum Likelihood trees computed from these judgments with those provided by experts. The results show that our workflow still lags behind simpler approaches that analyze the data within a distance-based framework. However, because distance-based analyses are black boxes that do not allow a rigorous check of the individual decisions leading to a given classification proposal, we consider our experiments an important contribution towards establishing more transparent methods in quantitative historical linguistics.

