Morpho-syntactic annotation and dependency parsing of German

Trushkina, Julia

dc.contributor.advisor	Hinrichs, Erhard	de_DE
dc.contributor.author	Trushkina, Julia	de_DE
dc.date.accessioned	2004-12-20	de_DE
dc.date.accessioned	2014-03-18T09:52:00Z
dc.date.available	2004-12-20	de_DE
dc.date.available	2014-03-18T09:52:00Z
dc.date.issued	2004	de_DE
dc.identifier.uri	http://nbn-resolving.de/urn:nbn:de:bsz:21-opus-15239	de_DE
dc.identifier.uri	http://hdl.handle.net/10900/46243
dc.identifier.uri	http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-462439	de_DE
dc.description.abstract	The parsing of natural language relies on the syntactic characteristics of words. The part of speech category is one of the most common sources of information in parsing. In the parsing of highly inflectional languages, morphological information, such as case, number and gender, also plays an important role. It helps to resolve syntactic ambiguity in shallow parsing and is particularly useful in dependency parsing of languages with free word order, since it partly determines the argument structure of the sentence. For German, a highly inflectional language with partially free word order, the problem of assigning morpho-syntactic categories, such as part of speech, case, number, gender, person, tense and mood, i.e. the problem of morpho-syntactic annotation, is complicated by the high ambiguity inherent in tokens. Moreover, the partially paradigm-dependent case syncretism of this language makes the problem particularly intricate. This thesis is concerned with the automatic morpho-syntactic annotation of German. Different approaches to the task are investigated in this thesis. A hybrid system with rule-based and statistical modules that combines the relative strengths of the rule-based and statistical methods involved is presented. The rule-based module is based on the Xerox Incremental Deep Parsing System and provides a novel constraint-based framework that integrates phrase-internal concord rules and phrase-external syntactic heuristics into one uniform architecture. The rule-based module successfully reduces the candidate analyses provided by a morphological analyzer. The statistical module is based on a novel use of probabilistic phrase-structure grammars for morpho-syntactic annotation. The module resolves the remaining cases of ambiguity, providing unambiguous and highly accurate output. The usefulness of morpho-syntactic information is evaluated empirically in the creation of a dependency parser for German.
The input to the parser is limited to tokens and their morpho-syntactic characteristics. The parser reaches state-of-the-art performance.	en
dc.description.abstract	Das Parsing natürlicher Sprache hängt von den syntaktischen Kategorien der Wörter ab: Die POS-Kategorie ist eine der am häufigsten verwendeten Informationsquellen für das Parsing. Beim Parsing stark flektierender Sprachen spielt morphologische Information, wie Kasus, Numerus und Genus, eine wichtige Rolle. Sie hilft dabei, syntaktische Ambiguität beim Shallow Parsing aufzulösen und stellt sich als besonders nützlich heraus, wenn sie auf Sprachen mit relativ freier Wortfolge angewandt wird, da sie die Argumentenstruktur eines Satzes teilweise mitbestimmt. Im Deutschen, einer stark flektierenden Sprache mit teilweise freier Wortfolge, ist das Problem der Zuordnung morphosyntaktischer Kategorien, wie POS, Kasus, Numerus, Genus, Person, Tempus und Modus, schwierig, da die Tokens eine hohe Ambiguität besitzen. Zusätzlich verkompliziert wird das Problem durch einen teilweise paradigmaabhängigen Synkretismus im Kasus, der dieser Sprache eigen ist. Diese Arbeit beschäftigt sich mit der automatischen morphosyntaktischen Annotation im Deutschen. Verschiedene Ansätze, diese Aufgabe zu bewältigen, wurden erarbeitet und ein hybrides System mit einem regelbasierten und einem statistischen Modul wird vorgestellt, das die Stärken regelbasierter und statistischer Methoden vereint. Das regelbasierte Modul basiert auf dem Xerox Incremental Deep Parsing System und bildet ein neues constraint-basiertes System, das phraseninterne Kongruenzregeln und phrasenexterne syntaktische Heuristiken in eine einheitliche Architektur integriert. Das regelbasierte Modul reduziert die von der morphologischen Analyse gelieferten möglichen Analysen erfolgreich. Das statistische Modul basiert auf einer neuartigen Nutzung probabilistischer Phrasenstrukturgrammatiken zur morphosyntaktischen Annotation. Es löst die verbleibenden Fälle von Ambiguität und liefert präzise und vollständig desambiguierte Analysen.
Der Nutzen morphosyntaktischer Information wird durch den Aufbau eines Dependenz-Parsers für das Deutsche empirisch evaluiert. Die Eingabe für den Parser ist auf die Tokens und deren morphosyntaktische Eigenschaften beschränkt. Der Parser erreicht eine State-Of-The-Art-Performanz.	de_DE
dc.language.iso	en	de_DE
dc.publisher	Universität Tübingen	de_DE
dc.rights	ubt-podok	de_DE
dc.rights.uri	http://tobias-lib.uni-tuebingen.de/doku/lic_mit_pod.php?la=de	de_DE
dc.rights.uri	http://tobias-lib.uni-tuebingen.de/doku/lic_mit_pod.php?la=en	en
dc.subject.classification	Morphosyntax , Syntaktische Analyse , Ambiguität	de_DE
dc.subject.ddc	430	de_DE
dc.subject.other	Morphosyntaktische Annotation , Dependenzparsing , morphologische Ambiguität	de_DE
dc.subject.other	probabilistic phrase-structure grammars , morpho-syntactic annotation , tagging , morphological ambiguity , dependency parsing	en
dc.title	Morpho-syntactic annotation and dependency parsing of German	en
dc.title	Morphosyntaktische Annotation und Dependenzparsing des Deutschen	de_DE
dc.type	PhDThesis	de_DE
dc.date.updated	2004-12-20	de_DE
dcterms.dateAccepted	2004-12-13	de_DE
utue.publikation.fachbereich	Sonstige - Neuphilologie	de_DE
utue.publikation.fakultaet	5 Philosophische Fakultät	de_DE
dcterms.DCMIType	Text	de_DE
utue.publikation.typ	doctoralThesis	de_DE
utue.opus.id	1523	de_DE
thesis.grantor	09 Neuphilologische Fakultät	de_DE
utue.publikation.noppn	yes	de_DE

Files:	trushkina.pdf 2.63 MB PDF
