Hanzi, concept and computation: a preliminary survey of Chinese characters as a knowledge resource in NLP

URI: http://nbn-resolving.de/urn:nbn:de:bsz:21-opus-22798
Document type: Dissertation
Date: 2006
Language: English
Faculty: 5 Philosophische Fakultät
Department: Sonstige - Neuphilologie
Advisor: Hinrichs, Erhard
Day of Oral Examination: 2006-02-06
DDC Classification: 400 - Language and Linguistics
Keywords: Computational linguistics, Chinese script / Chinese characters
Other Keywords: Chinese characters, Ontology, Natural Language Processing, Lexical Resource
License: Publishing license excluding print on demand

This thesis deals with writing systems in general and the Chinese writing system in particular. The author argues that the sound system and the writing system should be regarded as two media that, while not entirely independent, are of equal standing for realizing communicative acts and conveying content. This opens up a different perspective on the elements of writing systems, which can be viewed as media in which concepts are encoded. For computational linguistics, it follows that the meaning of complex linguistic signs can, to a certain degree, be inferred from their graphical components. This is the author's working hypothesis, which he tests experimentally in the thesis.


This thesis deals with Chinese characters (Hanzi): their key characteristics and how they could be used as a knowledge resource in (Chinese) NLP. Part 1 deals with basic issues. Chapter 1 presents the motivation and the reasons for reconsidering the writing system, and Chapter 2 gives a short introduction to Chinese and its writing system. Part 2 provides a critical review of the ongoing debate about Chinese characters. Chapter 3 outlines some important linguistic insights from the vantage point of indigenous scriptological and Western linguistic traditions, as well as a new theoretical framework in contemporary studies of Chinese characters. Chapter 4 focuses on the search for appropriate mathematical descriptions of the systematic knowledge hidden in characters, and also discusses the mathematical formalization of the shape structure of Chinese characters. Part 3 addresses representation issues. Chapter 5 covers the design and construction of HanziNet, an enriched conceptual network of Chinese characters, including its underlying ideas, architecture, methods and ontology design. Part 4 presents a case study based on these ideas. Chapter 6 reports an experiment exploring the character-triggered semantic class of Chinese unknown words. Finally, Chapter 7 summarizes the major findings of the thesis, outlines potential avenues for future work, and assesses the theoretical implications of these findings for computational linguistic theory.
