Abstract:
This thesis deals with Chinese characters (Hanzi): their key characteristics and how they could be used as a knowledge resource in (Chinese) NLP.
Part 1 deals with basic issues.
Chapter 1 presents the motivation and the reasons for reconsidering the writing system, and Chapter 2 gives a short introduction to Chinese and its writing system.
Part 2 provides a critical review of the current, ongoing debate about Chinese characters. Chapter 3 outlines some important linguistic insights from the vantage point of indigenous scriptological and Western linguistic traditions, as well as a new theoretical framework in contemporary studies of Chinese characters.
Chapter 4 focuses on the search for appropriate mathematical descriptions of the systematic knowledge hidden in characters. The mathematical formalization of the shape structure of Chinese characters is described as well.
Part 3 addresses representation issues.
Chapter 5 addresses the design and construction of HanziNet, an enriched conceptual network of Chinese characters. Topics covered in this chapter include the underlying ideas, architecture, methods, and ontology design.
Part 4 presents a case study based on the above-mentioned ideas.
Chapter 6 presents an experiment exploring the character-triggered semantic class of Chinese unknown words. Finally, Chapter 7 summarizes the major findings of this thesis, outlines some potential avenues for future research, and assesses the theoretical implications of these findings for computational linguistic theory.