N-Attributes Stochastic Classifier Combination For Arabic Morphological Disambiguation
Abstract
Morphological disambiguation is the task of computationally determining which morphological tag of a word is activated by its use in a particular context. The main problem in statistical morphological disambiguation of morphologically rich languages is data sparseness: the level of ambiguity is high and the potential tagset is very large. This paper investigates several fully supervised stochastic morphological disambiguation approaches for morphologically rich languages, with a specific application to Arabic. First, the paper evaluates the direct statistical disambiguation method, in which a single tagging model assigns each word a complex morphological tag. Second, the paper introduces the single-attribute classifier combination method, in which the problem is decomposed into several single-attribute disambiguation sub-problems that are solved by several trigram HMM tagging models and a module that combines their outputs. Results show that the first method suffers from data sparseness and has a long tagging time, while the second has low tagging accuracy. Finally, the paper presents a novel approach based on the combination of several N-attributes morpheme-based probabilistic classifiers. The morphological disambiguation problem is first decoupled into several N-attributes tagging sub-problems. Several classifiers are then used to solve each sub-problem, and the outcomes of all N-attributes classifiers are combined. Several problem decomposition methods and classifier combination algorithms are investigated. The triple-attribute (N=3) stochastic classifier combination model provides an overall tagging accuracy of 91.5%, reduces the data sparseness problem, and saves run time compared with the direct approach.
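The decomposition-and-combination idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the attribute set, the function names (attribute_groups, project, combine), and the product-rule combination (summing log-scores) are all assumptions chosen for illustration, and the per-group scores stand in for the outputs of trained N-attributes classifiers.

```python
from itertools import combinations
from math import log

# Hypothetical attribute set; the paper's actual attributes are not given here.
ATTRIBUTES = ("pos", "gender", "number", "case")


def attribute_groups(attributes, n):
    """Return every n-attribute subset; each one defines a tagging sub-problem."""
    return list(combinations(attributes, n))


def project(tag, group):
    """Restrict a full morphological tag (a dict) to the attributes in one group."""
    return tuple(tag[a] for a in group)


def combine(candidates, group_scores, floor=1e-6):
    """Pick the candidate tag with the highest summed log-score over all groups.

    group_scores maps each attribute group to a dict of
    {projected tag: probability}, standing in for one classifier's output.
    Unseen projections receive a small floor probability (an assumption).
    """
    def total(tag):
        return sum(
            log(scores.get(project(tag, group), floor))
            for group, scores in group_scores.items()
        )
    return max(candidates, key=total)


# Toy usage: two candidate analyses for one word, N = 3.
noun = {"pos": "noun", "gender": "masc", "number": "sg", "case": "nom"}
verb = {"pos": "verb", "gender": "masc", "number": "sg", "case": "na"}
groups = attribute_groups(ATTRIBUTES, 3)
# Made-up classifier outputs that favour the noun reading in every group.
scores = {g: {project(noun, g): 0.7, project(verb, g): 0.3} for g in groups}
best = combine([noun, verb], scores)
```

Each sub-problem here has a much smaller label space than the full tag, which is the intuition behind the reduced sparseness; voting or weighted schemes could replace the log-score sum in combine.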