ACHIEVING HIGHER ACCURACY IN CLASSIFYING UZBEK WORDS INTO GRAMMATICAL CATEGORIES USING THE CRF MODEL

Abstract:

The classification of words into grammatical categories (part-of-speech tagging) is a fundamental task in natural language processing (NLP). For morphologically rich languages such as Uzbek, the task is harder because of complex affixation, agglutinative word forms, and limited resources. This paper investigates the application of the Conditional Random Fields (CRF) model to Uzbek word classification, aiming for higher accuracy than traditional approaches such as Hidden Markov Models (HMM). Because CRF can incorporate contextual and morphological features, it produces more reliable tags. Several Uzbek sentence examples are analyzed, with CRF applied step by step to demonstrate its advantages. Experimental results show that CRF reaches 92.7% accuracy compared to 84.3% for HMM, an improvement of 8.4 percentage points.
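
To make "contextual and morphological features" concrete, the following is a minimal sketch of how such features might be extracted per token before being passed to a CRF toolkit. It is an illustration only, not the feature set used in the paper: the suffix list `SUFFIXES`, the function names `token_features` and `sentence_features`, and the feature keys are all assumptions introduced here.

```python
from typing import Dict, List

# Hypothetical, deliberately tiny list of Uzbek suffixes for illustration;
# it is NOT the feature inventory described in the paper.
SUFFIXES = ("lar", "ning", "ga", "da", "dan", "ni")


def token_features(sentence: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for the i-th token of a tokenized sentence."""
    word = sentence[i]
    lower = word.lower()

    # Morphological cue: the longest listed suffix that the word ends with
    # (empty string when none matches or the word is the suffix itself).
    matches = [s for s in SUFFIXES if lower.endswith(s) and len(lower) > len(s)]
    known_suffix = max(matches, key=len, default="")

    features: Dict[str, object] = {
        "bias": 1.0,
        "lower": lower,
        "suffix2": lower[-2:],
        "suffix3": lower[-3:],
        "known_suffix": known_suffix,
        "is_capitalized": word[:1].isupper(),
        "is_digit": word.isdigit(),
    }

    # Contextual cues: neighbouring words, or sentence-boundary markers.
    if i > 0:
        features["prev_lower"] = sentence[i - 1].lower()
    else:
        features["BOS"] = True
    if i < len(sentence) - 1:
        features["next_lower"] = sentence[i + 1].lower()
    else:
        features["EOS"] = True
    return features


def sentence_features(sentence: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token in the sentence, in order."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# Example: "Men maktabga bordim" ("I went to school").
feats = sentence_features(["Men", "maktabga", "bordim"])
```

A list of such per-token dictionaries for each sentence is the input shape that common CRF toolkits (for example, sklearn-crfsuite) accept, which is why the sketch stops at feature extraction rather than training.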

Article Details

How to Cite:

Kobilov, S., Nazarov, J., & Rabbimov, I. (2025). ACHIEVING HIGHER ACCURACY IN CLASSIFYING UZBEK WORDS INTO GRAMMATICAL CATEGORIES USING THE CRF MODEL. Science and Innovation, 3(34), 35–38. Retrieved from https://in-academy.uz/index.php/si/article/view/60203

References:

Rabbimov I.M., Umirova S.M., Kholmukhamedov B.F. The problem of POS tagging in the corpus of the Uzbek language. Proceedings of the international scientific and practical conference "Theoretical and practical issues of creating Uzbek national and educational corpora". Tashkent, pp. 97-100, 2021 (in Uzbek).

Sutton C., McCallum A. An introduction to conditional random fields. Foundations and Trends in Machine Learning, 4(4), 267-373, 2012.

Chiche A., Yitagesu B. Part of speech tagging: a systematic review of deep learning and machine learning approaches. Journal of Big Data, 9(1), 1-25, 2022.

Sharipov M., Kuriyozov E., Yuldashev O., Sobirov O. UzbekTagger: The rule-based POS tagger for Uzbek language. arXiv preprint arXiv:2301.12711, 2023.

Rabbimov I.M., Umirova S.M., Xolmuxamedov B.F. O’zbek tili korpusida so’z turkumlarini teglash masalasi. O’zbek milliy va ta’limiy korpuslarini yaratishning nazariy va amaliy masalalari. Xalqaro konferensiya materiallari, pp. 97-100, 2021.

Besharati S., Veisi H., Darzi A., Saravani S.H.H. A hybrid statistical and deep learning based technique for Persian part of speech tagging. Iran Journal of Computer Science, 4, 35-43, 2021.