ACHIEVING HIGHER ACCURACY IN CLASSIFYING UZBEK WORDS INTO GRAMMATICAL CATEGORIES USING THE CRF MODEL

Sami Kobilov; Javohir Nazarov; Ilyos Rabbimov

Authors

Sami Kobilov, Samarkand State University named after Sharof Rashidov, dots
Javohir Nazarov, Samarkand State University named after Sharof Rashidov, master
Ilyos Rabbimov, Center for Economic Research and Reform under the Administration of the President of the Republic of Uzbekistan, Tashkent, Uzbekistan, dots

Keywords:

Uzbek language, part-of-speech tagging, word classification, CRF model, natural language processing.

Abstract

Classifying words into grammatical categories (part-of-speech tagging) is a fundamental task in natural language processing (NLP). For morphologically rich languages such as Uzbek, the task is harder because of complex affixation, agglutinative word forms, and limited resources. This paper investigates the Conditional Random Fields (CRF) model for Uzbek word classification, aiming for higher accuracy than traditional approaches such as Hidden Markov Models (HMM). By incorporating contextual and morphological features, CRF tags words more reliably. Several Uzbek sentence examples are analyzed, with CRF applied step by step to demonstrate its advantages. In the experiments, CRF reaches 92.7% accuracy compared to 84.3% for HMM, a gain of 8.4 percentage points.
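
The contextual and morphological features mentioned in the abstract can be illustrated with a minimal sketch. The feature names, the suffix lengths (2 to 4 characters), and the example sentence are assumptions chosen for illustration and are not taken from the paper; the function names `word_features` and `sentence_features` are hypothetical. Per-token feature dictionaries of this kind are the usual input to linear-chain CRF toolkits.

```python
from typing import Dict, List, Union

Feature = Union[str, bool]


def word_features(sentence: List[str], i: int) -> Dict[str, Feature]:
    """Build an illustrative feature dict for the token at position i."""
    word = sentence[i]
    lower = word.lower()
    feats: Dict[str, Feature] = {
        "bias": True,
        "word.lower": lower,
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
    }

    # Morphological cues: Uzbek is agglutinative, so the last few characters
    # often carry case, tense, or person affixes (assumed lengths: 2 to 4).
    for n in (2, 3, 4):
        if len(lower) > n:
            feats[f"suffix{n}"] = lower[-n:]

    # Contextual cues: neighbouring words, or sentence-boundary markers.
    if i == 0:
        feats["BOS"] = True
    else:
        feats["prev.lower"] = sentence[i - 1].lower()
    if i == len(sentence) - 1:
        feats["EOS"] = True
    else:
        feats["next.lower"] = sentence[i + 1].lower()

    return feats


def sentence_features(sentence: List[str]) -> List[Dict[str, Feature]]:
    """Extract features for every token in a tokenized sentence."""
    return [word_features(sentence, i) for i in range(len(sentence))]


# Assumed example: "Men kitobni o'qidim" ("I read the book").
example = ["Men", "kitobni", "o'qidim"]
features = sentence_features(example)
```

For the token "kitobni", the two-character suffix feature captures the accusative affix "-ni", and the previous-word feature records "men". A CRF learns weights over such features jointly with tag-to-tag transitions, whereas a standard HMM conditions each word only on its own tag.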


Innovative Academy RSC