IMPROVING RESULTS IN MULTI-CLASS CLASSIFICATION FOR IMBALANCED DATA BY ASSIGNING WEIGHTS TO CLASSES BASED ON MACHINE LEARNING MODELS
Keywords: LightGBM, HistGB, StochasticGB, MLP.

Abstract
This study analyzes strategies for assigning class weights in imbalanced classification problems and elucidates the theoretical and practical aspects of applying these strategies in ensemble and neural-network models. In medical diagnostics, where positive cases are rare, class imbalance yields a deceptively high accuracy score while degrading minority-class recall; assigning class weights in the models counteracts this effect and reduces the risk of missed diagnoses. The study describes the weighted gradient-Hessian approach for gradient-boosting algorithms such as LightGBM and HistGradientBoosting, as well as weighted cross-entropy and, where necessary, threshold-moving and calibration methods for the MLP. It presents, in mathematical form, the Newton-step update of leaf values, the accumulation of weighted gradients at the histogram-bin level, and logit-based weighted cross-entropy for the MLP, together with the corresponding boosting and weighting derivations.
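The weighted quantities the abstract mentions can be sketched in the standard second-order boosting notation (an assumption on our part: $g_i$ and $h_i$ denote the first and second derivatives of the loss at sample $i$, $w_i$ its class weight, $\lambda$ an L2 regularizer, $I_j$ the sample set of leaf $j$):

```latex
% Weighted Newton-step leaf value for leaf j:
v_j = -\frac{\sum_{i \in I_j} w_i g_i}{\sum_{i \in I_j} w_i h_i + \lambda}

% Histogram-based split finding accumulates the same weighted sums per bin b:
G_b = \sum_{i:\,\mathrm{bin}(x_i)=b} w_i g_i, \qquad
H_b = \sum_{i:\,\mathrm{bin}(x_i)=b} w_i h_i

% Class-weighted cross-entropy for an MLP with softmax outputs p_{i,c} over logits:
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} w_{y_i} \log p_{i,\,y_i}
```

With $w_i \equiv 1$ the leaf value reduces to the familiar unweighted Newton step, so class weighting enters the boosting update only through these per-bin sums.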
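A minimal sketch of class-weighted gradient boosting, assuming scikit-learn; the inverse-frequency weighting shown is one common choice, not necessarily the exact weighting used in the study. LightGBM exposes the analogous `class_weight='balanced'` option on `LGBMClassifier`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Imbalanced toy data: ~5% positives stand in for rare diagnoses.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Inverse-frequency class weights: w_c = n / (k * n_c), so the
# minority class receives the larger weight.
n, k = len(y_tr), 2
w = {c: n / (k * np.sum(y_tr == c)) for c in (0, 1)}
sample_w = np.array([w[c] for c in y_tr])

# The histogram-based booster accumulates the *weighted* gradients and
# Hessians per bin, so the weights flow into every split and leaf value.
clf = HistGradientBoostingClassifier(random_state=0)
clf.fit(X_tr, y_tr, sample_weight=sample_w)
```

Per-sample weights are used here rather than a `class_weight` argument because `fit(..., sample_weight=...)` is supported across the gradient-boosting family.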
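For the MLP side, the logit-based weighted cross-entropy and threshold-moving steps can be sketched in plain NumPy; the function names and the example threshold of 0.3 are illustrative assumptions, not values taken from the study.

```python
import numpy as np

def weighted_cross_entropy(logits, y, class_w):
    """Class-weighted cross-entropy computed from raw (pre-softmax) logits.

    logits  : (N, C) array of scores
    y       : (N,) integer labels
    class_w : (C,) per-class weights, e.g. inverse class frequencies
    """
    z = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Each sample contributes -w_{y_i} * log p_{i, y_i}.
    return float(-(class_w[y] * np.log(p[np.arange(len(y)), y])).mean())

def predict_with_threshold(p_pos, threshold=0.3):
    """Threshold-moving: lower the cutoff for the rare positive class
    instead of taking the argmax, trading precision for minority recall."""
    return (np.asarray(p_pos) >= threshold).astype(int)
```

Raising the minority-class weight scales up the loss of its misclassified samples, which is exactly the lever the weighted training strategy uses.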
References
N. Fayzullo, S. Sariyev and Y. Sherzodjon, "Analyzing the Effectiveness of Ensemble Methods in Solving Multi-Class Classification Problems," 2025 International Russian Smart Industry Conference (SmartIndustryCon), Sochi, Russian Federation, 2025, pp. 788-793, doi: 10.1109/SmartIndustryCon65166.2025.10986248.
Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics, 29(5), 1189–1232.
Friedman, J. H. (2002). Stochastic Gradient Boosting. Computational Statistics & Data Analysis, 38(4), 367–378.
Ke, G., Meng, Q., Finley, T., et al. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. NeurIPS.
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. KDD.
Pedregosa, F., Varoquaux, G., Gramfort, A., et al. (2011). Scikit-learn: Machine Learning in Python. JMLR, 12, 2825–2830.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal Loss for Dense Object Detection. ICCV.
Cui, Y., Jia, M., Lin, T.-Y., Song, Y., & Belongie, S. (2019). Class-Balanced Loss Based on Effective Number of Samples. CVPR.
Cao, K., Wei, C., Gaidon, A., Arechiga, N., & Ma, T. (2019). Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss. NeurIPS.
Nurmamatov, M., & Sariyev, S. (2025). Intelligent data analysis and hyperparameter tuning using genetic algorithms in machine learning [Data set]. Zenodo. https://doi.org/10.5281/zenodo.16325952
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. JAIR, 16, 321–357.
Seiffert, C., Khoshgoftaar, T. M., Van Hulse, J., & Napolitano, A. (2010). RUSBoost: A Hybrid Approach to Alleviating Class Imbalance. IEEE TSMC A, 40(1), 185–197.
Chawla, N. V., Lazarevic, A., Hall, L. O., & Bowyer, K. W. (2003). SMOTEBoost: Improving Prediction of the Minority Class in Boosting. PKDD.
Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On Calibration of Modern Neural Networks. ICML.
Niculescu-Mizil, A., & Caruana, R. (2005). Predicting Good Probabilities with Supervised Learning. ICML.
Davis, J., & Goadrich, M. (2006). The Relationship Between Precision-Recall and ROC Curves. ICML.
M. Nurmamatov, S. Sariyev and B. Eshonkulov, "Application of Evolutionary Algorithms to Enhance the Efficiency of Neural Networks and Machine Learning Algorithms," 2025 International Russian Smart Industry Conference (SmartIndustryCon), Sochi, Russian Federation, 2025, pp. 533-537, doi: 10.1109/SmartIndustryCon65166.2025.10986257.
Saito, T., & Rehmsmeier, M. (2015). The Precision-Recall Plot is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PLOS ONE, 10(3): e0118432.
Matthews, B. W. (1975). Comparison of the Predicted and Observed Secondary Structure of T4 Phage Lysozyme. Biochimica et Biophysica Acta, 405, 442–451. (the MCC metric)
Elkan, C. (2001). The Foundations of Cost-Sensitive Learning. IJCAI.
M. Nurmamatov, S. Sariyev and I. Uddin, "Methods of Using Artificial Intelligence Algorithms in Human Resource Management," 2025 International Russian Smart Industry Conference (SmartIndustryCon), Sochi, Russian Federation, 2025, pp. 566-571, doi: 10.1109/SmartIndustryCon65166.2025.10986087.
A. Axatov, M. Nurmamatov, F. Nazarov, and Sh. Sariyev, "Genetic algorithm application technology in multi-parameter optimization problems," AIP Conf. Proc., vol. 3244, art. no. 030025, 2024, doi: 10.1063/5.0242074.
Arik, S. Ö., & Pfister, T. (2021). TabNet: Attentive Interpretable Tabular Learning. AAAI (arXiv:1908.07442).
Buda, M., Maki, A., & Mazurowski, M. A. (2018). A Systematic Study of the Class Imbalance Problem in Convolutional Neural Networks. Neural Networks, 106, 249–259.