MULTI-STAGE MOMENT-BASED OPTIMIZATION: ANALYSIS AND APPLICATION OF THE ADAM ALGORITHM
Abstract:
In the era of deep learning and large-scale artificial intelligence systems, efficient optimization algorithms have become increasingly important. Neural networks, particularly those with deep and complex architectures, rely heavily on gradient-based iterative methods that update model parameters by minimizing a loss function. Among these methods, the Adam (Adaptive Moment Estimation) algorithm has become a widely adopted choice owing to its adaptive per-parameter learning rates and robust convergence behavior. Introduced by D. Kingma and J. Ba in 2015, Adam combines the advantages of momentum-based Stochastic Gradient Descent (SGD) and RMSprop, addressing several limitations of traditional approaches such as fixed learning rates, slow convergence, oscillatory updates, and sensitivity to noisy gradients [1][2].
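For concreteness, the update rule summarized above can be sketched in a few lines of Python (a minimal illustration of the published Adam update from Kingma and Ba; the function name adam_step and the toy quadratic objective below are illustrative choices, not taken from the article):

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment: exponential moving average of gradients (momentum-like term).
    m = beta1 * m + (1 - beta1) * grad
    # Second moment: exponential moving average of squared gradients (RMSprop-like term).
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter adaptive step: large second moments shrink the effective step size.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = theta**2, whose gradient is 2*theta.
theta, m, v = np.array(5.0), 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.1)
print(theta)  # converges close to 0

The bias-corrected moment estimates are what distinguish Adam from a plain combination of momentum and RMSprop, since they prevent the early updates from being biased toward zero.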
References:
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
Bengio, Y. (2012). Practical recommendations for gradient-based training of deep architectures. In Neural Networks: Tricks of the Trade (pp. 437–478). Springer.
Zeiler, M. D. (2012). ADADELTA: An adaptive learning rate method. arXiv preprint arXiv:1212.5701.
Tieleman, T., & Hinton, G. (2012). Lecture 6.5 – RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning.
Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121–2159.
LeCun, Y., Bottou, L., Orr, G. B., & Müller, K. R. (2012). Efficient backprop. In Neural Networks: Tricks of the Trade (pp. 9–48). Springer.
Zhang, M. R., Lucas, J., Hinton, G., & Ba, J. (2019). Lookahead optimizer: k steps forward, 1 step back. In Advances in Neural Information Processing Systems, 32.
Bock, C., & Gumbsch, P. (2020). Optimization of deep neural networks: Recent advances and applications. Journal of Computational Science, 45, 101182.
Li, J., Li, X., & Hoi, S. C. (2018). Learning to optimize: A primer and a benchmark.
Reddi, S. J., Kale, S., & Kumar, S. (2018). On the convergence of Adam and beyond. In International Conference on Learning Representations (ICLR).
