The Adam Optimizer - Conceptual Introduction
Optimizers play a pivotal role in machine learning. With no shortage of options to choose from, I wanted to highlight the conceptual workings of the famous Adam optimizer.
Stochastic gradient-based optimization is of core practical importance in many fields of science and engineering. Many problems in these fields can be cast as the optimization of a scalar parameterized objective function that must be maximized or minimized with respect to its parameters [1]. You may be familiar with this framework from engineering optimization projects in design work: for example, maximizing the lift of an airfoil while minimizing its drag by varying the parameter values that specify the airfoil's geometry.
Similarly, in many machine learning models, stochastic gradient-based optimization is used to minimize loss functions that measure the difference between the values predicted by the model and the actual values (ground truth), guiding the model to improve its predictions. This approach allows complex architectures, such as neural networks, to iteratively adjust millions of parameters to achieve optimal predictive performance [1].
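To make the idea concrete, here is a minimal, illustrative sketch of plain stochastic gradient descent (not Adam itself) fitting a single parameter of a toy model. The data, learning rate, batch size, and step count are arbitrary choices for the example, not values from any particular method.

```python
import numpy as np

# Toy data: y = 3x + noise; we want the optimizer to recover the slope.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 3.0 * x + 0.1 * rng.normal(size=100)

w = 0.0    # single model parameter (the slope), starting from zero
lr = 0.1   # learning rate (step size)

for step in range(200):
    # Sample a random mini-batch -- the "stochastic" part of the method.
    idx = rng.choice(len(x), size=16, replace=False)
    xb, yb = x[idx], y[idx]

    pred = w * xb                            # model prediction
    loss = np.mean((pred - yb) ** 2)         # mean squared error loss
    grad = np.mean(2.0 * (pred - yb) * xb)   # dLoss/dw on the mini-batch

    w -= lr * grad                           # step against the gradient

print(f"learned slope: {w:.3f}")             # should end up close to 3.0
```

Adam follows the same loop structure; the difference, discussed below, lies in how it transforms the raw gradient before taking the step.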