
4. Differentiable optimization

Optimization is the mathematical problem of computing a minimum of a function, written compactly as

\[\operatorname*{minimize}_{ {\bf w} \in \mathbb{R}^N} \; J({\bf w}).\]

We refer to it as a continuous optimization problem when the variable \({\bf w}\) is a vector of real numbers, that is, an element of a vector space such as \(\mathbb{R}^N\). An optimization problem can be further classified according to the nature of the cost function, which may be convex or nonconvex, differentiable or nondifferentiable, and so on. Another important distinction is whether the problem imposes constraints on the solution. Among all these variants, unconstrained differentiable optimization problems are the easiest to solve, because the descent directions of a function can be deduced locally from its negative gradient, leading to a powerful algorithm called gradient descent.
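To make this concrete, here is a minimal sketch of the idea in Python, assuming a generic cost with a known gradient; the names gradient_descent, alpha, and max_iters are illustrative choices, not notation from this chapter.

```python
import numpy as np

def gradient_descent(grad, w0, alpha=0.1, max_iters=100):
    """Minimize a differentiable cost J by repeatedly stepping along -grad J.

    grad      -- function returning the gradient of J at a point w
    w0        -- initial point
    alpha     -- step-size (learning rate)
    max_iters -- number of iterations
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iters):
        w = w - alpha * grad(w)   # -grad(w) is a local descent direction
    return w

# Example: J(w) = ||w||^2 has gradient 2w, so the iterates shrink toward w* = 0.
w_star = gradient_descent(lambda w: 2.0 * w, w0=[3.0, -2.0])
print(w_star)
```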

Contents: This chapter presents the gradient descent method, one of the most efficient algorithms for finding a local minimum of a differentiable function. Because this approach relies on the gradient of the cost function to find descent directions, it converges much faster than derivative-free methods, at the cost of applying only to a narrower class of optimization problems.

  • You will see that the negative gradient of a function indicates a descent direction.

  • You will study gradient descent and its parameters, such as initialization, step-size, and number of iterations.

  • You will understand in what circumstances the initialization matters for gradient descent (a toy sketch after this list previews this point).
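As a small preview of the last point, the following Python sketch runs gradient descent from two different initial points on a nonconvex one-dimensional cost and reaches two different local minima; the cost \(J(w) = w^4 - 3w^2 + w\) and the helper run are assumptions made for illustration only.

```python
# Toy nonconvex cost J(w) = w**4 - 3*w**2 + w, with two local minima
# located roughly at w ≈ -1.30 (the global one) and w ≈ 1.13.
grad = lambda w: 4 * w**3 - 6 * w + 1

def run(w0, alpha=0.05, max_iters=200):
    w = w0
    for _ in range(max_iters):
        w = w - alpha * grad(w)   # one gradient-descent step
    return w

print(run(w0=-2.0))   # ends near the global minimum, w ≈ -1.30
print(run(w0=+2.0))   # ends near the other local minimum, w ≈ 1.13
```

The two runs differ only in their starting point, which is precisely why initialization matters for nonconvex costs.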

Acknowledgment: Part of this chapter is based on the supplementary material for the textbook Machine Learning Refined (1st ed., 2016) written by Jeremy Watt, Reza Borhani, and Aggelos K. Katsaggelos.