← Lecture Series

Lecture No. 1

Intro to Iterative Optimization

In many introductory calculus courses, we learn how to find local and absolute minima and maxima of functions as an important application of differentiation. Defining a "cost" function and finding an input that reduces it is one of the most useful applications of basic calculus — and the process behind many ML applications and network design.

\(f\left(x\right)=\dfrac{x^{2}}{2}-\sin x\)
\(f'\left(x\right)=x-\cos x\)

Take the function \(f(x)\) and its derivative above. The only inflection point on \(f(x)\) sits at \(x=\cos x\). Because of the mix of rational and irrational terms, there is no good way to solve this by hand. In cases like this, an iterative root-finding method is the right tool.

\(\mathrm{x}_{0}=\text{initial guess}\)

\(\mathrm{x}_{n+1}=\mathrm{x}_{n} - \dfrac{f(\mathrm{x}_{n})}{f'(\mathrm{x}_{n})}\)

Newton's method is a simple iterative algorithm for approximating local roots of a function. The basic premise is to draw a tangent line at point \(\mathrm{x}_{n}\) and approach where that tangent line's root is, giving \(\mathrm{x}_{n+1}\).

Watch the process step-by-step for a function whose root we know. The point moves closer and closer to zero as it "rolls" down the parabola.

Newton's method approximates a root rather than strictly finding it — no matter how close we get, some error remains. The effectiveness increases the closer you start to the real zero. Try pressing play on \(x_0\) to see this in action.

Now we apply Newton's method to a function we can't solve explicitly. Depending on the starting value, \(x_5\) sometimes appears to find an exact zero — a floating-point artifact, since we know \(x\) must be irrational.

Proof that \(x\) in \(x = \cos x\) is irrational: Assume some rational \(r\) such that \(r = \cos(r)\). If \(r\) is rational, then \(\cos(r)\) must be irrational. If \(\cos(r)\) is rational, then \(r\) must be irrational. No rational can satisfy \(r = \cos(r)\), so \(x\) must be irrational. QED.

Desmos doesn't catch the floating-point error and calls it a zero. We have \(0.739085133215\) as a root accurate to 12 significant digits.

We apply this to the original function \(f\left(x\right)=\frac{x^{2}}{2}-\sin x\). Having "found" the root of \(f'\left(x\right)=x-\cos x\), we have found an inflection point and in turn a minimum of \(f(x)\).

With a function that has multiple local minimums and a local maximum, pressing play on \(x_{0}\) shows that Newton's algorithm approaches different points depending on the initial value. Newton's method finds the closest zero of \(f'(x)\), which here are \(\exists\left(-\frac{\sqrt{2}}{2},\ 0,\ \frac{\sqrt{2}}{2}\right)\).

Localization is an important thing to keep in mind — your starting point determines which critical point you converge to.

These methods generalize to multivariable functions in any dimension. With the proper background in multivariable calculus and a touch of linear algebra, it comes together intuitively.

Here is a collection of related methods and their definitions, with simple Python scripts for each. Experiment by trying different functions and variables.

Bisection

\(a_0, b_0 = \text{initial endpoints}\)
\(x_n = \frac{a_n+b_n}{2}\)
\(\text{if } \text{sign}(f(a_n)) = \text{sign}(f(x_n)):\)
\(a_{n+1} = x_n,\quad b_{n+1} = b_n\)
\(\text{if } \text{sign}(f(b_n)) = \text{sign}(f(x_n)):\)
\(a_{n+1} = a_n,\quad b_{n+1} = x_n\)

bisection.py

Newton

\(\mathrm{x}_{0}=\text{initial guess}\)

\(\mathrm{x}_{n+1}=\mathrm{x}_{n} - \dfrac{f(\mathrm{x}_{n})}{f'(\mathrm{x}_{n})}\)

newton.py

Simple Newton

\(\mathrm{x}_{0}=\text{initial guess}\)

\(\mathrm{x}_{n+1}=\mathrm{x}_{n} - \dfrac{f(\mathrm{x}_{n})}{f'(\mathrm{x}_{0})}\)

simple_newton.py

Secant

\(\mathrm{x}_{0} = \text{initial guess}\)
\(\mathrm{x}_{1} = \mathrm{x}_0 - \dfrac{f(\mathrm{x}_{0})}{f'(\mathrm{x}_{0})}\)
\(\mathrm{x}_{n+1} = \dfrac{f(\mathrm{x}_{n})}{\mathrm{a}_{n}}\)
where \(\mathrm{a}_{n} = \dfrac{f(\mathrm{x}_{n})-f(\mathrm{x}_{n-1})}{\mathrm{x}_{n}-\mathrm{x}_{n-1}}\)

secant.py