Intro to Iterative Optimization


In many introductory calculus courses, we learn how to find the local and absolute minima and maxima of functions as an important application of differentiation. Defining a "cost" function and finding an input that reduces that cost as much as possible is one of the most useful applications of basic calculus. This is the process behind many ML applications and network design, for instance.

Take the following function \(f(x)\) and its derivative:

\(f\left(x\right)=\cfrac{x^{2}}{2}-\sin x\).

\(f'\left(x\right)=x-\cos x\)

We can see that the only critical point of \(f\left(x\right)\) occurs where \(x=\cos x\). Because the equation mixes a polynomial term with a trigonometric one, there is no way to solve it by hand in closed form. In instances like this, a good option is an iterative root-finding method.
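A quick numerical check already tells us where to look: \(f'(0) = -1 < 0\) while \(f'(1) = 1 - \cos 1 \approx 0.46 > 0\), so by the intermediate value theorem the root of \(f'\) lies somewhere in \((0, 1)\). In Python (my own little sanity check, separate from the interactive graphs below):

```python
import math

fprime = lambda x: x - math.cos(x)

# opposite signs at the endpoints, so a root of f' lies somewhere in (0, 1)
print(fprime(0.0), fprime(1.0))
```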

Newton's method is a simple iterative algorithm for approximating a root of a function near a starting point. It can be defined as:

\(\mathrm{x}_{0}=initial\ guess\)

\(\mathrm{x}_{n+1}=\mathrm{x}_{n} - \cfrac{f(\mathrm{x}_{n})}{f'(\mathrm{x}_{n})}\)

for step \(n\). The basic premise of the algorithm is to draw the tangent line (the linear approximation of the function) at the point \(\mathrm{x}_{n}\) and take the root of that tangent line as the next estimate, \(\mathrm{x}_{n+1}\).
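To make the update rule concrete, here is a short sketch (my own) that runs a few Newton steps on the equation \(x - \cos x = 0\) from earlier, starting from the arbitrary guess \(x_0 = 1\):

```python
import math

g  = lambda x: x - math.cos(x)    # the function whose root we want
dg = lambda x: 1 + math.sin(x)    # its derivative

x = 1.0                           # x_0, the initial guess
for n in range(5):
    x = x - g(x) / dg(x)          # x_{n+1} = x_n - g(x_n) / g'(x_n)
    print(n + 1, x)
# the iterates settle on x ≈ 0.7390851332... after just a few steps
```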

We can watch the process happen step-by-step for functions that we know the roots for. We can see the point move closer and closer to zero as it "rolls" down the parabola.

An important note here is that Newton's method approximates a root rather than finding it exactly. You can experiment with the example and see that no matter how close to zero we get, we never quite reach it; there is always some amount of error.
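Assuming the parabola in the example is something like \(f(x) = x^2\) (my stand-in, since the interactive isn't reproduced here), this behaviour is easy to see in code: the root at \(0\) is a double root, so each Newton step simply halves the current value, and the iterates shrink toward zero without ever reaching it.

```python
f  = lambda x: x * x      # stand-in for the parabola in the example
df = lambda x: 2 * x

x = 1.0
for n in range(10):
    x = x - f(x) / df(x)  # for x^2 this reduces to x / 2
    print(n + 1, x)
# the values keep shrinking toward 0 but never actually equal it
```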

It's also important to note that the effectiveness of the approximation increases the closer you start to the real zero. Try pressing play on \(x_0\) and watch the other values. The better your initial guess is, the more accurate your result is after \(n\) iterations.
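One way to see this numerically (again with \(x - \cos x\) as the target): run the same small number of Newton steps from several starting guesses and compare the leftover residual \(|x_n - \cos x_n|\). Guesses near the true root leave a far smaller residual than guesses far from it.

```python
import math

g  = lambda x: x - math.cos(x)
dg = lambda x: 1 + math.sin(x)

for x0 in (0.75, 1.0, 3.0, 10.0):   # initial guesses of varying quality
    x = x0
    for _ in range(3):              # same fixed iteration budget for each
        x = x - g(x) / dg(x)
    print(x0, abs(g(x)))            # residual after 3 steps; smaller is better
```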

Now we can watch Newton's method work on a function that we can't explicitly find the root for.

We can see that depending on the starting value, sometimes \(x_5\) displays as finding an actual zero. This is a kind of floating point error: we know that \(x\) must be irrational (and thus not exactly representable in finite precision), which follows from a simple proof by contradiction:

Proof that the value \(x\) in \(x = \cos(x)\) is irrational:
- Assume there is some rational \(r\) such that \(r = \cos(r)\)
- \(r\) cannot be \(0\), since \(\cos(0) = 1 \neq 0\)
- for any nonzero rational \(r\), \(\cos(r)\) is transcendental (a consequence of the Lindemann–Weierstrass theorem), and therefore irrational
- so \(\cos(r)\) is irrational while \(r\) is rational, contradicting \(r = \cos(r)\)
- since no rational \(r\) can satisfy \(r = \cos(r)\), \(x\) must be irrational
- Q.E.D.

Because \(x\) is an irrational number of no special significance (unlike \(\pi\), \(e\), and other important irrationals, which your computer stores to high precision), Desmos doesn't catch the floating point error and calls it a zero. Thus, we have \(0.739085133215\) as a root that is accurate to 12 significant digits.
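If you want to see just how small the leftover error is, plug the 12-digit value back into \(x - \cos x\):

```python
import math

x = 0.739085133215       # the 12-significant-digit value reported above
print(x - math.cos(x))   # a tiny nonzero residual, on the order of 1e-13
```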

We can then apply this process to the original function posed, \(f\left(x\right)=\cfrac{x^{2}}{2}-\sin x\). Since we have 'found' the root of \(f'\left(x\right)=x-\cos x\), we have 'found' a critical point, and in turn the minimum, of \(f\left(x\right)\). Here, I inputted a value I computed that is slightly more accurate than what we had computed before.
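To double-check that this critical point really is a minimum, we can use the second-derivative test: \(f''(x) = 1 + \sin x\), which is positive at \(x \approx 0.739\), so \(f\) curves upward there. A quick sketch using the value found above:

```python
import math

x_star = 0.739085133215                  # the critical point found above
f   = lambda x: x**2 / 2 - math.sin(x)
d2f = lambda x: 1 + math.sin(x)          # f''(x)

print(d2f(x_star))   # ≈ 1.67 > 0, so f curves upward here: a local minimum
print(f(x_star))     # the minimum value of f, roughly -0.4005
```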

Now we can watch Newton's method work on a function that has multiple local minima, as well as a local maximum.

Pressing the play button on \(x_{0}\) shows that Newton's algorithm will approach different points depending on the input. Newton's method finds one of the zeros of \(f'(x)\), which lie at \(x = -\frac{\sqrt{2}}{2}, 0, \frac{\sqrt{2}}{2}\); which zero it converges to depends on which basin of attraction the starting value falls in. This localization is an important thing to keep in mind.
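The interactive graph isn't reproduced here, but one function with this shape is \(f(x) = \frac{x^4}{4} - \frac{x^2}{4}\) (my stand-in, not necessarily the one used above): its derivative \(f'(x) = x^3 - \frac{x}{2}\) has zeros at \(-\frac{\sqrt{2}}{2}\), \(0\), and \(\frac{\sqrt{2}}{2}\), with local minima at \(\pm\frac{\sqrt{2}}{2}\) and a local maximum at \(0\). Running Newton's method on \(f'\) from a few starting points shows the basin-of-attraction effect:

```python
df  = lambda x: x**3 - x / 2     # f'(x) for the stand-in f(x) = x^4/4 - x^2/4
d2f = lambda x: 3 * x**2 - 0.5   # f''(x), the slope used in the Newton step

for x0 in (-1.0, -0.5, 0.2, 0.5, 1.0):   # different starting guesses
    x = x0
    for _ in range(50):
        x = x - df(x) / d2f(x)           # Newton on f': searching for critical points
    print(x0, round(x, 6))               # the critical point reached depends on x0
```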

Another useful property of these kinds of algorithms is that they extend to multivariable functions; you can use these methods in any dimension. That idea is a little beyond the scope of what I want to cover here, but with the proper background in multivariable calculus and a touch of linear algebra (just like the basic understanding of single-variable calculus needed here), it comes together really intuitively.

Here's a collection of similar methods and their definitions. I've included simple Python scripts that perform the methods. Experiment by trying out different functions and variables.

Bisection

\(a_0, b_0 = initial\ endpoints,\ chosen\ so\ that\ sign(f(a_0)) \neq sign(f(b_0))\)

\(x_n = \cfrac{a_n+b_n}{2}\)

\(if\ sign(f(a_n)) = sign(f(x_n)):\)
\(a_{n+1} = x_n\)
\(b_{n+1} = b_n\)

\(if\ sign(f(b_n)) = sign(f(x_n)):\)
\(a_{n+1} = a_n\)
\(b_{n+1} = x_n\)

bisection.py
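A minimal version of bisection.py might look something like the following; the bracketing interval \([0, 1]\), the tolerance, and the test equation \(x - \cos x = 0\) are just illustrative choices you can swap out.

```python
import math

def bisection(f, a, b, tol=1e-12, max_iter=200):
    """Bisection: repeatedly halve a bracket [a, b] that contains a root."""
    if math.copysign(1.0, f(a)) == math.copysign(1.0, f(b)):
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        x = (a + b) / 2                       # x_n, the midpoint of the bracket
        if f(x) == 0 or (b - a) / 2 < tol:    # hit a root or bracket is small enough
            return x
        if math.copysign(1.0, f(a)) == math.copysign(1.0, f(x)):
            a = x                             # sign(f(a_n)) == sign(f(x_n)): keep [x_n, b_n]
        else:
            b = x                             # otherwise keep [a_n, x_n]
    return (a + b) / 2

# example: the critical-point equation from earlier, x - cos(x) = 0 on [0, 1]
print(bisection(lambda x: x - math.cos(x), 0.0, 1.0))
```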

Newton

\(\mathrm{x}_{0}=initial\ guess\)

\(\mathrm{x}_{n+1}=\mathrm{x}_{n} - \cfrac{f(\mathrm{x}_{n})}{f'(\mathrm{x}_{n})}\)

newton.py
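A minimal newton.py sketch along the same lines (again using \(x - \cos x = 0\) as the example equation):

```python
import math

def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method: x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)    # the Newton correction
        x -= step
        if abs(step) < tol:    # stop once the correction is negligible
            return x
    return x

# example: x - cos(x) = 0 with derivative 1 + sin(x), starting from x_0 = 1
print(newton(lambda x: x - math.cos(x), lambda x: 1 + math.sin(x), 1.0))
```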

Simple Newton

\(\mathrm{x}_{0}=initial\ guess\)

\(\mathrm{x}_{n+1}=\mathrm{x}_{n} - \cfrac{f(\mathrm{x}_{n})}{f'(\mathrm{x}_{0})}\)

simple_newton.py
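A simple_newton.py sketch; the only change from the Newton version is that the derivative is evaluated once, at the initial guess, and reused for every step:

```python
import math

def simple_newton(f, df, x0, tol=1e-12, max_iter=500):
    """Simplified Newton: reuse the slope f'(x_0) for every step."""
    slope = df(x0)             # derivative evaluated only once
    x = x0
    for _ in range(max_iter):
        step = f(x) / slope    # x_{n+1} = x_n - f(x_n) / f'(x_0)
        x -= step
        if abs(step) < tol:
            return x
    return x

# converges more slowly than full Newton, but needs only one derivative evaluation
print(simple_newton(lambda x: x - math.cos(x), lambda x: 1 + math.sin(x), 1.0))
```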

Secant

\(\mathrm{x}_{0} = initial\ guess\)

\(\mathrm{x}_{1} = \mathrm{x}_0 - \cfrac{f(\mathrm{x}_{0})}{f'(\mathrm{x}_{0})}\)

\(\mathrm{x}_{n+1} = \mathrm{x}_{n} - \cfrac{f(\mathrm{x}_{n})}{\mathrm{a}_{n}}\)

\(where\ \mathrm{a}_{n} = \cfrac{f(\mathrm{x}_{n})-f(\mathrm{x}_{n-1})}{\mathrm{x}_{n}-\mathrm{x}_{n-1}}\)

secant.py
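A secant.py sketch that follows the definition above, seeding \(x_1\) with a single Newton step and then switching to the secant slope \(a_n\):

```python
import math

def secant(f, df, x0, tol=1e-12, max_iter=50):
    """Secant method, seeded with one Newton step as in the definition above."""
    x_prev = x0
    x = x0 - f(x0) / df(x0)                          # x_1 from a single Newton step
    for _ in range(max_iter):
        if x == x_prev:                              # already converged exactly
            return x
        a_n = (f(x) - f(x_prev)) / (x - x_prev)      # the secant slope a_n
        x_prev, x = x, x - f(x) / a_n                # x_{n+1} = x_n - f(x_n) / a_n
        if abs(x - x_prev) < tol:
            return x
    return x

# example: x - cos(x) = 0, starting from x_0 = 1
print(secant(lambda x: x - math.cos(x), lambda x: 1 + math.sin(x), 1.0))
```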