Lecture No. 3
Neural networks are a cornerstone of modern machine learning, but their foundations are surprisingly simple. This lecture walks through how they work, how they learn, and how to understand their structure step-by-step.
01
Consider a classification task: identifying fruit from three features, \(x_1\) = color, \(x_2\) = weight, and \(x_3\) = sugar content. We design a network with three inputs, a hidden layer of two neurons (\(h_1, h_2\)), and two outputs (\(o_1, o_2\)).
We use the sigmoid function \(\sigma(z)\) as our activation throughout.
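For reference, the standard logistic form of the sigmoid is written out here. It maps any real \(z\) into the interval \((0, 1)\), which is why each activation can be read as a soft "off/on" value.

\[
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma(0) = \tfrac{1}{2}, \qquad \lim_{z \to \infty} \sigma(z) = 1, \qquad \lim_{z \to -\infty} \sigma(z) = 0
\]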
02
Each neuron computes a weighted sum of its inputs plus a bias, passed through the activation function. The hidden layer produces values \(h_1, h_2\), which are then combined to produce the outputs \(o_1, o_2\).
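Written out per neuron, the description above might look as follows. The index notation is an assumption made for illustration: \(w^{(1)}_{ji}\) is the weight from input \(x_i\) to hidden neuron \(j\), \(w^{(2)}_{kj}\) is the weight from hidden neuron \(j\) to output \(k\), and \(b^{(1)}_j\), \(b^{(2)}_k\) are the corresponding biases.

\[
h_j = \sigma\!\left(\sum_{i=1}^{3} w^{(1)}_{ji}\, x_i + b^{(1)}_j\right), \quad j = 1, 2
\qquad\qquad
o_k = \sigma\!\left(\sum_{j=1}^{2} w^{(2)}_{kj}\, h_j + b^{(2)}_k\right), \quad k = 1, 2
\]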
03
Try it out with all weights and biases set to 1 and inputs \([1, 0, 1]\): run a full forward pass and work out the hidden and output activations.
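The same forward pass can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the 3-input, 2-hidden, 2-output layout described earlier; the names `layer`, `W1`, `b1`, `W2`, and `b2` are hypothetical and chosen only for this example.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, the activation used throughout the lecture."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

x = [1, 0, 1]                  # color, weight, sugar content
W1 = [[1, 1, 1], [1, 1, 1]]    # hidden layer: 2 neurons x 3 inputs (all ones)
b1 = [1, 1]
W2 = [[1, 1], [1, 1]]          # output layer: 2 neurons x 2 hidden values
b2 = [1, 1]

h = layer(x, W1, b1)           # hidden activations h_1, h_2
o = layer(h, W2, b2)           # output activations o_1, o_2

print("hidden:", [round(v, 4) for v in h])
print("output:", [round(v, 4) for v in o])
```

With every weight and bias equal to 1, each hidden neuron sees \(z = 1 + 0 + 1 + 1 = 3\), so \(h_1 = h_2 = \sigma(3) \approx 0.9526\), and each output sees \(z = 2h_1 + 1\), giving \(o_1 = o_2 \approx 0.9481\).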
04
Suppose our target output is \([1, 0]\). We define a loss function \(L\) and update weights via gradient descent. The derivative of loss with respect to each weight follows the chain rule back through the network, where \(\sigma'(z) = \sigma(z)(1 - \sigma(z))\).
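The chain rule can be traced in code as well. The sketch below assumes a squared-error loss \(L = \tfrac{1}{2}\sum_k (o_k - t_k)^2\), which the lecture does not specify, and a hypothetical learning rate of 0.5; the function and variable names are illustrative only. It compares one backpropagated gradient against a finite-difference estimate and then takes a single gradient-descent step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    o = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b) for row, b in zip(W2, b2)]
    return h, o

def loss(o, t):
    # Assumed loss (not from the lecture): half the summed squared error.
    return 0.5 * sum((ok - tk) ** 2 for ok, tk in zip(o, t))

def backward(x, t, W1, b1, W2, b2):
    h, o = forward(x, W1, b1, W2, b2)
    # Output layer: dL/dz_k = (o_k - t_k) * sigma'(z_k), where sigma' = o * (1 - o).
    d_out = [(ok - tk) * ok * (1 - ok) for ok, tk in zip(o, t)]
    # Hidden layer: carry d_out back through W2, then through sigma' = h * (1 - h).
    d_hid = [
        sum(d_out[k] * W2[k][j] for k in range(len(d_out))) * h[j] * (1 - h[j])
        for j in range(len(h))
    ]
    gW2 = [[dk * hj for hj in h] for dk in d_out]   # dL/dW2[k][j]
    gW1 = [[dj * xi for xi in x] for dj in d_hid]   # dL/dW1[j][i]
    return gW1, d_hid, gW2, d_out                   # bias gradients equal the deltas

def step(M, G, lr):
    """Gradient-descent update for a matrix stored as a list of rows."""
    return [[m - lr * g for m, g in zip(mr, gr)] for mr, gr in zip(M, G)]

x, t = [1, 0, 1], [1, 0]
W1, b1 = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]], [1.0, 1.0]
W2, b2 = [[1.0, 1.0], [1.0, 1.0]], [1.0, 1.0]

gW1, gb1, gW2, gb2 = backward(x, t, W1, b1, W2, b2)

# Finite-difference check on a single weight, W1[0][0].
eps = 1e-6
W1_hi = [row[:] for row in W1]
W1_lo = [row[:] for row in W1]
W1_hi[0][0] += eps
W1_lo[0][0] -= eps
numeric = (loss(forward(x, W1_hi, b1, W2, b2)[1], t)
           - loss(forward(x, W1_lo, b1, W2, b2)[1], t)) / (2 * eps)

# One gradient-descent step with a hypothetical learning rate.
lr = 0.5
loss_before = loss(forward(x, W1, b1, W2, b2)[1], t)
W1n, W2n = step(W1, gW1, lr), step(W2, gW2, lr)
b1n = [b - lr * g for b, g in zip(b1, gb1)]
b2n = [b - lr * g for b, g in zip(b2, gb2)]
loss_after = loss(forward(x, W1n, b1n, W2n, b2n)[1], t)

print("backprop vs numeric:", gW1[0][0], numeric)
print("loss before/after:", loss_before, loss_after)
```

The two gradient values should agree to several decimal places, and the loss should drop slightly after the update. The drop is small because both outputs start near 0.95, where \(\sigma'\) is close to zero.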
05
Instead of writing each term by hand, we group computations into matrices. This notation generalizes immediately to any number of layers and neurons — training uses matrix calculus to update \(\mathbf{W}\) and \(\mathbf{b}\) in batch.
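In matrix form, the forward pass reads \(\mathbf{h} = \sigma(\mathbf{W}_1 \mathbf{x} + \mathbf{b}_1)\) and \(\mathbf{o} = \sigma(\mathbf{W}_2 \mathbf{h} + \mathbf{b}_2)\), where the subscripts on \(\mathbf{W}\) and \(\mathbf{b}\) are notation assumed here. The sketch below shows how that generalizes to a list of layers and a batch of inputs. It uses plain Python lists rather than a numerical library so that it stays self-contained; the helper names are hypothetical, and the second input row is made up to show batching.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def add_bias(Z, b):
    """Add the bias vector b to every row of Z."""
    return [[z + bi for z, bi in zip(row, b)] for row in Z]

def sigmoid_all(Z):
    """Apply the sigmoid elementwise."""
    return [[1.0 / (1.0 + math.exp(-z)) for z in row] for row in Z]

def forward(X, layers):
    """X holds one example per row; layers is a list of (W, b) pairs."""
    A = X
    for W, b in layers:
        # Row-wise version of sigma(W a + b), applied to every example at once.
        A = sigmoid_all(add_bias(matmul(A, transpose(W)), b))
    return A

layers = [
    ([[1, 1, 1], [1, 1, 1]], [1, 1]),   # W1 (2 x 3), b1
    ([[1, 1], [1, 1]], [1, 1]),         # W2 (2 x 2), b2
]
X = [
    [1, 0, 1],   # the lecture's example input
    [0, 0, 0],   # a made-up second example, only to demonstrate batching
]

O = forward(X, layers)
for row in O:
    print([round(v, 4) for v in row])
```

Adding a layer is then a matter of appending another `(W, b)` pair to `layers`; the loop body does not change.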
Summary
Neural networks are sequences of matrix multiplications and non-linear activations. Training is the process of reducing error via gradient descent and backpropagation. Though the name sounds complicated, they're layered functions composed of simple pieces — and building one by hand is the clearest way to see that.