Lecture No. 3

Intro to Neural Networks

Neural networks are a cornerstone of modern machine learning, but their foundations are surprisingly simple. This lecture walks through how they work, how they learn, and how to understand their structure step-by-step.

\[ \sigma(z) = \frac{1}{1 + e^{-z}} \]
inputs → hidden layer → outputs

01

Consider a classification task: identify fruit from three features — \(x_1\) = color, \(x_2\) = weight, \(x_3\) = sugar content. We design a network with:

  • 3 input nodes
  • 1 hidden layer with 2 neurons
  • 2 output nodes for two possible fruit types

We use the sigmoid function \(\sigma(z)\) as our activation throughout.
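As a small illustration, the sigmoid can be sketched in a few lines of Python. The split on the sign of \(z\) is an implementation choice made here to avoid floating-point overflow for large negative inputs; it is not part of the lecture's definition, and both branches compute the same function.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid, 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, e^(-z) can overflow, so use the equivalent e^z / (1 + e^z).
    e = math.exp(z)
    return e / (1.0 + e)

# The output always lies between 0 and 1, and sigmoid(0) is exactly 0.5.
for z in (-5.0, 0.0, 5.0):
    print(z, round(sigmoid(z), 4))
```

Because the output is squashed into the range 0 to 1, each neuron's activation can be read loosely as "how strongly this unit is on".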

\[ h_1 = \sigma(w_{11}x_1 + w_{12}x_2 + w_{13}x_3 + b_1) \]
\[ h_2 = \sigma(w_{21}x_1 + w_{22}x_2 + w_{23}x_3 + b_2) \]
\[ o_1 = \sigma(w_{31}h_1 + w_{32}h_2 + b_3) \]
\[ o_2 = \sigma(w_{41}h_1 + w_{42}h_2 + b_4) \]

02

Each neuron computes a weighted sum of its inputs plus a bias and passes the result through the activation function. The hidden layer produces the values \(h_1, h_2\), which are then combined in the same way to produce the outputs \(o_1, o_2\).
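The single-neuron computation described here can be sketched as one short function. The feature values and weights below are hypothetical, chosen only to make the example concrete, and `neuron` is an illustrative name rather than anything defined in the lecture.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(weights, inputs, bias):
    """Weighted sum of inputs plus bias, passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical feature values and weights, for illustration only.
x = [0.8, 0.3, 0.5]               # x1 = color, x2 = weight, x3 = sugar content
w_h1, b_1 = [0.2, -0.4, 0.6], 0.1

h_1 = neuron(w_h1, x, b_1)        # sigma(w11*x1 + w12*x2 + w13*x3 + b1)
print(h_1)
```

Every hidden and output neuron in the network is a call to the same function with its own weights and bias.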

03

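Try it out with all weights and biases set to 1 and inputs \([1, 0, 1]\). Each hidden neuron sees the pre-activation \(1 + 0 + 1 + 1 = 3\), so \(h_1 = h_2 = \sigma(3) \approx 0.953\). Each output neuron then sees \(h_1 + h_2 + 1 \approx 2.905\), so \(o_1 = o_2 = \sigma(2.905) \approx 0.948\).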
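The same forward pass can be reproduced in a few lines of Python. This is a minimal sketch: the helper `layer` and the variable names are illustrative, and the second output neuron is assumed to be wired like the first, with its own weights and bias also set to 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(weight_rows, inputs, biases):
    """One sigmoid neuron per row of weights."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weight_rows, biases)
    ]

x = [1.0, 0.0, 1.0]
W_hidden = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]   # 2 hidden neurons, 3 inputs each
b_hidden = [1.0, 1.0]
W_output = [[1.0, 1.0], [1.0, 1.0]]             # 2 output neurons, 2 hidden inputs each
b_output = [1.0, 1.0]

h = layer(W_hidden, x, b_hidden)   # each pre-activation is 1 + 0 + 1 + 1 = 3
o = layer(W_output, h, b_output)

print("hidden:", [round(v, 4) for v in h])   # hidden: [0.9526, 0.9526]
print("output:", [round(v, 4) for v in o])   # output: [0.9481, 0.9481]
```

With identical weights everywhere, the two hidden neurons are indistinguishable, and so are the two outputs. This is one reason real networks start from random rather than constant weights.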

\[ L = \frac{1}{2}\left[(o_1 - 1)^2 + (o_2 - 0)^2\right] \]
\[ \frac{\partial L}{\partial w_{31}} = (o_1 - y_1) \cdot \sigma'(z_{o_1}) \cdot h_1 \]

04

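Suppose our target output is \(\mathbf{y} = [y_1, y_2] = [1, 0]\). We define a squared-error loss \(L\) and update each weight by gradient descent, stepping it against its gradient: \(w \leftarrow w - \eta\,\partial L / \partial w\), where \(\eta\) is a small learning rate. The derivative of the loss with respect to each weight follows the chain rule back through the network. For \(w_{31}\), writing \(z_{o_1} = w_{31}h_1 + w_{32}h_2 + b_3\) for the pre-activation of \(o_1\), the three factors are \(\partial L/\partial o_1 = o_1 - y_1\), \(\partial o_1/\partial z_{o_1} = \sigma'(z_{o_1})\), and \(\partial z_{o_1}/\partial w_{31} = h_1\), where \(\sigma'(z) = \sigma(z)(1 - \sigma(z))\).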
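To make the chain-rule gradient for \(w_{31}\) concrete, the sketch below evaluates it on the all-ones example and compares it against a finite-difference estimate. The learning rate `eta = 0.5`, the step `eps`, and the helper names are assumptions made for illustration; the lecture does not specify them.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w31, w32, b3, h1, h2):
    """Output o_1 and its pre-activation z_o1 for fixed hidden activations."""
    z_o1 = w31 * h1 + w32 * h2 + b3
    return sigmoid(z_o1), z_o1

# Hidden activations from the all-ones example: h1 = h2 = sigmoid(3).
h1 = h2 = sigmoid(3.0)
w31, w32, b3 = 1.0, 1.0, 1.0
y1 = 1.0                      # target for o_1
eta = 0.5                     # learning rate (assumed value)

def loss_o1(w):
    """The o_1 term of L; the o_2 term does not depend on w31."""
    o1, _ = forward(w, w32, b3, h1, h2)
    return 0.5 * (o1 - y1) ** 2

o1, z_o1 = forward(w31, w32, b3, h1, h2)
sigma_prime = o1 * (1.0 - o1)             # sigma'(z) = sigma(z)(1 - sigma(z))
grad_w31 = (o1 - y1) * sigma_prime * h1   # chain rule

# Central finite difference as an independent check of the analytic gradient.
eps = 1e-6
numeric = (loss_o1(w31 + eps) - loss_o1(w31 - eps)) / (2 * eps)

w31_new = w31 - eta * grad_w31            # one gradient descent step
print(grad_w31, numeric, w31_new)
```

The gradient is negative because \(o_1\) is below its target of 1, so the update nudges \(w_{31}\) upward and the loss term for \(o_1\) shrinks slightly.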

\[ \mathbf{h} = \sigma(\mathbf{W}^{(1)}\mathbf{x} + \mathbf{b}^{(1)}) \]
\[ \mathbf{o} = \sigma(\mathbf{W}^{(2)}\mathbf{h} + \mathbf{b}^{(2)}) \]

05

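Instead of writing each term by hand, we group the computations into matrices. Here \(\mathbf{W}^{(1)}\) is the \(2 \times 3\) matrix of hidden-layer weights, \(\mathbf{W}^{(2)}\) is the \(2 \times 2\) matrix of output-layer weights, and \(\sigma\) is applied element-wise. This notation generalizes immediately to any number of layers and neurons, and training applies the same chain rule in matrix form to update \(\mathbf{W}\) and \(\mathbf{b}\) over a batch of examples.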
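The matrix form can be sketched in plain Python by storing each matrix as a list of rows. The helpers `matvec`, `dense`, and `forward` are illustrative names, and the all-ones weights are reused from the earlier exercise. In practice an array library would replace the hand-written product; it is written out here only to keep the example dependency-free.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def dense(W, b, v):
    """sigma(W v + b), with the sigmoid applied element-wise."""
    return [sigmoid(z + b_i) for z, b_i in zip(matvec(W, v), b)]

def forward(layers, x):
    """Apply each (W, b) pair in turn; works for any number of layers."""
    a = x
    for W, b in layers:
        a = dense(W, b, a)
    return a

layers = [
    ([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]], [1.0, 1.0]),  # W1 (2x3), b1
    ([[1.0, 1.0], [1.0, 1.0]], [1.0, 1.0]),            # W2 (2x2), b2
]
o = forward(layers, [1.0, 0.0, 1.0])
print(o)
```

Adding a layer only means appending another `(W, b)` pair to `layers`, which is the sense in which the matrix notation generalizes.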

Neural networks are sequences of matrix multiplications and non-linear activations. Training is the process of reducing error via gradient descent and backpropagation. Though the name sounds complicated, they're layered functions composed of simple pieces — and building one by hand is the clearest way to see that.