Basic Calculus for AI, part 5
The point of calculus (or most math, frankly) is to come up with cool shortcuts to bypass complicated, repetitive, and error-prone techniques. The idea of slopes and gradients, for example, makes it unnecessary to go through every .. single .. possible .. coordinate of a function to find minimums. Likewise, the chain rule is a shortcut for calculating the derivatives of nested functions, i.e. functions applied to the output of other functions.
A.6 The Chain Rule
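Deep learning (DL) uses the chain rule for backpropagation, the procedure that works out how much each weight in a network contributed to the error so the weights can be adjusted.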
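To make that a bit more concrete, here is a minimal sketch of the idea on the smallest "network" I can think of: a single weight `w`, a prediction `w * x`, and a squared-error loss. Everything in it (the function names, the data point, the learning rate, the number of steps) is made up for illustration and is not from the course. The gradient function uses the chain rule that the rest of this section defines: the derivative of the outer square times the derivative of the inner expression.

```python
def loss(w, x, y):
    """Squared error of a one-weight model: (w*x - y)^2."""
    return (w * x - y) ** 2


def dloss_dw(w, x, y):
    """Chain rule: outer derivative 2*(w*x - y) times inner derivative x."""
    return 2 * (w * x - y) * x


# Hypothetical data point and settings, chosen only for illustration.
w, x, y = 0.0, 2.0, 6.0
learning_rate = 0.1

# Repeatedly step the weight against the gradient (plain gradient descent).
for _ in range(20):
    w -= learning_rate * dloss_dw(w, x, y)

print(w, loss(w, x, y))
```

With these numbers the weight should settle very close to 3, since that is the value where `w * x` equals `y` and the loss hits zero. A real network does the same thing with many more nested functions, which is exactly where the chain rule earns its keep.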
The formal definition is:
If \(g\) is differentiable at \(x\) and \(f\) is differentiable at \(g(x)\), then the composite function \(F = f \circ g\) defined by \(F(x) = f(g(x))\) is differentiable at \(x\) and \(F’\) is given by the product \(F’(x) = f’(g(x))g’(x)\).
That’s straight from the course slide, but it hurts the brain. What does it mean?
- Differentiable: the derivative exists at the given input value.
- If \(f\) is differentiable at \(g(x)\)…: if you feed the result of \(g(x)\) at a given point into \(f\), then \(f\) still has a derivative at that value.
- Composite function \(F = f \circ g\): the function you get when you stuff the results of \(g\) into \(f\).
- \(F’(x) = f’(g(x))g’(x)\): let’s do an example:
A.6.1 Example
$$ F(x) = (x^3 - 1)^2 $$
Ooh look. An exponent that works on a bracketed expression. That screams that a function is being passed to another function.
- The inner function is \(g(x) = x^3 - 1\), i.e. the expression inside the brackets.
- The outer function is \(f(u) = u^2\), i.e. the squaring applied outside the brackets, where \(u\) stands in for whatever \(g(x)\) produces.
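The outer function's derivative is \(2u\) (evaluated at \(u = x^3 - 1\)) and the inner function's derivative is \(3x^2\), so the derivative is:
$$\begin{aligned} f’(g(x))g’(x) &= 2(x^3 - 1) \cdot 3x^2 \\ &= 6x^2(x^3 -1) \end{aligned}$$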
Just to show it works, let’s do the derivative the hard way, without the chain rule:
$$\begin{aligned} F(x) &= (x^3 - 1)^2 \\ &= (x^3-1)(x^3-1) \\ &= x^6 - x^3 - x^3 + 1 \\ &= x^6 - 2x^3 + 1 \\ F’(x) &= 6x^5-6x^2 \\ &= 6x^2(x^3-1) \end{aligned}$$
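If you don't trust the algebra, a quick numerical sanity check is possible too. The sketch below is my own addition, not from the course: it compares the chain rule result and the expanded-polynomial result against a central-difference estimate of the slope. The function names and the step size `h` are arbitrary choices.

```python
def F(x):
    """The composite function F(x) = (x^3 - 1)^2."""
    return (x**3 - 1) ** 2


def dF_chain(x):
    """Chain rule result: f'(g(x)) * g'(x) = 2(x^3 - 1) * 3x^2."""
    return 2 * (x**3 - 1) * 3 * x**2


def dF_expanded(x):
    """Derivative of the expanded polynomial x^6 - 2x^3 + 1."""
    return 6 * x**5 - 6 * x**2


def dF_numeric(x, h=1e-6):
    """Central-difference estimate of the slope; h is an assumed step size."""
    return (F(x + h) - F(x - h)) / (2 * h)


# Print all three side by side at a few arbitrary sample points.
for x in (-2.0, 0.5, 1.0, 3.0):
    print(x, dF_chain(x), dF_expanded(x), dF_numeric(x))
```

The first two columns of derivatives should agree, and the numerical estimate should land very close to them, with the small difference coming from the finite step size and floating-point rounding.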
This ends differential calculus. I won’t do integral calculus unless I find it really is necessary to understand area under the curve.