A Linear Algebra One-Pager
MIT Press has made the Deep Learning textbook available online for free. The authors, Goodfellow, Bengio, and Courville, devote a chapter to reviewing linear algebra, which is vital to understanding deep learning. This is my one-pager on what one needs to know.
A. Understand These Symbols
Before tackling the textbook, understand the following symbols.
Symbol | Meaning | Notes |
---|---|---|
\(\{\}\) | Set | A set is a collection of different (i.e. unique) things. |
\(\in\) | Element of, i.e. “belongs to” | \(5\in\{1, 2, 3, 5, 7, 11\}\) |
\(\mathbb{R}\) | The set of real numbers | Basically, any number with or without decimals. |
\(\mathbb{Z}\) | The set of integers | \(\{…, -2, -1, 0, 1, 2, …\}\)
\(\lVert x \rVert\) | The norm of x | See section 2.5 (Norms) of the textbook. |
\(\forall{x}\) | For all x | Every value representable by x |
\(A=\{x:…\}\) | Set builder notation | \(A=\{x: x \in \mathbb{Z}, x > 0\}\) means that A is the set of all integers greater than 0, i.e. \(\{1, 2, 3, …\}\). |
A programmer’s note on the latter: Python list comprehensions sure look a lot like set builder notation. Replace “:” with for, the comma with if, and “\(\in\)” with in, and represent \(\mathbb{Z}\) with a finite list []:
# Representing A = {x : x ∈ Z, x > 0} in Python
A = [x for x in [-3, -2, -1, 0, 1, 2, 3, 4, 5] if x > 0]
print(A)
[1, 2, 3, 4, 5]
B. Scalars, Vectors, Matrices, Tensors
Scalar
A scalar is a single number. Its variable is written as a lowercase letter:
$$ s=5 $$
Vector
A vector is a 1-D array of numbers. Its variable is written as a lowercase letter in bold:
$$ \mathbf{s}=[1, 2, 3, 5, 7] $$
Matrix
A matrix is a 2d array of numbers. Its variable is a capital letter in bold: $$ \mathbf{S}= \begin{bmatrix} 3.2 & 5.7 & 7.3 \\ 6.4 & 1.2 & -0.2 \end{bmatrix} $$
The matrix below has height 2 and width 3, i.e., two rows and three columns. If all its elements belong to the set of real numbers, then we could write the following:
$$ \begin{align} \mathbf{S} &\in \mathbb{R}^{2\times 3} \\ \mathbf{S}&= \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \\ f &= \mathbf{S}_{2,3}\\ \end{align} $$
Cross-sections are sets derived from whole rows or columns.
$$ \begin{align} \mathbf{S}_{2,:} &= [d, e, f] \end{align} $$
$$ \begin{align} \mathbf{S}_{:,3} &= [c, f] \end{align} $$
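A minimal numpy sketch of the same idea (note that Python indexing is 0-based, so \(\mathbf{S}_{2,:}\) becomes S[1, :] and \(\mathbf{S}_{:,3}\) becomes S[:, 2]):
import numpy as np
S = np.array([[3.2, 5.7, 7.3], [6.4, 1.2, -0.2]])
print(S[1, :])   # the second row
print(S[:, 2])   # the third column
[ 6.4  1.2 -0.2]
[ 7.3 -0.2]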
By convention, height is represented by m and width by n. For example, a generic matrix of real numbers would be represented by \(\mathbf{A} \in \mathbb{R}^{m \times n}\).
Square Matrix
A square matrix is a matrix whose height is the same as its width, i.e. \(m = n\).
Tensors
A tensor is effectively an array with more than two axes.
The Transpose of a Matrix
The transpose of a matrix \(\mathbf{S}^T\) is the mirror of the matrix \(\mathbf{S}\) across the main diagonal:
$$ \begin{align} \mathbf{S}&= \begin{bmatrix} \color{red}{a}\ & b & c \\ d & \color{red}{e} & f \end{bmatrix} \\ \mathbf{S}^T&= \begin{bmatrix} \color{red}{a} & d \\ b & \color{red}{e} \\ c & f \end{bmatrix} \\ \end{align} $$
The transpose of a horizontal vector is the same vector in column form. The transpose of a scalar is just the same scalar.
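In numpy, the transpose is available as the .T attribute (a quick sketch):
import numpy as np
S = np.array([[1, 2, 3], [4, 5, 6]])
print(S.T)
[[1 4]
 [2 5]
 [3 6]]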
Basic Matrix Arithmetic
Two matrices may be added or subtracted if they are the same shape.
$$ \begin{bmatrix} 1 & 2 & 3 \\ 5 & 7 & 11 \end{bmatrix} + \begin{bmatrix} 1 & 2 & 3 \\ 5 & 7 & 11 \end{bmatrix} = \begin{bmatrix} 2 & 4 & 6 \\ 10 & 14 & 22 \end{bmatrix} $$
Matrices may be multiplied or divided by a scalar.
$$ 2 \begin{bmatrix} 1 & 2 & 3 \\ 5 & 7 & 11 \end{bmatrix} = \begin{bmatrix} 2 & 4 & 6 \\ 10 & 14 & 22 \end{bmatrix} $$
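Both operations map directly onto numpy's arithmetic operators; a small sketch:
import numpy as np
A = np.array([[1, 2, 3], [5, 7, 11]])
print(A + A)   # element-wise addition; 2 * A gives the same result here
[[ 2  4  6]
 [10 14 22]]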
Broadcasting
Broadcasting allows the addition of a matrix and a vector by adding the vector to each row of the matrix.
$$ \begin{bmatrix} 1 & 2 & 3 \\ 5 & 7 & 11 \end{bmatrix} + \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} = \begin{bmatrix} 2 & 4 & 6 \\ 6 & 9 & 14 \end{bmatrix} $$
In Python’s numpy package, broadcasting works with the same operator as basic addition.
import numpy as np
A = np.array([[1, 2, 3],[4, 5, 6]])
v = np.array([7, 8, 9])
print(A + v)
[[ 8 10 12]
 [11 13 15]]
C. Matrix Multiplication
Matrices can only be multiplied if the number of columns of matrix A is equal to the number of rows of matrix B.
$$ \mathbf{A} \in \mathbb{R}^{m \times n},\ \mathbf{B} \in \mathbb{R}^{n \times p} \implies \mathbf{AB} \in \mathbb{R}^{m \times p} $$
Dot Product of Vectors
The dot product of two vectors with the same dimensionality, written \(\mathbf{x}^T\mathbf{y}\), is \(\sum_{i} x_i y_i\):
$$ \begin{align} \mathbf{x^T} &= \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \\ \mathbf{y} &= \begin{bmatrix} 5 \\ 7 \\ 11 \\ \end{bmatrix} \\ x^T \cdot y &= 1 \times 5 + 2 \times 7 + 3 \times 11 = 52 \end{align} $$
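In numpy, np.dot computes this directly on 1-D arrays (a quick sketch):
import numpy as np
x = np.array([1, 2, 3])
y = np.array([5, 7, 11])
print(np.dot(x, y))   # 1*5 + 2*7 + 3*11
52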
Standard Product of Matrices
In the standard product \(\mathbf{C}=\mathbf{AB}\), each element \(C_{i,j}\) is the dot product of row i of \(\mathbf{A}\) and column j of \(\mathbf{B}\).
$$ \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \\ \end{bmatrix} = \begin{bmatrix} 1 \times 7 + 2 \times 9 + 3 \times 11 & 1 \times 8 + 2 \times 10 + 3 \times 12 \\ 4 \times 7 + 5 \times 9 + 6 \times 11 & 4 \times 8 + 5 \times 10 + 6 \times 12 \end{bmatrix} = \begin{bmatrix} 58 & 64 \\ 139 & 154 \end{bmatrix} $$
In Python, use np.dot (or the @ operator) for the standard product of matrices. The multiplication operator * attempts the element-wise (Hadamard) product instead, which fails when the shapes are not compatible.
import numpy as np
A = np.array([[1, 2, 3],[4, 5, 6]])
B = np.array([[7, 8], [9, 10], [11, 12]])
print(np.dot(A, B))
[[ 58 64]
[139 154]]
Hadamard Product of Matrices
The Hadamard product \(A \odot B\) is completely different from the standard product: it is the element-wise product of two matrices of the same shape, \((A \odot B)_{i,j} = A_{i,j}B_{i,j}\). In Python, use the multiplication operator.
import numpy as np
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[7, 8, 9], [10, 11, 12]])
print(A * B)
[[ 7 16 27]
[40 55 72]]
Properties of Matrix Multiplication
- Distributive: \(A(B + C) = AB + AC\)
- Associative: \(A(BC) = (AB)C\)
- NOT commutative (in general): \(AB \neq BA\)
- The transpose of a product reverses the order: \((AB)^T = B^TA^T\) (see the quick check below)
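A quick numpy sanity check of the last two properties (the matrices here are arbitrary examples):
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(np.array_equal(np.dot(A, B).T, np.dot(B.T, A.T)))   # (AB)^T == B^T A^T
print(np.array_equal(np.dot(A, B), np.dot(B, A)))         # AB == BA? Not in general
True
False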
System of Linear Equations
Up to now, we’ve used the slope-intercept form for linear equations, \(y = mx + b\). For matrices, we use the standard form, \(Ax + By = C\). To solve a system of linear equations (i.e., to find the one point, if any, that satisfies every equation in the group), one can use matrices:
$$ \begin{align} 4x - 3y &= 12 \\ 2x + y &= 6 \\ \begin{bmatrix} 4 & -3 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} &= \begin{bmatrix} 12 \\ 6 \end{bmatrix} \\ \text{Solution: } x &= 3,\ y = 0 \end{align} $$
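numpy can solve the same system with np.linalg.solve (a sketch using the matrix and vector above):
import numpy as np
A = np.array([[4, -3], [2, 1]])
b = np.array([12, 6])
print(np.linalg.solve(A, b))   # [x, y]
[3. 0.]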
D. Identity and Inverse Matrices
Diagonal Matrix
A diagonal matrix has nonzero entries only along its main diagonal; \(diag(v)\) denotes the square diagonal matrix whose diagonal entries are given by the vector v.
Example:
$$ \begin{align} v &= [13, -2.5, 3.1459]\\ diag(v) &= \begin{bmatrix} 13 & 0 & 0\\ 0 & -2.5 & 0\\ 0 & 0 & 3.1459 \end{bmatrix} \end{align} $$
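numpy's np.diag builds the same matrix from a 1-D array (a quick sketch; the exact float formatting of the output may vary):
import numpy as np
v = np.array([13, -2.5, 3.1459])
print(np.diag(v))
[[13.      0.      0.    ]
 [ 0.     -2.5     0.    ]
 [ 0.      0.      3.1459]]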
Identity Matrix
An identity matrix is a matrix that does not change any vector when that vector is multiplied by the matrix. Basically, it is a diagonal matrix with every diagonal element equal to 1. It is denoted by \(I_N\), with the subscript denoting the dimension.
$$ \begin{align} I_1 &= \begin{bmatrix} 1 \end{bmatrix} \\ I_2 &= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \\ I_3 &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \\ \end{align} $$
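In numpy, np.eye(N) builds \(I_N\); multiplying any vector by it returns the vector unchanged (a quick sketch):
import numpy as np
I = np.eye(3)
x = np.array([1, 2, 3])
print(np.dot(I, x))   # x comes back unchanged (as floats)
[1. 2. 3.]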
Inverse Matrix
The inverse of matrix \(A\) is denoted \(A^{-1}\), and
$$ A^{-1}A = I_N $$
Example:
$$ \begin{align} A &= \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix}\\ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} &= A^{-1} \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix}\\ A^{-1} &= \begin{bmatrix} -{5 \over 2} & {3 \over 2} \\ 2 & -1 \end{bmatrix} \end{align} $$
Actually computing the inverse is beyond the scope of this one-pager and computationally expensive, but in a nutshell: find the adjugate (classical adjoint) of the matrix, \(adj(A)\), and its non-zero determinant \(det(A)\), then:
$$ A^{-1} = {adj(A) \over det(A)} $$
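In practice, one lets numpy do the work: np.linalg.inv computes the inverse numerically (a sketch using the matrix A from the example above):
import numpy as np
A = np.array([[2, 3], [4, 5]])
print(np.linalg.inv(A))   # matches [[-5/2, 3/2], [2, -1]]
[[-2.5  1.5]
 [ 2.  -1. ]]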
E. Linear Dependence and Span
Linear Combination
A linear combination is the sum of a series of vectors, each multiplied by a corresponding scalar. For example, \(2\vec{v} + 3\vec{w}\) is a linear combination, and so is \(5\vec{u} + \vec{v} - 3\vec{w}\). The vectors can have one, two, three, or more dimensions; e.g., \(\vec{v}\) could be represented by <3, 5, 7, -8> in a 4-dimensional space.
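In numpy, a linear combination is just scalar multiplication plus addition. A tiny sketch, where v is the 4-dimensional vector above and w is an arbitrary vector made up for illustration:
import numpy as np
v = np.array([3, 5, 7, -8])
w = np.array([1, 0, 2, 4])
print(2 * v + 3 * w)   # the linear combination 2v + 3w
[ 9 10 20 -4]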
Span of a Set of Vectors
The span of a set of vectors is the set of all points obtainable by linear combinations of the original vectors. The span of, say, two vectors in 2D space will do one of three things:
- Be stuck at the origin (both vectors are zero vectors)
- Form a line (the vectors point in the same direction, i.e. are linearly dependent)
- Reach all points in \(\mathbb{R}^2\) (the vectors are linearly independent)
Linear Dependence
Linear dependence means that one or more vectors in a set is a linear combination of the others. Conversely, linear independence means no vector in the set is a linear combination of the others.
Say \(\vec{v} = <2, 3>\) and \(\vec{w} = <3, 4.5>\).
\(\vec{v}\) and \(\vec{w}\) are linearly dependent because they are linear combinations of each other: \(\vec{w} = {3 \over 2}{\vec{v}}\) and \(\vec{v} = {2 \over 3}{\vec{w}} \).
Singular Matrix
A singular matrix is a square matrix with linearly dependent columns.
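Stacking \(\vec{v}\) and \(\vec{w}\) from the example above as the columns of a matrix, numpy confirms the dependence: np.linalg.matrix_rank reports rank 1 rather than 2, so the matrix is singular (a quick sketch):
import numpy as np
M = np.array([[2, 3],
              [3, 4.5]])   # columns are v = <2, 3> and w = <3, 4.5>
print(np.linalg.matrix_rank(M))   # rank 1 < 2: linearly dependent columns
1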
Checking for Solutions to a Set of Linear Equations
A system of linear equations \(\mathbf{Ax}=\mathbf{b}\), with \(\mathbf{A} \in \mathbb{R}^{m \times n}\) and \(\mathbf{b} \in \mathbb{R}^m\), has exactly one solution for every value of \(\mathbf{b}\) when all of the following are met:
- A's column space must be all of \(\mathbb{R}^m\), so that a solution exists for every possible \(\mathbf{b}\)
- That requires A to have at least m columns, i.e. \(n \geq m\)
- The columns must not be redundant, i.e. linearly dependent; a redundant column adds nothing to the column space
- For the solution to be unique, A must have exactly m linearly independent columns, i.e. \(n = m\)
Essentially, one wants a square, nonsingular matrix with height m (the same height as \(\mathbf{b}\)) to get a unique solution.
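A quick numpy check of these conditions (a sketch using np.linalg.matrix_rank; the second matrix is an arbitrary singular example):
import numpy as np
# Square with linearly independent columns: a unique solution exists for any b
A = np.array([[4, -3], [2, 1]])
print(A.shape[0] == A.shape[1] and np.linalg.matrix_rank(A) == A.shape[0])
# Square but singular (second column = 1.5 x the first): no unique solution
S = np.array([[2, 3], [4, 6]])
print(S.shape[0] == S.shape[1] and np.linalg.matrix_rank(S) == S.shape[0])
True
False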