A Linear Arithmetic One-Pager

MIT Press has made their Deep Learning textbook available and downloadable for free. The authors, Goodfellow, Bengio and Courville, have a chapter devoted to reviewing linear arithmetic, which is vital to understanding deep learning. This is my one-pager on what one needs to know.

A. Understand These Symbols

Before tackling the textbook, understand the following symbols.

Symbol Meaning Notes
\(\{\}\) Set A set is a collection of different (i.e. unique) things.
\(\in\) Element of, i.e. “belongs to” \(5\in\{1, 2, 3, 5, 7, 11\}\)
\(\mathbb{R}\) The set of real numbers Basically, any number with or without decimals.
\(\mathbb{Z}\) The set of integers \(\{-\infty, …, -2, -1, 0, 1, 2, …, \infty\}\)
\(\lVert x \rVert\) The norm of x See 2.5 norms
\(\forall{x}\) For all x Every value representable by x
\(A=\{x:…\}\) Set builder notation \(A=\{x: x \in \mathbb{Z}, x > 0\}\) means that A is the set of all integers over 0, i.e. \(\{1, 2, 3, …, \infty\}\).

A programmer’s note on the latter: Python list comprehensions sure look a lot like set builder notation. Replacing “:” with for, the comma with if, “\(\in\)” with in and representing \(\mathbb{Z}\) with a limited array []:

# Representing A = {x : x ∈ Z, x > 0} in Python
A = [x for x in [-3, -2, -1, 0, 1, 2, 3, 4, 5] if x > 0]
print(A)
[1, 2, 3, 4, 5]

B. Scalars, Vectors, Matrices, Tensors

Scalar

A scalar is a single number. Its variable is a small-case letter:

$$ s=5 $$

Vector

A vector is a 1d array of numbers. Its variable is a small-case letter in bold:

$$ \mathbf{s}=[1, 2, 3, 5, 7] $$

Matrix

A matrix is a 2d array of numbers. Its variable is a capital letter in bold: $$ \mathbf{S}= \begin{bmatrix} 3.2 & 5.7 & 7.3 \\ 6.4 & 1.2 & -0.2 \end{bmatrix} $$

The matrix below has height 2 and width 3, i.e., two rows and three columns. If all its elements belong to the set of real numbers, then we could write the following:

$$ \begin{align} \mathbf{S} &\in \mathbb{R}^{2\times 3} \\ \mathbf{S}&= \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \\ f &= \mathbf{S}_{2,3}\\ \end{align} $$

Cross-sections are sets derived from whole rows or columns.

$$ \begin{align} \mathbf{S}_{2,:} &= [d, e, f] \end{align} $$

$$ \begin{align} \mathbf{S}_{:,3} &= [c, f] \end{align} $$

By convention, height is represented by m and width by n. For example, a generic matrix of real numbers would be represented by \(\mathbf{A}=\mathbb{R}^{m \times n}\).

Square Matrix

A square matrix is a matrix whose height is the same as its width, i.e. \(m = n\).

Tensors

A tensor is effectively an array with more than two axes.

The Transpose of a Matrix

The transpose of a matrix \(\mathbf{S}^T\) is the mirror of the matrix \(\mathbf{S}\) across the main diagonal:

$$ \begin{align} \mathbf{S}&= \begin{bmatrix} \color{red}{a}\ & b & c \\ d & \color{red}{e} & f \end{bmatrix} \\ \mathbf{S}^T&= \begin{bmatrix} \color{red}{a} & d \\ b & \color{red}{e} \\ c & f \end{bmatrix} \\ \end{align} $$

The transpose of a horizontal vector is the same vector in column form. The transpose of a scalar is just the same scalar.

Basic Matrix Arithmetic

Two matrices may be added or subtracted if they are the same shape.

$$ \begin{bmatrix} 1 & 2 & 3 \\ 5 & 7 & 11 \end{bmatrix} + \begin{bmatrix} 1 & 2 & 3 \\ 5 & 7 & 11 \end{bmatrix} = \begin{bmatrix} 2 & 4 & 6 \\ 10 & 14 & 22 \end{bmatrix} $$

Matrices may be multiplied or divided by a scalar.

$$ 2 \begin{bmatrix} 1 & 2 & 3 \\ 5 & 7 & 11 \end{bmatrix} = \begin{bmatrix} 2 & 4 & 6 \\ 10 & 14 & 22 \end{bmatrix} $$

Broadcasting

Broadcasting allows the addition of a matrix and a vector by adding the vector to each row of the vector.

$$ \begin{bmatrix} 1 & 2 & 3 \\ 5 & 7 & 11 \end{bmatrix} + \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} = \begin{bmatrix} 2 & 4 & 6 \\ 6 & 9 & 14 \end{bmatrix} $$

In Python’s numpy package, broadcasting works with the same operator as basic addition.

import numpy as np
A = np.array([[1, 2, 3],[4, 5, 6]])
v = np.array([7, 8, 9])
print(A + v)
[[ 8, 10, 12],
 [11, 13, 15]]

C. Matrix Multiplication

Matrices can only be multiplied if the number of columns of matrix A is equal to the number of rows of matrix B.

$$ \mathbf{A} \in \mathbb{R}^{m \times n} \times \mathbf{B} \in \mathbb{R}^{n \times p} = \mathbf{AB} \in \mathbb{R}^{m \times p} $$

Dot Product of Vectors

The the dot product of two vectors with the same dimensionality \(x^Ty\) is \(\sum_{i,j=1}x^T_j y_i \):

$$ \begin{align} \mathbf{x^T} &= \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \\ \mathbf{y} &= \begin{bmatrix} 5 \\ 7 \\ 11 \\ \end{bmatrix} \\ x^T \cdot y &= 1 \times 5 + 2 \times 7 + 3 \times 11 = 52 \end{align} $$

Standard Product of Matrices

The standard product AB is the dot product of the vectors with the same dimensionality \(x^Ty\).

$$ \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \\ \end{bmatrix} = \begin{bmatrix} 1 \times 7 + 2 \times 9 + 3 \times 11 & 1 \times 8 + 2 \times 10 + 3 \times 11 \\ 4 \times 7 + 5 \times 9 + 6 \times 11 & 4 \times 8 + 5 \times 10 + 6 \times 11 \end{bmatrix} = \begin{bmatrix} 58 & 64 \\ 139 & 154 \end{bmatrix} $$

In Python, one must use the function np.dot on matrices for standard product. Using the multiplication operator attempts the Hadamard product, which will fail if the matrices don’t have the same shape.

import numpy as np
A = np.array([[1, 2, 3],[4, 5, 6]])
B = np.array([[7, 8], [9, 10], [11, 12]])
print(np.dot(A, B))
[[ 58  64]
 [139 154]]

Hadamard Product of Matrices

The Hadamard product \(A \bigodot B\) is completely different from the standard product: it is the multiplication of the elements \(A_{i,j}B_{i,j}\) of two matrices of the same shape. In Python, use the multiplication operator.

import numpy as np
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[7, 8, 9], [10, 11, 12]])
print(A * B)
[[ 7 16 27]
 [40 55 72]]

Properties of Matrix Multiplication

System of Linear Equations

Up to now, we’ve used the slope-intercept form for linear equations, \(y = mx + b\). For matrices, we use the standard form, Ax + By = C. To solve a system of linear equations (e.g. if there is one and only one point in space that satisfies all the variables in a group of equations), one can use matrices:

$$ \begin{align} 4x - 3y &= 12 \\ 2x + y &= 6 \\ \begin{bmatrix} 4 & -3 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} &= \begin{bmatrix} 12 \\ 6 \end{bmatrix} \end{align} \\ \text{Solution:} x = 3, y = 0 $$

D. Identity and Inverse Matrices

Diagonal Matrix

A diagonal matrix \(diag(v)\) is a square matrix where its non-diagonal entries are 0.

Example:

$$ \begin{align} v &= \{13, -2.5, 3.1459\}\\ diag(v) &= \begin{bmatrix} 13 & 0 & 0\\ 0 & -2.5 & 0\\ 0 & 0 & 3.1459 \end{bmatrix} \end{align} $$

Identity Matrix

An identity matrix is a matrix that does not change any vector when that vector is multiplied by the matrix. Basically, it is a diagonal matrix of with each element = 1. It is denoted by \(I_N\), with the subscript denoting the dimensions.

$$ \begin{align} I_1 &= \begin{bmatrix} 1 \end{bmatrix} \\ I_2 &= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \\ I_3 &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \\ \end{align} $$

Inverse Matrix

The inverse of matrix \(A\) is denoted \(A^{-1}\), and

$$ A^{-1}A = I_N $$

Example:

$$ \begin{align} A &= \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix}\\ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} &= A^{-1} \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix}\\ A^{-1} &= \begin{bmatrix} -{5 \over 2} & 3 \over 2 \\ 2 & -1 \end{bmatrix} \end{align} $$

Actually finding the inverse matrix is out of scope of this one-pager and computationally expensive, but in a nutshell: find the adjoint of the matrix \(adj(A)\) and its non-zero determinant \(det(A)\), then:

$$ A^{-1} = {adj(A) \over det(A)} $$

E. Linear Dependence and Span

Linear Combination

A linear combination is the summation of a series of vectors multiplied by their corresponding scalars. For example: \(2\vec{v} + 3\vec{w}\) is a linear combination. So is \(5\vec{u} + \vec{v} - 3\vec{w}\). The vectors could be one-, two-, three-, or more dimensions, i.e., \(\vec{v}\) could be represented by <3, 5, 7, -8> in a 4-dimensional feature set.

Span of a Set of Vectors

The span of a set of vectors is the set of all points obtainable by linear combinations of the original vectors. The span in, say, a 2D space will:

Linear Dependence

Linear dependence means that one or more vectors in a set is a linear combination of another. Conversely, linear independence means no vector in a set is a linear combination of another.

Say \(\vec{v} = <2, 3>\) and \(\vec{w} = <3, 4.5>\).

\(\vec{v}\) and \(\vec{w}\) are linearly dependent because they are linear combinations of each other: \(\vec{w} = {3 \over 2}{\vec{v}}\) and \(\vec{v} = {2 \over 3}{\vec{w}} \).

Singular

A singular is a square matrix with linearly dependent columns.

Checking for Solutions to a Set of Linear Equations

A set of linear equations \(\mathbf{Ax}=\mathbf{b}\) has a unique solution if all the following are met:

Essentially, one wants a square matrix with height m (the same height as b) to get a unique solution.