
The Red Pill

Linear Algebra is a broad and encompassing subject in mathematics, with quite a long history dating back to Babylonian and possibly Ancient Egyptian sources. It is difficult to find a part of mathematics that does not rely on linear algebra. Apropos, the Hollywood trope of “the red pill” traces through The Matrix (1999) and Total Recall (1990), and probably originated in a Philip K. Dick story (1966). It signifies both an embrace of reality and a call to action. A mathematician could not ask for better.

Definitions

Let’s start with the notion of a vector, which is simply a one-dimensional array of values:

\(x=\left[ 2,3,1,0 \right] \)

Next, we’ll contrast that with the notion of a scalar, which is simply a value:

\(a = 3\)

An addition operation performed on a vector and a scalar produces another vector:

\(\begin{eqnarray} a+x & = & \left[ 3+2,3+3,3+1,3+0 \right] \\ & = & { \left[ 5,6,4,3 \right] } \end{eqnarray}\)

Or we might do a multiplication to produce another vector:

\(\begin{eqnarray} b & = & 2 \\ b\cdot x & = & \left[ 2\cdot 2,2\cdot 3,2\cdot 1,2\cdot 0 \right] \\ & = & { \left[ 4, 6, 2, 0 \right] } \end{eqnarray}\)

Or we could add two vectors together to produce another vector:

\(\begin{eqnarray} y & = & \left[ 5,1,5,0 \right] \\ x+y & = & \left[ 2+5,3+1,1+5,0+0 \right] \\ & = & { \left[ 7,4,6,0 \right] } \end{eqnarray}\)

Recall from the basic definitions in abstract algebra:

ring – a set with two binary associative operations, addition and multiplication, where multiplication distributes over addition

The closure property holds – gotcha – and there are binary associative operations for addition and multiplication – gotcha. This is familiar turf. Check out the definition of an abstract vector space, which begins to bridge between abstract algebra and linear algebra.
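
As a quick sanity check, here is a minimal NumPy sketch that verifies closure and associativity for vector addition, using the example vectors from above plus an extra vector z introduced just for this check:

import numpy as np

x = np.array([2, 3, 1, 0])
y = np.array([5, 1, 5, 0])
z = np.array([1, 1, 1, 1])   # an extra vector, just for this check

# closure: the sum of two vectors is another vector of the same shape
print((x + y).shape)
# (4,)

# associativity: (x + y) + z equals x + (y + z), element by element
print(np.array_equal((x + y) + z, x + (y + z)))
# True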

Let’s push beyond that – let’s multiply two vectors. There are a couple ways to do that. First, we can use a dot product, sometimes called an inner product, to produce a scalar:

\(\begin{eqnarray} x & = & \left[ 2,3,1,0 \right] \\ y & = & \left[ 5,1,5,0 \right] \\ x\cdot y & = & 2\cdot 5+3\cdot 1+1\cdot 5+0\cdot 0 \\ & = & { 10+3+5+0 } \\ & = & 18 \end{eqnarray}\)

In Python, we can use the NumPy package to perform these operations – addition, scalar multiplication, and the dot product – on combinations of vectors and scalars:

import numpy as np
x = [ 2, 3, 1, 0 ]
y = [ 5, 1, 5, 0 ]

a = 3
b = 2

print(np.add(a, x))
# [5 6 4 3]
print(np.dot(b, x))
# [4 6 2 0]
print(np.add(x, y))
# [7 4 6 0]
print(np.dot(x, y))
# 18


 

Next, let’s consider the notion of a matrix – not the blockbuster movie franchise, but the thing that was a spreadsheet long before spreadsheets became Apple II software. We will define a matrix A as:

\(\textbf{A}=\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\)

 Then another matrix B as:

\(\textbf{B}=\begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}\)

In Python, we can “reshape” an array into a matrix, then perform matrix addition and other operations:

import numpy as np
A = np.array([1, 2, 3, 4]).reshape(2, 2)
print(A)
# [[1 2]
#  [3 4]]

B = np.array([5, 6, 7, 8]).reshape(2, 2)

print(np.add(A, B))
# [[ 6  8]
#  [10 12]]

Pretty much all of the code shown here in Python can be done in Scala using the Matrix API from Twitter – with the added benefit of getting to apply the monoids, etc., from the earlier chapter. Python is simpler for small examples, so we’ll stick with that.

 

Now let’s introduce some notation to reference the elements in matrix \(\textbf{A}\):

\(\textbf{A}=\begin{bmatrix} { A }_{ 11 } & { A }_{ 12 } \\ { A }_{ 21 } & { A }_{ 22 } \end{bmatrix}\)

Given our definition of the values in matrix \(\textbf{A}\) and matrix \(\textbf{B}\) above, that’s another way of stating:

 \(A_{11} = 1,\ A_{12} = 2,\ A_{21} = 3, \ A_{22} = 4\)

as well as:

\(B_{11} = 5,\ B_{12} = 6,\ B_{21} = 7, \ B_{22} = 8\)

… just in a much more compact form. We can use a similar notation for vectors:

\(\textbf{x} = \begin{bmatrix} { x }_{ 1 } & { x }_{ 2 } \end{bmatrix}\)

Given a definition of the values in vector \(\textbf{x} = \begin{bmatrix} 2 & 3 \end{bmatrix}\)

that’s another way of stating:

\(x_1 = 2,\ x_2 = 3\)

... again, just in a much more compact form.

We can multiply matrix \(\textbf{A}\) and vector \(\textbf{x}\), stated as \(\textbf{Ax}\), using a dot product for each of the elements in the resulting vector:

\(\begin{align*} \textbf{Ax} &= A \cdot x \\ \textbf{Ax} &= \begin{bmatrix} \begin{pmatrix}A_{11} \ A_{12}\end{pmatrix} \cdot \begin{pmatrix}x_{1} \ x_{2}\end{pmatrix} \ ,& \begin{pmatrix}A_{21} \ A_{22}\end{pmatrix} \cdot \begin{pmatrix}x_{1} \ x_{2}\end{pmatrix} \end{bmatrix} \\ &= \begin{bmatrix} \begin{pmatrix}A_{11} \cdot x_1 + A_{12} \cdot x_2\end{pmatrix} ,& \begin{pmatrix}A_{21} \cdot x_1 + A_{22} \cdot x_2\end{pmatrix} \end{bmatrix} \end{align*}\)

Using our example:

\(\begin{align*} \textbf{Ax} &= \begin{bmatrix} \begin{pmatrix}1, 2\end{pmatrix} \cdot \begin{pmatrix}2, 3\end{pmatrix} & \begin{pmatrix}3, 4\end{pmatrix} \cdot \begin{pmatrix}2, 3\end{pmatrix} \end{bmatrix} \\ &= \begin{bmatrix} \begin{pmatrix}1 \cdot 2 + 2 \cdot 3\end{pmatrix} &\begin{pmatrix}3 \cdot 2 + 4 \cdot 3\end{pmatrix} \end{bmatrix} \\ &= \begin{bmatrix} 2 + 6 & 6 + 12 \end{bmatrix} \\ &= \begin{bmatrix} 8 & 18 \end{bmatrix} \end{align*}\)

In other words, we have evaluated the equation \(\textbf{Ax} = \textbf{y}\), where \(\textbf{y} = \begin{bmatrix}8 & 18\end{bmatrix}\) in this case. Using the NumPy package in Python:

import numpy as np
A = np.array([1, 2, 3, 4]).reshape(2, 2)
x = [2, 3]
print(np.dot(A, x))
# [ 8 18]

Extending this notion, we can multiply matrix \(\textbf{A}\) and matrix \(\textbf{B}\). To perform that operation in Python:

import numpy as np
A = np.array([1, 2, 3, 4]).reshape(2, 2)
B = np.array([5, 6, 7, 8]).reshape(2, 2)
print(np.dot(A, B))
# [[19 22]
#  [43 50]]
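
Working the same dot-product pattern by hand, each element of \(\textbf{AB}\) pairs a row of \(\textbf{A}\) with a column of \(\textbf{B}\):

\(\begin{align*} \textbf{AB} &= \begin{bmatrix} 1 \cdot 5 + 2 \cdot 7 & 1 \cdot 6 + 2 \cdot 8 \\ 3 \cdot 5 + 4 \cdot 7 & 3 \cdot 6 + 4 \cdot 8 \end{bmatrix} \\ &= \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix} \end{align*}\)

which agrees with the output of np.dot(A, B).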

Recall the definition of a monoid as a semigroup with a unique identity element. We define an identity matrix \(\textbf{I}\), which has the value 1 in the diagonal and 0 for all other elements:

\(\textbf{I}=\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\)

We can use this for matrix multiplication, such that \(\textbf{AI} = \textbf{A}\):

import numpy as np
A = np.array([1, 2, 3, 4]).reshape(2, 2)
I = np.array([1, 0, 0, 1]).reshape(2, 2)
print(np.dot(A, I))
# [[1 2]
#  [3 4]]

One more definition before moving back into reality… We denote the determinant of a matrix \(\textbf{A}\) as \(det\left( \textbf{A} \right)\) or \(|\textbf{A}|\), which gets used in many different ways. Referencing the elements in matrix \(\textbf{A}\):

\(\textbf{A}=\begin{bmatrix} { A }_{ 11 } & { A }_{ 12 } \\ { A }_{ 21 } & { A }_{ 22 } \end{bmatrix}\)
 

Then \(det\left( \textbf{A} \right)\) is defined as:

\(\begin{eqnarray} det\left( \textbf{A} \right) & = & \begin{vmatrix} { A }_{ 11 } & { A }_{ 12 } \\ { A }_{ 21 } & { A }_{ 22 } \end{vmatrix} \\ & = & { A }_{ 11 }\cdot { A }_{ 22 }-{ A }_{ 12 }\cdot { A }_{ 21 } \end{eqnarray}\)

and using the values we defined earlier:

\(\begin{eqnarray} det\left( \textbf{A} \right) & = & \begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} \\ & = & 1\cdot 4-2\cdot 3 \\ & = & -2 \end{eqnarray}\)

Using the NumPy package in Python:

print(np.linalg.det(A))
# -2.0 (approximately, due to floating point)
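
To connect that formula back to the code, here is a minimal sketch that computes the determinant directly from \({ A }_{ 11 }\cdot { A }_{ 22 }-{ A }_{ 12 }\cdot { A }_{ 21 }\) and compares it against np.linalg.det:

import numpy as np

A = np.array([1, 2, 3, 4]).reshape(2, 2)

# determinant from the 2x2 formula: A11 * A22 - A12 * A21
det_by_hand = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
print(det_by_hand)
# -2

# NumPy agrees, up to floating-point rounding
print(np.isclose(np.linalg.det(A), det_by_hand))
# True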

Let’s take a look at a common problem in algebra – solving two equations in two unknowns:

\(\begin{align*} 3x_1 + 9x_2 &= 5 \\ 4x_1 + 8x_2 &= 12 \end{align*}\)

To substitute for \(x_1\), we first rearrange one of these equations to isolate it:

\(\begin{align*} 4x_1 + 8x_2 &= 12 \\ x_1 + 2x_2 &= 3 \\ x_1 &= -2x_2 + 3 \end{align*}\)

Then:

\(\begin{align*} 3x_1 + 9x_2 &= 5 \\ 3(-2x_2 + 3) + 9x_2 &= 5 \\ -6x_2 + 9 + 9x_2 &= 5 \\ (9 - 6)x_2 &= 5 - 9 \\ 3x_2 &= -4 \\ x_2 &= -{4\over3} \end{align*}\)

Now, substituting this known value to solve for the remaining unknown:

\(\begin{align*} x_2 &= -{4\over3} \\ x_1 &= -2x_2 + 3 \\ x_1 &= (-2 \cdot -{4\over3}) + 3 \\ x_1 &= {8\over3} + 3 \\ x_1 &= 5{2\over3} \\ \end{align*}\)

Let’s check our math:

\(\begin{align*} 3 \cdot 5{2\over3} + 9 \cdot -{4\over3} = 17 -12 &= 5 \\ 4 \cdot 5{2\over3} + 8 \cdot -{4\over3} = 22{2\over3} - 10{2\over3} &= 12 \end{align*}\)
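
That check can also be scripted; here is a minimal sketch that substitutes the hand-derived values, written as exact fractions, back into both equations:

from fractions import Fraction

# hand-derived solution: x1 = 5 2/3 and x2 = -4/3, as exact fractions
x1 = Fraction(17, 3)
x2 = Fraction(-4, 3)

# substitute back into both of the original equations
print(3 * x1 + 9 * x2)
# 5
print(4 * x1 + 8 * x2)
# 12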

Good, that worked. What a relief. Generally speaking, if we have N independent linear equations in N variables, we can solve that system of equations. When we have more or fewer than N equations, life becomes interesting. In any case, we can use linear algebra to make this a bit simpler. The same problem could have been stated as one equation:

\(\textbf{Ax = y}\)

where:

\(\begin{align*} \textbf{A} &= \begin{bmatrix} 3 & 9 \\ 4 & 8 \end{bmatrix} \\ \textbf{y} &= \begin{bmatrix} 5 & 12 \end{bmatrix} \end{align*} \)

then we just need to solve for one variable, vector \(\textbf{x}\). Given the equation \(\textbf{Ax = y}\), how about if we simply divide by \(\textbf{A}\) to get the answer? Division by a matrix is not defined; instead, we multiply both sides of the equation by the matrix inverse of \(\textbf{A}\). That inverse is denoted \(\textbf{A}^{\textbf{-1}}\), such that \(\textbf{AA}^{\textbf{-1}} \textbf{= I}\) holds.

Finding the matrix inverse can be tricky business. For a 2×2 matrix, to find \(\textbf{A}^{\textbf{-1}}\) we take the rearranged elements of \(\textbf{A}\) – swapping the diagonal elements and negating the off-diagonal elements:

\(\begin{bmatrix} { A }_{ 22 } & -{ A }_{ 12 } \\ -{ A }_{ 21 } & { A }_{ 11 } \end{bmatrix}\)

Then we divide by the determinant \(det(\textbf{A})\), which is a scalar. Using the values for matrix \(\textbf{A}\) given in our system of equations, the rearranged elements become:

\(\begin{bmatrix} 8 & -9 \\ -4 & 3 \end{bmatrix}\)

Then dividing by the determinant \(det(\textbf{A}) = -12\), we get:

\(\begin{bmatrix} -{2\over3} & {3\over4} \\ {1\over3} & -{1\over4}\end{bmatrix}\)
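
Putting those steps together in code, here is a minimal sketch that builds the inverse from the rearranged elements divided by the determinant, then confirms that \(\textbf{AA}^{\textbf{-1}} = \textbf{I}\):

import numpy as np

A = np.array([3, 9, 4, 8]).reshape(2, 2)

# rearranged elements: swap the diagonal, negate the off-diagonal
adj = np.array([[ A[1, 1], -A[0, 1]],
                [-A[1, 0],  A[0, 0]]])

# divide by the determinant to obtain the inverse
invA = adj / np.linalg.det(A)
print(invA)

# check that A times its inverse yields the identity matrix
print(np.allclose(np.dot(A, invA), np.eye(2)))
# True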

Using the NumPy package in Python is a much simpler way to find \(\textbf{A}^{\textbf{-1}}\) and solve for \(\textbf{x}\):

import numpy as np
A = np.array([3, 9, 4, 8]).reshape(2, 2)
print(np.linalg.det(A))

# -12.0
invA = np.linalg.inv(A)
print(invA)

#array([[-0.667,  0.75],
#       [ 0.333, -0.25]])

y = [ 5, 12 ]
print(np.dot(invA, y))

# array([5.667, -1.333])

That looks right. Frankly, this is a small example. In practice, these kinds of linear systems may grow to be very, very large. Finding a matrix inverse in those cases can be quite costly, and it can become problematic in other ways – especially for approximations, which are sometimes the only feasible approach at scale. So we try to find ways of performing the math without calculating an inverse.
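
For example, NumPy provides np.linalg.solve, which solves \(\textbf{Ax} = \textbf{y}\) for \(\textbf{x}\) directly, without explicitly constructing \(\textbf{A}^{\textbf{-1}}\). A minimal sketch, using the same system as above:

import numpy as np

A = np.array([3, 9, 4, 8]).reshape(2, 2)
y = [ 5, 12 ]

# solve Ax = y for x without forming the matrix inverse
print(np.linalg.solve(A, y))
# approximately [ 5.667 -1.333 ], the same solution as before

More about that in a bit.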