Linear algebra is both interesting and fun. In this post, we explore some of its most important identities with proofs, and also shed light on some interesting, closely related results.

Definitions of Scalars, Vectors, and Matrices

  1. Scalars: Scalars are numerical values from $\mathbb{R}$.
  2. Vectors: Vectors are arrays of scalars in which one of the dimensions is 1. A $1 \times d$ vector is called a row vector, whereas a $d \times 1$ vector is called a column vector. The entries of a vector are called “components”.
  3. Matrices: Matrices are the logical extension of vectors and can be thought of as rectangular arrays of scalars. To access an element of a matrix, one specifies a row index $i$ and a column index $j$ to obtain the element $a_{ij}$ of the matrix $A = [a_{ij}]_{n \times d}$. A short code sketch of these objects follows this list.
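
To make these definitions concrete, here is a minimal NumPy sketch (NumPy is my choice of illustration here, not part of the definitions themselves):

```python
import numpy as np

a = 3.5                                # a scalar from R
row = np.array([[1.0, 2.0, 3.0]])      # a 1 x 3 row vector
col = np.array([[1.0], [2.0], [3.0]])  # a 3 x 1 column vector
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])             # a 3 x 2 matrix

# Element a_{ij} is accessed by its row and column indices (0-based here):
print(A[0, 1])  # a_{12} = 2.0
```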

Operations with Scalars and Vectors

Let two $d$-dimensional vectors be $\bar x = [x_1 \dots x_d]$ and $\bar y = [y_1 \dots y_d]$. They can be added component-wise as follows:

$$ \bar x + \bar y = [x_1 + y_1 \dots x_d + y_d] $$

and subtracted in the same way:

$$ \bar x - \bar y = [x_1 - y_1 \dots x_d - y_d] $$

A scalar $a$ can be multiplied by a vector as follows:

$$ a\bar x = [ax_1 \dots ax_d] $$

The dot product is defined as the sum of the component-wise multiplication:

$$ \bar x \cdot \bar y = \sum_{i=1}^{d} x_i y_i $$
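
These component-wise operations and the dot product map directly onto NumPy; a minimal sketch:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

print(x + y)         # component-wise addition: [5. 7. 9.]
print(x - y)         # component-wise subtraction: [-3. -3. -3.]
print(2.0 * x)       # scalar multiplication: [2. 4. 6.]
print(np.dot(x, y))  # dot product: 1*4 + 2*5 + 3*6 = 32.0
```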

The Euclidean norm is defined in terms of the dot product:

$$ \lVert \bar x \rVert^2 = \bar x \cdot \bar x = \sum_{i=1}^{d} x_i^2 $$

The Euclidean norm is also called the $L_2$-norm, which can be generalized to the $L_p$-norm for any $p \geq 1$ as follows:

$$ \lVert \bar x \rVert_p = \left( \sum_{i=1}^{d} |x_i|^p \right)^{1/p} $$
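
As a quick sanity check, here is a sketch computing the $L_1$, $L_2$, and a general $L_p$ norm (the choice $p = 3$ is arbitrary):

```python
import numpy as np

x = np.array([3.0, -4.0])

print(np.linalg.norm(x, 1))               # L1 norm: |3| + |-4| = 7.0
print(np.linalg.norm(x, 2))               # L2 (Euclidean) norm: 5.0
p = 3
print((np.abs(x) ** p).sum() ** (1 / p))  # Lp norm straight from the formula
```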

The dot product satisfies the Cauchy-Schwarz inequality, which sets an upper bound on the magnitude of the dot product:

$$ \lvert \bar x \cdot \bar y \rvert \leq \lVert \bar x \rVert \lVert \bar y \rVert $$

Proof: Assume $\bar x$ and $\bar y$ are nonzero (otherwise both sides are zero and the inequality holds trivially). Let $\bar x^\prime$ and $\bar y^\prime$ be unit vectors constructed from $\bar x$ and $\bar y$ such that $$ \bar x^\prime = \dfrac{\bar x}{\lVert \bar x \rVert} $$ and $$ \bar y^\prime = \dfrac{\bar y}{\lVert \bar y \rVert} $$

We show that $\lvert \bar x^\prime \cdot \bar y^\prime \rvert \leq 1$ by calculating $\lVert \bar x^\prime - \bar y^\prime \rVert^2$ and $\lVert \bar x^\prime + \bar y^\prime \rVert^2$ as follows:

$$ \begin{align*} \lVert \bar x^\prime - \bar y^\prime \rVert^2 &= (\bar x^\prime - \bar y^\prime) \cdot (\bar x^\prime - \bar y^\prime) = \lVert \bar x^\prime \rVert^2 + \lVert \bar y^\prime \rVert^2 - 2 \bar x^\prime \cdot \bar y^\prime = 2 - 2 \bar x^\prime \cdot \bar y^\prime \\ \lVert \bar x^\prime + \bar y^\prime \rVert^2 &= (\bar x^\prime + \bar y^\prime) \cdot (\bar x^\prime + \bar y^\prime) = \lVert \bar x^\prime \rVert^2 + \lVert \bar y^\prime \rVert^2 + 2 \bar x^\prime \cdot \bar y^\prime = 2 + 2 \bar x^\prime \cdot \bar y^\prime \end{align*} $$

Since both $\lVert \bar x^\prime - \bar y^\prime \rVert^2$ and $\lVert \bar x^\prime + \bar y^\prime \rVert^2$ are nonnegative, the first expression gives $\bar x^\prime \cdot \bar y^\prime \leq 1$ and the second gives $\bar x^\prime \cdot \bar y^\prime \geq -1$; together, $\lvert \bar x^\prime \cdot \bar y^\prime \rvert \leq 1$

Now, we scale the unit vectors back up to obtain the general inequality:

$$ \begin{align*} \lvert \bar x^\prime \cdot \bar y^\prime \rvert &\leq 1 \\ \lVert \bar x \rVert \lVert \bar y \rVert \lvert \bar x^\prime \cdot \bar y^\prime \rvert &\leq \lVert \bar x \rVert \lVert \bar y \rVert \\ \lvert \bar x \cdot \bar y \rvert &\leq \lVert \bar x \rVert \lVert \bar y \rVert \end{align*} $$
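
A quick numerical spot-check of the inequality (a sketch with random vectors, not a substitute for the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

lhs = abs(np.dot(x, y))
rhs = np.linalg.norm(x) * np.linalg.norm(y)
print(lhs <= rhs)  # True: |x . y| <= ||x|| ||y||
```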

Using the Cauchy-Schwarz inequality, we can also prove the triangle inequality, which states that, in a triangle formed by $\bar x$ and $\bar y$, the side of length $\lVert \bar x - \bar y \rVert$ is no longer than the sum of the two other sides:

$$ \lVert \bar x - \bar y \rVert \leq \lVert \bar x \rVert + \lVert \bar y \rVert $$

Proof: Since both sides of the triangle inequality are nonnegative, it suffices to show that the inequality holds after squaring both sides.

$$ \begin{align*} \lvert \bar x \cdot \bar y \rvert &\leq \lVert \bar x \rVert \lVert \bar y \rVert \\ - \lVert \bar x \rVert \lVert \bar y \rVert &\leq \bar x \cdot \bar y \leq \lVert \bar x \rVert \lVert \bar y \rVert \\ \bar x \cdot \bar y &\geq - \lVert \bar x \rVert \lVert \bar y \rVert \\ -2\bar x \cdot \bar y &\leq 2 \lVert \bar x \rVert \lVert \bar y \rVert \\ \lVert \bar x \rVert^2 + \lVert \bar y \rVert^2 -2\bar x \cdot \bar y &\leq \lVert \bar x \rVert^2 + \lVert \bar y \rVert^2+ 2 \lVert \bar x \rVert \lVert \bar y \rVert \\ (\bar x - \bar y) \cdot (\bar x - \bar y) &\leq (\lVert \bar x \rVert +\lVert \bar y \rVert)^2 \\ (\lVert \bar x - \bar y \rVert)^2 &\leq (\lVert \bar x \rVert + \lVert \bar y \rVert)^2 \end{align*} $$
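
And the corresponding spot-check for the triangle inequality (again just a sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

lhs = np.linalg.norm(x - y)
rhs = np.linalg.norm(x) + np.linalg.norm(y)
print(lhs <= rhs)  # True: ||x - y|| <= ||x|| + ||y||
```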

Operations with Vectors and Matrices

The transpose of a matrix is obtained by interchanging its rows and columns, so that the $(i, j)$th entry of the transpose is the same as the $(j, i)$th entry of the original matrix. Therefore, the transpose of an $n \times d$ matrix is a $d \times n$ matrix. The transpose of a matrix $A$ is denoted by $A^T$. An example of the transpose is shown:

$$ \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^T = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix} $$

It immediately follows that $(A^T)^T = A$ and that the transpose of a sum is the sum of the transposes: $$ (A+B)^T = A^T + B^T $$
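
In NumPy the transpose is the `.T` attribute; a minimal sketch of both identities:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

print(np.array_equal(A.T.T, A))              # (A^T)^T = A
print(np.array_equal((A + B).T, A.T + B.T))  # (A + B)^T = A^T + B^T
```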

For the matrix-vector product of an $n \times d$ matrix $A$ with a $d \times 1$ vector $\bar x$, we can observe that the elements of $\bar x$ act as weights for the columns of $A$:

$$ \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 a_{11} + x_2 a_{12} \\ x_1 a_{21} + x_2 a_{22} \\ x_1 a_{31} + x_2 a_{32} \end{bmatrix} $$

In general, $A \bar x = \sum_{i=1}^{d} x_i \bar a_i$, where $\bar a_i$ denotes the $i$th column of $A$.
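
A sketch verifying that the matrix-vector product is exactly this weighted sum of columns:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([10.0, 20.0])

weighted_cols = x[0] * A[:, 0] + x[1] * A[:, 1]  # sum_i x_i a_i
print(np.allclose(A @ x, weighted_cols))         # True
```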

Now, we define the outer product, which can be performed between vectors of different lengths:

$$ \bar x \bar y ^T = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix}= \begin{bmatrix} y_1x_1 & y_2x_1 & y_3x_1 \\ y_1x_2 & y_2x_2 & y_3x_2 \\ y_1x_3 & y_2x_3 & y_3x_3 \\ \end{bmatrix} $$

We can also show that the outer product between an $n \times 1$ vector and a $1 \times d$ vector results in an $n \times d$ matrix with the following properties: (i) every row is a multiple of every other row, and (ii) every column is a multiple of every other column:

$$ \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}_{n \times 1} \begin{bmatrix} b_1 & b_2 & \dots & b_d \end{bmatrix}_{1 \times d} = \begin{bmatrix} a_1b_1 & a_1b_2 & \dots & a_1b_d \\ a_2b_1 & a_2b_2 & \dots & a_2b_d \\ \vdots & \vdots & \ddots & \vdots \\ a_nb_1 & a_nb_2 & \dots & a_nb_d \\ \end{bmatrix}_{n \times d} $$

It is clear that each row satisfies $\bar R_i = \dfrac{a_i}{a_j} \bar R_j$ and each column satisfies $\bar C_i = \dfrac{b_i}{b_j} \bar C_j$ (whenever $a_j$ and $b_j$ are nonzero).
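
A sketch of the outer product and its every-row-is-a-multiple structure, using `np.outer`:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])  # n = 3
b = np.array([4.0, 5.0])       # d = 2

M = np.outer(a, b)             # the 3 x 2 outer product a b^T
print(M)

# Each row is a multiple of every other row: R_i = (a_i / a_j) R_j
print(np.allclose(M[1], (a[1] / a[0]) * M[0]))  # True
```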

Exercise: Let $A$ be a $1000000 \times 2$ matrix. Suppose you have to compute the $2 \times 1000000$ matrix $A^T A A^T$ on a computer with limited memory. Would you prefer to compute $(A^T A)A^T$ or would you prefer to compute $A^T (AA^T)$?

Sol:

$$ \begin{align*} (A^T A)A^T &\rightarrow (2 \times 2)(2 \times 1000000) \ \text{[lower memory usage]} \\ A^T (AA^T) &\rightarrow (2 \times 1000000)(1000000 \times 1000000) \ \text{[the intermediate } AA^T \text{ has } 10^{12} \text{ entries]} \end{align*} $$
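
The same comparison in code, with $n = 1000$ standing in for $1000000$ so the sketch runs comfortably:

```python
import numpy as np

n = 1_000  # stand-in for 1,000,000
A = np.random.default_rng(2).standard_normal((n, 2))

cheap = (A.T @ A) @ A.T        # intermediate A^T A is only 2 x 2
# expensive = A.T @ (A @ A.T)  # intermediate A A^T would be n x n -- avoid!
print(cheap.shape)             # (2, 1000); both orders give the same result
```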

Exercise: Let $D$ be an $n \times d$ matrix for which each column sums to 0. Let $A$ be an arbitrary $d \times d$ matrix. Show that the sum of each column of $DA$ is also zero.

Sol:

$$ \begin{align*} \sum_{i=1}^{n} (DA)_{ij} = \sum_{i=1}^{n} \sum_{k=1}^{d} D_{ik} A_{kj} = \sum_{k=1}^{d} A_{kj} \sum_{i=1}^{n} D_{ik} = 0 \end{align*} $$
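
A numerical spot-check (a sketch; subtracting the column means of a random matrix makes each of its columns sum to zero):

```python
import numpy as np

rng = np.random.default_rng(3)
D = rng.standard_normal((5, 3))
D = D - D.mean(axis=0)  # center columns so each sums to 0
A = rng.standard_normal((3, 3))

print(np.allclose((D @ A).sum(axis=0), 0.0))  # True: columns of DA sum to 0
```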

Exercise: Show that $$(A_1A_2A_3 \dots A_n)^T = A_n^T A_{n-1}^T \dots A_{2}^T A_{1}^T$$

Sol:

Using the two-matrix identity $(AB)^T = B^T A^T$, which follows from $((AB)^T)_{ij} = (AB)_{ji} = \sum_k a_{jk} b_{ki} = (B^T A^T)_{ij}$, we peel off one factor at a time:

$$ (A_1 A_2 A_3 \dots A_n)^T = A_n^T (A_1 A_2 \dots A_{n-1})^T = A_n^T A_{n-1}^T (A_1 \dots A_{n-2})^T = \dots = A_n^T A_{n-1}^T \dots A_2^T A_1^T $$
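
A spot-check with three random matrices (a sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
A1, A2, A3 = (rng.standard_normal((3, 3)) for _ in range(3))

print(np.allclose((A1 @ A2 @ A3).T, A3.T @ A2.T @ A1.T))  # True
```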

A matrix is called symmetric if it is square and equal to its transpose $$A=A^T$$

Exercise: If $A$ and $B$ are symmetric matrices, show that $AB$ is symmetric if and only if $AB = BA$

Sol:

If $AB$ is symmetric, then $$ AB = (AB)^T = B^T A^T = BA $$ Conversely, if $AB = BA$, then $(AB)^T = B^T A^T = BA = AB$, so $AB$ is symmetric.
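
A final spot-check (a sketch; taking $B$ to be a power of $A$, here $B = A^2$, guarantees the two symmetric matrices commute):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((3, 3))
A = M + M.T  # symmetric
B = A @ A    # also symmetric, and commutes with A

print(np.allclose(A @ B, B @ A))      # True: AB = BA
print(np.allclose(A @ B, (A @ B).T))  # True: hence AB is symmetric
```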