Wednesday, 21 June 2023

What is a matrix?

The following post is aimed mostly at high school students interested in mathematics (and budding undergraduates). We have learned in school that a matrix is a rectangular array of numbers. But is that all?

In physics we have seen vector quantities like force, displacement etc. A vector like $2i+4j+7k$ can also be represented by the ordered triple $(2,4,7)$. We can generalize to ordered $n$-tuples and talk about vectors in $n$-dimensional space. We consider the set $\mathbb{R}^n$ of all ordered $n$-tuples of real numbers (each individual $n$-tuple will be referred to as a vector). Let $X = \left( x_1, x_2,...,x_n \right)$, $Y = \left( y_1, y_2,...,y_n \right)$ and let $k$ be a real number (also called a scalar). Then $X+Y$ is defined as the vector $\left( x_1 + y_1, x_2 + y_2,...,x_n + y_n \right)$ and $k \cdot X = \left( kx_1, kx_2,...,kx_n \right)$. The operations $+$ and $\cdot$ are called vector addition and scalar multiplication respectively. We usually write $k \cdot X$ as $kX$. For $n=2$ and $3$, this coincides with the usual notion of vectors in classical physics, and for $n=1$, it is the usual addition and multiplication of real numbers.
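These two operations are easy to play with on a computer. Here is a small Python sketch (the post itself uses no code; the helper names `add` and `scale` are our own choices):

```python
# Represent a vector in R^n as a tuple of numbers.

def add(X, Y):
    # componentwise addition: (x1 + y1, ..., xn + yn)
    return tuple(x + y for x, y in zip(X, Y))

def scale(k, X):
    # scalar multiplication: (k*x1, ..., k*xn)
    return tuple(k * x for x in X)

X = (2, 4, 7)
Y = (1, 0, -3)
print(add(X, Y))    # (3, 4, 4)
print(scale(2, X))  # (4, 8, 14)
```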

Some properties which we can easily verify are:
  • $X+Y = Y+X$
  • $X+(Y+Z) = (X+Y)+Z$
  • There exists $\textbf{0} \in \mathbb{R}^n$ such that for all $X \in \mathbb{R}^n$, $X +\textbf{0} = X$. Just choose $\textbf{0} = \left( 0, 0,...,0 \right)$.
  • For every $X \in \mathbb{R}^n$ there exists $X' \in \mathbb{R}^n$ such that $X+X'=\textbf{0}$. Take $X' = (-1)X$; it is not hard to see that this is the only possible choice for $X'$.
  • $(k_1k_2) X = k_1 (k_2  X)$
  • $(k_1 + k_2)  X = k_1  X + k_2 X$
  • $k (X +Y) = k  X + k  Y$
  • $0X = \textbf{0}$ and $1X = X$
It is easy to see that every vector $X$ in $\mathbb{R}^n$ can be represented uniquely in the form $\sum_{i=1}^{n} x_i E_i$ where $X = \left( x_1, x_2,...,x_n \right)$ and $E_i$ is the $n$-tuple whose $i^{th}$ entry is $1$ and whose other entries are $0$. These $E_i$'s are said to form a basis for $\mathbb{R}^n$. In case of confusion we shall denote them by $E_i^n$ to indicate the dimension of the space.
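We can check this decomposition numerically. Below is an illustrative Python sketch (the function name `E` mirrors the notation $E_i$ in the text; the sample vector is our own):

```python
def E(i, n):
    # standard basis vector E_i in R^n (1-indexed, as in the text)
    return tuple(1 if j == i else 0 for j in range(1, n + 1))

X = (2, 4, 7)
# Build sum_i x_i * E_i componentwise and compare with X itself.
combo = (0, 0, 0)
for i in range(1, 4):
    combo = tuple(c + X[i - 1] * e for c, e in zip(combo, E(i, 3)))
print(combo)  # (2, 4, 7)
```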

We say that a function $T: \mathbb{R}^n \to \mathbb{R}^m$ is linear if for all $X, Y \in \mathbb{R}^n$ and $k \in \mathbb{R}$, $$T(kX +Y) = kT(X) + T(Y).$$
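The defining condition can be sanity-checked on any candidate map. A minimal Python sketch, using a sample map $T(x, y) = (2x + y,\, x)$ of our own choosing:

```python
def T(X):
    # a sample linear map R^2 -> R^2: T(x, y) = (2x + y, x)
    return (2 * X[0] + X[1], X[0])

def add(X, Y):
    return tuple(a + b for a, b in zip(X, Y))

def scale(k, X):
    return tuple(k * a for a in X)

X, Y, k = (1, 2), (3, -1), 5
# T(kX + Y) should equal k*T(X) + T(Y)
print(T(add(scale(k, X), Y)) == add(scale(k, T(X)), T(Y)))  # True
```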
Can you think of some examples of linear maps? Since $T(X) = \sum_{i=1}^{n} x_i T(E_i)$, the values at the basis elements $E_i, i = 1,2,...,n$ determine the linear map. Moreover, if you wish to define a linear map $T$ satisfying $T(E_i) = F_i, i=1,2,...,n$, define $T(X) = \sum_{i=1}^{n} x_i F_i$. It is not hard to verify that this map is well defined and linear. (Try this out! Read the paragraph on the basis again.) If $T: \mathbb{R}^n \to \mathbb{R}^m$ and $S: \mathbb{R}^m \to \mathbb{R}^l$ are maps, we can define the composition $S \circ T: \mathbb{R}^n \to \mathbb{R}^l$ as the map $S \circ T(X) = S(T(X))$. It is an easy exercise to verify that if $T$ and $S$ are linear, then $S \circ T$ is linear. (We usually write $ST$ for $S \circ T$.) If $T_1: \mathbb{R}^n \to \mathbb{R}^m$ and $T_2: \mathbb{R}^n \to \mathbb{R}^m$ are maps, then we can define $T_1 + T_2:\mathbb{R}^n \to \mathbb{R}^m$ as $(T_1 + T_2)(X) = T_1(X) + T_2(X)$. If $T_1$ and $T_2$ are linear, so is $T_1 + T_2$.
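The construction "prescribe $T(E_i) = F_i$ and extend by $T(X) = \sum_i x_i F_i$" can be written directly as code. A Python sketch (the function name `linear_map` and the sample values $F_i$ are our own):

```python
def linear_map(F):
    # F is a list [F_1, ..., F_n] of vectors in R^m;
    # returns the linear map T with T(E_i) = F_i.
    def T(X):
        m = len(F[0])
        # T(X) = sum_i x_i * F_i, computed componentwise
        return tuple(sum(x * f[j] for x, f in zip(X, F)) for j in range(m))
    return T

# T: R^2 -> R^3 with T(E_1) = (1, 0, 2) and T(E_2) = (0, 1, 1)
T = linear_map([(1, 0, 2), (0, 1, 1)])
print(T((3, 5)))  # (3, 5, 11)
```

Note that `T((1, 0))` and `T((0, 1))` recover the prescribed values, as the uniqueness statement promises.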

In the previous paragraph, we saw how the values of a linear map $T$ at the basis elements completely determine the map, and how to construct a linear map with given values at the basis elements (and this construction is unique by the first statement). We can interpret elements of $\mathbb{R}^n$ as column vectors, i.e. $n \times 1$ matrices. Let $T: \mathbb{R}^n \to \mathbb{R}^m$ be linear. Now, form an $m \times n$ matrix whose $i^{th}$ column is $T(E_i^n)$ (since $T(E_i^n) \in \mathbb{R}^m$, it is interpreted as an $m \times 1$ matrix). We denote this matrix by $\left[T \right]$. The procedure for constructing a linear map with given values at the basis elements tells us that every $m \times n$ matrix is of the form $\left[T \right]$ for some linear map $T: \mathbb{R}^n \to \mathbb{R}^m$. So there is a bijective (one to one and onto) correspondence between the set of all linear functions from $\mathbb{R}^n$ to $\mathbb{R}^m$ (denoted $\mathcal{L} \left( \mathbb{R}^n , \mathbb{R}^m \right)$) and the set of all $m \times n$ matrices (denoted $\mathcal{M}_{m \times n} (\mathbb{R})$).
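Building $\left[T\right]$ column by column is mechanical enough to automate. A Python sketch (the helper name `matrix_of` and the example map are ours, not from the post):

```python
def matrix_of(T, n):
    # i-th column of [T] is T(E_i^n); columns are 0-indexed here
    cols = [T(tuple(1 if j == i else 0 for j in range(n))) for i in range(n)]
    m = len(cols[0])
    # entry in row r, column c is the r-th component of T(E_c)
    return [[cols[c][r] for c in range(n)] for r in range(m)]

# Example: T(x, y) = (x + y, 2x) is a linear map R^2 -> R^2
T = lambda X: (X[0] + X[1], 2 * X[0])
print(matrix_of(T, 2))  # [[1, 1], [2, 0]]
```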

Let $A$ be an $m \times n$ matrix. We denote the $(i,j)^{th}$ entry (i.e. the number in the $i^{th}$ row and $j^{th}$ column) of $A$ by $A_{ij}$. Recall that if $A$ is an $m \times n$ and $B$ is an $n \times p$ matrix, then we can define the matrix product $AB$ by $(AB)_{ij} = \sum_{k=1}^{n} A_{ik}B_{kj}$. Usual properties like commutativity need not hold, i.e. it is not always true that $AB = BA$. However, associativity still holds: if $A$ is an $m \times n$, $B$ is an $n \times p$ and $C$ is a $p \times q$ matrix, then $A(BC) = (AB)C$. (It requires a little work.) The distributive property is also valid, i.e. $A(B+C) = AB + AC$ and $(A+B)C = AC + BC$, whenever they are defined. (In addition of matrices, we add entrywise, i.e. $(A+B)_{ij} = A_{ij} + B_{ij}$.)
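The product formula translates line for line into code, and a small example also exhibits the failure of commutativity. A Python sketch (the name `matmul` and the sample matrices are our own):

```python
def matmul(A, B):
    # (AB)_{ij} = sum_k A_{ik} * B_{kj}, for A (m x n) and B (n x p)
    m, n, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(matmul(A, B))  # [[2, 1], [4, 3]]
print(matmul(B, A))  # [[3, 4], [1, 2]]  -- so AB != BA in general
```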

Let us look at the matrix $\left[T \right]$ in more detail. Denote $\left[T \right]_{ij}$ by $t_{ij}$. From our discussion above, we see that $T(E_j^n) = \sum_{i=1}^{m} t_{ij}E_i^m$. Let $X \in \mathbb{R}^n$ be as given previously. We shall compute $\left[T \right]X$ (the product is an $m \times 1$ matrix). $$(\left[T \right]X)_{i1} =  \sum_{k=1}^{n} t_{ik}X_{k1} = \sum_{k=1}^{n} t_{ik}x_k$$
Now, $T(X) = \sum_{k=1}^{n} x_k T(E_k^n) = \sum_{k=1}^{n} x_k \left( \sum_{i=1}^{m} t_{ik}E_i^m \right) =  \sum_{i=1}^{m} \left( \sum_{k=1}^{n} t_{ik}x_k \right) E_i^m$
$$\therefore T(X) = \sum_{i=1}^{m}( \left[T \right]X)_{i1}E_i^m$$
Thus $\left[T \right]X$ gives $T(X)$ in column vector form.
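The identity $\left[T\right]X = T(X)$ can be checked numerically on a sample map. A Python sketch (the map $T$ and vector $X$ are illustrative choices of ours; the matrix is $\left[T\right]$ with columns $T(E_1)$, $T(E_2)$):

```python
T = lambda X: (X[0] + X[1], 2 * X[0])  # a linear map R^2 -> R^2
M = [[1, 1], [2, 0]]                   # its matrix [T], columns T(E_1), T(E_2)
X = (3, 5)

# ([T]X)_i = sum_k t_{ik} * x_k, the matrix-vector product
MX = tuple(sum(M[i][k] * X[k] for k in range(2)) for i in range(2))
print(MX, T(X))  # (8, 6) (8, 6) -- the two agree
```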

It is easy to verify that $\left[ T_1 + T_2 \right] =  \left[T_1 \right] +  \left[T_2 \right]$. But it takes slightly more work to show that $ \left[ST \right] =  \left[S \right] \left[T \right]$ (take it as a challenging exercise; it is similar to the calculation we have done above). Note that the identity function $id_n:\mathbb{R}^n \to \mathbb{R}^n$ belongs to $\mathcal{L} \left( \mathbb{R}^n , \mathbb{R}^n \right)$ and $\left[id_n \right]$ is the $n \times n$ identity matrix $I_n$. The zero map $\mathbb{0}:\mathbb{R}^n \to \mathbb{R}^m$ given by $\mathbb{0}(X) = \textbf{0}$ is linear and, unsurprisingly, $\left[ \mathbb{0} \right]$ is the $m \times n$ matrix with all entries $0$.
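While the exercise $\left[ST\right] = \left[S\right]\left[T\right]$ deserves a pen-and-paper proof, a numerical spot check is encouraging. A Python sketch (the maps $S$, $T$ and the helper names are illustrative choices of ours):

```python
def matmul(A, B):
    # (AB)_{ij} = sum_k A_{ik} * B_{kj}
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matrix_of(f, n):
    # matrix of a linear map f on R^n: columns are f(E_i)
    cols = [f(tuple(1 if j == i else 0 for j in range(n))) for i in range(n)]
    return [[cols[c][r] for c in range(n)] for r in range(len(cols[0]))]

T = lambda X: (X[0] + X[1], X[0] - X[1])     # linear map R^2 -> R^2
S = lambda X: (2 * X[0], X[1], X[0] + X[1])  # linear map R^2 -> R^3
ST = lambda X: S(T(X))                       # the composition S o T

print(matrix_of(ST, 2) == matmul(matrix_of(S, 2), matrix_of(T, 2)))  # True
```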

We have seen that there is a deep connection between linear maps and matrices. The correspondence $T \mapsto \left[T \right]$ is a bijection $\mathcal{L} \left( \mathbb{R}^n , \mathbb{R}^m \right) \to \mathcal{M}_{m \times n} (\mathbb{R})$ that preserves structure (the technical term is isomorphism). In this precise sense, matrices are linear maps.

PS: We worked with the set of real numbers $\mathbb{R}$. We could also work with $\mathbb{Q}$ or $\mathbb{C}$ (i.e. the sets of rational and complex numbers respectively). In fact, instead of $\mathbb{R}^n$ or $\mathbb{C}^n$, the same reasoning applies when we work with algebraic structures called finite dimensional vector spaces. Pick up a book on linear algebra to learn more. Hope this was good motivation to study matrices.

References:

  • Linear Algebra $4^{th}$ ed. by Friedberg, Insel and Spence
  • Linear Algebra Done Right $3^{rd}$ ed. by Axler
  • What Is Mathematics? by Courant and Robbins



