
Wednesday, 21 June 2023

What is a matrix?

The following post is mostly aimed at high school students interested in mathematics (and even budding undergrads). We have learned in our schools that a matrix is a rectangular array of numbers. But is that all?

In physics we have seen vector quantities like force, displacement, etc. A vector like 2i+4j+7k can also be represented by the ordered triple (2,4,7). We can generalize to ordered n-tuples and talk about vectors in n-dimensional space. We consider the set \mathbb{R}^n of all ordered n-tuples of real numbers (each individual n-tuple will be referred to as a vector). Let X = \left( x_1, x_2,...,x_n \right), Y = \left( y_1, y_2,...,y_n \right) and let k be a real number (also called a scalar). Then X+Y is defined as the vector \left( x_1 + y_1, x_2 + y_2,...,x_n + y_n \right) and k \cdot X = \left( kx_1, kx_2,...,kx_n \right). The operations + and \cdot are called vector addition and scalar multiplication respectively. We usually write k \cdot X as kX. For n=2 and 3, this coincides with the usual notion of vectors in classical physics, and for n=1, it is the usual addition and multiplication of real numbers.
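The two operations can be sketched in a few lines of Python, with plain tuples standing in for elements of \mathbb{R}^n (an illustrative sketch, not part of the original definitions):

```python
def vec_add(X, Y):
    """Componentwise sum (x_1 + y_1, ..., x_n + y_n)."""
    return tuple(x + y for x, y in zip(X, Y))

def scalar_mul(k, X):
    """Scalar multiple (k*x_1, ..., k*x_n)."""
    return tuple(k * x for x in X)

X = (2, 4, 7)
Y = (1, 0, -3)
print(vec_add(X, Y))      # (3, 4, 4)
print(scalar_mul(2, X))   # (4, 8, 14)
```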

Some properties which we can easily verify are:
  • X+Y = Y+X
  • X+(Y+Z) = (X+Y)+Z
  • There exists \textbf{0} \in \mathbb{R}^n such that for all X \in \mathbb{R}^n, X +\textbf{0} = X. Just choose \textbf{0} = \left( 0, 0,...,0 \right).
  • For all X \in \mathbb{R}^n there exists X' \in  \mathbb{R}^n such that X+X'=\textbf{0}. Take X' = (-1)X; it is not hard to see that this is the only possible choice for X'.
  • (k_1k_2) X = k_1 (k_2  X)
  • (k_1 + k_2)  X = k_1  X + k_2 X
  • k (X +Y) = k  X + k  Y
  • 0X = \textbf{0} and 1X = X
It is easy to see that every vector X in \mathbb{R}^n can be represented uniquely in the form \sum_{i=1}^{n} x_i E_i where X = \left( x_1, x_2,...,x_n \right) and E_i is the n-tuple whose i^{th} entry is 1 and all other entries are 0. These E_i's are said to form a basis for \mathbb{R}^n. In case of confusion we shall denote them as E_i^n to indicate the dimension of the space.
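The basis decomposition can be checked numerically (a small sketch, with tuples again standing in for vectors):

```python
def standard_basis(n, i):
    """E_i: the n-tuple with 1 in position i (1-indexed) and 0 elsewhere."""
    return tuple(1 if j == i - 1 else 0 for j in range(n))

# Verify X = sum_i x_i E_i componentwise for a sample X:
X = (2, 4, 7)
n = len(X)
recon = tuple(sum(X[i] * standard_basis(n, i + 1)[j] for i in range(n))
              for j in range(n))
print(recon == X)  # True
```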

We say that a function T: \mathbb{R}^n \to \mathbb{R}^m is linear if for all X, Y \in \mathbb{R}^n and k \in \mathbb{R}, T(kX +Y) = kT(X) + T(Y).
Can you think of some examples of linear maps? Since T(X) = \sum_{i=1}^{n} x_i T(E_i), the values at the basis elements E_i, i = 1,2,...,n, determine the linear map. Moreover, if you wish to define a linear map T satisfying T(E_i) = F_i, i=1,2,...,n, define T(X) =  \sum_{i=1}^{n} x_i F_i. It is not hard to verify that this map is well defined and linear. (Try this out! Read the paragraph on the basis again.) If T: \mathbb{R}^n \to \mathbb{R}^m and S: \mathbb{R}^m \to \mathbb{R}^l are maps, we can define the composition S \circ T: \mathbb{R}^n \to \mathbb{R}^l as the map S \circ T(X) = S(T(X)). It is an easy exercise to verify that if T and S are linear, then S \circ T is linear. (We usually write ST for S \circ T.) If  T_1: \mathbb{R}^n \to \mathbb{R}^m and T_2: \mathbb{R}^n \to \mathbb{R}^m are maps, then we can define T_1 + T_2:\mathbb{R}^n \to \mathbb{R}^m as (T_1 + T_2)(X) = T_1(X) + T_2(X). If T_1 and T_2 are linear, so is T_1 + T_2.
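The construction "prescribe F_i = T(E_i), then set T(X) = \sum_i x_i F_i" can be sketched directly. The rotation map below is a hypothetical example chosen for illustration:

```python
def make_linear(F):
    """Given a list of m-tuples F_i = T(E_i), return the map X -> sum_i x_i F_i."""
    def T(X):
        m = len(F[0])
        return tuple(sum(x * f[j] for x, f in zip(X, F)) for j in range(m))
    return T

# T: R^2 -> R^2 rotating the plane by 90 degrees, determined entirely by
# its values on the basis: T(E_1) = (0, 1) and T(E_2) = (-1, 0).
T = make_linear([(0, 1), (-1, 0)])
print(T((3, 4)))  # (-4, 3)
```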

In the previous paragraph, we saw how the values of a linear map T at the basis elements completely determine the map, and how to construct a linear map with given values at the basis elements (and this construction is unique by the first statement). We can interpret elements of \mathbb{R}^n as column vectors, i.e. n \times 1 matrices. Let  T: \mathbb{R}^n \to \mathbb{R}^m be linear. Now form an m \times n matrix whose i^{th} column is T(E_i^n) (since T(E_i^n) \in \mathbb{R}^m, it is interpreted as an m \times 1 matrix). We denote this matrix by \left[T \right]. The procedure for constructing a linear map with given values at the basis elements tells us that every m \times n matrix is of the form \left[T \right] for some linear map T: \mathbb{R}^n \to \mathbb{R}^m. So there is a bijective (one-to-one and onto) correspondence between the set of all linear functions from \mathbb{R}^n to \mathbb{R}^m (denoted \mathcal{L} \left( \mathbb{R}^n , \mathbb{R}^m \right)) and the set of all m \times n matrices (denoted \mathcal{M}_{m \times n} (\mathbb{R})).
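The recipe "j-th column of \left[T \right] is T(E_j)" can be sketched as follows, storing a matrix as a list of rows (the rotation map is again an illustrative assumption):

```python
def matrix_of(T, n):
    """Compute [T] for a linear map T: R^n -> R^m given as a Python function."""
    def E(j):
        # Standard basis vector with 1 in position j (0-indexed here).
        return tuple(1 if i == j else 0 for i in range(n))
    cols = [T(E(j)) for j in range(n)]   # j-th column is T(E_j), an m-tuple
    m = len(cols[0])
    # Reassemble the columns into row-major form.
    return [[cols[j][i] for j in range(n)] for i in range(m)]

T = lambda X: (-X[1], X[0])   # rotation by 90 degrees
print(matrix_of(T, 2))        # [[0, -1], [1, 0]]
```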

Let A be an m \times n matrix. We denote the (i,j)^{th} entry (i.e. the number in the i^{th} row and j^{th} column) of A by A_{ij}. Recall that if A is an m \times n and B is an n \times p matrix, then we can define the matrix product AB by (AB)_{ij} = \sum_{k=1}^{n} A_{ik}B_{kj}. Usual properties like commutativity need not hold, i.e. it is not always true that AB = BA. However, associativity still holds: if  A is an m \times n, B is an n \times p and C is a p \times q matrix, then A(BC) = (AB)C. (It requires a little work.) The distributive property is also valid, i.e. A(B+C) = AB +AC and (A+B)C = AC + BC, whenever these are defined. (In addition of matrices, we add entrywise, i.e. (A+B)_{ij} = A_{ij} + B_{ij}.)
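The product formula (AB)_{ij} = \sum_k A_{ik}B_{kj} translates line for line into code, and a small example shows that AB and BA can differ:

```python
def mat_mul(A, B):
    """(AB)_{ij} = sum_k A_{ik} B_{kj}, for A m x n and B n x p (lists of rows)."""
    m, n, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(mat_mul(A, B))  # [[2, 1], [4, 3]]
print(mat_mul(B, A))  # [[3, 4], [1, 2]]  -- so AB != BA in general
```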

Let us look at the matrix \left[T \right] in more detail. Denote \left[T \right]_{ij} by t_{ij}. From our discussion above, we see that T(E_j^n) = \sum_{i=1}^{m} t_{ij}E_i^m. Let X \in \mathbb{R}^n be as given previously. We shall compute \left[T \right]X (the product is an m \times 1 matrix): (\left[T \right]X)_{i1} =  \sum_{k=1}^{n} t_{ik}X_{k1} = \sum_{k=1}^{n} t_{ik}x_k
Now, T(X) = \sum_{k=1}^{n} x_k T(E_k^n) = \sum_{k=1}^{n} x_k \left( \sum_{i=1}^{m} t_{ik}E_i^m \right) =  \sum_{i=1}^{m} \left( \sum_{k=1}^{n} t_{ik}x_k \right) E_i^m
\therefore T(X) = \sum_{i=1}^{m}( \left[T \right]X)_{i1}E_i^m
Thus \left[T \right]X gives T(X) in column vector form.
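This identity is easy to spot-check: multiply \left[T \right] by X as a column and compare with T(X) applied directly (the rotation map below is an illustrative assumption):

```python
def mat_vec(M, X):
    """M is m x n (list of rows), X an n-tuple; returns the m-tuple MX."""
    return tuple(sum(M[i][k] * X[k] for k in range(len(X)))
                 for i in range(len(M)))

T = lambda X: (-X[1], X[0])   # rotation by 90 degrees
M = [[0, -1], [1, 0]]         # its matrix [T]
X = (3, 4)
print(mat_vec(M, X) == T(X))  # True
```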

It is easy to verify that \left[ T_1 + T_2 \right] =  \left[T_1 \right] +  \left[T_2 \right]. But it takes slightly more work to show that \left[ST \right] =  \left[S \right] \left[T \right] (take it as a challenging exercise; it is similar to the calculation we have done above). Note that the identity function id_n:\mathbb{R}^n \to \mathbb{R}^n belongs to \mathcal{L} \left( \mathbb{R}^n , \mathbb{R}^n \right) and \left[id_n \right] is the n \times n identity matrix I_n. The zero map \mathbb{0}:\mathbb{R}^n \to \mathbb{R}^m given by \mathbb{0}(X) = \textbf{0} is linear and, unsurprisingly, \left[ \mathbb{0} \right] is the m \times n matrix with all entries 0.
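A numerical spot-check of \left[ST \right] = \left[S \right] \left[T \right] on one small example (the two matrices are illustrative assumptions; this checks a single instance, it is not a proof):

```python
def mat_mul(A, B):
    """(AB)_{ij} = sum_k A_{ik} B_{kj}."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

S_mat = [[1, 2], [0, 1]]    # a shear
T_mat = [[0, -1], [1, 0]]   # a rotation

# The linear maps corresponding to the matrices:
S = lambda X: tuple(sum(S_mat[i][k] * X[k] for k in range(2)) for i in range(2))
T = lambda X: tuple(sum(T_mat[i][k] * X[k] for k in range(2)) for i in range(2))

# [S ∘ T]: apply S ∘ T to the basis vectors and read off the columns.
cols = [S(T((1, 0))), S(T((0, 1)))]
ST_mat = [[cols[j][i] for j in range(2)] for i in range(2)]
print(ST_mat == mat_mul(S_mat, T_mat))  # True
```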

We have seen that there is a deep connection between linear maps and matrices. The correspondence T \mapsto \left[T \right] is a bijection \mathcal{L} \left( \mathbb{R}^n , \mathbb{R}^m \right) \to \mathcal{M}_{m \times n} (\mathbb{R}) that preserves structure (the technical term is isomorphism). In conclusion, matrices are precisely linear maps.

PS: We worked with the set of real numbers \mathbb{R}. We could also work with \mathbb{Q} or \mathbb{C} (i.e. the sets of rational and complex numbers respectively). Instead of \mathbb{R}^n or \mathbb{C}^n, the same reasoning applies when we work with algebraic structures called finite-dimensional vector spaces. Pick up a book on linear algebra to learn more. Hope this was a good motivation to study matrices.

References:

  • Linear Algebra 4^{th} ed. by Friedberg, Insel and Spence
  • Linear Algebra Done Right 3^{rd} ed. by Axler
  • What Is Mathematics? by Courant and Robbins



