

2.6.3 The Matrix of a Linear Map

Theorem 2.6.6. Assume that $b_1, \dots, b_n \in \mathbb{R}^n$ is a basis in $\mathbb{R}^n$ and $c_1, \dots, c_n \in \mathbb{R}^k$ are arbitrary vectors. Then there is exactly one linear map $f:\mathbb{R}^n\to\mathbb{R}^k$ for which $f(b_i) = c_i$ for every $1\le i\le n$. That is, the images of the basis elements determine a linear map uniquely.

Proof. Assume first that $f:\mathbb{R}^n\to\mathbb{R}^k$ is a linear map for which $f(b_i) = c_i$ holds for every $1\le i\le n$. If $x\in\mathbb{R}^n$, then by Theorem 2.2.14 it can be written uniquely as a linear combination of the basis elements $b_1,\dots,b_n$, i.e.

\[
x = \lambda_1 b_1 + \dots + \lambda_n b_n,
\]

where the scalars $\lambda_1,\dots,\lambda_n\in\mathbb{R}$ are determined uniquely by $x$ and the basis $B = \{b_1,\dots,b_n\}$.

Note that the scalars $\lambda_i$ are the coordinates of $x$ relative to $B$. Hence by Proposition 2.6.1 we have

\begin{align}
f(x) &= f(\lambda_1 b_1 + \dots + \lambda_n b_n) \nonumber\\
&= \lambda_1 f(b_1) + \dots + \lambda_n f(b_n) \tag{13}\\
&= \lambda_1 c_1 + \dots + \lambda_n c_n. \nonumber
\end{align}

This means that the values of $f$ are determined uniquely, so there is at most one linear map $f:\mathbb{R}^n\to\mathbb{R}^k$ that satisfies $f(b_i) = c_i$ for every $i$.

We show that the map defined by (13) is linear and satisfies the conditions in the statement. This will complete the proof of the theorem. So for every $x\in\mathbb{R}^n$ we define

\[
f(x) = \lambda_1 c_1 + \dots + \lambda_n c_n,
\]

where $\lambda_1,\dots,\lambda_n\in\mathbb{R}$ are the (uniquely determined) coordinates of $x$ relative to $B$. First of all, the coordinates of a basis element $b_i$ relative to $B$ are zero except for $\lambda_i = 1$, hence $f(b_i) = c_i$ holds for every $1\le i\le n$.

It remains to show that $f$ is linear. Assume that $x, y\in\mathbb{R}^n$ and their coordinate vectors relative to $B$ are $[x]_B = (\lambda_1,\dots,\lambda_n)^T$ and $[y]_B = (\mu_1,\dots,\mu_n)^T$. Then

\begin{align*}
x + y &= (\lambda_1 b_1 + \dots + \lambda_n b_n) + (\mu_1 b_1 + \dots + \mu_n b_n) \\
&= (\lambda_1 + \mu_1) b_1 + \dots + (\lambda_n + \mu_n) b_n,
\end{align*}
hence we have that $[x+y]_B = (\lambda_1+\mu_1,\dots,\lambda_n+\mu_n)^T$. Therefore,

\begin{align*}
f(x+y) &= (\lambda_1+\mu_1) c_1 + \dots + (\lambda_n+\mu_n) c_n \\
&= (\lambda_1 c_1 + \dots + \lambda_n c_n) + (\mu_1 c_1 + \dots + \mu_n c_n) \\
&= f(x) + f(y),
\end{align*}

which means that $f$ is additive. Similarly, if $\alpha\in\mathbb{R}$, then the coordinates of $\alpha x$ relative to $B$ are $\alpha\lambda_1,\dots,\alpha\lambda_n$, hence

\begin{align*}
f(\alpha x) &= \alpha\lambda_1 c_1 + \dots + \alpha\lambda_n c_n \\
&= \alpha(\lambda_1 c_1 + \dots + \lambda_n c_n) \\
&= \alpha f(x),
\end{align*}

so $f$ is homogeneous, and together with additivity this yields that $f$ is linear, and we are done.
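The two halves of this proof translate directly into a computation: solve for the coordinates $\lambda_i$ and form the combination (13). Below is a minimal numerical sketch, assuming NumPy is available; the basis vectors and target vectors are arbitrary illustrative choices, not taken from the text.

```python
import numpy as np

# A basis of R^2 (columns of B) and arbitrary target vectors in R^3 (columns of C);
# these particular vectors are illustrative choices only.
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])       # b1 = (1,0)^T, b2 = (1,1)^T
C = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])       # c1, c2 in R^3

def f(x):
    """The unique linear map with f(b_i) = c_i, evaluated as in (13)."""
    lam = np.linalg.solve(B, x)  # coordinates of x relative to {b1, b2}
    return C @ lam               # lambda_1 * c1 + lambda_2 * c2

# f maps each basis vector to its prescribed image...
assert np.allclose(f(B[:, 0]), C[:, 0])
assert np.allclose(f(B[:, 1]), C[:, 1])
# ...and it is additive and homogeneous, e.g. f(2*b1 + b2) = 2*c1 + c2:
assert np.allclose(f(2 * B[:, 0] + B[:, 1]), 2 * C[:, 0] + C[:, 1])
```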

Now we are going to assign a matrix to a linear map $f:\mathbb{R}^n\to\mathbb{R}^k$ once a basis is chosen in both $\mathbb{R}^n$ and $\mathbb{R}^k$.

Definition 2.6.3. Assume that $f:\mathbb{R}^n\to\mathbb{R}^k$ is a linear map, $B_1 = \{v_1,\dots,v_n\}$ is a basis of $\mathbb{R}^n$ and $B_2 = \{w_1,\dots,w_k\}$ is a basis of $\mathbb{R}^k$. If $f(v_i) = a_{1,i} w_1 + \dots + a_{k,i} w_k$, that is, the uniquely determined coordinates of $f(v_i)$ relative to $B_2$ are $a_{1,i},\dots,a_{k,i}$, then the matrix of the linear map $f$ with respect to the bases $B_1$ and $B_2$ is

\[
[f]_{B_1,B_2} =
\begin{pmatrix}
a_{1,1} & a_{1,2} & \dots & a_{1,n} \\
a_{2,1} & a_{2,2} & \dots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{k,1} & a_{k,2} & \dots & a_{k,n}
\end{pmatrix}.
\]

In the special case when $B_1$ is the standard basis of $\mathbb{R}^n$ and $B_2$ is the standard basis of $\mathbb{R}^k$, we omit the indices $B_1$ and $B_2$ in the notation and write simply $[f]$ for the matrix of $f$ with respect to the standard bases.
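The definition can be turned into a computation: the $i$th column of $[f]_{B_1,B_2}$ consists of the $B_2$-coordinates of $f(v_i)$, i.e. the solution of a linear system. Here is a sketch, again assuming NumPy; the map $f$ and the bases are illustrative choices.

```python
import numpy as np

# Illustrative linear map f : R^2 -> R^2 and bases B1, B2 (basis vectors as columns).
def f(x):
    return np.array([3 * x[0] + x[1], x[0] - x[1]])

B1 = np.array([[1.0, 1.0],
               [0.0, 1.0]])     # v1 = (1,0)^T, v2 = (1,1)^T
B2 = np.array([[2.0, 0.0],
               [0.0, 1.0]])     # w1 = (2,0)^T, w2 = (0,1)^T

# Column i of [f]_{B1,B2} holds the B2-coordinates of f(v_i),
# i.e. the solution a of the linear system  B2 @ a = f(v_i).
M = np.column_stack([np.linalg.solve(B2, f(B1[:, i]))
                     for i in range(B1.shape[1])])
print(M)                        # [[1.5  2. ]
                                #  [1.   0. ]]
```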

The matrix above depends not only on $f$ but also on the chosen bases. The theorem above, together with the uniqueness of the coordinates of a vector relative to a basis, assures that once the bases $B_1$ and $B_2$ are fixed, the matrix $[f]_{B_1,B_2}$ is determined uniquely by $f$. In other words, keeping the bases $B_1, B_2$ fixed, there is a one-to-one correspondence between the linear maps from $\mathbb{R}^n$ to $\mathbb{R}^k$ and the matrices in $\mathbb{R}^{k\times n}$ (but again, we get different matrices for the same linear map if we choose different bases).

This means that we can specify a linear map by giving its matrix, and an advantage of this is that the matrix can be used to calculate the values of the map for an arbitrary vector:

Theorem 2.6.7. Assume that $f:\mathbb{R}^n\to\mathbb{R}^k$ is a linear map, $B_1 = \{v_1,\dots,v_n\}$ is a basis of $\mathbb{R}^n$ and $B_2 = \{w_1,\dots,w_k\}$ is a basis of $\mathbb{R}^k$. If $x\in\mathbb{R}^n$, then

\[
[f(x)]_{B_2} = [f]_{B_1,B_2}\cdot[x]_{B_1}. \tag{14}
\]

That is, if we multiply the matrix of $f$ with respect to $B_1$ and $B_2$ by the coordinate vector of $x$ relative to $B_1$ from the right, then we obtain the coordinate vector of the vector $f(x)$ relative to the basis $B_2$. In the special case when $B_1$ and $B_2$ are the standard bases in $\mathbb{R}^n$ and $\mathbb{R}^k$, respectively, we obtain

\[
f(x) = [f]\cdot x.
\]

Proof. If
\[
[x]_{B_1} =
\begin{pmatrix}
\lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n
\end{pmatrix}
\quad\text{and}\quad
[f]_{B_1,B_2} =
\begin{pmatrix}
a_{1,1} & a_{1,2} & \dots & a_{1,n} \\
a_{2,1} & a_{2,2} & \dots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{k,1} & a_{k,2} & \dots & a_{k,n}
\end{pmatrix},
\]
then

\begin{align*}
f(x) &= f(\lambda_1 v_1 + \dots + \lambda_n v_n) \\
&= \lambda_1 f(v_1) + \dots + \lambda_n f(v_n) \\
&= \lambda_1(a_{1,1} w_1 + \dots + a_{k,1} w_k) + \dots + \lambda_n(a_{1,n} w_1 + \dots + a_{k,n} w_k) \\
&= (a_{1,1}\lambda_1 + a_{1,2}\lambda_2 + \dots + a_{1,n}\lambda_n) w_1 + \dots + (a_{k,1}\lambda_1 + \dots + a_{k,n}\lambda_n) w_k,
\end{align*}
and this gives (14), since the coefficient of $w_j$ here is exactly the $j$th entry of the product $[f]_{B_1,B_2}\cdot[x]_{B_1}$.

If $B_1$ is the standard basis in $\mathbb{R}^n$ and $B_2$ is the standard basis in $\mathbb{R}^k$, then $[x]_{B_1} = x$ and $[f(x)]_{B_2} = f(x)$, hence the second statement follows.
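Formula (14) can also be checked numerically. The sketch below reuses the illustrative map and bases from the previous snippet (all of them assumptions, not data from the text) and verifies that the two sides of (14) agree.

```python
import numpy as np

# Same illustrative setup as before: a map f and bases B1, B2 (columns).
def f(x):
    return np.array([3 * x[0] + x[1], x[0] - x[1]])

B1 = np.array([[1.0, 1.0], [0.0, 1.0]])
B2 = np.array([[2.0, 0.0], [0.0, 1.0]])
M = np.column_stack([np.linalg.solve(B2, f(B1[:, i])) for i in range(2)])

x = np.array([4.0, -1.0])            # an arbitrary test vector
lhs = np.linalg.solve(B2, f(x))      # [f(x)]_{B2}
rhs = M @ np.linalg.solve(B1, x)     # [f]_{B1,B2} . [x]_{B1}
assert np.allclose(lhs, rhs)         # this is exactly (14)
```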

Corollary 2.6.8. Assume that $f:\mathbb{R}^n\to\mathbb{R}^k$ is a linear map and $[f]\in\mathbb{R}^{k\times n}$ is its matrix with respect to the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^k$. If $c_1,\dots,c_n$ are the columns of $[f]$, then $\operatorname{Im} f = \operatorname{span}\{c_1,\dots,c_n\}$. Moreover, $\operatorname{rank}(f) = \operatorname{rank}([f])$ holds.

Proof. Let us introduce the notation $W = \operatorname{span}\{c_1,\dots,c_n\}$. If $x\in\mathbb{R}^n$, then $f(x) = [f]\cdot x$ by the previous theorem, and the product on the right-hand side is a linear combination of the columns of $[f]$, so $\operatorname{Im} f\subset W$. On the other hand, if $y = x_1 c_1 + \dots + x_n c_n\in W$ is a linear combination of the columns, then $y = [f]\cdot x = f(x)$ for $x = (x_1,\dots,x_n)^T$, hence $W\subset\operatorname{Im} f$ and the first statement of the corollary holds. Moreover,

\[
\operatorname{rank}(f) = \dim\operatorname{Im} f = \dim W = \operatorname{rank}([f])
\]
holds by Theorem 2.5.17.
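A small sketch of the corollary, assuming NumPy; the matrix is an illustrative choice. The rank of the map can be read off its standard matrix, and each value $f(x)$ is visibly a combination of the columns.

```python
import numpy as np

# An illustrative standard matrix [f] of a linear map f : R^3 -> R^2.
F = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])

# rank(f) = rank([f]) can be read off the matrix directly:
print(np.linalg.matrix_rank(F))               # 2

# Every value f(x) = [f] . x is a linear combination of the columns of [f]:
x = np.array([2.0, -1.0, 4.0])
assert np.allclose(F @ x, 2 * F[:, 0] - F[:, 1] + 4 * F[:, 2])
```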

As promised, we now give a formula for the rotation of the plane about the origin by the angle $\alpha$. More precisely, we give the matrix of this map:

Proposition 2.6.9. Let $f_\alpha:\mathbb{R}^2\to\mathbb{R}^2$ be the rotation about the origin by the angle $\alpha$. Then $f_\alpha$ is linear and its matrix with respect to the standard bases is

\[
[f_\alpha] =
\begin{pmatrix}
\cos\alpha & -\sin\alpha \\
\sin\alpha & \cos\alpha
\end{pmatrix}.
\]

Proof. We have already seen in Section 2.6.1 that $f_\alpha$ is linear. The first column of $[f_\alpha]$ is
\[
f_\alpha((1,0)^T) = (\cos\alpha, \sin\alpha)^T
\]
by the definition of $\cos\alpha$ and $\sin\alpha$. As $(0,1)^T$ is obtained by rotating $(1,0)^T$ about the origin by the angle $90^\circ$, its image under $f_\alpha$ is obtained from $(\cos\alpha,\sin\alpha)^T$ in the same way, hence
\[
f_\alpha((0,1)^T) = (-\sin\alpha, \cos\alpha)^T,
\]
which is the second column of $[f_\alpha]$, so we are done.
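For a quick numerical sanity check (assuming NumPy; the angle is an arbitrary choice), this matrix indeed rotates $(1,0)^T$ to $(0,1)^T$ when $\alpha = 90^\circ$:

```python
import numpy as np

def rotation_matrix(alpha):
    """[f_alpha] with respect to the standard bases."""
    return np.array([[np.cos(alpha), -np.sin(alpha)],
                     [np.sin(alpha),  np.cos(alpha)]])

# Rotating (1,0)^T by 90 degrees gives (0,1)^T, up to floating-point rounding:
print(rotation_matrix(np.pi / 2) @ np.array([1.0, 0.0]))   # approx. [0. 1.]
```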

Consider the following problem: given a linear map $f$, we are looking for a basis of $\operatorname{Im} f$. We get the matrix $[f]\in\mathbb{R}^{k\times n}$ of $f$ by writing the coordinates of $f(e_i)$ in the $i$th column of a matrix ($1\le i\le n$), where $e_1,\dots,e_n\in\mathbb{R}^n$ is the standard basis. By the previous corollary we need to find a basis of the subspace spanned by the columns of $[f]$. As we have already seen in the proof of Theorem 2.5.17, this means that we have to find a maximal set of independent column vectors of $[f]$, and the details of the algorithm for this task were given at the end of Section 2.5.4.

Now we handle the same problem for the subspace $\ker f$. By Theorem 2.6.4 we have
\[
\dim\ker f = n - \dim\operatorname{Im} f = n - \operatorname{rank}([f]).
\]

The subspace $\ker f$ consists of those vectors $x\in\mathbb{R}^n$ for which the equation $[f]\cdot x = 0$ holds.

Hence we need to find $n - \operatorname{rank}([f])$ independent vectors among the solutions of the equation above, which is equivalent to a system of linear equations. When we apply Gaussian elimination to this system, we get $n - \operatorname{rank}(f)$ free parameters which can be chosen freely, and after that the values of the other variables are determined uniquely. Every solution gives the coordinates of a vector $x$ which solves the matrix equation $[f]\cdot x = 0$. It is easy to see that if we take those vectors that come from the solutions where exactly one of the free parameters is $1$ while the other free parameters are zero, then we get $n - \operatorname{rank}([f])$ independent vectors, so they form a basis in $\ker f$. Indeed, assume that $m = n - \operatorname{rank}(f)$ and $x_{j_1},\dots,x_{j_m}$ are the free parameters, where $1\le j_1 < j_2 < \dots < j_m\le n$ are their indices. For $1\le i\le m$, let $y_i$ be the solution of $[f]\cdot x = 0$ whose $j_i$th coordinate is $1$ but whose $j_l$th coordinate is $0$ for every $1\le l\le m$, $l\ne i$. Then the matrix $Y\in\mathbb{R}^{n\times m}$ whose $i$th column is $y_i$ contains the identity matrix $I_m$ as a sub-matrix, hence $m = r_d(Y) = \operatorname{rank}(Y) = r_c(Y)$, so the columns of $Y$ are independent.
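This free-parameter construction can be sketched in code. The snippet below assumes SymPy (whose `rref` returns the reduced row echelon form together with the pivot columns) and an illustrative matrix; it builds one kernel basis vector per free variable exactly as described above.

```python
import sympy as sp

# An illustrative matrix of a linear map with respect to the standard bases.
F = sp.Matrix([[1, 2, 0, 3],
               [0, 0, 1, 4]])

R, pivots = F.rref()                    # reduced row echelon form, pivot columns
free = [j for j in range(F.cols) if j not in pivots]

# One kernel basis vector per free variable: set that free variable to 1, the
# other free variables to 0, and read the pivot variables off the rows of R.
basis = []
for j in free:
    v = sp.zeros(F.cols, 1)
    v[j] = 1
    for row, p in enumerate(pivots):
        v[p] = -R[row, j]
    basis.append(v)

assert all(F * v == sp.zeros(F.rows, 1) for v in basis)
print(basis)                            # the same vectors as F.nullspace()
```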

Note that the special choice of the bases (the standard ones) has no real significance here; we made this choice only for simplicity. The argument above can be repeated (with appropriate minor changes) for the matrix $[f]_{B_1,B_2}$, where $B_1$ and $B_2$ are arbitrary bases of $\mathbb{R}^n$ and $\mathbb{R}^k$, respectively. The details are left to the reader.

Example 2.6.4. Let $f:\mathbb{R}^5\to\mathbb{R}^4$ be the linear map given by the matrix

\[
[f] = A =
\begin{pmatrix}
2 & 8 & 6 & 4 & 2 \\
1 & 2 & -1 & 12 & 7 \\
-1 & -1 & 3 & -12 & 0 \\
5 & 22 & 19 & 4 & 7
\end{pmatrix}.
\]

After applying Gaussian elimination to the system given by the matrix $(A\,|\,0)$ we obtain the following reduced row echelon form:

\[
\left(\begin{array}{ccccc|c}
1 & 0 & -5 & 0 & -31 & 0 \\
0 & 1 & 2 & 0 & 7 & 0 \\
0 & 0 & 0 & 1 & 2 & 0
\end{array}\right).
\]

The columns that contain a leading coefficient are independent, and the corresponding columns of $A$ (that is, the first, second and fourth columns) give a basis of $\operatorname{Im} f$.

The free parameters are the third and the fifth variables (say $x_3$ and $x_5$). The solution that comes from $x_3 = 1$ and $x_5 = 0$ is $(5,-2,1,0,0)^T$, while the solution that comes from $x_3 = 0$ and $x_5 = 1$ is $(31,-7,0,-2,1)^T$. These two vectors form a basis in $\ker f$.
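The computation in this example can be double-checked in a few lines, assuming SymPy is available (this tooling is not part of the text); SymPy's `nullspace` follows the same free-parameter convention as the construction above.

```python
import sympy as sp

A = sp.Matrix([[ 2,  8,  6,   4, 2],
               [ 1,  2, -1,  12, 7],
               [-1, -1,  3, -12, 0],
               [ 5, 22, 19,   4, 7]])

R, pivots = A.rref()
print(pivots)         # (0, 1, 3): the first, second and fourth columns are pivots
print(A.nullspace())  # the two kernel basis vectors found in the example
```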

Exercise 2.6.1. Give an alternative proof of the dimension theorem: show (without using it) that the vectors in $\ker f$ that are obtained by the method above are not only independent but in fact span $\ker f$. Deduce the statement of the theorem from this.