
I. LINEAR EQUATIONS AND MATRIX ALGEBRA

Many physical problems are described by sets of simultaneous algebraic equations. Further, more difficult physical problems lead to approximations involving sets of these equations. For instance, the numerical approximation for the multigroup diffusion method results in rather simple algebraic equations.

The frequency with which sets of simultaneous algebraic equations arise motivates the introduction of a matrix notation. This notation provides a compact and convenient statement of physical and mathematical relationships, and lends itself readily to theoretical investigations and deductions. Furthermore, matrix notation leads to useful interpretation of simultaneous equations and greater understanding, which in turn induces improved methods of solution.

In this chapter we shall introduce this simplified formulation of linear algebra. We first define matrices and operations with matrices and then discuss properties of special matrices. Following the introduction of a geometric interpretation of matrix equations, we shall derive many matrix relations applied later in the text. Special attention is directed to relations of use in nuclear engineering.

1.1 Linear Equations and Matrix Notation

A simple set of linear equations in three variables might be given as

3x + 2y + z = 2 ,
x − 2y + 4z = 1 ,          (1.1.1)
−x − y + 2z = −1 .


The solution of Eqs. (1.1.1) may be found by substitution, determinants, or other means. For the moment, we postpone a discussion of solving the equations. In problems with more than three variables, the notation of Eqs. (1.1.1) is inconvenient, and we adopt a more general subscript notation. Equations (1.1.1) are written in the form

a_{11} x_1 + a_{12} x_2 + a_{13} x_3 = y_1 ,
a_{21} x_1 + a_{22} x_2 + a_{23} x_3 = y_2 ,          (1.1.2)
a_{31} x_1 + a_{32} x_2 + a_{33} x_3 = y_3 .

The quantities x_1, x_2, x_3 are the variables or unknowns. The elements on the right-hand side, y_1, y_2, y_3, are assumed known, as are the coefficients a_{ij}. The notation of Eqs. (1.1.2) is conveniently extended to problems of many unknowns. Each equation of (1.1.2) is represented by one line or row of the set of equations. The first equation can be written in the compact form

y_1 = \sum_{j=1}^{3} a_{1j} x_j .          (1.1.3)

Note the summation is over the index identifying the column of the set of equations. In a similar manner, the entire set of equations may be written

y_i = \sum_{j=1}^{3} a_{ij} x_j ,   i = 1, 2, 3.          (1.1.4)

For n equations in n unknowns, the set of equations may be written

y_i = \sum_{j=1}^{n} a_{ij} x_j ,   i = 1, 2, ..., n.          (1.1.5)

The notation may be simplified even further by defining several arrays of elements. We define the one-column arrays

x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} ,   y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}

as column matrices. Similarly, we define the one-row array

[a_{i1}, a_{i2}, ..., a_{in}]          (1.1.6)


as a row matrix. The ith equation of the set (1.1.5) may then be written

[a_{i1}, a_{i2}, ..., a_{in}] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = y_i .          (1.1.7)

The definition (1.1.7) implies that the element in the jth column of the row matrix multiplies the element in the jth row of the column matrix.

We define the entire array of coefficients as the square matrix

\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} .

The entire set of equations may then be written

\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} .          (1.1.8)

The ith equation of the set is found by multiplying the elements of the ith row of the square matrix into the column matrix of x_j's.

The notation may be further simplified by denoting the one-column matrices as single quantities, such as

x = [x_j] ,   y = [y_i] .          (1.1.9)

Similarly, we denote the square array as

A = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} .          (1.1.10)


The set of equations (1.1.8) becomes

\left[ \sum_{j=1}^{n} a_{ij} x_j \right] = [y_i] ,          (1.1.11)

or, equivalently,

A x = y .          (1.1.12)

The form of equation (1.1.12) suggests that the quantity A multiplies

the quantity x. We shall call this operation the multiplication of a column matrix by a square matrix. Obviously the multiplication is defined only when the number of columns of A equals the number of rows of x. It is easily seen that the definition of multiplication may be extended to the case where the matrix A is rectangular rather than square, provided only that the number of columns of A equals the number of rows of x. A matrix of m rows and n columns is referred to as an m by n matrix.
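As a small illustration of the matrix form (1.1.12), the following sketch builds a coefficient matrix A and a column matrix y in NumPy and recovers x; the numerical entries are illustrative examples only, not values taken from the text.

    import numpy as np

    # Illustrative 3-by-3 system A x = y (arbitrary example coefficients)
    A = np.array([[3.0, 2.0, 1.0],
                  [1.0, -2.0, 4.0],
                  [-1.0, -1.0, 2.0]])
    y = np.array([2.0, 1.0, -1.0])

    x = np.linalg.solve(A, y)       # solve the simultaneous equations
    print(x)
    print(np.allclose(A @ x, y))    # verify A x = y, Eq. (1.1.12)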

1.2 Matrix Operations

Matrices may be manipulated in a manner similar to numbers. The rules for manipulation are derivable from previous results. We define two matrices as equal if corresponding elements are equal. The rule for the addition of matrices can be derived by noting that

\sum_{j} a_{ij} x_j + \sum_{j} b_{ij} x_j = \sum_{j} (a_{ij} + b_{ij}) x_j ,          (1.2.1)

and hence

[a_{ij}] + [b_{ij}] = [a_{ij} + b_{ij}] .          (1.2.2)

Thus addition of matrices is performed by adding corresponding elements. The definition applies only when A and B have the same number of rows and columns. Addition of matrices is commutative and associative:

A + B = B + A ,          (1.2.3)
A + (B + C) = (A + B) + C .          (1.2.4)

The rule for multiplication of two matrices may be derived by

considering two sets of simultaneous equations. Consider the sets of equations

A x = y ,          (1.2.5)

and

B y = z ,          (1.2.6)


where the products are assumed to exist. The ith equation of (1.2.5) is

y_i = \sum_{j} a_{ij} x_j ,          (1.2.7)

whereas the kth equation of (1.2.6) is

z_k = \sum_{i} b_{ki} y_i .          (1.2.8)

Thus,

z_k = \sum_{i} b_{ki} \sum_{j} a_{ij} x_j = \sum_{j} \left( \sum_{i} b_{ki} a_{ij} \right) x_j .          (1.2.9)

In matrix notation we have

z = B y ,          (1.2.10)
y = A x ,          (1.2.11)
z = B A x .          (1.2.12)

Consequently,

[(B A)_{kj}] = \left[ \sum_{i} b_{ki} a_{ij} \right] .          (1.2.13)

The summation in Eq. (1.2.13) is to extend over the columns of B and the rows of A. Therefore, matrix multiplication is defined only when the number of columns of the first matrix equals the number of rows of the second matrix. The product matrix will have as many rows as B and as many columns as A.

It is easily seen that matrix multiplication is associative and distributive

A (B C) = (A B) C ,          (1.2.14)
A (B + C) = A B + A C .          (1.2.15)

It is easily shown that matrix multiplication is not commutative; that is,

A B ≠ B A          (1.2.16)

in general. Note that if A and B are not square, the products cannot be equal. Even for square matrices, the matrices do not commute in general. For the special case when A B = B A, we say the matrices are commutative.
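These rules are easy to verify numerically. The sketch below forms the product elements directly from the sum in Eq. (1.2.13) and compares the result with NumPy's built-in product; it also shows that two square matrices generally do not commute. The matrices are random test data.

    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.standard_normal((2, 3))     # 2 by 3
    A = rng.standard_normal((3, 4))     # 3 by 4: columns of B equal rows of A

    # (B A)_kj = sum_i b_ki a_ij, Eq. (1.2.13)
    BA = np.zeros((B.shape[0], A.shape[1]))
    for k in range(B.shape[0]):
        for j in range(A.shape[1]):
            BA[k, j] = sum(B[k, i] * A[i, j] for i in range(B.shape[1]))
    print(np.allclose(BA, B @ A))       # True: same as the built-in product

    # Square matrices generally do not commute, Eq. (1.2.16)
    C = rng.standard_normal((3, 3))
    D = rng.standard_normal((3, 3))
    print(np.allclose(C @ D, D @ C))    # False in general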


Occasionally it is convenient to partition a matrix into smaller matrices or submatrices. Thus, if

A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} ,          (1.2.17)

then a partition of A might be

A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} ,

where the submatrices A_{11}, A_{12}, A_{21}, A_{22} are

A_{11} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} ,   A_{12} = \begin{bmatrix} a_{13} \\ a_{23} \end{bmatrix} ,   A_{21} = [a_{31}, a_{32}] ,   A_{22} = [a_{33}] .          (1.2.18)

The matrix A is called a supermatrix. Although we shall not use more than the two levels of matrices illustrated here, it is apparent that any number of levels could be used. The usual rules of matrix algebra apply at all levels.
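The partitioning of Eqs. (1.2.17)-(1.2.18) carries over directly to array software. The sketch below (with arbitrary numerical entries standing in for the a_{ij}) slices a 3 by 3 array into the four submatrices and reassembles the supermatrix.

    import numpy as np

    A = np.arange(1.0, 10.0).reshape(3, 3)   # arbitrary 3 by 3 example

    # Partition after the second row and second column, as in Eq. (1.2.18)
    A11, A12 = A[:2, :2], A[:2, 2:]
    A21, A22 = A[2:, :2], A[2:, 2:]

    # Reassembling the supermatrix from its submatrices recovers A
    A_rebuilt = np.block([[A11, A12],
                          [A21, A22]])
    print(np.allclose(A, A_rebuilt))         # True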

1.3 Determinants

A determinant may be associated with any square matrix. Whereas a matrix is an ordered collection of numbers, a determinant represents a quantity having just one value. The determinant of a square matrix is the sum of the n! terms, all formed differently, each of which is constructed of n factors, one and only one factor being chosen from each row and column. No two factors may come from either the same row or the same column. The sign of each term is determined by drawing straight lines connecting each factor with every other factor in the given term. If the number of these lines from all factors sloping upward to the


right is odd, the sign of the term is negative, and if the number of these lines is even, the sign of the term is positive. The determinant is represented as follows:

|A| = |[a_{ij}]| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} .          (1.3.1)

As an example, note that

\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = (a_{11} a_{22} a_{33}) − (a_{11} a_{23} a_{32}) + (a_{12} a_{23} a_{31}) − (a_{12} a_{21} a_{33}) + (a_{13} a_{21} a_{32}) − (a_{13} a_{22} a_{31}) .

The determinant formed from an n by n array of numbers is said to be of order n.

There are a number of useful theorems that facilitate the evaluation of a determinant:

1. If all elements of a row or column are zero, the determinant is zero.

2. If all elements of a row or column are multiplied by the same factor, the determinant is multiplied by that factor.

3. The interchange of two rows or two columns changes the sign of the determinant but otherwise leaves its value unaltered.

4. Interchanging the rows and columns of a determinant does not change the value of the determinant.

5. If each element of a row of a determinant is the sum of two terms, then the value of the determinant equals the sum of the values of two determinants, one formed by omission of the first term of each binomial and the other formed by omission of the second term in each binomial.

6. The value of a determinant is not altered by adding to the elements of any row a constant multiple of the corresponding elements of any other row. Likewise, for columns.

The proof of these theorems is left to the problems.

The Laplace development of a determinant will enable us to find the rule for solving an array of linear equations. The minor M_{ij} of an element a_{ij} is the determinant of the matrix formed by deleting the ith row and the jth column of the original determinant. The cofactor C_{ij} of a_{ij} is then defined by

C_{ij} = (−1)^{i+j} M_{ij} .          (1.3.2)


The Laplace development of a determinant is then given by

|A| = \sum_{j=1}^{n} a_{ij} C_{ij} ,          (1.3.3a)

or

|A| = \sum_{i=1}^{n} a_{ij} C_{ij} .          (1.3.3b)

In words, the determinant is equal to the sum of the products of the elements in any row or column by their corresponding cofactors. The validity of this theorem follows immediately from the definition of the determinant, since a_{ij} C_{ij} is just the sum over all terms containing the element a_{ij}.

The sum of the products of the elements in any row by the cofactors of corresponding elements in another row is zero:

\sum_{j=1}^{n} a_{ij} C_{kj} = 0   (i ≠ k).          (1.3.4a)

Similarly for columns,

\sum_{i=1}^{n} a_{ij} C_{ik} = 0   (k ≠ j).          (1.3.4b)

The proof follows from the observation that the sum (1.3.4) is merely the determinant itself with one of its original rows replaced by another of its original rows. Such a determinant is zero since, by Theorem 6 above relating to the evaluation of determinants, we could reduce one of the identical rows to zero by subtracting the other from it. Then by Theorem 1 above, the determinant would be zero. A similar development for columns applies.

The unknown x_k in a set of n linear equations in n unknowns is easily found by multiplying the equations (1.1.5) by C_{ik}, by summing over i from 1 to n, and by use of the relation (1.3.4b) above:

|A| x_k = \sum_{i=1}^{n} C_{ik} y_i .          (1.3.5)

This result is known as Cramer's rule. A solution exists only if

|A| ≠ 0 .          (1.3.6)


Matrices satisfying this last condition are called nonsingular; matrices whose determinants are zero are called singular. We note that the solution exists and is unique if the number of unknowns equals the number of equations and if the matrix formed from the coefficients is nonsingular.
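Cramer's rule translates directly into code. In the equivalent column-replacement form, x_k = |A_k| / |A|, where A_k is A with its kth column replaced by y; this is the same quantity as the cofactor sum in Eq. (1.3.5). The sketch below uses arbitrary example data and checks the result against a library solver.

    import numpy as np

    def cramer_solve(A, y):
        """Solve A x = y by Cramer's rule: x_k = |A_k| / |A|, where A_k is A
        with its kth column replaced by y (equivalent to Eq. (1.3.5))."""
        detA = np.linalg.det(A)
        if np.isclose(detA, 0.0):
            raise ValueError("singular matrix: |A| = 0, see Eq. (1.3.6)")
        x = np.empty(A.shape[0])
        for k in range(A.shape[0]):
            Ak = A.copy()
            Ak[:, k] = y
            x[k] = np.linalg.det(Ak) / detA
        return x

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    y = np.array([1.0, 2.0, 3.0])
    print(np.allclose(cramer_solve(A, y), np.linalg.solve(A, y)))   # True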

The product of two determinants |A| and |B| is equal to the determinant |A B| of the product. This fact is proved in a straightforward manner. By Theorem 5 for the evaluation of determinants, the determinant of the product can be expanded in n^n determinants of the form

\begin{vmatrix} a_{1k_1} b_{k_1 1} & a_{1k_2} b_{k_2 2} & \cdots & a_{1k_n} b_{k_n n} \\ a_{2k_1} b_{k_1 1} & a_{2k_2} b_{k_2 2} & \cdots & a_{2k_n} b_{k_n n} \\ \vdots & & & \vdots \\ a_{nk_1} b_{k_1 1} & a_{nk_2} b_{k_2 2} & \cdots & a_{nk_n} b_{k_n n} \end{vmatrix} ,          (1.3.7)

where k_1, k_2, ..., k_n stand for any n values of the subscript j. Only the determinants in which the values of all k_j are different contribute to the sum in the expansion of the determinant of the product, by Theorems 1, 2, and 6 above. (If any two columns are multiples of each other, the determinant is zero.) Therefore, the sum of n^n terms in the expansion of the determinant of the product consists of only n! terms each of the form

b_{k_1 1} b_{k_2 2} \cdots b_{k_n n} \begin{vmatrix} a_{1k_1} & a_{1k_2} & \cdots & a_{1k_n} \\ a_{2k_1} & a_{2k_2} & \cdots & a_{2k_n} \\ \vdots & & & \vdots \\ a_{nk_1} & a_{nk_2} & \cdots & a_{nk_n} \end{vmatrix} ,          (1.3.8)

in which all k_j are different. An interchange of the columns of the determinant shown reduces it to the exact form of |A|. We have thus n! terms, all different, in the expansion of |A B| made above, each comprising |A| times one term of |B| together with the correct sign. Thus

|A B| = |A| |B| ,          (1.3.9)

as was to be proved.

1.4 Solution of Simultaneous Equations

We now consider a systematic procedure for solving sets of equations and determine conditions under which solutions do exist. The procedure


to be outlined is called the Gauss reduction. Consider a set of m equations in n unknowns

\sum_{j=1}^{n} a_{ij} x_j = y_i   (i = 1, 2, ..., m),          (1.4.1)

or

a_{11} x_1 + a_{12} x_2 + ... + a_{1n} x_n = y_1 ,
a_{21} x_1 + a_{22} x_2 + ... + a_{2n} x_n = y_2 ,
  ...                                                          (1.4.2)
a_{m1} x_1 + a_{m2} x_2 + ... + a_{mn} x_n = y_m .

We assume the coefficient a_{11} ≠ 0; otherwise we renumber the equations so that a_{11} ≠ 0. We may eliminate the variable x_1 from the other m − 1 equations. To this end divide the first equation by a_{11} to obtain

x_1 + \frac{a_{12}}{a_{11}} x_2 + ... + \frac{a_{1n}}{a_{11}} x_n = \frac{y_1}{a_{11}} ,          (1.4.3a)

or

x_1 + a'_{12} x_2 + ... + a'_{1n} x_n = y'_1 .          (1.4.3b)

We multiply Eq. (1.4.3b) successively by a_{21}, a_{31}, ..., a_{m1} and subtract the resultant equations from the second, third, etc. equations of (1.4.2). The result is a set of equations of the form

x_1 + a'_{12} x_2 + ... + a'_{1n} x_n = y'_1 ,
      a'_{22} x_2 + a'_{23} x_3 + ... + a'_{2n} x_n = y'_2 ,
      ...                                                      (1.4.4)
      a'_{m2} x_2 + a'_{m3} x_3 + ... + a'_{mn} x_n = y'_m .

We now divide the second equation of (1.4.4) by a'_{22} and eliminate x_2 from the remaining m − 2 equations as before. We continue in this manner to eliminate the unknowns x_i. If m = n, the set of equations takes the form

x_1 + a'_{12} x_2 + ... + a'_{1n} x_n = y'_1 ,
      x_2 + a''_{23} x_3 + ... + a''_{2n} x_n = y''_2 ,
      ...                                                      (1.4.5)
      a''_{nn} x_n = y''_n .


If a''_{nn} ≠ 0, then by back substitution we may evaluate the x_i. If a''_{nn} = 0 and y''_n = 0, then x_n is indeterminate, and we do not obtain a unique solution. It is easily shown that a''_{nn} = 0 only if |A| = 0. If a''_{nn} = 0 and y''_n ≠ 0, then no solution to the equations exists.

The results may be generalized for m ≠ n. If m > n the reduction process will lead to a set of equations of the form

x_1 + a'_{12} x_2 + ... + a'_{1n} x_n = y'_1 ,
      x_2 + a''_{23} x_3 + ... + a''_{2n} x_n = y''_2 ,
      ...
      x_n = y'_n ,                                             (1.4.6)
      0 = y'_{n+1} ,
      ...
      0 = y'_m .

If the y'_{n+1}, y'_{n+2}, ..., y'_m are all zero, then we again have a unique solution; the last m − n equations are merely linear combinations of the first n equations. On the other hand, if any y'_p (n < p ≤ m) are not zero, then the equations are inconsistent and no solution exists. In like manner, for m < n the reduction leads to

x_1 + a'_{12} x_2 + ... + a'_{1n} x_n = y'_1 ,
      x_2 + a''_{23} x_3 + ... + a''_{2n} x_n = y''_2 ,
      ...                                                      (1.4.7)
      x_m + ... + a'_{mn} x_n = y'_m .

In this case the variables x_{m+1}, x_{m+2}, ..., x_n may be assigned arbitrarily and the remaining x_i determined in terms of the arbitrary variables. Obviously there is not a unique solution in this case.
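A minimal sketch of the Gauss reduction for the square case m = n is given below. The renumbering of equations is implemented here as partial pivoting, and the triangular form (1.4.5) is then solved by back substitution; the numerical values are arbitrary test data.

    import numpy as np

    def gauss_reduce_solve(A, y):
        """Gauss reduction followed by back substitution, a sketch of
        Eqs. (1.4.2)-(1.4.5); assumes a nonsingular square A."""
        A = A.astype(float).copy()
        y = y.astype(float).copy()
        n = len(y)
        for i in range(n):
            # Renumber (swap) equations so the pivot a_ii is nonzero
            p = i + int(np.argmax(np.abs(A[i:, i])))
            A[[i, p]], y[[i, p]] = A[[p, i]], y[[p, i]]
            # Normalize row i, then eliminate x_i from the rows below
            y[i] /= A[i, i]
            A[i] /= A[i, i]
            for r in range(i + 1, n):
                y[r] -= A[r, i] * y[i]
                A[r] -= A[r, i] * A[i]
        # Back substitution on the triangular form (1.4.5)
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = y[i] - A[i, i + 1:] @ x[i + 1:]
        return x

    A = np.array([[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]])
    y = np.array([8.0, -11.0, -3.0])
    print(np.allclose(gauss_reduce_solve(A, y), np.linalg.solve(A, y)))   # True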

The above results may be expressed in a compact theorem. To this end we introduce the concept of the rank of a matrix and define the coefficient and augmented matrices associated with a set of linear equations. Consider the set of equations (1.4.1). The coefficient matrix associated with this set of equations is

\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} .          (1.4.8)


The augmented matrix is defined as the m by n + 1 matrix formed by appending the column matrix [y_i] to the coefficient matrix. Thus, the augmented matrix associated with Eq. (1.4.1) is

\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} & y_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & y_2 \\ \vdots & & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & y_m \end{bmatrix} .          (1.4.9)

The rank of a matrix is defined to be the order of the largest nonvanishing determinant contained in the matrix. Obviously, the rank of the coefficient matrix can never exceed the rank of the augmented matrix.

The rank of a matrix is unaltered by multiplying all the elements of a row by a nonzero constant or by adding a row times a constant to another row of the matrix. The result follows from Theorems 2 and 6 relating to the evaluation of determinants.

The simple theorem relating to the solution of a system of linear equations may now be stated as follows: a solution to a system of linear equations exists if and only if the ranks of the coefficient and augmented matrices are equal. The proof of the theorem follows from the Gauss reduction. The ranks of the coefficient matrices of the arrays of Eqs. (1.4.2) and (1.4.6) are equal by Theorems 2 and 6. Likewise, the ranks of the augmented matrices are equal. Accordingly, consider Eq. (1.4.6); the coefficient matrix is of rank n. If any y'_p ≠ 0, n < p ≤ m, then the augmented matrix is of rank greater than n. But under these circumstances, no solution exists, and hence the theorem follows.

From the previous work, it is also clear that if the common rank, r, of the augmented and coefficient matrices is less than the number, n, of unknowns, then n − r of the unknowns may have their values assigned arbitrarily. In this case, the remaining variables are uniquely determined as linear functions of the n − r unknowns whose values have been arbitrarily chosen.

A special case occurs if all the inhomogeneous terms y_i are zero. In this case the coefficient and augmented matrices are always of the same rank, and hence a solution always exists. However, this result is evident, since in this case we have the trivial solution x_i = 0. A nontrivial solution will exist only if the rank of the coefficient matrix is less than the number of unknowns; otherwise only the trivial solution exists.
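The rank criterion is easy to check numerically. In the sketch below (an arbitrary 2 by 2 example), numpy.linalg.matrix_rank stands in for the order of the largest nonvanishing determinant; a solution exists exactly when the coefficient and augmented matrices have equal rank.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0]])             # rank 1: second row is twice the first
    y_consistent = np.array([3.0, 6.0])    # lies in the column space of A
    y_inconsistent = np.array([3.0, 7.0])  # does not

    for y in (y_consistent, y_inconsistent):
        aug = np.column_stack([A, y])      # augmented matrix, Eq. (1.4.9)
        equal_rank = np.linalg.matrix_rank(A) == np.linalg.matrix_rank(aug)
        print("solution exists:", equal_rank)   # True, then False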

1.5 Special Matrices and Their Properties

There are many special matrices that are of frequent interest. We shall assume throughout the chapter that all of the matrices have real elements. There are generalizations of the results to matrices with complex elements, but the generalizations are not of interest for this work. The zero matrix 0 is a square matrix, each of whose elements is zero. The product of the zero matrix and any other matrix is a zero matrix.

Note that if the product of two matrices is zero, we cannot conclude that one of the matrices is zero, however. Consider the simple example

\begin{bmatrix} 1 & 1 \\ −1 & −1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ −1 & −1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} .          (1.5.1)

The unit matrix I is a square matrix, each of whose nondiagonal elements is zero, and each of whose diagonal elements is unity:

I = [\delta_{ij}] ,          (1.5.2)

where \delta_{ij} is the Kronecker delta function

\delta_{ij} = \begin{cases} 0 , & i ≠ j , \\ 1 , & i = j . \end{cases}          (1.5.3)

The product of the unit matrix with any other matrix A of the same order is merely A. Further, the unit matrix commutes with any other matrix.

A diagonal matrix is a matrix each of whose nondiagonal elements is zero:

D = \begin{bmatrix} d_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & d_{nn} \end{bmatrix} .          (1.5.4)

Two diagonal matrices commute, but a diagonal matrix does not commute with other matrices in general. A scalar matrix is a diagonal matrix all of whose diagonal elements are equal. A scalar matrix commutes with any other matrix.

The transpose of a matrix A, denoted A^T, is formed from A by interchanging rows and columns of A. Therefore

A^T = [a^T_{ij}] = [a_{ji}] .          (1.5.5)

The transpose of a product of matrices satisfies the relation

(A B)^T = B^T A^T ,          (1.5.6)

a result readily proved. Note that

(A B)^T = \left[ \sum_{k} a_{ik} b_{kj} \right]^T = \left[ \sum_{k} b^T_{ik} a^T_{kj} \right] = B^T A^T .          (1.5.7)

If A^T = A, then the matrix is said to be symmetric. If A^T = −A, then

the matrix is said to be antisymmetric.


The adjoint¹ of a matrix is defined only for square matrices. We define the adjoint of A, written adj A, as the matrix formed by replacing each element of A by the cofactor of its transpose:

adj A = [adj a_{ij}] = [C_{ji}] ,          (1.5.8)

where C_{ij} is the cofactor of the ijth element of A. It is easily proved that

adj (A B) = (adj B)(adj A) .          (1.5.9)

The inverse of a matrix A, written A^{-1}, is a matrix such that

A A^{-1} = I .          (1.5.10)

Note that

A^{-1} A = I .          (1.5.11)

Since

|A| |A^{-1}| = |A A^{-1}| = |I| = 1 ,          (1.5.12)

the inverse of a matrix exists only for nonsingular square matrices.

Let a^{-1}_{jk} be the jkth element of the inverse matrix A^{-1}. Then

\sum_{j} a_{ij} a^{-1}_{jk} = \delta_{ik} .          (1.5.13)

To find the elements of A^{-1}, we recall the Laplace expansion theorem (1.3.3), which can be written

\sum_{j} a_{ij} \frac{C_{kj}}{|A|} = \delta_{ik} .          (1.5.14)

Hence, if

A^{-1} = \left[ \frac{C_{ji}}{|A|} \right] = \frac{adj A}{|A|} ,          (1.5.16)

Eq. (1.5.13) will be satisfied. The uniqueness of A^{-1} is proved by supposing that there were a second inverse, say B. In this case, A (A^{-1} − B) = I − I = 0.

1 In this book we have no need to define the Hermitian adjoint, often called merely the adjoint, of a matrix. The Hermitian adjoint and the adjoint are not related.


Now, multiply on the left by either inverse to learn that B = A^{-1} and the two inverses are identical. It is easily shown that

(A B)^{-1} = B^{-1} A^{-1} .          (1.5.17)

The inverse matrix is essentially that which has been calculated in Cramer's rule (1.3.5). If

A x = y ,          (1.5.18)

then

x = A^{-1} y .          (1.5.19)

If, for a real matrix A,

A^T = A^{-1} ,          (1.5.20)

then the matrix A is called orthogonal. Note that then

A^T A = A A^T = I ,          (1.5.21)

and

|A^T| |A| = |A|^2 = 1 ,          (1.5.22)

and consequently the determinant of an orthogonal matrix is ±1.
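The cofactor construction of the inverse, Eq. (1.5.16), and the determinant property of an orthogonal matrix can both be sketched in a few lines; the matrices below are arbitrary examples.

    import numpy as np

    def adjugate(A):
        """adj A = [C_ji]: the cofactor of a_ij placed in the transposed
        position, Eq. (1.5.8)."""
        n = A.shape[0]
        adj = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
                adj[j, i] = (-1) ** (i + j) * np.linalg.det(minor)
        return adj

    A = np.array([[2.0, 0.0, 1.0],
                  [1.0, 3.0, 2.0],
                  [1.0, 1.0, 2.0]])
    A_inv = adjugate(A) / np.linalg.det(A)      # Eq. (1.5.16)
    print(np.allclose(A @ A_inv, np.eye(3)))    # A A^{-1} = I, Eq. (1.5.10)

    # A real orthogonal matrix: Q^T = Q^{-1} and |Q| = +1 or -1
    theta = 0.3
    Q = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    print(np.allclose(Q.T @ Q, np.eye(2)), round(np.linalg.det(Q), 6))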

1.6 Vector Interpretation

Matrix equations may be given a very convenient and useful interpretation in terms of vectors and operations among these vectors. Vectors may be interpreted as special cases of matrices of such importance that a special abbreviated notation is used. As we shall see, operations on these vectors may then be given a geometric interpretation. We recall that a vector² in three dimensions may be written

t = t_1 i + t_2 j + t_3 k ,          (1.6.1)

where i, j, and k are unit vectors along three mutually perpendicular coordinate axes, and t_1, t_2, and t_3 are the components of t along the various axes. If we define the row matrix E as

E = (i, j, k) ,          (1.6.2)

² We define a vector here as an ordered collection of n entities, called components, in an n-dimensional space, without implying any particular transformation properties. [A vector is also often defined to be a quantity whose components transform as the coordinates. We do not use this definition in this book.]


then Eq. (1.6.1) can be written

t = E \begin{bmatrix} t_1 \\ t_2 \\ t_3 \end{bmatrix} .          (1.6.3)

It is usually convenient to assume the underlying coordinate system E is fixed throughout the discussion and to denote the vector t as a column matrix

t = \begin{bmatrix} t_1 \\ t_2 \\ t_3 \end{bmatrix} .          (1.6.4)

We shall adopt this shorthand notation and shall further assume the coordinate system E is constructed of mutually orthogonal axes.³

The scalar product of two vectors, t and u, in vector analysis is

(t, u) = t_1 u_1 + t_2 u_2 + t_3 u_3 .          (1.6.5)

In matrix notation the scalar product is

(t, u) = u^T t = t^T u ,          (1.6.6)

where the transpose of a column matrix is a row matrix. Frequently we shall refer to a column matrix as a column vector.

Matrix equations may also be given a useful vector interpretation.

The equation

y = A x          (1.6.7)

is interpreted as a relation between two vectors y and x. In particular, the matrix A acts as a transformation which transforms the vector x into another vector y. An alternative viewpoint is to consider x and y as the same vector expressed in two different coordinate systems. The matrix A then specifies the relation between the components of the vector in the two different coordinate systems. A geometric portrayal of the two different interpretations is given in Figs. 1.6.1 and 1.6.2.

Either interpretation of the equation is found to be useful. For our later purposes, the first viewpoint will be more frequently employed.

The concepts of the vector interpretation of matrices may be extended to n-dimensional spaces in a straightforward manner.

3 If the coordinate system is not an orthogonal system, the results to be obtained subsequently must be generalized. See Section 1.12.


FIG. 1.6.1. Geometric view of the matrix equation A x = y considered as a transformation of a vector.

FIG. 1.6.2. Geometric view of the matrix equation A x = y considered as a transformation of the coordinate system.


The column matrix

x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}          (1.6.8)

is interpreted as a vector in n space, where the x_i are the components of x along the ith coordinate axis.

1.7 Matrix Functions and Similarity Transformations

We can now define under certain conditions a function f(A) of a nonsingular matrix A, since both positive and negative integral powers of this matrix are available. If f(x) may be expanded in a Laurent series so that

f(x) = \sum_{i=-\infty}^{\infty} b_i x^i ,          (1.7.1)

then

f(A) = \sum_{i=-\infty}^{\infty} b_i A^i ,          (1.7.2)

where b_i is the coefficient of A^i, b_i not being a matrix. If A is symmetric, then f(A) will be symmetric. We observe that two functions f and g of the same matrix A commute:

f(A) g(A) = g(A) f(A) .          (1.7.3)

Two matrices A and B are called equivalent if and only if they are related by two nonsingular matrices R and Q as follows:

R A Q = B .          (1.7.4)

The factor R merely causes each new row of B to be a linear combination of the original rows of A, and the factor Q merely linearly combines the old columns of A into new columns of B, as follows from the definition of a product. The matrix operator R may also exchange rows; the matrix Q may exchange columns. Since these operations leave the rank of a matrix unchanged, A and B have the same rank.

The matrices R and Q that linearly combine the rows or columns of

A in a particular way are easily constructed by linearly combining the


rows or columns, respectively, of the unit matrix in the same way. The first and second rows, for example, are interchanged by the nonsingular operator

\begin{bmatrix} 0 & 1 & 0 & 0 & \cdots \\ 1 & 0 & 0 & 0 & \cdots \\ 0 & 0 & 1 & 0 & \cdots \\ 0 & 0 & 0 & 1 & \cdots \\ \vdots & & & & \ddots \end{bmatrix} .

Again, a multiple C of the second row of A is added to the first row of A by the operator

\begin{bmatrix} 1 & C & 0 & 0 & \cdots \\ 0 & 1 & 0 & 0 & \cdots \\ 0 & 0 & 1 & 0 & \cdots \\ \vdots & & & & \ddots \end{bmatrix} .

The matrix is nonsingular. Since exchanging the rows or columns of a matrix and linearly combining the rows or columns of a matrix do not alter the rank, the matrices R and Q are clearly nonsingular, since the unit matrix is.

If R = Q^{-1}, the transformation is called a similarity transformation:

B = Q^{-1} A Q .          (1.7.5)

If, on the other hand, R = Q^T, the transformation is called a congruence transformation:

B = Q^T A Q .          (1.7.6)

If R = Q^{-1} = Q^T, so that Q is orthogonal, the transformation is called an orthogonal transformation.

All matrix relations are equally valid if all matrices occurring in these relations are subjected to the same similarity transformation. If A B = C,

Q^{-1} C Q = (Q^{-1} A Q)(Q^{-1} B Q) ,          (1.7.7)

and if A + B = C,

Q^{-1} C Q = Q^{-1} A Q + Q^{-1} B Q .          (1.7.8)

Again, suppose we had two vectors x_0 and y_0 related by

y_0 = A x_0 .          (1.7.9)


If we introduce new vectors, x and y, defined by

x_0 = Q x ,   y_0 = Q y ,          (1.7.10)

where Q is nonsingular, then

y = Q^{-1} A Q x = B x ,          (1.7.11)

whence we see that the two new vectors, x and y, are related to each other exactly like the old ones, x_0 and y_0, providing the new and old operators are related by

B = Q^{-1} A Q .          (1.7.12)

If Q be a real orthogonal matrix, then Q satisfies the definition (1.5.20), and the scalar product of two vectors x_0 and y_0 is given by

y_0^T x_0 = y^T Q^T Q x = y^T x ,          (1.7.13)

from which we see that the length of a vector is unaltered (i.e., if we let y_0 = x_0, then the present result shows that the length of x equals the length of x_0), and the angle between two original vectors is also unchanged by an orthogonal transformation. Thus, unit vectors which are originally orthogonal will remain orthogonal unit vectors, hence the name orthogonal transformation.

A particularly useful orthogonal transformation is the permutation transformation. A permutation matrix is any matrix for which there is one and only one nonzero element in each row and column of the matrix, and the nonzero element is unity. Thus, the unit matrix is a permutation matrix. If we denote a permutation matrix as P, then a permutation transformation is a similarity (orthogonal) transformation of the form

P A P^T = P A P^{-1} .

A permutation matrix merely interchanges certain rows and columns of a matrix.

The trace of a matrix is the sum of the diagonal elements:

Tr A = \sum_{i} a_{ii} .          (1.7.14)

The trace of the product of two matrices is independent of the order of the factors:

Tr (A B) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} b_{ji} = Tr (B A) .          (1.7.15)


The trace of a matrix is unaltered by a similarity transformation:

Tr (Q^{-1} A Q) = \sum_{i,j,k=1}^{n} (Q^{-1})_{ij} a_{jk} q_{ki} = \sum_{j=1}^{n} a_{jj} = Tr A .          (1.7.16)
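Both trace identities are easy to confirm numerically; the sketch below uses random matrices as arbitrary test data.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))
    Q = rng.standard_normal((4, 4))      # almost surely nonsingular

    # Tr(A B) = Tr(B A), Eq. (1.7.15)
    print(np.isclose(np.trace(A @ B), np.trace(B @ A)))

    # The trace is invariant under a similarity transformation, Eq. (1.7.16)
    print(np.isclose(np.trace(np.linalg.inv(Q) @ A @ Q), np.trace(A)))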

1.8 Linear Independence of Vectors and Orthogonalization of Vectors

An array x_i, 1 ≤ i ≤ n, of vectors is said to be linearly dependent when

\sum_{i=1}^{n} b_i x_i = 0 ,          (1.8.1)

where not all b_i are zero. When no set of b_i exists in which at least one b_i differs from zero for which equation (1.8.1) is true, the array of vectors is said to be linearly independent. We can easily generate a linearly independent array of vectors from a linearly dependent array by discarding all zero vectors (which are not very interesting anyway), by examining each of the remaining vectors one by one, and by keeping only those which are linearly independent of all the vectors already selected. The remaining vectors are then linearly related to those selected, since otherwise they would have been selected.

A test for linear independence is readily constructed by observing that the equation (1.8.1) may be regarded as an array of n linear homogeneous equations in which the components x_{ji} of the vectors x_i are the coefficients of the unknowns b_i:

\sum_{i=1}^{n} b_i x_{ji} = 0   (j = 1, ..., n).          (1.8.2)

Indeed, one can associate the jith element x_{ji} of a matrix with the jth component of the ith vector, in which case each vector forms one column of a matrix, or with the ith component of the jth vector, in which case each vector forms one row of a matrix. The vectors will then be linearly dependent if and only if nontrivial solutions b_i of the array of linear homogeneous equations (1.8.2) exist. By Section 1.4, we have seen that the necessary and sufficient condition for the existence of nontrivial solutions of such equations is that the determinant |x_{ji}| of the x_{ji} vanish.

Consequently, an array of vectors is linearly dependent if and only if the determinant formed from their components vanishes. The square of this determinant is called the Gram determinant of the vector array. If and only if the Gram determinant vanishes, the array of vectors is linearly dependent. The present test requires only that the components of each vector along the others be known.

The condition

\begin{vmatrix} x_1^2 & (x_1, x_2) & (x_1, x_3) & \cdots & (x_1, x_n) \\ (x_2, x_1) & x_2^2 & (x_2, x_3) & \cdots & (x_2, x_n) \\ (x_3, x_1) & (x_3, x_2) & x_3^2 & \cdots & (x_3, x_n) \\ \vdots & & & & \vdots \\ (x_n, x_1) & (x_n, x_2) & (x_n, x_3) & \cdots & x_n^2 \end{vmatrix} = 0

involves only these scalar products, whereas the equivalent condition

\begin{vmatrix} x_{11} & x_{12} & x_{13} & \cdots & x_{1n} \\ x_{21} & x_{22} & x_{23} & \cdots & x_{2n} \\ x_{31} & x_{32} & x_{33} & \cdots & x_{3n} \\ \vdots & & & & \vdots \\ x_{n1} & x_{n2} & x_{n3} & \cdots & x_{nn} \end{vmatrix} = 0

requires that the components along some arbitrary coordinate system be known.

There cannot be more than n linearly independent vectors each of which is of dimension n. If then a space has n dimensions, any vector u can be expanded in terms of any set of n linearly independent vectors:

u = \sum_{i=1}^{n} b_i x_i ,          (1.8.3)

where the b_i can be found by Cramer's rule if the equation (1.8.3) be written out in component form. A set of n linearly independent vectors in a space of n dimensions and in terms of which other vectors are expanded is called a basis. An incomplete set of r vectors is said to be of rank r for evident reasons and to be of defect n − r. A basis is usually chosen to be orthogonal and normal.
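Both independence tests can be phrased numerically. In the sketch below (arbitrary example vectors, with the third chosen as the sum of the first two), the component determinant and the Gram determinant both vanish, signalling linear dependence.

    import numpy as np

    # Each column is one vector x_i; the third column is the sum of the
    # first two, so the set is linearly dependent.
    X = np.array([[1.0, 0.0, 1.0],
                  [1.0, 1.0, 2.0],
                  [0.0, 2.0, 2.0]])

    comp_det = np.linalg.det(X)         # determinant of the components
    gram_det = np.linalg.det(X.T @ X)   # Gram determinant: scalar products
    print(np.isclose(comp_det, 0.0), np.isclose(gram_det, 0.0))   # True True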

Should the basis not be orthogonal, it may be made orthogonal quite easily by the Schmidt procedure, which essentially consists in subtracting the projection of any particular vector on any previously orthogonalized vectors from that particular vector in forming a new vector. Consider the set of vectors x_i which are not orthogonal. The first vector of the orthogonal set, say the t_i set, is defined by

t_1 = x_1 ,          (1.8.4)


and the second by

t_2 = x_2 − \frac{(x_2, t_1)}{t_1^2} t_1 .          (1.8.5)

The vector t_2 is orthogonal to t_1 because any component of x_2 that lies along t_1 has been subtracted from x_2. The third orthogonal vector t_3 is then given by

t_3 = x_3 − \frac{(x_3, t_1)}{t_1^2} t_1 − \frac{(x_3, t_2)}{t_2^2} t_2 .          (1.8.6)

The remaining vectors of the set t_i are computed in like manner. If there are as many vectors t_i as dimensions of the space, then these vectors form an orthogonal, linearly independent set which spans the space, i.e., is such that any arbitrary vector can be expressed in terms of them.

The method of orthogonalization cannot fail. Suppose it were to fail. Then some vector t_r would be zero. Thus, x_r would be some linear combination of x_1, x_2, ..., x_{r−1}, contrary to the hypothesis that the original basis was linearly independent. Therefore, all t_r must differ from zero. The new vectors may now be normalized by dividing them by their own length.
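A compact sketch of the Schmidt procedure of Eqs. (1.8.4)-(1.8.6), followed by normalization; the input vectors are arbitrary examples.

    import numpy as np

    def schmidt_orthogonalize(vectors):
        """Schmidt procedure, Eqs. (1.8.4)-(1.8.6): subtract from each x_r its
        projections on the previously formed t_i, then normalize."""
        ts = []
        for x in vectors:
            t = x.astype(float).copy()
            for u in ts:
                t -= (x @ u) / (u @ u) * u    # remove the component along u
            if np.allclose(t, 0.0):
                raise ValueError("the vectors are linearly dependent")
            ts.append(t)
        return [t / np.linalg.norm(t) for t in ts]   # unit length

    x1 = np.array([1.0, 1.0, 0.0])
    x2 = np.array([1.0, 0.0, 1.0])
    x3 = np.array([0.0, 1.0, 1.0])
    E = np.array(schmidt_orthogonalize([x1, x2, x3]))
    print(np.allclose(E @ E.T, np.eye(3)))   # an orthonormal set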

1.9 Eigenvalues and Eigenvectors

The transformation applied to a vector by a matrix may conceivably merely lead to a multiple of the original vector:

A x = λ x .          (1.9.1)

Such a vector x is called an eigenvector and the multiple λ is called an eigenvalue. These two concepts are of transcendent importance in theoretical work. There may be a number of eigenvectors and eigenvalues associated with a particular operator.

We see that the eigenvector-eigenvalue equation (1.9.1) actually represents a series of linear, homogeneous equations. In order that there be a nontrivial solution, we have seen in Section 1.3 that it is necessary and sufficient that

|A − λ I| = 0 .          (1.9.2)

This equation determines the possible eigenvalues and is called the characteristic equation. In a space of n dimensions, it is a polynomial equation of order n, which will therefore have n roots. The roots will occur in complex conjugate pairs; some of the roots may have the same


value. The number that do is called the multiplicity of the root. If the n roots are distinct, then there are n associated eigenvectors. For repeated roots, there may be less than n eigenvectors.

A similarity transformation does not change the eigenvalues, since the characteristic equation is unaltered.

|Q^{-1} A Q − γ I| = |Q^{-1} (A − γ I) Q| = |Q^{-1}| |A − γ I| |Q| = |A − γ I| = 0 .          (1.9.3)

Therefore, γ_i = λ_i if the roots of the two polynomials be properly ordered.

Since Eq. (1.9.1) is homogeneous, only the directions of the eigenvectors are determined. The eigenvectors may be multiplied by any arbitrary constant and still be eigenvectors. It is usually convenient to scale the eigenvectors so that they have unit length.

The eigenvalues of a real symmetric matrix are real. To prove the result, let λ_i and x_i be such that

A x_i = λ_i x_i .          (1.9.4)

Since the characteristic equation is a polynomial with real coefficients, there is also a root \bar{λ}_i, which is the complex conjugate of λ_i. The corresponding eigenvector \bar{x}_i will have components which are complex conjugate to those of x_i. Therefore, we also have

A \bar{x}_i = \bar{λ}_i \bar{x}_i .          (1.9.5)

We multiply Eq. (1.9.4) by \bar{x}_i^T, Eq. (1.9.5) by x_i^T, subtract, and obtain

\bar{x}_i^T A x_i − x_i^T A \bar{x}_i = (λ_i − \bar{λ}_i) \bar{x}_i^T x_i .          (1.9.6)

But

x_i^T A \bar{x}_i = \bar{x}_i^T A^T x_i = \bar{x}_i^T A x_i ,          (1.9.7)

the last result since A is symmetric. Equation (1.9.6) becomes

(λ_i − \bar{λ}_i) \bar{x}_i^T x_i = 0 .          (1.9.8)

The quantity

\bar{x}_i^T x_i = (\bar{x}_i, x_i)          (1.9.9)

is the generalization of the length of a vector for complex components. Since the elements are complex conjugate, the length is a positive real number. Equation (1.9.8) can be true only if

λ_i = \bar{λ}_i ,          (1.9.10)

which proves the theorem.


The eigenvectors associated with eigenvalues of different value of a real symmetric matrix are orthogonal. To prove this, let λ_1, x_1 and λ_2, x_2 be such that

A x_1 = λ_1 x_1 ,          (1.9.11)

A x_2 = λ_2 x_2 ,          (1.9.12)

with λ_1 ≠ λ_2. We again multiply by x_2^T and x_1^T respectively and subtract. We have

x_2^T A x_1 − x_1^T A x_2 = (λ_1 − λ_2) x_2^T x_1 = 0 .          (1.9.13)

Since λ_1 ≠ λ_2, we must have

x_1^T x_2 = x_2^T x_1 = 0 .          (1.9.14)

If the eigenvalues of a real symmetric matrix are all distinct, then for each eigenvalue there is an eigenvector which is orthogonal to all of the other eigenvectors. If there are n vectors in all, then these n vectors are complete: that is, the vectors span the n-dimensional space and may therefore be used as a basis. The orthogonal basis of eigenvectors is a particularly useful coordinate system for a given problem. As an example, suppose we desire to study the effect of a transformation A on an arbitrary vector x. If the eigenvectors of A are the complete orthonormal set e_i, then we may expand x in the form

x = \sum_{i} a_i e_i ,          (1.9.15)

where the a_i are expansion coefficients given by

a_i = x^T e_i .          (1.9.16)

We then have

A x = A \left( \sum_{i} a_i e_i \right) = \sum_{i} a_i A e_i = \sum_{i} a_i λ_i e_i .          (1.9.17)

Hence the operation of multiplying by A merely multiplies the various components of x by the corresponding eigenvalues. In our later work we shall make frequent use of this result.
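A numerical sketch of this expansion follows; the symmetric matrix and the vector are arbitrary examples, and numpy.linalg.eigh supplies an orthonormal set of eigenvectors for a real symmetric matrix.

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])      # real symmetric example
    x = np.array([1.0, -2.0, 0.5])

    lam, E = np.linalg.eigh(A)           # columns of E are orthonormal e_i
    a = E.T @ x                          # a_i = x^T e_i, Eq. (1.9.16)

    # A x = sum_i a_i lambda_i e_i, Eq. (1.9.17)
    print(np.allclose(A @ x, E @ (lam * a)))
    # Real eigenvalues and mutually orthogonal eigenvectors
    print(np.isrealobj(lam), np.allclose(E.T @ E, np.eye(3)))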

In the event that not all the eigenvalues of a real symmetric matrix are distinct, it is still possible to construct a set of complete orthogonal eigenvectors. For any repeated root of multiplicity k, there are k associated eigenvectors, which may be made orthogonal.4

4 The proof of these remarks is simple but detailed. See Reference 1, pp. 59-61.


A real symmetric matrix A may be transformed into a particularly simple form by a similarity transformation. Let the components of the eigenvectors be written as column matrices, thus

e_i = \begin{bmatrix} e_{i1} \\ e_{i2} \\ \vdots \\ e_{in} \end{bmatrix} .          (1.9.18)

Let the matrix M be defined as

M = [e_1, e_2, \ldots, e_n] = \begin{bmatrix} e_{11} & e_{21} & \cdots & e_{n1} \\ e_{12} & e_{22} & \cdots & e_{n2} \\ \vdots & & & \vdots \\ e_{1n} & e_{2n} & \cdots & e_{nn} \end{bmatrix} .          (1.9.19)

The eigenvectors are orthogonal, and we assume they are normalized. The matrix M is called the normalized modal matrix. The product A M is then

A M = \begin{bmatrix} λ_1 e_{11} & λ_2 e_{21} & \cdots & λ_n e_{n1} \\ λ_1 e_{12} & λ_2 e_{22} & \cdots & λ_n e_{n2} \\ \vdots & & & \vdots \\ λ_1 e_{1n} & λ_2 e_{2n} & \cdots & λ_n e_{nn} \end{bmatrix} = M D ,          (1.9.20)

where

D = \begin{bmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & λ_n \end{bmatrix} .          (1.9.21)

Thus, we have

M^{-1} A M = D .          (1.9.22)

The inversion of M is always possible since |M| cannot vanish, by the orthogonality and consequent independence of the e_i.

The result shows that a real symmetric matrix is similar to a diagonal matrix. The process of so transforming a matrix is called diagonalization.


It is interesting to note that the similarity transformation used above is also an orthogonal transformation. To see this, we form the product

M^T M = \begin{bmatrix} e_{11} & e_{12} & \cdots & e_{1n} \\ e_{21} & e_{22} & \cdots & e_{2n} \\ \vdots & & & \vdots \\ e_{n1} & e_{n2} & \cdots & e_{nn} \end{bmatrix} \begin{bmatrix} e_{11} & e_{21} & \cdots & e_{n1} \\ e_{12} & e_{22} & \cdots & e_{n2} \\ \vdots & & & \vdots \\ e_{1n} & e_{2n} & \cdots & e_{nn} \end{bmatrix} .          (1.9.23)

Since the vectors are orthogonal and normalized, we have

M^T M = I ,          (1.9.24)

and hence

M^T = M^{-1} .          (1.9.25)

The normalized modal matrix is an orthogonal matrix.
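A numerical sketch of diagonalization by the normalized modal matrix (the symmetric matrix is an arbitrary example): the columns returned by eigh form M, and M^T M = I while M^{-1} A M is diagonal.

    import numpy as np

    A = np.array([[4.0, 1.0, 2.0],
                  [1.0, 3.0, 0.0],
                  [2.0, 0.0, 5.0]])      # real symmetric, arbitrary entries

    lam, M = np.linalg.eigh(A)           # M is the normalized modal matrix
    D = np.diag(lam)

    print(np.allclose(M.T @ M, np.eye(3)))   # M^T M = I, Eq. (1.9.24)
    print(np.allclose(M.T @ A @ M, D))       # M^{-1} A M = D, Eq. (1.9.22)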

1.10 Nonsymmetric Matrices

The results of the previous section do not apply in full generality to a nonsymmetric matrix. We again consider only matrices with real elements.

We show first that, if the characteristic roots of a square n by n nonsymmetric matrix A are distinct, then there are n linearly independent eigenvectors associated with the matrix. Let the eigenvalues be denoted λ_i and the eigenvectors as e_i. We have

A e_i = λ_i e_i          (1.10.1)

for i = 1, 2, ..., n. If the eigenvectors are linearly dependent, then at least one of the eigenvectors, say e_n, is a linear combination of the remaining n − 1. Then

e_n = \sum_{i=1}^{n-1} a_i e_i .          (1.10.2)

Not all of the a_i are zero. Applying the operator A to both sides of Eq. (1.10.2), we have

A e_n = λ_n e_n = \sum_{i=1}^{n-1} a_i A e_i = \sum_{i=1}^{n-1} a_i λ_i e_i .          (1.10.3)


We use the expansion (1.10.2) in (1.10.3) to find

λ_n \sum_{i=1}^{n-1} a_i e_i = \sum_{i=1}^{n-1} a_i λ_i e_i ,          (1.10.4)

or

\sum_{i=1}^{n-1} a_i e_i (λ_i − λ_n) = 0 .          (1.10.5)

Thus if the λ_i are all distinct, and since not all the a_i are zero, Eq. (1.10.5) cannot be true. Consequently, the assumption (1.10.2) is invalid.

Even though the eigenvectors are linearly independent, we cannot assume they are orthogonal. In fact they cannot be. Of course, the eigenvectors may be normalized. A nonsymmetric matrix with distinct eigenvalues may be diagonalized by using the modal matrix constructed from the eigenvectors of the matrix. However, since the eigenvectors are not generally orthogonal, the diagonalization is accomplished by a similarity transformation which is not an orthogonal transformation in general.

If the eigenvalues of a nonsymmetric matrix are not all distinct, it may not be possible to find a complete set of eigenvectors, but, nevertheless, it is always possible to find a complete set of some other vectors, called principal vectors, which permit some simplification of the original matrix (see Reference 6, pp. 32-36). Let the matrix A have n roots, λ_i, where some of the roots are repeated. Let λ_1 be repeated k times. We can always find one eigenvector e_1 such that

A e_1 = λ_1 e_1 .          (1.10.6)

We assume this is the only eigenvector associated with λ_1. We now seek a vector t_1 satisfying

(A − λ_1 I) t_1 = e_1 .          (1.10.7)

Any solution of Eq. (1.10.7) may be chosen orthogonal to e_1 (see Problem 17). Let us assume we have found the vector t_1. We then seek another vector t_2 from the relation

(A − λ_1 I) t_2 = t_1 .          (1.10.8)

This implies that t_2 must be orthogonal to t_1 and furthermore, from Eq. (1.10.7) we have

(A − λ_1 I)^2 t_2 = e_1 .          (1.10.9)


Consequently, t_2 may also be chosen orthogonal to e_1. We continue generating vectors in sequence of the form

(A − λ_1 I) t_p = t_{p−1} .          (1.10.10)

Each new vector t_p will be orthogonal to t_{p−1}, t_{p−2}, ..., e_1. It can be shown that we can only find k − 1 vectors t_p in this manner (see below).

Let us assume for the moment that λ_1 is the only repeated root of A. Thus, for the n − k remaining distinct roots λ_i, we have n − k linearly independent eigenvectors e_i. We assert that the set of vectors e_1, t_1, t_2, ..., t_{k−1}, e_{k+1}, e_{k+2}, ..., e_n are linearly independent. This follows from the fact that if any of the t_j were linear combinations of the e_i, i ≠ 1, then we would have

t_j = \sum_{i=k+1}^{n} a_i e_i .          (1.10.11)

But from the definition of t_j, we also have

e_1 = (A − λ_1 I)^j t_j = \sum_{i=k+1}^{n} a_i (A − λ_1 I)^j e_i ,          (1.10.12)

which implies e_1 is a linear combination of the e_i. Therefore, the t_j are linearly independent of the e_i, i ≠ 1. Since the set e_1, t_1, t_2, ... are orthogonal, they are linearly independent of each other. Thus, the sets e_i, t_j are linearly independent of each other and constitute a basis. Since we have n − k + 1 eigenvectors, we see that we cannot find more than k − 1 independent principal vectors, hence the statement in the preceding paragraph.

The particular advantage of the set of vectors so chosen can be seen by constructing the modal matrix. We again define M as

M = [e_1, t_1, t_2, \ldots, t_{k−1}, e_{k+1}, \ldots, e_n] = \begin{bmatrix} e_{11} & t_{11} & \cdots & e_{n1} \\ e_{12} & t_{12} & \cdots & e_{n2} \\ \vdots & & & \vdots \\ e_{1n} & t_{1n} & \cdots & e_{nn} \end{bmatrix} .          (1.10.13)

We operate on M by the matrix A. The first column of the product is merely λ_1 e_1. Similarly, all the eigenvectors are reproduced times their


corresponding eigenvalue. Now consider the second column of the product. This is merely

A t_1 = e_1 + λ_1 t_1 .          (1.10.14)

For the third column, we have

A t_2 = t_1 + λ_1 t_2 ,          (1.10.15)

and so forth, for the k − 1 principal vectors. The matrix formed from the product A M is then seen to be of the form

A M = \begin{bmatrix} λ_1 e_{11} & e_{11} + λ_1 t_{11} & t_{11} + λ_1 t_{21} & \cdots & λ_n e_{n1} \\ λ_1 e_{12} & e_{12} + λ_1 t_{12} & t_{12} + λ_1 t_{22} & \cdots & λ_n e_{n2} \\ \vdots & & & & \vdots \\ λ_1 e_{1n} & e_{1n} + λ_1 t_{1n} & t_{1n} + λ_1 t_{2n} & \cdots & λ_n e_{nn} \end{bmatrix} .          (1.10.16)

This product may be factored in the form

A M = M \begin{bmatrix} λ_1 & 1 & 0 & \cdots & & & \\ 0 & λ_1 & 1 & \cdots & & & \\ \vdots & & \ddots & \ddots & & & \\ 0 & 0 & \cdots & λ_1 & & & \\ & & & & λ_{k+1} & & \\ & & & & & \ddots & \\ & & & & & & λ_n \end{bmatrix} ,          (1.10.17)

where the submatrix in λ_1 is k by k. The product is of the form M J, where

J = \begin{bmatrix} λ_1 & 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & λ_1 & 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \ddots & \ddots & & & & \vdots \\ 0 & 0 & \cdots & λ_1 & 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & λ_1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & 0 & λ_{k+1} & \cdots & 0 \\ \vdots & & & & & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & λ_n \end{bmatrix} .          (1.10.18)


The similarity transformation

M^{-1} A M = J          (1.10.19)

yields a nearly diagonal matrix J, which is called the Jordan canonical form. Note that the matrix has the form of a diagonal matrix for the eigenvectors, while the submatrix for the repeated root contains the eigenvalue λ_1 along the diagonal and the element unity along the upper subdiagonal.

In the case of a repeated root with more than one eigenvector, the submatrix has a form similar to that below:

\begin{bmatrix} λ_1 & 1 & 0 & 0 \\ 0 & λ_1 & 0 & 0 \\ 0 & 0 & λ_1 & 1 \\ 0 & 0 & 0 & λ_1 \end{bmatrix} .          (1.10.20)

For more than one repeated root, the Jordan canonical form (also called normal form) is

J = \begin{bmatrix} J_{11} & 0 & \cdots & 0 \\ 0 & J_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & J_{mm} \end{bmatrix} ,          (1.10.21)

where each of the J_{ii} is a submatrix in canonical form (for repeated roots) or diagonal (for distinct roots).

It is important to realize that any real matrix may be reduced to the canonical form as above. If there is a complete set of eigenvectors, the canonical form is diagonal. Otherwise, some of the submatrices contain off-diagonal elements. We shall find this result leads to considerable simplification of the analysis of later problems.
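A small numerical illustration of a defective case follows (the 2 by 2 matrix is an arbitrary example). Its repeated root has only one independent eigenvector, so the matrix cannot be diagonalized; it is already a Jordan block of the form appearing in Eq. (1.10.18).

    import numpy as np

    J = np.array([[2.0, 1.0],
                  [0.0, 2.0]])           # Jordan block: repeated root lambda = 2

    print(np.linalg.eigvals(J))          # [2. 2.]: a root of multiplicity 2

    # (J - 2 I) has rank 1, so its null space is one-dimensional and only
    # one independent eigenvector exists.
    print(np.linalg.matrix_rank(J - 2.0 * np.eye(2)))   # 1

    # A principal vector t_1 satisfies (J - 2 I) t_1 = e_1, Eq. (1.10.7)
    e1 = np.array([1.0, 0.0])
    t1 = np.array([0.0, 1.0])
    print(np.allclose((J - 2.0 * np.eye(2)) @ t1, e1))  # True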

If the eigenvalues of a matrix are all greater than zero, the matrix is said to be positive definite. Conversely, a matrix all of whose eigenvalues are less than zero is said to be negative definite.

1.11 Geometric Interpretation

The eigenvalue problem can be given a very useful and illustrative setting in terms of the geometry of quadratic surfaces. We first consider the equation of a quadratic surface in n-dimensional space,

\frac{x_1^2}{\alpha_1} + \frac{x_2^2}{\alpha_2} + ... + \frac{x_n^2}{\alpha_n} = 1 .          (1.11.1)

We note that the equation can be written in matrix form as

x^T D x = 1 ,          (1.11.2)

where

x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}          (1.11.3)

and

D = \begin{bmatrix} 1/\alpha_1 & 0 & \cdots & 0 \\ 0 & 1/\alpha_2 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 1/\alpha_n \end{bmatrix} .          (1.11.4)

This result suggests that there is some intimate relation between the quadratic equation (1.11.1) and the diagonalization of matrices. To see this relation, consider a real symmetric matrix A and the quadratic form x^T A x. The quadratic form can be written

x^T A x = a_{11} x_1^2 + a_{12} x_1 x_2 + ... + a_{1n} x_1 x_n
        + a_{21} x_2 x_1 + a_{22} x_2^2 + ... + a_{2n} x_2 x_n
        + ...                                                  (1.11.5)
        + a_{n1} x_n x_1 + a_{n2} x_n x_2 + ... + a_{nn} x_n^2 .

If we set x^T A x = 1, then the equation represents a general second-order surface. The normal, N, to the surface

f(x_1, x_2, ..., x_n) = 1          (1.11.6)


is given by⁵

N = \begin{bmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \\ \vdots \\ \partial f/\partial x_n \end{bmatrix} .          (1.11.7)

The normal to the surface x^T A x = 1 is thus

N = 2 A x ,          (1.11.8)

which follows from the symmetry of A.

The principal axes of a quadratic surface are defined as the directions at which the normal vector is parallel to the radius vector. Thus, a principal axis is a direction x such that

\beta x = N ,          (1.11.9)

where \beta is some constant. Consequently, the principal axes satisfy the equation

A x = λ x .          (1.11.10)

The principal axes are particularly useful since the equation of the quadratic surface expressed in terms of the principal axes contains only a sum of squares. The eigenvectors of the matrix A are seen to be just the principal axes of the quadratic surface. If we transform the matrix A by the modal matrix, say M, then we find

A' = M^{-1} A M = \Lambda ,          (1.11.11)

⁵ This relation may be proved by noting that Eq. (1.11.6) implies that

\sum_{i=1}^{n} \frac{\partial f}{\partial x_i} \frac{dx_i}{dt} = 0 ,

where the x_i are assumed to be functions of some parameter t. Since the tangent to the surface is proportional to the vector [dx_i/dt], the normal must be proportional to the vector [\partial f/\partial x_i].


and the quadratic form

x^T A x = x'^T \Lambda x' = 1 ,          (1.11.12)

which is just the form of Eq. (1.11.2). Notice that the expanded equation is

x^T A x = λ_1 x_1'^2 + λ_2 x_2'^2 + ... + λ_n x_n'^2 = 1 .          (1.11.13)

The eigenvalues are equal to the reciprocal of the square of the length of the principal axes.

The occurrence of repeated roots can be interpreted in this geometric view. If two roots are equal, then the quadratic surface has rotational symmetry about the axes orthogonal to the eigenvectors of the repeated root. A zero root implies the quadratic surface lies in a space orthogonal to the given direction.
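A sketch of the principal-axis interpretation (the symmetric positive definite matrix is an arbitrary example): the eigenvectors of A give the principal axes of the surface x^T A x = 1, and each semi-axis length is 1/sqrt(lambda_i), in line with Eq. (1.11.13).

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])           # symmetric, positive definite

    lam, M = np.linalg.eigh(A)           # principal axes = eigenvectors
    axis_lengths = 1.0 / np.sqrt(lam)    # semi-axis lengths of x^T A x = 1

    # A point on the surface along each principal axis: x = e_i / sqrt(lambda_i)
    for lam_i, e_i, L in zip(lam, M.T, axis_lengths):
        x = L * e_i
        print(np.isclose(x @ A @ x, 1.0), round(float(L), 4))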

1.12 Biorthogonal Vectors

For a real symmetric matrix, we have shown that the eigenvectors form a set of mutually orthogonal vectors. The eigenvectors are a convenient basis for the space of the problem. In the case of a nonsymmetric matrix, the eigenvectors may not be mutually orthogonal, however. It is convenient, in this case, to generate a second set of vectors which are not orthogonal amongst themselves, but are orthogonal with respect to the original set of vectors. Such relationships are known as biorthogonality relationships.

The importance of such relationships can be seen from the following simple example. Consider a vector x in two-dimensional space, as shown in Fig. 1.12.1.

FIG. 1.12.1. Vector x in the orthogonal coordinate system i, j.
