
APPENDIX I

Some Definitions and Theorems of Linear Algebra

This appendix presents some fundamental definitions and theorems of linear algebra with particular reference to finite-dimensional vector spaces. The subject of linear algebra is quite extensive, and the brief discussion given here is by no means complete. Further details may be found in mathematical texts devoted to the subject; especially recommended are:

1. P. R. Halmos, "Finite-Dimensional Vector Spaces," 2nd ed. Van Nostrand, Princeton, New Jersey, 1958.

2. G. Birkhoff and S. MacLane, "A Survey of Modern Algebra," rev. ed. Macmillan, New York, 1953.

1. Mathematical Notation

The mathematical symbols most frequently used in the text are given below, where M = (Mij) denotes an n × n matrix.

* complex conjugation of a number or matrix

~   transpose of a matrix: M̃ = (Mji)

†   complex conjugation and transposition: M† = (M̃*)

det M   determinant of M

tr M   trace of M

dim V   dimension of the vector space V

dim M   number of rows or columns in the (square) matrix M

[A, B]   commutator of operators or matrices A, B: [A, B] = AB − BA

δij   Kronecker delta: δij = 1, for i = j; δij = 0, for i ≠ j


1   identity operator or identity matrix: 1ij = δij

E   identity operator of a symmetry group

e   identity operator of an abstract group

A ⊕ B   direct sum of matrices A, B

A ⊗ B   Kronecker product of operators or matrices A, B

x†   dual of the vector x

(x, y)   hermitian scalar product of the vectors x and y

⟨x | y⟩   hermitian scalar product of x and y in the Dirac notation

M⁻¹   inverse matrix of M: MM⁻¹ = M⁻¹M = 1

(n k)   binomial coefficient: (n k) = n!/[(n − k)! k!],  0! ≡ 1

The following nomenclature is used to describe certain types of matrices or operators:

Matrix            Defining condition
Symmetric         M̃ = M
Skew-symmetric    M̃ = −M
Orthogonal        M⁻¹ = M̃
Hermitian         M = M†
Skew-hermitian    M = −M†
Unitary           M⁻¹ = M†
Singular          det M = 0
Nonsingular       det M ≠ 0
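The defining conditions in this table are easy to test numerically. The following sketch is an illustration added to this appendix (it assumes Python with NumPy, which the original text of course does not use); the matrices H and U are arbitrary examples chosen here.

import numpy as np

def is_hermitian(M, tol=1e-12):
    # M = M†: equal to its own conjugate transpose
    return np.allclose(M, M.conj().T, atol=tol)

def is_unitary(M, tol=1e-12):
    # M⁻¹ = M†, i.e. M M† = 1
    return np.allclose(M @ M.conj().T, np.eye(M.shape[0]), atol=tol)

def is_singular(M, tol=1e-12):
    # det M = 0
    return abs(np.linalg.det(M)) < tol

H = np.array([[0, -1j], [1j, 0]])               # hermitian
U = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)   # unitary
print(is_hermitian(H), is_unitary(U), is_singular(np.zeros((2, 2))))
# True True True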

2. Vector Spaces

Definition. A vector space over the complex numbers is a set of elements V, called vectors, together with two laws of composition, called vector addition and scalar multiplication defined as follows.

Vector Addition

To every pair of vectors x, y there is associated a unique element of V called the sum of x and y, and denoted x + y. Vector addition satisfies the following axioms:

(A1) x + y = y + x

(A2) x + (y + z) = (x + y) + z

(A3) V contains a unique vector 0, called the zero vector, such that for every x in V,

x + 0 = x

(A4) For every vector x there exists a unique vector −x such that

x + (−x) = 0

Scalar Multiplication

To every vector x and every complex number c there corresponds a vector of V called a scalar multiple of x, and denoted cx. Scalar multiplication satisfies the following axioms:

(A5) c(c′x) = (cc′)x

(A6) 1x = x

(A7) c(x + y) = cx + cy

(A8) (c + c′)x = cx + c′x

Axioms (A1) through (A4) assert that V is an additive abelian group.¹ The zero vector is the additive identity, and (−x) the additive inverse.

The complex numbers are explicitly introduced in the remaining axioms. Axiom (A5) expresses the associativity of scalar multiplication; (A6) displays 1 + 0·i as the scalar identity; (A7) and (A8) express the distributivity of scalar multiplication with respect to vector and scalar addition.

It is worth emphasizing that both laws of composition preserve the vector character of the elements of V and are closed operations; that is, if x and y are in V, then x + y and cx are also in V. Furthermore, the symbol 0 has been used to denote the zero vector and the complex number 0 + 0·i, so that some care must be exercised in reading equations. The context will always indicate whether 0 means the zero vector or the scalar zero.

An important illustration of a vector space is provided by the set of all n-dimensional column vectors (i.e., n × 1 matrices) with complex entries, x = (ξ1, ξ2, ..., ξn). Addition and scalar multiplication of column vectors are defined componentwise,

x + y = (ξ1 + η1, ξ2 + η2, ..., ξn + ηn),   cx = (cξ1, cξ2, ..., cξn),

with

0 = (0, 0, ..., 0).

¹ Cf. Chapter 8.


The set of all n × 1 column vectors is said to be a representation of an abstract n-dimensional vector space over the complex numbers.

Henceforth, column vectors (n × 1 matrices) will be written as n-tuples x = (ξ1, ξ2, ..., ξn).

Linear Combinations

Suppose that V contains the r vectors x1, x2, ..., xr. From the definition of a vector space, it follows that V also contains c1x1, c2x2, and c1x1 + c2x2, where c1 and c2 are arbitrary complex numbers. But if V contains c1x1 + c2x2, it must also contain (c1x1 + c2x2) + c3x3 = c1x1 + (c2x2 + c3x3). The parentheses in the last equation can be omitted, since the commutative and associative laws for vector addition show that the sum is independent of the ordering of the vectors or the manner in which the vectors are grouped. By induction, V also contains

x = c1x1 + c2x2 + ··· + crxr.

A vector x expressible in this form is said to be a linear combination of the r vectors x1, x2, ..., xr ≡ {xi}.

Subspaces

A subspace S of a vector space V is a nonempty² subset of V which is itself a vector space. For example, in the real three-dimensional vector space of analytic geometry, a line through the origin represents a one-dimensional subspace, a plane through the origin a two-dimensional subspace. The subspaces defined by lines and planes must pass through the origin; the origin represents the zero vector and every subspace must contain the zero vector.

² A nonempty set is a set containing at least one element.

Any subset of V is a subspace if it contains cx + c′y whenever it contains x and y. The zero vector alone³ and V itself are subspaces of V.

Theorem. The set of all linear combinations of an arbitrary collection of vectors {xi} of V is a subspace of V.

Proof. If x = Σi cixi and x′ = Σi ci′xi, then

cx + c′x′ = Σi ccixi + Σi c′ci′xi = Σi (cci + c′ci′)xi

is also a linear combination of the xi.

The subspace generated by all linear combinations of the xi is said to be spanned by the set {xi}. Note that the given set of vectors is not required to be a subspace, and that every vector in the subspace spanned by {xi} is a linear combination of the xi.

Linear Independence

The m vectors x1, x2, ..., xm are said to be linearly independent if the only scalars for which c1x1 + c2x2 + ··· + cmxm = 0 are c1 = c2 = ··· = cm = 0. If the m scalars ci are not all zero, the given vectors are said to be linearly dependent.

For example, the vectors (−1, −1, −1), (1, 0, 0), (0, 1, 0), (0, 0, 1) are linearly dependent, since their sum is the zero vector. On the other hand, the vectors (−2, 1, 0) and (1, 3, 2) are linearly independent. For if it is assumed that these vectors are linearly dependent, then

c1(−2, 1, 0) + c2(1, 3, 2) = (0, 0, 0).

This vector equation may be expanded to give

−2c1 + c2 = 0,   c1 + 3c2 = 0,   2c2 = 0,

whose solution is c1 = c2 = 0. This establishes the independence of the given vectors.
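In numerical work the linear independence of a finite set of n-tuples is usually decided by computing the rank of the matrix whose rows are the given vectors; the vectors are independent exactly when the rank equals their number. The sketch below (an added illustration assuming Python with NumPy) reproduces the two examples just given.

import numpy as np

def independent(*vectors):
    # Vectors are linearly independent when the matrix formed from them
    # has rank equal to the number of vectors.
    A = np.array(vectors, dtype=float)
    return np.linalg.matrix_rank(A) == len(vectors)

print(independent((-1, -1, -1), (1, 0, 0), (0, 1, 0), (0, 0, 1)))  # False
print(independent((-2, 1, 0), (1, 3, 2)))                          # True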

Theorem. The nonzero vectors x1, x2, ..., xm are linearly dependent if and only if one of these vectors is a linear combination of the preceding vectors.

³ If an element is selected from a given set, the set is not considered as having been depleted of that element; all elements are available as often as desired.


Proof. Suppose that one of the vectors, say xk (k ≠ 1), is a linear combination of the preceding vectors,

xk = Σ_{i=1}^{k−1} cixi.

If k = m, then

(−1)xm + c_{m−1}x_{m−1} + ··· + c1x1 = 0.

If 1 < k < m, one can write

0·xm + 0·x_{m−1} + ··· + (−1)xk + c_{k−1}x_{k−1} + ··· + c1x1 = 0.

In either case not all ci are zero, so that the vectors are linearly dependent.

Conversely, if the vectors are linearly dependent, at least one ci ≠ 0, say ck, so that

xk = (−ck⁻¹c_{k−1})x_{k−1} + ··· + (−ck⁻¹c1)x1.

Hence xk is a linear combination of the preceding vectors.

Basis and Dimension

A basis of a vector space V is a linearly independent subset of V which spans the whole space. If the number of vectors in the basis is finite, V is said to be a finite-dimensional vector space. The number of vectors in a basis for a finite-dimensional vector space V is called the dimension of V and denoted dim V.

These definitions are illustrated by the n vectors

e1 = (1, 0, ..., 0),  e2 = (0, 1, ..., 0),  ...,  en = (0, 0, ..., 1),

which provide a basis for the n-dimensional vector space of n-tuples.

The ei span the whole space, since an arbitrary n-tuple x = (ξ1, ξ2, ..., ξn) may be expressed as

x = ξ1e1 + ξ2e2 + ··· + ξnen.

Moreover, the ei are linearly independent, since c1e1 + c2e2 + ··· + cnen = 0 implies that

(c1, c2, ..., cn) = (0, 0, ..., 0),

which is equivalent to c1 = c2 = ··· = cn = 0. Thus the vector space of n-tuples is n-dimensional.

It can be shown⁴ that if dim V = n, then (1) any basis for V contains exactly n elements, and (2) any set containing more than n vectors is a linearly dependent set.

Theorem. If {xi} is a basis for an n-dimensional vector space V, then every vector x in V can be uniquely expressed as

x = Σ_{i=1}^n ξixi.

Proof. The n + 1 vectors {x1, x2, ..., xn, x} are linearly dependent, so that the given expression is just the dependence relation solved for x. Uniqueness follows from the independence of the xi. For if x = Σi ξixi = Σi ξi′xi, then x − x = 0 = Σi (ξi − ξi′)xi. The independence of the xi demands that ξi − ξi′ = 0, for all i.

The ξi are called the components of x relative to the basis {xi}. In general, the components of a given vector are different relative to different bases. To illustrate the point, consider the vector x = (ξ1, ξ2, ξ3). Relative to the basis {(1, 0, 0), (0, 1, 0), (0, 0, 1)}, the components of x are ξ1, ξ2, and ξ3. If the basis is {(1, 0, 0), (1, 1, 0), (1, 1, 1)}, the components of x are ξ1′ = ξ1 − ξ2, ξ2′ = ξ2 − ξ3, and ξ3′ = ξ3.
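The components of a vector relative to a given basis can be computed by solving a small linear system: if the columns of B are the basis vectors, the components c satisfy Bc = x. The following sketch (an added NumPy illustration; the particular vector (5, 3, 2) is an arbitrary choice) confirms the formulas ξ1′ = ξ1 − ξ2, ξ2′ = ξ2 − ξ3, ξ3′ = ξ3 for the second basis above.

import numpy as np

xi = np.array([5.0, 3.0, 2.0])            # x = (ξ1, ξ2, ξ3)
B = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])           # columns: (1,0,0), (1,1,0), (1,1,1)

c = np.linalg.solve(B, xi)                # components relative to the new basis
print(c)                                  # [2. 1. 2.] = (ξ1-ξ2, ξ2-ξ3, ξ3)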

Dual Spaces

Let x = (ξ1, ξ2, ..., ξn) denote a vector in an n-dimensional vector space V, and x† the vector obtained from x by taking the complex conjugate of every component of x and then applying the rule for the transposition of matrices:

x† = (ξ1*  ξ2*  ···  ξn*).

Evidently, x† is a 1 × n matrix or row vector, distinguishable from the n-tuple notation for n × 1 matrices through the omission of commas. The vector x† is of an entirely different nature than the vector x from which it was generated. For although x† has n components, it cannot be added to x.

⁴ Reference 1, p. 13.


However, one can add x† to a row vector generated from an element y in V. The vectors x† and y† are elements in an n-dimensional vector space V† called the dual of V; x† and y† are called the vectors dual to x and y. Note that if z = c1x + c2y, then z† = c1*x† + c2*y†, so that the relation⁵ between V and V† is not linear but antilinear. However, since (x†)† = x, (V†)† = V.

Let {xi} be a basis for V containing n elements. Any vector in V may be expressed in the form

x = Σ_{i=1}^n cixi.

Since x is arbitrary, so is x†, and the last equation yields

x† = Σ_{i=1}^n ci*xi†.

Thus the vectors {xi†} span V†. If x = 0, then x† = 0† = (0  0  0  ···  0), and

Σ_{i=1}^n cixi = 0,   Σ_{i=1}^n ci*xi† = 0†.

Now the xi are linearly independent, so that all ci = 0. But this implies that all ci* = 0, so that the xi† are also linearly independent. Thus if {xi} is a basis for V, then {xi†} is a basis for V†, and dim V = dim V†.

Scalar Products

In a real three-dimensional space, the scalar or inner product of two vectors a and b is defined by

(a, b) = a · b = |a| |b| cos θ.

This scalar is proportional to the lengths of the two vectors and to the cosine of the angle between the vectors. In particular, if |a|, |b| are nonzero, then a · b = 0 implies that the vectors are orthogonal. If a = b, then θ = 0, and a · b = |a|² is the square of the length of a.

However, this definition of the scalar product of vectors is inadequate for complex vector spaces. Consider, for example, the vector x = (1, i). In the complex plane x is represented by a line segment of length √2 whose polar angle is π/4. The above formula for the scalar product gives

(x, x) = 1·1 + i·i = 0.

On the other hand, if one defines the hermitian scalar product of x = (ξ1, ξ2) and y = (η1, η2) by

(x, y) = ξ1*η1 + ξ2*η2,

one obtains (x, x) = 1*·1 + i*·i = 2, and |x| = (x, x)^{1/2} = √2.

⁵ For a more precise description of the relations between V, V†, and (V†)†, see Reference 1, p. 24.

The preceding geometric considerations lead to the following definition of the scalar product of two n-dimensional vectors with complex components:

(x, y) = ξ1*η1 + ξ2*η2 + ··· + ξn*ηn.

It follows that

(x, y) = (y, x)*,

(x, cy + c′y′) = c(x, y) + c′(x, y′),

(cx + c′x′, y) = c*(x, y) + c′*(x′, y),

(x, x) ≥ 0.

The definition shows that (x, x) is a real number, and that (x, x) = 0 if and only if x = 0. These properties make the interpretation of |x| = (x, x)^{1/2} as the (generalized) length of a complex vector plausible.

If (x, y) = 0, x and y are said to be orthogonal.

A vector space over the complex numbers together with the above definition of the hermitian scalar product is called a unitary space.
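The example of the vector x = (1, i) can be checked directly. In the sketch below (added here, assuming Python with NumPy), the naive product without conjugation gives zero, while the hermitian scalar product, available as numpy.vdot, which conjugates its first argument, gives 2 and hence |x| = √2.

import numpy as np

x = np.array([1, 1j])

print(np.sum(x * x))                   # 0j        -- no conjugation: 1·1 + i·i = 0
print(np.vdot(x, x))                   # (2+0j)    -- hermitian product: 1*·1 + i*·i = 2
print(np.sqrt(np.vdot(x, x).real))     # 1.414...  -- the generalized length |x|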

A basis {xi} for a unitary space is said to constitute an orthonormal basis if

(xi, xj) = δij,

where δij = 1 if i = j and δij = 0 if i ≠ j.

If the vectors of a given basis do not have unit length, they can be scaled to unit length through division by |x|. Moreover, an arbitrary basis can always be converted to an orthonormal basis by the Gram-Schmidt orthogonalization process.⁶ The following discussion will be restricted to orthonormal bases.
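The Gram-Schmidt process referred to here can be sketched in a few lines. The helper below is an added illustration (Python with NumPy assumed, and the two starting vectors are arbitrary); it orthonormalizes a list of linearly independent complex vectors with respect to the hermitian scalar product.

import numpy as np

def gram_schmidt(vectors):
    # Subtract from each vector its projections onto the vectors already
    # orthonormalized, then scale the remainder to unit length.
    basis = []
    for v in vectors:
        w = np.asarray(v, dtype=complex)
        for e in basis:
            w = w - np.vdot(e, w) * e
        basis.append(w / np.sqrt(np.vdot(w, w).real))
    return basis

e1, e2 = gram_schmidt([np.array([1.0, 1j]), np.array([1.0, 0.0])])
print(round(abs(np.vdot(e1, e2)), 12), round(np.vdot(e1, e1).real, 12), round(np.vdot(e2, e2).real, 12))
# 0.0 1.0 1.0  -- an orthonormal pair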

The scalar product can be connected with the dual space of V by noting that

x†y = (ξ1*  ξ2*  ···  ξn*)(η1, η2, ..., ηn) = Σi ξi*ηi = (x, y),

⁶ Reference 1, p. 128; Reference 2, p. 192.



where the row and column matrices are multiplied according to the rule for matrix multiplication. Thus the scalar product may be interpreted as a scalar-valued function⁷ of a vector in V and a vector in V†.

3. Linear Transformations

Definition. A linear transformation⁸ T of a vector space V into itself is a law of correspondence which assigns to every vector x in V a vector Tx in V, called the image or transform of x, in such a way that the image of cx + c′y is

T(cx + c′y) = cTx + c′Ty,

for all vectors x and y and all scalars c and c′. This definition is intrinsic, in the sense that no reference is made to a particular basis of V.

A linear transformation is defined when its effect on every element of V is known.

Examples

1. Let 0 be the transformation which transforms every element of V into the zero vector:

0x = 0, for all x.

2. Let 1 be the transformation defined by

1x = x, for all x.

3. Let K be the operator which multiplies every vector in V by a fixed scalar k:

Kx = kx, for all x.

In each case the operator is unambiguously defined for every vector in V.

The operator 1 is called the identity operator, since every vector is its own image; 0 is called the zero operator, and K is called a scalar operator.

A very important law of composition is defined by the equation

(AB)x = A(Bx), for all x,

where A and B are arbitrary linear transformations. The operator AB is called the product of A and B.

⁷ Reference 1, pp. 130-133.

⁸ A linear transformation is often called an operator, or, more precisely, a linear operator.

The difference of the two possible products of A and B is called the commutator of A and B, and denoted [A, B]:

[A, B] = AB − BA.

If [A, B] = 0, A and B are said to commute. Any two of the operators 0, 1, and K commute. In fact, these operators commute with every operator A.

When working with multiple products of noncommuting operators, it is necessary to preserve the order in which the operators appear. On the other hand, since [A, A] = 0, multiple products of a single operator can be manipulated like ordinary numbers. Such products are unambiguous and it is customary to write

A = A¹,  A² = AA,  AAA = A³, ... .

The zeroth power of an operator is defined as A⁰ = 1.

An important class of linear transformations consists of the so-called nonsingular transformations. A linear transformation T is said to be nonsingular if:

(1) Every vector y is the image of at least one vector x; that is, given any vector y, a vector x exists for which

y = Tx.

(2) If x and x′ are distinct vectors, then Tx and Tx′ are also distinct or, equivalently,

Tx = Tx′ implies that x = x′.

When these conditions are satisfied, there exists a linear operator T⁻¹, called the inverse of T, with the properties

TT⁻¹ = T⁻¹T = 1.

If a transformation does not meet requirements (1) and (2), it is said to be singular.

The operator K = k·1 possesses an inverse K⁻¹ = k⁻¹·1, provided that k ≠ 0. The zero operator is singular since it violates (1) and (2).

The inverse of the inverse operator is the original operator, (T⁻¹)⁻¹ = T; the inverse of a product of nonsingular operators is equal to the product of the inverses taken in reversed order:

(A1A2 ··· An)⁻¹ = An⁻¹A_{n−1}⁻¹ ··· A1⁻¹.


Matrix Representation of a Linear Transformation

Let T denote a linear transformation of a finite-dimensional vector space V into itself and {x1, ..., xn} a basis for V. An arbitrary vector x may be expressed as

x = Σj ξjxj,

where the ξj are the components of x relative to {xj}. The transform of x is

Tx = Σj ξjTxj,

and Tx will be determined when the transforms of the basis vectors are known. But the Txj are vectors in V, so that

Txj = Σk Tkjxk   (j = 1, 2, ..., n).

The Tkj are equal to the n² scalar products

(xk, Txj) = Σl Tlj(xk, xl) = Σl Tljδkl = Tkj,

which can be compounded into an n × n matrix

         ( T11  ···  T1n )
(Tkj) =  (  ⋮          ⋮ )
         ( Tn1  ···  Tnn )

whose jth column is made up of the components of Txj. This matrix is said to represent T relative to the basis {xj}. If a different basis is chosen for V, the matrix of T in the new basis will, in general, be different from the matrix given above.
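The prescription just described (apply T to each basis vector and record the components of the result as a column) translates directly into code. The sketch below is an added NumPy illustration; the map T(ξ1, ξ2, ξ3) = (ξ2, ξ3, ξ1 + ξ2) is a hypothetical example, not one taken from the text.

import numpy as np

def T(x):
    # A concrete linear map on 3-tuples, chosen only for illustration.
    return np.array([x[1], x[2], x[0] + x[1]])

basis = np.eye(3)                                  # e1, e2, e3
Tmat = np.column_stack([T(e) for e in basis])      # j-th column = components of T e_j
print(Tmat)

x = np.array([1.0, 2.0, 3.0])
print(np.allclose(Tmat @ x, T(x)))                 # True: the matrix reproduces T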

The operators 0, 1, and K have very simple matrix representations. By the definition of K, it follows that

Kxj = kxj = 0·x1 + 0·x2 + ··· + kxj + ··· + 0·xn,

so that

                  ( k  0  0  ··· )
(Kjk) = k(δjk) =  ( 0  k  0  ··· )
                  ( 0  0  k  ··· )
                  ( ⋮        ⋱  )

The matrices for the zero operator and the identity operator are obtained from (Kjk) by setting k = 0 and k = 1. The matrices of K, 1, and 0 are exceptional, since they do not change when the basis is changed.

For an example of an operator whose matrix does change with the basis, consider the operator R which subjects all vectors in the real cartesian plane to a reflection in the line ξ1 = ξ2. Under such a reflection, the vector ξ1e1 + ξ2e2 → ξ2e1 + ξ1e2, while all vectors with ξ1 = ξ2 are unchanged; hence

Re1 = e2 = 0·e1 + 1·e2,

Re2 = e1 = 1·e1 + 0·e2,

and

(Rjk) = ( 0  1 )
        ( 1  0 ).

However, if the basis is {(1/√2)(e1 + e2), (1/√2)(e1 − e2)}, then

(Rjk) = ( 1   0 )
        ( 0  −1 ).
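The same reflection can be used to watch the matrix change under a change of basis. In the sketch below (an added NumPy illustration), the columns of A hold the new basis vectors expressed in the old basis, and the matrix of R relative to the new basis is A⁻¹RA; the result is the diagonal matrix quoted above.

import numpy as np

R = np.array([[0.0, 1.0],
              [1.0, 0.0]])                  # reflection in the line ξ1 = ξ2

A = np.array([[1.0,  1.0],
              [1.0, -1.0]]) / np.sqrt(2)    # columns: (e1+e2)/√2, (e1-e2)/√2

print(np.round(np.linalg.inv(A) @ R @ A, 12))
# [[ 1.  0.]
#  [ 0. -1.]]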

The matrix associated with the product of two linear transformations S and T may be deduced by applying their product to a typical basis vector xj. Since STxj = S(Txj), and

Txj = Σr Trjxr,   Sxr = Σi Sirxi,

it follows that

STxj = S(Σr Trjxr) = Σr TrjSxr = Σi (Σr SirTrj) xi = Σi (ST)ij xi,

where

(ST)ij = Σr SirTrj   (i, j = 1, 2, ..., n).

The last equation is the well-known rule for matrix elements of the product matrix ST in terms of the matrix elements of S and T. Hence the matrix representative of the product of two linear transformations is equal to the matrix product of the matrices representing the linear transformations.

If T is a nonsingular operator, the matrix representing T must also be nonsingular; hence⁹

det T ≠ 0   (T nonsingular),

where det T denotes the determinant of T. On the other hand, det T = 0 whenever T is singular.


The determinant of a product of two matrices may be computed by the rule⁹

det(ST) = (det S)(det T) = det TS,

which shows that a product of nonsingular matrices is also nonsingular.

The components of x′ = Tx relative to the basis {xj} may be obtained by equating the coefficients of xk in the expression

x′ = Σk ξk′xk = Σk Σj Tkjξjxk.

It follows that

ξk′ = Σj Tkjξj   (k = 1, 2, ..., n).

If x′ and x are written as column vectors, the last equation can be written

x′ = (Tkj)x,

which is the matrix representation of the operator equation x′ = Tx.

Adjoints

The adjoint of a matrix is the matrix obtained by taking the complex conjugate of every element in the matrix and then interchanging rows and columns:

A† = Ã*.

It follows that (A†)† = A, and from the rule for matrix multiplication that

(AB)† = B†A†.

Applying this rule to the matrix representation of x′ = Tx one obtains

(ξ1′*  ξ2′*  ···  ξn′*) = (ξ1*  ξ2*  ···  ξn*)(T†),

or

(x′)† = (Tx)† = x†T†,

where (T†)ij = Tji*.

⁹ Reference 2, pp. 304-305.



Thus, for every linear transformation T represented by a matrix (Tij) there is a corresponding linear transformation T† on the dual space represented by the matrix adjoint to (Tij).

Let x and y denote arbitrary vectors such that

x = Σi ξixi,   y = Σi ηixi.

By the definition of a scalar product,

(x, Ty) = x†Ty = Σi Σj ξi*Tijηj.

But x†Ty can also be written (T†x)†y, so that one has the important relations

(x, Ty) = (T†x, y) = (y, T†x)*.

If T = AB, then

(x, ABy) = (A†x, By) = (B†A†x, y).

The term adjoint of a matrix is also used in connection with an algorithm for computing the inverse of a nonsingular matrix. Let (aij) denote an n × n nonsingular matrix. The cofactor of the element aij is defined as

cofactor of aij = Aij = (−1)^{i+j} det Mij,

where Mij is the (n − 1) × (n − 1) matrix obtained from (aij) by deleting the ith row and jth column. The transpose of the n × n matrix of cofactors is called the adjoint of (aij),

adj(aij) = (Aji),

and the inverse of (aij) is

(aij)⁻¹ = adj(aij) / det(aij).

In the following discussion the term "adjoint" will always refer to the operator in the dual space.
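The cofactor algorithm just described can be written out explicitly. The helper below is an added illustration (Python with NumPy assumed; the 3 × 3 matrix is an arbitrary nonsingular example), and it is far less efficient than standard elimination methods; it merely mirrors the definition.

import numpy as np

def adjugate(a):
    # Transpose of the matrix of cofactors, as defined in the text.
    n = a.shape[0]
    cof = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(a, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T

a = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

inv = adjugate(a) / np.linalg.det(a)
print(np.allclose(inv, np.linalg.inv(a)))   # True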

Normal Operators

Operators are often classified according to some specific relationship with their adjoints. A linear transformation N is said to be normal if it commutes with its adjoint:

NN† − N†N = 0.



This condition is certainly satisfied if N = N†, and such operators are said to be hermitian or self-adjoint. If N is nonsingular, then N commutes with N⁻¹, so that an operator satisfying N† = N⁻¹ is normal. Operators of this type are said to be unitary. If the base field of the vector space consists of the real numbers, rather than the complex numbers, the complex conjugation included in the symbol † becomes unnecessary and only the transpose operation remains. Hermitian and unitary operators then become symmetric and orthogonal operators, respectively.

Unitary operators on an inner product space preserve inner products. For if x and y are subjected to a unitary transformation U, the scalar product of the transformed vectors is

(Ux, Uy) = (U†Ux, y) = (U⁻¹Ux, y) = (x, y),

by the unitary property U† = U⁻¹. For a real three-dimensional space, this means that the lengths and relative orientations of vectors are unchanged by an orthogonal transformation.

Similarity Transformations

Let T denote any linear transformation on an n-dimensional vector space V, and let {xi} and {yi} be two bases for V, related by the nonsingular linear transformation A:

yi = Axi   (i = 1, 2, ..., n).

These equations define a change of basis {xi} → {yi}, which induces a transformation¹⁰ of all operators defined on V. In particular, the operator T is transformed into

T′ = ATA⁻¹.

A transformation of this form is called a similarity transformation.

The matrix representing T also undergoes a similarity transformation¹⁰ by the matrix representing A:

(Tij′) = (Aij)(Tij)(Aij)⁻¹.

It should be noted that the matrix of T′ is referred to the y basis, whereas the matrices for T and A are referred to the x basis.

Similarity transformations preserve many important properties of linear transformations. For example, the determinant of a matrix is unaltered by a similarity transformation, since

det T′ = det ATA⁻¹ = det A⁻¹AT = det T.

¹⁰ Reference 1, p. 84.

Another invariant is the trace of a (finite square) matrix, defined as the sum of the diagonal elements of the matrix:

tr T = Σi Tii.

The invariance of tr T under a similarity transformation can be deduced from the fact that the trace of AB is equal to the trace of BA:

tr AB = Σi (AB)ii = Σi Σk AikBki = Σk (BA)kk = tr BA.

It follows that

tr T′ = tr ATA⁻¹ = tr A⁻¹AT = tr T.
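Both invariants are easy to confirm numerically. The sketch below (an added NumPy illustration with a randomly generated T and A; A is nonsingular with probability one) checks that the trace and determinant survive a similarity transformation.

import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(size=(4, 4))
A = rng.normal(size=(4, 4))                   # assumed nonsingular

T_prime = A @ T @ np.linalg.inv(A)            # similarity transformation

print(np.isclose(np.trace(T_prime), np.trace(T)))            # True
print(np.isclose(np.linalg.det(T_prime), np.linalg.det(T)))  # True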

Direct Sums

An n × n matrix A is said to be the direct sum of the n1 × n1 matrix A1 and the n2 × n2 matrix A2 if A has the form

A = ( A1  0  )
    ( 0   A2 ),

where the zero in the upper right-hand corner represents an n1 × n2 zero matrix, and the zero in the lower left-hand corner an n2 × n1 zero matrix. The matrix A is denoted

A = A1 ⊕ A2.

More generally, an n × n matrix A is said to be the direct sum of the square matrices A1, A2, ..., AN if all elements of A not contained in some Ai vanish. When this is the case,

A = A1 ⊕ A2 ⊕ ··· ⊕ AN

and

n = Σ_{i=1}^N dim Ai.

It is not difficult to show that if A = A1 ⊕ ··· ⊕ AN, then

tr A = Σ_{i=1}^N tr Ai,   det A = Π_{i=1}^N det Ai.

Furthermore, if B = B1 ⊕ ··· ⊕ BN and dim Ai = dim Bi, i = 1, 2, ..., N, then

AB = A1B1 ⊕ A2B2 ⊕ ··· ⊕ ANBN.
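These three rules for direct sums can be verified with small block-diagonal matrices. The sketch below is an added illustration; it assumes Python with NumPy and SciPy (scipy.linalg.block_diag assembles a direct sum), and the blocks themselves are arbitrary.

import numpy as np
from scipy.linalg import block_diag

A1 = np.array([[1.0, 2.0], [3.0, 4.0]])
A2 = np.array([[5.0]])
B1 = np.array([[0.0, 1.0], [1.0, 0.0]])
B2 = np.array([[2.0]])

A = block_diag(A1, A2)                        # A = A1 ⊕ A2
B = block_diag(B1, B2)                        # B = B1 ⊕ B2

print(np.isclose(np.trace(A), np.trace(A1) + np.trace(A2)))                 # True
print(np.isclose(np.linalg.det(A), np.linalg.det(A1) * np.linalg.det(A2)))  # True
print(np.allclose(A @ B, block_diag(A1 @ B1, A2 @ B2)))                     # True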


4. Diagonalization of Commuting Matrices

Schur's Theorem

Let S be any linear transformation on an n-dimensional unitary space, and (Sij) the matrix of S relative to an orthonormal basis {xi}. Furthermore, let U denote a unitary operator, represented by the matrix (Uij) relative to {xi}, which sends the original basis into a new basis {yi}. The change of basis {xi} → {yj} = {Uxj} induces similarity transformations of S and (Sij) with U and (Uij):

U†SU = T,   (1)

(Uij)†(Sij)(Uij) = (Tij).   (2)

For an arbitrary choice of U, the matrix (Tij) will not, in general, be any simpler than (Sij). However, according to an important theorem¹¹ of I. Schur, there exists, for any S, a unitary transformation U such that the matrix for T is triangular:

        ( T11  T12  ···  T1n )
(Tij) = (  0   T22  ···  T2n )    (3)
        (  ⋮    ⋮         ⋮  )
        (  0    0   ···  Tnn )

In this matrix, which is called the Schur canonical form, all elements below the principal diagonal are zero (Tij = 0, for j < i). The elements along the principal diagonal (λk = Tkk) are called the eigenvalues of S.

It follows that

(1) tr S = tr T = the sum of the eigenvalues.

(2) det S = det T = the product of the eigenvalues.
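A Schur decomposition is available in standard numerical libraries, so the two consequences above can be checked directly. The sketch below is an added illustration assuming Python with NumPy and SciPy; scipy.linalg.schur returns the triangular factor T and the unitary U with S = U T U†.

import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)
S = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))

T, U = schur(S, output='complex')            # S = U T U†, T upper triangular

print(np.allclose(U @ T @ U.conj().T, S))                  # True
print(np.allclose(np.tril(T, -1), 0))                      # Tij = 0 for j < i
print(np.isclose(np.trace(S), np.sum(np.diag(T))))         # tr S = sum of eigenvalues
print(np.isclose(np.linalg.det(S), np.prod(np.diag(T))))   # det S = product of eigenvalues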

The eigenvalues of a linear transformation S are associated with the eigenvectors of S, defined as those nonzero vectors uk satisfying

Suk = λkuk.   (4)

The scalar λk is said to be the eigenvalue associated with, or belonging to, the eigenvector uk. It is possible that λk may belong to gk linearly independent eigenvectors. If gk > 1, the eigenvalue λk is said to be gk-fold degenerate.

¹¹ For a proof of this theorem, see Reference 1, p. 144.

The n × 1 matrices which represent the uk in the y basis are

yk = (0, ..., 0, 1, 0, ..., 0),   (5)

with the 1 in the kth place, as one may verify by operating with (Tij) on the yk. Thus the eigenvectors of S constitute an orthonormal set, in fact a basis for the n-dimensional space.

The eigenvectors of S relative to the x basis may be obtained by applying (2) to a generic eigenvector yk:

(Uij)†(Sij)(Uij)yk = (Tij)yk = λkyk.

Multiplying from the left with (Uij), one obtains

(Sij)(Uij)yk = λk(Uij)yk,

which shows that the eigenvectors of S in the x basis are

(yk)x = (Uij)yk.   (6)

It follows, from (5) and (6), that the (yk)x are given by the n columns of the matrix (Uij). Thus (4) in the x basis is

( S11  S12  ···  S1n ) ( U1k )        ( U1k )
( S21  S22  ···  S2n ) ( U2k )  = λk  ( U2k )    (7)
(  ⋮    ⋮         ⋮  ) (  ⋮  )        (  ⋮  )
( Sn1  Sn2  ···  Snn ) ( Unk )        ( Unk )

or, more concisely,

[(Sij) − λk·1]U^(k) = 0,   (8)

where U^(k) is the kth column of (Uij).

The eigenvectors of S are, by definition, nonzero, so that [(Sij) − λk·1] must be singular.¹² This requires that

                     | S11 − λk    S12      ···     S1n     |
det[(Sij) − λk·1] =  |   S21     S22 − λk   ···     S2n     |  = 0.    (9)
                     |    ⋮                          ⋮      |
                     |   Sn1       Sn2      ···   Snn − λk  |

¹² If it is assumed that [(Sij) − λk·1] is nonsingular, then (8) implies that U^(k) = 0, contrary to assumption.


This equation, variously known as the secular, determinantal, or characteristic equation, is a polynomial equation of the nth degree whose roots are the n eigenvalues of S. If one of these roots is inserted in (7), one obtains n linear equations for the n unknowns Uik, i = 1, 2, ..., n. But the determinant of the coefficient matrix vanishes, so that only n − 1 of these equations are independent. One may, however, solve for n − 1 of the Uik in terms of some fixed Uik, say Unk. The normalization condition

Σ_{i=1}^n |Uik|² = 1

then determines U1k, U2k, ..., Unk, except for an arbitrary phase factor of unit modulus. Repeating this procedure for each eigenvalue, one obtains the full matrix (Uij).

The preceding analysis determines the eigenvalues and eigenvectors, but not their order of appearance in (Tij) and (Uij). If it is desired to have a particular eigenvalue appear at the intersection of the kth row and kth column, care must be taken to have its associated eigenvector as the kth column of (Uij).
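In practice the secular equation is rarely solved by hand; a numerical eigensolver returns the roots λk and the columns U^(k) at once. The sketch below (an added NumPy illustration with an arbitrary 2 × 2 matrix) checks that each returned column satisfies equation (8).

import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, U = np.linalg.eig(S)          # roots of det[(Sij) - λ·1] = 0 and the columns U^(k)
print(eigvals)                         # eigenvalues 3 and 1

for lam, u in zip(eigvals, U.T):
    print(np.allclose((S - lam * np.eye(2)) @ u, 0))   # True for every column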

The Schur canonical form of a matrix can be used to obtain some important results concerning hermitian and unitary matrices.

Unitary Matrices

An operator U is unitary if U† = U⁻¹, so that

UU† = U†U = 1.

In matrix notation these relations lead to the orthogonality conditions

Σ_{k=1}^n Uik Ujk* = δij,   (10)

Σ_{k=1}^n Uki Ukj* = δij,   (11)

where i, j = 1, 2, ..., n. Equations (10) are equivalent to

Σk |Uik|² = 1, for i = j,

Σk Uik Ujk* = 0, for i ≠ j.

If the rows of U are interpreted as row vectors, these equations state that every row of U is normalized and orthogonal to every other row of U. Equations (11) may be used to show that every column of U is normalized and orthogonal to every other column.

A general n × n unitary matrix contains n² complex numbers, but not all of these are independent. The first row of U must be normalized and orthogonal to the remaining n − 1 rows, so that the unitary nature of U imposes n conditions on the first row. Similarly, the second row must be normalized and orthogonal to the remaining n − 2 rows, which gives n − 1 additional constraints. Continuing this procedure, one finds that the most general n × n unitary matrix depends upon

n² − n − (n − 1) − ··· − 1 = n² − Σ_{k=1}^n k = n(n − 1)/2

independent complex numbers, or n(n − 1) real parameters.

The unitary property of a matrix U is preserved under a similarity transformation by a unitary matrix V. For if U′ = V⁻¹UV = V†UV, then

(U′)(U′)† = (V†UV)(V†UV)† = V†UVV†U†V = V†UVV⁻¹U⁻¹V = V†V = 1.

Similarly, (U′)†(U′) = 1. Suppose now that V brings U to triangular form:

        ( u11  u12  ···  u1n )
V†UV =  (  0   u22  ···  u2n )
        (  ⋮    ⋮         ⋮  )
        (  0    0   ···  unn )

Since V†UV is still unitary, the rows and columns must satisfy the unitary conditions. In particular, the first column, considered as a vector, must be normalized to unity; hence

u11u11* = 1,

or

u11 = e^{iθ1}.

Moreover, the first column must be orthogonal to the second column, so that

u11*u12 = e^{−iθ1}u12 = 0,

or u12 = 0. Normalization of the second column now yields u22 = e^{iθ2}. By a repeated application of the conditions for orthonormality to the remaining columns, it follows that

        ( e^{iθ1}    0      ···    0     )
V†UV =  (   0     e^{iθ2}   ···    0     )
        (   ⋮                      ⋮     )
        (   0        0      ···  e^{iθn} )

where λk denotes the kth diagonal element. This proves that

(1) Every unitary matrix can be brought to diagonal form by a similarity transformation with an appropriate unitary matrix.

(2) The eigenvalues of a unitary matrix are all of modulus unity.

Hermitian Operators

An operator is hermitian if it is self-adjoint:

H = H†.

In matrix notation this condition requires that

Hij = Hji*.

The hermitian property is preserved under a similarity transformation by a unitary matrix U, since

(U⁻¹HU)† = (U†HU)† = U†H†(U†)† = U†HU.

If U is the unitary matrix which reduces H to triangular form, then, by the hermitian property, all elements above the principal diagonal must be equal to the complex conjugates of the corresponding elements below the diagonal. But the latter are all zero, so that

(1) Every hermitian matrix can be brought into diagonal form by an appropriate unitary matrix.

(2) The eigenvalues of a hermitian matrix are real numbers.
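Both statements, and the corresponding ones for unitary matrices above, are easy to confirm numerically. The sketch below is an added illustration (Python with NumPy assumed); H is made hermitian by construction, and the Q factor of a QR decomposition supplies an example of a unitary matrix.

import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

H = M + M.conj().T                    # hermitian by construction
U, _ = np.linalg.qr(M)                # the Q factor is unitary

print(np.allclose(np.linalg.eigvals(H).imag, 0))       # hermitian: real eigenvalues
print(np.allclose(np.abs(np.linalg.eigvals(U)), 1))    # unitary: eigenvalues of modulus 1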

Theorem. Two hermitian matrices A and B can be simultaneously diagonalized by a unitary matrix V if and only if A and B commute.

Proof. If A and B can be simultaneously diagonalized, then, since diagonal matrices always commute,

(V†AV)(V†BV) = (V†BV)(V†AV),

or, upon left multiplication with V and right multiplication with V†, AB = BA.

Suppose now that A and B commute. A unitary matrix can be found such that one of the given hermitian matrices, say A, is diagonal. Let V_A denote the unitary matrix which effects the diagonalization, {xi} the basis of eigenvectors, and ai the eigenvalues of A, which will be assumed to be gi-fold degenerate. By permuting the rows of V_A, one can arrange for the diagonal matrix V_A†AV_A to assume the block form

V_A†AV_A = diag(a1, ..., a1, a2, ..., a2, ..., aN, ..., aN),

where each ai appears gi times and all elements not on the principal diagonal are zero. The matrix is of dimension n, where n = Σi gi. Evidently

V_A†AV_A = A1 ⊕ A2 ⊕ ··· ⊕ AN = a1·1_1 ⊕ a2·1_2 ⊕ ··· ⊕ aN·1_N,

where 1_i denotes a gi × gi unit matrix.

Consider now the matrix for B relative to the basis which diagonalizes A. By assumption AB = BA, so that

(xi, ABxj) = (xi, BAxj) = aj(xi, Bxj)

and

(xi, ABxj) = (A†xi, Bxj) = (Axi, Bxj) = ai(xi, Bxj).


From these results one obtains

(ai − aj)(xi, Bxj) = 0.

Thus Bij = (xi, Bxj) = 0 for ai ≠ aj, but Bij does not necessarily vanish when xi and xj are eigenvectors corresponding to the same degenerate eigenvalue. It follows that the B matrix has the form

V_A†BV_A = B1 ⊕ B2 ⊕ ··· ⊕ BN,

where each Bi is of dimension gi × gi. If gi = 1 for all i, then B is diagonalized by V_A and the theorem is proved.

If all the gi are not unity, one proceeds as follows. Each Bi is hermitian; hence there exists a unitary matrix Vi such that Vi†BiVi is diagonal.

The whole matrix B is diagonalized by

V_B = V1 ⊕ V2 ⊕ ··· ⊕ VN,

since

V_B†(V_A†BV_A)V_B = V1†B1V1 ⊕ V2†B2V2 ⊕ ··· ⊕ VN†BNVN.

Moreover, the matrix V_B does not alter the diagonal form of V_A†AV_A, since

V_B†V_A†AV_AV_B = a1·V1†1_1V1 ⊕ a2·V2†1_2V2 ⊕ ··· ⊕ aN·VN†1_NVN = a1·1_1 ⊕ ··· ⊕ aN·1_N = V_A†AV_A.

Thus V = V_AV_B simultaneously diagonalizes A and B.

The method used to establish the preceding theorem can also be used to prove that any finite number of commuting hermitian matrices, or any finite number of commuting unitary matrices, can be simultaneously brought to diagonal form by a unitary matrix.
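The constructive proof above translates into a short numerical routine: diagonalize A, then diagonalize the block of B belonging to each (nearly) degenerate eigenvalue of A. The sketch below is a hypothetical helper added for illustration (Python with NumPy assumed, commuting hermitian A and B built by hand); it is not taken from the text.

import numpy as np

def simultaneous_diagonalize(A, B, tol=1e-9):
    # Diagonalize A, then diagonalize the blocks of V_A† B V_A that belong
    # to each cluster of (numerically) equal eigenvalues of A.
    a, VA = np.linalg.eigh(A)                 # eigh returns sorted eigenvalues
    Bp = VA.conj().T @ B @ VA                 # block form of B
    VB = np.zeros_like(Bp)
    start = 0
    while start < len(a):
        stop = start + 1
        while stop < len(a) and abs(a[stop] - a[start]) < tol:
            stop += 1
        _, v = np.linalg.eigh(Bp[start:stop, start:stop])
        VB[start:stop, start:stop] = v
        start = stop
    return VA @ VB                            # V = V_A V_B

A = np.diag([1.0, 1.0, 2.0])                                  # degenerate eigenvalue 1
R = np.array([[1, 1, 0], [1, -1, 0], [0, 0, np.sqrt(2)]]) / np.sqrt(2)
B = R @ np.diag([3.0, 4.0, 5.0]) @ R.conj().T                 # commutes with A

V = simultaneous_diagonalize(A, B)
for M in (A, B):
    D = V.conj().T @ M @ V
    print(np.allclose(D, np.diag(np.diag(D))))                # True, True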

The theorem on commuting hermitian matrices is often stated in terms of the hermitian operators which they represent.

Theorem. If A1, A2, ..., AN are N hermitian operators such that

[Ai, Aj] = 0

for all i and j, then there exists a set of vectors which are simultaneous eigenvectors of all the given operators.

The Dirac Notation

When working with sets of commuting operators, it is convenient to employ the bra and ket notation of Dirac. In this notation, a vector which is a simultaneous eigenvector of the commuting operators

A, B, C, ... is denoted by the ket vector | a, b, c, ...⟩, where a, b, c, ... are the eigenvalues of the given operators. The vector dual to | a, b, c, ...⟩ is the bra vector ⟨a, b, c, ... |. The scalar product of a bra vector

⟨a′, b′, c′, ... | and a ket vector | a, b, c, ...⟩ is written

⟨a′, b′, c′, ... | a, b, c, ...⟩ = ⟨a, b, c, ... | a′, b′, c′, ...⟩*,

and the matrix elements of an arbitrary operator P are written

⟨a′, b′, c′, ... | P | a, b, c, ...⟩.

The operator P in this expression may be considered as operating on the ket vector, in which case the scalar product is ⟨a′, ... | {P | a, ...⟩}. It may also be considered as operating on the bra (dual) vector, in which case the scalar product is {⟨a′, ... | P†} | a, ...⟩, where ⟨a′, ... | P† = {P | a′, ...⟩}†. These forms correspond to (x, Py) = (P†x, y) in the previous notation.
