
9.2 Multiplication



We could try to use this formula (which expresses the product of the $2m$-bit integers $U = 2^mU_1 + U_0$ and $V = 2^mV_1 + V_0$ as $UV = 2^{2m}U_1V_1 + 2^m(U_0V_1 + U_1V_0) + U_0V_0$) to compute the product recursively; but it seems that to evaluate this expression, even if we ignore additions, we have to perform four multiplications on $m$-bit integers, and it is easy to see that this does not lead to any gain in the number of bit-operations. The key observation is that with a little algebra, three multiplications are enough: since

$$U_0V_1 + U_1V_0 = (U_1 - U_0)(V_0 - V_1) + U_0V_0 + U_1V_1,$$

we can express the product as

$$(2^{2m} + 2^m)U_1V_1 + 2^m(U_1 - U_0)(V_0 - V_1) + (2^m + 1)U_0V_0.$$

This way we have reduced the multiplication of two $2m$-bit integers to three multiplications of $m$-bit integers, and a few additions and multiplications by powers of 2. It is easy to count these additional operations, to get that they involve at most $22m$ bit-operations.

If we denote by T(n) the number of operations used by this recursive algorithm, then

$$T(2m) \le 3T(m) + 22m.$$

This formula gives, by induction, an upper bound on the number of bit-operations, at least if the number of bits is a power of 2:

$$T(2^k) \le 3T(2^{k-1}) + 22\cdot 2^{k-1} \le \cdots \le 3^k + 22(2^{k-1} + 3\cdot 2^{k-2} + \cdots + 3^{k-1}) = 3^k + 22(3^k - 2^k) < 23\cdot 3^k.$$

To multiply two integers with $n$ bits for a general $n$, we can “pad” the numbers with leading 0-s to increase the number of their bits to the next power of 2. If $k = \lceil \log n \rceil$, then this algorithm will compute the product of two $n$-bit integers using

$$T(n) \le T(2^k) < 23\cdot 3^k < 23\cdot 3^{1+\log n} = 69\cdot n^{\log 3} < 69\cdot n^{1.585}$$

bit-operations.
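To make the recursion concrete, here is a minimal Python sketch of the algorithm just described. The function name, the cutoff below which we simply fall back to the built-in product, and the choice of the splitting point $m$ are our own; they are not part of the text.

    def karatsuba(u, v):
        """Product of two non-negative integers via the three-multiplication recursion."""
        if u < (1 << 32) or v < (1 << 32):       # small enough: use the built-in product
            return u * v
        m = max(u.bit_length(), v.bit_length()) // 2
        u1, u0 = u >> m, u & ((1 << m) - 1)      # u = 2^m * u1 + u0
        v1, v0 = v >> m, v & ((1 << m) - 1)      # v = 2^m * v1 + v0
        p = karatsuba(u1, v1)                    # U1 * V1
        q = karatsuba(u0, v0)                    # U0 * V0
        d1, d2 = u1 - u0, v0 - v1                # these differences may be negative
        r = karatsuba(abs(d1), abs(d2))
        if (d1 < 0) != (d2 < 0):                 # restore the sign of (U1 - U0)(V0 - V1)
            r = -r
        # (2^{2m} + 2^m) U1 V1 + 2^m (U1 - U0)(V0 - V1) + (2^m + 1) U0 V0
        return (p << (2 * m)) + ((p + r + q) << m) + q

A quick sanity check: karatsuba(2**100 + 12345, 2**90 + 67890) agrees with the built-in product of the same two numbers.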

This simple method to compute the product of two large integers more efficiently than the method learned in elementary school can be improved substantially. We will see that using more advanced methods (discrete Fourier transforms), many fewer bit-operations suffice.

9.2.2 Matrix multiplication

Assume that we want to multiply the matrices

$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} b_{11} & \dots & b_{1n} \\ \vdots & & \vdots \\ b_{n1} & \dots & b_{nn} \end{pmatrix}$$

(for simplicity, we consider only the case when the matrices are square). The product matrix is

$$C = \begin{pmatrix} c_{11} & \dots & c_{1n} \\ \vdots & & \vdots \\ c_{n1} & \dots & c_{nn} \end{pmatrix},
\qquad\text{where}\quad c_{ij} = a_{i1}b_{1j} + \cdots + a_{in}b_{nj}. \tag{9.3}$$

Performing these arithmetic operations in the simple way takes $n^3$ multiplications and $n^2(n-1) \approx n^3$ additions. Normally, we think of multiplications as more costly than additions, so it could be useful to reduce the number of multiplications, even if this means increasing the number of additions. (However, we will see that, surprisingly, we can also reduce the number of additions.)

Strassen noticed that the multiplication of $2\times 2$ matrices can be carried out using 7 multiplications and 18 additions, instead of the 8 multiplications and 4 additions in (9.3). We form 7 products:

$$\begin{aligned}
u_0 &= (a_{11}+a_{22})(b_{11}+b_{22}), \\
u_1 &= (a_{21}+a_{22})b_{11}, \\
u_2 &= a_{11}(b_{12}-b_{22}), \\
u_3 &= a_{22}(b_{21}-b_{11}), \\
u_4 &= (a_{11}+a_{12})b_{22}, \\
u_5 &= (a_{21}-a_{11})(b_{11}+b_{12}), \\
u_6 &= (a_{12}-a_{22})(b_{21}+b_{22}).
\end{aligned} \tag{9.4}$$

Then we can express the entries of the product matrix as follows:

$$\begin{aligned}
c_{11} &= u_0 + u_3 - u_4 + u_6, \\
c_{21} &= u_1 + u_3, \\
c_{12} &= u_2 + u_4, \\
c_{22} &= u_0 + u_2 - u_1 + u_5.
\end{aligned} \tag{9.5}$$

Performing 14 extra additions to save a single multiplication does not look like much of a gain, but we will see how this gain is realized once we extend the method to larger matrices. Just as for the multiplication of integers, we show how to reduce the multiplication of $(2n)\times(2n)$ matrices to the multiplication of $n\times n$ matrices. Let $A$, $B$ and $C = AB$ be $(2n)\times(2n)$ matrices, and let us split each of them into four $n\times n$ matrices:

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad
B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}, \qquad
C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}.$$

Then $C_{ij} = A_{i1}B_{1j} + A_{i2}B_{2j}$, and we can use the formulas (9.4) and (9.5) to compute these four matrices using only 7 multiplications and 18 additions of $n\times n$ matrices.

(Luckily, the verification of the formulas (9.4) and (9.5), which we did not write down, does not use the commutativity of multiplication, so they remain valid for matrices.) Assuming that we start with a $2^k\times 2^k$ matrix, we can do this splitting recursively, until we get down to $1\times 1$ matrices (which can be multiplied using a single multiplication of numbers). If the number of rows and columns is not a power of 2, we start with adding all-0 rows and columns to increase the size to the nearest power of 2.
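As an illustration, here is a small Python sketch of this recursion, with matrices stored as lists of lists; it assumes the order is a power of 2, and it recurses all the way down to $1\times 1$ blocks. The helper names and the base case are our own choices, not the book's.

    def madd(X, Y):        # entrywise sum of two square matrices
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def msub(X, Y):        # entrywise difference
        return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def strassen(A, B):
        """Product of two 2^k x 2^k matrices using the 7-multiplication recursion."""
        n = len(A)
        if n == 1:                              # a single multiplication of numbers
            return [[A[0][0] * B[0][0]]]
        h = n // 2                              # split into four h x h blocks
        A11 = [row[:h] for row in A[:h]]; A12 = [row[h:] for row in A[:h]]
        A21 = [row[:h] for row in A[h:]]; A22 = [row[h:] for row in A[h:]]
        B11 = [row[:h] for row in B[:h]]; B12 = [row[h:] for row in B[:h]]
        B21 = [row[:h] for row in B[h:]]; B22 = [row[h:] for row in B[h:]]
        # the seven products (9.4), now applied to blocks
        u0 = strassen(madd(A11, A22), madd(B11, B22))
        u1 = strassen(madd(A21, A22), B11)
        u2 = strassen(A11, msub(B12, B22))
        u3 = strassen(A22, msub(B21, B11))
        u4 = strassen(madd(A11, A12), B22)
        u5 = strassen(msub(A21, A11), madd(B11, B12))
        u6 = strassen(msub(A12, A22), madd(B21, B22))
        # the four blocks of the product (9.5)
        C11 = madd(msub(madd(u0, u3), u4), u6)
        C12 = madd(u2, u4)
        C21 = madd(u1, u3)
        C22 = madd(msub(madd(u0, u2), u1), u5)
        # glue the blocks back together
        return [r1 + r2 for r1, r2 in zip(C11, C12)] + [r1 + r2 for r1, r2 in zip(C21, C22)]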

Do we save any work by this more complicated algorithm? Let M(n) denote the number of multiplications, and S(n) the number of additions, when this algorithm is applied to n×n matrices. Then

$$M(2n) = 7M(n) \quad\text{and}\quad S(2n) = 18n^2 + 7S(n).$$

Clearly $M(1) = 1$, $S(1) = 0$, and it is easy to prove by induction on $k$ that $M(2^k) = 7^k$ and $S(2^k) = 6(7^k - 4^k)$.
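For instance, the formula for $S$ can be checked by substituting the claimed value into the recursion (the step for $M$ is analogous):

$$S(2^k) = 18\,(2^{k-1})^2 + 7\,S(2^{k-1}) = 18\cdot 4^{k-1} + 42\,(7^{k-1} - 4^{k-1}) = 6\cdot 7^k - 6\cdot 4^k = 6(7^k - 4^k).$$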

Let $k = \lceil \log n \rceil$; then

$$M(n) = M(2^k) = 7^k < 7^{1+\log n} = 7n^{\log 7} < 7n^{2.81},$$

and similarly

$$S(n) < 42n^{\log 7} < 42n^{2.81}.$$

We see that while for n = 2 Strassen’s method only gained a little in the number of multiplications (and lost a lot in the number of additions), through this iteration we improved both the number of multiplications and the number of additions, at least for large matrices.

It is not easy to explain where the formulas (9.4) and (9.5) come from; in a sense, this is not even understood today, since it is open how much the exponent of $n$ in the complexity of matrix multiplication can be reduced by similar methods. The current best algorithm, due to Williams, uses $O(n^{2.3727})$ multiplications and additions.

9.2.3 Inverting matrices

Let $B$ be a $(2n)\times(2n)$ matrix, which we partition into 4 parts:

$$B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}. \tag{9.6}$$

We can bring $B$ to a block-diagonal form, much as we would do for a $2\times 2$ matrix:

$$\begin{pmatrix} I & 0 \\ -B_{21}B_{11}^{-1} & I \end{pmatrix}
\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}
\begin{pmatrix} I & -B_{11}^{-1}B_{12} \\ 0 & I \end{pmatrix}
= \begin{pmatrix} B_{11} & 0 \\ 0 & B_{22} - B_{21}B_{11}^{-1}B_{12} \end{pmatrix}. \tag{9.7}$$

To simplify notation, let $C = B_{22} - B_{21}B_{11}^{-1}B_{12}$. Inverting and expressing $B^{-1}$, we get

$$B^{-1} = \begin{pmatrix} I & -B_{11}^{-1}B_{12} \\ 0 & I \end{pmatrix}
\begin{pmatrix} B_{11}^{-1} & 0 \\ 0 & C^{-1} \end{pmatrix}
\begin{pmatrix} I & 0 \\ -B_{21}B_{11}^{-1} & I \end{pmatrix}
= \begin{pmatrix} B_{11}^{-1} + B_{11}^{-1}B_{12}C^{-1}B_{21}B_{11}^{-1} & -B_{11}^{-1}B_{12}C^{-1} \\ -C^{-1}B_{21}B_{11}^{-1} & C^{-1} \end{pmatrix}. \tag{9.8}$$

This is a messy formula, but it describes how to compute the inverse of a $(2n)\times(2n)$ matrix using two matrix inversions (for $B_{11}$ and $C$), 6 matrix multiplications and 2 additions (one of which is in fact a subtraction), all performed on $n\times n$ matrices. We could use this recursively as before, but there is a problem: how do we know that $B_{11}$ is invertible? This does not follow even if we assume that $B$ is invertible.

The way out is to use the identity

$$A^{-1} = (A^{\top}A)^{-1}A^{\top}. \tag{9.9}$$

This shows that if we can invert the matrix $B = A^{\top}A$, then, at the cost of a further matrix multiplication, we can compute the inverse of $A$. (We do not count the cost of computing the transpose of $A$, which involves only moving numbers around, no algebraic operations.) Now if $A$ is nonsingular, then $B$ is symmetric and positive definite.

Hence the principal submatrix $B_{11}$ in the decomposition (9.6) is also symmetric and positive definite. Furthermore, identity (9.7) implies that $C$ is also symmetric and positive definite.

These facts have three important consequences. First, it follows that $B_{11}$ and $C$ are nonsingular, so the inverses $B_{11}^{-1}$ and $C^{-1}$ in (9.8) make sense. Second, it follows that when computing $B_{11}^{-1}$ and $C^{-1}$ recursively, we stay in the territory of inverting symmetric and positive definite matrices, and so we don't have to appeal to the trick (9.9) any more. Third, it follows that $B_{21} = B_{12}^{\top}$, which saves us two multiplications, since $B_{21}B_{11}^{-1} = (B_{11}^{-1}B_{12})^{\top}$ and $C^{-1}B_{21}B_{11}^{-1} = (B_{11}^{-1}B_{12}C^{-1})^{\top}$ do not need to be computed separately.
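A compact Python sketch of this scheme might look as follows. The names are ours, ordinary cubic matrix multiplication stands in for Strassen's algorithm, and we assume the order of the symmetric positive definite input is a power of 2.

    def mmul(X, Y):        # ordinary matrix product (Strassen's algorithm could be used instead)
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

    def madd(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def msub(X, Y):
        return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def transpose(X):
        return [list(col) for col in zip(*X)]

    def spd_inverse(B):
        """Inverse of a symmetric positive definite 2^k x 2^k matrix via formula (9.8)."""
        n = len(B)
        if n == 1:
            return [[1.0 / B[0][0]]]
        h = n // 2
        B11 = [row[:h] for row in B[:h]]; B12 = [row[h:] for row in B[:h]]
        B21 = [row[:h] for row in B[h:]]; B22 = [row[h:] for row in B[h:]]
        B11inv = spd_inverse(B11)                   # recursive inversion of the principal block
        W = mmul(B11inv, B12)                       # B11^{-1} B12; note B21 B11^{-1} = W^T
        C = msub(B22, mmul(B21, W))                 # the block C = B22 - B21 B11^{-1} B12
        Cinv = spd_inverse(C)                       # C is again symmetric positive definite
        V = mmul(W, Cinv)                           # B11^{-1} B12 C^{-1}; note C^{-1} B21 B11^{-1} = V^T
        X11 = madd(B11inv, mmul(V, transpose(W)))   # B11^{-1} + B11^{-1} B12 C^{-1} B21 B11^{-1}
        X12 = [[-v for v in row] for row in V]      # -B11^{-1} B12 C^{-1}
        X21 = transpose(X12)                        # -C^{-1} B21 B11^{-1}, by symmetry
        return [r1 + r2 for r1, r2 in zip(X11, X12)] + [r1 + r2 for r1, r2 in zip(X21, Cinv)]

Note that only four block products are formed at each level of the recursion (for W, C, V and X11), which is exactly the saving promised by the symmetry argument above.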

Let $I^{+}(n)$ denote the minimum number of multiplications needed to invert an $n\times n$ positive definite matrix, and let $L(n)$ denote the minimum number of multiplications needed to compute the product of two $n\times n$ matrices. It follows from formula (9.8) that

$$I^{+}(2n) \le 2I^{+}(n) + 4L(n).$$

Using the matrix multiplication algorithm given in Section 9.2.2, we get that

$$I^{+}(2^{k+1}) \le 2I^{+}(2^k) + 4\cdot 7^k,$$

which implies by induction that $I^{+}(2^k) \le 7^k$.

Using (9.9), we get that a nonsingular $2^k\times 2^k$ matrix can be inverted using $3\cdot 7^k$ multiplications. Just as in Section 9.2.2, this implies a bound for general $n$: an $n\times n$ matrix can be inverted using no more than $21\cdot n^{\log 7}$ multiplications. The number of additions can be bounded similarly.

9.2.4 Multiplication of polynomials

Suppose that we want to compute the product of two real polynomials in one variable, of degree $n$. Given

$$P(x) = a_0 + a_1x + \cdots + a_nx^n \quad\text{and}\quad Q(x) = b_0 + b_1x + \cdots + b_nx^n,$$

we want to compute their product

$$R(x) = P(x)Q(x) = c_0 + c_1x + \cdots + c_{2n}x^{2n}.$$

The coefficients of this polynomial can be computed by the formulas

$$c_i = a_0b_i + a_1b_{i-1} + \cdots + a_ib_0. \tag{9.10}$$

This is often called the convolution of the sequences $(a_0, a_1, \dots, a_n)$ and $(b_0, b_1, \dots, b_n)$.

To compute every coefficient by these formulas takes $(n+1)^2$ multiplications.
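A direct transcription of (9.10) into Python might look like this (a naive sketch; the function name is ours):

    def convolve(a, b):
        """Coefficient sequence of P(x)Q(x), given the coefficient lists a and b of P and Q."""
        c = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                c[i + j] += ai * bj      # a_i * b_j contributes to the coefficient of x^(i+j)
        return c

For two polynomials of degree $n$ this performs exactly $(n+1)^2$ multiplications, as noted above.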

We can do better if we use the fact that we can substitute into a polynomial. Let us substitute the values $0, 1, \dots, 2n$ into the polynomials. In other words, we compute the values $P(0), P(1), \dots, P(2n)$ and $Q(0), Q(1), \dots, Q(2n)$, and then compute their products $R(j) = P(j)Q(j)$. From here, the coefficients of $R$ can be determined by solving the equations

$$\begin{aligned}
c_0 &= R(0) \\
c_0 + c_1 + c_2 + \cdots + c_{2n} &= R(1) \\
c_0 + 2c_1 + 2^2c_2 + \cdots + 2^{2n}c_{2n} &= R(2) \\
&\;\;\vdots \\
c_0 + (2n)c_1 + (2n)^2c_2 + \cdots + (2n)^{2n}c_{2n} &= R(2n)
\end{aligned} \tag{9.11}$$

This does not seem to be a great idea, since we need about $n^2$ multiplications (and about the same number of additions) to compute the values $P(0), P(1), \dots, P(2n)$ and $Q(0), Q(1), \dots, Q(2n)$; it takes a small number of multiplications to get the values $R(0), R(1), \dots, R(2n)$, but then of the order of $n^3$ multiplications and additions to solve the system (9.11) if we use Gaussian elimination (fewer, if we use the more sophisticated methods discussed in Section 9.2.2, but still substantially more than $n^2$).

We see some gain, however, if we distinguish two kinds of multiplications: multiplication by a fixed constant, and multiplication of two expressions containing the parameters (the coefficients $a_i$ and $b_i$). Recall that additions and multiplications by a fixed constant are linear operations. The computation of the values $P(0), P(1), \dots, P(2n)$ and $Q(0), Q(1), \dots, Q(2n)$, as well as the solution of equations (9.11), takes only linear operations. Nonlinear operations are needed only in the computation of the $R(j)$, so their number is only $2n+1$.

It would be very useful to reduce the number of linear operations too. The most time-consuming part of the above is solving equations (9.11); it takes of the order of $n^3$ operations if we use Gaussian elimination (these are all linear, but still a lot).

Using more of the special structure of the equations, this can be reduced to $O(n^2)$ operations. But we can do even better if we notice that there is nothing special about the substitutions $0, 1, \dots, 2n$: we could use any other $2n+1$ real or even complex numbers. As we are going to discuss in Section 9.2.5, substituting appropriate roots of unity leads to a much more efficient method for the multiplication of polynomials.

9.2.5 Discrete Fourier transform

Let $P(x) = a_0 + a_1x + \cdots + a_nx^n$ be a real polynomial, and fix any $r > n$. Let $\varepsilon = e^{2\pi i/r}$ be the first $r$-th root of unity, and consider the values

$$\hat{a}_k = P(\varepsilon^k) = a_0 + a_1\varepsilon^k + a_2\varepsilon^{2k} + \cdots + a_n\varepsilon^{nk} \qquad (k = 0, \dots, r-1). \tag{9.12}$$

The sequence $(\hat{a}_0, \hat{a}_1, \dots, \hat{a}_{r-1})$ is called the discrete Fourier transform of order $r$ of the sequence of coefficients $(a_0, a_1, \dots, a_n)$. We will often append $r-n-1$ zeros to this sequence, to get a sequence $(a_0, a_1, \dots, a_{r-1})$ of length $r$.

Discrete Fourier transforms have a number of very nice properties and important applications, of which we only discuss those related to polynomial multiplication.

We start with some simple but basic properties. First, the inverse transformation can be described by similar formulas:

$$a_k = \frac{1}{r}\left(\hat{a}_0 + \hat{a}_1\varepsilon^{-k} + \hat{a}_2\varepsilon^{-2k} + \cdots + \hat{a}_{r-1}\varepsilon^{-(r-1)k}\right) \qquad (k = 0, \dots, r-1). \tag{9.13}$$

This can be verified by substituting the definition of $\hat{a}_k$ into these formulas. Second, assume that $r > 2n$, and let $(b_0, \dots, b_{r-1})$ and $(c_0, \dots, c_{r-1})$ be the coefficient sequences of the polynomials $Q(x)$ and $R(x) = P(x)Q(x)$, and let $(\hat{b}_0, \dots, \hat{b}_{r-1})$ and $(\hat{c}_0, \dots, \hat{c}_{r-1})$ be their Fourier transforms of order $r$. Since $\hat{a}_k$ is the value of $P$ at $\varepsilon^k$, we get that

$$\hat{c}_k = \hat{a}_k\hat{b}_k. \tag{9.14}$$
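For completeness, here is the verification of (9.13) mentioned above: substituting (9.12), applied to the zero-padded sequence $(a_0, \dots, a_{r-1})$, and exchanging the order of summation, we get

$$\frac{1}{r}\sum_{j=0}^{r-1}\hat{a}_j\,\varepsilon^{-jk} = \frac{1}{r}\sum_{j=0}^{r-1}\sum_{l=0}^{r-1}a_l\,\varepsilon^{j(l-k)} = \frac{1}{r}\sum_{l=0}^{r-1}a_l\sum_{j=0}^{r-1}\varepsilon^{j(l-k)} = a_k,$$

since the geometric sum $\sum_{j=0}^{r-1}\varepsilon^{j(l-k)}$ equals $r$ if $l = k$ and $0$ otherwise.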

The main point in using discrete Fourier transforms is that they can be computed very fast; this method is one of the most successful algorithmic tools in computations.

To describe a fast method for computing the discrete Fourier transform, suppose that $r = 2s$ is even. The Fourier transform (of order $r$) of a sequence $(a_0, a_1, \dots, a_{r-1})$ can be split into two parts:

$$\begin{aligned}
\hat{a}_k &= a_0 + a_1\varepsilon^k + \cdots + a_{r-1}\varepsilon^{(r-1)k} \\
&= (a_0 + a_2\varepsilon^{2k} + \cdots + a_{2s-2}\varepsilon^{(2s-2)k}) + \varepsilon^k(a_1 + a_3\varepsilon^{2k} + \cdots + a_{2s-1}\varepsilon^{(2s-2)k}).
\end{aligned} \tag{9.15}$$

Both expressions in parentheses are Fourier transforms themselves: since $\varepsilon^2$ is the first $s$-th root of unity, they are Fourier transforms of order $s$ of the two sequences $(a_0, a_2, \dots, a_{2s-2})$ and $(a_1, a_3, \dots, a_{2s-1})$. So we have reduced the computation of a Fourier transform of order $r = 2s$ to the computation of two Fourier transforms of order $s$. We can do this recursively.
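Here is a recursive Python sketch of this splitting, a direct transcription of (9.15). The function name is ours, and the length of the input sequence is assumed to be a power of 2.

    import cmath

    def dft(a):
        """Discrete Fourier transform of the sequence a; len(a) must be a power of 2."""
        r = len(a)
        if r == 1:
            return list(a)
        even = dft(a[0::2])                  # order-s transform of (a_0, a_2, ..., a_{2s-2})
        odd = dft(a[1::2])                   # order-s transform of (a_1, a_3, ..., a_{2s-1})
        eps = cmath.exp(2j * cmath.pi / r)   # the first r-th root of unity
        s = r // 2
        # formula (9.15); the order-s transforms are periodic in k with period s
        return [even[k % s] + eps**k * odd[k % s] for k in range(r)]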

How much work is involved? Let $K(r)$ denote the number of arithmetic operations this algorithm uses to perform a Fourier transform of order $r = 2s$. Recursively, we need $2K(s)$ operations to compute the two smaller Fourier transforms. We need $2s-2$ multiplications to compute the powers of $\varepsilon$. Once we have these powers, we need only two further arithmetic operations to apply (9.15), but we have to do so for every $k$, so we need $4s$ operations. Putting these together, we get

$$K(2s) \le 2K(s) + 6s.$$

If $r = 2^m$ is a power of 2, then this inequality implies, by induction, that $K(2^m) \le 3m\cdot 2^m$.

For a general $r$, we can choose $m = \lceil \log r \rceil$, and get

$$K(r) \le K(2^m) \le 3(1 + \log r)2^{1+\log r} = 6(1 + \log r)r.$$
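Putting Sections 9.2.4 and 9.2.5 together, a possible sketch of polynomial multiplication with these tools looks as follows. It reuses the dft function above, inverts the transform by a direct, deliberately simple transcription of (9.13), and assumes integer coefficients so that the final rounding is harmless; all names are our own.

    import cmath

    def inverse_dft(ahat):
        """Formula (9.13), computed directly; this step too could use the fast transform."""
        r = len(ahat)
        eps = cmath.exp(2j * cmath.pi / r)
        return [sum(ahat[j] * eps**(-j * k) for j in range(r)) / r for k in range(r)]

    def poly_multiply(a, b):
        """Coefficients of P(x)Q(x), given the coefficient lists a and b of P and Q."""
        deg = (len(a) - 1) + (len(b) - 1)            # degree of the product
        r = 1
        while r <= deg:                              # the order r must exceed the degree
            r *= 2
        ahat = dft(a + [0] * (r - len(a)))           # zero-pad to length r, then transform
        bhat = dft(b + [0] * (r - len(b)))
        chat = [x * y for x, y in zip(ahat, bhat)]   # formula (9.14)
        c = inverse_dft(chat)
        return [round(z.real) for z in c[:deg + 1]]  # round off the floating-point error

For example, poly_multiply([1, 2, 1], [1, 1]) returns [1, 3, 3, 1], the coefficients of $(1+x)^2(1+x)$.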

