
9.2 Multiplication



We could try to use this formula (which expresses the product of the $2m$-bit integers $U = 2^mU_1 + U_0$ and $V = 2^mV_1 + V_0$ as $UV = 2^{2m}U_1V_1 + 2^m(U_0V_1 + U_1V_0) + U_0V_0$) to compute the product recursively; but it seems that to evaluate this expression, even if we ignore additions, we have to perform four multiplications on $m$-bit integers, and it is easy to see that this does not lead to any gain in the number of bit-operations. The key observation is that with a little algebra, three multiplications are enough: since

$$U_0V_1 + U_1V_0 = (U_1 - U_0)(V_0 - V_1) + U_0V_0 + U_1V_1,$$

we can express the product as

$$(2^{2m} + 2^m)U_1V_1 + 2^m(U_1 - U_0)(V_0 - V_1) + (2^m + 1)U_0V_0.$$

This way we have reduced the multiplication of two $2m$-bit integers to three multiplications of $m$-bit integers, and a few additions and multiplications by powers of 2. It is easy to count these additional operations, to get that they involve at most $22m$ bit-operations.

If we denote by T(n) the number of operations used by this recursive algorithm, then

$$T(2m) \le 3T(m) + 22m.$$

This formula gives, by induction, an upper bound on the number of bit-operations, at least if the number of bits is a power of 2:

$$T(2^k) \le 3T(2^{k-1}) + 22\cdot 2^{k-1} \le \cdots \le 3^k + 22(2^{k-1} + 3\cdot 2^{k-2} + \cdots + 3^{k-1}) = 3^k + 22(3^k - 2^k) < 23\cdot 3^k.$$

To multiply two integers with $n$ bits for a general $n$, we can “pad” the numbers with leading 0-s to increase the number of their bits to the next power of 2. If $k = \lceil \log n \rceil$, then this algorithm will compute the product of two $n$-bit integers using

$$T(n) \le T(2^k) < 23\cdot 3^k < 23\cdot 3^{1+\log n} = 69\cdot n^{\log 3} < 69\cdot n^{1.585}$$

bit-operations.
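To make the recursion concrete, here is a minimal Python sketch of the algorithm just described. The function name, the cutoff below which we simply fall back to the built-in product, and the choice of the splitting point $m$ are our own; they are not part of the text.

    def karatsuba(u, v):
        """Product of two non-negative integers via the three-multiplication recursion."""
        if u < (1 << 32) or v < (1 << 32):       # small enough: use the built-in product
            return u * v
        m = max(u.bit_length(), v.bit_length()) // 2
        u1, u0 = u >> m, u & ((1 << m) - 1)      # u = 2^m * u1 + u0
        v1, v0 = v >> m, v & ((1 << m) - 1)      # v = 2^m * v1 + v0
        p = karatsuba(u1, v1)                    # U1 * V1
        q = karatsuba(u0, v0)                    # U0 * V0
        d1, d2 = u1 - u0, v0 - v1                # these differences may be negative
        r = karatsuba(abs(d1), abs(d2))
        if (d1 < 0) != (d2 < 0):                 # restore the sign of (U1 - U0)(V0 - V1)
            r = -r
        # (2^{2m} + 2^m) U1 V1 + 2^m (U1 - U0)(V0 - V1) + (2^m + 1) U0 V0
        return (p << (2 * m)) + ((p + r + q) << m) + q

A quick sanity check: karatsuba(2**100 + 12345, 2**90 + 67890) agrees with the built-in product of the same two numbers.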

This simple method to compute the product of two large integers more efficiently than the method learned in elementary school can be improved substantially. We will see that using more advanced methods (discrete Fourier transforms), many fewer bit-operations suffice.

9.2.2 Matrix multiplication

Assume that we want to multiply the matrices

$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} b_{11} & \dots & b_{1n} \\ \vdots & & \vdots \\ b_{n1} & \dots & b_{nn} \end{pmatrix}$$

(for simplicity, we consider only the case when the matrices are square). The product matrix is

$$C = \begin{pmatrix} c_{11} & \dots & c_{1n} \\ \vdots & & \vdots \\ c_{n1} & \dots & c_{nn} \end{pmatrix},
\qquad\text{where}\quad c_{ij} = a_{i1}b_{1j} + \cdots + a_{in}b_{nj}. \tag{9.3}$$

Performing these arithmetic operations in the simple way takes $n^3$ multiplications and $n^2(n-1) \approx n^3$ additions. Normally, we think of multiplications as more costly than additions, so it could be useful to reduce the number of multiplications, even if this means increasing the number of additions. (However, we will see that, surprisingly, we can also reduce the number of additions.)

Strassen noticed that the multiplication of $2\times 2$ matrices can be carried out using 7 multiplications and 18 additions, instead of the 8 multiplications and 4 additions in (9.3). We form 7 products:

$$\begin{aligned}
u_0 &= (a_{11}+a_{22})(b_{11}+b_{22}), \\
u_1 &= (a_{21}+a_{22})b_{11}, \\
u_2 &= a_{11}(b_{12}-b_{22}), \\
u_3 &= a_{22}(b_{21}-b_{11}), \\
u_4 &= (a_{11}+a_{12})b_{22}, \\
u_5 &= (a_{21}-a_{11})(b_{11}+b_{12}), \\
u_6 &= (a_{12}-a_{22})(b_{21}+b_{22}).
\end{aligned} \tag{9.4}$$

Then we can express the entries of the product matrix as follows:

$$\begin{aligned}
c_{11} &= u_0 + u_3 - u_4 + u_6, \\
c_{21} &= u_1 + u_3, \\
c_{12} &= u_2 + u_4, \\
c_{22} &= u_0 + u_2 - u_1 + u_5.
\end{aligned} \tag{9.5}$$

Performing 14 extra additions to save a single multiplication does not look like much of a gain, but we will see how this gain is realized once we extend the method to larger matrices. Just as for the multiplication of integers, we show how to reduce the multiplication of $(2n)\times(2n)$ matrices to the multiplication of $n\times n$ matrices. Let $A$, $B$ and $C = AB$ be $(2n)\times(2n)$ matrices, and let us split each of them into four $n\times n$ matrices:

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad
B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}, \qquad
C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}.$$

Then $C_{ij} = A_{i1}B_{1j} + A_{i2}B_{2j}$, and we can use the formulas (9.4) and (9.5) to compute these four matrices using only 7 multiplications and 18 additions of $n\times n$ matrices.

(Luckily, the verification of the formulas (9.4) and (9.5), which we did not write down, does not use the commutativity of multiplication, so they remain valid for matrices.) Assuming that we start with a $2^k\times 2^k$ matrix, we can do this splitting recursively, until we get down to $1\times 1$ matrices (which can be multiplied using a single multiplication of numbers). If the number of rows and columns is not a power of 2, we start with adding all-0 rows and columns to increase the size to the nearest power of 2.
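As an illustration, here is a small Python sketch of this recursion, with matrices stored as lists of lists; it assumes the order is a power of 2, and it recurses all the way down to $1\times 1$ blocks. The helper names and the base case are our own choices, not the book's.

    def madd(X, Y):        # entrywise sum of two square matrices
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def msub(X, Y):        # entrywise difference
        return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def strassen(A, B):
        """Product of two 2^k x 2^k matrices using the 7-multiplication recursion."""
        n = len(A)
        if n == 1:                              # a single multiplication of numbers
            return [[A[0][0] * B[0][0]]]
        h = n // 2                              # split into four h x h blocks
        A11 = [row[:h] for row in A[:h]]; A12 = [row[h:] for row in A[:h]]
        A21 = [row[:h] for row in A[h:]]; A22 = [row[h:] for row in A[h:]]
        B11 = [row[:h] for row in B[:h]]; B12 = [row[h:] for row in B[:h]]
        B21 = [row[:h] for row in B[h:]]; B22 = [row[h:] for row in B[h:]]
        # the seven products (9.4), now applied to blocks
        u0 = strassen(madd(A11, A22), madd(B11, B22))
        u1 = strassen(madd(A21, A22), B11)
        u2 = strassen(A11, msub(B12, B22))
        u3 = strassen(A22, msub(B21, B11))
        u4 = strassen(madd(A11, A12), B22)
        u5 = strassen(msub(A21, A11), madd(B11, B12))
        u6 = strassen(msub(A12, A22), madd(B21, B22))
        # the four blocks of the product (9.5)
        C11 = madd(msub(madd(u0, u3), u4), u6)
        C12 = madd(u2, u4)
        C21 = madd(u1, u3)
        C22 = madd(msub(madd(u0, u2), u1), u5)
        # glue the blocks back together
        return [r1 + r2 for r1, r2 in zip(C11, C12)] + [r1 + r2 for r1, r2 in zip(C21, C22)]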

Do we save any work by this more complicated algorithm? Let M(n) denote the number of multiplications, and S(n) the number of additions, when this algorithm is applied to n×n matrices. Then

$$M(2n) = 7M(n) \quad\text{and}\quad S(2n) = 18n^2 + 7S(n).$$

Clearly $M(1) = 1$, $S(1) = 0$, and it is easy to prove by induction on $k$ that $M(2^k) = 7^k$ and $S(2^k) = 6(7^k - 4^k)$.
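For instance, the formula for $S$ can be checked by substituting the claimed value into the recursion (the step for $M$ is analogous):

$$S(2^k) = 18\,(2^{k-1})^2 + 7\,S(2^{k-1}) = 18\cdot 4^{k-1} + 42\,(7^{k-1} - 4^{k-1}) = 6\cdot 7^k - 6\cdot 4^k = 6(7^k - 4^k).$$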

Let $k = \lceil \log n \rceil$; then

$$M(n) = M(2^k) = 7^k < 7^{1+\log n} = 7n^{\log 7} < 7n^{2.81},$$

and similarly

$$S(n) < 42n^{\log 7} < 42n^{2.81}.$$

We see that while for n = 2 Strassen’s method only gained a little in the number of multiplications (and lost a lot in the number of additions), through this iteration we improved both the number of multiplications and the number of additions, at least for large matrices.

It is not easy to explain where the formulas (9.4) and (9.5) come from; in a sense, this is not even understood today, since it is open how much the exponent of $n$ in the complexity of matrix multiplication can be reduced by similar methods. The current best algorithm, due to Williams, uses $O(n^{2.3727})$ multiplications and additions.

9.2.3 Inverting matrices

Let $B$ be a $(2n)\times(2n)$ matrix, which we partition into 4 parts:

$$B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}. \tag{9.6}$$

We can bring $B$ to a block-diagonal form, much as we would do for a $2\times 2$ matrix:

$$\begin{pmatrix} I & 0 \\ -B_{21}B_{11}^{-1} & I \end{pmatrix}
\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}
\begin{pmatrix} I & -B_{11}^{-1}B_{12} \\ 0 & I \end{pmatrix}
= \begin{pmatrix} B_{11} & 0 \\ 0 & B_{22} - B_{21}B_{11}^{-1}B_{12} \end{pmatrix}. \tag{9.7}$$

To simplify notation, let $C = B_{22} - B_{21}B_{11}^{-1}B_{12}$. Inverting and expressing $B^{-1}$, we get

$$B^{-1} = \begin{pmatrix} I & -B_{11}^{-1}B_{12} \\ 0 & I \end{pmatrix}
\begin{pmatrix} B_{11}^{-1} & 0 \\ 0 & C^{-1} \end{pmatrix}
\begin{pmatrix} I & 0 \\ -B_{21}B_{11}^{-1} & I \end{pmatrix}
= \begin{pmatrix} B_{11}^{-1} + B_{11}^{-1}B_{12}C^{-1}B_{21}B_{11}^{-1} & -B_{11}^{-1}B_{12}C^{-1} \\ -C^{-1}B_{21}B_{11}^{-1} & C^{-1} \end{pmatrix}. \tag{9.8}$$

This is a messy formula, but it describes how to compute the inverse of a $(2n)\times(2n)$ matrix using two matrix inversions (for $B_{11}$ and $C$), 6 matrix multiplications and 2 additions (one of which is in fact a subtraction), all performed on $n\times n$ matrices. We could use this recursively as before, but there is a problem: how do we know that $B_{11}$ is invertible? This does not follow even if we assume that $B$ is invertible.

The way out is to use the identity

$$A^{-1} = (A^{\top}A)^{-1}A^{\top}. \tag{9.9}$$

This shows that if we can invert the matrix $B = A^{\top}A$, then, at the cost of a further matrix multiplication, we can compute the inverse of $A$. (We do not count the cost of computing the transpose of $A$, which involves only moving numbers around, no algebraic operations.) Now if $A$ is nonsingular, then $B$ is symmetric and positive definite.

Hence the principal submatrix $B_{11}$ in the decomposition (9.6) is also symmetric and positive definite. Furthermore, identity (9.7) implies that $C$ is also symmetric and positive definite.

These facts have three important consequences. First, it follows that $B_{11}$ and $C$ are nonsingular, so the inverses $B_{11}^{-1}$ and $C^{-1}$ in (9.8) make sense. Second, it follows that when computing $B_{11}^{-1}$ and $C^{-1}$ recursively, we stay in the territory of inverting symmetric and positive definite matrices, and so we don't have to appeal to the trick (9.9) any more. Third, it follows that $B_{21} = B_{12}^{\top}$, which saves us two multiplications, since $B_{21}B_{11}^{-1} = (B_{11}^{-1}B_{12})^{\top}$ and $C^{-1}B_{21}B_{11}^{-1} = (B_{11}^{-1}B_{12}C^{-1})^{\top}$ do not need to be computed separately.
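A compact Python sketch of this scheme might look as follows. The names are ours, ordinary cubic matrix multiplication stands in for Strassen's algorithm, and we assume the order of the symmetric positive definite input is a power of 2.

    def mmul(X, Y):        # ordinary matrix product (Strassen's algorithm could be used instead)
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

    def madd(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def msub(X, Y):
        return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def transpose(X):
        return [list(col) for col in zip(*X)]

    def spd_inverse(B):
        """Inverse of a symmetric positive definite 2^k x 2^k matrix via formula (9.8)."""
        n = len(B)
        if n == 1:
            return [[1.0 / B[0][0]]]
        h = n // 2
        B11 = [row[:h] for row in B[:h]]; B12 = [row[h:] for row in B[:h]]
        B21 = [row[:h] for row in B[h:]]; B22 = [row[h:] for row in B[h:]]
        B11inv = spd_inverse(B11)                   # recursive inversion of the principal block
        W = mmul(B11inv, B12)                       # B11^{-1} B12; note B21 B11^{-1} = W^T
        C = msub(B22, mmul(B21, W))                 # the block C = B22 - B21 B11^{-1} B12
        Cinv = spd_inverse(C)                       # C is again symmetric positive definite
        V = mmul(W, Cinv)                           # B11^{-1} B12 C^{-1}; note C^{-1} B21 B11^{-1} = V^T
        X11 = madd(B11inv, mmul(V, transpose(W)))   # B11^{-1} + B11^{-1} B12 C^{-1} B21 B11^{-1}
        X12 = [[-v for v in row] for row in V]      # -B11^{-1} B12 C^{-1}
        X21 = transpose(X12)                        # -C^{-1} B21 B11^{-1}, by symmetry
        return [r1 + r2 for r1, r2 in zip(X11, X12)] + [r1 + r2 for r1, r2 in zip(X21, Cinv)]

Note that only four block products are formed at each level of the recursion (for W, C, V and X11), which is exactly the saving promised by the symmetry argument above.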

Let $I^{+}(n)$ denote the minimum number of multiplications needed to invert an $n\times n$ positive definite matrix, and let $L(n)$ denote the minimum number of multiplications needed to compute the product of two $n\times n$ matrices. It follows from formula (9.8) that

$$I^{+}(2n) \le 2I^{+}(n) + 4L(n).$$

Using the matrix multiplication algorithm given in Section 9.2.2, we get that

$$I^{+}(2^{k+1}) \le 2I^{+}(2^k) + 4\cdot 7^k,$$

which implies by induction that $I^{+}(2^k) \le 7^k$.

Using (9.9), we get that a nonsingular $2^k\times 2^k$ matrix can be inverted using $3\cdot 7^k$ multiplications. Just as in Section 9.2.2, this implies a bound for general $n$: an $n\times n$ matrix can be inverted using no more than $21\cdot n^{\log 7}$ multiplications. The number of additions can be bounded similarly.

9.2.4 Multiplication of polynomials

Suppose that we want to compute the product of two real polynomials in one variable, of degree $n$. Given

$$P(x) = a_0 + a_1x + \cdots + a_nx^n \quad\text{and}\quad Q(x) = b_0 + b_1x + \cdots + b_nx^n,$$

we want to compute their product

$$R(x) = P(x)Q(x) = c_0 + c_1x + \cdots + c_{2n}x^{2n}.$$

The coefficients of this polynomial can be computed by the formulas

$$c_i = a_0b_i + a_1b_{i-1} + \cdots + a_ib_0. \tag{9.10}$$

This is often called the convolution of the sequences $(a_0, a_1, \dots, a_n)$ and $(b_0, b_1, \dots, b_n)$.

To compute every coefficient by these formulas takes $(n+1)^2$ multiplications.
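A direct transcription of (9.10) into Python might look like this (a naive sketch; the function name is ours):

    def convolve(a, b):
        """Coefficient sequence of P(x)Q(x), given the coefficient lists a and b of P and Q."""
        c = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                c[i + j] += ai * bj      # a_i * b_j contributes to the coefficient of x^(i+j)
        return c

For two polynomials of degree $n$ this performs exactly $(n+1)^2$ multiplications, as noted above.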

We can do better if we use the fact that we can substitute into a polynomial. Let us substitute the values $0, 1, \dots, 2n$ into the polynomials. In other words, we compute the values $P(0), P(1), \dots, P(2n)$ and $Q(0), Q(1), \dots, Q(2n)$, and then compute their products $R(j) = P(j)Q(j)$. From here, the coefficients of $R$ can be determined by solving the equations

$$\begin{aligned}
c_0 &= R(0) \\
c_0 + c_1 + c_2 + \cdots + c_{2n} &= R(1) \\
c_0 + 2c_1 + 2^2c_2 + \cdots + 2^{2n}c_{2n} &= R(2) \\
&\;\;\vdots \\
c_0 + (2n)c_1 + (2n)^2c_2 + \cdots + (2n)^{2n}c_{2n} &= R(2n)
\end{aligned} \tag{9.11}$$

This does not seem to be a great idea, since we need about $n^2$ multiplications (and about the same number of additions) to compute the values $P(0), P(1), \dots, P(2n)$ and $Q(0), Q(1), \dots, Q(2n)$; it takes a small number of multiplications to get the values $R(0), R(1), \dots, R(2n)$, but then of the order of $n^3$ multiplications and additions to solve the system (9.11) if we use Gaussian elimination (fewer, if we use the more sophisticated methods discussed in Section 9.2.2, but still substantially more than $n^2$).

We see some gain, however, if we distinguish two kinds of multiplications: multiplication by a fixed constant, and multiplication of two expressions containing the parameters (the coefficients $a_i$ and $b_i$). Recall that additions and multiplications by a fixed constant are linear operations. The computation of the values $P(0), P(1), \dots, P(2n)$ and $Q(0), Q(1), \dots, Q(2n)$, as well as the solution of equations (9.11), takes only linear operations. Nonlinear operations are needed only in the computation of the $R(j)$, so their number is only $2n+1$.

It would be very useful to reduce the number of linear operations too. The most time-consuming part of the above is solving equations (9.11); it takes of the order of $n^3$ operations if we use Gaussian elimination (these are all linear, but still a lot).

Using more of the special structure of the equations, this can be reduced to $O(n^2)$ operations. But we can do even better if we notice that there is nothing special about the substitutions $0, 1, \dots, 2n$: we could use any other $2n+1$ real or even complex numbers. As we are going to discuss in Section 9.2.5, substituting appropriate roots of unity leads to a much more efficient method for the multiplication of polynomials.

9.2.5 Discrete Fourier transform

Let $P(x) = a_0 + a_1x + \cdots + a_nx^n$ be a real polynomial, and fix any $r > n$. Let $\varepsilon = e^{2\pi i/r}$ be the first $r$-th root of unity, and consider the values

$$\hat{a}_k = P(\varepsilon^k) = a_0 + a_1\varepsilon^k + a_2\varepsilon^{2k} + \cdots + a_n\varepsilon^{nk} \qquad (k = 0, \dots, r-1). \tag{9.12}$$

The sequence $(\hat{a}_0, \hat{a}_1, \dots, \hat{a}_{r-1})$ is called the discrete Fourier transform of order $r$ of the sequence of coefficients $(a_0, a_1, \dots, a_n)$. We will often append $r-n-1$ zeros to this sequence, to get a sequence $(a_0, a_1, \dots, a_{r-1})$ of length $r$.

Discrete Fourier transforms have a number of very nice properties and important applications, of which we only discuss those related to polynomial multiplication.

We start with some simple but basic properties. First, the inverse transformation can be described by similar formulas:

$$a_k = \frac{1}{r}\left(\hat{a}_0 + \hat{a}_1\varepsilon^{-k} + \hat{a}_2\varepsilon^{-2k} + \cdots + \hat{a}_{r-1}\varepsilon^{-(r-1)k}\right) \qquad (k = 0, \dots, r-1). \tag{9.13}$$

This can be verified by substituting the definition of $\hat{a}_k$ into these formulas. Second, assume that $r > 2n$, and let $(b_0, \dots, b_{r-1})$ and $(c_0, \dots, c_{r-1})$ be the coefficient sequences of the polynomials $Q(x)$ and $R(x) = P(x)Q(x)$, and let $(\hat{b}_0, \dots, \hat{b}_{r-1})$ and $(\hat{c}_0, \dots, \hat{c}_{r-1})$ be their Fourier transforms of order $r$. Since $\hat{a}_k$ is the value of $P$ at $\varepsilon^k$, we get that

$$\hat{c}_k = \hat{a}_k\hat{b}_k. \tag{9.14}$$
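For completeness, here is the verification of (9.13) mentioned above: substituting (9.12), applied to the zero-padded sequence $(a_0, \dots, a_{r-1})$, and exchanging the order of summation, we get

$$\frac{1}{r}\sum_{j=0}^{r-1}\hat{a}_j\,\varepsilon^{-jk} = \frac{1}{r}\sum_{j=0}^{r-1}\sum_{l=0}^{r-1}a_l\,\varepsilon^{j(l-k)} = \frac{1}{r}\sum_{l=0}^{r-1}a_l\sum_{j=0}^{r-1}\varepsilon^{j(l-k)} = a_k,$$

since the geometric sum $\sum_{j=0}^{r-1}\varepsilon^{j(l-k)}$ equals $r$ if $l = k$ and $0$ otherwise.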

The main point in using discrete Fourier transforms is that they can be computed very fast; this method is one of the most successful algorithmic tools in computations.

To describe a fast method for computing the discrete Fourier transform, suppose that $r = 2s$ is even. The Fourier transform (of order $r$) of a sequence $(a_0, a_1, \dots, a_{r-1})$ can be split into two parts:

$$\begin{aligned}
\hat{a}_k &= a_0 + a_1\varepsilon^k + \cdots + a_{r-1}\varepsilon^{(r-1)k} \\
&= (a_0 + a_2\varepsilon^{2k} + \cdots + a_{2s-2}\varepsilon^{(2s-2)k}) + \varepsilon^k(a_1 + a_3\varepsilon^{2k} + \cdots + a_{2s-1}\varepsilon^{(2s-2)k}).
\end{aligned} \tag{9.15}$$

Both expressions in parentheses are Fourier transforms themselves: since $\varepsilon^2$ is the first $s$-th root of unity, they are Fourier transforms of order $s$ of the two sequences $(a_0, a_2, \dots, a_{2s-2})$ and $(a_1, a_3, \dots, a_{2s-1})$. So we have reduced the computation of a Fourier transform of order $r = 2s$ to the computation of two Fourier transforms of order $s$. We can do this recursively.
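Here is a recursive Python sketch of this splitting, a direct transcription of (9.15). The function name is ours, and the length of the input sequence is assumed to be a power of 2.

    import cmath

    def dft(a):
        """Discrete Fourier transform of the sequence a; len(a) must be a power of 2."""
        r = len(a)
        if r == 1:
            return list(a)
        even = dft(a[0::2])                  # order-s transform of (a_0, a_2, ..., a_{2s-2})
        odd = dft(a[1::2])                   # order-s transform of (a_1, a_3, ..., a_{2s-1})
        eps = cmath.exp(2j * cmath.pi / r)   # the first r-th root of unity
        s = r // 2
        # formula (9.15); the order-s transforms are periodic in k with period s
        return [even[k % s] + eps**k * odd[k % s] for k in range(r)]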

How much work is involved? Let $K(r)$ denote the number of arithmetic operations this algorithm uses to perform a Fourier transform of order $r = 2s$. Recursively, we need $2K(s)$ operations to compute the two smaller Fourier transforms. We need $2s-2$ multiplications to compute the powers of $\varepsilon$. Once we have these powers, we need only two further arithmetic operations to apply (9.15), but we have to do so for every $k$, so we need $4s$ operations. Putting these together, we get

$$K(2s) \le 2K(s) + 6s.$$

If $r = 2^m$ is a power of 2, then this inequality implies, by induction, that $K(2^m) \le 3m\cdot 2^m$.

For a general $r$, we can choose $m = \lceil \log r \rceil$, and get

$$K(r) \le K(2^m) \le 3(1 + \log r)2^{1+\log r} = 6(1 + \log r)r.$$
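Putting Sections 9.2.4 and 9.2.5 together, a possible sketch of polynomial multiplication with these tools looks as follows. It reuses the dft function above, inverts the transform by a direct, deliberately simple transcription of (9.13), and assumes integer coefficients so that the final rounding is harmless; all names are our own.

    import cmath

    def inverse_dft(ahat):
        """Formula (9.13), computed directly; this step too could use the fast transform."""
        r = len(ahat)
        eps = cmath.exp(2j * cmath.pi / r)
        return [sum(ahat[j] * eps**(-j * k) for j in range(r)) / r for k in range(r)]

    def poly_multiply(a, b):
        """Coefficients of P(x)Q(x), given the coefficient lists a and b of P and Q."""
        deg = (len(a) - 1) + (len(b) - 1)            # degree of the product
        r = 1
        while r <= deg:                              # the order r must exceed the degree
            r *= 2
        ahat = dft(a + [0] * (r - len(a)))           # zero-pad to length r, then transform
        bhat = dft(b + [0] * (r - len(b)))
        chat = [x * y for x, y in zip(ahat, bhat)]   # formula (9.14)
        c = inverse_dft(chat)
        return [round(z.real) for z in c[:deg + 1]]  # round off the floating-point error

For example, poly_multiply([1, 2, 1], [1, 1]) returns [1, 3, 3, 1], the coefficients of $(1+x)^2(1+x)$.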

