Geometry and Complexity Theory

J. M. Landsberg
Publisher: Cambridge University Press
Publication Date: 2018
Number of Pages: 339
Format: Hardcover
Series: Cambridge Studies in Advanced Mathematics 169
Price: 64.99
ISBN: 9781107199231
Category: Monograph

[Reviewed by Felipe Zaldivar, on 04/12/2018]

Computational complexity theory, a major field in theoretical computer science, aims to classify all possible algorithms that could be used to solve a given computational problem. Here a computational problem is a task that, in principle, could be solved by a computer, for example by means of an algorithm, and by a computer we mean a mathematical model of computation, for example a Turing machine. The computational problems that complexity theory aims to classify come with some restrictions, for example on the available resources. If one does not impose such restrictions, then one is in the field of computability theory, whose aim is to determine which computational problems could, in principle, be solved by means of an algorithm. On the other hand, if one is interested in the amount of resources that a particular algorithm needs to solve a given problem, then one is in the field of analysis of algorithms, whose aim is precisely the study of the cost of algorithms.

Examples of computational problems of current interest are the traveling salesman problem, the integer factorization problem, and the matrix multiplication problem; the latter is one of the main focuses of the book under review. Since it is easy to formulate, let us recall the main details: given two \(n\times n\) matrices, with entries in a given field, computing their product requires a certain number of scalar multiplications and additions of the entries of the given matrices. For example, for the product of two \(2\times 2\) matrices, \[\begin{pmatrix} a_{11}& a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} b_{11}& b_{12} \\ b_{21} & b_{22} \end{pmatrix}=\begin{pmatrix} c_{11}& c_{12} \\ c_{21} & c_{22} \end{pmatrix}\] the right-hand side in the standard algorithm is

\begin{align*} c_{11}&=a_{11}b_{11} + a_{12}b_{21}\\ c_{12}& =a_{11}b_{12}+a_{12}b_{22} \\ c_{21}&=a_{21}b_{11}+a_{22}b_{21} \\ c_{22} & = a_{21}b_{12}+a_{22}b_{22} \end{align*}

This requires 8 multiplications and 4 additions. In general, it is straightforward to see that the product of two \(n\times n\) matrices by the standard algorithm requires \(n^3\) multiplications and \(n^3-n^2\) additions. Since scalar multiplications are more computationally costly than scalar additions, the focus is on the number of scalar multiplications required to compute the product of two square matrices.
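To make the count concrete, here is a minimal Python sketch (ours, not from the book) of the standard algorithm that also tallies the scalar multiplications it performs; the function name matmul_standard is our own.

def matmul_standard(A, B):
    """Standard multiplication of n x n matrices (lists of lists); also counts scalar multiplications."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
                mults += 1
    return C, mults

C, mults = matmul_standard([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, mults)  # [[19, 22], [43, 50]] 8   (eight multiplications for n = 2; n^3 in general)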

In 1969, V. Strassen, in “Gaussian elimination is not optimal” (Numer. Math. 13 (1969), 354–356), showed, rather surprisingly, that there is an algorithm for the multiplication of square matrices that requires fewer scalar multiplications than the standard algorithm. For \(2\times 2\) matrices as above, Strassen’s algorithm has right-hand side given by

\begin{align*} c_{11}&= m_1+m_4-m_5+m_7\\ c_{12}&= m_3+ m_5 \\ c_{21}&=m_2+m_4 \\ c_{22}&=m_1-m_2+m_3+m_6 \end{align*}

where the quantities \(m_1,\dots,m_7\) on the right-hand side, each requiring just one scalar multiplication, are

\begin{align*} m_1&=(a_{11} + a_{22}) (b_{11} + b_{22})\\ m_2&= (a_{21}+a_{22})b_{11}\\ m_3&=a_{11}(b_{12}-b_{22}) \\ m_4&=a_{22}(b_{21}-b_{11})\\ m_5&= (a_{11}+a_{12})b_{22}\\ m_6&= (a_{21}-a_{11})(b_{11}+b_{12})\\ m_7&= (a_{12}-a_{22})(b_{21}+b_{22}). \end{align*}

Thus, Strassen’s algorithm for \(2\times2\) matrices requires 7 scalar multiplications instead of the standard 8. This algorithm can be generalized to arbitrary \(n\times n\) matrices: when \(n\) is a power of \(2\), by recursively treating each entry in the above algorithm as a block matrix; when \(n\) is not a power of \(2\), by enlarging the given matrices with rows and columns of zeros to obtain matrices whose size is a power of \(2\). A direct count shows that for \(2^k\times 2^k\) matrices Strassen’s algorithm requires \(7^k\) scalar multiplications whereas the standard algorithm requires \(8^k\). Writing \(7^k=(2^k)^{\ell}=n^{\ell}\) with \(\ell=\log_27\), we see that, for \(n\times n\) matrices, the number of scalar multiplications in Strassen’s algorithm grows as \(n^{\log_27}\approx n^{2.81}\), whereas for the standard algorithm it is \(n^3\); Strassen’s algorithm is therefore more efficient when \(n\) is large.
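A minimal recursive sketch of Strassen’s algorithm (ours, not the book’s code), assuming for simplicity that \(n\) is a power of \(2\) and using numpy only for the block additions; it performs exactly \(7^k\) scalar multiplications for \(n=2^k\).

import numpy as np

def strassen(A, B):
    """Multiply n x n numpy arrays, n a power of 2 (assumed), by Strassen's recursion."""
    n = A.shape[0]
    if n == 1:
        return A * B  # a single scalar multiplication
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # the seven products m_1, ..., m_7 above, applied recursively to blocks
    m1 = strassen(A11 + A22, B11 + B22)
    m2 = strassen(A21 + A22, B11)
    m3 = strassen(A11, B12 - B22)
    m4 = strassen(A22, B21 - B11)
    m5 = strassen(A11 + A12, B22)
    m6 = strassen(A21 - A11, B11 + B12)
    m7 = strassen(A12 - A22, B21 + B22)
    return np.block([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])

A, B = np.random.rand(8, 8), np.random.rand(8, 8)
assert np.allclose(strassen(A, B), A @ B)  # agrees with the standard product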

In general, for matrix multiplication, the complexity of any efficient algorithm depends just on the number of scalar multiplications. The measure of complexity of matrix multiplication \(\omega\) (or the exponent of matrix multiplication) is the infimum of the exponents \(h\in{\mathbb R}\) such that any two \(n\times n\) matrices can be multiplied using \(O(n^h)\) arithmetic operations (addition and multiplication of scalars). Strassen’s algorithm shows that \(\omega\leq \log_27<2.81\). Strassen’s algorithm opened the door to the search for better algorithms to efficiently multiply matrices and through the years several algorithms have been found that improve on the asymptotic complexity \(O(n^{2.81})\) of Strassen’s algorithm. The current record, a variant of the Coppersmith-Winograd algorithm, gives \(\omega<2.373\). Notice that \(\omega\geq 2\) since an \(n\times n\) matrix has \(n^2\) entries. A folklore conjecture states that \(\omega=2\). The first five chapters of the book under review give a thorough discussion of the complexity of matrix multiplication, with chapters 2 and 5 devoted to finding lower bounds for the exponent \(\omega\) and chapters 3 and 4 to upper bounds.

The key point in these developments is the use of representation theory and algebraic geometry to obtain these bounds. The crucial notions are generalizations of the concept of the rank of a matrix. First, view a matrix \(T:U\rightarrow V\) as a tensor in \(U^*\otimes V\); then \(T\) has rank \(1\) if and only if \(T\) has the form \(T=u\otimes v\), that is, \(T\) is a decomposable nonzero tensor. In general, the rank of \(T\) is the minimal number \(r\) of decomposable tensors \(u_i\otimes v_i\in U^*\otimes V\) that add up to \(T\): \[T=u_1\otimes v_1+\cdots +u_r\otimes v_r.\] Next, since multiplication of matrices is a bilinear map \(M:U\times V\rightarrow W\), view it as a tensor in \(U^*\otimes V^*\otimes W\); then \(M\) has tensor rank \(1\) if it is a nonzero decomposable tensor \(M=u\otimes v\otimes w\), and in general the tensor rank \(R(M)=r\) of \(M\) is the minimal number \(r\) of rank \(1\) tensors that add up to \(M\). That the notion of tensor rank is indeed a measure of the complexity of matrix multiplication is again a theorem of Strassen: for the product \(M_n\) of square \(n\times n\) matrices, \[\omega=\inf\{h\in {\mathbb R}:R(M_n)=O(n^h)\}.\]

Therefore, to find or bound \(\omega\) one must consider the subset of tensors in \(U^*\otimes V^*\otimes W\) of tensor rank \(r\), or of tensor rank at most \(r\). The second set, denoted \(\hat{\sigma}_r^0\), is the one amenable to the use of algebraic geometry, since it corresponds to an affine cone associated to the \(r\)-th secant variety of a classical Segre variety. To find lower bounds for \(\omega\) one needs to show that \(M_n\not\in \hat{\sigma}_r^0\), and to do this one must find a polynomial \(f\) in the vanishing ideal of \(\hat{\sigma}_r^0\) that does not vanish at \(M_n\). Details of how this is done, using algebraic geometry and representation theory, are in chapters 2 and 5 of the book. Finding upper bounds for \(\omega\) is a bit more delicate: one must consider the Zariski closure \(\hat{\sigma}_r\) of \(\hat{\sigma}_r^0\) and the associated notion of the border rank of a tensor: a tensor \(T\in U^*\otimes V^*\otimes W\) has border rank \(\underline{R}(T)=r\) if \(T\in \hat{\sigma}_r\) but \(T\not\in \hat{\sigma}_{r-1}\). In general, \(\underline{R}(M_n)\leq R(M_n)\), but for the exponent of matrix multiplication this possible difference does not matter: \[\omega=\inf\{h\in {\mathbb R}:\underline{R}(M_n)=O(n^h)\}.\] Chapters 3, 4 and 5 give details of how to use these constructions to obtain asymptotic upper bounds for \(\omega\).
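The upper bound \(R(M_2)\leq 7\) behind Strassen’s algorithm can be checked directly in coordinates: the seven products \(m_1,\dots,m_7\) above give a decomposition of the matrix multiplication tensor \(M_2\in F^4\otimes F^4\otimes F^4\) as a sum of 7 decomposable tensors. Here is a small numpy verification (ours, not from the book; the coordinate ordering is our own convention):

import numpy as np

# Matrix multiplication tensor M_2: with the entries of A, B, C flattened row-major,
# T[a, b, c] = 1 exactly when the A-entry (i,j), B-entry (j,l) and C-entry (i,l) match up.
T = np.zeros((4, 4, 4), dtype=int)
for i in range(2):
    for j in range(2):
        for l in range(2):
            T[2*i + j, 2*j + l, 2*i + l] = 1

# Strassen's seven rank-one terms u_s (x) v_s (x) w_s, read off from m_1,...,m_7 and from the
# expressions for c_11,...,c_22 above; coordinates are (a11,a12,a21,a22), (b11,...), (c11,...).
U = np.array([[1,0,0,1],[0,0,1,1],[1,0,0,0],[0,0,0,1],[1,1,0,0],[-1,0,1,0],[0,1,0,-1]])
V = np.array([[1,0,0,1],[1,0,0,0],[0,1,0,-1],[-1,0,1,0],[0,0,0,1],[1,1,0,0],[0,0,1,1]])
W = np.array([[1,0,0,1],[0,0,1,-1],[0,1,0,1],[1,0,1,0],[-1,1,0,0],[0,0,0,1],[1,0,0,0]])

S = sum(np.einsum('a,b,c->abc', U[s], V[s], W[s]) for s in range(7))
assert np.array_equal(S, T)  # M_2 is a sum of 7 rank-one tensors, so R(M_2) <= 7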

The second main example considered in the book under review gives an introduction to an ambitious program for proving the P vs. NP conjecture. The adjective “ambitious” comes from the fact that this program borrows tools from invariant theory to clarify certain aspects of the P vs. NP conjecture, sometimes via specific algebraic geometry conjectures that seem, at least, plausible.

For this introduction, the author has chosen a conjecture of Valiant for which the geometric formulation is easier to understand. The road to algebraic geometry is via a generalization of Boolean circuits (networks of Boolean logic gates; the size of such a circuit is the number of gates in it) to algebraic circuits, whose inputs are a (finite set of) variables \(x_1,\ldots, x_n\) or elements of a given field \(F\) and whose gates are the field operations of addition and multiplication. One can draw these circuits as finite, directed, acyclic graphs whose nodes are labeled by the variables \(x_1,\ldots, x_n\), elements of the field, and the operations; the size of an algebraic circuit is the number of edges in it. Algebraic circuits recover the classical Boolean ones by taking the field with two elements. To each algebraic circuit over a field \(F\) one associates a polynomial, obtained by taking the inputs (variables or scalars) and performing the operations of each of its gates. In computational complexity theory, algebraic circuits are a standard model for computing polynomials, the complexity of a polynomial being measured as the size of the smallest circuit that realizes it. The questions are: given a polynomial \(f=f(x_1,\ldots,x_n)\), is there an algebraic circuit, over the field \(F\) and in the variables \(x_1,\ldots, x_n\), whose associated polynomial is \(f\)? In general, given a sequence of polynomials \(f_m=f_m(x_1,\ldots,x_n)\), is there a sequence of circuits in \(n\) variables realizing the polynomials \(f_m\)? Moreover, given a sequence of polynomials \(f_m\) in \(n\) variables, is there a sequence of circuits in \(n\) variables, whose size grows polynomially in \(n\), realizing the sequence \(f_m\)?
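As a small illustration (ours, not from the book), an algebraic circuit can be modeled as a directed acyclic graph of input, \(+\), and \(\times\) nodes, and its associated polynomial is obtained by evaluating the output gate; reusing a gate, as with s below, is what allows a circuit to be much smaller than the expanded polynomial it computes.

# A toy algebraic circuit: a gate is ('var', name), ('const', c), or (op, left, right) with op '+' or '*'.
def evaluate(gate, values):
    """Evaluate a circuit gate at the point given by the dict `values`."""
    op = gate[0]
    if op == 'var':
        return values[gate[1]]
    if op == 'const':
        return gate[1]
    left, right = evaluate(gate[1], values), evaluate(gate[2], values)
    return left + right if op == '+' else left * right

x, y = ('var', 'x'), ('var', 'y')
s = ('+', x, y)
f = ('*', s, s)   # circuit for (x + y)^2 = x^2 + 2xy + y^2; the addition gate s is used twice

print(evaluate(f, {'x': 2, 'y': 3}))  # 25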

Two families of polynomials are relevant in this context. Recall that to an \(n\times n\) matrix \(X\), whose entries are variables \(x_{11},\ldots, x_{nn}\), there are associated two homogeneous polynomials of degree \(n\), its determinant \(\det_nX\) and its permanent \(\text{perm}_nX\), which differ only in the signs of some of their monomials. From the point of view of complexity theory, however, they are quite different: there exists a sequence of circuits of polynomial size realizing \(\det_n\) for each \(n\). In other words, \(\det_n\) is computable in polynomial time (actually, in time of order \(O(n^{\omega})\), for \(\omega\) as before).
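For instance, over the rationals \(\det_n\) can be computed with \(O(n^3)\) field operations by Gaussian elimination, far fewer than the \(n!\) terms of its defining sum; a sketch of ours:

from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over Q: O(n^3) field operations."""
    A = [[Fraction(x) for x in row] for row in M]   # exact arithmetic
    n, sign, result = len(A), 1, Fraction(1)
    for k in range(n):
        pivot = next((i for i in range(k, n) if A[i][k] != 0), None)  # a nonzero pivot in column k
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            A[k], A[pivot] = A[pivot], A[k]
            sign = -sign
        result *= A[k][k]
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= factor * A[k][j]
    return sign * result

print(det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))  # -3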

For the permanent, however, things are different. L. Valiant (“The complexity of computing the permanent”, Theoretical Comp. Sci. 8 No. 2 (1979), 189–201) has conjectured that there is no polynomial size sequence of circuits computing \(\text{perm}_n\). A rather surprising road to this conjecture regards the space of homogeneous polynomials of degree \(n\) in the polynomial algebra \(F[x_{11},\ldots, x_{nn}]\) in \(n^2\) variables with coefficients in a field \(F\) as the symmetric power \(\text{Sym}^nE^*\), where \(E\simeq F^{n^2}\). Thus \(\text{perm}_n\in \text{Sym}^nE^*\). There is a natural action of the general linear group \(\text{GL}_{n^2}F\) on the space \(\text{Sym}^nE^*\) of homogeneous polynomials of degree \(n\): for matrices \(A\in \text{GL}_{n^2}F\) and polynomials \(f\in \text{Sym}^nE^*\) the action is \((A\cdot f)(x)=f(Ax)\). A natural question is whether \(\text{perm}_nA\) equals \(\det_nB\) for another matrix \(B\) obtained in a simple (that is, affine) way from \(A\). This is obviously true for \(2\times 2\) matrices, \[\text{perm}_2\begin{pmatrix} a & b\\ c & d \end{pmatrix}=\text{det}_2\begin{pmatrix} a & -b \\ c & d \end{pmatrix},\] but for \(n>2\) Marcus and Minc (“On the relation between the determinant and the permanent”, Illinois J. Math. 5 (1961), 376–381) showed that it is not possible to express \(\text{perm}_n(A)\) as \(\det_n(B)\) for a matrix \(B\) of the same size as \(A\) whose entries are affine linear functions of the entries of \(A\). If we allow the matrix \(B\) to be larger than \(A\), it becomes possible: Bruno Grenet has proved that there exists a \((2^n-1)\times (2^n-1)\) matrix \(B\), obtained from \(A\) by affine functions of its entries, such that \(\text{perm}_n(A)=\det_{2^n-1}B\). The question is whether one can improve on the size \(2^n-1\).
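For concreteness, here is a naive permanent, summing over all \(n!\) permutations (so exponential in \(n\)), together with a check of the \(2\times2\) identity above; this sketch is ours, not the book’s.

from itertools import permutations
from math import prod

def perm(A):
    """Permanent of a square matrix: like the determinant but with no signs; n! terms."""
    n = len(A)
    return sum(prod(A[i][sigma[i]] for i in range(n)) for sigma in permutations(range(n)))

a, b, c, d = 2, 3, 5, 7
lhs = perm([[a, b], [c, d]])   # ad + bc
rhs = a * d - (-b) * c         # det of [[a, -b], [c, d]]
assert lhs == rhs == 29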

To search for an answer, it is convenient to write the equality in the same space of matrices. To do this, if \(n<m\), one considers the padded permanent \[\text{perm}^*_{m,n}(B)=t^{m-n}\text{perm}_n(B|_n),\] where \(B\) is an \(m\times m\) matrix, \(B|_n\) is the upper-left \(n\times n\) submatrix of \(B\), and \(t\) is any entry of \(B\) not in \(B|_n\). Thus, \(\text{perm}^*_{m,n}\) and \(\det_m\) are degree \(m\) homogeneous polynomials, and the question of whether \(\text{perm}_nA\) equals \(\det_mB\) for a matrix \(B\) obtained from \(A\) by affine transformations of its entries can now be formulated in terms of points in the space \(\text{Sym}^mE^*\) of degree \(m\) homogeneous polynomials in \(m^2\) variables. For the points \(\det_m\) and \(\text{perm}^*_{m,n}\) in \(\text{Sym}^mE^*\) one considers their orbits under the action of the general linear group \(\text{GL}_{m^2}F\), and the closures of these orbits. The question of whether there exists an affine map \(L\) from the space of \(n\times n\) matrices to the space of \(m\times m\) matrices such that \(\text{perm}_n(A)=\det_m(LA)\) for every \(n\times n\) matrix \(A\) is then translated into the question of whether one of these points belongs to the closure of the orbit of the other. Chapters 6, 7, and 8 are devoted to a precise formulation of this question, giving both results and conjectures in algebraic geometry, and exploring their relation to finding bounds for the embedding of a permanent in a determinant.
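Returning to the padded permanent: in code the padding is immediate. A small sketch of ours of \(\text{perm}^*_{m,n}\), reusing the naive perm above and taking \(t\) to be the bottom-right entry of \(B\) (any entry outside the upper-left \(n\times n\) block would do):

def padded_permanent(B, n):
    """perm*_{m,n}(B) = t^(m-n) * perm_n(B|_n) for an m x m matrix B with n < m."""
    m = len(B)
    t = B[m - 1][m - 1]                     # an entry of B not in the upper-left n x n block
    top_left = [row[:n] for row in B[:n]]   # B|_n
    return t ** (m - n) * perm(top_left)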

Chapters 9 and 10 explore other complexity problems, expanding the role of algebraic geometry and the use of symmetries in formulating computational complexity problems such as Valiant’s conjecture in some specific simple situations, and exploring their relation to the classical Chow variety of products of linear forms in projective space.

The methods of representation theory and algebraic geometry that are used to frame, understand, and solve some problems of computational complexity theory provide theoretical computer scientists with additional techniques for a toolbox that already contains large portions of combinatorics, linear algebra, and probability. On the other hand, for someone interested in representation theory or algebraic geometry, computational complexity theory has already entered the ever expanding playground of applications of these fundamental areas of mathematics. The book under review provides an introduction accessible to graduate students and researchers in either computer science or geometry and representation theory, building a useful bridge between these disciplines and unifying recent trends.


Felipe Zaldivar is Professor of Mathematics at the Universidad Autonoma Metropolitana-I, in Mexico City. His e-mail address is fz@xanum.uam.mx.

1. Introduction
2. The complexity of matrix multiplication I
3. The complexity of matrix multiplication II
4. The complexity of matrix multiplication III
5. The complexity of matrix multiplication IV
6. Valiant's hypothesis I
7. Valiant's hypothesis II
8. Representation theory and its uses in complexity theory
9. The Chow variety of products of linear forms
10. Topics using additional algebraic geometry.