Math Origins: Eigenvectors and Eigenvalues

Author(s): 
Erik R. Tou (University of Washington Tacoma)

In most undergraduate linear algebra courses, eigenvalues (and their cousins, the eigenvectors) play a prominent role. Their most immediate application is in transformational geometry, but they also appear in quantum mechanics, geology, and acoustics. Those familiar with the subject will know that a linear transformation \(T\) of an \(n\)-dimensional vector space can be represented by an \(n\times n\) matrix, say \(M,\) and that a nonzero vector \(\vec{v}\) is an eigenvector of \(M\) if there is a scalar \(\lambda\) for which \[M\vec{v} = \lambda\vec{v}.\] Narrowing our focus to \(n\)-dimensional real space, we may take \(I\) to be the \(n\times n\) identity matrix, in which case the equation becomes \[(\lambda I - M)\vec{v} = \vec{0}.\] This equation can hold for a nonzero vector \(\vec{v}\) (our eigenvector) only when the determinant of \(\lambda I - M\) is zero. This leads us to a characteristic polynomial, defined by \[\det(\lambda I - M).\] Roots of this polynomial are the eigenvalues \(\lambda\) of the matrix \(M\).

As an example, consider the transformation of \(\mathbb{R}^3\) represented by the matrix \[M = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix}.\] The characteristic polynomial is given by \[\det \begin{bmatrix} \lambda-2 & -1 & 0 \\ -1 & \lambda & 0 \\ 0 & 0 & \lambda+1 \end{bmatrix} = \lambda^3-\lambda^2-3\lambda-1,\] and so the eigenvalues of \(M\) are the roots \(-1\), \(1-\sqrt{2}\), \(1+\sqrt{2}\) of this polynomial. (Linear algebra students may recall that since this matrix is symmetric, we know before computing them that the eigenvalues must be real.) The respective eigenvectors are \(\langle 0,0,1\rangle\), \(\langle 1,-1-\sqrt{2},0\rangle\), \(\langle 1,-1+\sqrt{2},0\rangle\).
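Readers who want to check this arithmetic can use the minimal Python sketch below, which relies only on the standard library. It builds \(\lambda I - M\) and expands its determinant by cofactors. It then confirms that each listed eigenvalue is a root of \(\lambda^3-\lambda^2-3\lambda-1\) and that \(M\vec{v} = \lambda\vec{v}\) holds for each listed eigenvector. The helper names det3, char_poly, and mat_vec are our own.

```python
import math

# The matrix M from the example, stored as a list of rows.
M = [[2, 1, 0],
     [1, 0, 0],
     [0, 0, -1]]

def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly(lam):
    """Evaluate the characteristic polynomial det(lam*I - M)."""
    shifted = [[(lam if r == col else 0) - M[r][col] for col in range(3)]
               for r in range(3)]
    return det3(shifted)

def mat_vec(A, v):
    """Return the matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

r2 = math.sqrt(2)
# Eigenvalue / eigenvector pairs as listed in the text.
eigenpairs = [(-1.0, [0.0, 0.0, 1.0]),
              (1 - r2, [1.0, -1 - r2, 0.0]),
              (1 + r2, [1.0, -1 + r2, 0.0])]

for lam, v in eigenpairs:
    # lam should be a root of det(lam*I - M) = lam^3 - lam^2 - 3 lam - 1 ...
    assert abs(char_poly(lam)) < 1e-12
    assert abs(lam**3 - lam**2 - 3 * lam - 1) < 1e-12
    # ... and M v should equal lam v, component by component.
    assert all(abs(p - lam * q) < 1e-12 for p, q in zip(mat_vec(M, v), v))
    print(f"lambda = {lam:+.6f} checks out for v = {v}")
```

Because the eigenvalues involve \(\sqrt{2}\), the checks compare floating-point values against a small tolerance rather than testing for exact equality.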

Now consider the words I have used here, and in particular note the presence of the German prefix eigen-. This may puzzle many readers, and indeed, its use in North America has not always been so common. In fact, over the past two centuries the words proper, latent, characteristic, secular, and singular have all been used as alternatives to our perplexing prefix. And while these variations have disappeared from common use in the United States, we'll see that even today the argument is not quite over.

Before going further, though, note that the prefix eigen- predates its mathematical use. Old English used the word agen to mean "owned or possessed (by)," and while this usage no longer exists in modern English, eigen is used to mean "self" in modern German. For example, the word Eigenkapital is usually translated into English as equity, though a more literal translation would be "self capital." In other words, this word refers to the principal that is the borrower's own: the money that has already been paid to the lender. Interestingly, this meaning also carries over to acoustics, where an Eigenton (plural Eigentöne; literally, "self-tone") is a frequency that produces resonance in a room or other performance space. So we can see that the prefix eigen- had a well-established usage in German (and once upon a time, in English). How, then, did it get attached to mathematics?

I. Cauchy and Celestial Mechanics

The proper mathematical history of the eigenvalue begins with celestial mechanics, in particular with Augustin-Louis Cauchy's 1829 paper "Sur l'équation à l'aide de laquelle on détermine les inégalités séculaires des mouvements des planètes" ("On the equation which helps one determine the secular inequalities in the movements of the planets"). This paper, which later appeared in his Exercices de mathématiques [Cau1], concerned the observed motion of the planets. At this time, astronomers and mathematicians were making detailed observations of the planets in order to validate the mathematical models inherited from Kepler's laws and Newton's laws of motion. One famous outcome of this program was Urbain Le Verrier's prediction of the existence of Neptune based on observed perturbations in the orbit of Uranus.

In celestial mechanics, most perturbations are periodic; for example, Neptune's influence on Uranus's orbit waxes and wanes as the planets revolve around the sun. However, one perplexing issue in the 18th and 19th centuries was that of secular perturbations: those non-periodic perturbations that increase gradually over time. Today, the word secular usually refers to those aspects of society and culture that are not religious or spiritual in nature. However, it also carries the meaning of something whose existence persists over long stretches of time (coming from the Latin saecularis, "occurring once in an age"), which is the more accurate usage in this case. As it happens, Cauchy's original interest in this subject was not secular perturbation, but rather the principal axes of certain surfaces in three-dimensional space. In an 1826 note to the Paris Academy of Sciences, Cauchy wrote:

It is known that the determination of the axes of a surface of the second degree or of the principal axes and moments of inertia of a solid body depend on an equation of the third degree, the three roots of which are necessarily real. However, geometers have succeeded in demonstrating the reality of the three roots by indirect means only... The question that I proposed to myself consists in establishing the reality of the roots directly... [Haw1, p. 20]

In today's language, we would say that Cauchy's research program was to show that a symmetric matrix has real eigenvalues. When "Sur l'équation" appeared in print three years later, Cauchy had succeeded in a direct proof of this fact. His argument began by establishing some basic properties for a real, homogeneous function of degree two in \(\mathbb{R}^n\). (For simplicity, we will only consider the three-dimensional version of the problem.)

Figure 1. In "Sur l'équation à l'aide de laquelle on détermine les inégalités séculaires des mouvements des planètes" (1829), Cauchy used the Lagrange multiplier method to begin his eigenvalue problem. Image courtesy of the Bibliothèque Nationale de France's Gallica collection.

Cauchy restricted his function to the unit sphere, in which case its maxima and minima may be found via the equation \[\frac{\varphi(x,y,z)}{x} = \frac{\chi(x,y,z)}{y} = \frac{\psi(x,y,z)}{z},\] where \(\varphi = \frac{\partial f}{\partial x}\), \(\chi = \frac{\partial f}{\partial y}\), and \(\psi = \frac{\partial f}{\partial z}\). (Readers might recognize this as the Lagrange multiplier method found in most multivariable calculus courses.) Next, he defined the quantities \(A_{xx}\), \(A_{xy}\), \(A_{yx}\), etc., according to the coefficients on the terms \(x^2\), \(xy\), \(yx\), etc. (Note that the pure second-order partial derivatives are equal to their corresponding \(A\) value, but in Cauchy's notation mixed partials are twice the corresponding \(A\) value.) Using the fact that \(A_{xy} = A_{yx}\), Cauchy obtained a system of equations that gave a solution to the eigenvalue problem.

Figure 2. Cauchy's method for determining extrema for a real, homogeneous function of degree two restricted to the unit sphere in \(\mathbb{R}^n\), from "Sur l'équation à l'aide de laquelle on détermine les inégalités séculaires des mouvements des planètes" (1829). Image courtesy of the Bibliothèque Nationale de France's Gallica collection.

To better illuminate the connection between Cauchy's problem and a modern eigenvalue problem, let's consider the function \(f(x,y,z) = x^2 + xy - \frac{1}{2}z^2\) on the unit sphere. The nonzero second-order partial derivatives are \(\frac{\partial^2 f}{\partial x^2} = 2\), \(\frac{\partial^2 f}{\partial z^2} = -1\), and \(\frac{\partial^2 f}{\partial x \partial y} = 1\). In matrix form, we have \[\begin{bmatrix} A_{xx} & A_{xy} & A_{xz} \\ A_{yx} & A_{yy} & A_{yz} \\ A_{zx} & A_{zy} & A_{zz} \end{bmatrix} \;=\; \begin{bmatrix} 2 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix},\] which is the same matrix from our earlier example.

Next, he used the determinant of the matrix (while notably using neither the word "determinant" nor "matrix") to show that the roots \(s\) of the resulting equation, which we now call the eigenvalues, must always be real. Cauchy reasoned that, if the characteristic polynomial had complex roots, they must come in conjugate pairs. Thus, there would be two \((n-1) \times (n-1)\) minors within the eigenvalue matrix whose determinants would be complex conjugates of each other. From here, Cauchy showed that the product of these determinants must be zero, i.e., the rank of the matrix is at most \(n-1\). Ultimately, his argument was a reductio ad absurdum, in which a symmetric matrix with non-real eigenvalues can be shown to have rank at most \(n-1\), then at most \(n-2\), etc., until the matrix itself vanishes. (The interested reader should take a look at the original paper, available from Gallica.)
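Returning to the example \(f(x,y,z) = x^2 + xy - \tfrac{1}{2}z^2\), the link between Cauchy's ratio condition and eigenvectors can be checked numerically. The Python sketch below (the function names are ours, not Cauchy's) evaluates \(\varphi\), \(\chi\), \(\psi\) at the unit eigenvectors of \(M\) and confirms that \((\varphi,\chi,\psi) = \lambda\,(x,y,z)\), so the common value of Cauchy's ratios is the eigenvalue. Since \(f(\vec{v}) = \tfrac{1}{2}\vec{v}^{\,T}M\vec{v}\), the value of \(f\) at each such point is \(\lambda/2\). A crude grid search over the sphere, which is our own addition, then suggests that the extreme values are \(\tfrac{1}{2}(1+\sqrt{2})\) and \(-\tfrac{1}{2}\).

```python
import math

def f(x, y, z):
    """The quadratic form from the example."""
    return x**2 + x * y - 0.5 * z**2

def grad_f(x, y, z):
    """The partial derivatives (phi, chi, psi) of f."""
    return (2 * x + y, x, -z)

def normalize(v):
    """Scale a vector to unit length so it lies on the unit sphere."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

r2 = math.sqrt(2)
# Unit eigenvectors of M, paired with their eigenvalues.
critical_points = [(-1.0, normalize([0, 0, 1])),
                   (1 - r2, normalize([1, -1 - r2, 0])),
                   (1 + r2, normalize([1, -1 + r2, 0]))]

for lam, (x, y, z) in critical_points:
    phi, chi, psi = grad_f(x, y, z)
    # Cauchy's condition phi/x = chi/y = psi/z, written as
    # (phi, chi, psi) = lam * (x, y, z) so that zero coordinates are harmless.
    assert all(abs(g - lam * c) < 1e-12
               for g, c in zip((phi, chi, psi), (x, y, z)))
    # On the unit sphere, f takes the value lam / 2 at these points.
    assert abs(f(x, y, z) - lam / 2) < 1e-12

# A crude grid search over the sphere in spherical coordinates.
N = 400
values = []
for i in range(N + 1):
    t = math.pi * i / N              # polar angle
    for j in range(N):
        p = 2 * math.pi * j / N      # azimuthal angle
        values.append(f(math.sin(t) * math.cos(p),
                        math.sin(t) * math.sin(p),
                        math.cos(t)))

print("largest value on grid:", max(values), "predicted:", (1 + r2) / 2)
print("smallest value on grid:", min(values), "predicted:", -0.5)
```

The grid search is only a plausibility check. The exact extremes follow from the eigenvalues: the largest eigenvalue \(1+\sqrt{2}\) gives the maximum, and the smallest, \(-1\), gives the minimum.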

Cauchy returned to the subject of linear equations in his 1840 paper "Mémoire sur l'intégration des équations linéaires" [Cau2]. While this paper was focused on pure mathematics, his ultimate goal was to solve physics problems: "It is the integration of linear equations, and above all linear equations with constant coefficients, that is required for the solution of a large number of problems in mathematical physics." Later in the introduction, he described how to solve a system of linear differential equations using his newly named équation caractéristique as the principal tool.

... after having used elimination to reduce the principal variables to a single one, we may, with the help of these methods, express the principal variable as a function of the independent variable and arbitrary constants, then... determine the values of the arbitrary constants, using simultaneous equations of the first degree... we obtain for the general value of the principal variable a function in which there enter along with the principal variable the roots of a certain equation that I will call the characteristic equation, the degree of this equation being precisely the order of the differential equation which is to be integrated. [Cau2, p. 53]

In the section "Intégration d'un système d'équations différentielles du premier ordre, linéaires et à coefficients constants" ("Integration of a system of linear, first-order differential equations with constant coefficients"), Cauchy described this process with a bit more detail.

Figure 3. Cauchy's method for solving a system of linear, first-order differential equations with constant coefficients, equivalent to a modern-day eigenvalue problem. From "Mémoire sur l'intégration des équations linéaires" (1840); image courtesy of Archive.org.

For the sake of clarity, let us take \(n = 3\) and apply Cauchy's reasoning to our earlier example. In this case, the differential equations are \[\begin{cases} \frac{d\xi}{dt} + 2\xi + \eta & = 0 \\ \frac{d\eta}{dt} + \xi & = 0 \\ \frac{d\zeta}{dt} - \zeta & = 0 \end{cases}.\] In our example, it is easy to see that the equation for \(\zeta\) has solution \(\zeta = Ce^t\). Untangling the first two equations takes a bit more work. Cauchy's method would set \(\xi = Ae^{st}\) and \(\eta = Be^{st}\) and substitute into the system.

Figure 4. More of Cauchy's method for solving a system of linear, first-order differential equations with constant coefficients, equivalent to a modern-day eigenvalue problem. From "Mémoire sur l'intégration des équations linéaires" (1840); image courtesy of Archive.org.

In our example, making the substitution and canceling the common term \(e^{st}\) will produce the system \[\begin{cases} (s + 2)A + B & = \; 0 \\ sB + A & = \; 0 \end{cases}\] which has nontrivial solutions when \(s = -1\pm\sqrt{2}\). Of course, we could write the whole system of differential equations in matrix form and divide by \(e^{st}\) to get \[\begin{bmatrix} s+2 & 1 & 0 \\ 1 & s & 0 \\ 0 & 0 & s-1 \end{bmatrix} \begin{bmatrix} A \\ B \\ C \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},\] which is the same eigenvalue problem we saw at the beginning, except with \(-s\) substituted for \(\lambda\). (The matrix above is exactly \(-(\lambda I - M)\) with \(\lambda = -s\), and the change of sign does not affect where its determinant vanishes.) In the general \(n\)-dimensional case, one must reduce the system of equations via elimination to produce a polynomial of degree \(n\) in \(s\). Cauchy called the equation obtained by setting this polynomial equal to \(0\) the "characteristic equation."

Figure 5. The conclusion of Cauchy's method for solving a system of linear, first-order differential equations with constant coefficients, including his naming of the équation caractéristique. From "Mémoire sur l'intégration des équations linéaires" (1840); image courtesy of Archive.org.

We can easily see how Cauchy's method of solution is equivalent to a modern eigenvalue problem from a typical linear algebra course. As an aside, note that Cauchy referred to the eigenvalues as valeurs propres (proper values) near the end of this passage.
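As a sanity check on Cauchy's substitution, the Python sketch below integrates the example system \(\xi' = -2\xi - \eta\), \(\eta' = -\xi\), \(\zeta' = \zeta\) numerically and compares the result with the exponential solution. It uses a classical fourth-order Runge-Kutta method and starts from \((\xi,\eta,\zeta) = (A, B, C)\) with \(s = -1+\sqrt{2}\), \(A = 1\), \(B = -(s+2)A\), and \(C = 1\). The integrator, step size, and initial values are our own choices for illustration and are not taken from Cauchy.

```python
import math

def rhs(state):
    """Derivatives from the example system:
    xi' = -2 xi - eta,  eta' = -xi,  zeta' = zeta."""
    xi, eta, zeta = state
    return [-2 * xi - eta, -xi, zeta]

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(state)
    k2 = rhs([y + h / 2 * k for y, k in zip(state, k1)])
    k3 = rhs([y + h / 2 * k for y, k in zip(state, k2)])
    k4 = rhs([y + h * k for y, k in zip(state, k3)])
    return [y + h / 6 * (a + 2 * b + 2 * c + d)
            for y, a, b, c, d in zip(state, k1, k2, k3, k4)]

# Roots of s^2 + 2s - 1 = 0, the determinant of the 2x2 system for A and B.
roots = [-1 + math.sqrt(2), -1 - math.sqrt(2)]
for s in roots:
    A = 1.0
    B = -(s + 2) * A               # first equation: (s + 2)A + B = 0
    assert abs(s * B + A) < 1e-12  # second equation: sB + A = 0 also holds

# Integrate from (A, B, C) for the root s = -1 + sqrt(2), with C = 1.
s = roots[0]
A, B, C = 1.0, -(s + 2), 1.0
state = [A, B, C]
h, steps = 0.001, 1000
for _ in range(steps):
    state = rk4_step(state, h)

t = h * steps
exact = [A * math.exp(s * t), B * math.exp(s * t), C * math.exp(t)]
error = max(abs(u - v) for u, v in zip(state, exact))
print(f"largest deviation at t = {t}: {error:.2e}")
```

Choosing the other root, \(s = -1-\sqrt{2}\), gives a decaying solution. A general initial condition is a combination of the two modes together with \(Ce^t\).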

II. Latent and Singular Values

In 1846, only six years after Cauchy's "Mémoire sur l'intégration des équations linéaires" appeared in print, Johann Gottfried Galle observed an object in the night sky from the Berlin Observatory, roughly where Le Verrier had predicted it would be found. Six years later, J. J. Sylvester published a seemingly unrelated note in Philosophical Magazine on the use of matrices in reducing homogeneous quadratic polynomials to sums of squares. In this paper, Sylvester used no terminology for the eigenvalues of a matrix, merely calling them "roots" of a determinant equation.

Figure 6. Sylvester's version of Cauchy's eigenvalue problem. From "A Demonstration of the Theorem That Every Homogeneous Quadratic Polynomial is Reducible By Real Orthogonal Substitutions to the Form of a Sum of Positive and Negative Squares" (1852). Image courtesy of Archive.org.
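In modern terms, Sylvester's theorem says that an orthogonal change of variables (to coordinates along the eigenvectors) turns a quadratic form into a weighted sum of squares whose weights are the eigenvalues. For our running example \(f(x,y,z) = x^2 + xy - \tfrac{1}{2}z^2 = \tfrac{1}{2}\vec{v}^{\,T}M\vec{v}\), this gives \[f = \tfrac{1}{2}(1+\sqrt{2})\,u_1^2 + \tfrac{1}{2}(1-\sqrt{2})\,u_2^2 - \tfrac{1}{2}\,u_3^2,\] where \(u_1, u_2, u_3\) are the coordinates along the normalized eigenvectors for \(1+\sqrt{2}\), \(1-\sqrt{2}\), and \(-1\). This is one positive square and two negative squares. The Python sketch below checks this identity at a few random points. It is a modern verification, not Sylvester's own argument, and the helper names are ours.

```python
import math
import random

r2 = math.sqrt(2)

def f(x, y, z):
    """The running example, f = x^2 + xy - z^2/2."""
    return x**2 + x * y - 0.5 * z**2

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

# Orthonormal eigenvectors of M, each paired with its eigenvalue.
basis = [(1 + r2, normalize([1, -1 + r2, 0])),
         (1 - r2, normalize([1, -1 - r2, 0])),
         (-1.0, [0.0, 0.0, 1.0])]

# The eigenvectors are mutually orthogonal, so the substitution is orthogonal.
for i in range(3):
    for j in range(i + 1, 3):
        assert abs(sum(p * q for p, q in zip(basis[i][1], basis[j][1]))) < 1e-12

def f_as_squares(x, y, z):
    """f written as a weighted sum of squares of u_k = e_k . (x, y, z)."""
    total = 0.0
    for lam, e in basis:
        u = e[0] * x + e[1] * y + e[2] * z
        total += 0.5 * lam * u**2
    return total

random.seed(1)
for _ in range(5):
    point = [random.uniform(-2, 2) for _ in range(3)]
    assert abs(f(*point) - f_as_squares(*point)) < 1e-10

signs = ["+" if lam > 0 else "-" for lam, _ in basis]
print("signs of the squares:", signs)   # one positive, two negative
```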

In a footnote, Sylvester referred to proofs of Cauchy's theorem by C. G. J. Jacobi and C. W. Borchardt, but it appears he did not have access to Cauchy's own work on the subject. Three decades later, in 1883, Sylvester published another note in Philosophical Magazine, "On the Equation to the Secular Inequalities in the Planetary Theory," evidently intended to describe the mathematics of secular perturbation. After 31 years of reflection, Sylvester chose the word "latent" to refer to the roots of his matrix determinant, and began the paper with an explanation of this choice.

Figure 7. In his paper, "On the Equation to the Secular Inequalities in the Planetary Theory" (1883), Sylvester made the case for naming the latent roots of a matrix. Image courtesy of the University of Michigan Historical Mathematics Collection.

As an aside, we have seen Sylvester's name before in this "Math Origins" series, in his use of the word totient to describe the number of positive integers less than a given integer \(n\) which are relatively prime to \(n\). (He also suggested naming this function with the Greek letter \(\tau\) instead of Gauss's \(\varphi\), but this did not stick.) In both cases, he appeared to have an interest in providing mathematics with clear and rational nomenclature—even if his suggestions were not followed by the mathematical community.

Sylvester's use of "latent" was not the only attempt to rationalize the name of these quantities during this time. In 1894, Henri Poincaré wrote a sprawling, 99-page paper on mathematical equations that often appear in physics. In this paper, "Sur les Équations de la Physique Mathématique" ("On the equations of mathematical physics"), he surveyed the many mathematical equations and techniques that are employed in physics, including a method for solving differential equations that bears on our interest in eigenvalues. Poincaré began with the differential equation \(\Delta P+\xi P + f D = 0\), where \(\xi\), \(f\), and \(P\) are a scalar, scalar function, and vector field, respectively, and \(\Delta\) denotes the divergence. After considerable analysis of this differential equation, he considered the special case in which \(D\) vanishes.

Figure 8. In "Sur les Équations de la Physique Mathématique" (1894), Poincaré chose the terms fonction harmonique and nombre caractéristique for eigenfunctions and eigenvalues. Image courtesy of Archive.org.

As we can see, Poincaré's name for an eigenfunction is "harmonic function," while the name for an eigenvalue is "characteristic number." In a contemporary (but considerably shorter) paper, Émile Picard developed a solution to a second-order differential equation. Here, the word "singular" serves to name the eigenvalues, which Picard referred to as "singular points." In the excerpt below, \(y\) is a function of \(x\), \(k\) is a constant, and \(A(x)\) is a continuous, positive function on an interval \((a,b)\).


Figure 9. In "Sur les équations linéaires du second ordre renfermant un paramètre arbitraire" (1894), Picard chose the term points singuliers for eigenvalues. Image courtesy of the Bibliothèque Nationale de France's Gallica collection.

We can see that, while Sylvester, Poincaré, and Picard had very different goals in mind, they all felt a need to suggest a name for the roots of their respective equations. So, within the span of little more than a decade, the words "latent," "singular," and "harmonic" were all added to the mathematician's lexicon. Even though the naming conventions became more muddled during this period, we can at least see that mathematicians had separated the mathematics from the original notion of secular perturbation. Among Sylvester, Poincaré, and Picard, only Sylvester appeared to have a more general exposition in mind: his 1883 paper was meant to advertise a forthcoming work on "multiple algebra," which sought to give a systematic treatment of what would today be called linear algebra. This work is likely to have been "Lectures on the Principles of Universal Algebra," which appeared in Sylvester's American Journal of Mathematics the following year.

III. Finding a Proper Eigenwert

So far, we have enjoyed a review of some 19th-century works in astronomy and mathematics, and seen the many different names proposed for the roots of a certain determinant equation: characteristic equations, secular perturbations, singular points, harmonic functions, and latent roots have all played a role. The reader may ask at this point: where are the eigenvalues? Indeed, it was not until David Hilbert's 1904 paper "Grundzüge einer allgemeinen Theorie der linearen Integralgleichungen" ("Principles of a general theory of linear integral equations") [Hil1] that our perplexing prefix appeared in print. As the title suggests, this work was meant to provide a general mathematical theory of linear integral equations, separated (at least in part) from the subject's original context in celestial mechanics.

Mathematically, Hilbert took real-variable functions \(f(s)\) and \(K(s,t)\) as given, and wanted to find solutions to the equations \[\begin{eqnarray*} f(s) &=& \int_a^b K(s,t)\varphi(t)\,dt \\ f(s) &=& \varphi(s) - \lambda \int_a^b K(s,t)\varphi(t)\,dt\end{eqnarray*}\] where \(\varphi(s)\) and \(\lambda\) represent an unknown function and unknown parameter, respectively. Hilbert called these integral equations of the first kind and second kind, respectively. His goal here was to collect various strands of an idea and consolidate them into a single theory. As he wrote in his introduction,

A closer examination of the subject led me to the realization that a systematic construction of a general theory of linear integral equations is of the greatest importance for all of analysis, especially for the theory of definite integrals and the development of arbitrary functions into infinite series, and also for the theory of linear differential equations, as well as potential theory and the calculus of variations... I intend to reexamine the question of solving integral equations, but above all to look for the context and the general characteristics of the solutions... [Hil1, p. 50].

A careful reading of Hilbert's original text reveals that the word "characteristics" in the above quotation is a translation of the German Eigenschaften, which can mean either "characteristics" or "properties." Here we begin to see the linguistic connection between eigenvalues and Cauchy's équation caractéristique. Soon after, Hilbert wrote that his method employed "certain excellent functions, which I call eigenfunctions..." While he claimed only to organize and systematize the methods that came before, Hilbert's coinage of "eigenfunction" appears to be new. It is also noteworthy that Hilbert used Eigenwerte to refer to eigenvalues.
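One modern way to see the link between Hilbert's integral equations and matrix eigenvalues is to replace the integral by a quadrature rule. This is a Nyström discretization, which is our choice here and not Hilbert's own method. Take the homogeneous equation of the second kind (set \(f = 0\) above), \(\varphi(s) = \lambda\int_0^1 K(s,t)\varphi(t)\,dt\). As an illustrative assumption, not an example from Hilbert, we use the kernel \(K(s,t) = \min(s,t)\) on \([0,1]\). Its eigenfunctions are \(\sin\big((k-\tfrac12)\pi s\big)\) with \(\lambda_k = (k-\tfrac12)^2\pi^2\). The midpoint rule turns the integral operator into an ordinary matrix, and power iteration estimates that matrix's largest eigenvalue \(\mu_1\). Hilbert's smallest parameter is then \(\lambda_1 = 1/\mu_1 \approx \pi^2/4\).

```python
import math

def nystrom_matrix(kernel, n):
    """Midpoint-rule discretization of phi -> int_0^1 K(s, t) phi(t) dt.

    Entry (i, j) is h * K(s_i, s_j), where s_i = (i + 1/2) h and h = 1/n.
    """
    h = 1.0 / n
    nodes = [(i + 0.5) * h for i in range(n)]
    return [[h * kernel(si, sj) for sj in nodes] for si in nodes]

def power_iteration(K, iters=60):
    """Estimate the dominant eigenvalue of K by power iteration.

    Returns the Rayleigh quotient and the last normalized iterate.
    """
    v = [1.0] * len(K)
    mu = 0.0
    for _ in range(iters):
        w = [sum(k * x for k, x in zip(row, v)) for row in K]
        # Rayleigh quotient (v . Kv) / (v . v) for the current iterate v.
        mu = sum(a * b for a, b in zip(v, w)) / sum(x * x for x in v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mu, v

# Illustrative kernel (our assumption): K(s, t) = min(s, t) on [0, 1].
K = nystrom_matrix(min, 100)
mu, phi = power_iteration(K)

# Hilbert's homogeneous equation phi = lam * (K phi) gives lam = 1 / mu.
lam = 1.0 / mu
print(f"estimated lambda_1 = {lam:.6f}, exact pi^2/4 = {math.pi**2 / 4:.6f}")
```

Increasing the number of quadrature nodes should move the estimate closer to \(\pi^2/4\). The small remaining gap comes from the quadrature rule rather than from the power iteration.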

As the 20th century unfolded, Hilbert's mathematical work heavily influenced the new physical theories of general relativity and quantum mechanics. Two important papers on these subjects are his "Die Grundlagen der Physik" ("Foundations of Physics") [Hil2] and "Über die Grundlagen der Quantenmechanik" ("Foundations of Quantum Mechanics") [HNN], which appeared in Mathematische Annalen in 1924 and 1928, respectively. The eigen- prefix appears in both papers. In the former, Hilbert used the word Eigenzeit (in English, "eigentime") for eigenvalues of a particular matrix, while in the latter, he used both Eigenwert and Eigenfunktion when analyzing the Schrödinger wave equation. One of Hilbert's coauthors on this latter paper was a young John von Neumann, who emigrated to the United States in 1933 and made lasting contributions to both mathematics and physics. Crucially, von Neumann's long residence in the United States (he became a citizen in 1937) made his work more influential than most when it came to the choice of mathematical terminology. Now publishing mostly in English, von Neumann translated Hilbert's Eigenfunktion as proper function (for an example, see [BvN]). While this usage was picked up by von Neumann's students and acolytes in North America, the half-translated form (eigenvalue for Eigenwert, etc.) remained popular as well.

IV. Conclusion

By the middle of the 20th century, there were at least five different adjectives that could be used to refer to the solutions of our particular type of matrix equation: secular, characteristic, latent, eigen, and proper. In general, though, two naming conventions dominated: eigen- (from Hilbert's German writings) and characteristic/proper (from Cauchy's French writings and von Neumann's translation of eigen-). In the United States, the peculiar prefix eigen- won the debate, as described amusingly by Paul Halmos in his introduction to A Hilbert Space Problem Book in 1967:

For many years I have battled for proper values, and against the one and a half times translated German-English hybrid that is often used to refer to them. I have now become convinced that the war is over, and eigenvalues have won it; in this book I use them. [Hal, p. x]

By the time this book appeared in 1967, Halmos clearly felt the debate was over in the United States. While this is true, the global battle is not over! Both naming conventions are still in use around the world. A cursory search of modern languages reveals that mathematicians are still split on the issue. With apologies to those with more language expertise than I, here is a rough accounting of the word for "eigenvalue" in several widely-spoken languages around the world.

Language | Word(s) for "eigenvalue" | Literal translation
Arabic | القيمة الذاتية | self-value
German | Eigenwert | self-value
Russian | собственное значение | self-value
Chinese | 特征值 | characteristic value
French | valeur propre | proper value
Portuguese | valor próprio, autovalor | proper value, self-value
Spanish | valor propio, autovalor | proper value, self-value

As you can see, translations of Hilbert's eigen- prefix hold sway in Arabic, German, and Russian, while proper has taken hold in French, Portuguese, and Spanish. Of course, there is some ambiguity here, in that Portuguese and Spanish have words for both versions, and the Chinese term, meaning "characteristic value," fits neither version. Note, though, the English language's use of an untranslated German prefix: the other languages listed appear to render the German or French term with a native word meaning "self," "intrinsic," "proper," or (in Chinese) "characteristic." So the peculiar prefix eigen- is not so peculiar after all. Rather, the peculiarity is that English uses a German prefix without translating it.

References

[BvN] Birkhoff, Garrett and von Neumann, John. "The Logic of Quantum Mechanics," Annals of Mathematics, Second Series, 37 No. 4 (Oct. 1936), 823-843.

[Cau1] Cauchy, A.-L. "Sur l'équation à l'aide de laquelle on détermine les inégalités séculaires des mouvements des planètes," Exercices de mathématiques 4, in Œuvres complètes d'Augustin Cauchy, Paris: Gauthier-Villars et fils, 2 No. 9, 174-95.

[Cau2] Cauchy, A.-L. "Mémoire sur l'intégration des équations linéaires," Exercices d'analyse et de physique mathématique, 1 (1840), 53-100.

[Hal] Halmos, Paul R. A Hilbert Space Problem Book. Princeton, NJ: D. Van Nostrand Company, Inc., 1967.

[Haw1] Hawkins, T. W. "Cauchy and the spectral theory of matrices," Historia Mathematica, 2 (1975), 1-29.

[Hil1] Hilbert, David. "Grundzüge einer allgemeinen Theorie der linearen Integralgleichungen," Nachrichten von d. Königl. Ges. d. Wissensch. zu Göttingen (Math.-physik. Kl.), (1904), 49-91.

[Hil2] Hilbert, David. "Die Grundlagen der Physik," Math. Annalen, 92 Issue 1-2 (March 1924), 1-32.

[HNN] Hilbert, David, von Neumann, John, and Nordheim, Lothar. "Über die Grundlagen der Quantenmechanik," Math. Annalen, 98 Issue 1 (March 1928), 1-30.

[Pic] Picard, Émile. "Sur les équations linéaires du second ordre renfermant un paramètre arbitraire," Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences, 118 (1894), 379-383.

[Poi] Poincaré, Henri. "Sur les Équations de la Physique Mathématique," Rendiconti del Circolo Matematico di Palermo, 8 (1894), 57-155.

[Syl1] Sylvester, J. J. "A Demonstration of the Theorem That Every Homogeneous Quadratic Polynomial is Reducible By Real Orthogonal Substitutions to the Form of a Sum of Positive and Negative Squares," Philosophical Magazine, 4 (1852), 138-142.

[Syl2] Sylvester, J. J. "On the Equation to the Secular Inequalities in the Planetary Theory," Philosophical Magazine, 16 (1883), 267-269.