The Journal of Online Mathematics and Its Applications, Volume 8 (2008)
The Most Marvelous Theorem in Mathematics, Dan Kalman
As shown here, it is always possible to reduce the quadratic equation
$$Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$$
to a much simpler form through rotation and/or translation. A translation is necessary exactly when at least one of D and E is non-zero. But even in this case, the identity of a curve as an ellipse is completely determined by the values of just A, B, and C. Consequently, for this discussion, we will look only at the simpler case D = E = 0. We will also assume that F is not zero. There will be no constant term in the original equation if and only if there is no constant term in the equation for the rotated curve. Thus we know a priori that no ellipse can occur with a constant term of 0. As a final simplification, given that F is not 0, we may divide both sides of the original equation by that constant and obtain a constant term of 1. That gives an equation of the form
$$Ax^2 + Bxy + Cy^2 = 1 \tag{1}$$
We have two goals. The first is to see that this represents an ellipse precisely when $A$ and $C$ are both positive and $B^2 - 4AC < 0$. It has been shown elsewhere that an ellipse centered at the origin has an equation of the desired type. Therefore, it remains only to show that if an equation has the form given above, it must correspond to an ellipse.
The second goal is to show that a linear transformation takes an ellipse to an ellipse. In particular, suppose the matrix $M$ is applied to every point of the ellipse given by equation (1). We must show that the resulting curve is still an ellipse. To do so we find the equation of the new curve, and show that it can again be expressed in the form of equation (1), and that the conditions $A > 0$, $C > 0$, and $B^2 - 4AC < 0$ will still hold for the new equation.
And how do we find the equation of the new curve? There is a simple method for this: replace $x$ and $y$ in equation (1) with the coordinates of the point that $M$ transformed into $(x, y)$, that is, with the coordinates of $M^{-1}\begin{bmatrix} x \\ y \end{bmatrix}$. (The justification for this procedure is detailed in the section on Transformed Equations.)
Turning to the first goal, consider the form that equation (1) will take after rotating the corresponding curve. As shown in the section on Quadratic Conics, the equation is
$$A(x\cos\alpha + y\sin\alpha)^2 + B(x\cos\alpha + y\sin\alpha)(y\cos\alpha - x\sin\alpha) + C(y\cos\alpha - x\sin\alpha)^2 = 1,$$
and the $xy$ term can be eliminated by choosing $\cot 2\alpha = (C - A)/B$. With that choice the $xy$ terms cancel, so expanding the preceding equation and collecting like terms gives an equation of the form $A'x^2 + C'y^2 = 1$, where $A'$ and $C'$ are given by
$$A' = A\cos^2\alpha - B\cos\alpha\sin\alpha + C\sin^2\alpha$$
$$C' = A\sin^2\alpha + B\cos\alpha\sin\alpha + C\cos^2\alpha$$
We must show that both of these are positive, assuming $A > 0$, $C > 0$, and $B^2 - 4AC < 0$. In that case, setting $a^2 = 1/A'$ and $b^2 = 1/C'$, the equation of the rotated curve takes the standard form $x^2/a^2 + y^2/b^2 = 1$, and so the curve is an ellipse.
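As a numerical sanity check on the rotation step, the following Python sketch picks an angle with $\cot 2\alpha = (C - A)/B$ and evaluates the three collected coefficients. The function name `rotate_coefficients` and the sample values are illustrative assumptions, not part of the original discussion; `math.atan2` is used simply as one convenient way to select an angle $2\alpha$ with the required cotangent.

```python
import math

def rotate_coefficients(A, B, C):
    """Return (A', xy coefficient, C') after substituting
    x -> x cos a + y sin a and y -> y cos a - x sin a,
    where the angle a satisfies cot(2a) = (C - A) / B."""
    # atan2(B, C - A) is an angle 2a whose cotangent is (C - A) / B.
    alpha = 0.5 * math.atan2(B, C - A)
    c, s = math.cos(alpha), math.sin(alpha)
    A_new = A * c * c - B * c * s + C * s * s
    C_new = A * s * s + B * c * s + C * c * c
    # Coefficient of xy collected from the expanded equation.
    xy = 2 * (A - C) * c * s + B * (c * c - s * s)
    return A_new, xy, C_new

A_new, xy, C_new = rotate_coefficients(2.0, 1.0, 3.0)
print(A_new, xy, C_new)
```

For $A = 2$, $B = 1$, $C = 3$ the $xy$ coefficient vanishes up to rounding, while $A' \approx 1.793$ and $C' \approx 3.207$ are both positive.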
Note that all appearances of $\alpha$ in these equations can be expressed in terms of $\cos 2\alpha$ and $\sin 2\alpha$ using the identities $\cos\alpha\sin\alpha = \sin(2\alpha)/2$, $\cos^2\alpha = (1 + \cos 2\alpha)/2$, and $\sin^2\alpha = (1 - \cos 2\alpha)/2$. That leads to
$$\begin{aligned}
A' &= [A + C - (C - A)\cos 2\alpha - B\sin 2\alpha] / 2 \\
C' &= [A + C + (C - A)\cos 2\alpha + B\sin 2\alpha] / 2
\end{aligned} \tag{2}$$
Next, from $\cot 2\alpha = (C - A)/B$, we see that we may take $\cos 2\alpha = (C - A)/R$ and $\sin 2\alpha = B/R$, where $R > 0$ and $R^2 = (C - A)^2 + B^2$. Substituting these into both equations in (2) and simplifying ultimately produces
A′ = (A + C − R) / 2
C′ = (A + C + R) / 2
Finally, the assumption that $A$ and $C$ are both positive shows that $C'$ is also positive. From $B^2 - 4AC < 0$ we get $(A + C)^2 - R^2 = 4AC - B^2 > 0$, so $A + C > R$ and $A'$ is positive as well. Thus we have arrived at the first goal: we have shown that equation (1) represents an ellipse if and only if $A > 0$, $C > 0$, and $B^2 - 4AC < 0$.
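The criterion and the closed forms for $A'$ and $C'$ are easy to experiment with. The sketch below is only an illustration: the function names `principal_coefficients` and `is_ellipse` and the sample coefficients are assumptions introduced here.

```python
import math

def principal_coefficients(A, B, C):
    """A' = (A + C - R) / 2 and C' = (A + C + R) / 2,
    with R >= 0 and R^2 = (C - A)^2 + B^2."""
    R = math.hypot(C - A, B)
    return (A + C - R) / 2, (A + C + R) / 2

def is_ellipse(A, B, C):
    """Criterion from the text for A x^2 + B xy + C y^2 = 1."""
    return A > 0 and C > 0 and B * B - 4 * A * C < 0

# An ellipse: both rotated coefficients come out positive.
print(is_ellipse(2.0, 1.0, 3.0), principal_coefficients(2.0, 1.0, 3.0))
# Not an ellipse: B^2 - 4AC = 5 > 0, and A' turns out negative.
print(is_ellipse(1.0, 3.0, 1.0), principal_coefficients(1.0, 3.0, 1.0))
```

In the second case $R = 3$, so $A' = (2 - 3)/2 = -1/2$, and the rotated equation $A'x^2 + C'y^2 = 1$ describes a hyperbola rather than an ellipse.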
To reach the second goal, it is productive to express equation (1) in matrix notation:
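$$\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} A & B/2 \\ B/2 & C \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = 1 \tag{3}$$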
The left side of equation (1) is a quadratic form, and the matrix form in equation (3) is often very useful in the analysis of such forms. For reference purposes, let us call the 2 by 2 matrix in equation (3) Q. Notice the way the entries x and y appear twice in the equation. Preceding Q they are aligned horizontally, forming what is referred to as a row. Following Q they are stacked vertically (a column). It turns out that in manipulating equations of this sort, it is often helpful to change rows into columns and vice versa. This is referred to as the transpose operation, and it is defined as follows. Given a matrix A, we create a new matrix A′ by rewriting each row, in order, as a column. Thus, the first row of A becomes the first column of A′, the second row of A becomes the second column of A′, and so on. The new matrix is called the transpose of A, and it is a standard notation to label it A′. With this terminology, in equation (3) the row preceding Q and the column following Q are transposes of each other. Therefore, we can express equation (3) more compactly as [x y]Q[x y]′ = 1.
Notice, too, that Q is its own transpose, or in equation form, Q = Q′. The first row of Q has the same entries as the first column, and likewise for the second row and column. Therefore, changing rows to columns has no visible effect on Q. When a matrix has this property, it is said to be symmetric.
One final comment on the transpose operation: it combines with matrix multiplication according to the rule (AB)′ = (B′)(A′). As an illustration, consider the transpose of the entire left side of equation (3). We find
([x y]Q[x y]′)′ = [x y]′′Q′[x y]′ = [x y]Q[x y]′
At the final step we used the facts that transposing any matrix twice leaves the matrix unchanged (so A′′=A) and Q′ = Q.
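The transpose rule and the row-times-matrix-times-column reading of a quadratic form can both be tried out on small examples. The helper names `transpose`, `matmul`, and `quadratic_form`, and the sample matrices `H`, `K`, and `Q`, are assumptions made for this sketch; matrices are stored as plain Python lists of rows.

```python
def transpose(m):
    """Rewrite each row of m, in order, as a column."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def quadratic_form(Q, x, y):
    """Evaluate [x y] Q [x y]' and return the single entry of the 1x1 result."""
    row = [[x, y]]
    return matmul(matmul(row, Q), transpose(row))[0][0]

H = [[1, 2], [3, 4]]
K = [[0, 1], [5, 6]]
Q = [[2, 0.5], [0.5, 3]]  # matrix of the form 2x^2 + xy + 3y^2

# The transpose of a product is the product of the transposes in reverse order.
print(transpose(matmul(H, K)) == matmul(transpose(K), transpose(H)))
# A symmetric matrix is its own transpose.
print(transpose(Q) == Q)
# The matrix product reproduces 2x^2 + xy + 3y^2 at (x, y) = (1, 2).
print(quadratic_form(Q, 1, 2))
```

At $(x, y) = (1, 2)$ the form evaluates to $2 \cdot 1 + 1 \cdot 2 + 3 \cdot 4 = 16$, in agreement with the matrix product.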
Here is how these ideas can be applied in the problem at hand. We are applying a new matrix, $M$, to every point of the curve given by equation (3), and wish to derive the equation for the resulting curve. As observed above, this can be achieved by replacing every instance of $\begin{bmatrix} x \\ y \end{bmatrix}$ with $M^{-1}\begin{bmatrix} x \\ y \end{bmatrix}$. Or, using transpose notation, we have to replace each instance of $[x \;\; y]$ with $[x \;\; y](M^{-1})'$. That leads to the new equation
$$[x \;\; y]\,(M^{-1})'\,Q\,M^{-1}\,[x \;\; y]' = 1.$$
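By combining all three square matrices into a single new matrix, $P$, the equation becomes $[x \;\; y]\,P\,[x \;\; y]' = 1$. Alternatively, if the entries of $M^{-1}$ are $s$, $t$, $u$, and $v$ (reading across the rows), then the transformed equation is found to be
$$\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} s & u \\ t & v \end{bmatrix} \begin{bmatrix} A & B/2 \\ B/2 & C \end{bmatrix} \begin{bmatrix} s & t \\ u & v \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = 1 \tag{4}$$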
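To see the substitution at work, the following sketch expands the transformed equation into coefficients and then checks that images of points on the original curve really satisfy it. Everything named here (`inverse_2x2`, `transformed_coefficients`, `point_on_curve`, `max_residual`, and the sample matrix `M`) is an assumption introduced for illustration, and the point sampler presumes the original curve is an ellipse.

```python
import math

def inverse_2x2(M):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("M must be invertible")
    return ((d / det, -b / det), (-c / det, a / det))

def transformed_coefficients(A, B, C, M):
    """Coefficients of the image of A x^2 + B xy + C y^2 = 1 under M."""
    (s, t), (u, v) = inverse_2x2(M)
    # Expanding [x y] (M^-1)' Q M^-1 [x y]' amounts to substituting
    # x -> s x + t y and y -> u x + v y in the original equation.
    A_new = A * s * s + B * s * u + C * u * u
    B_new = 2 * A * s * t + B * (s * v + t * u) + 2 * C * u * v
    C_new = A * t * t + B * t * v + C * v * v
    return A_new, B_new, C_new

def point_on_curve(A, B, C, theta):
    """Point of A x^2 + B xy + C y^2 = 1 in direction theta (ellipse case)."""
    c, s = math.cos(theta), math.sin(theta)
    r = 1 / math.sqrt(A * c * c + B * c * s + C * s * s)
    return r * c, r * s

def max_residual(A, B, C, M, samples=12):
    """Largest deviation from 1 when images of curve points are plugged in."""
    A_new, B_new, C_new = transformed_coefficients(A, B, C, M)
    (a, b), (c, d) = M
    worst = 0.0
    for k in range(samples):
        x, y = point_on_curve(A, B, C, 2 * math.pi * k / samples)
        xm, ym = a * x + b * y, c * x + d * y  # apply M to the point
        value = A_new * xm * xm + B_new * xm * ym + C_new * ym * ym
        worst = max(worst, abs(value - 1))
    return worst

M = ((2.0, 1.0), (0.0, 1.0))
print(transformed_coefficients(2.0, 1.0, 3.0, M))
print(max_residual(2.0, 1.0, 3.0, M))
```

With this sample $M$, the curve $2x^2 + xy + 3y^2 = 1$ is carried to $0.5x^2 - 0.5xy + 3y^2 = 1$, and the residual is zero up to rounding error.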
Now we can apply some elementary matrix theory. We are assuming that $B^2 - 4AC < 0$. This is closely related to the determinant of $Q$, defined as $|Q| = AC - B^2/4$, and shows that $|Q| > 0$. We would like to show that $|P|$ is also positive. Happily, the determinant function has some useful properties. First, for any square matrices $H$ and $K$ of the same size, the identity $|HK| = |H||K|$ holds. Second, $|H| = |H'|$ is always true. These combine to give
$$|P| = |(M^{-1})'\,Q\,M^{-1}| = |(M^{-1})'|\,|Q|\,|M^{-1}| = |Q|\,|M^{-1}|^2.$$
Because $M^{-1}$ is invertible, $|M^{-1}| \neq 0$, and so $|M^{-1}|^2 > 0$. Thus $|P| > 0$, as desired.
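The determinant identity can be confirmed on a concrete example. In this sketch the helpers `transpose`, `matmul`, and `det2`, along with the sample values for `Q` and `M_inv`, are illustrative assumptions.

```python
def transpose(m):
    """Rewrite each row of m, in order, as a column."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

A, B, C = 2.0, 1.0, 3.0
Q = [[A, B / 2], [B / 2, C]]
M_inv = [[0.5, -0.5], [0.0, 1.0]]  # inverse of the sample M = [[2, 1], [0, 1]]

# P = (M^-1)' Q M^-1
P = matmul(matmul(transpose(M_inv), Q), M_inv)

print(det2(P), det2(Q) * det2(M_inv) ** 2)
```

Here $|Q| = 5.75$ and $|M^{-1}| = 0.5$, so both printed values equal $5.75 \times 0.25 = 1.4375$.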
All that remains is to show that in the new matrix $P$, the diagonal entries (that is, the upper left and lower right) are both positive. This can be done by carrying out the matrix product $(M^{-1})'QM^{-1}$ that defines $P$ and then using the determinant inequality again. However, the details for this step will be left to the interested reader.
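For readers who would like a numerical hint before working out the details, here is one possible route, offered as a sketch and not necessarily the argument the author has in mind. With the entries of $M^{-1}$ named $s$, $t$, $u$, $v$ across the rows, the diagonal entries of $P$ are the values of the original quadratic form at $(s, u)$ and at $(t, v)$, and completing the square writes the form as $A\,(x + By/(2A))^2 + (4AC - B^2)\,y^2/(4A)$, a sum of non-negative terms when $A > 0$ and $B^2 - 4AC < 0$. The function names below are assumptions.

```python
def diagonal_entries(A, B, C, M_inv):
    """Upper-left and lower-right entries of P = (M^-1)' Q M^-1."""
    (s, t), (u, v) = M_inv
    upper_left = A * s * s + B * s * u + C * u * u   # [s u] Q [s u]'
    lower_right = A * t * t + B * t * v + C * v * v  # [t v] Q [t v]'
    return upper_left, lower_right

def completed_square(A, B, C, x, y):
    """A x^2 + B xy + C y^2 rewritten by completing the square (needs A != 0)."""
    return A * (x + B * y / (2 * A)) ** 2 + (4 * A * C - B * B) * y * y / (4 * A)

M_inv = ((0.5, -0.5), (0.0, 1.0))  # inverse of the sample M = [[2, 1], [0, 1]]
print(diagonal_entries(2.0, 1.0, 3.0, M_inv))
print(completed_square(2.0, 1.0, 3.0, 0.5, 0.0),
      completed_square(2.0, 1.0, 3.0, -0.5, 1.0))
```

Both lines report the values 0.5 and 3.0 for this example. Since the columns of an invertible matrix are never zero, the completed-square expression is strictly positive at each of them.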