A singular value decomposition of an $m \times n$ matrix $B$ is a factorization of the form $B = U \Sigma V^\top$, where $U$ and $V$ are square, orthogonal matrices and $\Sigma$ is a diagonal matrix with $i$th diagonal entry $\sigma_i \ge 0$. (Everything carries over essentially unchanged for complex-valued matrices, with $U$ and $V$ being unitary matrices and every transpose replaced by the Hermitian transpose.) The diagonal entries of $\Sigma$ are referred to as the singular values of $B$ and are conventionally ordered $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min(m,n)} \ge 0$. The columns of the matrices $V$ and $U$ are referred to as the right- and left-singular vectors of $B$ and satisfy the relations $B v_i = \sigma_i u_i$ and $B^\top u_i = \sigma_i v_i$.
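For concreteness, here is a minimal NumPy check of these defining relations (an illustrative snippet, using NumPy's built-in SVD):

```python
import numpy as np

B = np.random.randn(4, 3)
U, sigma, Vt = np.linalg.svd(B)   # U: 4x4, sigma: 3 singular values (descending), Vt: 3x3
V = Vt.T

# Verify the defining relations B v_i = sigma_i u_i and B^T u_i = sigma_i v_i.
for i, s in enumerate(sigma):
    assert np.allclose(B @ V[:, i], s * U[:, i])
    assert np.allclose(B.T @ U[:, i], s * V[:, i])
```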
One can obtain the singular values and right and left singular vectors of $B$ from the eigenvalues and eigenvectors of $B^\top B$ and $BB^\top$. This follows from the calculations $B^\top B = V (\Sigma^\top \Sigma) V^\top$ and $BB^\top = U (\Sigma \Sigma^\top) U^\top$. In other words, the nonzero singular values of $B$ are the square roots of the nonzero eigenvalues of $B^\top B$ and $BB^\top$. If one merely solves one of these problems, computing $\Sigma$ along with $V$ or $U$, one can obtain the other matrix by computing $U = B V \Sigma^{-1}$ or $V = B^\top U \Sigma^{-1}$. (These formulas are valid for invertible square matrices $B$, but similar formulas hold for singular or rectangular $B$ to compute the singular vectors with nonzero singular values.)
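For concreteness, here is a minimal NumPy sketch of this eigenvalue-based route, assuming $B$ is square and invertible so that the formula $U = B V \Sigma^{-1}$ applies directly:

```python
import numpy as np

# The "B^T B" approach: get V and the singular values from an eigendecomposition
# of B^T B, then recover U = B V Sigma^{-1}.
B = np.random.randn(4, 4)                        # square, invertible with probability 1

evals, V = np.linalg.eigh(B.T @ B)               # symmetric eigensolver (ascending order)
order = np.argsort(evals)[::-1]                  # reorder descending, SVD convention
sigma = np.sqrt(np.maximum(evals[order], 0))     # singular values = sqrt of eigenvalues
V = V[:, order]
U = B @ V / sigma                                # U = B V Sigma^{-1} (scale each column)

print(np.allclose(B, U @ np.diag(sigma) @ V.T))  # reconstructs B (up to rounding)
```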
This approach is often undesirable for several reasons. Here are a few I’m aware of:
- Accuracy: Roughly speaking, in double-precision arithmetic, accurate and stable numerical methods can resolve quantities spanning roughly 16 orders of magnitude. This means an accurately computed SVD of $B$ can resolve roughly 16 orders of magnitude of decaying singular values, with singular values smaller than that being difficult to compute accurately. By computing $B^\top B$, we square all of our singular values, so resolving 16 orders of magnitude of the eigenvalues of $B^\top B$ means we only resolve 8 orders of magnitude of the singular values of $B$. (Relatedly, the two-norm condition number of $B^\top B$ is the square of that of $B$.) The dynamic range of our numerical computations has been cut in half! (See the small numerical demonstration after this list.)
- Loss of orthogonality: While $U = B V \Sigma^{-1}$ and $V = B^\top U \Sigma^{-1}$ are valid formulas in exact arithmetic, they fare poorly when implemented numerically. Specifically, the numerically computed factors $\hat{U}$ and $\hat{V}$ may not be orthogonal matrices, with, for example, $\hat{U}^\top \hat{U}$ not even close to the identity matrix. One can, of course, orthogonalize the computed $\hat{U}$ or $\hat{V}$, but this doesn’t fix the underlying problem that $\hat{U}$ or $\hat{V}$ has not been computed accurately.
- Loss of structure: If $B$ possesses additional structure (e.g., sparsity), this structure may be lost or reduced by computing the product $B^\top B$.
- Nonlinearity: Even if we’re not actually computing the SVD numerically but doing analysis with pencil and paper, finding the SVD of $B$ from $B^\top B$ has the disadvantage of performing a nonlinear transformation on $B$. This prevents us from utilizing additive perturbation theorems for sums of symmetric matrices in our analysis. (For instance, one cannot prove Weyl’s perturbation theorem for singular values by considering $B^\top B$ and applying Weyl’s perturbation theorem for symmetric eigenvalues.)
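Here is the small numerical demonstration promised above: a sketch with a matrix whose singular values are $1$ and $10^{-10}$, where the smaller singular value is badly corrupted when computed via $B^\top B$ but is recovered accurately by a direct SVD.

```python
import numpy as np

# Build a 2x2 matrix with singular values 1 and 1e-10 using random orthogonal factors.
rng = np.random.default_rng(0)
U_true, _ = np.linalg.qr(rng.standard_normal((2, 2)))
V_true, _ = np.linalg.qr(rng.standard_normal((2, 2)))
B = U_true @ np.diag([1.0, 1e-10]) @ V_true.T

# Via B^T B: the squared singular value 1e-20 sits far below machine epsilon (~2e-16)
# relative to the largest eigenvalue, so it is essentially lost to rounding.
evals = np.linalg.eigvalsh(B.T @ B)
print(np.sqrt(np.maximum(evals, 0)))         # small value comes out ~1e-8 (or 0), not 1e-10

# A direct SVD recovers both singular values accurately.
print(np.linalg.svd(B, compute_uv=False))    # approximately [1.0, 1e-10]
```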
There are times when these problems are insignificant and this approach is sensible; we shall return to this point in a bit. However, these problems should disqualify this approach from being the de facto way we reduce an SVD computation to a symmetric eigenvalue problem. This is especially true since we have a better way.
The better way is by constructing the so-called Hermitian dilation of $B$. (As Stewart and Sun detail in Section 4 of Chapter 1 of their monograph Matrix Perturbation Theory, the connections between the Hermitian dilation and the SVD go back to the discovery of the SVD itself: it is used in Jordan’s construction of the SVD in 1874. The SVD was also independently discovered by Beltrami the year previous. Stewart and Sun refer to this matrix as the Jordan-Wielandt matrix associated with $B$, as they attribute the widespread use of the matrix today to the work of Wielandt. We shall stick to the term Hermitian dilation to refer to this matrix.) The Hermitian dilation $\mathcal{H}(B)$ is defined to be the matrix
$$\mathcal{H}(B) = \begin{bmatrix} 0 & B \\ B^\top & 0 \end{bmatrix}. \tag{1}$$
One can show that the nonzero eigenvalues of $\mathcal{H}(B)$ are precisely plus-or-minus the singular values of $B$. More specifically, we have
$$\mathcal{H}(B) \begin{bmatrix} u_i \\ \pm v_i \end{bmatrix} = \pm \sigma_i \begin{bmatrix} u_i \\ \pm v_i \end{bmatrix}. \tag{2}$$
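To see why (2) holds, multiply out the blocks using the relations $B v_i = \sigma_i u_i$ and $B^\top u_i = \sigma_i v_i$:

$$\begin{bmatrix} 0 & B \\ B^\top & 0 \end{bmatrix} \begin{bmatrix} u_i \\ \pm v_i \end{bmatrix} = \begin{bmatrix} \pm B v_i \\ B^\top u_i \end{bmatrix} = \begin{bmatrix} \pm \sigma_i u_i \\ \sigma_i v_i \end{bmatrix} = \pm \sigma_i \begin{bmatrix} u_i \\ \pm v_i \end{bmatrix}.$$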
All of the remaining eigenvalues of $\mathcal{H}(B)$ not of this form are zero. (This follows by noting that $\operatorname{rank}(\mathcal{H}(B)) = 2\operatorname{rank}(B)$, and thus the eigenvalues $\pm\sigma_i$ with $\sigma_i \ne 0$ account for all the nonzero eigenvalues of $\mathcal{H}(B)$.) Thus, the singular value decomposition of $B$ is entirely encoded in the eigenvalue decomposition of $\mathcal{H}(B)$.
This approach of using the Hermitian dilation to compute the SVD of $B$ fixes all of the issues identified with the “$B^\top B$” approach. We are able to accurately resolve a full 16 orders of magnitude of singular values. The computed singular vectors are accurate and numerically orthogonal, provided we use an accurate method for the symmetric eigenvalue problem. The Hermitian dilation preserves important structural characteristics of $B$, like sparsity. For purposes of theoretical analysis, the mapping $B \mapsto \mathcal{H}(B)$ is linear. (The linearity of the Hermitian dilation gives direct extensions of most results about symmetric eigenvalues to singular values; see Exercise 22.)
Often, one can work with the Hermitian dilation only implicitly: the matrix $\mathcal{H}(B)$ need not actually be stored in memory with all its extra zeros. The programmer designs and implements an algorithm with $\mathcal{H}(B)$ in mind, but deals with the matrix $B$ directly in their computations. In a pinch, however, forming $\mathcal{H}(B)$ explicitly in software and calling a symmetric eigenvalue routine is often not too much less efficient than a dedicated SVD routine and can cut down on programmer effort significantly.
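As an illustration of this “in a pinch” route, here is a minimal sketch (in NumPy, assuming for simplicity that $B$ is dense with distinct, nonzero singular values) that forms $\mathcal{H}(B)$ explicitly and reads the SVD off its eigendecomposition:

```python
import numpy as np

def svd_via_hermitian_dilation(B):
    """Recover the SVD of B from the eigendecomposition of its Hermitian dilation.

    Sketch only: forms H(B) densely, which is fine for small matrices but wasteful
    for large or sparse ones; assumes distinct, nonzero singular values.
    """
    m, n = B.shape
    H = np.block([[np.zeros((m, m)), B],
                  [B.T, np.zeros((n, n))]])
    evals, evecs = np.linalg.eigh(H)               # symmetric eigensolver
    k = min(m, n)
    idx = np.argsort(evals)[::-1][:k]              # largest k eigenvalues are +sigma_i
    sigma = evals[idx]
    # Eigenvectors for +sigma_i are (1/sqrt(2)) * [u_i; v_i]; rescale the blocks.
    U = evecs[:m, idx] * np.sqrt(2)
    V = evecs[m:, idx] * np.sqrt(2)
    return U, sigma, V

B = np.random.randn(5, 3)
U, sigma, V = svd_via_hermitian_dilation(B)
print(np.allclose(B, U @ np.diag(sigma) @ V.T))    # True (up to rounding)
```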
As with all things in life, there’s no free lunch here. There are a couple of downsides to the Hermitian dilation approach. First, $\mathcal{H}(B)$ is, except for the trivial case $B = 0$, an indefinite symmetric matrix. By contrast, $B^\top B$ and $BB^\top$ are positive semidefinite, which can be helpful in some contexts. (This is relevant if, say, we want to find the small singular values by inverse iteration. Positive definite linear systems are easier to solve by either direct or iterative methods.) Further, if $m \gg n$ (respectively, $n \gg m$), then $B^\top B$ (respectively, $BB^\top$) is tiny compared to $\mathcal{H}(B)$, so it might be considerably cheaper to compute an eigenvalue decomposition of $B^\top B$ (or $BB^\top$) than of $\mathcal{H}(B)$.
Despite the somewhat salacious title of this article, the $B^\top B$ and Hermitian dilation approaches both have their role, and the purpose of this article is not to say the $B^\top B$ approach should be thrown in the dustbin. However, in my experience, I frequently hear the $B^\top B$ approach stated as the definitive way of converting an SVD into an eigenvalue problem, with the Hermitian dilation approach not even mentioned. This, in my opinion, is backwards. For accuracy reasons alone, the Hermitian dilation should be the go-to tool for turning SVDs into symmetric eigenvalue problems, with the $B^\top B$ approach used only when the problem is known to have singular values which don’t span many orders of magnitude, or when $B$ is tall and skinny and the computational cost savings of the $B^\top B$ approach are vital.
Great explanation.
The increased accuracy also means that in many cases it is possible to move to 32-bit float values, which are much faster on a lot of hardware.
This is an excellent point and probably something I should have explored in the original post. Being able to move to single or even half precision, which, as you mentioned, can be significantly faster and requires less data movement, is an important reason to get as much accuracy as you can out of your floating-point number system. Sixteen orders of magnitude of dynamic range in double precision may seem more than necessary, but the difference between two and four orders of magnitude of range in half precision is the difference between half precision being useless and totally fine in many applications.
Even in the extremely skinny (or fat) case, the Hermitian dilation can be made just as efficient by first doing a QR (or LQ) factorization and then applying the Hermitian dilation to R (or L).
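A minimal NumPy sketch of this QR-then-dilate idea, assuming a tall matrix $B$ with full column rank and distinct singular values:

```python
import numpy as np

def svd_tall_via_qr_dilation(B):
    """SVD of a tall m-by-n matrix via QR followed by the Hermitian dilation of R.

    Sketch only: assumes full column rank and distinct, nonzero singular values.
    """
    m, n = B.shape
    Q, R = np.linalg.qr(B)                     # B = Q R, with R only n-by-n
    H = np.block([[np.zeros((n, n)), R],
                  [R.T, np.zeros((n, n))]])    # dilation of the small factor R
    evals, evecs = np.linalg.eigh(H)
    idx = np.argsort(evals)[::-1][:n]          # top n eigenvalues = singular values
    sigma = evals[idx]
    U_R = evecs[:n, idx] * np.sqrt(2)          # left singular vectors of R
    V = evecs[n:, idx] * np.sqrt(2)            # right singular vectors of R (and of B)
    U = Q @ U_R                                # lift back: left singular vectors of B
    return U, sigma, V

B = np.random.randn(1000, 5)
U, sigma, V = svd_tall_via_qr_dilation(B)
print(np.allclose(B, U @ np.diag(sigma) @ V.T))   # True (up to rounding)
```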
Nice post! One question for the very beginning: Is there a quick way to see that B^T B and BB^T have the same eigenvalues (I mean the ones which are nonzero)? Using the SVD it’s obvious, but without it?
This is a consequence of the more general fact that BC and CB always have the same nonzero eigenvalues (even if B, C are rectangular). There are many, many proofs of this fact (Fuzhen Zhang’s book contains 4!). One of my favorites (specific to the case when B, C are square and nonsingular): BC = B(CB)B⁻¹, so BC and CB are similar and possess the same eigenvalues. (In fact, the nonsingular square case implies the general singular or rectangular case by continuity of eigenvalues and padding rectangular matrices by zeros until they are square.)