A singular value decomposition of an $m \times n$ matrix $B$ is a factorization of the form $B = U \Sigma V^\top$, where $U$ and $V$ are square, orthogonal matrices and $\Sigma$ is a diagonal matrix with $(i,i)$th entry $\sigma_i$.¹ The diagonal entries of $\Sigma$ are referred to as the singular values of $B$ and are conventionally ordered $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min(m,n)} \ge 0$. The columns of the matrices $V$ and $U$ are referred to as the right- and left-singular vectors of $B$ and satisfy the relations $B v_i = \sigma_i u_i$ and $B^\top u_i = \sigma_i v_i$.

¹ Everything carries over essentially unchanged for complex-valued matrices $B$, with $U$ and $V$ being unitary matrices and every transpose $\cdot^\top$ being replaced by $\cdot^*$, the Hermitian transpose.
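To make the notation concrete, here is a minimal NumPy sketch (an illustration added here, not from the original post) that computes an SVD and checks the relations $B v_i = \sigma_i u_i$ and $B^\top u_i = \sigma_i v_i$:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 3))  # a small m x n test matrix

# numpy returns U (m x m), the singular values, and V^T (n x n) with full_matrices=True
U, sigma, Vt = np.linalg.svd(B, full_matrices=True)
V = Vt.T

# check B v_i = sigma_i u_i and B^T u_i = sigma_i v_i for each singular value
for i, s in enumerate(sigma):
    assert np.allclose(B @ V[:, i], s * U[:, i])
    assert np.allclose(B.T @ U[:, i], s * V[:, i])
```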
One can obtain the singular values and right and left singular vectors of $B$ from the eigenvalues and eigenvectors of $B^\top B$ and $B B^\top$. This follows from the calculations $B^\top B = V (\Sigma^\top \Sigma) V^\top$ and $B B^\top = U (\Sigma \Sigma^\top) U^\top$. In other words, the nonzero singular values of $B$ are the square roots of the nonzero eigenvalues of $B^\top B$ and $B B^\top$. If one merely solves one of these problems, computing $\Sigma$ along with $V$ or $U$, one can obtain the other matrix $U$ or $V$ by computing $U = B V \Sigma^{-1}$ or $V = B^\top U \Sigma^{-1}$. (These formulas are valid for invertible square matrices $B$, but similar formulas hold for singular or rectangular $B$ to compute the singular vectors with nonzero singular values.)
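As a sketch of this reduction (illustrative only, assuming a square invertible $B$ so the simple formulas apply), one can recover the full SVD from a symmetric eigendecomposition of $B^\top B$:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))  # assume square and invertible for the simple formulas

# symmetric eigendecomposition of B^T B; eigh returns eigenvalues in ascending order
evals, V = np.linalg.eigh(B.T @ B)

# sort into the conventional descending order
order = np.argsort(evals)[::-1]
evals, V = evals[order], V[:, order]

sigma = np.sqrt(evals)   # singular values are square roots of the eigenvalues
U = B @ V / sigma        # U = B V Sigma^{-1}, valid since B is invertible

assert np.allclose(U @ np.diag(sigma) @ V.T, B)
```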
This approach is often undesirable for several reasons. Here are a few I’m aware of:
- Accuracy: Roughly speaking, in double-precision arithmetic, accurate stable numerical methods can resolve differences on the order of 16 orders of magnitude. This means an accurately computed SVD of $B$ can resolve roughly 16 orders of magnitude of decaying singular values, with singular values smaller than that difficult to compute accurately. By computing $B^\top B$, we square all of our singular values, so resolving 16 orders of magnitude of the eigenvalues of $B^\top B$ means we only resolve 8 orders of magnitude of the singular values of $B$.² The dynamic range of our numerical computations has been cut in half! (See the short numerical sketch after this list.)
- Loss of orthogonality: While $U = B V \Sigma^{-1}$ and $V = B^\top U \Sigma^{-1}$ are valid formulas in exact arithmetic, they fare poorly when implemented numerically. Specifically, the numerically computed values $U$ and $V$ may not be orthogonal matrices with, for example, $U^\top U$ not even close to the identity matrix. One can, of course, orthogonalize the computed $U$ or $V$, but this doesn’t fix the underlying problem that $U$ or $V$ have not been computed accurately.
- Loss of structure: If $B$ possesses additional structure (e.g. sparsity), this structure may be lost or reduced by computing the product $B^\top B$.
- Nonlinearity: Even if we’re not actually computing the SVD numerically but doing analysis with pencil and paper, finding the SVD of $B$ from $B^\top B$ has the disadvantage of performing a nonlinear transformation on $B$. This prevents us from utilizing additive perturbation theorems for sums of symmetric matrices in our analysis.³

² Relatedly, the two-norm condition number of $B^\top B$ is the square of that of $B$ (twice as many orders of magnitude).

³ For instance, one cannot prove Weyl’s perturbation theorem for singular values by considering $B^\top B$ and applying Weyl’s perturbation theorem for symmetric eigenvalues.
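Here is a small numerical sketch of the accuracy issue (an added illustration with an arbitrarily chosen test matrix): for singular values spanning 12 orders of magnitude, the ones below roughly $10^{-8}$ are lost when computed through $B^\top B$:

```python
import numpy as np

# Construct a matrix with singular values spanning 12 orders of magnitude.
rng = np.random.default_rng(2)
n = 8
true_sigma = np.logspace(0, -12, n)
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
B = Q1 @ np.diag(true_sigma) @ Q2.T

# Singular values via a dedicated SVD: accurate across the full range.
svd_sigma = np.linalg.svd(B, compute_uv=False)

# Singular values via eigenvalues of B^T B: the small ones are wrecked,
# since eigenvalues below ~1e-16 * ||B^T B|| are lost to roundoff.
evals = np.linalg.eigh(B.T @ B)[0][::-1]
gram_sigma = np.sqrt(np.clip(evals, 0, None))

for t, s, g in zip(true_sigma, svd_sigma, gram_sigma):
    print(f"true {t:.1e}   svd {s:.1e}   via B^T B {g:.1e}")
```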
There are times where these problems are insignificant and this approach is sensible: we shall return to this point in a bit. However, these problems should disqualify this approach from being the de facto way we reduce SVD computation to a symmetric eigenvalue problem. This is especially true since we have a better way.
The better way is by constructing the so-called Hermitian dilation⁴ of $B$, which is defined to be the matrix

(1)   $\mathcal{H}(B) = \begin{bmatrix} 0 & B \\ B^\top & 0 \end{bmatrix}.$

⁴ As Stewart and Sun detail in Section 4 of Chapter 1 of their monograph Matrix Perturbation Theory, the connections between the Hermitian dilation and the SVD go back to the discovery of the SVD itself, as it is used in Jordan’s construction of the SVD in 1874. (The SVD was also independently discovered by Beltrami the year previous.) Stewart and Sun refer to this matrix as the Jordan–Wielandt matrix associated with $B$, as they attribute the widespread use of the matrix today to the work of Wielandt. We shall stick to the term Hermitian dilation to refer to this matrix.
One can show that the nonzero eigenvalues of $\mathcal{H}(B)$ are precisely plus-or-minus the singular values of $B$. More specifically, we have

(2)   $\mathcal{H}(B) \begin{bmatrix} u_i \\ \pm v_i \end{bmatrix} = \pm \sigma_i \begin{bmatrix} u_i \\ \pm v_i \end{bmatrix}.$
All of the remaining eigenvalues of $\mathcal{H}(B)$ not of this form are zero.⁵ Thus, the singular value decomposition of $B$ is entirely encoded in the eigenvalue decomposition of $\mathcal{H}(B)$.

⁵ This follows by noting $\operatorname{rank}(\mathcal{H}(B)) = 2\operatorname{rank}(B)$, and thus the eigenvectors $\begin{bmatrix} u_i \\ \pm v_i \end{bmatrix}$ for $\sigma_i > 0$ account for all the nonzero eigenvalues of $\mathcal{H}(B)$.
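To see equation (2) in action, here is a short sketch (again an added illustration, forming $\mathcal{H}(B)$ explicitly with NumPy) that recovers the SVD of $B$ from a symmetric eigendecomposition of its Hermitian dilation:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 3
B = rng.standard_normal((m, n))

# Form the Hermitian dilation H(B) = [[0, B], [B^T, 0]] explicitly.
H = np.block([[np.zeros((m, m)), B], [B.T, np.zeros((n, n))]])

# Its nonzero eigenvalues come in +/- sigma_i pairs; take the n largest.
evals, evecs = np.linalg.eigh(H)
order = np.argsort(evals)[::-1][:n]
sigma = evals[order]

# Eigenvectors for the positive eigenvalues are [u_i; v_i] / sqrt(2).
U = evecs[:m, order] * np.sqrt(2)
V = evecs[m:, order] * np.sqrt(2)

assert np.allclose(U @ np.diag(sigma) @ V.T, B)
```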
This approach of using the Hermitian dilation $\mathcal{H}(B)$ to compute the SVD of $B$ fixes all the issues identified with the “$B^\top B$” approach. We are able to accurately resolve a full 16 orders of magnitude of singular values. The computed singular vectors are accurate and numerically orthogonal provided we use an accurate method for the symmetric eigenvalue problem. The Hermitian dilation $\mathcal{H}(B)$ preserves important structural characteristics in $B$ like sparsity. For purposes of theoretical analysis, the mapping $B \mapsto \mathcal{H}(B)$ is linear.⁶

⁶ The linearity of the Hermitian dilation gives direct extensions of most results about the symmetric eigenvalues to singular values; see Exercise 22.
Often one can work with the Hermitian dilation only implicitly: the matrix $\mathcal{H}(B)$ need not actually be stored in memory with all its extra zeros. The programmer designs and implements an algorithm with $\mathcal{H}(B)$ in mind, but deals with the matrix $B$ directly for their computations. In a pinch, however, forming $\mathcal{H}(B)$ directly in software and utilizing symmetric eigenvalue routines directly is often not too much less efficient than a dedicated SVD routine and can cut down on programmer effort significantly.
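One way to realize this implicit approach (a sketch under the assumption that SciPy's ARPACK-based `eigsh` is acceptable for the problem at hand) is to expose $\mathcal{H}(B)$ as a matrix-free linear operator and ask an iterative eigensolver for its largest eigenvalues, which are the largest singular values of $B$:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

rng = np.random.default_rng(4)
m, n = 200, 50
B = rng.standard_normal((m, n))

# Matrix-free action of H(B) on a vector [x; y]: returns [B y; B^T x].
def dilation_matvec(z):
    x, y = z[:m], z[m:]
    return np.concatenate([B @ y, B.T @ x])

H = LinearOperator((m + n, m + n), matvec=dilation_matvec, dtype=B.dtype)

# The 5 largest eigenvalues of H(B) are the 5 largest singular values of B.
top = eigsh(H, k=5, which="LA", return_eigenvectors=False)
print(np.sort(top)[::-1])
print(np.linalg.svd(B, compute_uv=False)[:5])
```

Note that only products with $B$ and $B^\top$ are ever formed, so sparsity or fast matrix-vector products for $B$ carry over directly.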
As with all things in life, there’s no free lunch here. There are a couple of downsides to the Hermitian dilation approach. First, $\mathcal{H}(B)$ is, except for the trivial case $B = 0$, an indefinite symmetric matrix. By contrast, $B^\top B$ and $B B^\top$ are positive semidefinite, which can be helpful in some contexts.⁷ Further, if $m \gg n$ (respectively, $n \gg m$), then $B^\top B$ (respectively, $B B^\top$) is tiny compared to $\mathcal{H}(B)$, so it might be considerably cheaper to compute an eigenvalue decomposition of $B^\top B$ (or $B B^\top$) than $\mathcal{H}(B)$.

⁷ This is relevant if, say, we want to find the small singular values by inverse iteration. Positive definite linear systems are easier to solve by either direct or iterative methods.
Despite the somewhat salacious title of this article, the $B^\top B$ and Hermitian dilation approaches both have their role, and the purpose of this article is not to say the $B^\top B$ approach should be thrown in the dustbin. However, in my experience, I frequently hear the $B^\top B$ approach stated as the definitive way of converting an SVD into an eigenvalue problem, with the Hermitian dilation approach not even mentioned. This, in my opinion, is backwards. For accuracy reasons alone, the Hermitian dilation should be the go-to tool for turning SVDs into symmetric eigenvalue problems, with the $B^\top B$ approach only used when the problem is known to have singular values which don’t span many orders of magnitude or $B$ is tall and skinny and the computational cost savings of the $B^\top B$ approach are vital.
Great explanation.
The increased accuracy also means that in many cases it is possible to move to 32-bit float values, which are much faster on a lot of hardware.
This is an excellent point and probably something I should have explored in the original post. Being able to move to single or even half precision, which, as you mentioned, can be significantly faster and require less data movement, is an important reason to get as much accuracy as you can out of your floating point number system. Sixteen orders of magnitude of dynamic range for double precision may seem unnecessary, but the difference between two and four orders of magnitude of range in half precision is the difference between half precision being useless and totally fine in many applications.
Even in the extremely skinny (or fat) case the Hermitian dilation can be made just as efficient by first doing a QR (or LQ) factorization and then doing the Hermitian dilation on R (or L).
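For concreteness, here is a minimal sketch of this QR-then-dilate trick (an added illustration, not the commenter's code), using NumPy's reduced QR:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 10_000, 20
B = rng.standard_normal((m, n))  # extremely tall and skinny

# Reduced QR: Q is m x n with orthonormal columns, R is n x n.
Q, R = np.linalg.qr(B)

# Hermitian dilation of the small factor R instead of B.
H = np.block([[np.zeros((n, n)), R], [R.T, np.zeros((n, n))]])
evals, evecs = np.linalg.eigh(H)
order = np.argsort(evals)[::-1][:n]

sigma = evals[order]                 # singular values of R = singular values of B
U_R = evecs[:n, order] * np.sqrt(2)  # left singular vectors of R
V = evecs[n:, order] * np.sqrt(2)    # right singular vectors of R (and of B)
U = Q @ U_R                          # map back: left singular vectors of B

assert np.allclose(U @ np.diag(sigma) @ V.T, B)
```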
Nice post! One question for the very beginning: Is there a quick way to see that B^T B and BB^T have the same eigenvalues (I mean the ones which are nonzero)? Using the SVD it’s obvious, but without it?
This is a consequence of the more general fact that BC and CB always have the same nonzero eigenvalues (even if B, C are rectangular). There are many, many proofs of this fact (Fuzhen Zhang’s book contains 4!). One of my favorites (specific to the case when B, C are square and nonsingular): BC = B(CB)B⁻¹, so BC and CB are similar and possess the same eigenvalues. (In fact, the nonsingular square case implies the general singular or rectangular case by continuity of eigenvalues and padding rectangular matrices by zeros until they are square.)