Tyler Chen has just released an excellent monograph on the Lanczos method, one of the most powerful algorithms for extracting information from a symmetric matrix. I would definitely recommend checking it out.
One of the subjects touched on by this book is stochastic trace estimation, a topic I’ve written about several times before on this blog. In his book, Tyler uses stochastic trace estimates based on random vectors on the sphere (a great choice), and he provides a simple, but often pessimistic, mathematical analysis of this approach. To complement Tyler’s amazing book, I felt it was a good opportunity to share a sharper analysis that I came up with using a more sophisticated argument. I would not be surprised if this argument appears elsewhere in the literature, but I am not aware of a source.
Let $\omega_1, \ldots, \omega_m$ be random vectors in $\mathbb{R}^n$ drawn uniformly at random from the sphere of radius $\sqrt{n}$,[1] let $A \in \mathbb{R}^{n \times n}$ be a real symmetric matrix, and define the quadratic forms and the Girard–Hutchinson trace estimator

\[ q_i := \omega_i^\top A \omega_i \quad \text{for } i = 1, \ldots, m \qquad \text{and} \qquad \hat{\tr} := \frac{1}{m} \sum_{i=1}^m \omega_i^\top A \omega_i. \]

Both are unbiased estimates of the trace:

\[ \expect[q_i] = \expect[\hat{\tr}] = \tr A. \]

[1] Informally, we define the uniform distribution on the sphere to be such that the probability that $\omega_i$ belongs to a set $S$ is the ratio of the (hyper-)surface area of $S$ to the (hyper-)surface area of the full sphere. Formally, the uniform distribution on the sphere can be defined in multiple ways. First, treating the sphere as a subset of $\mathbb{R}^n$, we can define the uniform distribution as the $(n-1)$-dimensional Hausdorff measure, normalized so that the sphere has unit measure. Second, observe that the sphere is acted upon in a natural way by the group of orthogonal matrices; the uniform distribution is the unique probability measure which is invariant to this group action.
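To make the setup concrete, here is a minimal numerical sketch of the estimator in Python (my own illustration, not from Tyler's book; the helper names `sample_sphere` and `girard_hutchinson` are invented for this example):

```python
import numpy as np

def sample_sphere(n, m, rng):
    """Draw m vectors uniformly from the sphere of radius sqrt(n) in R^n."""
    # Normalizing standard Gaussian vectors gives uniform directions;
    # rescaling to length sqrt(n) gives the distribution used in this post.
    G = rng.standard_normal((n, m))
    return np.sqrt(n) * G / np.linalg.norm(G, axis=0)

def girard_hutchinson(A, m, rng):
    """Average the quadratic forms w_i^T A w_i over m sphere vectors."""
    W = sample_sphere(A.shape[0], m, rng)
    return np.mean(np.sum(W * (A @ W), axis=0))

rng = np.random.default_rng(0)
n = 100
B = rng.standard_normal((n, n))
A = (B + B.T) / 2  # a real symmetric test matrix
print(girard_hutchinson(A, 1000, rng), np.trace(A))  # estimate vs. true trace
```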

Hanson–Wright on the Sphere
Observe that $\hat{\tr}$ is an average of $m$ independent copies of the random variable $q := \omega^\top A \omega$, where $\omega$ is a single uniform random vector on the sphere of radius $\sqrt{n}$. Therefore, we will begin our analysis with the quadratic form $q$ and discuss the Girard–Hutchinson estimate $\hat{\tr}$ at the end of the post.
Centering
Begin by introducing the centered matrix

\[ \mathring{A} := A - \frac{\tr A}{n} I. \]

Now, rewrite the error $q - \tr A$ in terms of the centered matrix $\mathring{A}$:

\[ q - \tr A = \omega^\top A \omega - \tr A = \omega^\top A \omega - \frac{\tr A}{n} \, \omega^\top \omega = \omega^\top \mathring{A} \omega. \]

Here, we used the fact that $\omega^\top \omega = n$ for a vector $\omega$ on the sphere of radius $\sqrt{n}$.
We have shown that the error $q - \tr A$ depends only on the centered matrix $\mathring{A}$ rather than the original matrix $A$. This observation is at the root of the reason that quadratic forms with vectors on the sphere have smaller tail probabilities than quadratic forms with other distributions like the Gaussian distribution. Indeed, $\mathring{A}$ is smaller than $A$ when measured using the Frobenius norm, $\|\mathring{A}\|_F \le \|A\|_F$, and the spectral norm is never much larger, $\|\mathring{A}\| \le 2\|A\|$. These observations will be important later.
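As a quick sanity check, a few lines of Python (again my own sketch) confirm the identity $q - \tr A = \omega^\top \mathring{A} \omega$ and the two norm comparisons:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
B = rng.standard_normal((n, n))
A = (B + B.T) / 2

# Centered matrix: subtract (tr A / n) * I so that the trace is zero.
A_c = A - (np.trace(A) / n) * np.eye(n)

g = rng.standard_normal(n)
w = np.sqrt(n) * g / np.linalg.norm(g)  # uniform on the sphere of radius sqrt(n)

# Error of the quadratic form equals the quadratic form with the centered matrix.
print(w @ A @ w - np.trace(A), w @ A_c @ w)  # agree up to rounding error

# Norm comparisons: ||A_c||_F <= ||A||_F and ||A_c|| <= 2 ||A||.
print(np.linalg.norm(A_c, 'fro') <= np.linalg.norm(A, 'fro'))
print(np.linalg.norm(A_c, 2) <= 2 * np.linalg.norm(A, 2))
```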
Laplace Transform Method and Comparison to Standard Gaussian Vectors
To derive exponential concentration inequalities, we make use of the Laplace transform method. For a random variable $X$, define the cumulant generating function (cgf)

\[ \xi_X(\theta) := \log \expect \exp(\theta X). \]
To estimate the cgf of $q - \tr A$, we will make use of a comparison between Gaussian vectors and random vectors on the sphere. Recall that a standard Gaussian vector $g \in \mathbb{R}^n$ is a vector with independent standard Gaussian entries. Standard Gaussian random vectors are spherically symmetric, meaning they can be written in the form $g = b\omega$, where $\omega$ is a uniform random vector on the sphere of radius $\sqrt{n}$ and $b \ge 0$ is a scaling factor.[2] The scaling factor $b$ is independent of $\omega$ and, in this case, satisfies $\expect[b^2] = 1$.[3]

[2] Concretely, $nb^2 = \|g\|^2$ is a $\chi^2$ random variable with $n$ degrees of freedom.

[3] Indeed, the expected squared length of a standard Gaussian random vector is $\expect[\|g\|^2] = n$. But also, $\expect[\|g\|^2] = \expect[b^2] \cdot \expect[\|\omega\|^2] = n \, \expect[b^2]$. Therefore, $\expect[b^2] = 1$.
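The decomposition $g = b\omega$ is easy to verify numerically; here is a short sketch (my own, under the same conventions as the earlier snippets):

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 100, 100_000

G = rng.standard_normal((trials, n))
lengths = np.linalg.norm(G, axis=1)

b = lengths / np.sqrt(n)               # scaling factor in g = b * omega
W = np.sqrt(n) * G / lengths[:, None]  # uniform on the sphere of radius sqrt(n)

print(np.mean(b**2))                   # ~ 1.0, since E[||g||^2] = n
print(np.allclose(G, b[:, None] * W))  # g = b * omega holds exactly
```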
Using this relationship, the error $q - \tr A$ can be connected to the standard Gaussian random vector $g$ as follows:

\[ g^\top \mathring{A} g = b^2 \cdot \omega^\top \mathring{A} \omega = b^2 (q - \tr A). \]
Proposition (Scaling hurts). Let $X$ be a random variable and let $b$ be a random variable with expectation $\expect[b] = 1$, independent of $X$. Then

\[ \xi_X(\theta) \le \xi_{bX}(\theta) \quad \text{for all } \theta \in \mathbb{R}. \]

To prove this result, let $\expect_X$ and $\expect_b$ denote expectations taken over the randomness in $X$ and $b$ separately. The total expectation is $\expect = \expect_X \expect_b$. Begin with the right-hand side and invoke Jensen's inequality over $b$:

\[ \xi_{bX}(\theta) = \log \expect_X \expect_b \exp(\theta b X) \ge \log \expect_X \exp(\theta \, \expect[b] \, X) = \xi_X(\theta). \]

In the last equality, we used $\expect[b] = 1$.
Having established this proposition, we conclude that

\[ \xi_{q - \tr A}(\theta) \le \xi_{b^2(q - \tr A)}(\theta) = \xi_{g^\top \mathring{A} g}(\theta) \quad \text{for all } \theta \in \mathbb{R}, \]

where we applied the proposition with $X = q - \tr A$ and with the scaling factor $b^2$, which satisfies $\expect[b^2] = 1$.
Completing the Analysis
Finally, invoking a standard bound for the cgf of $g^\top B g$ for a trace-zero symmetric matrix $B$,

\[ \xi_{g^\top B g}(\theta) \le \frac{\theta^2 \|B\|_F^2}{1 - 2\theta \|B\|} \quad \text{for } 0 \le \theta < \frac{1}{2\|B\|}, \]

we obtain

\[ \xi_{q - \tr A}(\theta) \le \frac{\theta^2 \|\mathring{A}\|_F^2}{1 - 2\theta \|\mathring{A}\|} \quad \text{for } 0 \le \theta < \frac{1}{2\|\mathring{A}\|}. \]

By the Laplace transform method, this cgf bound yields the tail bound

\[ \prob \{ |q - \tr A| \ge t \} \le 2 \exp \left( -\frac{t^2}{4\|\mathring{A}\|_F^2 + 4\|\mathring{A}\| t} \right) \quad \text{for all } t \ge 0. \]
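As a hedged empirical check of this tail bound (my own experiment, not part of the original argument), one can compare Monte Carlo tail frequencies of $q - \tr A$ against the bound:

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 50, 200_000
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
A_c = A - (np.trace(A) / n) * np.eye(n)
F, S = np.linalg.norm(A_c, 'fro'), np.linalg.norm(A_c, 2)

G = rng.standard_normal((trials, n))
W = np.sqrt(n) * G / np.linalg.norm(G, axis=1)[:, None]
# q - tr A for each sample, computed as row-wise quadratic forms.
errors = np.einsum('ij,ij->i', W @ A, W) - np.trace(A)

for t in [F, 2 * F, 4 * F]:
    empirical = np.mean(np.abs(errors) >= t)
    bound = 2 * np.exp(-t**2 / (4 * F**2 + 4 * S * t))
    print(f"t = {t:8.2f}: empirical {empirical:.4f} <= bound {bound:.4f}")
```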
Is This Bound Good?
Tail bounds for quadratic forms such as $\omega^\top A \omega$ or $g^\top A g$ are known as Hanson–Wright inequalities. For vectors on the sphere, our bound shows that the tail probabilities decay like $\exp(-t^2 / 4\|\mathring{A}\|_F^2)$ for small $t$ and like $\exp(-t / 4\|\mathring{A}\|)$ for large $t$. This pattern of results, "subgaussian scaling for small $t$, subexponential scaling for large $t$," is typical for Hanson–Wright inequalities.
The Hanson–Wright inequality for vectors on the sphere exhibits a faster decay rate than for standard Gaussian vectors. Indeed, a standard Gaussian random vector obeys a Hanson–Wright inequality of the form

\[ \prob \{ |g^\top A g - \tr A| \ge t \} \le 2 \exp \left( -\frac{t^2}{4\|A\|_F^2 + 4\|A\| t} \right) \quad \text{for all } t \ge 0, \]

with the norms of the original matrix $A$ appearing in place of the centered matrix $\mathring{A}$. Since $\|\mathring{A}\|_F \le \|A\|_F$ and $\|\mathring{A}\| \le 2\|A\|$, and since the gap between the norms of $A$ and $\mathring{A}$ can be large, the bound for vectors on the sphere can be much smaller than the bound for Gaussian vectors.
As another way to evaluate the quality of these bounds, recall that a Gaussian random variable with mean zero and variance $\sigma^2$ has tail probabilities roughly of size $\exp(-t^2/2\sigma^2)$. For small $t$, our Hanson–Wright inequality for vectors on the sphere has the same form with $\sigma^2 = 2\|\mathring{A}\|_F^2$. The true variance of $q$ is

\[ \operatorname{Var}(q) = \frac{2n}{n+2} \|\mathring{A}\|_F^2, \]

which matches $\sigma^2 = 2\|\mathring{A}\|_F^2$ up to the factor $n/(n+2) \approx 1$. Thus, for small $t$, our tail bound is nearly as sharp as one could hope for.
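This variance formula can likewise be confirmed by simulation (my own check):

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 30, 500_000
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
A_c = A - (np.trace(A) / n) * np.eye(n)

G = rng.standard_normal((trials, n))
W = np.sqrt(n) * G / np.linalg.norm(G, axis=1)[:, None]
q = np.einsum('ij,ij->i', W @ A, W)  # quadratic form for each sample

predicted = 2 * n / (n + 2) * np.linalg.norm(A_c, 'fro')**2
print(np.var(q), predicted)  # agree up to Monte Carlo error
```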
Girard–Hutchinson Estimator on the Sphere
Using our results for one quadratic form, tail bounds for the Girard–Hutchinson estimator easily follow. Indeed, the cgf is additive for independent random variables and satisfies the identity $\xi_{cX}(\theta) = \xi_X(c\theta)$ for a constant $c$, so

\[ \xi_{\hat{\tr} - \tr A}(\theta) = \sum_{i=1}^m \xi_{(q_i - \tr A)/m}(\theta) = m \, \xi_{q - \tr A}(\theta/m) \le \frac{\theta^2 \|\mathring{A}\|_F^2 / m}{1 - 2\theta \|\mathring{A}\| / m}. \]

By the Laplace transform method, this cgf bound yields

\[ \prob \{ |\hat{\tr} - \tr A| \ge t \} \le 2 \exp \left( -\frac{m t^2}{4\|\mathring{A}\|_F^2 + 4\|\mathring{A}\| t} \right) \quad \text{for all } t \ge 0. \]

Averaging over $m$ random vectors thus improves the exponent in the tail bound by a factor of $m$.
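To close, here is a sketch (my own, with the same sampling conventions as above) checking that the $m$-sample estimator's deviations respect this final bound:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, trials = 50, 20, 10_000
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
A_c = A - (np.trace(A) / n) * np.eye(n)
F, S = np.linalg.norm(A_c, 'fro'), np.linalg.norm(A_c, 2)

estimates = np.empty(trials)
for k in range(trials):
    G = rng.standard_normal((n, m))
    W = np.sqrt(n) * G / np.linalg.norm(G, axis=0)   # m sphere vectors
    estimates[k] = np.mean(np.sum(W * (A @ W), axis=0))

t = F / 2
empirical = np.mean(np.abs(estimates - np.trace(A)) >= t)
bound = 2 * np.exp(-m * t**2 / (4 * F**2 + 4 * S * t))
print(f"empirical tail {empirical:.4f} <= bound {bound:.4f}")
```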