I am excited to share that our paper XTrace: Making the most of every sample in stochastic trace estimation has been published in the SIAM Journal on Matrix Analysis and Applications. (See also our paper on arXiv.)
Spurred by this exciting news, I wanted to take the opportunity to share one of my favorite results in randomized numerical linear algebra: a “speed limit” result of Meyer, Musco, Musco, and Woodruff that establishes a fundamental limitation on how accurate any trace estimation algorithm can be.
Let’s back up. Given an unknown square matrix $A$, the trace of $A$ is defined to be the sum of its diagonal entries:

$$\operatorname{tr}(A) = \sum_{i=1}^n a_{ii}.$$

In many applications, we don’t have access to the entries of $A$ directly; we can only probe the matrix through matrix–vector products (matvecs) $x \mapsto Ax$. The trace estimation problem asks us to estimate $\operatorname{tr}(A)$ using as few matvecs as possible.
To simplify things, let’s assume the matrix $A$ is symmetric and positive (semi)definite. The classical algorithm for trace estimation is due to Girard and Hutchinson. Using $m$ matvecs with random vectors $\omega_1, \ldots, \omega_m$, it produces a probabilistic estimate

$$\widehat{\operatorname{tr}} = \frac{1}{m} \sum_{i=1}^m \omega_i^\top (A \omega_i)$$

with a small average (relative) error:

$$\frac{\mathbb{E} \left| \widehat{\operatorname{tr}} - \operatorname{tr}(A) \right|}{\operatorname{tr}(A)} \lesssim \frac{1}{\sqrt{m}}.$$
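To make this concrete, here is a minimal sketch of the Girard–Hutchinson estimator in Python (my own illustration, not code from any of the papers), assuming `matvec` is a black-box function returning $A\omega$:

```python
import numpy as np

def girard_hutchinson(matvec, n, m, rng=np.random.default_rng()):
    """Estimate tr(A) using m matvecs, where matvec(w) returns A @ w."""
    estimate = 0.0
    for _ in range(m):
        omega = rng.standard_normal(n)         # random Gaussian test vector
        estimate += omega @ matvec(omega) / m  # average of omega^T A omega
    return estimate
```

Each sample $\omega^\top A \omega$ has expectation $\operatorname{tr}(A)$, and averaging $m$ independent samples shrinks the standard deviation by a factor of $\sqrt{m}$, which is where the $1/\sqrt{m}$ error rate comes from.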
This state of affairs was greatly improved by Meyer, Musco, Musco, and Woodruff. Building upon previous work, they proposed the Hutch++ algorithm and proved it outputs an estimate $\widehat{\operatorname{tr}}$ satisfying the following bound:

(1)   $$\frac{\mathbb{E} \left| \widehat{\operatorname{tr}} - \operatorname{tr}(A) \right|}{\operatorname{tr}(A)} \lesssim \frac{1}{m}.$$

The XTrace algorithm from our paper satisfies a bound of the same type.
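The idea behind Hutch++ is to spend part of the matvec budget learning the dominant eigenspace of $A$, handle that part of the trace exactly, and apply Girard–Hutchinson only to the small residual. Here is a rough sketch (my own paraphrase of the algorithm, not the authors’ reference implementation):

```python
import numpy as np

def hutch_plus_plus(matvec, n, m, rng=np.random.default_rng()):
    """Hutch++: split a budget of m matvecs into three equal parts."""
    k = m // 3
    # Sketch A to find an orthonormal basis Q for its dominant eigenspace
    Omega = rng.standard_normal((n, k))
    Y = np.column_stack([matvec(Omega[:, i]) for i in range(k)])
    Q, _ = np.linalg.qr(Y)
    # Compute tr(Q^T A Q) exactly, using k more matvecs
    AQ = np.column_stack([matvec(Q[:, i]) for i in range(k)])
    exact_part = np.trace(Q.T @ AQ)
    # Girard–Hutchinson on the residual (I - QQ^T) A (I - QQ^T)
    residual_part = 0.0
    for _ in range(k):
        omega = rng.standard_normal(n)
        omega -= Q @ (Q.T @ omega)  # project out the range of Q
        residual_part += omega @ matvec(omega) / k
    return exact_part + residual_part
```

The exact part captures the large eigenvalues, so the randomized part only has to estimate the trace of a matrix with small trace, which is what drives the error down from $1/\sqrt{m}$ to $1/m$.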
The MMMW Trace Estimation “Speed Limit”
Given the dramatic improvement of Hutch++ and XTrace over Girard–Hutchinson, it is natural to hope: Is there an algorithm that does even better than Hutch++ and XTrace? For instance, is there an algorithm satisfying an even slightly better error bound of the form

$$\frac{\mathbb{E} \left| \widehat{\operatorname{tr}} - \operatorname{tr}(A) \right|}{\operatorname{tr}(A)} \lesssim \frac{1}{m^{1.01}}?$$
Let’s add some fine print. Consider an algorithm for the trace estimation problem. Whenever the algorithm wants, it can present a vector $x$ and receive back the matvec $Ax$. The algorithm is allowed to be adaptive: It can use the matvecs it has already collected to decide which vector $x$ to present next. We measure the cost of the algorithm in terms of the number of matvecs alone, and the algorithm knows nothing about the psd matrix $A$ other than what it learns from matvecs.

One final stipulation:

Simple entries assumption. We assume that the entries of the vectors $x$ presented by the algorithm are real numbers between $-1$ and $1$ with up to $b$ digits after the decimal place.
To get a feel for this simple entries assumption, suppose we set $b = 2$. Then $(0.57, -0.29, 0.40)$ would be an allowed input vector, but $(0.571, -0.293, 0.402)$ would not be (too many digits after the decimal place). Similarly, $(5.7, -2.9, 4.0)$ would not be valid because its entries exceed $1$ in magnitude. The simple entries assumption is reasonable as we typically represent numbers on digital computers by storing a fixed number of digits of accuracy.¹

1. We typically represent numbers on digital computers by floating point numbers, which essentially represent numbers using scientific notation like $1.234 \times 10^5$. For this analysis of trace estimation, we use fixed point numbers like $0.1234$ (no powers of ten allowed)!
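As a quick illustration of the assumption itself (with example vectors of my own choosing), here is a small Python check; entries are passed as strings to sidestep floating-point round-off:

```python
from fractions import Fraction

def is_simple(x, b):
    """Check the simple entries assumption: every entry lies in [-1, 1]
    and has at most b digits after the decimal point."""
    for entry in x:
        value = Fraction(entry)
        if abs(value) > 1 or (value * 10**b).denominator != 1:
            return False
    return True

print(is_simple(["0.57", "-0.29", "0.40"], b=2))  # True
print(is_simple(["0.571", "-0.293"], b=2))        # False: too many digits
print(is_simple(["5.7", "-2.9"], b=2))            # False: entries exceed 1
```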
With all these stipulations, we are ready to state the “speed limit” for trace estimation proved by Meyer, Musco, Musco, and Woodruff:
Informal theorem (Meyer, Musco, Musco, Woodruff). Under the assumptions above, there is no trace estimation algorithm producing an estimate $\widehat{\operatorname{tr}}$ satisfying

$$\frac{\mathbb{E} \left| \widehat{\operatorname{tr}} - \operatorname{tr}(A) \right|}{\operatorname{tr}(A)} \lesssim \frac{1}{m^{1.01}}.$$

We will see a slightly sharper version of the theorem below, but this statement captures the essence of the result.
Communication Complexity
To prove the MMMW theorem, we have to take a journey to the beautiful subject of communication complexity. The story is this. Alice and Bob are interested in solving a computational problem together. Alice has her input $x$ and Bob has his input $y$, and they are interested in computing a function $f(x, y)$ of both their inputs.
Unfortunately for the two of them, Alice and Bob are separated by a great distance, and can only communicate by sending single bits (0 or 1) of information over a slow network connection. Every bit of communication is costly. The field of communication complexity is dedicated to determining how efficiently Alice and Bob are able to solve problems of this form.
The Gap-Hamming problem is one example of a problem studied in communication complexity. As inputs, Alice and Bob receive vectors $x, y \in \{\pm 1\}^n$ with $+1$ and $-1$ entries from a third party, Eve. Eve promises Alice and Bob that their vectors $x$ and $y$ satisfy one of two conditions:

(2)   $$\textbf{case 0: } x^\top y \le -\sqrt{n} \qquad \text{or} \qquad \textbf{case 1: } x^\top y \ge \sqrt{n}.$$

Alice and Bob’s goal is to determine which of the two cases they are in.
There’s one simple solution to this problem: First, Bob sends his whole input vector $y$ to Alice. Each entry of $y$ takes one of the two values $\pm 1$ and can therefore be communicated in a single bit. Having received $y$, Alice computes $x^\top y$, determines whether they are in case 0 or case 1, and sends Bob a single bit to communicate the answer. This procedure requires $n + 1$ bits of communication.
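In code, a simulation of this trivial protocol might look like the following (a sketch of my own that counts bits rather than actually sending them):

```python
import numpy as np

def trivial_gap_hamming(x, y):
    """Simulate the (n+1)-bit protocol: Bob ships y to Alice, one bit
    per +/-1 entry; Alice replies with the one-bit answer."""
    bits_sent = len(y)              # Bob -> Alice: n bits
    answer = 1 if x @ y > 0 else 0  # promise: |x^T y| >= sqrt(n)
    bits_sent += 1                  # Alice -> Bob: the answer
    return answer, bits_sent
```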
Can Alice and Bob still solve this problem with many fewer than $n$ bits of communication, say $\sqrt{n}$ bits? Unfortunately not. The following theorem of Chakrabarti and Regev shows that roughly $n$ bits of communication are needed to solve this problem:

Theorem (Chakrabarti–Regev). Any algorithm which solves the Gap-Hamming problem and succeeds with at least $2/3$ probability for every pair of inputs $x$ and $y$ (satisfying one of the conditions (2)) must take $\Omega(n)$ bits of communication.
Here, $\Omega(n)$ is big-Omega notation, closely related to big-O notation $O(n)$ and big-Theta notation $\Theta(n)$. For the less familiar, it can be helpful to interpret $\Omega(n)$, $O(n)$, and $\Theta(n)$ as all standing for “proportional to $n$”. In plain language, the theorem of Chakrabarti and Regev states that there is no algorithm for the Gap-Hamming problem that is much more effective than the basic algorithm where Bob sends his whole input to Alice (in the sense of requiring fewer than $\sim n$ bits of communication).
Reducing Gap-Hamming to Trace Estimation
This whole state of affairs is very sad for Alice and Bob, but what does it have to do with trace estimation? Remarkably, we can use the hardness of the Gap-Hamming problem to show that there’s no algorithm that fundamentally improves on Hutch++ and XTrace. The argument goes something like this:
- If there were a trace estimation algorithm fundamentally better than Hutch++ and XTrace, we could use it to solve the Gap-Hamming problem with fewer than $\sim n$ bits of communication.
- But no algorithm can solve the Gap-Hamming problem with fewer than $\sim n$ bits of communication.
- Therefore, no trace estimation algorithm is fundamentally better than Hutch++ and XTrace.
Step 2 is the work of Chakrabarti and Regev, and step 3 follows logically from 1 and 2. Therefore, we are left to complete step 1 of the argument.
Protocol
Assume we have access to a really good trace estimation algorithm. We will use it to solve the Gap-Hamming problem. For simplicity, assume $n$ is a perfect square. The basic idea is this:
- Have Alice and Bob reshape their inputs $x, y \in \{\pm 1\}^n$ into matrices $X, Y \in \{\pm 1\}^{\sqrt{n} \times \sqrt{n}}$, and consider (but do not form!) the positive semidefinite matrix
$$A = (X + Y)^\top (X + Y).$$
- Observe that $\operatorname{tr}(A)$ determines $x^\top y$:
(2′)   $$\operatorname{tr}(A) = \|X + Y\|_F^2 = \|X\|_F^2 + 2 \langle X, Y \rangle + \|Y\|_F^2 = 2n + 2\, x^\top y,$$
since $\|X\|_F^2 = \|Y\|_F^2 = n$ and the entrywise inner product $\langle X, Y \rangle = x^\top y$ is unchanged by reshaping. In particular, $\operatorname{tr}(A) \le 2n - 2\sqrt{n}$ in case 0 and $\operatorname{tr}(A) \ge 2n + 2\sqrt{n}$ in case 1.
- By working together, Alice and Bob can implement a trace estimation algorithm. Alice will be in charge of running the algorithm, but Alice and Bob must work together to compute matvecs. (Details below!)
- Using the output of the trace estimation algorithm, Alice determines whether they are in case 0 or 1 (i.e., whether $x^\top y \le -\sqrt{n}$ or $x^\top y \ge \sqrt{n}$) and sends the result to Bob.
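Here is a quick numerical sanity check of the identity (2′), with $n = 16$ as an arbitrary perfect square (my own verification script, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16                                   # must be a perfect square
root = int(np.sqrt(n))
x = rng.choice([-1, 1], size=n)          # Alice's input
y = rng.choice([-1, 1], size=n)          # Bob's input
X, Y = x.reshape(root, root), y.reshape(root, root)
A = (X + Y).T @ (X + Y)                  # psd matrix from the reduction
assert np.trace(A) == 2 * n + 2 * (x @ y)
```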
To complete this procedure, we just need to show how Alice and Bob can implement the matvec procedure using minimal communication. Suppose Alice and Bob want to compute $Aw$ for some vector $w$ with entries between $-1$ and $1$ with up to $b$ decimal digits. First, convert $w$ to the vector $z = 10^b \cdot w$, whose entries are integers between $-10^b$ and $10^b$. Since $w = 10^{-b} \cdot z$, interconverting between $w$ and $z$ is trivial. Alice and Bob’s procedure for computing $Aw$ is as follows:
- Alice sends Bob $z$.
- Having received $z$, Bob forms $Yz$ and sends it to Alice.
- Having received $Yz$, Alice computes $v = (X + Y)z = Xz + Yz$ and sends it to Bob.
- Having received $v$, Bob computes $Y^\top v$ and sends it to Alice.
- Alice forms $Az = (X + Y)^\top v = X^\top v + Y^\top v$ and rescales to obtain $Aw = 10^{-b} \cdot Az$.
Because $X$ and $Y$ are $\sqrt{n} \times \sqrt{n}$ and have $\pm 1$ entries, all vectors computed in this procedure are vectors of length $\sqrt{n}$ with integer entries between $-2 \cdot 10^b n$ and $2 \cdot 10^b n$. Each such entry can be communicated in $O(b + \log n)$ bits, so we conclude the communication cost for one matvec is $O(\sqrt{n} \, (b + \log n))$ bits.
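A simulation of this five-step matvec protocol in Python (my own sketch; in a real protocol, each intermediate vector would be a round of communication):

```python
import numpy as np

def distributed_matvec(X, Y, w, b):
    """Compute A w for A = (X + Y)^T (X + Y), where Alice holds X and
    Bob holds Y, following the five steps above."""
    z = np.rint(w * 10**b).astype(np.int64)  # integer proxy for w
    # Step 1: Alice sends z to Bob (implicit in this simulation).
    Yz = Y @ z                # step 2: Bob computes Yz, sends to Alice
    v = X @ z + Yz            # step 3: Alice computes v = (X + Y) z
    YTv = Y.T @ v             # step 4: Bob computes Y^T v, sends to Alice
    Az = X.T @ v + YTv        # step 5: Alice assembles A z
    return Az / 10**b         # rescale: A w = 10^{-b} A z
```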
Analysis
Consider an algorithm we’ll call BestTraceAlgorithm. Given any accuracy parameter $\varepsilon > 0$, BestTraceAlgorithm requires at most $m = m(\varepsilon)$ matvecs and, for any positive semidefinite input matrix $A$ of any size, produces an estimate $\widehat{\operatorname{tr}}$ satisfying

(3)   $$\left| \widehat{\operatorname{tr}} - \operatorname{tr}(A) \right| \le \varepsilon \cdot \operatorname{tr}(A) \quad \text{with probability at least } 2/3.$$
To solve the Gap-Hamming problem, Alice and Bob just need enough accuracy in their trace estimation to distinguish between cases 0 and 1. In particular, since $\operatorname{tr}(A) \le 4n$ and the two cases sit on either side of the threshold $2n$ with a margin of $2\sqrt{n}$, it suffices to achieve error $\varepsilon \cdot \operatorname{tr}(A) < 2\sqrt{n}$, which holds whenever $\varepsilon < 1/(2\sqrt{n})$.

Suppose that Alice and Bob apply trace estimation to solve the Gap-Hamming problem, using $m$ matvecs in total. The total communication is $O(m \sqrt{n} \, (b + \log n))$ bits. Chakrabarti and Regev showed that it takes at least $cn$ bits of communication (for some constant $c > 0$) to solve the Gap-Hamming problem with $2/3$ probability. Thus, if $m \sqrt{n} \, (b + \log n) \ll n$, then Alice and Bob fail to solve the Gap-Hamming problem with at least $1/3$ probability. Thus, any trace estimation algorithm that succeeds with $2/3$ probability must use

$$m = \Omega\left( \frac{\sqrt{n}}{b + \log n} \right) \text{ matvecs}.$$

Say Alice and Bob run BestTraceAlgorithm with parameter $\varepsilon = 1/(2\sqrt{n})$, so that the guarantee (3) lets them distinguish cases 0 and 1 with $2/3$ probability. Then $\sqrt{n} = 1/(2\varepsilon)$ and $\log n = 2 \log(1/(2\varepsilon))$, so BestTraceAlgorithm must use

$$m = \Omega\left( \frac{1}{\varepsilon \, (b + \log(1/\varepsilon))} \right) \text{ matvecs}.$$

This is the promised sharper version of the speed limit: under the simple entries assumption, any algorithm satisfying (3) must use at least $\Omega(1/(\varepsilon (b + \log(1/\varepsilon))))$ matvecs. Up to the modest factor of $b + \log(1/\varepsilon)$, this matches the $O(1/\varepsilon)$ matvec count achieved by Hutch++ and XTrace, so no trace estimation algorithm can be fundamentally better.
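Putting the pieces together, here is a toy end-to-end simulation (my own construction): Alice and Bob estimate $\operatorname{tr}(A)$ with a simple Girard–Hutchinson loop whose test vectors have $\pm 1$ entries (so $b = 0$ suffices for the simple entries assumption), routing every matvec through `distributed_matvec` from above. For simplicity, the random inputs here are not required to satisfy the Gap-Hamming promise.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400                                  # a perfect square
root = int(np.sqrt(n))
x = rng.choice([-1, 1], size=n)          # Alice's input
y = rng.choice([-1, 1], size=n)          # Bob's input
X, Y = x.reshape(root, root), y.reshape(root, root)

m = 30                                   # matvec budget
estimate = 0.0
for _ in range(m):
    omega = rng.choice([-1.0, 1.0], size=root)   # allowed input vector
    estimate += omega @ distributed_matvec(X, Y, omega, b=0) / m

print("estimate:", estimate, " true trace:", 2 * n + 2 * (x @ y))
print("guess:", "case 1" if estimate > 2 * n else "case 0")
```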