This post is part of a new series for this blog, Note to Self, where I collect together some notes about an idea related to my research. This content may be much more technical than most of the content of this blog and of much less wide interest. My hope in sharing this is that someone will find this interesting and useful for their own work.
This post is about a fundamental tool of high-dimensional probability, the Hanson–Wright inequality. The Hanson–Wright inequality is a concentration inequality for quadratic forms of random vectors, that is, expressions of the form $x^\top A x$ where $x$ is a random vector. Many statements of this inequality in the literature have an unspecified constant $c > 0$; our goal in this post will be to derive a fairly general version of the inequality with only explicit constants.
The core object of the Hanson–Wright inequality is a subgaussian random variable. A random variable $Y$ is subgaussian if the probability it exceeds a threshold $t$ in magnitude decays as
 (1)    ![Rendered by QuickLaTeX.com \[\mathbb{P}\{|Y|\ge t\} \le \mathrm{e}^{-t^2/a} \quad \text{for some $a>0$ and for all sufficiently large $t$.} \]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-7f713f49e028fa06020270f85a5a1e2f_l3.png)
The name subgaussian is appropriate as the tail probabilities of Gaussian random variables exhibit the same square-exponential decrease $\mathrm{e}^{-t^2/a}$.
A (non-obvious) fact is that if $Y$ is subgaussian in the sense (1) and centered ($\mathbb{E}[Y] = 0$), then $Y$'s cumulant generating function (cgf)
      ![Rendered by QuickLaTeX.com \[\xi_Y(t) := \log \mathbb{E} \exp(tY).\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-c65c7c900f2941726b28371b5fdcb4e8_l3.png)
is subquadratic: There is a constant $c > 0$ (independent of $Y$ and $a$), for which
 (2)    ![Rendered by QuickLaTeX.com \[\xi_Y(t) \le ca t^2 \quad \text{for all $t\in\mathbb{R}$}. \]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-e307f36da20e09f721e8f9a24d36a693_l3.png)
Moreover, a subquadratic cgf (2) also implies the subgaussian tail property (1), with a different parameter $a > 0$ (see Proposition 2.5.2 of Vershynin’s High-Dimensional Probability).
Since properties (1) and (2) are equivalent (up to a change in the parameter $a$), we are free to fix a version of property (2) as our definition for a (centered) subgaussian random variable.
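Definition (subgaussian random variable): A centered random variable $x$ is said to be $v$-subgaussian or subgaussian with variance proxy $v$ if its cgf is subquadratic:
 (3)    ![Rendered by QuickLaTeX.com \[\xi_{x}(t) \le\frac{1}{2} vt^2 \quad \text{for all $t\in\mathbb{R}$.} \]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-c7c44d1fed75ceb263a286963825c5ae_l3.png)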
For instance, a mean-zero Gaussian random variable $X$ with variance $\sigma^2$ has cgf
 (4)    ![Rendered by QuickLaTeX.com \[ \xi_X(t) = \frac{1}{2} \sigma^2 t^2,  \]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-ee966879f0689027cc0f7f2396dd0dae_l3.png)
and is thus subgaussian with variance proxy $v = \sigma^2$ equal to its variance.
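As a second, non-Gaussian illustration of the subquadratic cgf condition $\xi_x(t) \le \frac{1}{2} v t^2$, here is a minimal numerical sketch for a Rademacher random variable (uniform on $\{-1,+1\}$), whose cgf is $\log\cosh t$. The function names and the grid of $t$ values are choices made for this sketch; it only spot-checks the inequality with $v = 1$ on a finite grid and is not a proof.

```python
import numpy as np

def rademacher_cgf(t):
    """Exact cgf of a Rademacher variable: log E exp(tX) = log cosh(t)."""
    t = np.asarray(t, dtype=float)
    # logaddexp(t, -t) = log(e^t + e^-t); this avoids overflow for large |t|.
    return np.logaddexp(t, -t) - np.log(2.0)

def subgaussian_cgf_bound(t, v):
    """Right-hand side of the subgaussian condition: v * t^2 / 2."""
    return 0.5 * v * np.asarray(t, dtype=float) ** 2

# Compare the exact cgf with the bound for variance proxy v = 1.
ts = np.linspace(-10.0, 10.0, 2001)
gap = subgaussian_cgf_bound(ts, v=1.0) - rademacher_cgf(ts)
print("smallest gap between bound and cgf on the grid:", gap.min())
```

A nonnegative gap on the whole grid is consistent with a Rademacher variable being subgaussian with variance proxy equal to its variance, just as in the Gaussian case.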
Here is a statement of the Hanson–Wright inequality as it typically appears with unspecified constants (see Theorem 6.2.1 of Vershynin’s High-Dimensional Probability):
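Theorem (Hanson–Wright): Let $x \in \mathbb{R}^n$ be a random vector with independent centered $v$-subgaussian entries and let $A \in \mathbb{R}^{n\times n}$ be a square matrix. Then
      ![Rendered by QuickLaTeX.com \[\mathbb{P}\left\{\left|x^\top Ax - \mathbb{E} \left[x^\top A x\right]\right|\ge t \right\} \le 2\exp\left(- \frac{c\cdot t^2}{v^2\left\|A\right\|_{\rm F}^2 + v\left\|A\right\|t} \right),\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-a84a4ab33a1186630c8021cbfcd72d23_l3.png)
where $c > 0$ is a constant (not depending on $v$, $x$, $n$, or $A$). Here, $\|\cdot\|_{\rm F}$ and $\|\cdot\|$ denote the Frobenius and spectral norms.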
This type of concentration is exactly the same type as provided by Bernstein’s inequality (which I discussed in my post on concentration inequalities). In particular, for small deviations $t$, the tail probabilities are subgaussian with variance proxy proportional to $v^2\|A\|_{\rm F}^2$:
      ![Rendered by QuickLaTeX.com \[\mathbb{P}\left\{\left|x^\top Ax - \mathbb{E}\left[x^\top Ax\right]\right|\ge t \right\} \stackrel{\text{small $t$}}{\lessapprox} 2\exp\left(- \frac{c\cdot t^2}{v^2\left\|A\right\|_{\rm F}^2} \right)\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-37b39cb580d561ba6777a9e757dc714f_l3.png)
For large deviations $t$, this switches to subexponential tail probabilities with decay rate governed by $v\|A\|$:
      ![Rendered by QuickLaTeX.com \[\mathbb{P}\left\{\left|x^\top Ax - \mathbb{E}\left[x^\top Ax\right]\right|\ge t \right\} \stackrel{\text{large $t$}}{\lessapprox} 2\exp\left(- \frac{c\cdot t}{v\|A\|} \right).\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-244ec27c116074c4ee0033dda85051d8_l3.png)
Mediating these two parameter regimes are the size of the matrix $A$, as measured by its Frobenius and spectral norms, and the degree of subgaussianity of $x$, measured by the variance proxy $v$.
Diagonal-Free Hanson–Wright
Now we come to a first version of the Hanson–Wright inequality with explicit constants, first for a matrix which is diagonal-free—that is, having all zeros on the diagonal. I obtained this version of the inequality myself, though I am very sure that this version of the inequality or an improvement thereof appears somewhere in the literature.
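Theorem (Hanson–Wright, explicit constants, diagonal-free): Let $x$ be a random vector with independent centered $v$-subgaussian entries and let $A$ be a diagonal-free square matrix. Then we have the cgf bound
      ![Rendered by QuickLaTeX.com \[\xi_{x^\top Ax}(t) \le \frac{16v^2\left\|A\right\|_{\rm F}^2\, t^2}{2(1-4v\left\|A\right\|t)}.\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-a024c1e2eb4eb0e9ce3b60a66a5cca04_l3.png)
As a consequence, we have the concentration bound
      ![Rendered by QuickLaTeX.com \[\mathbb{P} \{ x^\top A x \ge t \} \le \exp\left( -\frac{t^2/2}{16v^2 \left\|A\right\|_{\rm F}^2+4v\left\|A\right\|t} \right).\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-2213aea3706c09f904897076143402f3_l3.png)
Similarly, we have the lower tail
      ![Rendered by QuickLaTeX.com \[\mathbb{P} \{ x^\top A x \le -t \} \le \exp\left( -\frac{t^2/2}{16v^2 \left\|A\right\|_{\rm F}^2+4v\left\|A\right\|t} \right)\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-ae296911e4204fa8e70d217568edd442_l3.png)
and the two-sided bound
      ![Rendered by QuickLaTeX.com \[\mathbb{P} \{ |x^\top A x| \ge t \} \le 2\exp\left( -\frac{t^2/2}{16v^2 \left\|A\right\|_{\rm F}^2+4v\left\|A\right\|t} \right).\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-4bdfeb16e5c96258871d56cb51203b95_l3.png)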
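To get a feel for the explicit upper-tail bound $\exp\left(-\frac{t^2/2}{16v^2\|A\|_{\rm F}^2 + 4v\|A\|t}\right)$, here is a small Monte Carlo sketch. Everything specific in it is an assumption made for illustration: the helper name `hanson_wright_upper_tail`, the random test matrix, the Rademacher entries (which are centered and subgaussian with variance proxy $v = 1$), the sample size, and the thresholds.

```python
import numpy as np

def hanson_wright_upper_tail(t, A, v):
    """Explicit diagonal-free bound exp(-(t^2/2) / (16 v^2 ||A||_F^2 + 4 v ||A|| t))."""
    fro = np.linalg.norm(A, "fro")   # Frobenius norm
    spec = np.linalg.norm(A, 2)      # spectral norm
    return np.exp(-(t**2 / 2) / (16 * v**2 * fro**2 + 4 * v * spec * t))

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))
np.fill_diagonal(A, 0.0)             # make A diagonal-free

# Each row of X is one sample of a Rademacher vector x (variance proxy v = 1).
X = rng.choice([-1.0, 1.0], size=(20000, n))
q = np.sum((X @ A) * X, axis=1)      # quadratic forms x^T A x, one per row

for t in [50.0, 100.0, 200.0]:
    empirical = np.mean(q >= t)
    bound = hanson_wright_upper_tail(t, A, v=1.0)
    print(f"t = {t:6.1f}: empirical tail = {empirical:.4f}, bound = {bound:.4f}")
```

The empirical tail frequencies should sit below the bound, typically by a wide margin, since the explicit constants are not tight.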
Let us begin proving this result. Our proof will follow the same steps as Vershynin’s proof in High-Dimensional Probability (which in turn is adapted from an article by Rudelson and Vershynin), but taking care to get explicit constants. Unfortunately, proving all of the relevant tools from first principles would easily triple the length of this post, so I make frequent use of results from the literature.
We begin with the decoupling bound (Theorem 6.1.1 in Vershynin’s High-Dimensional Probability), which allows us to replace one $x$ with an independent copy $\tilde{x}$ at the cost of a factor of four:
 (5)    ![Rendered by QuickLaTeX.com \[\xi_{x^\top Ax}(t) \le \xi_{\tilde{x}^\top Ax}(4t). \]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-e20eed7500e7e93630cb1fb1dac324dd_l3.png)
We seek to compare the bilinear form $\tilde{x}^\top Ax$ to the Gaussian bilinear form $\tilde{g}^\top Ag$ where $\tilde{g}$ and $g$ are independent standard Gaussian vectors. We begin with the following cgf bound for the Gaussian quadratic form $g^\top Ag$:
      ![Rendered by QuickLaTeX.com \[\xi_{g^\top Ag}(t) \le \frac{\left\|A\right\|_{\rm F}^2 \, t^2}{1-2\|A\|\, t}.\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-19f61ae89aa95b37ff49bcbd7ad27b96_l3.png)
This equation is the result of Example 2.12 in Boucheron, Lugosi, and Massart’s Concentration Inequalities. By applying this result to the Hermitian dilation of $A$ in $A$’s place, one obtains a similar result for the decoupled bilinear form $\tilde{g}^\top Ag$:
 (6)    ![Rendered by QuickLaTeX.com \[\xi_{\tilde{g}^\top Ag}(t) \le \frac{\left\|A\right\|_{\rm F}^2 \, t^2}{2(1-\|A\|\, t)}. \]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-38e35e17a1a326c592e4a33717e57bd2_l3.png)
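The post does not spell out the dilation step, so here is one way it can go, offered as a sketch. Assume the dilation is scaled by $1/2$, i.e. $H = \frac{1}{2}\begin{bmatrix} 0 & A \\ A^\top & 0 \end{bmatrix}$, and stack $z = (\tilde{g}, g)$. Then $z^\top H z = \tilde{g}^\top A g$, $\|H\|_{\rm F}^2 = \frac{1}{2}\|A\|_{\rm F}^2$, and $\|H\| = \frac{1}{2}\|A\|$; substituting these into the quadratic-form bound $\|H\|_{\rm F}^2 t^2/(1-2\|H\|t)$ yields the right-hand side of (6). The code below checks the three identities numerically; the variable names and the random test matrix are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))

# Scaled Hermitian dilation (the 1/2 scaling is an assumption of this sketch).
Z = np.zeros((n, n))
H = 0.5 * np.block([[Z, A], [A.T, Z]])

g_tilde = rng.standard_normal(n)
g = rng.standard_normal(n)
z = np.concatenate([g_tilde, g])     # standard Gaussian vector in R^{2n}

quad = z @ H @ z                     # quadratic form in the dilation
bilinear = g_tilde @ A @ g           # decoupled bilinear form
fro_sq_H = np.linalg.norm(H, "fro") ** 2
spec_H = np.linalg.norm(H, 2)

print("z^T H z vs g~^T A g:", quad, bilinear)
print("||H||_F^2 vs ||A||_F^2 / 2:", fro_sq_H, 0.5 * np.linalg.norm(A, "fro") ** 2)
print("||H|| vs ||A|| / 2:", spec_H, 0.5 * np.linalg.norm(A, 2))
```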
We now seek to compare $\tilde{x}^\top Ax$ to $\tilde{g}^\top Ag$. To do this, we first evaluate the cgf of $\tilde{x}^\top Ax$ only over the randomness in $\tilde{x}$. Since we’re only taking an expectation over the random variable $\tilde{x}$, we can apply the subquadratic cgf condition (3) to obtain
 (7)    ![Rendered by QuickLaTeX.com \[\log \mathbb{E}_{\tilde{x}} \exp(t \, \tilde{x}^\top Ax) = \sum_{i=1}^n \log \mathbb{E}_{\tilde{x}} \exp(t \,\tilde{x}_i (Ax)_i) \le  \frac{1}{2} v \left(\sum_{i=1}^n (Ax)_i^2\right)t^2 \le \frac{1}{2} v\left\|Ax\right\|^2 \, t^2. \]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-0c463e71b81d7bd984fa8518767f1660_l3.png)
Now we perform a similar computation for the quantity $\tilde{g}^\top Ax$ in which $\tilde{x}$ has been replaced by the Gaussian vector $\tilde{g}$:
      ![Rendered by QuickLaTeX.com \[\log \mathbb{E}_{\tilde{g}} \exp((\sqrt{v} t) \, \tilde{g}^\top Ax) = \frac{1}{2} v \left\|Ax\right\|^2 \, t^2.\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-ef3b5b37870fe8313c9972bbc213f96b_l3.png)
We stress that this is an equality since the cgf of a Gaussian random variable is given by (4). Thus we can substitute the left-hand side of the above display into the right-hand side of (7), yielding
 (8)    ![Rendered by QuickLaTeX.com \[\log \mathbb{E}_{\tilde{x}} \exp(t \, \tilde{x}^\top Ax) \le \log \mathbb{E}_{\tilde{g}} \exp((\sqrt{v} t) \, \tilde{g}^\top Ax). \]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-c05efe81c8095276378b2e4788d64394_l3.png)
We now perform this same trick again using the randomness in $x$:
 (9)    ![Rendered by QuickLaTeX.com \[\log \mathbb{E}_{\tilde{g},x} \exp((\sqrt{v} t) \, \tilde{g}^\top Ax) \le \log \mathbb{E}_{\tilde{g}} \exp \left(\frac{1}{2} v^2 \left\|A^\top \tilde{g}\right\|^2t^2\right) = \log \mathbb{E}_{\tilde{g},g} \exp(v t \, \tilde{g}^\top Ag). \]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-c8fdd18f70e855effd8259473ee86933_l3.png)
Packaging up (8) and (9) gives
 (10)    ![Rendered by QuickLaTeX.com \[\xi_{\tilde{x}^\top Ax}(t)\le \xi_{\tilde{g}^\top Ag}(vt). \]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-fdca14d3f0efedd0b795f0c1f871368c_l3.png)
Combining all these results (5), (6), and (10), we obtain
      ![Rendered by QuickLaTeX.com \[\xi_{x^\top Ax}(t) \le \xi_{\tilde{x}^\top Ax}(4t) \le \xi_{\tilde{g}^\top Ag}(4vt) \le \frac{16v^2\left\|A\right\|_{\rm F}^2\, t^2}{2(1-4v\left\|A\right\|t)}.\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-0037fd7f023f4d3f7058bb77afa263c5_l3.png)
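This cgf bound implies the desired probability bound on the upper tail as a consequence of the following fact (see Boucheron, Lugosi, and Massart’s Concentration Inequalities page 29 and Exercise 2.8):
Fact (Bernstein concentration from Bernstein cgf bound): Suppose that a random variable $X$ satisfies the cgf bound $\xi_X(t) \le \frac{vt^2}{2(1-ct)}$ for $0 < t < 1/c$. Then
      ![Rendered by QuickLaTeX.com \[\mathbb{P} \left\{ X\ge t \right\} \le \exp\left( -\frac{t^2/2}{v+ct} \right).\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-12b977ef8c244a0750402071dcf6caf1_l3.png)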
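For intuition, here is a short sketch of why a cgf bound of the form $\xi_X(\lambda) \le \frac{v\lambda^2}{2(1-c\lambda)}$ for $0 < \lambda < 1/c$ gives the tail bound $\exp\left(-\frac{t^2/2}{v+ct}\right)$. It uses the standard Chernoff argument with one particular, not necessarily optimal, choice of $\lambda$; this is a shortcut and may differ from the route taken in the cited reference.

```latex
\begin{align*}
\mathbb{P}\{X \ge t\}
  &\le \exp\bigl(\xi_X(\lambda) - \lambda t\bigr)
  \le \exp\left(\frac{v\lambda^2}{2(1-c\lambda)} - \lambda t\right)
  && \text{(Markov's inequality applied to $\mathrm{e}^{\lambda X}$).} \\
\intertext{Choose $\lambda = t/(v+ct)$, which satisfies $0 < \lambda < 1/c$ and $1 - c\lambda = v/(v+ct)$. Then}
\frac{v\lambda^2}{2(1-c\lambda)}
  &= \frac{v t^2}{(v+ct)^2}\cdot\frac{v+ct}{2v}
  = \frac{t^2}{2(v+ct)},
  \qquad
  \lambda t = \frac{t^2}{v+ct}, \\
\mathbb{P}\{X \ge t\}
  &\le \exp\left(\frac{t^2}{2(v+ct)} - \frac{t^2}{v+ct}\right)
  = \exp\left(-\frac{t^2/2}{v+ct}\right).
\end{align*}
```

In the diagonal-free setting, the Fact is applied with $16v^2\|A\|_{\rm F}^2$ playing the role of the Fact's $v$ and $4v\|A\|$ playing the role of its $c$.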
To get the bound on the lower tail, apply the result for the upper tail to the matrix $-A$ to obtain
      ![Rendered by QuickLaTeX.com \[\mathbb{P} \{ x^\top A x \le -t \} = \mathbb{P} \{ x^\top (-A) x \ge t \} \le \exp\left( -\frac{t^2/2}{16v^2 \left\|A\right\|_{\rm F}^2+4v\left\|A\right\|t} \right).\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-f63f2b0759ceb4a78be2f3513ab8ab82_l3.png)
Finally, to obtain the two-sided bound, use a union bound over the upper and lower tails:
      ![Rendered by QuickLaTeX.com \[\mathbb{P} \{ |x^\top A x| \ge t \} \le \mathbb{P} \{ x^\top A x \ge t \} + \mathbb{P} \{ x^\top A x \le -t \} \le 2\exp\left( -\frac{t^2/2}{16v^2 \left\|A\right\|_{\rm F}^2+4v\left\|A\right\|t} \right).\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-f6d5d289322c34b3004c8501cd9cf477_l3.png)
General Hanson–Wright
Now, here’s a more general result (with worse constants) which permits the matrix $A$ to possess a diagonal.
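Theorem (Hanson–Wright, explicit constants): Let $x$ be a random vector with independent centered $v$-subgaussian entries and let $A$ be an arbitrary square matrix. Then we have the cgf bound
      ![Rendered by QuickLaTeX.com \[\xi_{x^\top Ax-\mathbb{E} [x^\top A x]}(t) \le \frac{40v^2\left\|A\right\|_{\rm F}^2\, t^2}{2(1-8v\left\|A\right\|t)}.\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-9ec7d119e890e1feaf86137b543c2adc_l3.png)
As a consequence, we have the concentration bound
      ![Rendered by QuickLaTeX.com \[\mathbb{P} \{ x^\top A x-\mathbb{E} [x^\top A x] \ge t \} \le \exp\left( -\frac{t^2/2}{40v^2 \left\|A\right\|_{\rm F}^2+8v\left\|A\right\|t} \right).\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-5d35dd74f9d307d1616a4c0b2d10a7ac_l3.png)
Lower-tail and two-sided versions of this bound also hold:
      ![Rendered by QuickLaTeX.com \[\mathbb{P} \{ x^\top A x-\mathbb{E} [x^\top A x] \le -t \} \le \exp\left( -\frac{t^2/2}{40v^2 \left\|A\right\|_{\rm F}^2+8v\left\|A\right\|t} \right)\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-4d0c27970de892090fbb22751582e8a0_l3.png)
and
      ![Rendered by QuickLaTeX.com \[\mathbb{P} \{ |x^\top A x-\mathbb{E} [x^\top A x]| \ge t \} \le 2\exp\left( -\frac{t^2/2}{40v^2 \left\|A\right\|_{\rm F}^2+8v\left\|A\right\|t} \right).\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-0239d1a1656250c1cdb5b0c5557973e4_l3.png)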
Decompose the matrix $A = D + F$ into its diagonal and off-diagonal portions. For any two random variables $X$ and $Y$ (possibly highly dependent), we can bound the cgf of their sum using the following “union bound”:
 (11)    ![Rendered by QuickLaTeX.com \begin{align*} \xi_{X+Y}(t) &= \log \mathbb{E} \left[\exp(tX)\exp(tY)\right] \\&\le \log \left(\left[\mathbb{E} \exp(2tX)\right]^{1/2}\left[\mathbb{E}\exp(2tY)\right]^{1/2}\right) \\&=\frac{1}{2} \xi_X(2t) + \frac{1}{2}\xi_Y(2t). \end{align*}](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-5fd6293d750d7bd98a31189e5802f59e_l3.png)
The two equality statements are the definition of the cumulant generating function and the inequality is Cauchy–Schwarz.
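The inequality $\xi_{X+Y}(t) \le \frac{1}{2}\xi_X(2t) + \frac{1}{2}\xi_Y(2t)$ requires no independence, which is easy to spot-check numerically. The sketch below uses a small discrete distribution, invented for illustration, in which $Y$ is a deterministic function of $X$, and computes all cgfs exactly.

```python
import numpy as np

def cgf(values, probs, t):
    """Exact cgf log E exp(tZ) of a discrete random variable Z."""
    return np.log(np.sum(probs * np.exp(t * values)))

# X takes three values; Y = X^2 - X is completely determined by X.
x_vals = np.array([-1.0, 0.0, 2.0])
probs = np.array([0.5, 0.3, 0.2])
y_vals = x_vals**2 - x_vals

ts = np.linspace(-2.0, 2.0, 81)
lhs = np.array([cgf(x_vals + y_vals, probs, t) for t in ts])
rhs = np.array([0.5 * cgf(x_vals, probs, 2 * t) + 0.5 * cgf(y_vals, probs, 2 * t)
                for t in ts])

# Cauchy-Schwarz guarantees lhs <= rhs, so this maximum should not be positive.
print("max of lhs - rhs over the grid:", np.max(lhs - rhs))
```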
Using the “union bound”, it is sufficient to obtain bounds for the cgfs of the diagonal and off-diagonal parts $x^\top D x - \mathbb{E}[x^\top Ax]$ and $x^\top F x$. We begin with the diagonal part. We compute
 (12)    ![Rendered by QuickLaTeX.com \begin{align*}\xi_{x^\top D x - \mathbb{E}[x^\top Ax]}(t) &= \log \mathbb{E} \exp\left(t \sum_{i=1}^n A_{ii}(x_i^2 - \mathbb{E}[x_i^2]) \right) \\ &= \sum_{i=1}^n  \log \mathbb{E} \exp\left((t A_{ii})\cdot(x_i^2 - \mathbb{E}[x_i^2]) \right). \end{align*}](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-03a1f2a90412fa1e00f7deb32d479def_l3.png)
For the cgf of $x_i^2 - \mathbb{E}[x_i^2]$, we use the following bound, taken from Appendix B of the following paper:
      ![Rendered by QuickLaTeX.com \[\log \mathbb{E} \exp\left(t(x_i^2 - \mathbb{E}[x_i^2]) \right) \le \frac{8v^2t^2}{1-2v|t|}.\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-ec7de84a4e1f848cf06fb619582b4418_l3.png)
Substituting this result into (12) gives
 (13)    ![Rendered by QuickLaTeX.com \[\xi_{x^\top D x - \mathbb{E}[x^\top Ax]}(t) \le \sum_{i=1}^n \frac{8v^2|A_{ii}|^2t^2}{1-2v|A_{ii}|t} \le \frac{8v^2\|A\|_{\rm F}^2t^2}{1-2v\|A\|t}\quad \text{for $t>0$}. \]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-50a71b505cc381540a6934a28416f614_l3.png)
For the second inequality, we used the facts that $\max_i |A_{ii}| \le \|A\|$ and $\sum_{i=1}^n |A_{ii}|^2 \le \|A\|_{\rm F}^2$.
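For completeness, here is one way to fill in the second inequality of (13), written out as a sketch. It uses two elementary facts: each diagonal entry is bounded in magnitude by the spectral norm, and the squared diagonal entries sum to at most the squared Frobenius norm. Here $e_i$ denotes the $i$th standard basis vector (notation introduced for this sketch).

```latex
\begin{align*}
|A_{ii}| &= |e_i^\top A e_i| \le \|e_i\|\,\|A e_i\| \le \|A\|,
\\
\sum_{i=1}^n |A_{ii}|^2 &\le \sum_{i,j=1}^n |A_{ij}|^2 = \|A\|_{\rm F}^2,
\\
\sum_{i=1}^n \frac{8v^2|A_{ii}|^2t^2}{1-2v|A_{ii}|t}
  &\le \frac{8v^2t^2}{1-2v\|A\|t}\sum_{i=1}^n |A_{ii}|^2
  \le \frac{8v^2\|A\|_{\rm F}^2t^2}{1-2v\|A\|t}
  \quad \text{for $0 < t < 1/(2v\|A\|)$.}
\end{align*}
```

The first step of the last line replaces each denominator $1 - 2v|A_{ii}|t$ by the smaller positive quantity $1 - 2v\|A\|t$, which can only increase each term.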
We now look at the off-diagonal part $x^\top F x$. We use a version of the decoupling bound (5) where we compare $x^\top F x$ to $\tilde{x}^\top Ax$, where we’ve both replaced one copy of $x$ with an independent copy and reinstated the diagonal of $A$ (see Remark 6.1.3 in Vershynin’s High-Dimensional Probability):
      ![Rendered by QuickLaTeX.com \[\xi_{x^\top F x}(t) \le \xi_{\tilde{x}^\top Ax}(4t).\]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-0a6c678fd5b04c01bb1001d4f6897493_l3.png)
We can now just repeat the rest of the argument for the diagonal-free Hanson–Wright inequality, yielding the same conclusion
 (14)    ![Rendered by QuickLaTeX.com \[ \xi_{x^\top Fx}(t) \le \frac{16v^2\left\|A\right\|_{\rm F}^2\, t^2}{2(1-4v\left\|A\right\|t)}.  \]](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-29b5b5a2aa3fe27580abf965005db1d0_l3.png)
Combining (11), (13), and (14), we obtain
      ![Rendered by QuickLaTeX.com \begin{align*}\xi_{x^\top Ax-\mathbb{E} [x^\top A x]} &\le \frac{1}{2} \xi_{x^\top D x - \mathbb{E}[x^\top Ax]}(2t) + \frac{1}{2} \xi_{x^\top Fx}(2t) \\&\le \frac{8v^2\|A\|_{\rm F}^2t^2}{2(1-4v\|A\|t)} + \frac{32v^2\left\|A\right\|_{\rm F}^2\, t^2}{2(1-8v\left|A\right|t)} \\&\le \frac{8v^2\|A\|_{\rm F}^2t^2}{2(1-4v\|A\|t)} + \frac{32v^2\left\|A\right\|_{\rm F}^2\, t^2}{2(1-8v\left\|A\right\|t)} \\&\le \frac{40v^2\left\|A\right\|_{\rm F}^2\, t^2}{2(1-8v\left\|A\right\|t)}.\end{align*}](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-e26ce05b781b47ce078f0addb9559240_l3.png)
As above, this cgf bound implies the desired probability bound.
Hi Ethan, one quick question. Why is it that in your versions of the Hanson-Wright Inequality with explicit constants you don’t subtract the expectation of the quadratic form in the tail bound?
Thank you for bringing this to my attention. For the diagonal-free case, the expectation is zero so this term is not necessary. For the general version, it was a mistake/typo to leave this term off; now corrected!
Hi Ethan,
Thank you very much for the great article! Could you confirm that the diagonal-free HW theorem with explicit constants assuming Rademacher iid random vectors is the same also for the lower tail bound? I think this is because xTAx is identically distributed to -xTAx.
I know it is out of scope from the post, but getting your feedback would help me understand this better! There are so many proofs of this theorem in different settings and I am trying to reconcile the different claims and techniques.
It is important to be careful. $x^\top Ax$ is not identically distributed to $-x^\top Ax$! (Try it yourself on a small example.) What is true is that you can apply diagonal-free Hanson–Wright to $-A$ to get that the lower tail $\mathbb{P}\{x^\top Ax \le -t\}$ satisfies the same bound as the upper tail. You can get both the upper and lower tail by taking a union bound. I’ll add something to the post to clarify this.
Thank you for the clarification!
Here is a recent proof of the inequality
https://arxiv.org/pdf/2509.00881