The purpose of this note is to describe the Gaussian hypercontractivity inequality. As an application, we’ll obtain a weaker version of the Hanson–Wright inequality.
The Noise Operator
We begin our discussion with the following question:
Let $f:\real^d \to \real$ be a function. What happens to $f$, on average, if we perturb its inputs by a small amount of Gaussian noise?
Let’s be more specific about our noise model. Let $x \in \real^d$ be an input to the function $f$ and fix a parameter $0 \le \varrho \le 1$ (think of $\varrho$ as close to 1). We’ll define the noise corruption of $x$ to be

(1)   \[\tilde{x}_\varrho = \varrho \cdot x + \sqrt{1-\varrho^2} \cdot g, \quad \text{where } g\sim \operatorname{Normal}(0,I). \]
Here, $\operatorname{Normal}(0,I)$ denotes the standard multivariate Gaussian distribution. In our definition of $\tilde{x}_\varrho$, we both add Gaussian noise $\sqrt{1-\varrho^2}\cdot g$ and shrink the vector $x$ by a factor $\varrho$. In particular, we highlight two extreme cases:
- No noise. If $\varrho = 1$, then there is no noise and $\tilde{x}_1 = x$.
- All noise. If $\varrho = 0$, then there is all noise and $\tilde{x}_0 = g$. The influence of the original vector $x$ has been washed away completely.
The noise corruption (1) immediately gives rise to the noise operator $T_\varrho$. Let $f:\real^d\to\real$ be a function. The noise operator $T_\varrho$ is defined to be

(2)   \[(T_\varrho f)(x) = \expect[f(\tilde{x}_\varrho)] = \expect_{g\sim \operatorname{Normal}(0,I)}[f( \varrho \cdot x + \sqrt{1-\varrho^2}\cdot g)]. \]
The noise operator computes the average value of $f$ when evaluated at the noisy input $\tilde{x}_\varrho$. Observe that the noise operator maps a function $f$ to another function $T_\varrho f$. Going forward, we will write $T_\varrho f(x)$ to denote $(T_\varrho f)(x)$.
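The noise operator is easy to approximate numerically. Here is a minimal Monte Carlo sketch (my own illustration; the helper name noise_operator is made up) that estimates $T_\varrho f(x)$ directly from the definition (2):

```python
import numpy as np

def noise_operator(f, x, rho, n_samples=100_000, seed=0):
    """Monte Carlo estimate of (T_rho f)(x) = E[f(rho*x + sqrt(1-rho^2)*g)]."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    g = rng.standard_normal((n_samples, x.size))        # g ~ Normal(0, I)
    noisy = rho * x + np.sqrt(1.0 - rho**2) * g          # the noise corruption (1)
    return np.mean([f(row) for row in noisy])

# Example: f(x) = |x|^2 in d = 3 dimensions.
f = lambda z: np.sum(z**2)
x0 = np.array([1.0, -2.0, 0.5])
for rho in [1.0, 0.9, 0.5, 0.0]:
    print(rho, noise_operator(f, x0, rho))
# For this f, the exact value is rho^2 * |x0|^2 + (1 - rho^2) * 3.
```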
To understand how the noise operator acts on a function $f$, we can write the expectation in the definition (2) as an integral:

\[T_\varrho f(x) = \int_{\real^d} f(\varrho x + y) \, \frac{1}{(2\pi (1-\varrho^2))^{d/2}}\mathrm{e}^{-\frac{|y|^2}{2(1-\varrho^2)}} \, \mathrm{d} y.\]
Here, $|y|$ denotes the (Euclidean) length of $y$. We see that $T_\varrho f$ is the convolution of $f$ with a Gaussian density (evaluated at the rescaled point $\varrho x$). Thus, $T_\varrho$ acts to smooth the function $f$.
See below for an illustration. The red solid curve is a function $f$, and the blue dashed curve is $T_\varrho f$.
As we decrease $\varrho$ from $1$ to $0$, the function $T_\varrho f$ is smoothed more and more. When we finally reach $\varrho = 0$, $T_0 f$ has been smoothed all the way into a constant.
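To see this smoothing concretely, here is a small numerical sketch (again my own illustration, with a made-up helper name) that evaluates the one-dimensional version of the convolution integral above using scipy, applied to the non-smooth function $f(x) = |x|$:

```python
import numpy as np
from scipy.integrate import quad

def T(f, x, rho):
    """Evaluate (T_rho f)(x) in d = 1 via the Gaussian convolution integral."""
    if rho == 1.0:
        return f(x)                       # no noise: T_1 f = f
    s2 = 1.0 - rho**2
    integrand = lambda y: f(rho * x + y) * np.exp(-y**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
    return quad(integrand, -np.inf, np.inf)[0]

f = np.abs                                 # a non-smooth test function
for rho in [1.0, 0.9, 0.5, 0.1]:
    print(rho, [round(T(f, x, rho), 3) for x in (-2.0, 0.0, 2.0)])
# As rho decreases, the values flatten toward the constant E[f(g)] = sqrt(2/pi) ~ 0.80.
```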
Random Inputs
The noise operator converts a function $f$ to another function $T_\varrho f$. We can evaluate these two functions at a Gaussian random vector $x\sim\operatorname{Normal}(0,I)$, resulting in two random variables $f(x)$ and $T_\varrho f(x)$.
We can think of $T_\varrho f(x)$ as a modification of the random variable $f(x)$ where “a $1-\varrho^2$ fraction of the variance of $x$ has been averaged out”. We again highlight the two extreme cases:
- No noise. If $\varrho = 1$, then $T_1 f(x) = f(x)$. None of the variance of $x$ has been averaged out.
- All noise. If $\varrho = 0$, then $T_0 f(x) = \expect[f(x)]$ is a constant random variable. All of the variance of $x$ has been averaged out.
Just as decreasing $\varrho$ smoothes the function $T_\varrho f$ until it reaches a constant function at $\varrho = 0$, decreasing $\varrho$ makes the random variable $T_\varrho f(x)$ more and more “well-behaved” until it becomes a constant random variable at $\varrho = 0$. This well-behavedness property of the noise operator is made precise by the Gaussian hypercontractivity theorem.
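Here is a quick numerical illustration of this variance-averaging picture (my own sketch, not from the post). For the specific choice $f(x) = |x|^2$, a direct calculation from the definition (2) gives the closed form $T_\varrho f(x) = \varrho^2|x|^2 + (1-\varrho^2)d$, which makes the variances easy to compare:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 500_000
x = rng.standard_normal((n, d))                # samples of x ~ Normal(0, I)
fx = np.sum(x**2, axis=1)                      # f(x) = |x|^2

for rho in [1.0, 0.7, 0.3, 0.0]:
    Tfx = rho**2 * fx + (1 - rho**2) * d       # closed form of T_rho f(x) for this f
    print(rho, round(np.var(fx), 3), round(np.var(Tfx), 3))
# Var(T_rho f(x)) = rho^4 * Var(f(x)) shrinks to zero as rho decreases to 0.
```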
Moments and Tails
In order to describe the well-behavedness properties of the noise operator, we must answer the question:

How can we measure how well-behaved a random variable is?

There are many answers to this question. For this post, we will quantify the well-behavedness of a random variable by using the $L_p$ norm.
The $L_p$ norm of a (real-valued) random variable $y$ is defined to be

(3)   \[\norm{y}_p \coloneqq \left( \expect[|y|^p] \right)^{1/p}.\]
The $p$th power of the $L_p$ norm, $\norm{y}_p^p = \expect[|y|^p]$, is sometimes known as the $p$th absolute moment of $y$.
The $L_p$ norms of a random variable control its tails, that is, the probability that the random variable is large in magnitude. A random variable with small tails is typically thought of as a “nice” or “well-behaved” random variable. Random quantities with small tails are usually desirable in applications, as they are more predictable, being unlikely to take large values.
The connection between tails and $L_p$ norms can be derived as follows. First, write the tail probability $\prob\{|y| \ge t\}$ for $t > 0$ using $p$th powers:

\[\prob \{|y| \ge t\} = \prob\{ |y|^p \ge t^p \}.\]

Then, we apply Markov’s inequality, obtaining

(4)   \[\prob \{|y| \ge t\} = \prob \{ |y|^p \ge t^p \} \le \frac{\expect [|y|^p]}{t^p} = \frac{\norm{y}_p^p}{t^p}.\]
We conclude that a random variable with finite $L_p$ norm (i.e., $\norm{y}_p < +\infty$) has tails that decay at a rate $t^{-p}$ or faster.
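As a quick sanity check of the tail bound (4) (my own numerical sketch), we can compare the bound with the empirical tail of a standard Gaussian at $t = 3$ for a few values of $p$:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(1_000_000)             # a standard Gaussian random variable
t = 3.0

print("empirical tail:", np.mean(np.abs(y) >= t))
for p in [2, 4, 8]:
    bound = np.mean(np.abs(y)**p) / t**p       # ||y||_p^p / t^p, the bound (4)
    print(p, bound)
# Larger p gives a sharper bound at this t, and every bound sits above the empirical tail.
```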
Gaussian Contractivity
Before we introduce the Gaussian hypercontractivity theorem, let’s establish a weaker property of the noise operator, contractivity.
Proposition 1 (Gaussian contractivity). Choose a noise level $0 \le \varrho \le 1$ and a power $p \ge 1$, and let $x\sim\operatorname{Normal}(0,I)$ be a Gaussian random vector. Then $T_\varrho$ contracts the $L_p$ norm of $f(x)$:

\[\norm{T_\varrho f(x)}_p \le \norm{f(x)}_p.\]
This result shows that the noise operator makes the random variable $T_\varrho f(x)$ no less nice than $f(x)$ was.
Gaussian contractivity is easy to prove. Begin by using the definition of the noise operator (2) and of the $L_p$ norm (3):

\[\norm{T_\varrho f(x)}_p^p = \expect_{x\sim \operatorname{Normal}(0,I)} \left[ \left|\expect_{g\sim \operatorname{Normal}(0,I)}[f(\varrho x + \sqrt{1-\varrho^2}\cdot g)]\right|^p\right].\]
Now, we can apply Jensen’s inequality to the convex function $|\cdot|^p$, obtaining

\[\norm{T_\varrho f(x)}_p^p \le \expect_{x,g\sim \operatorname{Normal}(0,I)} \left[ \left|f(\varrho x + \sqrt{1-\varrho^2}\cdot g)\right|^p\right].\]
Finally, realize that for the independent standard Gaussian random vectors $x, g \sim \operatorname{Normal}(0,I)$, we have

\[\varrho x + \sqrt{1-\varrho^2}\cdot g \sim \operatorname{Normal}(0,I).\]

That is, $\varrho x + \sqrt{1-\varrho^2}\cdot g$ has the same distribution as $x$. Thus, using $x$ in place of $\varrho x + \sqrt{1-\varrho^2}\cdot g$, we obtain

\[\norm{T_\varrho f(x)}_p^p \le \expect_{x\sim \operatorname{Normal}(0,I)} \left[ \left|f(x)\right|^p\right] = \norm{f(x)}_p^p.\]
Gaussian contractivity (Proposition 1) is proven.
Gaussian Hypercontractivity
The Gaussian contractivity theorem shows that $T_\varrho f(x)$ is no less well-behaved than $f(x)$ is. In fact, $T_\varrho f(x)$ is more well-behaved than $f(x)$ is. This is the content of the Gaussian hypercontractivity theorem:
Theorem 2 (Gaussian hypercontractivity). Choose a noise level $0 < \varrho \le 1$ and a power $p \ge 1$, and let $x\sim\operatorname{Normal}(0,I)$ be a Gaussian random vector. Then

\[\norm{T_\varrho f(x)}_{1+(p-1)/\varrho^2} \le \norm{f(x)}_p.\]

In particular, for $p = 2$,

\[\norm{T_\varrho f(x)}_{1+\varrho^{-2}} \le \norm{f(x)}_2.\]
We have highlighted the $p = 2$ case because it is the most useful in practice.
This result shows that as we take $\varrho$ smaller, the random variable $T_\varrho f(x)$ becomes more and more well-behaved, with tails controlled, via (4), by

\[\prob \{ |T_\varrho f(x)| \ge t \} \le \frac{\norm{T_\varrho f(x)}_{1+(p-1)/\varrho^2}^{1+(p-1)/\varrho^2}}{t^{1 + (p-1)/\varrho^2}} \le \frac{\norm{f(x)}_p^{1+(p-1)/\varrho^2}}{t^{1 + (p-1)/\varrho^2}}.\]

The rate of tail decrease becomes faster and faster as $\varrho$ becomes closer to zero.
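Before moving on, here is a small numerical check of the $p = 2$ case (my own sketch). It uses the one-dimensional test function $f(x) = x^3 - 3x$, for which expanding the expectation in (2) and using $\expect[g] = \expect[g^3] = 0$ and $\expect[g^2] = 1$ gives the closed form $T_\varrho f(x) = \varrho^3 f(x)$:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 1 / np.sqrt(3)                               # chosen so that q = 1 + rho^{-2} = 4
q = 1 + 1 / rho**2

f = lambda x: x**3 - 3 * x                         # for this f, T_rho f(x) = rho^3 * f(x)
x = rng.standard_normal(2_000_000)

lhs = np.mean(np.abs(rho**3 * f(x))**q)**(1 / q)   # ||T_rho f(x)||_{1 + rho^{-2}}
rhs = np.mean(f(x)**2)**0.5                        # ||f(x)||_2 (its exact value is sqrt(6))
print(lhs, rhs)                                    # hypercontractivity: lhs <= rhs
```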
We will prove the Gaussian hypercontractivity theorem at the bottom of this post. For now, we will focus on applying this result.
Multilinear Polynomials
A multilinear polynomial is a multivariate polynomial in the variables $x_1, \ldots, x_d$ in which none of the variables is raised to a power higher than one. So,

(5)   \[1+x_1x_2\]
is multilinear, but
\[1+x_1+x_1x_2^2\]

is not multilinear (since $x_2$ is squared).
For multilinear polynomials, we have the following very powerful corollary of Gaussian hypercontractivity:
Corollary 3 (Absolute moments of a multilinear polynomial of Gaussians). Let $f$ be a multilinear polynomial of degree $k$. (That is, at most $k$ variables $x_i$ occur in any monomial of $f$.) Then, for a Gaussian random vector $x\sim\operatorname{Normal}(0,I)$ and for all $q \ge 2$,

\[\norm{f(x)}_q \le (q-1)^{k/2} \norm{f(x)}_2.\]
Let’s prove this corollary. The first observation is that the noise operator has a particularly convenient form when applied to a multilinear polynomial. Let’s test it out on our example (5) from above. For

\[f(x) = 1+x_1x_2,\]

we have

\begin{align*}T_\varrho f(x) &= \expect_{g_1,g_2 \sim \operatorname{Normal}(0,1)} \left[1+ (\varrho x_1 + \sqrt{1-\varrho^2}\cdot g_1)(\varrho x_2 + \sqrt{1-\varrho^2}\cdot g_2)\right]\\&= 1 + \expect[\varrho x_1 + \sqrt{1-\varrho^2}\cdot g_1]\cdot\expect[\varrho x_2 + \sqrt{1-\varrho^2}\cdot g_2]\\&= 1+ (\varrho x_1)(\varrho x_2) \\&= f(\varrho x).\end{align*}
We see that the expectation applies to each variable separately, resulting in each $x_i$ being replaced by $\varrho x_i$. This trend holds in general:
Proposition 4 (noise operator on multilinear polynomials). For any multilinear polynomial $f$, $T_\varrho f(x) = f(\varrho x)$.
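Proposition 4 is easy to sanity-check numerically (my own sketch): a Monte Carlo estimate of $T_\varrho f(x_0)$ should match $f(\varrho x_0)$ up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.8, 500_000

def f(x):                                        # a degree-3 multilinear polynomial
    return 1 + 2 * x[..., 0] * x[..., 1] - x[..., 0] * x[..., 1] * x[..., 2]

x0 = np.array([0.3, -1.2, 0.7])
g = rng.standard_normal((n, 3))
T_f_at_x0 = np.mean(f(rho * x0 + np.sqrt(1 - rho**2) * g))   # Monte Carlo (T_rho f)(x0)
print(T_f_at_x0, f(rho * x0))                                 # these should agree
```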
We can use Proposition 4 to obtain bounds on the $L_q$ norms of multilinear polynomials of a Gaussian random variable. Indeed, observe that

\[f(x) = f(\varrho \cdot x/\varrho) = T_\varrho f(x/\varrho).\]

Thus, by Gaussian hypercontractivity (with $p = 2$), we have

\[\norm{f(x)}_{1+\varrho^{-2}}=\norm{T_\varrho f(x/\varrho)}_{1+\varrho^{-2}} \le \norm{f(x/\varrho)}_2.\]
The final step of our argument will be to compute $\norm{f(x/\varrho)}_2$. Write $f$ as

\[f(x) = \sum_{i_1,\ldots,i_s} a_{i_1,\ldots,i_s} x_{i_1} \cdots x_{i_s}.\]
Since $f$ is multilinear, $i_j \ne i_{j'}$ for $j \ne j'$. Since $f$ is degree-$k$, $s \le k$. The multilinear monomials $x_{i_1}\cdots x_{i_s}$ are orthonormal with respect to the $L_2$ inner product:

\[\expect[(x_{i_1}\cdots x_{i_s}) \cdot (x_{i_1'}\cdots x_{i_{s'}'})] = \begin{cases} 0, &\text{if } \{i_1,\ldots,i_s\} \ne \{i_1',\ldots,i_{s'}'\}, \\ 1, & \text{otherwise}.\end{cases}\]
(See if you can see why!) Thus, by the Pythagorean theorem, we have

\[\norm{f(x)}_2^2 = \sum_{i_1,\ldots,i_s} a_{i_1,\ldots,i_s}^2.\]
Similarly, the coefficients of $f(x/\varrho)$ are $\varrho^{-s} a_{i_1,\ldots,i_s}$. Thus,

\[\norm{f(x/\varrho)}_2^2 = \sum_{i_1,\ldots,i_s} \varrho^{-2s} a_{i_1,\ldots,i_s}^2 \le \varrho^{-2k} \sum_{i_1,\ldots,i_s} a_{i_1,\ldots,i_s}^2 = \varrho^{-2k}\norm{f(x)}_2^2.\]
Thus, putting all of the ingredients together, we have

\[\norm{f(x)}_{1+\varrho^{-2}}=\norm{T_\varrho f(x/\varrho)}_{1+\varrho^{-2}} \le \norm{f(x/\varrho)}_2 \le \varrho^{-k} \norm{f(x)}_2.\]
Setting $q = 1+\varrho^{-2}$ (equivalently, $\varrho = (q-1)^{-1/2}$, so that $\varrho^{-k} = (q-1)^{k/2}$), Corollary 3 follows.
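Here is a quick numerical check of Corollary 3 (my own sketch) for the degree-two multilinear polynomial $f(x) = x_1x_2$, whose $L_2$ norm is $1$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1_000_000, 2))
fx = x[:, 0] * x[:, 1]                           # f(x) = x_1 * x_2, degree-2 multilinear

norm2 = np.mean(fx**2)**0.5                      # ||f(x)||_2, close to 1
for q in [3, 4, 6]:
    norm_q = np.mean(np.abs(fx)**q)**(1 / q)     # ||f(x)||_q
    print(q, norm_q, (q - 1) * norm2)            # Corollary 3 with k = 2: left <= right
```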
Hanson–Wright Inequality
To see the power of the machinery we have developed, let’s prove a version of the Hanson–Wright inequality.
Theorem 5 (suboptimal Hanson–Wright). Let $A$ be a symmetric matrix with zeros on its diagonal and let $x\sim\operatorname{Normal}(0,I)$ be a Gaussian random vector. Then

\[\prob \{|x^\top A x| \ge t \} \le \exp\left(- \frac{t}{\sqrt{2}\mathrm{e}\norm{A}_{\rm F}} \right) \quad \text{for } t\ge \sqrt{2}\mathrm{e}\norm{A}_{\rm F}.\]
Hanson–Wright has all sorts of applications in computational mathematics and data science. One direct application is to obtain probabilistic error bounds for the error incurred by stochastic trace estimation formulas.
This version of Hanson–Wright is not perfect. In particular, it does not capture the Bernstein-type tail behavior of the classical Hanson–Wright inequality

\[\prob\{|x^\top Ax| \ge t\} \le 2\exp \left( -\frac{t^2}{4\norm{A}_{\rm F}^2+4\norm{A}t} \right).\]
But our suboptimal Hanson–Wright inequality is still pretty good, and it requires essentially no work to prove using the hypercontractivity machinery. The hypercontractivity technique also generalizes to settings where some of the proofs of Hanson–Wright fail, such as multilinear polynomials of degree higher than two.
Let’s prove our suboptimal Hanson–Wright inequality. Set $f(x) \coloneqq x^\top A x$. Since $A$ has zeros on its diagonal, $f$ is a multilinear polynomial of degree two in the entries of $x$. The random variable $f(x)$ is mean-zero, and a short calculation shows its $L_2$ norm is

\[\norm{f(x)}_2 = \sqrt{\Var(f(x))} = \sqrt{2} \norm{A}_{\rm F}.\]
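For completeness, here is one way of filling in that short calculation (my own writing of the step). Because the entries of $x$ are independent with mean zero and variance one, $\expect[x_ix_jx_kx_\ell]$ vanishes for $i\ne j$ and $k\ne\ell$ unless $\{i,j\}=\{k,\ell\}$, so

\begin{align*}\expect[x^\top A x] &= \sum_{i\ne j} a_{ij}\,\expect[x_ix_j] = 0,\\ \expect[(x^\top A x)^2] &= \sum_{i\ne j}\sum_{k\ne \ell} a_{ij}a_{k\ell}\,\expect[x_ix_jx_kx_\ell] = \sum_{i\ne j} a_{ij}(a_{ij}+a_{ji}) = 2\sum_{i\ne j} a_{ij}^2 = 2\norm{A}_{\rm F}^2,\end{align*}

where the last equality uses the symmetry of $A$ and the fact that its diagonal is zero.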
Thus, by Corollary 3 (applied with $k = 2$),

(6)   \[\norm{f(x)}_q \le (q-1) \norm{f(x)}_2 \le \sqrt{2} q \norm{A}_{\rm F} \quad \text{for every } q\ge 2. \]
In fact, since the $L_q$ norms are monotone in $q$, (6) holds for $1 \le q \le 2$ as well. Therefore, the standard tail bound for $L_q$ norms (4) gives

(7)   \[\prob \{|x^\top A x| \ge t \} \le \frac{\norm{f(x)}_q^q}{t^q} \le \left( \frac{\sqrt{2}q\norm{A}_{\rm F}}{t} \right)^q\quad \text{for }q\ge 1.\]
Now, we must optimize the value of $q$ to obtain the sharpest possible bound. To make this optimization more convenient, introduce a parameter

\[\alpha \coloneqq \frac{\sqrt{2}q\norm{A}_{\rm F}}{t}.\]
In terms of the parameter $\alpha$, the bound (7) reads

\[\prob \{|x^\top A x| \ge t \} \le \exp\left(- \frac{t}{\sqrt{2}\norm{A}_{\rm F}} \alpha \ln \frac{1}{\alpha} \right) \quad \text{for } t\ge \frac{\sqrt{2}\norm{A}_{\rm F}}{\alpha}.\]
The tail bound is minimized by taking $\alpha = 1/\mathrm{e}$, the maximizer of $\alpha\ln(1/\alpha)$ (where it attains the value $1/\mathrm{e}$), yielding the claimed result

\[\prob \{|x^\top A x| \ge t \} \le \exp\left(- \frac{t}{\sqrt{2}\mathrm{e}\norm{A}_{\rm F}} \right) \quad \text{for } t\ge \sqrt{2}\mathrm{e}\norm{A}_{\rm F}.\]
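As a sanity check (my own numerical sketch, not part of the proof), we can compare the bound of Theorem 5 against the empirical tail of $x^\top A x$ for a random symmetric matrix with zero diagonal:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 200_000

A = rng.standard_normal((d, d))
A = (A + A.T) / 2                                 # symmetrize ...
np.fill_diagonal(A, 0)                            # ... and zero out the diagonal
normF = np.linalg.norm(A, "fro")

x = rng.standard_normal((n, d))
quad = np.einsum("ni,ij,nj->n", x, A, x)          # x^T A x for each sample of x

for factor in [1.0, 1.5, 2.0]:
    t = factor * np.sqrt(2) * np.e * normF        # thresholds in the valid range
    empirical = np.mean(np.abs(quad) >= t)
    bound = np.exp(-t / (np.sqrt(2) * np.e * normF))
    print(round(t, 1), empirical, round(bound, 3))   # empirical tail <= bound
```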
Proof of Gaussian Hypercontractivity
Let’s prove the Gaussian hypercontractivity theorem. For simplicity, we will stick with the $d = 1$ case, but the higher-dimensional generalizations follow along similar lines. The key ingredient will be the Gaussian Jensen inequality, which made a prominent appearance in a previous blog post of mine. Here, we will only need the following version:
Theorem 6 (Gaussian Jensen). Let $b:\real^2\to\real$ be a twice differentiable function and let $(x,\tilde{x})$ be jointly Gaussian random variables with covariance matrix $\Sigma$. Then

(8)   \[b(\expect[h_1(x)], \expect[h_2(\tilde{x})]) \ge \expect [b(h_1(x),h_2(\tilde{x}))]\]

holds for all test functions $h_1$ and $h_2$ if, and only if,

(9)   \[\Sigma \circ \nabla^2 b \quad\text{is negative semidefinite on all of $\real^2$}.\]
Here, $\circ$ denotes the entrywise product of matrices and $\nabla^2 b$ is the Hessian matrix of the function $b$.
To me, this proof of Gaussian hypercontractivity using Gaussian Jensen (adapted from Paata Ivanishvili’s excellent post) is amazing. First, we reformulate the Gaussian hypercontractivity property a couple of times using some functional analysis tricks. Then we do a short calculation, invoke Gaussian Jensen, and the theorem is proved, almost as if by magic.
Part 1: Tricks
Let’s begin with the “tricks” part of the argument.
Trick 1. To prove Gaussian hypercontractivity holds for all functions $f$, it is sufficient to prove it for all nonnegative functions $f \ge 0$.
Indeed, suppose Gaussian hypercontractivity holds for all nonnegative functions $f \ge 0$. Then, for any function $f$, apply Jensen’s inequality to conclude

\[|T_\varrho f(x)| = \left|\expect[f(\tilde{x}_\varrho)]\right| \le \expect[|f|(\tilde{x}_\varrho)] = T_\varrho |f|(x).\]

Thus, assuming hypercontractivity holds for the nonnegative function $|f|$, we have

\[\norm{T_\varrho f(x)}_{1+(p-1)/\varrho^2} \le \norm{T_\varrho |f|(x)}_{1+(p-1)/\varrho^2} \le \norm{|f|(x)}_p = \norm{f(x)}_p.\]
Thus, the conclusion of the hypercontractivity theorem holds for $f$ as well, and Trick 1 is proven.
Trick 2. To prove Gaussian hypercontractivity for all nonnegative functions $f \ge 0$, it is sufficient to prove the following “bilinearized” Gaussian hypercontractivity result:

\[\expect[g(x) \cdot T_\varrho f(x)]\le \norm{g(x)}_{q'} \norm{f(x)}_p\]

holds for all nonnegative functions $f, g \ge 0$ with $\norm{g(x)}_{q'} < +\infty$. Here, $q' \coloneqq q/(q-1)$ is the Hölder conjugate to $q \coloneqq 1+(p-1)/\varrho^2$.
Indeed, this follows from the dual characterization of the $L_q$ norm of $T_\varrho f(x)$:

\[\norm{T_\varrho f(x)}_q = \sup_{\substack{\norm{g(x)}_{q'} < +\infty \\ g\ge 0}} \frac{\expect[g(x) \cdot T_\varrho f(x)]}{\norm{g(x)}_{q'}}.\]
Trick 2 is proven.
Trick 3. Let $(x,\tilde{x})$ be a pair of standard Gaussian random variables with correlation $\varrho$. Then the bilinearized Gaussian hypercontractivity statement is equivalent to

\[\expect[g(x) f(\tilde{x})]\le (\expect[g(x)^{q'}])^{1/q'} (\expect[f(\tilde{x})^{p}])^{1/p}.\]
Indeed, define $\tilde{x} \coloneqq \varrho x + \sqrt{1-\varrho^2}\cdot w$, where $w\sim\operatorname{Normal}(0,1)$ is the Gaussian noise appearing in the definition (2) of the noise operator (renamed here to avoid a clash with the test function $g$). The random variable $\tilde{x}$ is standard Gaussian and has correlation $\varrho$ with $x$. Averaging over $w$ with $x$ held fixed shows that $\expect[g(x) f(\tilde{x})] = \expect[g(x)\cdot T_\varrho f(x)]$, and since $\tilde{x}$ has the same distribution as $x$, $(\expect[f(\tilde{x})^p])^{1/p} = \norm{f(x)}_p$, concluding the proof of Trick 3.
Finally, we apply a change of variables as our last trick:
Trick 4. Make the change of variables $u \coloneqq f^p$ and $v \coloneqq g^{q'}$, yielding the final equivalent version of Gaussian hypercontractivity:

\[\expect[v(x)^{1/q'} u(\tilde{x})^{1/p}]\le (\expect[v(x)])^{1/q'} (\expect[u(\tilde{x})])^{1/p}\]

for all nonnegative functions $u$ and $v$ (in the appropriate spaces).
Part 2: Calculation
We recognize this fourth equivalent version of Gaussian hypercontractivity as the conclusion (8) of Gaussian Jensen with

\[b(u,v) = u^{1/p}v^{1/q'}.\]

Thus, to prove Gaussian hypercontractivity, we just need to check the hypothesis (9) of the Gaussian Jensen inequality (Theorem 6).
We now enter the calculation part of the proof. First, we compute the Hessian of $b$:

\[\nabla^2 b(u,v) = u^{1/p}v^{1/q'}\cdot\begin{bmatrix} - \frac{1}{pp'} u^{-2} & \frac{1}{pq'} u^{-1}v^{-1} \\ \frac{1}{pq'} u^{-1}v^{-1} & - \frac{1}{qq'} v^{-2}\end{bmatrix}.\]

We have written $p' \coloneqq p/(p-1)$ for the Hölder conjugate to $p$. By Gaussian Jensen, to prove Gaussian hypercontractivity, it suffices to show that
\[\nabla^2 b(u,v)\circ \begin{bmatrix} 1 & \varrho \\ \varrho & 1 \end{bmatrix}= u^{1/p}v^{1/q'}\cdot\begin{bmatrix} - \frac{1}{pp'} u^{-2} & \frac{\varrho}{pq'} u^{-1}v^{-1} \\ \frac{\varrho}{pq'} u^{-1}v^{-1} & - \frac{1}{qq'} v^{-2}\end{bmatrix}\]

is negative semidefinite for all $u, v > 0$. There are a few ways we can make our lives easier. Write this matrix as

\[\nabla^2 b(u,v)\circ \begin{bmatrix} 1 & \varrho \\ \varrho & 1 \end{bmatrix}= u^{1/p}v^{1/q'}\cdot B^\top\begin{bmatrix} - \frac{p}{p'} & \varrho \\ \varrho & - \frac{q'}{q} \end{bmatrix}B \quad \text{for } B = \operatorname{diag}(p^{-1}u^{-1},(q')^{-1}v^{-1}).\]
Scaling by the nonnegative quantity $u^{1/p}v^{1/q'}$ and conjugation by $B$ both preserve negative semidefiniteness, so it is sufficient to prove that

\[H = \begin{bmatrix} - \frac{p}{p'} & \varrho \\ \varrho & - \frac{q'}{q} \end{bmatrix} \quad \text{is negative semidefinite}.\]
Since the diagonal entries of $H$ are negative, at least one of $H$’s eigenvalues is negative. Therefore, to prove $H$ is negative semidefinite, we can prove that its determinant (the product of its eigenvalues) is nonnegative. We compute

\[\det H = \frac{pq'}{p'q} - \varrho^2 .\]
Now, just plug in the values for $p'$, $q$, and $q'$:

\[\det H = \frac{pq'}{p'q} - \varrho^2 = \frac{p-1}{q-1} - \varrho^2 = \frac{p-1}{(p-1)/\varrho^2} - \varrho^2 = 0.\]
Thus, $\det H = 0 \ge 0$. We conclude $H$ is negative semidefinite, proving the Gaussian hypercontractivity theorem.
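As a final sanity check (my own sketch), the determinant computation can also be verified symbolically with sympy:

```python
import sympy as sp

p, rho = sp.symbols("p rho", positive=True)
q = 1 + (p - 1) / rho**2                      # the hypercontractive exponent
pprime = p / (p - 1)                          # Holder conjugate of p
qprime = q / (q - 1)                          # Holder conjugate of q

H = sp.Matrix([[-p / pprime, rho], [rho, -qprime / q]])
print(sp.simplify(H.det()))                   # prints 0, confirming det H = 0
```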