The Schur product theorem states that the entrywise product $A\circ M$ of two positive semidefinite matrices is also positive semidefinite. This post will present every proof I know for this theorem, and I intend to edit it to add additional proofs if I learn of them. (Please reach out if you know another!) My goal in this post is to be short and sweet, so I will assume familiarity with many properties for positive semidefinite matrices.

For this post, a matrix $A\in\real^{n\times n}$ is positive semidefinite (psd, for short) if it is symmetric and satisfies $x^\top Ax\ge 0$ for all vectors $x\in\real^n$ . All matrices in this post are real, though the proofs we’ll consider also extend to complex matrices. The entrywise product will be denoted $\circ$ and is defined as $(A\circ M)_{ij} = A_{ij}M_{ij}$ . The entrywise product is also known as the Hadamard product or Schur product.

It is also true that the entrywise product of two positive definite matrices is positive definite. The interested reader may be interested in seeing which of the proofs also yield this result.

Proof 1: Trace formula

We start by computing $x^\top (A\circ M)x$ :

$x^\top (A\circ M)x = \sum_{i,j=1}^n x_i (A\circ M)_{ij} x_j = \sum_{i,j=1}^n x_i A_{ij} M_{ij} x_j.$

Now, we may rearrange the sum, use symmetry of $M$ , and repackage it as a trace

$x^\top (A\circ M)x = \sum_{i,j=1}^n x_i A_{ij} x_j M_{ji} = \tr(\operatorname{diag}(x) A \operatorname{diag}(x) M).$

This the trace formula for quadratic forms in the Schur product.

Recall that a matrix $A$ is psd if and only if it $A$ is a Gram matrix (able to be expressed as $A = B^\top B$ ). Thus, we may write $A = B^\top B$ and $M = C^\top C$ . Substituting these expressions in the trace formula and invoking the cyclic property of the trace, we get

$x^\top (A\circ M)x = \tr(\operatorname{diag}(x) B^\top B \operatorname{diag}(x) C^\top C) = \tr(C\operatorname{diag}(x) B^\top B \operatorname{diag}(x) C^\top).$

The matrix on the right-hand side has the expression

$C\operatorname{diag}(x) B^\top B \operatorname{diag}(x) C^\top = G^\top G \quad \text{for } G = B \operatorname{diag}(x) C^\top.$

Therefore, it is psd and so its trace is psd:

$x^\top (A\circ M)x = \tr(G^\top G) \ge 0.$

We have shown $x^\top (A\circ M)x\ge 0$ for every vector $x$ , so $A\circ M$ is psd.

Proof 2: Gram matrix

Since $A$ and $M$ are psd, they may be written as $A = B^\top B$ and $M = C^\top C$ . Letting $b_i^\top$ and $c_i^\top$ denote the $i$ th rows of $B$ and $C$ , we have

$A = \sum_i b_ib_i^\top \quad \text{and} \quad M = \sum_j c_jc_j^\top.$

Computing the Schur product and distributing, we have

$A\circ M = \sum_{i,j} (b_ib_i^\top \circ c_jc_j^\top).$

The Schur product of rank-one matrices $b_ib_i^\top$ and $c_jc_j^\top$ is, by direct computation, $(b_i\circ c_j)(b_i\circ c_j)^\top$ . Thus,

$A\circ M = \sum_{i,j} (b_i\circ c_j)(b_i\circ c_j)^\top$

is a sum of (rank-one) psd matrices and is thus psd.

Proof 3: Covariances

Let $x$ and $y$ be independent random vectors with zero mean and covariance matrices $A$ and $M$ . The vector $x\circ y$ is seen to have zero mean as well. Thus, the $ij$ entry of the covariance matrix $\Cov(x\circ y)$ of $x\circ y$ is

$\expect[x_iy_ix_jy_j] = \expect[x_ix_j] \expect[y_iy_j] = A_{ij} M_{ij} = (A\circ M)_{ij}.$

The second equality is the independence of $x$ and $y$ , and the third equality uses the fact that $A$ and $M$ are the covariance matrices of $x$ and $y$ . Thus, the covariance matrix of $x\circ y$ is $A\circ M$ . All covariance matrices are psd, so $A\circ M$ is psd as well.¹