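The Cauchy–Schwarz inequality has many proofs. Here is my favorite, taken from Chapter 3 of The Schur complement and its applications; the book is edited by Fuzhen Zhang, and this chapter was contributed by him as well. Let $x, y \in \mathbb{R}^n$ be vectors, assemble the matrix $B = \begin{bmatrix} x & y \end{bmatrix}$, and form the Gram matrix

$$A = B^\top B = \begin{bmatrix} \|x\|^2 & x^\top y \\ y^\top x & \|y\|^2 \end{bmatrix}.$$

Since $A$ is a Gram matrix, it is positive semidefinite. Therefore, its determinant is nonnegative: $\det(A) = \|x\|^2 \|y\|^2 - (x^\top y)^2 \ge 0$. Rearranging gives the Cauchy–Schwarz inequality $|x^\top y| \le \|x\| \, \|y\|$.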
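As a numerical sanity check of the Gram-matrix argument, here is a minimal sketch in plain Python. It assumes real vectors $x, y$ stored as lists, takes $B = [x \; y]$ and $A = B^\top B$, and checks that $\det(A) = \|x\|^2\|y\|^2 - (x^\top y)^2$ is nonnegative up to roundoff. The function names and the tolerance are illustrative choices of mine, not part of the book's proof.

```python
import random


def gram_entries(x, y):
    """Return (||x||^2, ||y||^2, x^T y), the entries of A = B^T B for B = [x y]."""
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    xy = sum(a * b for a, b in zip(x, y))
    return xx, yy, xy


def gram_determinant(x, y):
    """Determinant of the 2x2 Gram matrix: ||x||^2 ||y||^2 - (x^T y)^2."""
    xx, yy, xy = gram_entries(x, y)
    return xx * yy - xy * xy


def check_random_pairs(trials=1000, seed=0):
    """Check det(A) >= 0 (up to roundoff) on random Gaussian vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 10)
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xx, yy, xy = gram_entries(x, y)
        # Small relative tolerance: in floating point the determinant can dip
        # slightly below zero when x and y are (nearly) parallel.
        if xx * yy - xy * xy < -1e-12 * xx * yy:
            return False
    return True


print(check_random_pairs())
```

The determinant is exactly zero when $x$ and $y$ are parallel, which is the equality case of Cauchy–Schwarz. That is why the check needs a small tolerance rather than a strict comparison with zero.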




I like this proof because it is perhaps the simplest example of the (block) matrix technique for proving inequalities. Using this technique, one proves inequalities about scalars (or matrices) by embedding them in a clever way into a (larger) matrix.

Here is another example of the matrix technique, adapted from the proof of Theorem 12.9 of these lecture notes by Joel Tropp. Jensen’s inequality is a far-reaching and very useful inequality in probability theory. Here is one special case of the inequality.
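Proposition (Jensen’s inequality for the inverse): Let $x_1, \ldots, x_n$ be strictly positive numbers. Then the inverse of their average is no bigger than the average of their inverses:

$$\left( \frac{1}{n} \sum_{i=1}^n x_i \right)^{-1} \le \frac{1}{n} \sum_{i=1}^n \frac{1}{x_i}.$$

To prove this result, embed each $x_i$ into a $2\times 2$ positive semidefinite matrix $\begin{bmatrix} x_i & 1 \\ 1 & 1/x_i \end{bmatrix}$. Taking the average of all such matrices, we observe that

$$A = \begin{bmatrix} \frac{1}{n} \sum_{i=1}^n x_i & 1 \\ 1 & \frac{1}{n} \sum_{i=1}^n \frac{1}{x_i} \end{bmatrix}$$

is positive semidefinite as well. Thus, its determinant is nonnegative: $\det(A) = \left( \frac{1}{n} \sum_{i=1}^n x_i \right) \left( \frac{1}{n} \sum_{i=1}^n \frac{1}{x_i} \right) - 1 \ge 0$. Rearranging gives the stated inequality.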
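The averaging argument for Jensen's inequality for the inverse can be checked numerically. The sketch below, in plain Python, assumes strictly positive numbers $x_1, \ldots, x_n$ and the embedding of each $x_i$ into the $2\times 2$ matrix with rows $(x_i, 1)$ and $(1, 1/x_i)$. It averages those matrices and tests that the determinant of the average is nonnegative. The helper names and the sample values are hypothetical and serve only as illustration.

```python
def averaged_embedding(xs):
    """Average of the 2x2 matrices [[x, 1], [1, 1/x]] over all x in xs.

    Each such matrix has determinant x * (1/x) - 1 = 0 and positive diagonal,
    so it is positive semidefinite; the average is therefore also PSD.
    """
    n = len(xs)
    mean = sum(xs) / n
    mean_inv = sum(1.0 / x for x in xs) / n
    return [[mean, 1.0], [1.0, mean_inv]]


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


xs = [1.0, 2.0, 4.0]
A = averaged_embedding(xs)

# det(A) = mean * mean_inv - 1 >= 0 is equivalent to 1 / mean <= mean_inv.
print(det2(A) >= 0.0)
print(1.0 / A[0][0] <= A[1][1])
```

When all the $x_i$ are equal, the determinant is zero and the inequality holds with equality.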
The matrix technique for proving inequalities is very powerful. Check out Chapter 3 of The Schur complement and its applications for many more examples.
I recall that many basic results in linear algebra (possibly including the basic results surrounding positive semidefiniteness) require Cauchy–Schwarz to prove, which makes me wonder whether there is circular reasoning at work in this proof.
It’s a good question! I am pretty sure the proof can be made non-circular. You can prove the spectral theorem without Cauchy–Schwarz (basically by going through the Schur decomposition and proving that a unitary triangularization of a Hermitian matrix must be a diagonalization), and from there you obtain that the determinant of a Hermitian matrix is the product of its eigenvalues and that $A$ is positive semidefinite if and only if $x^\top A x \ge 0$ for all vectors $x$. Therefore, the Gram matrix $A = B^\top B$ is positive semidefinite because $x^\top Ax = \sum_{i=1}^n (Bx)_i^2 \ge 0$. I think that establishes all of the results you need for this proof of Cauchy–Schwarz to go through.
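To spell out the last step from the quadratic form to the determinant, here is a short sketch for the real case. The symbols $\lambda_1, \lambda_2$ (the eigenvalues of the $2\times 2$ Gram matrix $A = B^\top B$) and $v_1, v_2$ (corresponding unit eigenvectors) are notation introduced here for illustration.

```latex
\lambda_j = v_j^\top A v_j = (B v_j)^\top (B v_j) = \|B v_j\|^2 \ge 0
\quad (j = 1, 2)
\qquad\Longrightarrow\qquad
\det(A) = \lambda_1 \lambda_2 \ge 0 .
```

Neither step seems to need Cauchy–Schwarz: the first uses only that a sum of squares is nonnegative, and the second uses the spectral theorem.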