description/proof of that any antisymmetric real matrix can be block-diagonalized by an orthogonal matrix
Topics
About: matrices space
The table of contents of this article
Starting Context
- The reader knows a definition of %ring name% matrices space.
- The reader knows a definition of Gram-Schmidt orthonormalization of countable subset of vectors space with inner product.
- The reader admits the proposition that for any commutative ring, the transpose of the product of any matrices is the product of the transposes of the constituents in the reverse order.
Target Context
- The reader will have a description and a proof of the proposition that any antisymmetric real matrix can be block-diagonalized by an orthogonal matrix.
Orientation
There is a list of definitions discussed so far in this site.
There is a list of propositions discussed so far in this site.
Main Body
1: Structured Description
Here are the rules of Structured Description.
Entities:
\(M\): \(\in \{\text{ the } n \times n \text{ real antisymmetric matrices } \}\)
//
Statements:
\(\exists O \in \{\text{ the } n \times n \text{ orthogonal matrices }\} (O^t M O = \begin{pmatrix} 0 & \sqrt{\lambda_1} & 0 & ... & & & & 0 \\ - \sqrt{\lambda_1} & 0 & 0 & ... & & & & 0 \\ 0 & 0 & 0 & \sqrt{\lambda_3} & ... & & & 0 \\ 0 & 0 & - \sqrt{\lambda_3} & 0 & ... & & & 0 \\ ... \\ 0 & ... & & 0 & \sqrt{\lambda_{2 m - 1}} & 0 & ... & 0 \\ 0 & ... & & - \sqrt{\lambda_{2 m - 1}} & 0 & 0 & ... & 0 \\ 0 & ... & & & & 0 & ... & 0 \\ ... \\ 0 & ... & & & & & & 0 \end{pmatrix})\), where \((\lambda_1, ..., \lambda_{2 m}, 0, ..., 0)\) are the eigenvalues of \(M^t M\) ordered decreasingly, with \(0 \lt \lambda_j\) and \(\lambda_{2 r + 1} = \lambda_{2 r + 2}\) for each \(r \in \{0, ..., m - 1\}\)
//
2: Note
"block-diagonalized" means that the result has some diagonal blocks (which means blocks at diagonal positions, not blocks having diagonal shapes), each of which is \(\begin{pmatrix} 0 & \sqrt{\lambda_j} \\ - \sqrt{\lambda_j} & 0 \end{pmatrix}\) or \(\begin{pmatrix} 0 \end{pmatrix}\), with the other components \(0\).
Obviously, no nonzero antisymmetric matrix can be diagonalized by an orthogonal matrix: \(O^t M O\) is antisymmetric, so its diagonal components are all \(0\); so, if \(O^t M O\) were diagonal, it would be the \(0\) matrix, and \(M = O O^t M O O^t = 0\).
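The Note's reasoning can be checked numerically. The following is a minimal sketch assuming Python with NumPy (the particular matrices and the random seed are ad-hoc choices, not part of the proposition): for an antisymmetric \(M\) and any orthogonal \(O\), \(O^t M O\) is again antisymmetric, so its diagonal is all \(0\)s.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
M = A - A.T                          # an ad-hoc 3 x 3 antisymmetric matrix

# a random orthogonal matrix, obtained via QR decomposition
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))

B = Q.T @ M @ Q
assert np.allclose(B, -B.T)          # O^t M O is again antisymmetric,
assert np.allclose(np.diag(B), 0.0)  # so its diagonal entries are all 0
```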
3: Proof
Whole Strategy: Step 1: see that \(M^t M\) is symmetric and has the eigenvalues ordered decreasingly (with any duplications), \((\lambda_1, ..., \lambda_k, 0, ..., 0)\), where \(0 \lt \lambda_j\), with some eigenvectors, \(e_1, ..., e_n\); Step 2: see that \(M e_j\) is an eigenvector for \(\lambda_j\) orthogonal to \(e_j\), so, \((e_j, M e_j)\) forms a pair for the same \(\lambda_j\); Step 3: take some orthonormal eigenvectors of \(M^t M\), \((O_1, ..., O_{2 m}, O_{2 m + 1}, ..., O_n)\), with the eigenvalues, \((\lambda_1, ..., \lambda_{2 m}, 0, ..., 0)\); Step 4: take \(O\) as \(\begin{pmatrix} O_1 & ... & O_n \end{pmatrix}\); Step 5: see that \(O^t M O\) is as is demanded.
Step 1:
\(M^t M\) is symmetric, because \((M^t M)^t = M^t {M^t}^t\), by the proposition that for any commutative ring, the transpose of the product of any matrices is the product of the transposes of the constituents in the reverse order, \(= M^t M\).
So, \(M^t M\) has the eigenvalues (ordered decreasingly for our convenience), \((\lambda_1, ..., \lambda_n)\), with any duplications, with some eigenvectors, \((e_1, ..., e_n)\), as is well known.
Let us see that \(0 \le \lambda_j\) for each \(j \in \{1, ..., n\}\).
\(M^t M e_j = \lambda_j e_j\).
\({e_j}^t M^t M e_j = (M e_j)^t M e_j = \Vert M e_j \Vert^2\), which is non-negative.
But \({e_j}^t M^t M e_j = {e_j}^t \lambda_j e_j = \lambda_j {e_j}^t e_j = \lambda_j \Vert e_j \Vert^2\).
So, \(0 \le \lambda_j \Vert e_j \Vert^2\), which implies that \(0 \le \lambda_j\).
So, \((\lambda_1, ..., \lambda_n) = (\lambda_1, ..., \lambda_k, 0, ..., 0)\) where \(0 \lt \lambda_j\), where the "\(0, ..., 0\)" part does not really exist when \(k = n\).
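The two claims of Step 1 (symmetry of \(M^t M\) and non-negativity of its eigenvalues) can be sanity-checked numerically; this is a minimal sketch assuming Python with NumPy, with an ad-hoc random test matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
M = A - A.T                      # an ad-hoc antisymmetric test matrix

S = M.T @ M
assert np.allclose(S, S.T)       # M^t M is symmetric

lams = np.linalg.eigvalsh(S)     # real eigenvalues, in ascending order
assert np.all(lams >= -1e-10)    # all non-negative, up to rounding
```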
Step 2:
Let \(j \in \{1, ..., k\}\) be any.
Let us see that \(M e_j\) is an eigenvector for \(\lambda_j\) orthogonal to \(e_j\).
\(M^t M (M e_j) = (- M) M M e_j = M (- M) M e_j = M (M^t M e_j) = M (\lambda_j e_j) = \lambda_j (M e_j)\), using \(M^t = - M\).
On the other hand, \(M (M e_j) = - ((- M) M e_j) = - (M^t M e_j) = - \lambda_j e_j \neq 0\), because \(0 \lt \lambda_j\) and \(e_j \neq 0\), which implies that \(M e_j \neq 0\).
\(e_j = - 1 / \lambda_j M (M e_j)\).
\({e_j}^t (M e_j) = (- 1 / \lambda_j M (M e_j))^t M e_j = - 1 / \lambda_j (M (M e_j))^t M e_j = - 1 / \lambda_j (M e_j)^t M^t M e_j = - 1 / \lambda_j (M e_j)^t \lambda_j e_j = - (M e_j)^t e_j = - ((M e_j)^t e_j)^t\), because the transpose of any scalar is the scalar, \(= - {e_j}^t ((M e_j)^t)^t = - {e_j}^t (M e_j)\), which implies that \({e_j}^t (M e_j) = 0\).
So, \(M e_j\) is an eigenvector for \(\lambda_j\) orthogonal to \(e_j\).
So, \(\{e_j, M e_j\}\) is linearly independent, and \((e_j, M e_j)\) forms a pair of eigenvectors for \(\lambda_j\).
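Step 2's claims can likewise be checked on an ad-hoc example (a sketch assuming Python with NumPy; for a generic even-dimensional antisymmetric matrix, the largest eigenvalue of \(M^t M\) is positive):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
M = A - A.T                      # an ad-hoc antisymmetric matrix
S = M.T @ M

lams, vecs = np.linalg.eigh(S)   # ascending eigenvalues
lam, e = lams[-1], vecs[:, -1]   # a positive eigenvalue and its eigenvector
assert lam > 0

Me = M @ e
assert np.allclose(S @ Me, lam * Me)  # M e is an eigenvector for the same lambda_j
assert abs(e @ Me) < 1e-10            # and it is orthogonal to e
```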
Step 3:
\(O_1 := e_1 / \Vert e_1 \Vert\) is a normal eigenvector for \(\lambda_1\).
Let us take \(O_2 := - 1 / \sqrt{\lambda_1} M O_1\), which is an eigenvector for \(\lambda_1\) orthogonal to \(O_1\) by Step 2.
\({O_2}^t O_2 = (- 1 / \sqrt{\lambda_1} M O_1)^t (- 1 / \sqrt{\lambda_1} M O_1) = 1 / \lambda_1 (M O_1)^t M O_1 = 1 / \lambda_1 {O_1}^t M^t M O_1 = 1 / \lambda_1 {O_1}^t \lambda_1 O_1 = {O_1}^t O_1 = 1\), so, \(O_2\) is a normal eigenvector for \(\lambda_1\) orthogonal to \(O_1\).
Note that \(M O_2 = M (- 1 / \sqrt{\lambda_1} M O_1) = - 1 / \sqrt{\lambda_1} M M O_1 = \sqrt{\lambda_1} O_1\), by Step 2.
If there is no further duplication of \(\lambda_1\), the duplications of \(\lambda_1\) are exactly \((\lambda_1, \lambda_2 = \lambda_1)\).
Let us suppose that there is another duplication of \(\lambda_1\).
A normal eigenvector for \(\lambda_1\), \(O_3\), can be taken to be orthogonal to \(O_1\) and \(O_2\), by the definition of Gram-Schmidt orthonormalization of countable subset of vectors space with inner product.
Then, let us take \(O_4 := - 1 / \sqrt{\lambda_1} M O_3\), a normal eigenvector for \(\lambda_1\) orthogonal to \(O_3\), as before.
Let us see that \(O_4\) is orthogonal also to \(O_1\) and \(O_2\).
For \(j \in \{1, 2\}\), \({O_j}^t O_4 = {O_j}^t (- 1 / \sqrt{\lambda_1} M O_3) = - 1 / \sqrt{\lambda_1} {O_j}^t M O_3 = - 1 / \sqrt{\lambda_1} {O_j}^t {M^t}^t O_3 = - 1 / \sqrt{\lambda_1} (M^t O_j)^t O_3 = - 1 / \sqrt{\lambda_1} (- M O_j)^t O_3 = 1 / \sqrt{\lambda_1} (M O_j)^t O_3\), but \(M O_j\) is a scalar multiple of \(O_1\) or \(O_2\), to each of which \(O_3\) is orthogonal, so, \(= 0\).
Continuing likewise, \(\lambda_1\) eventually has some even number of duplications, \((\lambda_1, \lambda_2 = \lambda_1, ..., \lambda_{2 l - 1} = \lambda_1, \lambda_{2 l} = \lambda_1)\), with the orthonormal eigenvectors, \((O_1, O_2, ..., O_{2 l - 1}, O_{2 l})\).
Doing likewise for each positive eigenvalue with its duplications, we have the eigenvalues, \((\lambda_1, ..., \lambda_{2 m})\), with the orthonormal eigenvectors, \((O_1, ..., O_{2 m})\): any 2 eigenvectors with different eigenvalues, \(O_j, O_l\), are inevitably orthogonal to each other, because \((\lambda_l - \lambda_j) {O_j}^t O_l = \lambda_l {O_j}^t O_l - \lambda_j {O_j}^t O_l = {O_j}^t M^t M O_l - (M^t M O_j)^t O_l = ((M^t M)^t O_j)^t O_l - (M^t M O_j)^t O_l = (M^t M O_j)^t O_l - (M^t M O_j)^t O_l = 0\), which implies that \({O_j}^t O_l = 0\).
For the duplications of the eigenvalue \(0\), if any, we take any orthonormal eigenvectors, by the definition of Gram-Schmidt orthonormalization of countable subset of vectors space with inner product.
So, we have the eigenvalues \((\lambda_1, ..., \lambda_{2 m}, 0, ..., 0)\) with the orthonormal eigenvectors, \((O_1, ..., O_{2 m}, O_{2 m + 1}, ..., O_n)\): any 2 eigenvectors with different eigenvalues, \(O_j, O_l\), are inevitably orthogonal to each other, as before.
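The pair construction of Step 3 can be sketched numerically (assuming Python with NumPy; \(O_1\) is built from the top eigenvector of \(M^t M\), and \(O_2 := - 1 / \sqrt{\lambda_1} M O_1\) as in the text):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
M = A - A.T                                # an ad-hoc antisymmetric matrix
S = M.T @ M

lams, vecs = np.linalg.eigh(S)             # ascending eigenvalues
lam1, e1 = lams[-1], vecs[:, -1]           # the largest (positive) eigenvalue

O1 = e1 / np.linalg.norm(e1)               # a normal eigenvector for lambda_1
O2 = -M @ O1 / np.sqrt(lam1)               # O_2 := - 1 / sqrt(lambda_1) M O_1

assert np.isclose(O2 @ O2, 1.0)            # O_2 is normal
assert abs(O1 @ O2) < 1e-10                # O_2 is orthogonal to O_1
assert np.allclose(M @ O2, np.sqrt(lam1) * O1)  # M O_2 = sqrt(lambda_1) O_1
```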
Step 4:
Let us take \(O := \begin{pmatrix} O_1 & ... & O_n \end{pmatrix}\).
\(O\) is an orthogonal matrix, because \((O_1, ..., O_n)\) is orthonormal: \((O_1, ..., O_n)\)'s being orthonormal is nothing but \(O^t O = I\).
Step 5:
Let us see that \(O^t M O\) is as is demanded.
For each \(2 m \lt j\), \(M O_j = 0\), because \(M^t M O_j = 0\), so, \({O_j}^t M^t M O_j = 0\), but the left hand side is \((M O_j)^t M O_j = \Vert M O_j \Vert^2\), so, \(\Vert M O_j \Vert^2 = 0\), which implies that \(M O_j = 0\).
Let us see that \((O^t M O)^j_l = {O_j}^t M O_l\).
\((O^t M O)^j_l = (O^t)^j (M O)_l\), where \((O^t)^j\) denotes the \(j\)-th row of \(O^t\) and \((M O)_l\) denotes the \(l\)-th column of \(M O\).
\((O^t)^j = {O_j}^t\).
\((M O)_l = M O_l\).
So, \((O^t M O)^j_l = {O_j}^t M O_l\).
For each \(j = 2 r + 1\) with \(r \in \{0, ..., m - 1\}\): for \(l = j + 1\), \({O_j}^t M O_l = {O_j}^t \sqrt{\lambda_j} O_j = \sqrt{\lambda_j}\); for any other \(l\), \({O_j}^t M O_l = 0\), because when \(l \le 2 m\), \(M O_l\) is a scalar multiple of the other member of the pair to which \(O_l\) belongs, which is orthogonal to \(O_j\), and when \(2 m \lt l\), \(M O_l = 0\).
For each \(j = 2 r + 2\) with \(r \in \{0, ..., m - 1\}\): for \(l = j - 1\), \({O_j}^t M O_l = {O_j}^t (- \sqrt{\lambda_j} O_j) = - \sqrt{\lambda_j}\); for any other \(l\), \({O_j}^t M O_l = 0\), as before.
For each \(j\) with \(2 m \lt j\) and each \(l \in \{1, ..., n\}\), \({O_j}^t M O_l = 0\), as before.
That means that \(O^t M O\) is as is demanded.
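The whole proof can be turned into a numerical sketch. The following assumes Python with NumPy; `block_diagonalize` is a hypothetical helper name, and the code handles only the generic case in which each positive eigenvalue of \(M^t M\) has multiplicity exactly \(2\) (no accidental coincidences between distinct pairs):

```python
import numpy as np

def block_diagonalize(M, tol=1e-10):
    # Follow the proof: eigen-decompose M^t M (Step 1), pair each positive
    # eigenvector e with -M e / sqrt(lambda) (Steps 2-3), keep kernel
    # eigenvectors as they are, and stack the columns into O (Step 4).
    n = M.shape[0]
    S = M.T @ M
    lams, vecs = np.linalg.eigh(S)            # ascending
    order = np.argsort(-lams)
    lams, vecs = lams[order], vecs[:, order]  # decreasing, as in Step 1

    cols = []
    j = 0
    while j < n:
        if lams[j] > tol:                     # a positive pair
            O1 = vecs[:, j]
            O2 = -M @ O1 / np.sqrt(lams[j])
            cols += [O1, O2]
            j += 2                            # skip the duplicate eigenvector
        else:                                 # eigenvalue 0: M O_j = 0 (Step 5)
            cols.append(vecs[:, j])
            j += 1
    return np.column_stack(cols)

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))
M = A - A.T                                   # odd n: one 1 x 1 zero block

O = block_diagonalize(M)
assert np.allclose(O.T @ O, np.eye(5))        # O is orthogonal

B = O.T @ M @ O
for r in range(2):                            # the 2 x 2 antisymmetric blocks
    b = B[2 * r:2 * r + 2, 2 * r:2 * r + 2]
    assert b[0, 1] > 0                        # = sqrt(lambda_{2 r + 1})
    assert np.allclose(b, [[0.0, b[0, 1]], [-b[0, 1], 0.0]])
mask = np.ones((5, 5), dtype=bool)            # everything outside the blocks
mask[0:2, 0:2] = False
mask[2:4, 2:4] = False
assert np.allclose(B[mask], 0.0)              # vanishes, including B[4][4]
```

Since \(n = 5\) is odd here, the last diagonal block is the \(1 \times 1\) \(\begin{pmatrix} 0 \end{pmatrix}\), as the Note describes.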