
Theorem of Stein and Rosenberg

Original post is here: eklausmeier.goip.de/blog/2023/06-06-theorem-of-stein-rosenberg.


This post recaps content from a handout by Homer Walker and from Richard Varga's classic book "Matrix Iterative Analysis".

The table below shows the three standard iterations for solving the linear system $Ax=b$, with $A=(a_{ij})\in\mathbb{R}^{n\times n}$ and $x,b\in\mathbb{R}^n$; the index $i$ runs from $1$ to $n$. Split $A$ into $A=D-L-U$, where $D$ is a diagonal matrix, $L$ is strictly lower triangular, and $U$ is strictly upper triangular. $x^\nu$ denotes the $\nu$-th iterate.

| Jacobi iteration | Gauss-Seidel iteration | SOR |
|---|---|---|
| $x_i^{\nu+1} = \left(b_i - \sum_{j\ne i}a_{ij}x_j^\nu\right)/a_{ii}$ | $x_i^{\nu+1}=\left(b_i - \sum_{j<i}a_{ij}x_j^{\nu+1} - \sum_{j>i}a_{ij}x_j^\nu\right)/a_{ii}$ | $x_i^{\nu+1}=(1-\omega)x_i^\nu+{\omega\over a_{ii}}\left(b_i - \sum_{j<i}a_{ij}x_j^{\nu+1} - \sum_{j>i}a_{ij}x_j^\nu\right)$ |
| $x^{\nu+1}=D^{-1}[(L+U)x^\nu+b]$ | $x^{\nu+1}=(D-L)^{-1}\left(Ux^\nu+b\right)$ | $x^{\nu+1}=(D-\omega L)^{-1}\left\lbrace[(1-\omega)D+\omega U]x^\nu+\omega b\right\rbrace$ |
| $T_J=D^{-1}(L+U)$ | $T_1=(D-L)^{-1}U$ | $T_\omega=(D-\omega L)^{-1}[(1-\omega)D+\omega U]$ |

The difference between Jacobi and Gauss-Seidel is that the Gauss-Seidel iteration already uses the newly computed components $x_j^{\nu+1}$ for $j<i$. SOR (Successive Over-Relaxation) does the same, but additionally applies a relaxation factor $\omega$ (damping for $\omega<1$, overshooting for $\omega>1$) to the Gauss-Seidel update. The case $\omega=1$ is the Gauss-Seidel method; $\omega>1$ is called overrelaxation, $\omega<1$ underrelaxation.
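
As an illustration, here is a minimal Python/NumPy sketch of one sweep of each method, transcribing the componentwise formulas from the table above; the function names and the small test system are only for demonstration.

```python
import numpy as np

def jacobi_sweep(A, b, x):
    """One Jacobi sweep: every component is computed from the old iterate x."""
    d = np.diag(A)
    return (b - (A @ x - d * x)) / d          # A @ x - d*x equals sum_{j != i} a_ij x_j

def sor_sweep(A, b, x, omega=1.0):
    """One SOR sweep; omega = 1 reduces to Gauss-Seidel."""
    x_new = x.copy()
    for i in range(len(b)):
        s_lower = A[i, :i] @ x_new[:i]        # already updated components, j < i
        s_upper = A[i, i+1:] @ x[i+1:]        # old components, j > i
        gs_value = (b[i] - s_lower - s_upper) / A[i, i]
        x_new[i] = (1 - omega) * x[i] + omega * gs_value
    return x_new

# tiny test: a strictly diagonally dominant system, so the iteration converges
A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x = np.zeros(3)
for _ in range(50):
    x = sor_sweep(A, b, x, omega=1.1)
print(np.linalg.norm(A @ x - b))              # residual should be near zero
```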

The iterations above are called point iterations. If, instead, the $a_{ii}$ are taken to be matrices, i.e., a smaller linear system has to be solved in each iteration step, then the corresponding method is called a block method.

The above iteration methods can all be written as $x^{\nu+1}=Tx^{\nu}+c$, with $T\in\mathbb{R}^{n\times n}$. Let $\rho(T)$ denote the spectral radius of $T$.

Theorem. The iterates $x^\nu$ converge for every starting vector $x^0$ if and only if $\rho(T)<1$.

Proof: See "Matrix Iterative Analysis", theorem 1.4, using the Jordan normal form, or theorem 5.1 in Auzinger/Melenk (2017), using the equivalence of norms in finite-dimensional normed vector spaces. A copy is here: Iterative Solution of Large Linear Systems.
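
To make the criterion concrete, the following sketch (assuming NumPy; the helper names are my own and are reused in the later snippets) builds the three iteration matrices from the splitting $A=D-L-U$ and computes their spectral radii.

```python
import numpy as np

def iteration_matrices(A, omega=1.0):
    """Iteration matrices T_J, T_1, T_omega for the splitting A = D - L - U."""
    D = np.diag(np.diag(A))
    L = -np.tril(A, -1)     # strictly lower triangular part (mind the sign convention)
    U = -np.triu(A, 1)      # strictly upper triangular part
    T_J = np.linalg.solve(D, L + U)
    T_1 = np.linalg.solve(D - L, U)
    T_w = np.linalg.solve(D - omega * L, (1 - omega) * D + omega * U)
    return T_J, T_1, T_w

def spectral_radius(T):
    """Largest eigenvalue modulus; the iteration converges iff this is < 1."""
    return max(abs(np.linalg.eigvals(T)))
```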

Theorem (Stein-Rosenberg (1948)): Let $a_{ij}\le0$ for $i\ne j$ and $a_{ii}>0$ for $i=1,\ldots,n$. Then one and only one of the following mutually exclusive relations holds:

  1. $\rho(T_J) = \rho(T_1) = 0$.
  2. $0<\rho(T_1)<\rho(T_J)<1$.
  3. $1=\rho(T_J)=\rho(T_1)$.
  4. $1<\rho(T_J)<\rho(T_1)$.

Thus, the Jacobi method and the Gauss-Seidel method are either both convergent or both divergent. If they converge, then the Gauss-Seidel method is asymptotically faster than the Jacobi method.

Proof: See §3.3 in "Matrix Iterative Analysis".
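
A quick numerical illustration of case 2, reusing the helpers sketched above, with the 1D discrete Laplacian, which has positive diagonal and nonpositive off-diagonal entries (and is also irreducibly diagonally dominant):

```python
import numpy as np
# assumes iteration_matrices and spectral_radius from the sketch above

n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # a_ii > 0, a_ij <= 0 for i != j
T_J, T_1, _ = iteration_matrices(A)
rJ, r1 = spectral_radius(T_J), spectral_radius(T_1)
print(rJ, r1)            # expect 0 < r1 < rJ < 1, i.e. case 2 of Stein-Rosenberg
```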

From R. Varga:

the Stein-Rosenberg theorem gives us our first comparison theorem for two different iterative methods. Interpreted in a more practical way, not only is the point Gauss-Seidel iterative method computationally more convenient to use (because of storage requirements) than the point Jacobi iterative matrix, but it is also asymptotically faster when the Jacobi matrix $T_J$ is non-negative

Philip Stein was born in 1890 in Lithuania, graduated in South Africa, and died in 1974 in London. R.L. Rosenberg was a coworker at the University Technical College of Natal, South Africa.

Theorem. If $A$ is strictly or irreducibly diagonally dominant, then both the point Jacobi and the point Gauss-Seidel methods converge for any starting value.

Proof: See "Matrix Iterative Analysis", theorem 3.4, which cleverly uses the Stein-Rosenberg theorem and the Perron-Frobenius theorem. Alternatively, see theorem 5.5 in Auzinger/Melenk (2017), which uses direct computation.
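
Note that, unlike Stein-Rosenberg, no sign condition on the off-diagonal entries is required. A small check with a strictly diagonally dominant matrix with mixed off-diagonal signs (matrix chosen here only for illustration, helpers from above assumed):

```python
import numpy as np
# assumes iteration_matrices and spectral_radius from the sketch above

A_dd = np.array([[ 3.0, -1.0,  1.0],
                 [ 1.0,  4.0, -2.0],
                 [-1.0,  2.0,  5.0]])    # strictly diagonally dominant, mixed off-diagonal signs
T_J, T_1, _ = iteration_matrices(A_dd)
print(spectral_radius(T_J), spectral_radius(T_1))   # both should be < 1
```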

The Stein-Rosenberg theorem compares the Jacobi and Gauss-Seidel iterations under mild conditions. If those conditions are sharpened, then quantitative comparisons between the Jacobi and Gauss-Seidel iterations can be made. In particular, a relation between the eigenvalues of $T_J$ and $T_\omega$ can be given. Unrelated to Stein-Rosenberg, but of general interest for the overrelaxation method, is the theorem of Kahan (1958), which says that overrelaxation can only converge for $0<\omega<2$; this condition is necessary but not sufficient.

Theorem (Kahan (1958)): $\rho(T_\omega)\ge\left|\omega-1\right|$. Equality holds only if all eigenvalues of $T_\omega$ are of modulus $|\omega-1|$.

Proof: See "Matrix Iterative Analysis" theorem 3.5.
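
Kahan's bound is easy to check numerically; the sketch below reuses the helpers and the tridiagonal matrix from above and verifies $\rho(T_\omega)\ge|\omega-1|$ for a few values of $\omega$, including a value outside $(0,2)$:

```python
import numpy as np
# assumes iteration_matrices and spectral_radius from the sketch above

n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
for omega in (0.5, 1.0, 1.5, 1.9, 2.5):
    _, _, T_w = iteration_matrices(A, omega)
    assert spectral_radius(T_w) >= abs(omega - 1) - 1e-12   # Kahan's lower bound
```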

Theorem (Ostrowski (1954)): Let $A=D-L-L^*$ be a Hermitian matrix, where $D$ is Hermitian and positive definite, and assume $D-\omega L$ is nonsingular for $0\le\omega\le2$. Then $\rho(T_\omega)<1$ if and only if $A$ is positive definite and $0<\omega<2$.

Proof: See theorem 3.6 in "Matrix Iterative Analysis". The case $\omega=1$ was first proved by Reich (1949).

Corollary: If $A$ is positive definite, then the iteration converges for any such splitting, provided $0<\omega<2$.

Proof: See Corollary 2 in §3.4 in "Matrix Iterative Analysis".
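
A small numerical check of this statement, using a randomly generated symmetric positive definite matrix (helpers from above assumed):

```python
import numpy as np
# assumes iteration_matrices and spectral_radius from the sketch above

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A_spd = M @ M.T + 5 * np.eye(5)          # symmetric positive definite
for omega in (0.2, 0.8, 1.3, 1.9):
    _, _, T_w = iteration_matrices(A_spd, omega)
    assert spectral_radius(T_w) < 1      # SOR converges for 0 < omega < 2
```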

Theorem. Let $A=I-L-U$, where $L+U\ge0$ is irreducible with $\rho(L+U)<1$, and $L$ and $U$ are strictly lower and upper triangular matrices. Then $\rho(T_\omega)<1$ for $0<\omega\le1$, and underrelaxation ($\omega<1$) is not beneficial, i.e., if $0<\omega_1<\omega_2\le1$, then

$$ 0 < \rho(T_{\omega_2}) < \rho(T_{\omega_1}) < 1. $$

Proof: Theorem 3.16 in "Matrix Iterative Analysis" using the Stein-Rosenberg theorem.
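
The monotone behaviour on $(0,1]$ can be observed for the tridiagonal example from above (scaling by the diagonal brings it into the form $I-L-U$ without changing $T_\omega$); helpers from the earlier sketch assumed:

```python
import numpy as np
# assumes iteration_matrices and spectral_radius from the sketch above

n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
omegas = np.linspace(0.1, 1.0, 10)
radii = [spectral_radius(iteration_matrices(A, w)[2]) for w in omegas]
assert all(r2 < r1 for r1, r2 in zip(radii, radii[1:]))   # strictly decreasing on (0, 1]
```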

The relation between the eigenvalues $\lambda$ of the overrelaxation method and the eigenvalues $\mu$ of the Jacobi iteration for consistently ordered $p$-cyclic matrices is

$$ (\lambda+\omega-1)^p = \lambda^{p-1}\omega^p\mu^p. \tag{V} $$

Theorem. Let $A$ be a consistently ordered $p$-cyclic matrix with nonsingular diagonal submatrices. If $\omega\ne0$, $\lambda$ is a nonzero eigenvalue of $T_\omega$, and $\mu$ satisfies (V), then $\mu$ is an eigenvalue of the block Jacobi matrix. Conversely, if $\mu$ is an eigenvalue of $T_J$ and $\lambda$ satisfies (V), then $\lambda$ is an eigenvalue of $T_\omega$.

Proof: See "Matrix Iterative Analysis" theorem 4.3, or Varga's paper "p-Cyclic Matrices: A Generalization Of The Young-Frankel Successive Overrelaxation Scheme", or see local copy.
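
For $p=2$, relation (V) reads $(\lambda+\omega-1)^2=\lambda\omega^2\mu^2$, and it can be checked numerically for the tridiagonal matrix from above, which is a consistently ordered 2-cyclic matrix; the code again assumes the helpers sketched earlier:

```python
import numpy as np
# assumes iteration_matrices from the sketch above

n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
omega = 1.2
T_J, _, T_w = iteration_matrices(A, omega)
mus = np.linalg.eigvals(T_J)
for lam in np.linalg.eigvals(T_w):
    if abs(lam) < 1e-12:
        continue                                        # relation (V) concerns nonzero lambda
    mu2 = (lam + omega - 1) ** 2 / (lam * omega ** 2)   # mu^2 from relation (V) with p = 2
    assert np.min(np.abs(mus ** 2 - mu2)) < 1e-8        # matches some Jacobi eigenvalue squared
```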

From R. Varga:

the main result of this section is a functional relationship (V) between the eigenvalues $\mu$ of the block Jacobi matrix and the eigenvalues $\lambda$ of the block successive overrelaxation matrix. That such a functional relationship actually exists is itself interesting, but the importance of this result lies in the fact that it is the basis for the precise determination of the value of $\omega$ which minimizes $\rho(T_\omega)$.

For the special case $p=2$ we have:

Corollary: The Gauss-Seidel iteration is asymptotically twice as fast as the Jacobi iteration: $\rho(T_1)=\rho(T_J)^2<1$. The relaxation factor $\omega$ that minimizes $\rho(T_\omega)$ is

$$ \omega = {2\over 1+\sqrt{1-\rho(T_J)^2} }. $$

For this $\omega$, the spectral radius $\rho(T_\omega)=\omega-1$.
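
For the tridiagonal example this can again be verified numerically (helpers from above assumed); the agreement at the optimal $\omega$ is only approximate, since the iteration matrix is defective there and the computed eigenvalues carry a correspondingly larger error:

```python
import numpy as np
# assumes iteration_matrices and spectral_radius from the sketch above

n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
T_J, T_1, _ = iteration_matrices(A)
rJ, r1 = spectral_radius(T_J), spectral_radius(T_1)
print(abs(r1 - rJ ** 2))                      # expect ~0: rho(T_1) = rho(T_J)^2

omega_opt = 2 / (1 + np.sqrt(1 - rJ ** 2))
_, _, T_w = iteration_matrices(A, omega_opt)
print(spectral_radius(T_w), omega_opt - 1)    # the two values should (nearly) agree
```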

So-called M-matrices are relevant in the study of iteration methods. See M-matrix.