4.12. Matrix Norms

This section reviews various matrix norms on the vector space of complex matrices over the field of complex numbers, denoted $(\mathbb{C}^{m \times n}, \mathbb{C})$.

We know that $(\mathbb{C}^{m \times n}, \mathbb{C})$ is a finite dimensional vector space with dimension $mn$. We will usually refer to it as $\mathbb{C}^{m \times n}$.

Matrix norms will follow the usual definition of norms for a vector space.

Definition 4.139 (Matrix norm)

A function $\| \cdot \| : \mathbb{C}^{m \times n} \to \mathbb{R}$ is called a *matrix norm* on $\mathbb{C}^{m \times n}$ if for all $A, B \in \mathbb{C}^{m \times n}$ and all $\alpha \in \mathbb{C}$ it satisfies the following:

  • [Positivity]

    $$\| A \| \geq 0$$

    with $\| A \| = 0 \iff A = O$.

  • [Homogeneity]

    $$\| \alpha A \| = | \alpha | \| A \|.$$

  • [Triangle inequality]

    $$\| A + B \| \leq \| A \| + \| B \|.$$

We recall some of the standard results on finite dimensional normed vector spaces.

All matrix norms are equivalent. Let $\| \cdot \|$ and $\| \cdot \|'$ be two different matrix norms on $\mathbb{C}^{m \times n}$. Then there exist two positive constants $a$ and $b$ such that the following holds:

$$a \| A \| \leq \| A \|' \leq b \| A \| \quad \forall A \in \mathbb{C}^{m \times n}.$$

A matrix norm is a continuous function $\| \cdot \| : \mathbb{C}^{m \times n} \to \mathbb{R}$.

4.12.1. Norms like $\ell_p$ on Complex Vector Space

The following norms are quite similar to the $\ell_p$ norms on the finite dimensional complex vector space $\mathbb{C}^n$. They arise from the fact that the matrix vector space $\mathbb{C}^{m \times n}$ has a one-to-one correspondence with the complex vector space $\mathbb{C}^{mn}$.

Definition 4.140 (Sum norm)

Let $A \in \mathbb{C}^{m \times n}$ with $A = [a_{ij}]$.

The matrix *sum norm* is defined as

$$\| A \|_S = \sum_{i=1}^{m} \sum_{j=1}^{n} | a_{ij} |.$$

Definition 4.141 (Frobenius norm)

Let $A \in \mathbb{C}^{m \times n}$ with $A = [a_{ij}]$.

The matrix *Frobenius norm* is defined as

$$\| A \|_F = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} | a_{ij} |^2 \right)^{\frac{1}{2}}.$$

Definition 4.142 (Max norm)

Let $A \in \mathbb{C}^{m \times n}$ with $A = [a_{ij}]$.

The matrix *max norm* is defined as

$$\| A \|_M = \max_{\substack{1 \leq i \leq m \\ 1 \leq j \leq n}} | a_{ij} |.$$
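All three norms above can be computed in a line each; the following is a quick numerical sketch assuming NumPy is available (the matrix size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))

sum_norm = np.abs(A).sum()            # ||A||_S: sum of absolute values of entries
frobenius = np.linalg.norm(A, 'fro')  # ||A||_F: square root of sum of squared magnitudes
max_norm = np.abs(A).max()            # ||A||_M: largest absolute entry

print(sum_norm, frobenius, max_norm)
```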

4.12.2. Properties of Frobenius Norm

We now prove some elementary properties of the Frobenius norm.

Lemma 4.86

The Frobenius norm of a matrix is equal to the Frobenius norm of its Hermitian transpose:

$$\| A^H \|_F = \| A \|_F.$$

Proof. This is valid since the norm involves only the absolute values of the entries in the matrix.

  1. Let

     $$A = [a_{ij}].$$
  2. Then

     $$A^H = [\overline{a_{ji}}].$$
  3. Then

     $$\| A^H \|_F^2 = \sum_{j=1}^{n} \sum_{i=1}^{m} | \overline{a_{ij}} |^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} | a_{ij} |^2 = \| A \|_F^2.$$
  4. Now

     $$\| A^H \|_F^2 = \| A \|_F^2 \implies \| A^H \|_F = \| A \|_F.$$

Lemma 4.87 (Expansion in squared norms of columns)

Let $A \in \mathbb{C}^{m \times n}$ be written as a row of column vectors

$$A = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix}.$$

Then

$$\| A \|_F^2 = \sum_{j=1}^{n} \| a_j \|_2^2.$$

Proof. We note that

$$\| a_j \|_2^2 = \sum_{i=1}^{m} | a_{ij} |^2.$$

Now

$$\| A \|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} | a_{ij} |^2 = \sum_{j=1}^{n} \left( \sum_{i=1}^{m} | a_{ij} |^2 \right) = \sum_{j=1}^{n} \| a_j \|_2^2.$$

We have thus shown that the square of the Frobenius norm of a matrix is simply the sum of squares of the $\ell_2$ norms of its columns.

Lemma 4.88

Let $A \in \mathbb{C}^{m \times n}$ be written as a column of row vectors

$$A = \begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix}.$$

Then

$$\| A \|_F^2 = \sum_{i=1}^{m} \| a_i \|_2^2.$$

Proof. We note that

$$\| a_i \|_2^2 = \sum_{j=1}^{n} | a_{ij} |^2.$$

Now

$$\| A \|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} | a_{ij} |^2 = \sum_{i=1}^{m} \| a_i \|_2^2.$$

We now consider how the Frobenius norm is affected by the action of unitary matrices.

  1. Let $A$ be an arbitrary matrix in $\mathbb{C}^{m \times n}$.

  2. Let $U$ be a unitary matrix in $\mathbb{C}^{m \times m}$.

  3. Let $V$ be a unitary matrix in $\mathbb{C}^{n \times n}$.

Our first result states that multiplication with a unitary matrix doesn't change the Frobenius norm of a matrix.

Theorem 4.141

The Frobenius norm of a matrix is invariant under pre- or post-multiplication by a unitary matrix, i.e.,

$$\| U A \|_F = \| A \|_F$$

and

$$\| A V \|_F = \| A \|_F.$$

Proof. We can write $A$ as

$$A = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix}.$$

So

$$U A = \begin{bmatrix} U a_1 & \dots & U a_n \end{bmatrix}.$$

Then, applying Lemma 4.87, clearly

$$\| U A \|_F^2 = \sum_{j=1}^{n} \| U a_j \|_2^2.$$

But we know that unitary matrices are norm preserving. Hence

$$\| U a_j \|_2^2 = \| a_j \|_2^2.$$

Thus

$$\| U A \|_F^2 = \sum_{j=1}^{n} \| a_j \|_2^2 = \| A \|_F^2,$$

which implies

$$\| U A \|_F = \| A \|_F.$$

Similarly, writing $A$ as

$$A = \begin{bmatrix} r_1 \\ \vdots \\ r_m \end{bmatrix},$$

we have

$$A V = \begin{bmatrix} r_1 V \\ \vdots \\ r_m V \end{bmatrix}.$$

Then, applying Lemma 4.88, clearly

$$\| A V \|_F^2 = \sum_{i=1}^{m} \| r_i V \|_2^2.$$

But we know that unitary matrices are norm preserving. Hence

$$\| r_i V \|_2^2 = \| r_i \|_2^2.$$

Thus

$$\| A V \|_F^2 = \sum_{i=1}^{m} \| r_i \|_2^2 = \| A \|_F^2,$$

which implies

$$\| A V \|_F = \| A \|_F.$$

An alternative one-line proof of the second part, using the first part, is

$$\| A V \|_F = \| (A V)^H \|_F = \| V^H A^H \|_F = \| A^H \|_F = \| A \|_F.$$

Here we used Lemma 4.86 and the fact that if $V$ is a unitary matrix then $V^H$ is also a unitary matrix, and we have already shown that pre-multiplication by a unitary matrix preserves the Frobenius norm.
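Theorem 4.141 is easy to check numerically; a minimal sketch assuming NumPy, where the unitary factors are obtained from QR factorizations of random complex matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

# QR of a random complex matrix yields a unitary Q.
U, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))

print(np.linalg.norm(A, 'fro'))      # ||A||_F
print(np.linalg.norm(U @ A, 'fro'))  # ||UA||_F, same value
print(np.linalg.norm(A @ V, 'fro'))  # ||AV||_F, same value
```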

Theorem 4.142 (Consistency of Frobenius norm)

Let $A \in \mathbb{C}^{m \times n}$ and $B \in \mathbb{C}^{n \times p}$ be two matrices. Then the Frobenius norm of their product is less than or equal to the product of the Frobenius norms of the matrices themselves, i.e.,

$$\| A B \|_F \leq \| A \|_F \| B \|_F.$$

Proof. We can write $A$ as

$$A = \begin{bmatrix} a_1^T \\ \vdots \\ a_m^T \end{bmatrix}$$

where the $a_i$ are $m$ column vectors corresponding to the rows of $A$. Similarly we can write $B$ as

$$B = \begin{bmatrix} b_1 & \dots & b_p \end{bmatrix}$$

where the $b_i$ are column vectors corresponding to the columns of $B$.

Then

$$A B = \begin{bmatrix} a_1^T \\ \vdots \\ a_m^T \end{bmatrix} \begin{bmatrix} b_1 & \dots & b_p \end{bmatrix} = \begin{bmatrix} a_1^T b_1 & \dots & a_1^T b_p \\ \vdots & \ddots & \vdots \\ a_m^T b_1 & \dots & a_m^T b_p \end{bmatrix} = [a_i^T b_j].$$

Now, looking carefully, each entry $a_i^T b_j$ is an inner product of two vectors (up to complex conjugation). Applying the Cauchy-Schwarz inequality we have

$$| a_i^T b_j |^2 \leq \| a_i \|_2^2 \| b_j \|_2^2.$$

Now

$$\| A B \|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{p} | a_i^T b_j |^2 \leq \sum_{i=1}^{m} \sum_{j=1}^{p} \| a_i \|_2^2 \| b_j \|_2^2 = \left( \sum_{i=1}^{m} \| a_i \|_2^2 \right) \left( \sum_{j=1}^{p} \| b_j \|_2^2 \right) = \| A \|_F^2 \| B \|_F^2,$$

which implies

$$\| A B \|_F \leq \| A \|_F \| B \|_F$$

by taking square roots on both sides.

Corollary 4.28

Let $A \in \mathbb{C}^{m \times n}$ and let $x \in \mathbb{C}^n$. Then

$$\| A x \|_2 \leq \| A \|_F \| x \|_2.$$

Proof. We note that the Frobenius norm of a column matrix is the same as the $\ell_2$ norm of the corresponding column vector, i.e.,

$$\| x \|_F = \| x \|_2 \quad \forall x \in \mathbb{C}^n.$$

Now, applying Theorem 4.142, we have

$$\| A x \|_2 = \| A x \|_F \leq \| A \|_F \| x \|_F = \| A \|_F \| x \|_2 \quad \forall x \in \mathbb{C}^n.$$
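Both Theorem 4.142 and Corollary 4.28 can be spot-checked numerically; a short sketch assuming NumPy (sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 3))
x = rng.standard_normal(5)

# Consistency: ||AB||_F <= ||A||_F ||B||_F
print(np.linalg.norm(A @ B, 'fro')
      <= np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro'))
# Subordinate property: ||Ax||_2 <= ||A||_F ||x||_2
print(np.linalg.norm(A @ x)
      <= np.linalg.norm(A, 'fro') * np.linalg.norm(x))
```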

It turns out that the Frobenius norm is intimately related to the singular value decomposition of a matrix.

Lemma 4.89 (Frobenius norm and singular values)

Let $A \in \mathbb{C}^{m \times n}$. Let the singular value decomposition of $A$ be given by

$$A = U \Sigma V^H.$$

Let the singular values of $A$ be $\sigma_1, \dots, \sigma_k$ where $k = \min(m, n)$. Then

$$\| A \|_F = \sqrt{\sum_{i=1}^{k} \sigma_i^2}.$$

Proof. This follows from the invariance of the Frobenius norm under unitary transformations:

$$A = U \Sigma V^H \implies \| A \|_F = \| U \Sigma V^H \|_F.$$

But

$$\| U \Sigma V^H \|_F = \| \Sigma V^H \|_F = \| \Sigma \|_F$$

since $U$ and $V$ are unitary matrices (see Theorem 4.141). Now the only nonzero entries in $\Sigma$ are the singular values. Hence

$$\| A \|_F = \| \Sigma \|_F = \sqrt{\sum_{i=1}^{k} \sigma_i^2}.$$
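Lemma 4.89 takes two lines to verify; a minimal sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
sigma = np.linalg.svd(A, compute_uv=False)  # singular values, descending

print(np.linalg.norm(A, 'fro'))   # Frobenius norm
print(np.sqrt(np.sum(sigma**2)))  # l2 norm of the singular values, same value
```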

4.12.3. Consistency of a Matrix Norm

Definition 4.143 (Consistent matrix norm)

A matrix norm $\| \cdot \|$ is called *consistent* on $\mathbb{C}^{n \times n}$ if

$$\| A B \| \leq \| A \| \| B \| \tag{4.19}$$

holds true for all $A, B \in \mathbb{C}^{n \times n}$. A matrix norm is called consistent if it is defined on $\mathbb{C}^{m \times n}$ for all $m, n \in \mathbb{N}$ and (4.19) holds for all matrices $A, B$ for which the product $A B$ is defined.

A consistent matrix norm is also known as a *sub-multiplicative norm*.

With this definition and the result of Theorem 4.142, we can see that the Frobenius norm is consistent.

4.12.4. Subordinate Matrix Norm

A matrix operates on vectors from one space to generate vectors in another space. It is interesting to explore the connection between the norm of a matrix and norms of vectors in the domain and co-domain of a matrix.

Definition 4.144

Let $m, n \in \mathbb{N}$ be given. Let $\| \cdot \|_\alpha$ be some norm on $\mathbb{C}^m$ and $\| \cdot \|_\beta$ be some norm on $\mathbb{C}^n$. Let $\| \cdot \|$ be some norm on matrices in $\mathbb{C}^{m \times n}$. We say that $\| \cdot \|$ is *subordinate* to the vector norms $\| \cdot \|_\alpha$ and $\| \cdot \|_\beta$ if

$$\| A x \|_\alpha \leq \| A \| \| x \|_\beta$$

for all $A \in \mathbb{C}^{m \times n}$ and for all $x \in \mathbb{C}^n$. In other words, the operation of $A$ does not increase the length of a vector by more than a factor given by the norm of the matrix itself.

If $\| \cdot \|_\alpha$ and $\| \cdot \|_\beta$ are the same, then we say that $\| \cdot \|$ is subordinate to the vector norm $\| \cdot \|_\alpha$.

We have shown earlier in Corollary 4.28 that the Frobenius norm is subordinate to the Euclidean norm.

4.12.5. Operator Norm

We now consider the maximum factor by which a matrix A can increase the length of a vector.

Definition 4.145 (Operator norm)

Let $m, n \in \mathbb{N}$ be given. Let $\| \cdot \|_\alpha$ be some norm on $\mathbb{C}^n$ and $\| \cdot \|_\beta$ be some norm on $\mathbb{C}^m$. For $A \in \mathbb{C}^{m \times n}$ we define

$$\| A \| \triangleq \| A \|_{\alpha \to \beta} \triangleq \max_{x \neq 0} \frac{\| A x \|_\beta}{\| x \|_\alpha}.$$

The term $\frac{\| A x \|_\beta}{\| x \|_\alpha}$ represents the factor by which the length of $x$ is increased by the operation of $A$. We simply pick the maximum value of such scaling factors over all nonzero $x$.

The norm as defined above is known as the $(\alpha \to \beta)$ operator norm, the $(\alpha \to \beta)$-norm, or simply the $\alpha$-norm if $\alpha = \beta$.

We need to verify that this definition satisfies all properties of a norm.

Nonnegativity

  1. Clearly, if $A = O$ then $A x = 0$ for every $x$; hence $\| A \| = 0$.

  2. Conversely, if $\| A \| = 0$ then $\| A x \|_\beta = 0 \quad \forall x \in \mathbb{C}^n$.

  3. Hence $A x = 0$ for every $x \in \mathbb{C}^n$.

  4. In particular this is true for the unit vectors $e_i \in \mathbb{C}^n$.

  5. The $i$-th column of $A$ is given by $A e_i$, which is $0$.

  6. Thus each column in $A$ is $0$.

  7. Hence $A = O$.

Homogeneity

  1. Consider $c \in \mathbb{C}$.

  2. Then

     $$\| c A \| = \max_{x \neq 0} \frac{\| c A x \|_\beta}{\| x \|_\alpha} = | c | \max_{x \neq 0} \frac{\| A x \|_\beta}{\| x \|_\alpha} = | c | \| A \|.$$

We now present some useful observations on the operator norm before proving the triangle inequality for it.

  1. For any $x \in \ker(A)$, $A x = 0$; hence we only need to consider vectors which don't belong to the kernel of $A$.

  2. Thus we can write

     $$\| A \|_{\alpha \to \beta} = \max_{x \notin \ker(A)} \frac{\| A x \|_\beta}{\| x \|_\alpha}.$$
  3. We also note that

     $$\frac{\| A (c x) \|_\beta}{\| c x \|_\alpha} = \frac{| c | \| A x \|_\beta}{| c | \| x \|_\alpha} = \frac{\| A x \|_\beta}{\| x \|_\alpha} \quad \forall c \neq 0, x \neq 0.$$
  4. Thus, it is sufficient to find the maximum over unit norm vectors (see the sketch following this list):

     $$\| A \|_{\alpha \to \beta} = \max_{\| x \|_\alpha = 1} \| A x \|_\beta.$$
  5. Note that since $\| x \|_\alpha = 1$, the term in the denominator goes away.
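The "maximum over the unit sphere" view suggests a crude numerical experiment: sampling random unit vectors yields a lower bound on the operator norm. The helper below is a hypothetical sketch assuming NumPy; for $p = q = 2$ we can compare against the exact value.

```python
import numpy as np

def operator_norm_lower_bound(A, p, q, trials=20000, seed=0):
    """Estimate max_{||x||_p = 1} ||Ax||_q by sampling random unit vectors."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(trials):
        x = rng.standard_normal(A.shape[1])
        x /= np.linalg.norm(x, p)  # scale onto the unit sphere of the p-norm
        best = max(best, np.linalg.norm(A @ x, q))
    return best

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(operator_norm_lower_bound(A, 2, 2))  # approaches ||A||_2 from below
print(np.linalg.norm(A, 2))                # exact spectral norm for comparison
```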

Lemma 4.90 (Operator norm is subordinate)

The $(\alpha \to \beta)$ operator norm is subordinate to the vector norms $\| \cdot \|_\alpha$ and $\| \cdot \|_\beta$, i.e.,

$$\| A x \|_\beta \leq \| A \|_{\alpha \to \beta} \| x \|_\alpha.$$

Proof. For $x = 0$ the inequality is trivially satisfied. For $x \neq 0$, by definition, we have

$$\| A \|_{\alpha \to \beta} \geq \frac{\| A x \|_\beta}{\| x \|_\alpha} \implies \| A \|_{\alpha \to \beta} \| x \|_\alpha \geq \| A x \|_\beta.$$

Theorem 4.143 (Existence of the maximizer for operator norm)

There exists a vector $x \in \mathbb{C}^n$ with unit norm ($\| x \|_\alpha = 1$) such that

$$\| A \|_{\alpha \to \beta} = \| A x \|_\beta.$$

Proof. Recall that

$$\| A \|_{\alpha \to \beta} = \max_{\| x \|_\alpha = 1} \| A x \|_\beta.$$

  1. The norm function $\| \cdot \|_\beta$ is continuous on $\mathbb{C}^m$.

  2. The mapping $x \mapsto A x$ is continuous.

  3. Hence the function $x \mapsto \| A x \|_\beta$ is continuous.

  4. The set $\{ x \in \mathbb{C}^n \mid \| x \|_\alpha = 1 \}$ is compact.

  5. Hence, the function attains its supremum over this set.

We are now ready to prove the triangle inequality for the operator norm.

Lemma 4.91 (Triangle inequality for operator norm)

The operator norm as defined in Definition 4.145 satisfies the triangle inequality.

Proof. Let $A$ and $B$ be matrices in $\mathbb{C}^{m \times n}$.

  1. Consider the operator norm of the matrix $A + B$.

  2. From the previous remarks, there exists some vector $x \in \mathbb{C}^n$ with $\| x \|_\alpha = 1$ such that

     $$\| A + B \| = \| (A + B) x \|_\beta.$$
  3. Now

     $$\| (A + B) x \|_\beta = \| A x + B x \|_\beta \leq \| A x \|_\beta + \| B x \|_\beta.$$
  4. From the subordinate property of the operator norm, we have

     $$\| A x \|_\beta \leq \| A \| \| x \|_\alpha = \| A \|$$

     and

     $$\| B x \|_\beta \leq \| B \| \| x \|_\alpha = \| B \|$$

     since $\| x \|_\alpha = 1$.

  5. Hence we have

     $$\| A + B \| \leq \| A \| + \| B \|.$$

It turns out that the operator norm is also consistent under certain conditions.

Lemma 4.92 (Consistency of operator norm)

Let $\| \cdot \|_\alpha$ be defined on $\mathbb{C}^m$ for every $m \in \mathbb{N}$. Let $\| \cdot \|_\beta = \| \cdot \|_\alpha$. Then the operator norm

$$\| A \|_\alpha = \max_{x \neq 0} \frac{\| A x \|_\alpha}{\| x \|_\alpha}$$

is consistent.

Proof. We need to show that

$$\| A B \|_\alpha \leq \| A \|_\alpha \| B \|_\alpha.$$

  1. Now

     $$\| A B \|_\alpha = \max_{x \neq 0} \frac{\| A B x \|_\alpha}{\| x \|_\alpha}.$$
  2. We note that if $B x = 0$, then $A B x = 0$.

  3. Hence we can rewrite the above as

     $$\| A B \|_\alpha = \max_{B x \neq 0} \frac{\| A B x \|_\alpha}{\| x \|_\alpha}.$$
  4. Now if $B x \neq 0$ then $\| B x \|_\alpha \neq 0$.

  5. Hence

     $$\frac{\| A B x \|_\alpha}{\| x \|_\alpha} = \frac{\| A B x \|_\alpha}{\| B x \|_\alpha} \cdot \frac{\| B x \|_\alpha}{\| x \|_\alpha}$$

     and

     $$\max_{B x \neq 0} \frac{\| A B x \|_\alpha}{\| x \|_\alpha} \leq \max_{B x \neq 0} \frac{\| A B x \|_\alpha}{\| B x \|_\alpha} \cdot \max_{B x \neq 0} \frac{\| B x \|_\alpha}{\| x \|_\alpha}.$$
  6. Clearly

     $$\| B \|_\alpha = \max_{B x \neq 0} \frac{\| B x \|_\alpha}{\| x \|_\alpha}.$$
  7. Furthermore

     $$\max_{B x \neq 0} \frac{\| A B x \|_\alpha}{\| B x \|_\alpha} \leq \max_{y \neq 0} \frac{\| A y \|_\alpha}{\| y \|_\alpha} = \| A \|_\alpha.$$
  8. Thus we have

     $$\| A B \|_\alpha \leq \| A \|_\alpha \| B \|_\alpha.$$

4.12.6. p-norm for Matrices

We recall the definition of the $\ell_p$ norms for vectors $x \in \mathbb{C}^n$:

$$\| x \|_p = \begin{cases} \left( \sum_{i=1}^{n} | x_i |^p \right)^{\frac{1}{p}} & p \in [1, \infty) \\ \max_{1 \leq i \leq n} | x_i | & p = \infty. \end{cases}$$

The operator norms $\| \cdot \|_p$ defined from the $\ell_p$ vector norms are of specific interest.

Definition 4.146 (Matrix p-norm)

The *p-norm* for a matrix $A \in \mathbb{C}^{m \times n}$ is defined as

$$\| A \|_p \triangleq \max_{x \neq 0} \frac{\| A x \|_p}{\| x \|_p} = \max_{\| x \|_p = 1} \| A x \|_p$$

where $\| \cdot \|_p$ is the standard $\ell_p$ norm for vectors in $\mathbb{C}^n$ and $\mathbb{C}^m$.

As per Lemma 4.92, the $p$-norms for matrices are consistent norms. They are also subordinate to the $\ell_p$ vector norms.

Special cases are considered for $p = 1, 2$ and $\infty$.

Theorem 4.144 (Max column sum, max row sum, spectral norms)

Let $A \in \mathbb{C}^{m \times n}$.

For $p = 1$ we have

$$\| A \|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^{m} | a_{ij} |.$$

This is also known as the *max column sum norm*.

For $p = \infty$ we have

$$\| A \|_\infty = \max_{1 \leq i \leq m} \sum_{j=1}^{n} | a_{ij} |.$$

This is also known as the *max row sum norm*.

Finally, for $p = 2$ we have

$$\| A \|_2 = \sigma_1$$

where $\sigma_1$ is the largest singular value of $A$. This is also known as the *spectral norm*.

Proof. Max column sum norm:

  1. Let

     $$A = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix}.$$
  2. Then

     $$\| A x \|_1 = \left\| \sum_{j=1}^{n} x_j a_j \right\|_1 \leq \sum_{j=1}^{n} \| x_j a_j \|_1 = \sum_{j=1}^{n} | x_j | \| a_j \|_1 \leq \max_{1 \leq j \leq n} \| a_j \|_1 \sum_{j=1}^{n} | x_j | = \max_{1 \leq j \leq n} \| a_j \|_1 \| x \|_1.$$
  3. Thus,

     $$\| A \|_1 = \max_{x \neq 0} \frac{\| A x \|_1}{\| x \|_1} \leq \max_{1 \leq j \leq n} \| a_j \|_1,$$

     which is the maximum column sum.

  4. We need to show that this upper bound is indeed an equality.

  5. Indeed, for any $x = e_j$ where $e_j$ is a unit vector with $1$ in the $j$-th entry and $0$ elsewhere,

     $$\| A e_j \|_1 = \| a_j \|_1.$$
  6. Thus

     $$\| A \|_1 \geq \| a_j \|_1 \quad \forall 1 \leq j \leq n.$$
  7. Combining the two, we see that

     $$\| A \|_1 = \max_{1 \leq j \leq n} \| a_j \|_1.$$

Max row sum norm:

  1. For $p = \infty$, we proceed as follows:

     $$\| A x \|_\infty = \max_{1 \leq i \leq m} \left| \sum_{j=1}^{n} a_{ij} x_j \right| \leq \max_{1 \leq i \leq m} \sum_{j=1}^{n} | a_{ij} | | x_j | \leq \max_{1 \leq j \leq n} | x_j | \max_{1 \leq i \leq m} \sum_{j=1}^{n} | a_{ij} | = \| x \|_\infty \max_{1 \leq i \leq m} \| a_i \|_1$$

     where the $a_i$ are the rows of $A$.

  2. This shows that

     $$\| A \|_\infty \leq \max_{1 \leq i \leq m} \| a_i \|_1.$$
  3. We need to show that this is indeed an equality.

  4. Fix $i = k$ and choose $x$ such that

     $$x_j = \operatorname{sgn}(a_{kj}).$$
  5. Clearly $\| x \|_\infty = 1$.

  6. Then

     $$\| A x \|_\infty = \max_{1 \leq i \leq m} \left| \sum_{j=1}^{n} a_{ij} x_j \right| \geq \left| \sum_{j=1}^{n} a_{kj} x_j \right| = \left| \sum_{j=1}^{n} | a_{kj} | \right| = \sum_{j=1}^{n} | a_{kj} | = \| a_k \|_1.$$
  7. Thus,

     $$\| A \|_\infty \geq \max_{1 \leq i \leq m} \| a_i \|_1.$$
  8. Combining the two inequalities, we get

     $$\| A \|_\infty = \max_{1 \leq i \leq m} \| a_i \|_1.$$

Spectral norm:

  1. The remaining case is $p = 2$.

  2. For any vector $x$ with $\| x \|_2 = 1$,

     $$\| A x \|_2 = \| U \Sigma V^H x \|_2 = \| U (\Sigma V^H x) \|_2 = \| \Sigma V^H x \|_2$$

     since the $\ell_2$ norm is invariant under unitary transformations.

  3. Let $v = V^H x$.

  4. Then $\| v \|_2 = \| V^H x \|_2 = \| x \|_2 = 1$.

  5. Let $k = \min(m, n)$.

  6. Now

     $$\| A x \|_2 = \| \Sigma v \|_2 = \left( \sum_{j=1}^{k} | \sigma_j v_j |^2 \right)^{\frac{1}{2}} \leq \sigma_1 \left( \sum_{j=1}^{k} | v_j |^2 \right)^{\frac{1}{2}} \leq \sigma_1 \| v \|_2 = \sigma_1.$$
  7. This shows that

     $$\| A \|_2 \leq \sigma_1.$$
  8. Now consider the vector $x$ such that $v = V^H x = (1, 0, \dots, 0)$.

  9. Then

     $$\| A x \|_2 = \| \Sigma v \|_2 = \sigma_1.$$
  10. Thus

      $$\| A \|_2 \geq \sigma_1.$$
  11. Combining the two, we get $\| A \|_2 = \sigma_1$.
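All three special cases of Theorem 4.144 are implemented by `np.linalg.norm`; a quick check against the explicit formulas, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 5))

print(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())            # max column sum
print(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())       # max row sum
print(np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False)[0])  # largest singular value
```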

4.12.7. The 2-norm

Theorem 4.145

Let $A \in \mathbb{C}^{n \times n}$ have singular values $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_n$. Let the eigenvalues of $A$ be $\lambda_1, \lambda_2, \dots, \lambda_n$ with $| \lambda_1 | \geq | \lambda_2 | \geq \dots \geq | \lambda_n |$. Then the following hold:

$$\| A \|_2 = \sigma_1,$$

and if $A$ is nonsingular, then

$$\| A^{-1} \|_2 = \frac{1}{\sigma_n}.$$

If $A$ is symmetric and positive definite, then

$$\| A \|_2 = \lambda_1,$$

and if $A$ is nonsingular, then

$$\| A^{-1} \|_2 = \frac{1}{\lambda_n}.$$

If $A$ is normal, then

$$\| A \|_2 = | \lambda_1 |,$$

and if $A$ is nonsingular, then

$$\| A^{-1} \|_2 = \frac{1}{| \lambda_n |}.$$
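A small sanity check of the positive definite case, assuming NumPy (the matrix is built as $B^T B + I$ purely to guarantee positive definiteness):

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((4, 4))
A = B.T @ B + np.eye(4)      # symmetric positive definite by construction
lam = np.linalg.eigvalsh(A)  # eigenvalues in ascending order

print(np.linalg.norm(A, 2), lam[-1])                    # ||A||_2 = lambda_1
print(np.linalg.norm(np.linalg.inv(A), 2), 1 / lam[0])  # ||A^{-1}||_2 = 1/lambda_n
```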

4.12.8. Unitary Invariant Norms

Definition 4.147 (Unitary invariant norm)

A matrix norm $\| \cdot \|$ on $\mathbb{C}^{m \times n}$ is called *unitary invariant* if $\| U A V \| = \| A \|$ for any $A \in \mathbb{C}^{m \times n}$ and any unitary matrices $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$.

We have already seen in Theorem 4.141 that the Frobenius norm is unitary invariant.

It turns out that the spectral norm is also unitary invariant.

4.12.9. More Properties of Operator Norms

In this section we will focus on operator norms connecting the normed linear spaces $(\mathbb{C}^n, \| \cdot \|_p)$ and $(\mathbb{C}^m, \| \cdot \|_q)$. Typical values of $p, q$ would be in $\{1, 2, \infty\}$.

We recall that

$$\| A \|_{p \to q} = \max_{x \neq 0} \frac{\| A x \|_q}{\| x \|_p} = \max_{\| x \|_p = 1} \| A x \|_q = \max_{\| x \|_p \leq 1} \| A x \|_q.$$

The table below [78] shows how to compute different $(p \to q)$ norms. Some can be computed easily while others are NP-hard to compute.

Table 4.1 Typical $(p \to q)$ norms

| $p$ | $q$ | $\Vert A \Vert_{p \to q}$ | Calculation |
|---|---|---|---|
| 1 | 1 | $\Vert A \Vert_1$ | Maximum $\ell_1$ norm of a column |
| 1 | 2 | $\Vert A \Vert_{1 \to 2}$ | Maximum $\ell_2$ norm of a column |
| 1 | $\infty$ | $\Vert A \Vert_{1 \to \infty}$ | Maximum absolute entry of a matrix |
| 2 | 1 | $\Vert A \Vert_{2 \to 1}$ | NP-hard |
| 2 | 2 | $\Vert A \Vert_2$ | Maximum singular value |
| 2 | $\infty$ | $\Vert A \Vert_{2 \to \infty}$ | Maximum $\ell_2$ norm of a row |
| $\infty$ | 1 | $\Vert A \Vert_{\infty \to 1}$ | NP-hard |
| $\infty$ | 2 | $\Vert A \Vert_{\infty \to 2}$ | NP-hard |
| $\infty$ | $\infty$ | $\Vert A \Vert_\infty$ | Maximum $\ell_1$ norm of a row |
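The tractable rows of Table 4.1 can be computed directly from the matrix entries; a minimal sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 4))

norm_1_to_2 = np.linalg.norm(A, axis=0).max()    # maximum l2 norm of a column
norm_2_to_inf = np.linalg.norm(A, axis=1).max()  # maximum l2 norm of a row
norm_1_to_inf = np.abs(A).max()                  # maximum absolute entry
print(norm_1_to_2, norm_2_to_inf, norm_1_to_inf)
```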

The topological dual of the finite dimensional normed linear space $(\mathbb{C}^n, \| \cdot \|_p)$ is the normed linear space $(\mathbb{C}^n, \| \cdot \|_{p'})$ where

$$\frac{1}{p} + \frac{1}{p'} = 1.$$

  • The $\ell_2$-norm is the dual of the $\ell_2$-norm. It is self dual.

  • The $\ell_1$-norm and the $\ell_\infty$-norm are duals of each other.

When a matrix $A$ maps from the space $(\mathbb{C}^n, \| \cdot \|_p)$ to the space $(\mathbb{C}^m, \| \cdot \|_q)$, we can view its conjugate transpose $A^H$ as a mapping from the space $(\mathbb{C}^m, \| \cdot \|_{q'})$ to $(\mathbb{C}^n, \| \cdot \|_{p'})$.

Theorem 4.146 (Operator norm and conjugate transpose)

The operator norm of a matrix always equals the operator norm of its conjugate transpose. In other words,

$$\| A \|_{p \to q} = \| A^H \|_{q' \to p'}$$

where

$$\frac{1}{p} + \frac{1}{p'} = 1, \quad \frac{1}{q} + \frac{1}{q'} = 1.$$

Specific applications of this result are:

$$\| A \|_2 = \| A^H \|_2.$$

This is obvious since the maximum singular value of a matrix and of its conjugate transpose are the same.

$$\| A \|_1 = \| A^H \|_\infty, \quad \| A \|_\infty = \| A^H \|_1.$$

This is also obvious since the max column sum of $A$ is the same as the max row sum of $A^H$ and vice versa.

$$\| A \|_{1 \to \infty} = \| A^H \|_{1 \to \infty}.$$
$$\| A \|_{1 \to 2} = \| A^H \|_{2 \to \infty}.$$
$$\| A \|_{\infty \to 2} = \| A^H \|_{2 \to 1}.$$

We now need to show the result for the general case (arbitrary $1 \leq p, q \leq \infty$).

Proof. TODO

Theorem 4.147 ($1 \to p$ norm)

$$\| A \|_{1 \to p} = \max_{1 \leq j \leq n} \| a_j \|_p$$

where

$$A = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix}.$$

Proof. Expanding:

$$\| A x \|_p = \left\| \sum_{j=1}^{n} x_j a_j \right\|_p \leq \sum_{j=1}^{n} \| x_j a_j \|_p = \sum_{j=1}^{n} | x_j | \| a_j \|_p \leq \max_{1 \leq j \leq n} \| a_j \|_p \sum_{j=1}^{n} | x_j | = \max_{1 \leq j \leq n} \| a_j \|_p \| x \|_1.$$

Thus,

$$\| A \|_{1 \to p} = \max_{x \neq 0} \frac{\| A x \|_p}{\| x \|_1} \leq \max_{1 \leq j \leq n} \| a_j \|_p.$$

We need to show that this upper bound is indeed an equality.

Indeed, for any $x = e_j$ where $e_j$ is a unit vector with $1$ in the $j$-th entry and $0$ elsewhere,

$$\| A e_j \|_p = \| a_j \|_p.$$

Thus

$$\| A \|_{1 \to p} \geq \| a_j \|_p \quad \forall 1 \leq j \leq n.$$

Combining the two, we see that

$$\| A \|_{1 \to p} = \max_{1 \leq j \leq n} \| a_j \|_p.$$
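Theorem 4.147 in code: the $(1 \to p)$ norm is the largest $p$-norm among the columns, attained at a standard unit vector. A short sketch assuming NumPy (the choice $p = 3$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((4, 3))
p = 3  # any p >= 1 works

col_norms = np.linalg.norm(A, ord=p, axis=0)  # p-norm of each column
print(col_norms.max())                        # ||A||_{1->p}

# The bound is attained at the unit vector e_j picking the best column.
j = col_norms.argmax()
e = np.zeros(3)
e[j] = 1.0
print(np.linalg.norm(A @ e, ord=p))           # same value
```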

Theorem 4.148 ($p \to \infty$ norm)

$$\| A \|_{p \to \infty} = \max_{1 \leq i \leq m} \| a_i \|_q$$

where the $a_i$ are the rows of $A$ and

$$\frac{1}{p} + \frac{1}{q} = 1.$$

Proof. Using Theorem 4.146, we get

$$\| A \|_{p \to \infty} = \| A^H \|_{1 \to q}.$$

Using Theorem 4.147, we get

$$\| A^H \|_{1 \to q} = \max_{1 \leq i \leq m} \| a_i \|_q$$

since the columns of $A^H$ are the conjugated rows of $A$ and conjugation does not change the norm. This completes the proof.

Theorem 4.149 (Consistency of $p \to q$ norms)

For two matrices $A$ and $B$ and $p, q, s \geq 1$, we have

$$\| A B \|_{p \to q} \leq \| B \|_{p \to s} \| A \|_{s \to q}.$$

Proof. We start with

$$\| A B \|_{p \to q} = \max_{\| x \|_p = 1} \| A (B x) \|_q.$$

From Lemma 4.90, we obtain

$$\| A (B x) \|_q \leq \| A \|_{s \to q} \| B x \|_s.$$

Thus,

$$\| A B \|_{p \to q} \leq \| A \|_{s \to q} \max_{\| x \|_p = 1} \| B x \|_s = \| A \|_{s \to q} \| B \|_{p \to s}.$$

Theorem 4.150 (Consistency of $p \to \infty$ norms)

For two matrices $A$ and $B$ and $p \geq 1$, we have

$$\| A B \|_{p \to \infty} \leq \| A \|_\infty \| B \|_{p \to \infty}.$$

Proof. We start with

$$\| A B \|_{p \to \infty} = \max_{\| x \|_p = 1} \| A (B x) \|_\infty.$$

From Lemma 4.90, we obtain

$$\| A (B x) \|_\infty \leq \| A \|_\infty \| B x \|_\infty.$$

Thus,

$$\| A B \|_{p \to \infty} \leq \| A \|_\infty \max_{\| x \|_p = 1} \| B x \|_\infty = \| A \|_\infty \| B \|_{p \to \infty}.$$

Theorem 4.151 (Dominance of the $p$-norm over the $p \to \infty$ norm)

$$\| A \|_{p \to \infty} \leq \| A \|_p.$$

In particular,

$$\| A \|_{1 \to \infty} \leq \| A \|_1.$$

Also:

$$\| A \|_{2 \to \infty} \leq \| A \|_2.$$

Proof. Choosing $q = \infty$ and $s = p$ and applying Theorem 4.149 to the product $I A$,

$$\| I A \|_{p \to \infty} \leq \| A \|_{p \to p} \| I \|_{p \to \infty}.$$

But $\| I \|_{p \to \infty}$ is the maximum $\ell_q$ norm of any row of $I$ (with $\frac{1}{p} + \frac{1}{q} = 1$), which is $1$. Thus

$$\| A \|_{p \to \infty} \leq \| A \|_{p \to p} = \| A \|_p.$$

Consider the expression

$$\min_{\substack{z \in \mathcal{C}(A^H) \\ z \neq 0}} \frac{\| A z \|_q}{\| z \|_p}.$$

The constraint $z \in \mathcal{C}(A^H), z \neq 0$ means that there exists some vector $u \notin \ker(A^H)$ such that $z = A^H u$.

This expression measures the smallest factor by which the nonsingular part of $A$ (its restriction to the row space) can change the length of a vector.

Theorem 4.152

The following bound holds for every matrix $A$, where $A^{\dagger}$ denotes the Moore-Penrose pseudoinverse:

$$\min_{\substack{z \in \mathcal{C}(A^H) \\ z \neq 0}} \frac{\| A z \|_q}{\| z \|_p} \geq \left( \| A^{\dagger} \|_{q \to p} \right)^{-1}.$$

If $A$ is surjective (onto), then equality holds. When $A$ is bijective (one-one onto, square, invertible), the result implies

$$\min_{z \neq 0} \frac{\| A z \|_q}{\| z \|_p} = \left( \| A^{-1} \|_{q \to p} \right)^{-1}.$$

Proof. The spaces $\mathcal{C}(A^H)$ and $\mathcal{C}(A)$ have the same dimension, given by $\operatorname{rank}(A)$.

  1. We recall that $A^{\dagger} A$ is a projector onto the column space of $A^H$:

     $$w = A z \implies z = A^{\dagger} w = A^{\dagger} A z \quad \forall z \in \mathcal{C}(A^H).$$
  2. As a result we can write

     $$\frac{\| z \|_p}{\| A z \|_q} = \frac{\| A^{\dagger} w \|_p}{\| w \|_q}$$

     whenever $z \in \mathcal{C}(A^H)$ and $z \neq 0$.

  3. Now

     $$\left[ \min_{\substack{z \in \mathcal{C}(A^H) \\ z \neq 0}} \frac{\| A z \|_q}{\| z \|_p} \right]^{-1} = \max_{\substack{z \in \mathcal{C}(A^H) \\ z \neq 0}} \frac{\| z \|_p}{\| A z \|_q} = \max_{\substack{w \in \mathcal{C}(A) \\ w \neq 0}} \frac{\| A^{\dagger} w \|_p}{\| w \|_q} \leq \max_{w \neq 0} \frac{\| A^{\dagger} w \|_p}{\| w \|_q}.$$
  4. When $A$ is surjective, $\mathcal{C}(A) = \mathbb{C}^m$.

  5. Hence

     $$\max_{\substack{w \in \mathcal{C}(A) \\ w \neq 0}} \frac{\| A^{\dagger} w \|_p}{\| w \|_q} = \max_{w \neq 0} \frac{\| A^{\dagger} w \|_p}{\| w \|_q}.$$
  6. Thus, the inequality becomes an equality.

  7. Finally

     $$\max_{w \neq 0} \frac{\| A^{\dagger} w \|_p}{\| w \|_q} = \| A^{\dagger} \|_{q \to p},$$

     which completes the proof.

4.12.10. Row Column Norms

A common way of measuring the norm of a matrix is to compute the $\ell_p$ norm of each row and then compute the $\ell_q$ norm of the resulting column vector of row norms. Such norms are known as *row column norms*.

  1. If $p = q = 2$, the resulting norm is the same as the Frobenius norm.

  2. If $p = q = 1$, the resulting norm is the same as the sum norm.

  3. If $p = q = \infty$, the resulting norm is the same as the max norm.

Thus, one can think of the row column norms as a generalization of these norms.

Definition 4.148 (Row-column norms)

Let $A$ be an $m \times n$ matrix with rows $a_i$:

$$A = \begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix}.$$

Then we define

$$\| A \|_{p, \infty} \triangleq \max_{1 \leq i \leq m} \| a_i \|_p = \max_{1 \leq i \leq m} \left( \sum_{j=1}^{n} | a_{ij} |^p \right)^{\frac{1}{p}}$$

where $1 \leq p < \infty$. In other words, we take the $\ell_p$ norms of all the row vectors and then find the maximum.

We define

$$\| A \|_{\infty, \infty} = \max_{i, j} | a_{ij} |.$$

This is equivalent to taking the $\ell_\infty$ norm on each row and then taking the maximum of all the norms.

For $1 \leq p, q < \infty$, we define the norm

$$\| A \|_{p, q} \triangleq \left[ \sum_{i=1}^{m} \left( \| a_i \|_p \right)^q \right]^{\frac{1}{q}}.$$

In other words, we compute the $\ell_p$ norm of each of the row vectors to form another vector and then take the $\ell_q$ norm of that vector.

Note that the norm $\| A \|_{p, \infty}$ is different from the operator norm $\| A \|_{p \to \infty}$. Similarly, $\| A \|_{p, q}$ is different from $\| A \|_{p \to q}$.
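Definition 4.148 translates directly into a small helper; a sketch assuming NumPy, checking the three special cases listed at the start of this section:

```python
import numpy as np

def row_column_norm(A, p, q):
    """||A||_{p,q}: p-norm of each row, then q-norm of the row norms."""
    row_norms = np.linalg.norm(A, ord=p, axis=1)
    return np.linalg.norm(row_norms, ord=q)

A = np.array([[1.0, -2.0], [3.0, 4.0]])
print(row_column_norm(A, 2, 2), np.linalg.norm(A, 'fro'))   # Frobenius norm
print(row_column_norm(A, 1, 1), np.abs(A).sum())            # sum norm
print(row_column_norm(A, np.inf, np.inf), np.abs(A).max())  # max norm
```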

Theorem 4.153

$$\| A \|_{p, \infty} = \| A \|_{q \to \infty}$$

where

$$\frac{1}{p} + \frac{1}{q} = 1.$$

Proof. From Theorem 4.148 we get

$$\| A \|_{q \to \infty} = \max_{1 \leq i \leq m} \| a_i \|_p.$$

This is exactly the definition of $\| A \|_{p, \infty}$.

Theorem 4.154

$$\| A \|_{1 \to p} = \| A^H \|_{p, \infty}.$$

Proof. Let $q$ be such that $\frac{1}{p} + \frac{1}{q} = 1$. From Theorem 4.146 we note that

$$\| A \|_{1 \to p} = \| A^H \|_{q \to \infty}.$$

From Theorem 4.153,

$$\| A^H \|_{q \to \infty} = \| A^H \|_{p, \infty}.$$

Theorem 4.155

For any two matrices $A, B$, we have

$$\| A B \|_{p, \infty} \leq \| B \|_{p, \infty} \| A \|_\infty.$$

Proof. Let $q$ be such that $\frac{1}{p} + \frac{1}{q} = 1$.

  1. From Theorem 4.150, we have

     $$\| A B \|_{q \to \infty} \leq \| A \|_\infty \| B \|_{q \to \infty}.$$
  2. From Theorem 4.153,

     $$\| A B \|_{q \to \infty} = \| A B \|_{p, \infty}$$

     and

     $$\| B \|_{q \to \infty} = \| B \|_{p, \infty}.$$
  3. Thus

     $$\| A B \|_{p, \infty} \leq \| A \|_\infty \| B \|_{p, \infty}.$$

Theorem 4.156 (Relations between $(p, q)$ and $(p \to q)$ norms)

$$\| A \|_{1, \infty} = \| A \|_\infty.$$
$$\| A \|_{2, \infty} = \| A \|_{2 \to \infty}.$$
$$\| A \|_{\infty, \infty} = \| A \|_{1 \to \infty}.$$
$$\| A \|_{1 \to 1} = \| A^H \|_{1, \infty}.$$
$$\| A \|_{1 \to 2} = \| A^H \|_{2, \infty}.$$

Proof. The first three are straightforward applications of Theorem 4.153. The next two are applications of Theorem 4.154. See also Table 4.1.

4.12.11. Block Diagonally Dominant Matrices and Generalized Gershgorin Circle Theorem

In [35] the idea of diagonally dominant matrices (see Diagonally Dominant Matrices) has been generalized to block matrices using matrix norms. We consider the specific case with the spectral norm.

Definition 4.149 (Block diagonally dominant matrix)

Let $A$ be a square matrix in $\mathbb{C}^{n \times n}$ which is partitioned in the following manner:

$$A = \begin{bmatrix} A_{11} & A_{12} & \dots & A_{1k} \\ A_{21} & A_{22} & \dots & A_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ A_{k1} & A_{k2} & \dots & A_{kk} \end{bmatrix}$$

where each of the submatrices $A_{ij}$ is a square matrix of size $m \times m$. Thus $n = k m$.

$A$ is called *block diagonally dominant* if

$$\| A_{ii} \|_2 \geq \sum_{j, j \neq i} \| A_{ij} \|_2$$

holds true for all $1 \leq i \leq k$.

If the inequality holds strictly for all $i$, then $A$ is called a *block strictly diagonally dominant* matrix.

Theorem 4.157 (Nonsingularity of block strictly diagonally dominant matrices)

If the partitioned matrix $A$ of Definition 4.149 is block strictly diagonally dominant, then it is nonsingular.

For a proof, see [35].

This leads to the generalized Gershgorin disc theorem.

Theorem 4.158 (Generalized Gershgorin disc theorem)

Let $A$ be a square matrix in $\mathbb{C}^{n \times n}$ which is partitioned in the following manner:

$$A = \begin{bmatrix} A_{11} & A_{12} & \dots & A_{1k} \\ A_{21} & A_{22} & \dots & A_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ A_{k1} & A_{k2} & \dots & A_{kk} \end{bmatrix}$$

where each of the submatrices $A_{ij}$ is a square matrix of size $m \times m$.

Then each eigenvalue $\lambda$ of $A$ satisfies

$$\left\| \left( \lambda I - A_{ii} \right)^{-1} \right\|_2^{-1} \leq \sum_{j, j \neq i} \| A_{ij} \|_2 \quad \text{ for some } i \in \{1, 2, \dots, k\}.$$

For a proof, see [35].

Since the $2$-norm of a Hermitian positive semidefinite matrix is nothing but its largest eigenvalue, the theorem directly leads to the following result.

Corollary 4.29 ($2$-norm of a Hermitian positive semidefinite matrix)

Let $A$ be a Hermitian positive semidefinite matrix. Let $A$ be partitioned as in Theorem 4.158. Then its $2$-norm $\| A \|_2$ satisfies

$$\left| \| A \|_2 - \| A_{ii} \|_2 \right| \leq \sum_{j, j \neq i} \| A_{ij} \|_2 \quad \text{ for some } i \in \{1, 2, \dots, k\}.$$
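Corollary 4.29 can be illustrated numerically; a sketch assuming NumPy, with a random real symmetric positive semidefinite matrix partitioned into a $2 \times 2$ grid of $2 \times 2$ blocks (the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(11)
B = rng.standard_normal((4, 4))
A = B.T @ B    # symmetric positive semidefinite by construction
k, m = 2, 2    # 2 x 2 grid of 2 x 2 blocks

norm_A = np.linalg.norm(A, 2)
holds = []
for i in range(k):
    Aii = A[i*m:(i+1)*m, i*m:(i+1)*m]
    lhs = abs(norm_A - np.linalg.norm(Aii, 2))
    rhs = sum(np.linalg.norm(A[i*m:(i+1)*m, j*m:(j+1)*m], 2)
              for j in range(k) if j != i)
    holds.append(lhs <= rhs)
print(any(holds))  # True: the bound holds for at least one i
```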