20.1. Stability of the Sparsest Solution

We discuss various results related to the stability of the sparsest solution for the sparse approximation problem.

For convenience, we restate the problem. We represent the signal $x \in \mathbb{C}^N$ as $x = D a + e$, where $a$ is a sparse representation of $x$ in the dictionary $D$ and $e$ is the approximation error.

$$\hat{a} = \underset{a \in \mathbb{C}^D}{\arg\min}\, \|a\|_0 \text{ subject to } \|x - D a\|_2 \leq \epsilon. \tag{20.1}$$

Since we only know $x$ and $D$ (both $a$ and $e$ are unknown to us), in general it is not possible to reconstruct $a$ exactly. The term $d = a - \hat{a}$ will represent the reconstruction error (note that it is different from the approximation error $e$). It is important for us to ensure that the solution of the approximation problem is stable; i.e., when the approximation error $\|e\|_2$ is small, the reconstruction error $\|d\|_2$ should also be small. For the sparse approximation problem (20.1), we cannot provide a uniqueness guarantee as such. Still, we can identify criteria which ensure that the solution remains stable when the approximation error is bounded. Our analysis in this section will focus on identifying such criteria.

We start by generalizing the notion of spark to the noisy case.

  1. Suppose $a$ and $b$ are two solutions of (20.1).

  2. Then $\|D a - x\|_2 \leq \epsilon$ as well as $\|D b - x\|_2 \leq \epsilon$.

  3. Thus, both $D a$ and $D b$ lie in a ball of radius $\epsilon$ around $x$.

  4. Thus, the maximum distance between $D a$ and $D b$ can be $2\epsilon$.

  5. Alternatively, using the triangle inequality we have

    $$\|D(a - b)\|_2 = \|D a - x + x - D b\|_2 \leq \|D a - x\|_2 + \|x - D b\|_2 \leq 2\epsilon.$$
  6. If we define $d = a - b$, then

    $$\|D d\|_2 \leq 2\epsilon.$$

20.1.1. Spark $\eta$

Definition 20.1

Let $A \in \mathbb{C}^{N \times D}$ be some matrix. Consider all possible subsets of $K$ columns. Let each such subset form a sub-matrix $A_\Lambda \in \mathbb{C}^{N \times K}$, where $\Lambda$ denotes the index set of the $K$ chosen columns. We define $\mathrm{spark}_\eta(A)$ as the smallest possible $K$ (number of columns) that guarantees

$$\min_{\Lambda} \sigma_K(A_\Lambda) \leq \eta$$

where $\sigma_K$ denotes the smallest (i.e., $K$-th) singular value of the sub-matrix $A_\Lambda$. Note that we are minimizing over all possible index sets $\Lambda$ with $|\Lambda| = K$.

In words, $\mathrm{spark}_\eta(A)$ is the smallest number of columns (indexed by $\Lambda$) that can be gathered from $A$ such that the smallest singular value of $A_\Lambda$ is no larger than $\eta$; i.e., there exists a sub-matrix of $A$ consisting of $\mathrm{spark}_\eta(A)$ columns whose smallest singular value is $\eta$ or less, while every sub-matrix of $A$ with fewer columns has its smallest singular value larger than $\eta$.

Relationship with spark:

  1. When the smallest singular value of a sub-matrix is 0, its columns are linearly dependent.

  2. Thus, by choosing $\eta = 0$, we get the smallest number of columns $K$ which are linearly dependent.

  3. This matches the definition of spark.

  4. Thus,

    $$\mathrm{spark}_0(A) = \mathrm{spark}(A).$$
  5. Since singular values are always non-negative, we require $\eta \geq 0$.

Matrix with unit norm columns:

  1. When the columns of $A$ are unit-norm (the case of dictionaries), any single-column sub-matrix has a singular value of 1. Hence,

    $$\mathrm{spark}_1(A) = 1.$$
  2. Choosing a value of $\eta > 1$ doesn't make any difference since, already with a single-column sub-matrix, we can show that

    $$\mathrm{spark}_\eta(A) = 1 \quad \forall\, \eta \geq 1.$$

Monotonicity:

  1. Let $\eta_1 > \eta_2$. Let

    $$K_2 = \mathrm{spark}_{\eta_2}(A).$$
  2. Then there exists a sub-matrix consisting of $K_2$ columns of $A$ whose smallest singular value is upper bounded by $\eta_2$.

  3. Since $\eta_1 > \eta_2$, $\eta_1$ also serves as an upper bound for the smallest singular value of this sub-matrix.

  4. Clearly then $K_1 = \mathrm{spark}_{\eta_1}(A) \leq K_2$.

  5. Thus, we note that $\mathrm{spark}_\eta$ is a monotone decreasing function of $\eta$; i.e.,

    $$\mathrm{spark}_{\eta_1}(A) \leq \mathrm{spark}_{\eta_2}(A) \text{ whenever } \eta_1 > \eta_2.$$

Maximum value:

  1. We recall that the spark of $A$ is upper bounded by its rank plus one.

  2. Assuming $A$ to be a full rank matrix with unit-norm columns, we get the following inequality:

    $$1 \leq \mathrm{spark}_\eta(A) \leq \mathrm{spark}_0(A) = \mathrm{spark}(A) \leq N + 1 \quad \forall\, 0 \leq \eta \leq 1.$$

We recall that if $A v = 0$ for some $v \neq 0$, then $\|v\|_0 \geq \mathrm{spark}(A)$. A similar property can be developed for $\mathrm{spark}_\eta(A)$ also.

Theorem 20.1

If $\|A v\|_2 \leq \eta$ and $\|v\|_2 = 1$, then $\|v\|_0 \geq \mathrm{spark}_\eta(A)$.

Proof. For contradiction, let us assume that $K = \|v\|_0 < \mathrm{spark}_\eta(A)$.

  1. Let $\Lambda = \mathrm{supp}(v)$.

  2. Then $A v = A_\Lambda v_\Lambda$.

  3. Also $\|v_\Lambda\|_2 = \|v\|_2 = 1$.

  4. We recall that the smallest singular value of $A_\Lambda$ is given by

    $$\sigma_{\min}(A_\Lambda) = \inf_{\|x\|_2 = 1} \|A_\Lambda x\|_2.$$
  5. Thus,

    $$\|A_\Lambda x\|_2 \geq \sigma_{\min}(A_\Lambda) \text{ whenever } \|x\|_2 = 1.$$
  6. Thus, in our particular case

    $$\|A_\Lambda v_\Lambda\|_2 \geq \sigma_{\min}(A_\Lambda).$$

    $A_\Lambda$ has $K$ columns with $K < \mathrm{spark}_\eta(A)$.

  7. Thus, from the definition of $\mathrm{spark}_\eta(A)$,

    $$\sigma_{\min}(A_\Lambda) > \eta.$$
  8. This gives us

    $$\|A v\|_2 = \|A_\Lambda v_\Lambda\|_2 > \eta$$

    which contradicts the assumption that $\|A v\|_2 \leq \eta$.

20.1.1.1. Coherence

In the following, we will focus on the $\mathrm{spark}_\eta$ of a full rank dictionary $D$. We now establish a connection between $\mathrm{spark}_\eta$ and the coherence of a dictionary.

Theorem 20.2

Let $D$ be a full rank dictionary with coherence $\mu$. Then

$$\mathrm{spark}_\eta(D) \geq \frac{1 - \eta^2}{\mu} + 1. \tag{20.2}$$

Proof.

  1. We recall from Gershgorin's theorem that for any square matrix $A \in \mathbb{C}^{K \times K}$, every eigenvalue $\lambda$ of $A$ satisfies

    $$|\lambda - a_{ii}| \leq \sum_{j \neq i} |a_{ij}| \text{ for some } i \in \{1, \dots, K\}.$$
  2. Now consider a matrix $A$ with diagonal elements equal to 1 and off-diagonal elements bounded by a value $\mu$.

  3. Then

    $$|\lambda - 1| \leq \sum_{j \neq i} |a_{ij}| \leq \sum_{j \neq i} \mu = (K - 1)\mu.$$
  4. Thus,

    $$-(K - 1)\mu \leq \lambda - 1 \leq (K - 1)\mu \implies 1 - (K - 1)\mu \leq \lambda \leq 1 + (K - 1)\mu.$$
  5. This gives us a lower bound on the smallest eigenvalue:

    $$\lambda_{\min}(A) \geq 1 - (K - 1)\mu.$$
  6. Now consider any index set $\Lambda \subseteq \{1, \dots, D\}$ with $|\Lambda| = \mathrm{spark}_\eta(D) = K$ and consider the sub-matrix $D_\Lambda$.

  7. Define $G = D_\Lambda^H D_\Lambda$.

  8. The diagonal elements of $G$ are one, while the off-diagonal elements are bounded by $\mu$.

  9. Thus,

    $$\lambda_{\min}(G) \geq 1 - (K - 1)\mu \implies (K - 1)\mu \geq 1 - \lambda_{\min}(G) \implies K - 1 \geq \frac{1 - \lambda_{\min}(G)}{\mu} \implies K \geq \frac{1 - \lambda_{\min}(G)}{\mu} + 1.$$
  10. Since this applies to every sub-matrix $D_\Lambda$ of $K$ columns, it applies in particular to the sub-matrix for which $\sigma_{\min}(D_\Lambda) \leq \eta$ holds.

  11. For this sub-matrix

    $$\lambda_{\min}(D_\Lambda^H D_\Lambda) = \sigma_{\min}^2(D_\Lambda) \leq \eta^2.$$
  12. Thus

    $$K = \mathrm{spark}_\eta(D) \geq \frac{1 - \lambda_{\min}(G)}{\mu} + 1 \geq \frac{1 - \eta^2}{\mu} + 1.$$
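The bound (20.2) can be checked numerically on a small random dictionary. The sketch below is an illustration under assumed data (a random $6 \times 10$ dictionary); `spark_eta` is a hypothetical brute-force helper, not from the text. It computes the coherence as the largest off-diagonal entry of the Gram matrix and verifies $\mathrm{spark}_\eta(D) \geq (1 - \eta^2)/\mu + 1$ for a few values of $\eta$:

```python
import itertools

import numpy as np

def spark_eta(A, eta):
    """Brute-force spark_eta: smallest K such that some K-column
    sub-matrix of A has smallest singular value <= eta."""
    _, M = A.shape
    for K in range(1, M + 1):
        for cols in itertools.combinations(range(M), K):
            if np.linalg.svd(A[:, list(cols)], compute_uv=False)[-1] <= eta:
                return K
    return None

rng = np.random.default_rng(0)
Dm = rng.standard_normal((6, 10))
Dm /= np.linalg.norm(Dm, axis=0)       # normalize to unit-norm columns

G = Dm.T @ Dm
mu = np.max(np.abs(G - np.eye(10)))    # coherence: largest off-diagonal entry

for eta in (0.1, 0.3, 0.5):
    K = spark_eta(Dm, eta)
    assert K >= (1 - eta**2) / mu + 1  # Theorem 20.2
```

Since the dictionary has 10 columns in a 6-dimensional space, any 7 columns are linearly dependent, so the search always terminates with $K \leq 7$.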

20.1.2. Uncertainty with Spark $\eta$

We now present an uncertainty result for the noisy case.

Theorem 20.3

If $a_1$ and $a_2$ satisfy $\|x - D a_i\|_2 \leq \epsilon$, $i = 1, 2$, then

$$\|a_1\|_0 + \|a_2\|_0 \geq \mathrm{spark}_\eta(D), \text{ where } \eta = \frac{2\epsilon}{\|a_1 - a_2\|_2}. \tag{20.3}$$

Proof.

  1. From the triangle inequality we have

    $$\|D(a_1 - a_2)\|_2 \leq 2\epsilon.$$
  2. We define $b = a_1 - a_2$.

  3. Then $\|D b\|_2 \leq 2\epsilon$.

  4. Further define $v = b / \|b\|_2$ as the normalized vector.

  5. Then

    $$\|D v\|_2 = \frac{\|D b\|_2}{\|b\|_2} \leq \frac{2\epsilon}{\|b\|_2}.$$
  6. Now define

    $$\eta = \frac{2\epsilon}{\|b\|_2} = \frac{2\epsilon}{\|a_1 - a_2\|_2}.$$
  7. Then, from Theorem 20.1, since $\|D v\|_2 \leq \eta$ and $\|v\|_2 = 1$, we have $\|v\|_0 \geq \mathrm{spark}_\eta(D)$.

  8. Finally,

    $$\|a_1\|_0 + \|a_2\|_0 \geq \|a_1 - a_2\|_0 = \|b\|_0 = \|v\|_0 \geq \mathrm{spark}_\eta(D).$$
  9. This concludes the proof.

This result gives us a lower bound on the sum of the sparsity levels of two different sparse representations of the same vector $x$ under the given bound on the approximation error.
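To see the bound in action, one can manufacture two sparse representations at equal distance from a common $x$ and check (20.3). The sketch below is illustrative (the dictionary and the two vectors are assumptions, and `spark_eta` is again a hypothetical brute-force helper): $x$ is taken as the midpoint of $D a_1$ and $D a_2$, so that both satisfy $\|x - D a_i\|_2 = \epsilon$ with $\epsilon = \|D(a_1 - a_2)\|_2 / 2$.

```python
import itertools

import numpy as np

def spark_eta(A, eta):
    """Brute-force spark_eta (exponential search; tiny examples only)."""
    _, M = A.shape
    for K in range(1, M + 1):
        for cols in itertools.combinations(range(M), K):
            if np.linalg.svd(A[:, list(cols)], compute_uv=False)[-1] <= eta:
                return K
    return None

rng = np.random.default_rng(1)
Dm = rng.standard_normal((4, 6))
Dm /= np.linalg.norm(Dm, axis=0)        # unit-norm columns

a1 = np.array([1.0, 0, 0, 0, 0, 0])     # ||a1||_0 = 1
a2 = np.array([0, 0.8, -0.6, 0, 0, 0])  # ||a2||_0 = 2

x = (Dm @ a1 + Dm @ a2) / 2             # midpoint of the two images
eps = np.linalg.norm(Dm @ (a1 - a2)) / 2
eta = 2 * eps / np.linalg.norm(a1 - a2)

lhs = np.count_nonzero(a1) + np.count_nonzero(a2)
assert lhs >= spark_eta(Dm, eta)        # Theorem 20.3
```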

20.1.3. Localization

We can now develop a localization result for the sparse approximation up to a Euclidean ball. This is analogous to the uniqueness result in the noiseless case.

Theorem 20.4

Given a distance $\delta \geq 0$ (a bound on the distance between two sparse representations) and $\epsilon$ (a bound on the norm of the approximation error), set $\eta = 2\epsilon/\delta$. Suppose there are two approximate representations $a_i$, $i = 1, 2$, both obeying

$$\|x - D a_i\|_2 \leq \epsilon \quad \text{and} \quad \|a_i\|_0 \leq \frac{1}{2}\mathrm{spark}_\eta(D).$$

Then $\|a_1 - a_2\|_2 \leq \delta$.

Proof.

  1. Since $\|a_i\|_0 \leq \frac{1}{2}\mathrm{spark}_\eta(D)$, hence

    $$\|a_1\|_0 + \|a_2\|_0 \leq \mathrm{spark}_\eta(D).$$
  2. From Theorem 20.3, if we define

    $$\nu = \frac{2\epsilon}{\|a_1 - a_2\|_2},$$

    then

    $$\|a_1\|_0 + \|a_2\|_0 \geq \mathrm{spark}_\nu(D).$$
  3. Combining the two, we get

    $$\mathrm{spark}_\nu(D) \leq \|a_1\|_0 + \|a_2\|_0 \leq \mathrm{spark}_\eta(D).$$
  4. Because of the monotonicity of $\mathrm{spark}_\eta(D)$, we have

    $$\mathrm{spark}_\eta(D) \geq \mathrm{spark}_\nu(D) \implies \eta \leq \nu \implies \frac{2\epsilon}{\delta} \leq \frac{2\epsilon}{\|a_1 - a_2\|_2} \implies \|a_1 - a_2\|_2 \leq \delta$$

    which completes our proof.

This theorem says that if $x$ has two different sufficiently sparse representations $a_i$ with small approximation errors, then they fall within a small distance of each other.

20.1.4. Stability using Coherence

We can now develop a stability result for problem (20.1) in terms of the coherence of the dictionary.

Theorem 20.5

Consider an instance of problem (20.1) defined by the triplet $(D, x, \epsilon)$. Suppose that a sparse vector $a \in \mathbb{C}^D$ satisfies the sparsity constraint

$$\|a\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu}\right)$$

and gives a representation of $x$ to within error tolerance $\epsilon$ (i.e., $\|x - D a\|_2 \leq \epsilon$). Then every solution $\hat{a}$ of (20.1) must obey

$$\|\hat{a} - a\|_2^2 \leq \frac{4\epsilon^2}{1 - \mu(2\|a\|_0 - 1)}.$$

Proof.

  1. Note that $a$ need not be the sparsest possible representation of $x$ within the approximation error $\epsilon$.

  2. But $a$ is a feasible point of (20.1).

  3. Now, since $\hat{a}$ is an optimal solution of (20.1) (thus the sparsest possible), it is at least as sparse as $a$; i.e.,

    $$\|\hat{a}\|_0 \leq \|a\|_0.$$
  4. Due to Theorem 18.25,

    $$\frac{1}{2}\mathrm{spark}(D) \geq \frac{1}{2}\left(1 + \frac{1}{\mu}\right) > \|a\|_0.$$
  5. Thus, there exists a value $\eta \geq 0$ such that

    $$\frac{1}{2}\mathrm{spark}(D) \geq \frac{1}{2}\mathrm{spark}_\eta(D) \geq \|a\|_0 \geq \|\hat{a}\|_0.$$
  6. From Theorem 20.2 we recall that

    $$\mathrm{spark}_\eta(D) \geq \frac{1 - \eta^2}{\mu} + 1.$$
  7. Thus, we can find a suitable value of $\eta \geq 0$ such that we can enforce the stricter requirement

    $$\|a\|_0 \leq \frac{1}{2}\left(\frac{1 - \eta^2}{\mu} + 1\right) \leq \frac{1}{2}\mathrm{spark}_\eta(D).$$
  8. From this we can develop an upper bound on $\eta$:

    $$\|a\|_0 \leq \frac{1}{2}\left(\frac{1 - \eta^2}{\mu} + 1\right) \iff 2\|a\|_0 \mu \leq 1 - \eta^2 + \mu \iff \eta^2 \leq 1 - \mu(2\|a\|_0 - 1).$$
  9. If we choose $\eta^2 = 1 - \mu(2\|a\|_0 - 1)$, then

    $$\|a\|_0 = \frac{1}{2}\left(\frac{1 - \eta^2}{\mu} + 1\right) \implies \|a\|_0 \leq \frac{1}{2}\mathrm{spark}_\eta(D) \implies \|\hat{a}\|_0 \leq \|a\|_0 \leq \frac{1}{2}\mathrm{spark}_\eta(D)$$

    continues to hold.

  10. We have two solutions $a$ and $\hat{a}$, both of which satisfy

    $$\|a\|_0, \|\hat{a}\|_0 \leq \frac{1}{2}\mathrm{spark}_\eta(D)$$

    and

    $$\|x - D a\|_2, \|x - D \hat{a}\|_2 \leq \epsilon.$$
  11. If we choose $\delta = \frac{2\epsilon}{\eta}$, then applying Theorem 20.4, we get

    $$\|a - \hat{a}\|_2^2 \leq \delta^2 = \frac{4\epsilon^2}{\eta^2} = \frac{4\epsilon^2}{1 - \mu(2\|a\|_0 - 1)}.$$
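Theorem 20.5 can be exercised end to end on a toy problem. The sketch below is illustrative only; the dictionary, the noise level, and the brute-force solver `l0_solve` are all assumptions, not from the text. It uses the two-ortho dictionary $[I \;|\; H]$ built from the identity and a normalized $4 \times 4$ Hadamard matrix, for which $\mu = 1/2$, plants a 1-sparse $a$ (so that $\|a\|_0 = 1 < \frac{1}{2}(1 + 1/\mu) = 1.5$), and checks the error bound:

```python
import itertools

import numpy as np

def l0_solve(Dm, x, eps):
    """Brute-force (20.1): return a sparsest a with ||x - Dm a||_2 <= eps.
    Exponential search over supports; tiny problems only."""
    _, M = Dm.shape
    for k in range(M + 1):
        for S in itertools.combinations(range(M), k):
            a = np.zeros(M)
            if k:
                cols = list(S)
                a[cols] = np.linalg.lstsq(Dm[:, cols], x, rcond=None)[0]
            if np.linalg.norm(x - Dm @ a) <= eps:
                return a
    return None

H = np.array([[1, 1, 1, 1],
              [1, -1, 1, -1],
              [1, 1, -1, -1],
              [1, -1, -1, 1]]) / 2.0
Dm = np.hstack([np.eye(4), H])          # two-ortho dictionary with mu = 1/2
mu = 0.5

a = np.zeros(8)
a[0] = 1.0                              # ||a||_0 = 1 < (1 + 1/mu) / 2

rng = np.random.default_rng(2)
e = rng.standard_normal(4)
e *= 0.05 / np.linalg.norm(e)           # approximation error, ||e||_2 = 0.05
x = Dm @ a + e
eps = 0.05

a_hat = l0_solve(Dm, x, eps)
bound = 4 * eps**2 / (1 - mu * (2 * np.count_nonzero(a) - 1))
assert np.linalg.norm(a_hat - a) ** 2 <= bound   # Theorem 20.5
```

Here $a$ is feasible by construction, so the sparsest solution $\hat{a}$ satisfies $\|\hat{a}\|_0 \leq 1$, and the squared reconstruction error stays within $4\epsilon^2 / (1 - \mu(2\|a\|_0 - 1)) = 8\epsilon^2$.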