18.6. Dictionaries#
In this section we review various properties associated with a dictionary $\mathcal{D}$.
We recall that a dictionary $\mathcal{D} = \{ d_\omega : \omega \in \Omega \}$ is a finite collection of unit-norm atoms which span the signal space $\mathbb{C}^N$, with $\Omega$ denoting the index set of the atoms and $D = |\Omega|$ their number.
The vectors $d_\omega$ are called atoms; arranged as columns, they form the synthesis matrix of the dictionary.
Note that we are using the same symbol $\mathcal{D}$ for the dictionary (a set of atoms) and for its $N \times D$ synthesis matrix.
This shouldn't be causing any confusion.
When we write the subscript as an index set, as in $\mathcal{D}_\Lambda$, we mean the matrix whose columns are the atoms indexed by $\Lambda$.
Often, we will be working with a subset of atoms in a dictionary.
Usually such a subset
of atoms will be indexed by an index set $\Lambda \subseteq \Omega$.
Often we will need the notion of a subdictionary [79] described below.
18.6.1. Subdictionaries#
Definition 18.16 (Subdictionary)
A subdictionary is a linearly independent collection of atoms.
Let $\Lambda \subseteq \Omega$ be a set of indices such that the atoms $\{ d_\lambda : \lambda \in \Lambda \}$ are linearly independent. Then $\mathcal{D}_\Lambda$ denotes the corresponding subdictionary.
Remark 18.5 (Rank of subdictionary)
A subdictionary is full rank.
This is obvious since it is a collection of linearly independent atoms.
For subdictionaries, often we will refer to the subdictionary simply by its index set $\Lambda$ and write $K = |\Lambda|$ for the number of atoms in it.
Let $\mathcal{D}_\Lambda$ be a subdictionary with Gram matrix $G = \mathcal{D}_\Lambda^H \mathcal{D}_\Lambda$,
and let $B = (\mathcal{D}_\Lambda^{\dagger})^H = \mathcal{D}_\Lambda G^{-1}$ collect the corresponding dual vectors $b_\lambda$ as its columns.
Each dual vector $b_\lambda$ is biorthogonal to the atoms of the subdictionary: $\langle d_\lambda, b_\lambda \rangle = 1$ while $\langle d_{\lambda'}, b_\lambda \rangle = 0$ for $\lambda' \neq \lambda$, and $B^H B = G^{-1} G G^{-1} = G^{-1}$.
Therefore, the inverse Gram matrix lists the inner products between the dual vectors.
Sometimes we will be discussing tools
which apply to general matrices. We will use
the symbol $\Phi$ to denote a general $N \times D$ matrix in such cases.
18.6.2. Spark#
Definition 18.17 (Spark)
The spark of a given matrix $\Phi$ is the smallest number of columns of $\Phi$ that form a linearly dependent set.
Note that the definition of spark applies to all matrices (wide, tall or square). It is not restricted to the synthesis matrices for a dictionary.
Correspondingly, the spark of a dictionary is defined as the minimum number of atoms which are linearly dependent.
We recall that the rank of a matrix is defined as the maximum number of columns which
are linearly independent.
The definition of spark bears a remarkable resemblance to that of rank, yet the spark is very hard
to obtain, as computing it requires a combinatorial search over all possible
subsets of columns of $\Phi$.
Example 18.21 (Spark)
The spark of the $4 \times 4$ identity matrix is 4, since all of its columns are linearly independent.
The spark of a matrix in which columns 1 and 3 are linearly dependent (e.g., one is a scalar multiple of the other) is 2.
If a matrix has a column with all zero entries, then the spark of such a matrix is 1. This is a trivial case and we will not consider such matrices in the sequel.
In general, for an $N \times D$ synthesis matrix,
$$2 \leq \operatorname{spark}(\mathcal{D}) \leq N + 1.$$
A naive combinatorial algorithm to calculate the spark of a matrix is given below.
Algorithm 18.1 (A naive algorithm for computing the spark of a matrix)
Inputs:
$\Phi$: an $N \times D$ matrix
Outputs:
$s$: spark of $\Phi$
Algorithm:
1. Let $N \times D$ be the dimensions of $\Phi$.
2. For every $k = 1, 2, \dots, D$:
   1. Identify the $\binom{D}{k}$ ways of choosing $k$ columns from the $D$ columns of $\Phi$.
   2. For every choice of $k$ columns:
      1. If the chosen $k$ columns are linearly dependent: set $s \leftarrow k$ and return.
3. All columns are linearly independent.
   Set $s \leftarrow D$ and return.
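Below is a minimal Python/NumPy sketch of Algorithm 18.1. The function name spark and the rank-based test for linear dependence are illustrative choices rather than part of the algorithm itself; the convention of returning the number of columns when all columns are linearly independent follows the last step above.

```python
import itertools
import numpy as np

def spark(Phi, tol=1e-10):
    """Naive spark computation by searching over all column subsets."""
    N, D = Phi.shape
    for k in range(1, D + 1):
        # Examine every choice of k columns out of D.
        for cols in itertools.combinations(range(D), k):
            # k columns are linearly dependent iff their rank is less than k.
            if np.linalg.matrix_rank(Phi[:, list(cols)], tol=tol) < k:
                return k
    # All columns are linearly independent.
    return D

# A matrix whose first and third columns are scalar multiples of each other.
Phi = np.array([[1.0, 0.0, 2.0, 0.0],
                [0.0, 1.0, 0.0, 1.0],
                [0.0, 0.0, 0.0, 1.0]])
print(spark(Phi))        # 2
print(spark(np.eye(4)))  # 4 (all columns linearly independent)
```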
18.6.2.1. Spark and Nullspace#
Remark 18.6 (Spark and sparsity of null space vectors)
The spark of a matrix $\Phi$ gives a lower bound on the number of non-zero entries of any non-zero vector in its null space:
$$x \in \mathcal{N}(\Phi), \; x \neq 0 \implies \|x\|_0 \geq \operatorname{spark}(\Phi).$$
Proof. We proceed as follows:
1. If $\Phi x = 0$ with $x \neq 0$, then $\sum_i x_i \phi_i = 0$, where $\phi_i$ denote the columns of $\Phi$.
2. Thus, the non-zero entries in $x$ pick a set of columns in $\Phi$ which are linearly dependent.
3. Clearly, $\|x\|_0$ indicates the number of columns in this linearly dependent set.
4. By definition, the spark of $\Phi$ indicates the minimum number of columns which are linearly dependent.
5. Hence the result: $\|x\|_0 \geq \operatorname{spark}(\Phi)$.
18.6.2.2. Uniqueness-Spark#
Spark is useful in characterizing the uniqueness of the solution
of an underdetermined system $y = \Phi x$ under sparsity constraints.
Theorem 18.23 (Uniqueness of a sparse solution for an underdetermined system via spark)
Consider a solution $x^*$ of the underdetermined system $y = \Phi x$. If $x^*$ satisfies
$$\|x^*\|_0 < \frac{\operatorname{spark}(\Phi)}{2},$$
then it is necessarily the sparsest solution.
Proof. Let $x'$ be any other solution, so that $\Phi x' = y$ and $x' \neq x^*$.
Then $x^* - x'$ is a non-zero vector in $\mathcal{N}(\Phi)$, and due to Remark 18.6 we have
$$\|x^* - x'\|_0 \geq \operatorname{spark}(\Phi).$$
Now,
$$\|x^*\|_0 + \|x'\|_0 \geq \|x^* - x'\|_0 \geq \operatorname{spark}(\Phi).$$
Hence, if $\|x^*\|_0 < \frac{\operatorname{spark}(\Phi)}{2}$, then
$$\|x'\|_0 \geq \operatorname{spark}(\Phi) - \|x^*\|_0 > \frac{\operatorname{spark}(\Phi)}{2} > \|x^*\|_0$$
for any other solution $x'$. Thus, $x^*$ is the unique sparsest solution.
This result is quite useful as it establishes a global optimality criterion for the EXACT-SPARSE problem.
As long as a candidate solution satisfies $\|x\|_0 < \frac{\operatorname{spark}(\Phi)}{2}$, it is guaranteed to be the sparsest possible solution.
Note that we are only saying that if a sufficiently sparse solution is found then it is unique. We are not claiming that it is possible to find such a solution.
Obviously, the larger the spark, the higher the sparsity levels for which we can guarantee uniqueness. So a natural question is: how large can the spark of a dictionary be? We consider a few examples.
Example 18.22 (Spark of Gaussian dictionaries)
Consider a dictionary $\mathcal{D}$ whose $D$ atoms are drawn independently at random from a Gaussian distribution on $\mathbb{R}^N$ (with $D > N$).
Since a dictionary requires all its atoms to be unit norm, we divide each of the random vectors by its norm.
We know that, with probability one,
any set of $N$ or fewer independent Gaussian random vectors is linearly independent. Also, since $D > N$,
any set of $N + 1$ atoms is always linearly dependent. Thus,
$$\operatorname{spark}(\mathcal{D}) = N + 1 \quad \text{with probability one}.$$
Thus, if a solution to the EXACT-SPARSE problem contains fewer than $\frac{N + 1}{2}$ non-zero entries, it is necessarily the unique sparsest solution.
Example 18.23 (Spark of Dirac Fourier basis)
For the Dirac-Fourier dictionary in $\mathbb{C}^N$ (with $N$ a perfect square),
it can be shown that
$$\operatorname{spark}(\mathcal{D}) = 2\sqrt{N}.$$
In this case, the sparsity level of a unique solution must be less than $\sqrt{N}$.
18.6.3. Coherence#
Finding out the spark of a dictionary is computationally prohibitive, as it requires a combinatorial search over subsets of atoms. Fortunately, a much simpler quantity, the coherence of the dictionary, can be computed easily and still yields useful bounds on the spark.
Definition 18.18 (Coherence of a dictionary)
The coherence of a dictionary $\mathcal{D}$ is defined as the maximum absolute inner product between two distinct atoms:
$$\mu = \max_{j \neq k} |\langle d_j, d_k \rangle|.$$
If the dictionary consists of two orthonormal bases, then coherence is also known as mutual coherence or proximity; see Definition 18.1.
We note that by definition $0 \leq \mu \leq 1$: since the atoms are unit norm, the Cauchy-Schwarz inequality gives $|\langle d_j, d_k \rangle| \leq 1$.
For an orthonormal basis, distinct atoms are mutually orthogonal, i.e. $\langle d_j, d_k \rangle = 0$ for $j \neq k$.
Thus $\mu = 0$ in that case.
In the following, we will use the notation $G = \mathcal{D}^H \mathcal{D}$ for the Gram matrix of the dictionary.
The off-diagonal entries of the Gram matrix are captured by the
matrix $G - I$.
The inner product between any two atoms $d_j$ and $d_k$ appears as the $(j, k)$-th entry of $G$; hence the coherence is the largest absolute off-diagonal entry of $G$.
If a dictionary is uniform in the sense that there is not much variation in the magnitudes of the inner products between distinct atoms, then coherence characterizes the dictionary quite well.
Definition 18.19 (Incoherent dictionary)
We say that a dictionary is incoherent if the coherence of the dictionary is small.
We are looking for dictionaries which are incoherent. In the sequel we will see how incoherence plays a role in sparse approximation.
Example 18.24 (Coherence of two ortho bases)
We established in Theorem 18.3 that the coherence of a dictionary formed from two orthonormal bases is lower bounded by $\frac{1}{\sqrt{N}}$.
In particular, we showed in Theorem 18.4 that the
coherence of the Dirac-Fourier basis is exactly $\frac{1}{\sqrt{N}}$.
Example 18.25 (Coherence: Multi-ONB dictionary)
A dictionary of concatenated orthonormal bases is called a multi-ONB.
For some $N$ (for instance, when $N$ is a prime power), it is possible to construct a multi-ONB containing as many as $N + 1$ orthonormal bases while the coherence stays at the smallest possible value $\frac{1}{\sqrt{N}}$.
Theorem 18.24 (Coherence lower bound)
A lower bound on the coherence of a general dictionary with $D > N$ atoms is given by
$$\mu \geq \sqrt{\frac{D - N}{N(D - 1)}}.$$
Definition 18.20 (Grassmannian frame)
If each atomic inner product meets this bound, the dictionary is called an optimal Grassmannian frame.
The definition of coherence can be extended to arbitrary matrices $\Phi \in \mathbb{C}^{N \times D}$.
Definition 18.21 (Coherence for arbitrary matrices)
The coherence of a matrix $\Phi$ with columns $\phi_1, \dots, \phi_D$ is defined as follows.
The coherence of $\Phi$ is
$$\mu(\Phi) = \max_{j \neq k} \frac{|\langle \phi_j, \phi_k \rangle|}{\|\phi_j\|_2 \, \|\phi_k\|_2}.$$
It is assumed that none of the columns in $\Phi$ is a zero vector.
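As a quick illustration of Definition 18.21, the following Python sketch normalizes the columns of a matrix, forms its Gram matrix, and reads the coherence off the largest off-diagonal magnitude. The helper name coherence and the Dirac-Fourier example are illustrative choices, not part of the text.

```python
import numpy as np

def coherence(Phi):
    """Coherence of a matrix with non-zero columns (Definition 18.21)."""
    Phi = Phi / np.linalg.norm(Phi, axis=0)   # normalize columns to unit norm
    G = np.abs(Phi.conj().T @ Phi)            # absolute Gram matrix
    np.fill_diagonal(G, 0.0)                  # ignore self inner products
    return G.max()

# Dirac-Fourier dictionary in C^N: coherence should be 1/sqrt(N).
N = 16
dirac_fourier = np.hstack([np.eye(N), np.fft.fft(np.eye(N)) / np.sqrt(N)])
print(coherence(dirac_fourier))   # approximately 0.25
```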
18.6.3.1. Lower Bounds for Spark#
Coherence of a matrix is easy to compute. More interestingly, it also provides a lower bound on the spark of a matrix.
Theorem 18.25 (Lower bound on spark in terms of coherence)
For any matrix $\Phi$ (with no zero columns),
$$\operatorname{spark}(\Phi) \geq 1 + \frac{1}{\mu(\Phi)}.$$
Proof. We note that scaling a column of $\Phi$ changes neither its spark nor its coherence. Therefore, we assume that all columns of $\Phi$ are normalized to unit norm.
We now construct the Gram matrix of $\Phi$,
given by $G = \Phi^H \Phi$. We note that $G_{kk} = 1$ for every $k$,
since each column of $\Phi$
is unit norm. Also, $|G_{jk}| \leq \mu$ for every $j \neq k$.
Consider any $p$
columns from $\Phi$ and construct their Gram matrix. This is nothing but a principal sub-matrix of size $p \times p$
of the matrix $G$ (after a suitable permutation of rows and columns). From the Gershgorin disk theorem, if this minor is diagonally dominant, i.e. if
$$\sum_{j \neq i} |G_{ij}| < |G_{ii}| \quad \text{for every row } i,$$
then this sub-matrix of $G$
is positive definite and so the corresponding columns of $\Phi$ are linearly independent. But
$$|G_{ii}| = 1$$
and
$$\sum_{j \neq i} |G_{ij}| \leq (p - 1)\mu$$
for the minor under consideration.
Hence, for the $p$
columns to be linearly independent, the following condition is sufficient:
$$(p - 1)\mu < 1.$$
Thus, if
$$p < 1 + \frac{1}{\mu},$$
then every set of $p$
columns from $\Phi$ is linearly independent. Hence, the smallest possible set of linearly dependent columns must satisfy
$$p \geq 1 + \frac{1}{\mu}.$$
This establishes the lower bound
$$\operatorname{spark}(\Phi) \geq 1 + \frac{1}{\mu(\Phi)}.$$
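The bound of Theorem 18.25 is easy to check numerically. The snippet below assumes the spark and coherence helpers sketched earlier are in scope; for a small random matrix with unit-norm columns, the computed spark should never fall below $1 + 1/\mu$.

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.standard_normal((4, 8))
Phi /= np.linalg.norm(Phi, axis=0)    # unit-norm columns

print(spark(Phi))               # typically 5 for a generic 4 x 8 matrix
print(1 + 1 / coherence(Phi))   # lower bound from Theorem 18.25 (usually much smaller)
```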
This bound on the spark doesn't make any assumptions about the structure of the dictionary. In fact, imposing additional structure on the dictionary can give better bounds. Let us look at an example for a two-ortho-basis dictionary [28].
Theorem 18.26 (Lower bound on spark for two ortho bases)
Let $\mathcal{D}$ be a two-ortho-basis dictionary (the concatenation of two orthonormal bases) with coherence $\mu$. Then
$$\operatorname{spark}(\mathcal{D}) \geq \frac{2}{\mu}.$$
For maximally incoherent pairs of orthonormal bases, we know that $\mu = \frac{1}{\sqrt{N}}$; hence $\operatorname{spark}(\mathcal{D}) \geq 2\sqrt{N}$ in that case.
18.6.3.2. Uniqueness-Coherence#
We can now establish a uniqueness condition for the sparse solution of an underdetermined system in terms of coherence.
Theorem 18.27 (Uniqueness of a sparse solution of an underdetermined system via coherence)
Consider a solution $x^*$ of the underdetermined system $y = \Phi x$. If $x^*$ satisfies
$$\|x^*\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(\Phi)}\right),$$
then it is necessarily the sparsest solution.
Proof. This is a straightforward application of Theorem 18.23 and Theorem 18.25.
It is interesting to compare the two uniqueness theorems: Theorem 18.23 and Theorem 18.27.
Theorem 18.23 uses spark, is sharp, and is far more powerful than Theorem 18.27.
Coherence can never be smaller than roughly $\frac{1}{\sqrt{N}}$ (Theorem 18.24); hence the sparsity level guaranteed by Theorem 18.27 can never exceed roughly $\frac{\sqrt{N} + 1}{2}$.
However, the spark can easily be as large as $N + 1$, in which case Theorem 18.23 guarantees uniqueness for sparsity levels up to roughly $\frac{N}{2}$.
We recall from Theorem 18.8 that the bound on the
sparsity level of the sparsest solution in a two-ortho-basis is $\frac{1}{\mu}$,
which is larger than the bound in Theorem 18.27 for general dictionaries by a factor of 2.
Thus, we note that coherence gives a weaker bound than spark for supportable sparsity levels of unique solutions. The advantage of coherence is that it is easily computable and doesn't require any special structure on the dictionary (a two-ortho-basis has a special structure).
18.6.3.3. Singular Values of Subdictionaries#
Theorem 18.28 (Singular values of subdictionaries and coherence)
Let $\mathcal{D}$ be a dictionary with coherence $\mu$ and let $\mathcal{D}_\Lambda$ be a subdictionary of $K = |\Lambda|$ atoms. Then every eigenvalue $\lambda$ of the Gram matrix $G = \mathcal{D}_\Lambda^H \mathcal{D}_\Lambda$ satisfies
$$1 - (K - 1)\mu \leq \lambda \leq 1 + (K - 1)\mu.$$
Moreover, provided $(K - 1)\mu < 1$, the singular values of the sub-dictionary $\mathcal{D}_\Lambda$ satisfy
$$\sqrt{1 - (K - 1)\mu} \leq \sigma_k(\mathcal{D}_\Lambda) \leq \sqrt{1 + (K - 1)\mu}.$$
Proof. We recall from Gershgorin's circle theorem that for
any square matrix $A \in \mathbb{C}^{K \times K}$, every eigenvalue lies in at least one of the discs centered at a diagonal entry $a_{kk}$ with radius $\sum_{j \neq k} |a_{kj}|$.
Now consider the matrix $G = \mathcal{D}_\Lambda^H \mathcal{D}_\Lambda$
with diagonal elements equal to 1 and off-diagonal elements bounded in magnitude by the coherence $\mu$.
Then every Gershgorin disc has radius at most $(K - 1)\mu$.
Thus, every eigenvalue $\lambda$ of $G$ satisfies
$$|\lambda - 1| \leq (K - 1)\mu, \quad \text{i.e.,} \quad 1 - (K - 1)\mu \leq \lambda \leq 1 + (K - 1)\mu.$$
This gives us a lower bound on the smallest eigenvalue.
Since $G$
is positive definite ($\mathcal{D}_\Lambda$ is full-rank), its eigenvalues are positive. Thus, the above lower bound is useful only if $(K - 1)\mu < 1$. We also get an upper bound on the eigenvalues of $G$
given by $\lambda \leq 1 + (K - 1)\mu$. The bounds on the singular values of $\mathcal{D}_\Lambda$
are obtained as a straightforward extension by taking square roots in the expressions above, since the eigenvalues of $G$ are the squares of the singular values of $\mathcal{D}_\Lambda$.
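A small numerical check of Theorem 18.28, again assuming the coherence helper from the earlier sketch is in scope: the squared singular values of a randomly chosen subdictionary should lie within $[1 - (K-1)\mu,\ 1 + (K-1)\mu]$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, K = 64, 128, 4
Dict = rng.standard_normal((N, D))
Dict /= np.linalg.norm(Dict, axis=0)          # unit-norm atoms

mu = coherence(Dict)
Lam = rng.choice(D, size=K, replace=False)    # a random index set of K atoms
sv = np.linalg.svd(Dict[:, Lam], compute_uv=False)

print(1 - (K - 1) * mu, "<=", sv.min() ** 2)  # lower bound on squared singular values
print(sv.max() ** 2, "<=", 1 + (K - 1) * mu)  # upper bound on squared singular values
```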
18.6.3.4. Embeddings using Subdictionaries#
Theorem 18.29 (Norm bounds for embeddings with real dictionaries)
Let $\mathcal{D}$ be a real dictionary with coherence $\mu$ and let $\Lambda$ be an index set of $K$ atoms. Then for every $c \in \mathbb{R}^K$,
$$(1 - (K - 1)\mu)\,\|c\|_2^2 \leq \|\mathcal{D}_\Lambda c\|_2^2 \leq (1 + (K - 1)\mu)\,\|c\|_2^2,$$
where $\mathcal{D}_\Lambda$ is the sub-matrix of $\mathcal{D}$ consisting of the atoms indexed by $\Lambda$.
Proof. We can see that
$$\|\mathcal{D}_\Lambda c\|_2^2 = c^T \mathcal{D}_\Lambda^T \mathcal{D}_\Lambda c.$$
Expanding, we have
$$\|\mathcal{D}_\Lambda c\|_2^2 = \sum_{j} \sum_{k} c_j c_k \langle d_j, d_k \rangle,$$
where the sums run over the indices in $\Lambda$.
The terms in the R.H.S. for $j = k$
are given by $c_j^2 \langle d_j, d_j \rangle = c_j^2$. Summing over $j$, we get $\|c\|_2^2$. We are now left with the $K(K - 1)$
off-diagonal terms. Each of these terms is bounded by
$$|c_j c_k \langle d_j, d_k \rangle| \leq \mu\,|c_j|\,|c_k|.$$
Summing over the $K(K - 1)$
off-diagonal terms we get:
$$\Big| \sum_{j \neq k} c_j c_k \langle d_j, d_k \rangle \Big| \leq \mu \sum_{j \neq k} |c_j|\,|c_k| = \mu \big( \|c\|_1^2 - \|c\|_2^2 \big).$$
Thus,
$$\|c\|_2^2 - \mu \big( \|c\|_1^2 - \|c\|_2^2 \big) \leq \|\mathcal{D}_\Lambda c\|_2^2 \leq \|c\|_2^2 + \mu \big( \|c\|_1^2 - \|c\|_2^2 \big).$$
We get the result by slight reordering of terms:
$$(1 + \mu)\|c\|_2^2 - \mu \|c\|_1^2 \leq \|\mathcal{D}_\Lambda c\|_2^2 \leq (1 - \mu)\|c\|_2^2 + \mu \|c\|_1^2.$$
We note that due to Theorem 18.15, $\|c\|_1^2 \leq K \|c\|_2^2$.
Thus, the inequalities can be written as
$$(1 + \mu - K\mu)\|c\|_2^2 \leq \|\mathcal{D}_\Lambda c\|_2^2 \leq (1 - \mu + K\mu)\|c\|_2^2.$$
Alternatively,
$$(1 - (K - 1)\mu)\|c\|_2^2 \leq \|\mathcal{D}_\Lambda c\|_2^2 \leq (1 + (K - 1)\mu)\|c\|_2^2.$$
Finally, due to Theorem 18.11,
$$\sigma_{\min}^2(\mathcal{D}_\Lambda) = \min_{c \neq 0} \frac{\|\mathcal{D}_\Lambda c\|_2^2}{\|c\|_2^2}
\quad \text{and} \quad
\sigma_{\max}^2(\mathcal{D}_\Lambda) = \max_{c \neq 0} \frac{\|\mathcal{D}_\Lambda c\|_2^2}{\|c\|_2^2}.$$
This gives us
$$1 - (K - 1)\mu \leq \sigma_{\min}^2(\mathcal{D}_\Lambda) \leq \sigma_{\max}^2(\mathcal{D}_\Lambda) \leq 1 + (K - 1)\mu,$$
in agreement with Theorem 18.28.
We now present the above theorem for the complex case. The proof is based on singular values. This proof is simpler and more general than the one presented above.
Theorem 18.30 (Norm bounds for embeddings with complex dictionaries)
Let $\mathcal{D}$ be a complex dictionary with coherence $\mu$ and let $\Lambda$ be an index set of $K$ atoms. Then for every $c \in \mathbb{C}^K$,
$$(1 - (K - 1)\mu)\,\|c\|_2^2 \leq \|\mathcal{D}_\Lambda c\|_2^2 \leq (1 + (K - 1)\mu)\,\|c\|_2^2.$$
Proof. By Theorem 18.28, the squared singular values of $\mathcal{D}_\Lambda$ lie in the interval $[1 - (K - 1)\mu,\ 1 + (K - 1)\mu]$, and for every $c$ we have $\sigma_{\min}^2 \|c\|_2^2 \leq \|\mathcal{D}_\Lambda c\|_2^2 \leq \sigma_{\max}^2 \|c\|_2^2$.
18.6.4. Babel Function#
Recalling the definition of coherence, we note that it reflects only the extreme correlations between atoms of a dictionary. If most of the inner products are small compared to one dominating inner product, then the value of coherence is highly misleading.
In [77], Tropp introduced the Babel function, which measures the maximum total coherence between a fixed atom and a collection of other atoms. The Babel function quantifies the idea of how much the atoms of a dictionary are "speaking the same language".
Definition 18.22 (Babel function)
The Babel function for a dictionary $\mathcal{D}$ is defined by
$$\mu_1(p) = \max_{|\Lambda| = p} \; \max_{\psi} \sum_{\lambda \in \Lambda} |\langle \psi, d_\lambda \rangle|,$$
where the vector $\psi$ ranges over the atoms indexed by $\Omega \setminus \Lambda$ and the index set $\Lambda$ ranges over all subsets of $\Omega$ of size $p$,
for sparsity levels $p = 1, 2, \dots, D - 1$.
Let us dig deeper into what is going on here.
For each value of $p$, we consider all subsets of $p$ atoms from the dictionary.
Let the atoms spanning one such subspace be identified by an index set $\Lambda \subset \Omega$ with $|\Lambda| = p$.
All other atoms are indexed by the index set $\Omega \setminus \Lambda$. Let $\psi$
denote one of the atoms indexed by $\Omega \setminus \Lambda$, and compute the sum of absolute inner products $\sum_{\lambda \in \Lambda} |\langle \psi, d_\lambda \rangle|$.
We run it for every $\psi$ indexed by $\Omega \setminus \Lambda$ and take the maximum over these sums.
We finally compute the maximum over all possible index sets $\Lambda$ of size $p$.
We first make a few observations about the properties of the Babel function. The Babel function is a generalization of coherence.
Remark 18.7 (Babel function for $p = 1$)
For $p = 1$,
the Babel function reduces to the coherence of $\mathcal{D}$:
$$\mu_1(1) = \mu.$$
Theorem 18.31 (Monotonicity of Babel function)
The Babel function is a non-decreasing function of $p$:
$$\mu_1(p) \leq \mu_1(p + 1).$$
Proof. This is easy to see since the sum $\sum_{\lambda \in \Lambda} |\langle \psi, d_\lambda \rangle|$
cannot decrease as the number of atoms $p$ over which it is taken increases.
Theorem 18.32 (An upper bound for Babel function)
The Babel function is upper bounded in terms of the coherence as per
$$\mu_1(p) \leq p\,\mu.$$
Proof. Note that each of the $p$ terms in the sum $\sum_{\lambda \in \Lambda} |\langle \psi, d_\lambda \rangle|$ satisfies $|\langle \psi, d_\lambda \rangle| \leq \mu$.
This leads to $\mu_1(p) \leq p\,\mu$.
18.6.4.1. Computation of Babel Function#
It might seem at first that the computation of the Babel function is combinatorial and hence prohibitively expensive. But this is not the case.
Example 18.26 (Procedure for computing the Babel function)
We will demonstrate this through an example in this section, working with a small synthesis matrix $\mathcal{D}$.
From the synthesis matrix $\mathcal{D}$, we first construct its Gram matrix $G = \mathcal{D}^H \mathcal{D}$.
We then take the absolute value of each entry in $G$.
We now sort every row in descending order to obtain a
new matrix $G'$.
The first entry in each row is now $1$, the inner product of each atom with itself.
Now look at column 2 in $G'$: it holds, for each atom, the largest absolute inner product with any other atom.
Thus,
$$\mu = \max_i G'_{i2},$$
i.e. the coherence is given by the maximum in the 2nd column of $G'$.
Looking carefully, we can note that for computing $\mu_1(p)$ we simply need to sum, in each row, the entries in columns $2$ through $p + 1$ (the $p$ largest inner products of that atom with other atoms) and take the maximum of these row sums,
while for coherence only the 2nd column was needed.
For the running example, the Babel function values are obtained by reading off these row sums for $p = 1, 2, \dots$.
We see that the Babel function stops increasing once the remaining sorted inner products in every row are zero.
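The procedure above translates directly into a few lines of NumPy. The helper below, with the illustrative name babel, returns the whole sequence $\mu_1(1), \dots, \mu_1(D-1)$ at once; the Dirac-Fourier example is only meant to show that $\mu_1(1)$ agrees with the coherence.

```python
import numpy as np

def babel(Dict):
    """Babel function values [mu_1(1), ..., mu_1(D-1)] for unit-norm atoms."""
    G = np.abs(Dict.conj().T @ Dict)
    # Sort each row in descending order; column 0 then holds the diagonal entry (= 1).
    Gs = -np.sort(-G, axis=1)
    # Row-wise cumulative sums of the largest off-diagonal entries; maximum over rows.
    return np.cumsum(Gs[:, 1:], axis=1).max(axis=0)

N = 16
Dict = np.hstack([np.eye(N), np.fft.fft(np.eye(N)) / np.sqrt(N)])
mu1 = babel(Dict)
print(mu1[0])    # mu_1(1) = coherence = 0.25
print(mu1[:5])   # grows by mu at each step here, until it eventually saturates
```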
18.6.4.2. Babel Function and Spark#
We first note that the Babel function tells us something about the linear independence
of the columns of $\mathcal{D}$.
Theorem 18.33 (Linear independence of atoms and Babel function)
Let $\mu_1$ be the Babel function of a dictionary $\mathcal{D}$. If
$$\mu_1(p - 1) < 1,$$
then all selections of $p$ atoms from the dictionary are linearly independent.
Proof. We recall from the proof of Theorem 18.25 that if the Gram matrix of a set of $p$ atoms is strictly diagonally dominant,
then this set of $p$ atoms is linearly independent.
Consider any index set $\Lambda$ with $|\Lambda| = p$. In the Gram matrix of the corresponding atoms, every diagonal entry equals $1$, while the absolute sum of the off-diagonal entries in any row is at most $\mu_1(p - 1)$.
Thus, if $\mu_1(p - 1) < 1$,
then all selections of $p$ atoms from the dictionary are linearly independent.
This leads us to a lower bound on the spark from the Babel function.
Lemma 18.1 (Lower bound on spark based on Babel function)
A lower bound on the spark of a dictionary $\mathcal{D}$ in terms of its Babel function is given by
$$\operatorname{spark}(\mathcal{D}) \geq \min \{ p : \mu_1(p - 1) \geq 1 \}.$$
Proof. For all $p$ with $\mu_1(p - 1) < 1$, every selection of $p$ atoms from the dictionary is linearly independent (Theorem 18.33).
Finally, a linearly dependent set of atoms must therefore contain at least $\min \{ p : \mu_1(p - 1) \geq 1 \}$ atoms, which establishes the bound.
An earlier version of this result also appeared in [28] (Theorem 6).
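Continuing the sketch above (reusing babel and the Dirac-Fourier Dict), the lower bound of Lemma 18.1 can be estimated as the smallest $p$ for which $\mu_1(p - 1) \geq 1$; note that the array returned by babel stores $\mu_1(k + 1)$ at index $k$.

```python
import numpy as np

mu1 = babel(Dict)             # Dict as in the previous snippet
k = int(np.argmax(mu1 >= 1))  # first index with mu_1(k + 1) >= 1 (assumes one exists)
spark_lower_bound = k + 2     # smallest p with mu_1(p - 1) >= 1
print(spark_lower_bound)      # 5 for the Dirac-Fourier dictionary with N = 16
```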
18.6.4.3. Babel Function and Singular Values#
Theorem 18.34 (Subdictionary singular value bounds from Babel function)
Let $\mathcal{D}$ be a dictionary with Babel function $\mu_1$ and let $\mathcal{D}_\Lambda$ be a subdictionary of $K$ atoms. Then every eigenvalue $\lambda$ of the Gram matrix $G = \mathcal{D}_\Lambda^H \mathcal{D}_\Lambda$ (i.e., every squared singular value of $\mathcal{D}_\Lambda$) satisfies
$$1 - \mu_1(K - 1) \leq \lambda \leq 1 + \mu_1(K - 1).$$
Proof. Consider the Gram matrix $G = \mathcal{D}_\Lambda^H \mathcal{D}_\Lambda$.
Also let $K = |\Lambda|$,
so that $G$ is a $K \times K$ matrix.
The Gershgorin Disc Theorem states that every
eigenvalue of $G$ lies in at least one of the $K$ discs
$$\Delta_k = \Big\{ z : |z - G_{kk}| \leq \sum_{j \neq k} |G_{jk}| \Big\}.$$
Since the atoms are unit norm, $G_{kk} = \langle d_k, d_k \rangle = 1$.
Also we note that
$$\sum_{j \neq k} |G_{jk}| = \sum_{j \neq k} |\langle d_j, d_k \rangle| \leq \mu_1(K - 1),$$
since there are $K - 1$ off-diagonal entries in each row, and the sum of absolute inner products between a fixed atom and any $K - 1$ other atoms is at most $\mu_1(K - 1)$.
Thus, if $\lambda$ is an eigenvalue of $G$, then
$$|\lambda - 1| \leq \mu_1(K - 1), \quad \text{i.e.,} \quad 1 - \mu_1(K - 1) \leq \lambda \leq 1 + \mu_1(K - 1).$$
This is OK since $G$ is positive semidefinite, so its eigenvalues are non-negative in any case (the lower bound is simply vacuous when $\mu_1(K - 1) \geq 1$).
But the eigenvalues of $G$ are exactly the squared singular values of $\mathcal{D}_\Lambda$, which gives the claimed bounds.
Corollary 18.2
Let $\mathcal{D}_\Lambda$ be a subdictionary of $K$ atoms with $\mu_1(K - 1) < 1$. Then its singular values satisfy
$$\sqrt{1 - \mu_1(K - 1)} \leq \sigma_k(\mathcal{D}_\Lambda) \leq \sqrt{1 + \mu_1(K - 1)}.$$
Proof. From the previous theorem we have
$$1 - \mu_1(K - 1) \leq \sigma_k^2(\mathcal{D}_\Lambda) \leq 1 + \mu_1(K - 1),$$
and taking square roots gives the result.
Since the singular values are always non-negative,
the lower bound is useful only when $\mu_1(K - 1) < 1$.
Theorem 18.35 (Uncertainty principle : Babel function)
Let $x \neq 0$ be a vector in the null space of $\mathcal{D}$ and let $K = \|x\|_0$. Then $\mu_1(K - 1) \geq 1$.
Proof. If $\mathcal{D} x = 0$ with $x \neq 0$, then the atoms indexed by the support of $x$ form a linearly dependent set of $K$ atoms. By Theorem 18.33, a set of $K$ linearly dependent atoms is possible only if $\mu_1(K - 1) \geq 1$.
18.6.4.4. Babel Function and Gram Matrix of Subdictionaries#
Let $\Lambda$ index a subdictionary $\mathcal{D}_\Lambda$ of $K = |\Lambda|$ atoms and let $G = \mathcal{D}_\Lambda^H \mathcal{D}_\Lambda$ denote its Gram matrix.
Theorem 18.36 (A bound on the norms of Gram matrix)
The Gram matrix of a subdictionary of $K$ atoms satisfies
$$\|G\|_{\infty} = \|G\|_{1} \leq 1 + \mu_1(K - 1).$$
Proof. Since $G$ is Hermitian, $\|G\|_1 = \|G\|_\infty$, and $\|G\|_\infty$ is the maximum absolute row sum of $G$.
Now each row consists of a diagonal entry equal to $1$
and $K - 1$ off-diagonal entries. The absolute sum of all the off-diagonal entries in a row is upper bounded by
$\mu_1(K - 1)$. Thus, the absolute sum of all the entries in a row is upper bounded by
$1 + \mu_1(K - 1)$. $\|G\|_\infty$ is nothing but the maximum of these row sums of $G$. Hence
$$\|G\|_{\infty} \leq 1 + \mu_1(K - 1).$$
Theorem 18.37 (A bound on the norms of inverse Gram matrix)
Suppose that $\mu_1(K - 1) < 1$ and let $G$ be the Gram matrix of a subdictionary of $K$ atoms. Then
$$\|G^{-1}\|_{\infty} \leq \frac{1}{1 - \mu_1(K - 1)}.$$
Proof. Since the atoms of a subdictionary are linearly independent, $G$ is invertible.
We can write $G$
as
$$G = I + A,$$
where $A$ consists of the off-diagonal entries in $G$ (with zeros on the diagonal). Recall that since the atoms are unit norm, the diagonal entries in $G$
are 1. Each row of $A$
lists inner products between a fixed atom and the other $K - 1$ atoms (leaving 0 at the diagonal entry). Therefore,
$$\|A\|_{\infty} \leq \mu_1(K - 1),$$
since the $\ell_1$
norm of any row is upper bounded by the Babel number $\mu_1(K - 1)$. Now, $G^{-1} = (I + A)^{-1}$
can be written as a Neumann series:
$$G^{-1} = \sum_{k=0}^{\infty} (-A)^{k}.$$
Thus,
$$\|G^{-1}\|_{\infty} \leq \sum_{k=0}^{\infty} \|A\|_{\infty}^{k} = \frac{1}{1 - \|A\|_{\infty}},$$
since $\|A\|_{\infty} \leq \mu_1(K - 1) < 1$.
Finally,
$$\|G^{-1}\|_{\infty} \leq \frac{1}{1 - \mu_1(K - 1)}.$$
Thus, the inverse Gram matrix of a subdictionary is well behaved (in the $\ell_\infty$ operator norm) whenever $\mu_1(K - 1) < 1$.
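Finally, a quick numerical check of Theorems 18.36 and 18.37, reusing the babel helper and the Dirac-Fourier Dict (and N) from the earlier sketches; the index set below is an arbitrary illustrative choice of $K = 3$ linearly independent atoms.

```python
import numpy as np

K = 3
Lam = [0, 1, N]                        # two Dirac atoms and one Fourier atom
G = Dict[:, Lam].conj().T @ Dict[:, Lam]

mu1_K = babel(Dict)[K - 2]             # mu_1(K - 1)
inf_norm = lambda M: np.abs(M).sum(axis=1).max()   # induced infinity norm

print(inf_norm(G), "<=", 1 + mu1_K)                       # Theorem 18.36
print(inf_norm(np.linalg.inv(G)), "<=", 1 / (1 - mu1_K))  # Theorem 18.37 (mu_1(K-1) < 1)
```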
18.6.4.5. Quasi Incoherent Dictionaries#
Definition 18.23 (Quasi incoherent dictionary)
When the Babel function of a dictionary grows slowly, we say that the dictionary is quasi-incoherent.