18.9. Dictionaries II#

This section continues the development of dictionaries for sparse and redundant representations.

18.9.1. Spark#

We present some more results on spark of a dictionary.

18.9.1.1. Upper Bounds for Spark#

Whenever a set of atoms in a dictionary is linearly dependent, the dependence corresponds to some vector in its null space. Thus, identifying the spark of a dictionary essentially amounts to sifting through the vectors in its null space and finding one with the smallest \(\ell_0\)-“norm”. This can be cast as an optimization problem:

(18.57)#\[\begin{split}& \underset{\bv}{\text{minimize}} & & \| \bv \|_0 \\ & \text{subject to } & & \bDDD \bv = \bzero.\end{split}\]

Note that the solution \(\bv^*\) of this problem is not unique. If \(\bv^*\) is a solution, then \(c \bv^*\) for any \(c \neq 0\) is also a solution. The spark is the optimal value of the objective function \(\| \bv \|_0\). We now define a sequence of optimization problems for \(k = 1, \dots, D\):

(18.58)#\[\begin{split} & \underset{\bv}{\text{minimize}} & & \| \bv \|_0 \\ & \text{subject to } & & \bDDD \bv = \bzero, v_k = 1.\end{split}\]
  1. The \(k\)-th problem constrains the solution to choose atom \(\bd_k\) from the dictionary.

  2. Since a minimal set of linearly dependent atoms in \(\bDDD\) contains at least two vectors, \(\spark(\bDDD)\) corresponds to the optimal value of one (or more) of the problems (18.58).

  3. Formally, if we denote \(\bv_k^{0, *}\) as an optimal vector for the problem (18.58), then

    \[ \spark(\bDDD) = \underset{1 \leq k \leq D}{\min}\| \bv_k^{0, *}\|_0. \]
  4. Thus, solving (18.57) is equivalent to solving all \(D\) problems specified by (18.58) and then finding the minimum \(\ell_0\)-“norm” among them.

  5. The problems (18.58) are still computationally intractable.

We now relax each of the \(\ell_0\)-“norm” minimization problems (18.58) to an \(\ell_1\)-norm minimization problem.

(18.59)#\[\begin{split} & \underset{\bv}{\text{minimize }} & & \| \bv \|_1 \\ & \text{subject to } & & \bDDD \bv = \bzero, v_k = 1.\end{split}\]
  1. We have a convex objective and convex (linear) constraints. These are tractable problems.

  2. Let us indicate an optimal solution of (18.59) as \(\bv_k^{1, *}\).

  3. Since \(\bDDD \bv_k^{1, *} = \bzero\) and the \(k\)-th entry of \(\bv_k^{1, *}\) equals one, \(\bv_k^{1, *}\) is feasible for (18.58).

  4. Thus,

    \[ \| \bv_k^{0, *}\|_0 \leq \| \bv_k^{1, *}\|_0. \]
  5. This gives us the relationship

    \[ \spark(\bDDD) \leq \underset{1 \leq k \leq D}{\min} \| \bv_k^{1, *}\|_0. \]

We formally state the upper bound on \(\spark(\bDDD)\) in the following theorem [28].

Theorem 18.70

Let \(\bDDD\) be a dictionary. Then

\[ \spark(\bDDD) \leq \underset{1 \leq k \leq D}{\min} \| \bv_k^{1, *}\|_0 \]

where \(\bv_k^{1, *}\) is a solution of the problem (18.59).
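
As an illustration, the following Python sketch computes this upper bound numerically by solving each \(\ell_1\) problem (18.59) as a linear program via the standard split \(\bv = \bp - \bq\) with \(\bp, \bq \geq 0\). This is a minimal sketch assuming numpy and scipy are available; the helper name `spark_upper_bound` and the tolerance used for counting nonzero entries are our own choices, not part of the text.

```python
import numpy as np
from scipy.optimize import linprog


def spark_upper_bound(D, tol=1e-6):
    """Upper bound on spark(D) via the l1 problems (18.59).

    Each problem min ||v||_1 s.t. D v = 0, v_k = 1 is solved as a linear
    program using the split v = p - q with p, q >= 0.
    """
    N, M = D.shape
    best = np.inf
    c = np.ones(2 * M)  # objective: sum(p) + sum(q) = ||v||_1
    for k in range(M):
        # Equality constraints: D (p - q) = 0 and p_k - q_k = 1.
        A_eq = np.vstack([np.hstack([D, -D]), np.zeros(2 * M)])
        A_eq[-1, k] = 1.0
        A_eq[-1, M + k] = -1.0
        b_eq = np.zeros(N + 1)
        b_eq[-1] = 1.0
        res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
        if res.success:
            v = res.x[:M] - res.x[M:]
            best = min(best, np.count_nonzero(np.abs(v) > tol))
    return best
```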

18.9.2. Coherence#

In this subsection we develop some more bounds using the coherence of a dictionary. As usual, we consider an overcomplete dictionary \(\bDDD \in \CC^{N \times D}\) consisting of \(D\) atoms. The coherence of \(\bDDD\) is denoted by \(\mu(\bDDD)\); in short, we write it simply as \(\mu\). A subdictionary \(\bDDD_{\Lambda}\) is indexed by an index set \(\Lambda\) and consists of linearly independent atoms.

Theorem 18.71

Suppose that \((K - 1) \mu < 1\) and assume that \(| \Lambda | \leq K\). Then

\[ \| \bDDD_{\Lambda}^{\dag} \|_{2 \to \infty} \leq \frac{1}{\sqrt{ 1 - ( K - 1) \mu}}. \]

Equivalently, the rows of \(\bDDD_{\Lambda}^{\dag}\) have \(\ell_2\) norms no greater than \(\frac{1}{\sqrt{ 1 - ( K - 1) \mu}}\).

Proof. We recall that the operator norm \(\| \bDDD_{\Lambda}^{\dag} \|_{2 \to \infty}\) computes the maximum \(\ell_2\) norm among the rows of \(\bDDD_{\Lambda}^{\dag}\). We proceed as follows.

  1. Since the atoms indexed by \(\Lambda\) are linearly independent, \(\bDDD_{\Lambda}^{\dag} = (\bDDD_{\Lambda}^H \bDDD_{\Lambda})^{-1} \bDDD_{\Lambda}^H = \bG^{-1} \bDDD_{\Lambda}^H\), where \(\bG = \bDDD_{\Lambda}^H \bDDD_{\Lambda}\) is the Gram matrix of the subdictionary.

  2. The squared \(\ell_2\) norms of the rows of \(\bDDD_{\Lambda}^{\dag}\) are the diagonal entries of

    \[ \bDDD_{\Lambda}^{\dag} \left (\bDDD_{\Lambda}^{\dag} \right )^H = \bG^{-1} \bDDD_{\Lambda}^H \bDDD_{\Lambda} \bG^{-1} = \bG^{-1}. \]
  3. Every diagonal entry of the Hermitian positive definite matrix \(\bG^{-1}\) is bounded by its largest eigenvalue. Thus

    \[ \| \bDDD_{\Lambda}^{\dag} \|_{2 \to \infty}^2 \leq \lambda_{\max}(\bG^{-1}) = \frac{1}{\lambda_{\min}(\bG)}. \]
  4. Since the atoms are unit norm, the diagonal entries of \(\bG\) equal one, and each off-diagonal entry is bounded in magnitude by \(\mu\). By Gershgorin's disc theorem,

    \[ \lambda_{\min}(\bG) \geq 1 - (|\Lambda| - 1) \mu \geq 1 - (K - 1)\mu > 0. \]
  5. Combining the two bounds gives \(\| \bDDD_{\Lambda}^{\dag} \|_{2 \to \infty} \leq \frac{1}{\sqrt{1 - (K - 1)\mu}}\) as desired.
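
A quick numerical check of this bound can be run with numpy. This is a sketch, not from the text; the dimensions, seed, and variable names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 256, 512, 3

# Random dictionary with unit-norm atoms (columns).
Phi = rng.standard_normal((N, M))
Phi /= np.linalg.norm(Phi, axis=0)

G = np.abs(Phi.T @ Phi)
mu = np.max(G - np.eye(M))            # coherence of the dictionary
assert (K - 1) * mu < 1               # hypothesis of the theorem

Lam = rng.choice(M, size=K, replace=False)
pinv = np.linalg.pinv(Phi[:, Lam])    # K x N pseudo-inverse

max_row_norm = np.linalg.norm(pinv, axis=1).max()
bound = 1.0 / np.sqrt(1.0 - (K - 1) * mu)
print(max_row_norm, "<=", bound)
```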

The following definition is due to [22, 28].

Definition 18.33

Let \(\bG = \bDDD^H \bDDD\) be the Gram matrix for dictionary \(\bDDD\). We define \(\mu_{1/2}(\bG)\) as the smallest number \(m\) such that the sum of magnitudes of a collection of \(m\) off-diagonal entries in a single row or column of the Gram matrix \(\bG\) is at least \(\frac{1}{2}\).

This quantity was introduced in [28] for developing more accurate bounds compared to bounds based on coherence. At that time, the idea of the Babel function was not available. A careful examination reveals that \(\mu_{1/2}(\bG)\) can be related to the Babel function.

Theorem 18.72

\[ \mu_{1/2}(\bG) \geq \frac{1}{2\mu}. \]

Proof. Since \(\mu\) is the maximum absolute value of any off-diagonal term in \(\bG = \bDDD^H \bDDD\), the sum \(T\) of magnitudes of any \(m\) off-diagonal terms is bounded by

\[ T \leq m \mu. \]

Thus

\[ T \geq \frac{1}{2} \implies m \mu \geq \frac{1}{2} \implies m \geq \frac{1}{2\mu}. \]

Since \(\mu_{1/2}(\bG)\) is the minimum number of off-diagonal terms whose magnitudes sum to at least \(1/2\), we conclude

\[ \mu_{1/2}(\bG) \geq \frac{1}{2\mu}. \]
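
The quantity \(\mu_{1/2}(\bG)\) is easy to compute: for each row, greedily accumulate the largest off-diagonal magnitudes until the sum reaches \(1/2\). Below is a minimal Python sketch (the helper name is ours; assumes numpy).

```python
import numpy as np


def mu_half(G):
    """mu_{1/2}(G): smallest number of off-diagonal magnitudes in a single
    row whose sum is at least 1/2 (rows suffice since G is Hermitian)."""
    D = G.shape[0]
    best = np.inf
    for i in range(D):
        mags = np.abs(np.delete(G[i], i))       # off-diagonal magnitudes
        csum = np.cumsum(np.sort(mags)[::-1])   # greedy: largest first
        hits = np.nonzero(csum >= 0.5)[0]
        if hits.size:
            best = min(best, hits[0] + 1)
    return best
```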

The following result is due to [28].

Theorem 18.73

\[ \spark(\bDDD) \geq 2 \mu_{1/2}(\bG) + 1. \]

Proof. We proceed as follows.

  1. Let \(\bh \in \NullSpace(\bDDD)\) be a nonzero vector.

  2. Then

    \[ \bDDD \bh = \bzero \implies \bG \bh = \bDDD^H \bDDD \bh = \bzero. \]
  3. Subtracting \(\bh\) from both sides, we get

    \[ \bG \bh - \bh = (\bG - \bI) \bh = -\bh. \]
  4. Let \(\Lambda = \supp(\bh)\).

  5. By taking the columns of \(\bG - \bI\) indexed by \(\Lambda\) and the corresponding entries of \(\bh\), we can write

    \[ (\bG - \bI)_{\Lambda} \bh_{\Lambda} = - \bh. \]
  6. Since \(\supp(\bh) = \Lambda\), the maximum \(| h_i | = \| \bh \|_{\infty}\) is attained at some index \(i \in \Lambda\).

  7. Reading off the \(i\)-th entry of \((\bG - \bI)_{\Lambda} \bh_{\Lambda} = -\bh\), we obtain

    \[ \| \bh \|_{\infty} = | h_i | \leq \sum_{j \in \Lambda} | (\bG - \bI)_{i j} | \, | h_j | \leq \left ( \sum_{j \in \Lambda} | (\bG - \bI)_{i j} | \right ) \| \bh \|_{\infty}. \]
  8. Dividing both sides by \(\| \bh \|_{\infty} > 0\) gives

    \[ \sum_{j \in \Lambda} | (\bG - \bI)_{i j} | \geq 1. \]
  9. Since \(i \in \Lambda\) and the diagonal entry \((\bG - \bI)_{i i}\) is zero, this sum runs over at most \(| \Lambda | - 1\) off-diagonal entries of \(\bG\) in row \(i\).

  10. Choosing \(\bh\) as a nonzero null-space vector with minimal support makes \(\Lambda\) smallest, with \(|\Lambda | = \spark(\bDDD)\).

  11. For such a \(\Lambda\), there exists a row in \(\bG\) in which the sum of magnitudes of \(\spark(\bDDD) - 1\) off-diagonal entries is at least 1.

  12. Let \(n\) denote the minimum number of off-diagonal elements on a row or a column of \(\bG\) such that the sum of their magnitudes is at least one.

  13. Clearly

    \[ \spark(\bDDD) - 1 \geq n. \]
  14. It is easy to see that

    \[ n \geq 2 \mu_{1/2} (\bG); \]

    i.e., the minimum number of off-diagonal elements summing up to 1 or more is at least twice the minimum number of off-diagonal elements summing up to \(\frac{1}{2}\) or more on any row (or column, due to the Hermitian property).

  15. Thus

    \[ \spark(\bDDD) - 1 \geq 2 \mu_{1/2} (\bG). \]
  16. Rewriting, we get

    \[ \spark(\bDDD) \geq 2 \mu_{1/2} (\bG) + 1. \]
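
To see two spark lower bounds side by side, the following sketch builds a Dirac-DCT dictionary and prints the classical coherence bound \(\spark(\bDDD) \geq 1 + 1/\mu\) alongside the bound of Theorem 18.73. It assumes `mu_half` from the sketch above plus numpy and scipy; the construction details are our own choices.

```python
import numpy as np
from scipy.fft import idct

N = 64
# Dirac-DCT dictionary: identity basis alongside an orthonormal DCT basis.
DD = np.hstack([np.eye(N), idct(np.eye(N), axis=0, norm='ortho')])
G = DD.T @ DD
mu = np.max(np.abs(G - np.eye(2 * N)))   # coherence (approximately sqrt(2/N))

print("coherence bound :", 1 + 1 / mu)          # spark >= 1 + 1/mu
print("mu_1/2 bound    :", 2 * mu_half(G) + 1)  # Theorem 18.73
```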

18.9.3. Babel function#

In this subsection, we provide a more general development of the Babel function for a pair of dictionaries.

  1. When we consider a single dictionary, we will use \(\bDDD\) as the dictionary.

  2. When considering a pair of dictionaries of equal size, we would typically label them as \(\Phi\) and \(\Psi\) with both \(\Phi, \Psi \in \CC^{N \times D}\).

  3. We will assume that the dictionaries are full rank, i.e., their atoms span the signal space \(\CC^N\).

  4. Why a pair of dictionaries?

    1. We consider \(\Phi\) as a modeling dictionary from which the sparse signals

      \[ \bx \approx \Phi \ba \]

      are built.

    2. \(\Psi\) on the other hand is the sensing dictionary which will be used to compute correlations with the signal \(\bx\) and try to estimate the approximation \(\ba\).

    3. Ideally, \(\Phi\) and \(\Psi\) should be the same.

    4. But in practice, we may not know \(\Phi\) exactly.

    5. Hence, \(\Psi\) would be a dictionary slightly different from \(\Phi\).

18.9.4. p-Babel Function#

See [43] for reference.

Definition 18.34 (\(p\)-Babel function over \(\Lambda\))

Consider an index set \(\Lambda \subset \{1, \dots, D \}\) indexing a subset of atoms in \(\Phi\) and \(\Psi\). The \(p\)-Babel function over \({\Lambda}\) is defined as

(18.60)#\[\mu_p(\Phi, \Psi, \Lambda) \triangleq \underset{l \notin \Lambda}{\sup} \left (\sum_{j \in \Lambda} |\langle \phi_j, \psi_l \rangle |^p \right )^{\frac{1}{p}}.\]

What is going on here?

  1. Consider the row vector

    \[ \bv^l = \psi_l^H \Phi_{\Lambda}. \]
  2. This vector contains inner products of modeling atoms in \(\Phi\) indexed by \(\Lambda\) with the sensing atom \(\psi_l\).

  3. Now

    \[ \| \bv^l \|_p = \left ( \sum_{i} | v^l_i |^p \right )^{\frac{1}{p}} = \left ( \sum_{j \in \Lambda} | \langle \phi_j, \psi_l \rangle |^p \right )^{\frac{1}{p}} \]
  4. This is the term in (18.60).

  5. Thus

    \[ \mu_p(\Phi, \Psi, \Lambda) = \underset{l \notin \Lambda}{\sup} \| \bv^l \|_p. \]
  6. \(\| \bv^l \|_p\) measures the correlation of the sensing atom \(\psi_l\) with the group of modeling atoms in \(\Phi\) indexed by \(\Lambda\), using the \(p\)-norm.

  7. \(\mu_p(\Phi, \Psi, \Lambda)\) finds the sensing atom from \(\Psi\) outside the index set \(\Lambda\) which is most correlated with the group of modeling atoms in \(\Phi\) indexed by \(\Lambda\), and returns the maximum correlation value.

  8. Different choices of \(p\)-norm lead to different correlation values.
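
A direct implementation of (18.60) follows this recipe exactly. The sketch below loops over sensing atoms outside \(\Lambda\) and computes the \(p\)-norm of the correlation vector \(\bv^l\); the helper name is ours, and numpy is assumed.

```python
import numpy as np


def p_babel_over_support(Phi, Psi, Lam, p=1.0):
    """mu_p(Phi, Psi, Lambda) per (18.60): the largest p-norm correlation
    of a sensing atom outside Lambda with modeling atoms indexed by Lambda."""
    D = Phi.shape[1]
    Lam = list(Lam)
    best = 0.0
    for l in range(D):
        if l in Lam:
            continue
        v = Psi[:, l].conj() @ Phi[:, Lam]   # row vector psi_l^H Phi_Lambda
        best = max(best, (np.abs(v) ** p).sum() ** (1.0 / p))
    return best
```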

We can also measure the correlation of sensing and modeling atoms inside the index set \(\Lambda\).

Definition 18.35 (Complementary \(p\)-Babel function over \(\Lambda\))

A complement to the \(p\)-Babel function measures the amount of correlation between atoms inside the support \(\Lambda\):

(18.61)#\[\mu_p^{\text{in}}(\Phi, \Psi, \Lambda) \triangleq \underset{i \in \Lambda}{\sup} \, \mu_p \left (\Phi_{\Lambda}, \Psi_{\Lambda}, \Lambda \setminus \{i \} \right). \]

\(\mu_p(\Phi_{\Lambda}, \Psi_{\Lambda}, \Lambda \setminus \{i \})\) computes the correlation of the \(i\)-th sensing atom \(\psi_i\) with the modeling atoms in \(\Phi\) indexed by \(\Lambda \setminus \{i \}\); i.e., all modeling atoms in \(\Lambda\) except the \(i\)-th one.

Finally \(\mu_p^{\text{in}}(\Phi, \Psi, \Lambda)\) finds the maximum correlation of any sensing atom inside \(\Lambda\) with modeling atoms inside \(\Lambda\) (leaving the corresponding modeling atom).
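
The complementary quantity (18.61) can be computed by reusing the helper above on the subdictionaries \(\Phi_{\Lambda}\) and \(\Psi_{\Lambda}\); again a sketch under the same assumptions.

```python
def p_babel_complement(Phi, Psi, Lam, p=1.0):
    """mu_p^in(Phi, Psi, Lambda) per (18.61): worst p-norm correlation of a
    sensing atom inside Lambda with the other modeling atoms in Lambda."""
    Lam = list(Lam)
    k = len(Lam)
    return max(
        # Within the subdictionaries, positions are 0..k-1; leaving out
        # position t makes psi_t the only "outside" sensing atom.
        p_babel_over_support(Phi[:, Lam], Psi[:, Lam],
                             [j for j in range(k) if j != t], p)
        for t in range(k)
    )
```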

So far, we have focused our attention on a specific index set \(\Lambda\). We now consider all index sets with \(|\Lambda | \leq K\).

Definition 18.36 (\(p\) Babel function)

The Babel function for a pair of dictionaries \(\Phi\) and \(\Psi\) as a function of the sparsity level \(K\) is defined as

(18.62)#\[\mu_p(\Phi, \Psi, K) \triangleq \underset{|\Lambda| \leq K}{\sup}\, \mu_p(\Phi, \Psi, \Lambda).\]

Correspondingly, the complement of Babel function is defined as

(18.63)#\[\mu_p^{\text{in}}(\Phi, \Psi, K) \triangleq \underset{|\Lambda| \leq K}{\sup}\, \mu_p^{\text{in}}(\Phi, \Psi, \Lambda).\]

It is straightforward to see that

(18.64)#\[\mu_p^{\text{in}}(\Phi, \Psi, K) \leq \mu_p(\Phi, \Psi, K-1), \]

since each index set \(\Lambda \setminus \{i \}\) has at most \(K - 1\) elements and the supremum defining \(\mu_p^{\text{in}}\) is taken over a smaller set of sensing atoms.

Now consider the special case where \(\bDDD=\Phi=\Psi\). In other words, the sensing and modeling dictionaries are the same.

We obtain

(18.65)#\[\mu_p(\bDDD, \Lambda) = \underset{l \notin \Lambda}{\sup} \left (\sum_{j \in \Lambda} |\langle \bd_j, \bd_l \rangle |^p \right )^{\frac{1}{p}}.\]
(18.66)#\[\mu_p^{\text{in}}(\bDDD, \Lambda) = \underset{i \in \Lambda}{\sup} \, \mu_p \left (\bDDD_{\Lambda}, \Lambda \setminus \{i \} \right). \]
(18.67)#\[\mu_p(\bDDD, K) = \underset{|\Lambda| \leq K}{\sup}\, \mu_p(\bDDD, \Lambda).\]
(18.68)#\[\mu_p^{\text{in}}(\bDDD, K) = \underset{|\Lambda| \leq K}{\sup}\, \mu_p^{\text{in}}(\bDDD, \Lambda).\]

Further by choosing \(p=1\), we get

(18.69)#\[\mu_1(\bDDD, \Lambda) = \underset{l \notin \Lambda}{\sup} \left (\sum_{j \in \Lambda} |\langle \bd_j, \bd_l \rangle | \right ).\]
(18.70)#\[\mu_1^{\text{in}}(\bDDD, \Lambda) = \underset{i \in \Lambda}{\sup} \, \mu_1 \left ( \bDDD_{\Lambda}, \Lambda \setminus \{i \} \right). \]
(18.71)#\[\mu_1(\bDDD, K) = \underset{|\Lambda| \leq K}{\sup}\, \mu_1(\bDDD, \Lambda).\]
(18.72)#\[\mu_1^{\text{in}}(\bDDD, K) = \underset{|\Lambda| \leq K}{\sup}\, \mu_1^{\text{in}}(\bDDD, \Lambda).\]

Finally compare this definition of \(\mu_1(\bDDD, K)\) with the standard definition of Babel function as

(18.73)#\[\mu_1(K) = \underset{|\Lambda| = K}{\max} \; \underset {\psi}{\max} \sum_{\lambda \in \Lambda} | \langle \psi, \bd_{\lambda} \rangle |,\]

where the vector \(\psi\) ranges over the atoms indexed by \(\Omega \setminus \Lambda\).

We also know that \(\mu_1(K)\) is an increasing function of \(K\). Thus, replacing \(|\Lambda| = K\) with \(|\Lambda| \leq K\) doesn’t make any difference to the value of \(\mu_1(K)\).

Careful observation shows that the definitions of \(\mu_1(K)\) in (18.73) and \(\mu_1(\bDDD, K)\) in (18.71) are exactly the same.
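
Computationally, \(\mu_1(K)\) does not require enumerating index sets: for each atom, the worst \(\Lambda\) simply picks the \(K\) largest off-diagonal Gram magnitudes in its row. A minimal sketch (helper name ours, assuming numpy):

```python
import numpy as np


def babel(D_mat, K):
    """Standard Babel function mu_1(K) per (18.73): max over atoms of the
    sum of the K largest off-diagonal Gram magnitudes in its row."""
    G = np.abs(D_mat.conj().T @ D_mat)
    np.fill_diagonal(G, 0.0)
    G.sort(axis=1)                      # ascending along each row
    return G[:, -K:].sum(axis=1).max()
```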

18.9.5. Exact Recovery Coefficient#

We introduce a measure of similarity between a subdictionary and the remaining atoms of the dictionary, known as the exact recovery coefficient.

Definition 18.37 (Exact recovery coefficient)

The Exact Recovery Coefficient [77, 78, 79] for a subdictionary \(\bDDD_{\Lambda}\) is defined as

\[ \ERC(\bDDD_{\Lambda}) = 1 - \underset{\omega \notin \Lambda}{\max} \|\bDDD_{\Lambda}^{\dag} \bd_{\omega} \|_1. \]

We will also use the notation \(\ERC(\Lambda)\) when the dictionary is clear from context.

The quantity is called the exact recovery coefficient since, for a number of algorithms, the criterion \(\ERC(\Lambda) > 0\) is a sufficient condition for exact recovery of sparse representations.
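
A direct implementation of this definition is straightforward (a sketch; the helper name is ours, and numpy is assumed):

```python
import numpy as np


def erc(D_mat, Lam):
    """Exact Recovery Coefficient of the subdictionary indexed by Lam."""
    Lam = np.asarray(Lam)
    pinv = np.linalg.pinv(D_mat[:, Lam])
    outside = np.setdiff1d(np.arange(D_mat.shape[1]), Lam)
    return 1.0 - max(np.linalg.norm(pinv @ D_mat[:, w], 1) for w in outside)
```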

18.9.5.1. ERC and Babel Function#

We present a lower bound on \(\ERC(\Lambda)\) in terms of the Babel function.

Theorem 18.74 (ERC lower bound: Babel function)

Suppose that \(|\Lambda| = k \leq K\). A lower bound on Exact Recovery Coefficient is

\[ \ERC(\Lambda) \geq \frac{1 - \mu_1(K- 1) - \mu_1(K)}{1 - \mu_1(K-1)}. \]

It follows that \(\ERC(\Lambda) > 0\) whenever

\[ \mu_1(K- 1) + \mu_1 (K) < 1. \]

Proof. We proceed as follows.

  1. Let us expand the pseudo-inverse \(\bDDD_{\Lambda}^{\dag}\):

    \[\begin{split} \underset{\omega \notin \Lambda}{\max} \|\bDDD_{\Lambda}^{\dag} \bd_{\omega} \|_1 &= \underset{\omega \notin \Lambda}{\max} \left \|\left (\bDDD_{\Lambda}^H \bDDD_{\Lambda} \right )^{-1} \bDDD_{\Lambda}^H \bd_{\omega} \right \|_1\\ &\leq \|\left (\bDDD_{\Lambda}^H \bDDD_{\Lambda} \right )^{-1} \|_{1 \to 1} \underset{\omega \notin \Lambda}{\max} \|\bDDD_{\Lambda}^H \bd_{\omega} \|_1. \end{split}\]
  2. For the Gram matrix \(\bG = \bDDD_{\Lambda}^H \bDDD_{\Lambda}\) we recall from Theorem 18.37 that

    \[ \| \bG^{-1} \|_{1 \to 1} \leq \frac{1}{1 - \mu_1(k - 1)} \leq \frac{1}{1 - \mu_1(K - 1)}. \]
  3. For the other term we have

    \[ \underset{\omega \notin \Lambda}{\max} \|\bDDD_{\Lambda}^H \bd_{\omega} \|_1 = \underset{\omega \notin \Lambda}{\max} \sum_{\lambda \in \Lambda}| \langle \bd_{\omega}, \bd_{\lambda} \rangle | \leq \mu_1(k) \leq \mu_1(K). \]
  4. Thus, we get

    \[ \underset{\omega \notin \Lambda}{\max} \|\bDDD_{\Lambda}^{\dag} \bd_{\omega} \|_1 \leq \frac{\mu_1(K)}{1 - \mu_1(K - 1)}. \]
  5. Substituting back into the definition of the Exact Recovery Coefficient:

    \[ \ERC(\Lambda) = 1 - \underset{\omega \notin \Lambda}{\max} \|\bDDD_{\Lambda}^{\dag} \bd_{\omega} \|_1 \geq 1 - \frac{\mu_1(K)}{1 - \mu_1(K - 1)}. \]
  6. This completes the bound on ERC.

  7. Now, we verify the condition for \(\ERC(\Lambda) > 0\):

    \[\begin{split} \mu_1(K) + \mu_1(K - 1) < 1 &\iff \mu_1(K) < 1 - \mu_1(K - 1) \\ &\iff \frac{\mu_1(K)}{1 - \mu_1(K - 1)} < 1 \\ &\iff 1 - \frac{\mu_1(K)}{1 - \mu_1(K - 1)} > 0. \end{split}\]
  8. Thus, if \(\mu_1(K) + \mu_1(K - 1) < 1\), then the lower bound on \(\ERC(\Lambda)\) is positive, leading to \(\ERC(\Lambda) > 0\).

18.9.5.2. ERC and Coherence#

Along the same lines, we develop a coherence bound for the ERC.

Theorem 18.75 (ERC lower bound: coherence)

Suppose that \(|\Lambda| = k \leq K\). A lower bound on Exact Recovery Coefficient is

\[ \ERC(\Lambda) \geq \frac{1 - (2K - 1) \mu}{1 - (K - 1)\mu}. \]

It follows that \(\ERC(\Lambda) > 0\) whenever

\[ K \mu \leq \frac{1}{2}. \]

Proof. We proceed as follows.

  1. Following the proof of Theorem 18.74, for the Gram matrix \(\bG = \bDDD_{\Lambda}^H \bDDD_{\Lambda}\) we have

    \[ \| \bG^{-1} \|_{1 \to 1} \leq \frac{1}{1 - \mu_1(K - 1)} \leq \frac{1}{1 - (K - 1)\mu}. \]
  2. For the other term we have

    \[ \underset{\omega \notin \Lambda}{\max} \|\bDDD_{\Lambda}^H \bd_{\omega} \|_1 \leq \mu_1(K) \leq K \mu. \]
  3. Thus, we get

    \[ \underset{\omega \notin \Lambda}{\max} \|\bDDD_{\Lambda}^{\dag} \bd_{\omega} \|_1 \leq \frac{K \mu}{1 - (K - 1)\mu}. \]
  4. Substituting back into the definition of the Exact Recovery Coefficient:

    \[ \ERC(\Lambda) \geq 1 - \frac{K \mu}{1 - (K - 1)\mu} = \frac{1 - (2K - 1)\mu}{1 - (K - 1)\mu}. \]
  5. This completes the bound on ERC.

  6. Now, we verify the condition for \(\ERC(\Lambda) > 0\).

    \[ K \mu \leq \frac{1}{2} \implies 2 K \mu \leq 1 \implies 1 - 2K \mu \geq 0 \implies 1 - 2K \mu + \mu > 0. \]
  7. And

    \[ K \mu \leq \frac{1}{2} \implies 1 - K \mu \geq \frac{1}{2} \implies 1 - K \mu + \mu \geq \frac{1}{2} + \mu. \]
  8. Thus \(K \mu \leq \frac{1}{2}\) ensures that both numerator and denominator for the coherence lower bound on \(\ERC(\Lambda)\) are positive leading to \(\ERC(\Lambda) > 0\).
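
The following sketch compares the exact \(\ERC\) with the two lower bounds of Theorems 18.74 and 18.75 on a random unit-norm dictionary. It assumes the `erc` and `babel` helpers from the earlier sketches; the dimensions and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 1024, 1280, 3
Phi = rng.standard_normal((N, M))
Phi /= np.linalg.norm(Phi, axis=0)

mu = np.max(np.abs(Phi.T @ Phi) - np.eye(M))   # coherence
Lam = rng.choice(M, size=K, replace=False)

b1 = babel(Phi, K - 1)
print("ERC            :", erc(Phi, Lam))
print("Babel bound    :", (1 - b1 - babel(Phi, K)) / (1 - b1))
print("coherence bound:", (1 - (2 * K - 1) * mu) / (1 - (K - 1) * mu))
```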

A sharper bound on \(K\) is presented in the next theorem.

Theorem 18.76

\(\ERC(\Lambda) > 0\) holds whenever

\[ K < \frac{1}{2} \left (1 + \frac{1}{\mu} \right ) \]

where \(K = | \Lambda| \).

Proof. We proceed as follows.

  1. Assuming \(1 - (K - 1)\mu > 0\), we have

    \[\begin{split} &\frac{1 - (2K - 1) \mu}{1 - (K - 1)\mu} > 0 \\ \iff & 1 - (2K - 1) \mu > 0\\ \iff & 1 > (2K - 1) \mu \\ \iff & 2K - 1 < \frac{1}{\mu}\\ \iff & K < \frac{1}{2} \left (1 + \frac{1}{\mu} \right ). \end{split}\]
  2. From Theorem 18.75, we have

    \[ \ERC(\Lambda) \geq \frac{1 - (2K - 1) \mu}{1 - (K - 1)\mu}. \]
  3. Thus under the given conditions, we have

    \[ \ERC(\Lambda) > 0. \]
  4. We also need to show that under these conditions

    \[ 1 - (K - 1)\mu > 0. \]
  5. We can see that:

    \[\begin{split} & 2K - 1 < \frac{1}{\mu} \\ \implies & 2K - 2 < \frac{1}{\mu} - 1\\ \implies & 2 (K - 1) \mu < 1 - \mu \\ \implies & - (K - 1) \mu > \frac{\mu}{2} - \frac{1}{2}\\ \implies & 1 - (K - 1) \mu > \frac{1}{2} + \frac{\mu}{2}\\ \implies & 1 - (K - 1) \mu > 0. \end{split}\]

18.9.5.3. Geometrical Interpretation of ERC#

Definition 18.38 (Antipodal convex hull of a subdictionary)

The antipodal convex hull [78] of a subdictionary \(\bDDD_{\Lambda}\) is defined as the set of signals given by

\[ \AAA_1 (\Lambda) = \{\bDDD_{\Lambda} \bx \ST \bx \in \CC^{\Lambda} \text{ and } \| \bx \|_1 \leq 1\}. \]

It is the smallest convex set that contains every unit multiple of every atom in \(\bDDD_{\Lambda}\).

  1. We recall that \(\bP_{\Lambda} = \bDDD_{\Lambda} \bDDD_{\Lambda}^{\dag}\) is the orthogonal projector onto the column space of \(\bDDD_{\Lambda}\); a numerical illustration follows this list.

  2. Therefore \(\bc_{\omega} = \bDDD_{\Lambda}^{\dag} \bd_{\omega} \in \CC^{\Lambda}\) is a coefficient vector which can be used to synthesize this projection.

  3. In other words:

    \[ \bP_{\Lambda} \bd_{\omega} = \bDDD_{\Lambda} \bDDD_{\Lambda}^{\dag} \bd_{\omega} = \bDDD_{\Lambda} \bc_{\omega}. \]
  4. Thus, the quantity \(1 - \| \bDDD_{\Lambda}^{\dag} \bd_{\omega} \|_1\) measures how far the projected atom \(\bP_{\Lambda} \bd_{\omega}\) lies from the boundary of \(\AAA_1 (\Lambda)\).

  5. If every projected atom lies well within the antipodal convex hull, then it is possible to recover superpositions of atoms from \(\Lambda\).

  6. This happens because, whenever \(\ERC(\Lambda) > 0\), the coefficient associated with any atom outside \(\Lambda\) must be quite large to represent anything in the span of the subdictionary.
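
The margin described above is easy to inspect numerically. Below is a sketch reusing `Phi` and `Lam` from the previous snippet; the choice of `omega` is arbitrary.

```python
import numpy as np

# Coefficients synthesizing the projection P_Lam d_omega (step 3 above).
omega = np.setdiff1d(np.arange(Phi.shape[1]), Lam)[0]
c = np.linalg.pinv(Phi[:, Lam]) @ Phi[:, omega]

# Margin of the projected atom inside the antipodal convex hull; it is
# positive for every omega outside Lam exactly when ERC(Lam) > 0.
print(1.0 - np.linalg.norm(c, 1))
```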

18.9.6. Dirac DCT Dictionary#

Theorem 18.77

The \(p\)-Babel function for the Dirac-DCT dictionary is given by

\[ \mu_p(k) = k^{\frac{1}{p}} \mu \Forall 1\leq k \leq N. \]

In particular, the standard Babel function is given by

\[ \mu_1(k) = k\mu. \]

Proof. TODO prove it.