$$ \def\NN{{\mathbb N}} $$
$$ \def\RR{{\mathbb R}} $$
$$ \def\CC{{\mathbb C}} $$
$$ \def\ZZ{{\mathbb Z}} $$
$$ \DeclareMathOperator*{\dom}{dom} $$
$$ \DeclareMathOperator*{\TV}{TV} $$
$$ \def\STV{\mathrm{STV}} $$
$$ \DeclareMathOperator*{\argmin}{argmin} $$
$$ \DeclareMathOperator*{\TVani}{{TV}^\text{ani}} $$
$$ \DeclareMathOperator*{\HTValpha}{{HTV}_\alpha} $$
$$ \DeclareMathOperator*{\divergence}{div} $$
$$ \newcommand\RRRR[1]{\RR^{#1} \times \RR^{#1}} $$

Legendre-Fenchel and Rockafellar duality


We sum up here some basics of convex analysis that are necessary to manipulate total variation based optimization problems. These results are both simple and powerful; most of them are taken from [1] and references therein. We refer the reader to [1--5] for more complete information.

1 Definitions and properties

Consider a real vector space \(E\) and let \(E^\star\) denote its dual space, that is, the set of continuous linear mappings from \(E\) to \(\RR\). Let \(\overline{\RR}\) denote the set \(\RR \cup \{-\infty,+\infty\}\) and \(\langle \cdot , \cdot \rangle\) the bilinear mapping from \(E^\star \times E\) to \(\RR\) defined by \(\langle \varphi , u \rangle = \varphi(u)\) for any \(\varphi \in E^\star\) and \(u \in E\).

We denote by \(\Gamma(E)\) the set of functions \(F:E \to \overline{\RR}\) that are pointwise suprema of families of continuous affine functions over \(E\). We denote by \(\Gamma_0(E)\) the subset of \(\Gamma(E)\) composed of the functions \(F \in \Gamma(E)\) other than the constants \(+\infty\) and \(-\infty\).

A function \(F:E \to \overline{\RR}\) is an element of \(\Gamma(E)\) if and only if \(F\) is a convex lower semi-continuous (l.s.c.) function from \(E\) into \(\overline{\RR}\) (and if \(F\) takes the value \(-\infty\) then \(F\) is identically equal to \(-\infty\)).

In addition, it is interesting to note that a function is convex if and only if its epigraph is convex, and is l.s.c. if and only if its epigraph is closed (see Prop. 2.1 and 2.3 in [1]).

Let \(F\) and \(G\) be two functions from \(E\) into \(\overline{\RR}\); the following two properties are equivalent:

  • (i) \(G\) is the pointwise supremum of the set of continuous affine functions that are everywhere less than \(F\);
  • (ii) \(G\) is the largest minorant of \(F\) in \(\Gamma(E)\).

When (i) or (ii) is satisfied, the function \(G\) is then called the \(\Gamma\)-regularization of function \(F\).

Now remark that a continuous affine function over \(E\) can be written \(u \mapsto \left< \varphi , u\right> + \alpha\) for some \(\varphi \in E^\star\) and \(\alpha \in \RR\). We will abusively call \(\varphi\) the slope and \(\alpha\) the y-intercept of such a function. Given a function \(F:E\to \overline{\RR}\), the continuous affine function over \(E\) with slope \(\varphi\) and y-intercept \(\alpha\) is everywhere less than \(F\) if and only if \(\alpha \leq -F^\star(\varphi)\) where

\begin{equation} \label{eq:legendre_fenchel} F^\star(\varphi) = \sup_{u \in \dom{F}} \left< \varphi , u \right> - F(u) \;, \end{equation}

and the notation \(\dom{F}\) stands for the domain of \(F\), that is, the set of vectors \(u \in E\) such that \(F(u) < + \infty\).

If \(F:E \to \overline{\RR}\), then formula \eqref{eq:legendre_fenchel} defines a function \(F^\star\) from \(E^\star\) into \(\overline{\RR}\) called the Legendre-Fenchel transform of \(F\) (notice that \(F^\star\) is also known as the polar or conjugate function of \(F\)).
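As a simple illustration (an example of ours, not taken from the references above): take \(E = \RR\), so that \(E^\star\) identifies with \(\RR\) and \(\langle \varphi , u \rangle = \varphi u\), and consider \(F(u) = \tfrac{1}{2} u^2\). Then

$$ F^\star(\varphi) = \sup_{u \in \RR} \left( \varphi u - \tfrac{1}{2} u^2 \right) = \tfrac{1}{2} \varphi^2 \;, $$

the supremum being attained at \(u = \varphi\); up to the identification of \(E^\star\) with \(E\), the function \(F\) is thus its own Legendre-Fenchel transform.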

From its definition, one directly sees that the Legendre-Fenchel transform \(F^\star\) of a function \(F\) is an element of \(\Gamma(E^\star)\), as it can be seen as the pointwise supremum of the family of continuous affine functions \(\left( \mathcal{A}_{u} \right)_{u \in \dom{F}}\) over the dual space \(E^\star\) defined by

$$ \forall u \in \dom{F}, \quad \mathcal{A}_{u} : \varphi \mapsto \left< \varphi , u\right> - F(u)\;. $$

Assuming from now on that \(E\) is a reflexive space (for instance a Hilbert space), writing the Legendre-Fenchel transform of the function \(F^\star\) leads to \(F^{\star\star}: u \mapsto \sup_{\varphi \in E^\star} \left< \varphi , u \right>-F^\star(\varphi)\), which is an element of \(\Gamma(E)\).

The bi-Legendre-Fenchel transform \(F^{\star\star}\) of a function \(F\) is none other than the \(\Gamma\)-regularization of \(F\). In particular \(F \in \Gamma(E)\) if and only if \(F^{\star\star} = F\).
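For instance (again an example of ours, with \(E = \RR\)), take for \(F\) the indicator function of the two-point set \(\{-1,1\}\), i.e. \(F(u) = 0\) if \(u \in \{-1,1\}\) and \(F(u) = +\infty\) otherwise. Then

$$ F^\star(\varphi) = \sup_{u \in \{-1,1\}} \varphi u = |\varphi| \quad \text{and} \quad F^{\star\star}(u) = \sup_{\varphi \in \RR} \varphi u - |\varphi| = \left\{ \begin{array}{cl} 0 & \text{if } |u| \leq 1 \\ +\infty & \text{otherwise,} \end{array} \right. $$

so that the \(\Gamma\)-regularization of \(F\) is the indicator function of the convex hull \([-1,1]\) of its domain.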

Note also that when \(E\) is a Hilbert space, the dual space \(E^\star\) can be identified with the primal space \(E\), which means that for any element \(\varphi \in E^\star\), the term \(\left< \varphi, u\right>\) in Equation \eqref{eq:legendre_fenchel} can be replaced by the inner product between \(u\) and an element \(v_{\varphi} \in E\). In that case \(F^\star\) can be seen as a function over the primal space \(E\) (instead of \(E^\star\)). In practice, Proposition 3 is of great use to derive a primal-dual reformulation of an optimization problem when the cost function decomposes as a sum of terms of which at least one lies in \(\Gamma(E)\).

2 The algorithm of Chambolle and Pock

We briefly recall here the formulation of the celebrated first-order primal-dual algorithm of Chambolle and Pock [7], which makes it possible to address various total variation based image processing tasks and comes with strong convergence theorems.

Consider two finite-dimensional real vector spaces \(X\) and \(Y\), each endowed with an inner product \(\langle \cdot, \cdot \rangle\), and the generic saddle-point problem

\begin{equation} \label{eq:cp.primal-dual} \arg\,\min_{x \in X}\max_{y \in Y} ~ G(x) + \langle Kx , y \rangle - F^\star(y) \;, \end{equation}

where \(F \in \Gamma_0(Y)\), \(G \in \Gamma_0(X)\) and \(K : X \to Y\) denotes a continuous linear operator. Recall that thanks to Proposition 3, for any \(x\in X\) we have \(F(Kx) = F^{\star\star}(Kx)\), therefore one can interpret Equation \eqref{eq:cp.primal-dual} as a primal-dual formulation of the primal problem

\begin{equation} \label{eq:cp.primal} \arg\,\min_{x \in X} ~ G(x) + F(Kx) \;, \end{equation}

under the assumption that the supremum defining \(F^{\star\star}(Kx)\) as the Legendre-Fenchel transform of \(F^\star\) at \(Kx\) is attained, more precisely when we have \(F^{\star\star}(Kx) = \max_{y \in Y} \langle Kx,y\rangle - F^\star(y)\).

The Chambolle and Pock proximal splitting algorithm was proved to converge toward a solution of \eqref{eq:cp.primal-dual}, for a particular setting of its parameters.

Chambolle-Pock resolvent algorithm for problem \eqref{eq:cp.primal-dual}.

Initialization: Choose \(\tau, \sigma > 0\), \(\theta \in [0,1]\), \(\left(x^0,y^0\right) \in X \times Y\) and set \(\bar{x}^0 = x^0\) (notice that convergence of this algorithm toward a solution of the primal-dual problem \eqref{eq:cp.primal-dual} was proven in [7] for \(\theta = 1\) when \(\tau \sigma {|||K|||}^2 < 1\)).

Iterations (\(k \geq 0\)): Update \(x^k,y^k\) and \(\bar{x}^k\) as follows

\begin{eqnarray} y^{k+1} &=& \arg\,\min\limits_{\substack{y \in Y }} ~ \tfrac{1}{2\sigma} \left\|y-(y^k+\sigma K \bar{x}^k)\right\|_2^2 + F^\star(y) \label{cp.dual} \\ x^{k+1} &=& \arg\,\min\limits_{\substack{x \in X}} ~ \tfrac{1}{2\tau} \left\|x-\left(x^k- \tau K^* y^{k+1} \right)\right\|_2^2 + G(x) \label{cp.primal} \\ \bar{x}^{k+1} &=& x^{k+1} + \theta \left(x^{k+1}-x^{k}\right) \nonumber \end{eqnarray}

\(~\)

Notice that accelerated variants of this algorithm were also proposed by the same authors, which, under regularity assumptions on \(F^\star\) and \(G\), achieve better convergence rates (see Algorithms 2 and 3 in [7]).
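To make these iterations concrete, here is a minimal NumPy sketch of the algorithm (an illustration of ours, not taken from [7]), applied to the ROF denoising problem obtained by choosing \(G(x) = \tfrac{1}{2}\|x-f\|_2^2\), \(F = \lambda \|\cdot\|_{1,2}\) and \(K = \nabla\) (forward finite differences): the dual step \eqref{cp.dual} then reduces to a pointwise projection onto the ball \(\{\|y\|_{\infty,2} \leq \lambda\}\) and the primal step \eqref{cp.primal} to an averaging with the data \(f\).

```python
import numpy as np

def grad(u):
    """Forward finite differences with Neumann boundary conditions (K = grad)."""
    gx, gy = np.zeros_like(u), np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return np.stack((gx, gy))

def div(p):
    """Discrete divergence, defined so that K^* = -div is the adjoint of grad."""
    px, py = p
    dx, dy = np.zeros_like(px), np.zeros_like(py)
    dx[0, :], dx[1:-1, :], dx[-1, :] = px[0, :], px[1:-1, :] - px[:-2, :], -px[-2, :]
    dy[:, 0], dy[:, 1:-1], dy[:, -1] = py[:, 0], py[:, 1:-1] - py[:, :-2], -py[:, -2]
    return dx + dy

def chambolle_pock_rof(f, lam=0.1, niter=300):
    """Chambolle-Pock iterations for min_x 0.5*||x - f||_2^2 + lam*TV(x)."""
    L2 = 8.0                             # classical bound: |||grad|||^2 <= 8 in 2D
    tau = sigma = 0.99 / np.sqrt(L2)     # so that tau * sigma * |||K|||^2 < 1
    theta = 1.0
    x, x_bar = f.copy(), f.copy()
    y = np.zeros((2,) + f.shape)
    for _ in range(niter):
        # dual step: prox of sigma*F^*, i.e. projection onto {||y||_{inf,2} <= lam}
        y = y + sigma * grad(x_bar)
        y /= np.maximum(1.0, np.sqrt(y[0] ** 2 + y[1] ** 2) / lam)
        # primal step: prox of tau*G, with x^k - tau*K^*y^{k+1} = x^k + tau*div(y^{k+1})
        x_old = x
        x = (x + tau * div(y) + tau * f) / (1.0 + tau)
        # over-relaxation step
        x_bar = x + theta * (x - x_old)
    return x
```

With \(\theta = 1\) and the step sizes chosen in this sketch, the convergence condition \(\tau \sigma {|||K|||}^2 < 1\) recalled in the initialization step is satisfied, since \({|||\nabla|||}^2 \leq 8\) for this finite-difference scheme.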

3 Primal-dual formulation of the discrete isotropic total variation

Let \(\Omega\) denote a subset of \(\ZZ^2\) and \(u \in \RR^\Omega\) an image with domain \(\Omega\). The discrete isotropic total variation of \(u\) is defined as

\begin{equation} \label{eq:tv} \TV(u) = \|\nabla u\|_{1,2} := \sum_{x\in\Omega} \left|\nabla u(x)\right|_2\;, \end{equation}

where \(\nabla: \RR^\Omega \to \RR^\Omega \times \RR^\Omega\) denotes a two-dimensional finite differences operator and \(| \cdot |_2\) denotes the Euclidean \(\ell^2\) norm over \(\RR^2\).

For any image \(u \in \RR^\Omega\), we have $$ \TV(u) = \max_{p \in \RR^\Omega \times \RR^\Omega} \langle \nabla u, p \rangle \quad \text{subject to} \quad \|p\|_{\infty,2} \leq 1\;. $$

This result is well known and is proved in the page dedicated to the total variation and its usual variants (isotropic, anisotropic, Huber). Let us here only go through the main steps of the proof in order to understand how duality is used to obtain this result.

\(~\)

  • First we prove that the Legendre-Fenchel transform of the norm \(\| \cdot \|_{1,2}\) (defined in Equation \eqref{eq:tv}) is given by \begin{equation*} \forall p \in \RRRR{\Omega}, \quad \|p\|^\star_{1,2} = \delta(p) := \left\{ \begin{array}{cl} 0 & \text{if } \|p\|_{\infty,2} \leq 1\\ +\infty & \text{otherwise.} \end{array} \right. \end{equation*}

    Function \(\delta\) is called the indicator function of the closed unit ball for the (dual) norm \(\|\cdot\|_{\infty,2}\).

  • Now remark that the norm \(\| \cdot \|_{1,2}\) is an element of \(\Gamma(\RRRR{\Omega})\) since it is convex and l.s.c., therefore we have \(\TV(u) = \|\nabla u\|_{1,2} = \|\nabla u\|_{1,2}^{\star\star}\), thus \begin{equation*} \TV(u) = \sup_{p \in \RRRR{\Omega}} \langle \nabla u, p \rangle - \|p\|_{1,2}^\star = \sup_{p \in \RRRR{\Omega}} \langle \nabla u, p \rangle - \delta(p)\;. \end{equation*}
  • Lastly, restrict the supremum to the domain of \(\delta\) (that is, the closed unit ball for the norm \(\|\cdot\|_{\infty,2}\)) since \(\delta\) is \(+\infty\) outside of it. As this set is closed and bounded (hence compact in finite dimension), the supremum is attained and can be replaced by a maximum (see for instance Proposition 1.2 in [1]). A numerical illustration of this primal-dual identity is sketched below.
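The following short NumPy sketch (the finite-difference scheme and the function names are choices of ours, since \(\nabla\) is not specified above) evaluates \(\TV(u)\) both directly from \eqref{eq:tv} and through the dual formulation, using the fact that the maximum is attained at the dual field defined pointwise by \(p(x) = \nabla u(x) / \left|\nabla u(x)\right|_2\) wherever \(\nabla u(x) \neq 0\) (and by \(p(x) = 0\) elsewhere).

```python
import numpy as np

def grad(u):
    """Forward finite differences with Neumann boundary conditions."""
    gx, gy = np.zeros_like(u), np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def tv(u):
    """Discrete isotropic TV: sum over pixels of |grad u(x)|_2."""
    gx, gy = grad(u)
    return np.sum(np.sqrt(gx ** 2 + gy ** 2))

def tv_dual(u):
    """<grad u, p> evaluated at the maximizing dual field p = grad u / |grad u|_2."""
    gx, gy = grad(u)
    n = np.sqrt(gx ** 2 + gy ** 2)
    px = np.divide(gx, n, out=np.zeros_like(gx), where=n > 0)
    py = np.divide(gy, n, out=np.zeros_like(gy), where=n > 0)
    return np.sum(gx * px + gy * py)          # ||p||_{inf,2} <= 1 by construction

u = np.random.rand(64, 64)
assert np.isclose(tv(u), tv_dual(u))
```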

4 Primal-dual formulation of the discrete Huber total variation

Let again \(\Omega\) denote a subset of \(\ZZ^2\) and \(u \in \RR^\Omega\) an image with domain \(\Omega\). The discrete Huber total variation is a smooth approximation of the discrete isotropic total variation; it is obtained by replacing in \eqref{eq:tv} the \(\ell^2\) norm \(|\cdot|_2\) of the gradient by the Huber function \(| \cdot |_\alpha\), which is defined by

\begin{equation*} \forall g \in \RR^2, \quad \left| g \right|_\alpha = \left\{ \begin{array}{cl} \frac{\left| g \right|_2^2}{2 \alpha} & \text{if } \left| g \right|_2 \leq \alpha \\ \left| g \right|_2 - \tfrac{\alpha}{2} & \text{otherwise,} \end{array} \right. \end{equation*}

for a given real number \(\alpha > 0\). The Huber function is itself a smooth approximation of the \(\ell^2\) norm on \(\RR^2\).

[Figure: 3D plots of the Huber function \(|\cdot|_\alpha\) and of the \(\ell^2\) norm \(|\cdot|_2\).]

The discrete Huber total variation (with Huber smoothness parameter \(\alpha\)) is defined as

\begin{equation} \label{eq:htv} \forall u \in \RR^\Omega, \quad \HTValpha(u) = \|\nabla u\|_{1,\alpha}^\text{Huber} := \sum_{x \in \Omega} \left| \nabla u (x) \right|_\alpha \;. \end{equation}

This variant of the total variation is used to get rid of an undesirable effect caused by the total variation, called the staircasing effect (see several examples: denoising, inverse problems, constrained minimization).

For any image \(u \in \RR^\Omega\), for any real number \(\alpha > 0\), we have $$\HTValpha(u) = \max_{p \in \RRRR{\Omega}} \langle \nabla u , p \rangle - \tfrac{\alpha}{2} \|p\|_2^2 \quad \text{subject to} \quad \|p\|_{\infty,2} \leq 1$$

Again, we refer to the aforementioned page for a complete proof. Let us here only detail the main steps of the proof.

\(~\)

  • First prove that the Legendre-Fenchel transform of the Huber function \(|\cdot|_\alpha\) is given by \begin{equation*} \forall q \in \RR^2, \quad \left| q \right|_\alpha^\star = \left\{ \begin{array}{cl} \tfrac{\alpha}{2} \left| q \right|_2^2 & \text{if } \left| q \right|_2 \leq 1 \\ + \infty & \text{otherwise.} \end{array} \right. \end{equation*}

    This can be proved by remarking that \(|\cdot|_\alpha\) is the Moreau-Yosida [5,6] regularization (or Moreau envelope) with parameter \(\alpha\) of the \(\ell^2\) norm.

  • Then we prove that the Legendre-Fenchel transform of \(\|\cdot\|_{1,\alpha}^\text{Huber}\) is \begin{equation*} {\| \cdot \|_{1,\alpha}^\text{Huber}}^\star = \left( p \in \RRRR{\Omega} \mapsto \tfrac{\alpha}{2} \|p\|_2^2 + \delta(p) \right)\;. \end{equation*}
  • Since \(\|\cdot\|_{1,\alpha}^\text{Huber}\) is a convex and l.s.c. function, it is an element of \(\Gamma \left(\RRRR{\Omega} \right)\), thus we have \(\HTValpha(u) = \|\nabla u \|_{1,\alpha}^\text{Huber} = {\|\nabla u\|_{1,\alpha}^\text{Huber}}^{\star\star}\) and therefore

    $$\HTValpha(u) = \sup_{p \in \RRRR{\Omega}} \langle \nabla u , p \rangle - {\| p \|_{1,\alpha}^\text{Huber}}^\star = \sup_{p \in \RRRR{\Omega}} \langle \nabla u , p \rangle - \tfrac{\alpha}{2} \|p\|_2^2 - \delta(p) \;.$$

  • Lastly, restrict the supremum to the domain of \(\delta\) (that is, the closed unit ball for the norm \(\|\cdot\|_{\infty,2}\)) since \(\delta\) is \(+\infty\) outside of it. As this set is closed and bounded (hence compact in finite dimension), the supremum is attained and can be replaced by a maximum (see for instance Proposition 1.2 in [1], Chapter 2). A numerical check of this primal-dual identity is sketched below.
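As in the isotropic case, here is a small NumPy sketch (again with a finite-difference scheme and names of our own choosing) that checks the identity numerically; the maximizing dual field is given pointwise by \(p(x) = \nabla u(x) / \max\left(\alpha, \left|\nabla u(x)\right|_2\right)\), that is, the projection of \(\nabla u(x)/\alpha\) onto the closed unit ball of \(|\cdot|_2\).

```python
import numpy as np

def grad(u):
    """Forward finite differences with Neumann boundary conditions."""
    gx, gy = np.zeros_like(u), np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def htv(u, alpha):
    """Discrete Huber total variation: sum over pixels of the Huber function of |grad u|_2."""
    gx, gy = grad(u)
    n = np.sqrt(gx ** 2 + gy ** 2)
    return np.sum(np.where(n <= alpha, n ** 2 / (2.0 * alpha), n - alpha / 2.0))

def htv_dual(u, alpha):
    """<grad u, p> - (alpha/2)*||p||_2^2 at the maximizing p = grad u / max(alpha, |grad u|_2)."""
    gx, gy = grad(u)
    scale = np.maximum(alpha, np.sqrt(gx ** 2 + gy ** 2))
    px, py = gx / scale, gy / scale        # satisfies ||p||_{inf,2} <= 1
    return np.sum(gx * px + gy * py - 0.5 * alpha * (px ** 2 + py ** 2))

u = np.random.rand(64, 64)
assert np.isclose(htv(u, alpha=0.1), htv_dual(u, alpha=0.1))
```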

5 References

  [1] Ekeland, I., and Témam, R.: ``Convex analysis and variational problems'', volume 28 of Classics in Applied Mathematics, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, English edition, 1999. Translated from the French.
  [2] Fenchel, W.: ``On Conjugate Convex Functions'', Canadian Journal of Mathematics, vol. 1, pp. 73–77, 1949.
  [3] Rockafellar, R. T.: ``Extension of Fenchel's duality theorem for convex functions'', Duke Mathematical Journal, vol. 33, pp. 81–89, 1966.
  [4] Rockafellar, R. T.: ``Convex Analysis'', Princeton University Press, 1970.
  [5] Moreau, J.-J.: ``Proximité et dualité dans un espace hilbertien'', Bulletin de la Société mathématique de France, vol. 93, pp. 273–299, 1965.
  [6] Yosida, K.: ``Functional Analysis'', Springer-Verlag, Berlin, 1965.
  [7] Chambolle, A., and Pock, T.: ``A first-order primal-dual algorithm for convex problems with applications to imaging'', Journal of Mathematical Imaging and Vision, vol. 40, pp. 120–145, 2011.