Primal-Dual Hybrid Gradient Algorithm (PDHG)

This page introduces the mathematics behind the Primal-Dual Hybrid Gradient Algorithm. For an applied point of view, please see the user’s guide to this method.

The general problem

The Primal-Dual Hybrid Gradient (PDHG) algorithm, as studied in [CP2011a], is a first-order method for non-smooth convex optimization problems with known saddle-point structure

\max_{y \in Y} \min_{x \in X} \big( \langle L x, y\rangle_Y + g(x) - f^*(y) \big) ,

where X and Y are Hilbert spaces with inner product \langle\cdot,\cdot\rangle and norm \|\cdot\| = \langle\cdot,\cdot\rangle^{1/2}, L: X \to Y is a continuous linear operator, g: X \to [0,+\infty] and f: Y \to [0,+\infty] are proper, convex and lower semicontinuous functionals, and f^* is the convex (or Fenchel) conjugate of f.

The saddle-point problem is a primal-dual formulation of the primal minimization problem

\min_{x \in X} \big( g(x) + f(L x) \big).

The corresponding dual maximization problem is

\max_{y \in Y} \big( -g^*(-L^* y) - f^*(y) \big)

with L^* being the adjoint of the operator L.
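As a quick numerical sanity check of the primal-dual relationship, consider the toy choice g = \tfrac{1}{2}\|\cdot\|^2, f = \tfrac{1}{2}\|\cdot - b\|^2, and L = I (an illustrative example, not from the references): both the primal minimum and the dual maximum can be computed in closed form and agree.

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.standard_normal(5)

# Primal: min_x 1/2 ||x||^2 + 1/2 ||x - b||^2, minimized at x* = b/2.
x_star = b / 2
primal = 0.5 * x_star @ x_star + 0.5 * (x_star - b) @ (x_star - b)

# Dual: max_y -g^*(-y) - f^*(y), with g^*(z) = 1/2 ||z||^2 and
# f^*(y) = 1/2 ||y||^2 + <y, b>, maximized at y* = -b/2.
y_star = -b / 2
dual = -0.5 * y_star @ y_star - (0.5 * y_star @ y_star + y_star @ b)

# Both values equal ||b||^2 / 4, so the duality gap is zero.
```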

The algorithm

PDHG alternates a gradient-like ascent step in the dual variable y with a gradient-like descent step in the primal variable x, followed by an over-relaxation step in the primal variable.


Choose \tau > 0, \sigma > 0, \theta \in [0,1], x_0 \in X, y_0 \in Y, \bar x_0 = x_0


For n \geq 0, update x_n, y_n, and \bar x_n as follows:

y_{n+1} = \text{prox}_{\sigma f^*}(y_n + \sigma L \bar x_n),

x_{n+1} = \text{prox}_{\tau g}(x_n - \tau L^* y_{n+1}),

\bar x_{n+1} = x_{n+1} + \theta (x_{n+1} - x_n).

Here, \text{prox} denotes the proximal operator.
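The iteration above can be sketched in a few lines of NumPy. The following is a minimal generic PDHG loop (not the ODL implementation; the function and parameter names are made up for illustration), applied to the lasso problem \min_x \tfrac{1}{2}\|Ax - b\|^2 + \lambda\|x\|_1, for which both proximal operators have closed forms.

```python
import numpy as np

def pdhg(L, prox_tau_g, prox_sigma_fconj, x0, y0, tau, sigma, theta=1.0, niter=500):
    """Plain PDHG loop: dual ascent, primal descent, over-relaxation."""
    x, y, x_bar = x0.copy(), y0.copy(), x0.copy()
    for _ in range(niter):
        y = prox_sigma_fconj(y + sigma * (L @ x_bar))   # dual step
        x_new = prox_tau_g(x - tau * (L.T @ y))         # primal step
        x_bar = x_new + theta * (x_new - x)             # over-relaxation
        x = x_new
    return x

# Lasso: g = lam * ||.||_1 and f = 1/2 ||. - b||^2, so
#   prox_{tau g}(v)     = soft-threshold(v, lam * tau)
#   prox_{sigma f^*}(v) = (v - sigma * b) / (1 + sigma)
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 20))
b = rng.standard_normal(30)
lam = 0.1 * np.max(np.abs(A.T @ b))

tau = sigma = 0.99 / np.linalg.norm(A, 2)  # tau * sigma * ||A||^2 < 1

x = pdhg(
    A,
    prox_tau_g=lambda v: np.sign(v) * np.maximum(np.abs(v) - lam * tau, 0.0),
    prox_sigma_fconj=lambda v: (v - sigma * b) / (1.0 + sigma),
    x0=np.zeros(20), y0=np.zeros(30), tau=tau, sigma=sigma, niter=5000,
)
```

At a solution the optimality condition \|A^T(Ax - b)\|_\infty \le \lambda must hold, which gives a simple convergence check.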

Step sizes

A simple choice of step size parameters is \tau = \sigma < \frac{1}{\|L\|}, since the requirement \sigma \tau \|L\|^2 < 1 guarantees convergence of the algorithm. This choice is rarely optimal, but it serves as a reasonable starting point.
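In practice \|L\| is often not known in closed form and can be estimated numerically, e.g. by power iteration on L^* L. A small sketch (the helper name is made up for illustration):

```python
import numpy as np

def operator_norm(L, n_iter=100, seed=0):
    """Estimate ||L|| (largest singular value) by power iteration on L^T L."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(L.shape[1])
    for _ in range(n_iter):
        x = L.T @ (L @ x)
        x /= np.linalg.norm(x)
    return np.sqrt(np.linalg.norm(L.T @ (L @ x)))

L = np.array([[3.0, 0.0], [0.0, 4.0]])  # toy operator with ||L|| = 4
tau = sigma = 0.95 / operator_norm(L)   # tau * sigma * ||L||^2 = 0.9025 < 1
```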


If g or f^* is uniformly convex, convergence can be accelerated using variable step sizes as follows:

Replace \tau \to \tau_n, \sigma \to \sigma_n, and \theta \to \theta_n and choose \tau_0 \sigma_0 \|L\|^2 < 1 and \gamma > 0. After the update of the primal variable x_{n+1} and before the update of the relaxation variable \bar x_{n+1} use the following update scheme for relaxation and step size parameters:

\theta_n = \frac{1}{\sqrt{1 + 2 \gamma \tau_n}},

\tau_{n+1} = \theta_n \tau_n,

\sigma_{n+1} = \frac{\sigma_n}{\theta_n}.
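The accelerated schedule is straightforward to generate; a minimal sketch (the generator name is invented for illustration). Note that the product \tau_n \sigma_n stays constant, so the convergence condition \tau_n \sigma_n \|L\|^2 < 1 is preserved at every step:

```python
import math

def accelerated_steps(tau0, sigma0, gamma, niter):
    """Yield (theta_n, tau_n, sigma_n) for the accelerated parameter schedule."""
    tau, sigma = tau0, sigma0
    for _ in range(niter):
        theta = 1.0 / math.sqrt(1.0 + 2.0 * gamma * tau)
        yield theta, tau, sigma
        tau, sigma = theta * tau, sigma / theta  # tau shrinks, sigma grows
```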

Instead of choosing scalar step size parameters, preconditioning techniques can be employed, see [CP2011b]. In this case the step sizes \tau and \sigma are replaced by symmetric, positive definite operators T and \Sigma, respectively, and convergence holds for \| \Sigma^{1/2}\,L\, T^{1/2}\|^2 < 1.
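For a matrix operator, a common concrete choice is the diagonal preconditioner of [CP2011b] (the \alpha = 1 case): each dual step size is the reciprocal row sum of |L| and each primal step size is the reciprocal column sum. With this choice [CP2011b] shows \|\Sigma^{1/2} L\, T^{1/2}\| \le 1, which suffices for convergence there. A small sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((6, 4))

# Diagonal preconditioners (alpha = 1 case of [CP2011b]):
Sigma = 1.0 / np.sum(np.abs(L), axis=1)  # one dual step size per row
T = 1.0 / np.sum(np.abs(L), axis=0)      # one primal step size per column

# Preconditioned operator Sigma^{1/2} L T^{1/2}; its spectral norm is <= 1.
M = np.sqrt(Sigma)[:, None] * L * np.sqrt(T)[None, :]
```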

For more on proximal operators and algorithms see [PB2014]. The implementation of PDHG in ODL is along the lines of [Sid+2012].