
The Greedy Algorithm for Submodular Maximization

Series parts
  1. Part 1: An Introduction to Submodularity
  2. Part 2: The Greedy Algorithm for Submodular Maximization
  3. Part 3: Stochastic Greedy: Scaling Submodular Maximization to Massive Datasets

In Part 1, I defined submodular functions, showed that they capture the pattern of diminishing returns across sensor placement, document summarization, influence maximization, and many other domains, and formalized the constrained maximization problem: given a monotone submodular function $f$ and a budget $k$, find $\max_{S \subseteq \mathcal{V}} f(S)$ subject to $|S| \leq k$. I also showed that this problem is NP-hard: enumerating all $\binom{n}{k}$ candidate subsets is intractable for any realistic problem size.

The question becomes: how do we find a good solution efficiently, given that finding the exact optimum is out of reach? The answer lies in approximation algorithms.

Approximation Algorithms

An $\alpha$-approximation algorithm for a maximization problem is an algorithm that runs in polynomial time and, for every instance of the problem, returns a solution whose value is at least $\alpha \cdot \text{OPT}$, where $\text{OPT}$ is the value of the optimal solution. The factor $\alpha \in (0, 1]$ is called the approximation ratio (or performance guarantee).

For instance, a $\frac{1}{2}$-approximation algorithm for a maximization problem always produces a solution worth at least half of the true optimum. Always. On every instance. This is a worst-case guarantee, not an average-case one.

Three reasons to care about approximation algorithms:

  1. Polynomial-time solutions to NP-hard problems. We cannot solve these problems exactly in polynomial time (unless P = NP), but we can get provably close.
  2. A rigorous metric for comparing heuristics. Instead of “this heuristic works well on my benchmark,” you get “this algorithm is within 63.2% of optimal on every possible input.” The guarantee does not depend on the dataset.
  3. Implicit bounds on the optimum. When an $\alpha$-approximation algorithm returns a solution of value $V$, it also tells you that $\text{OPT} \leq V / \alpha$. This is useful when computing the exact optimum is too expensive; you get a bound for free.

For minimization problems, the convention flips: $\alpha \geq 1$, and the algorithm’s output is at most $\alpha \cdot \text{OPT}$. A $2$-approximation for a minimization problem never exceeds twice the optimal cost.
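The implicit-bound point is worth making concrete. Suppose a $(1 - 1/e)$-approximation algorithm returns a solution of value 63.2 on some instance (a made-up number, purely for illustration); without ever computing the optimum, you can bound it:

```python
import math

alpha = 1 - 1 / math.e   # approximation ratio of the greedy algorithm
value = 63.2             # value returned by the algorithm (hypothetical)

# The guarantee value >= alpha * OPT rearranges to OPT <= value / alpha.
opt_upper = value / alpha
print(f"OPT <= {opt_upper:.1f}")
```

Here the optimum is certified to be at most about 100, so the returned solution is provably within 37% of it.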

The Greedy Algorithm

The greedy algorithm for monotone submodular maximization under a cardinality constraint is exactly the algorithm you would write as a first attempt. Start with an empty set. At each step, scan all remaining elements, pick the one with the largest marginal gain, and add it to the solution. Repeat $k$ times.

$$
\boxed{ \begin{aligned} & \textbf{Greedy}(f, \mathcal{V}, k) \\ & \quad S_0 \leftarrow \varnothing \\ & \quad \textbf{for } j = 1 \textbf{ to } k\textbf{:} \\ & \quad \quad e^* \leftarrow \arg\max_{e \in \mathcal{V} \setminus S_{j-1}} f(e \mid S_{j-1}) \\ & \quad \quad S_j \leftarrow S_{j-1} \cup \{e^*\} \\ & \quad \textbf{return } S_k \end{aligned} }
$$

That is the entire algorithm. No relaxation step, no rounding, no linear program. Just a loop with a max operation.

Runtime. At each of the $k$ iterations, the algorithm evaluates $f(e \mid S_{j-1})$ for every element $e \in \mathcal{V} \setminus S_{j-1}$. Each marginal gain computation requires one call to the value oracle for $f$. The total cost is $\mathcal{O}(n \cdot k)$ oracle queries.
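The pseudocode translates almost line for line into Python. A minimal sketch, assuming the value oracle `f` is a callable that maps a frozenset of elements to a number:

```python
def greedy(f, V, k):
    """Greedy maximization of a monotone submodular function f under |S| <= k.

    f is a value oracle: it maps a frozenset of elements to a float.
    V is the ground set; k is the cardinality budget.
    """
    S = frozenset()
    for _ in range(k):
        # Evaluate the marginal gain f(e | S) of every remaining element
        # and add the best one -- O(n) oracle calls per iteration.
        best = max((e for e in V if e not in S),
                   key=lambda e: f(S | {e}) - f(S))
        S |= {best}
    return S
```

For example, with a toy coverage oracle (`cover` maps each element to the set it covers), `greedy` picks the element with the largest union contribution at each step.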

A Worked Example: Sensor Coverage

Consider 6 candidate sensor locations on a grid, labeled $\{1, 2, 3, 4, 5, 6\}$, and a budget of $k = 3$. Each sensor covers a circular region, and the submodular function $f(S)$ measures the total area covered by the union of all sensors in $S$.

[Interactive demo: Greedy Algorithm Debugger. Step through the greedy algorithm on the six candidate sensor locations with a budget of k = 3 to see which sensors it picks and why, or toggle "Try it yourself" to make your own choices and compare. Solid circles mark selected sensors; dashed arcs show the new area each remaining sensor would add.]

Step 1. The algorithm computes the marginal gain $f(\{e\}) - f(\varnothing) = f(\{e\})$ for each candidate $e \in \{1, \dots, 6\}$, since the current set is empty. Suppose sensor 3 covers the largest area individually. We set $S_1 = \{3\}$.

Step 2. The algorithm recomputes the marginal gain of each remaining element with respect to the current solution $S_1 = \{3\}$. Sensor 5, which covers a region that barely overlaps with sensor 3, has the highest additional area. We set $S_2 = \{3, 5\}$.

Step 3. Same logic. Sensor 1 adds the most new area given that sensors 3 and 5 are already placed. We set $S_3 = \{3, 5, 1\}$.

The crucial observation: at each step, the marginal gain of the selected sensor decreases. The first sensor might cover 120 m² on its own; the second adds 95 m² of new area; the third adds only 60 m². This is submodularity at work: the same sensor would have contributed more if placed earlier, when less ground was already covered.

For a ground set this small, you can enumerate all $\binom{6}{3} = 20$ possible subsets and verify that the greedy solution is close to (or exactly equal to) the true optimum. On realistic instances with thousands of candidate locations, enumeration is impossible, but the greedy solution is still guaranteed to be within a precise factor of the optimum.
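The brute-force check is a few lines of code. The sketch below swaps the circular regions for hypothetical sets of covered grid cells (the `coverage` data is invented for illustration), but the greedy-versus-enumeration comparison is the same:

```python
from itertools import combinations

# Hypothetical coverage sets: each sensor covers a set of grid cells.
coverage = {
    1: {0, 1, 2, 3},
    2: {2, 3, 4},
    3: {4, 5, 6, 7, 8},
    4: {7, 8, 9},
    5: {10, 11, 12},
    6: {0, 5, 10},
}

def f(S):
    """Number of distinct cells covered by the sensors in S."""
    return len(set().union(*(coverage[e] for e in S))) if S else 0

# Greedy: repeatedly add the sensor with the largest marginal gain.
S = set()
for _ in range(3):
    best = max(set(coverage) - S, key=lambda e: f(S | {e}) - f(S))
    S.add(best)

# Brute force: enumerate all C(6, 3) = 20 subsets.
opt = max(combinations(coverage, 3), key=f)

print(f(S), f(set(opt)))  # on this instance greedy matches the optimum
```

On this particular instance greedy happens to be exactly optimal; in general it is only guaranteed to be within the $(1 - 1/e)$ factor discussed next.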

The Approximation Guarantee

The following result, due to Nemhauser, Wolsey, and Fisher (1978)¹, is arguably the single most celebrated theorem in submodular optimization.

Theorem. Let $f : 2^{\mathcal{V}} \rightarrow \mathbb{R}_+$ be a monotone submodular function, and let $k$ be a positive integer. The Greedy algorithm returns a set $S_k$ satisfying:

$$
f(S_k) \geq \left(1 - \frac{1}{e}\right) \cdot f(S^*)
$$

where $S^* = \arg\max_{|S| \leq k} f(S)$ is the optimal solution and $e \approx 2.718$ is Euler’s number.

Since $1 - 1/e \approx 0.632$, the greedy algorithm always returns a solution worth at least $63.2\%$ of the true optimum. The guarantee holds for every monotone submodular function and every cardinality constraint $k$. No randomization, no special structure, no tuning.

Two additional facts make this result striking:

  • The bound is tight: there exist instances where the greedy algorithm achieves exactly a $(1 - 1/e)$ fraction of the optimum and no better.
  • The bound is optimal: Feige (1998)² proved that no polynomial-time algorithm can achieve an approximation ratio better than $(1 - 1/e)$ for this problem, unless P = NP.

Greedy is not just “a decent heuristic.” It is the best any efficient algorithm can do.

Greedy vs Random

[Interactive demo: pick a submodular objective, then run the race. Greedy picks the element with the highest marginal gain at each step; Random picks blindly. Both select k elements from a ground set of n. One example objective: hire a team that covers the most distinct skills, where each candidate knows 2–5 skills and overlapping skills don't count twice.]
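The race is easy to reproduce in code. Below is a sketch of the skill-coverage objective with invented candidate data (the instance sizes and the random seed are mine, chosen only for illustration):

```python
import random

random.seed(0)

# Hypothetical instance: each of n candidates knows 2-5 of 20 skills.
n, k, n_skills = 30, 5, 20
skills = {i: set(random.sample(range(n_skills), random.randint(2, 5)))
          for i in range(n)}

def f(S):
    """Number of distinct skills covered by the team S."""
    return len(set().union(*(skills[i] for i in S))) if S else 0

# Greedy: best marginal gain at each step.
S_greedy = set()
for _ in range(k):
    best = max(set(skills) - S_greedy,
               key=lambda i: f(S_greedy | {i}) - f(S_greedy))
    S_greedy.add(best)

# Random: pick k candidates blindly.
S_random = set(random.sample(range(n), k))

print(f(S_greedy), f(S_random))
```

Whatever the draw, the theorem guarantees that the greedy value is at least $(1 - 1/e)$ times the optimum, and hence at least $(1 - 1/e)$ times the random team's value.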

Why Does It Work? A Proof Sketch

The proof has three key steps. I will present them in a way that emphasizes the geometric intuition rather than the full formalism.

Step 1: Each Greedy Step Closes a Fraction of the Gap

Let $S_j$ be the greedy solution after $j$ steps, and let $S^*$ be the optimal solution with $|S^*| \leq k$. Define the gap at step $j$ as:

$$
\delta_j \coloneqq f(S^*) - f(S_j)
$$

The gap measures how far the current greedy solution is from the optimum. At step $j+1$, the greedy algorithm picks the element $e^*$ with the largest marginal gain $f(e^* \mid S_j)$.

Here is the key argument. The optimal solution $S^*$ contains at most $k$ elements. By the monotonicity of $f$, we know that $f(S^* \cup S_j) \geq f(S^*)$, so the total marginal gain of adding all elements of $S^* \setminus S_j$ to $S_j$ is at least $f(S^*) - f(S_j) = \delta_j$. By submodularity, the marginal gains of individual elements of $S^*$ (with respect to $S_j$) sum to at least this gap³:

$$
\sum_{e \in S^* \setminus S_j} f(e \mid S_j) \geq f(S^*) - f(S_j) = \delta_j
$$
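To see why the sum of individual marginal gains bounds the gap, write $S^* \setminus S_j = \{e_1, \dots, e_m\}$ and telescope:

$$
f(S^*) - f(S_j) \;\leq\; f(S^* \cup S_j) - f(S_j) \;=\; \sum_{i=1}^{m} f(e_i \mid S_j \cup \{e_1, \dots, e_{i-1}\}) \;\leq\; \sum_{i=1}^{m} f(e_i \mid S_j)
$$

The first inequality is monotonicity, the middle equality is a telescoping sum, and the last inequality is submodularity: each conditioning set $S_j \cup \{e_1, \dots, e_{i-1}\}$ contains $S_j$, so shrinking it back to $S_j$ can only increase each marginal gain.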

Since $|S^*| \leq k$, at least one element in $S^*$ has marginal gain at least $\delta_j / k$. The greedy algorithm picks the best element overall, so:

$$
f(e^* \mid S_j) \geq \frac{\delta_j}{k}
$$

In words: each greedy step closes at least a $1/k$ fraction of the remaining gap.

Step 2: The Gap Shrinks Geometrically

The inequality above means that after step $j+1$:

$$
\delta_{j+1} = f(S^*) - f(S_{j+1}) \leq \delta_j - \frac{\delta_j}{k} = \delta_j \cdot \left(1 - \frac{1}{k}\right)
$$

Applying this recursively from the initial gap $\delta_0 = f(S^*) - f(\varnothing) = f(S^*)$ (assuming $f(\varnothing) = 0$, which is standard for these problems) gives:

$$
\delta_k \leq f(S^*) \cdot \left(1 - \frac{1}{k}\right)^k
$$

Step 3: Bounding the Geometric Compound

The expression $(1 - 1/k)^k$ is a well-known sequence from calculus. It is monotonically increasing and converges to $1/e \approx 0.368$ as $k \to \infty$⁴.

| $k$ | $(1 - 1/k)^k$ |
| --- | --- |
| 1 | 0.000 |
| 2 | 0.250 |
| 5 | 0.328 |
| 10 | 0.349 |
| 50 | 0.364 |
| 100 | 0.366 |
| $\to \infty$ | $1/e \approx 0.368$ |
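The table's values take only a couple of lines to check:

```python
# Reproduce the convergence table for (1 - 1/k)^k.
for k in (1, 2, 5, 10, 50, 100):
    print(k, round((1 - 1 / k) ** k, 3))
```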
[Figure: Convergence of (1 − 1/k)^k. The sequence rises steeply for small k and levels off well before k = 10; the dashed line marks the limit 1/e ≈ 0.368.]

Since $(1 - 1/k)^k \leq 1/e$ for all $k \geq 1$, we get:

$$
\delta_k \leq \frac{f(S^*)}{e}
$$

Rearranging:

$$
f(S_k) = f(S^*) - \delta_k \geq f(S^*) - \frac{f(S^*)}{e} = \left(1 - \frac{1}{e}\right) \cdot f(S^*)
$$

That is the entire proof structure. Each step closes at least a $1/k$ fraction of the gap; the gap compounds geometrically; and the geometric compound $(1 - 1/k)^k$ is bounded above by $1/e$. The result is clean because submodularity gives us the per-step bound, and elementary calculus handles the rest.

[Interactive demo: Greedy Approximation Staircase. Each step closes at least a 1/k fraction of the remaining gap: bars show f(S_j) rising toward f(S*), and the region above is the gap δ_j, which shrinks geometrically as you step through.]

Lazy Greedy: Exploiting Diminishing Returns for Speed

The standard greedy algorithm recomputes the marginal gain $f(e \mid S_{j-1})$ for every remaining element $e$ at every step $j$. This is wasteful: submodularity guarantees that marginal gains can only decrease over time. If an element had a low marginal gain at step $j-1$, its gain at step $j$ can only be as low or lower. Why recompute it?

Minoux (1978)⁵ formalized this observation into Lazy Greedy. The idea is to maintain a max-heap (priority queue) of upper bounds on marginal gains:

$$
\boxed{ \begin{aligned} & \textbf{LazyGreedy}(f, \mathcal{V}, k) \\ & \quad S_0 \leftarrow \varnothing \\ & \quad \text{Initialize max-heap } H \text{ with } \rho_e \leftarrow +\infty \text{ for all } e \in \mathcal{V} \\ & \quad \textbf{for } j = 1 \textbf{ to } k\textbf{:} \\ & \quad \quad \textbf{loop:} \\ & \quad \quad \quad e \leftarrow H.\text{pop}() \\ & \quad \quad \quad \rho_e \leftarrow f(e \mid S_{j-1}) \quad \text{// recompute actual gain} \\ & \quad \quad \quad \textbf{if } \rho_e \geq H.\text{peek}()\textbf{:} \\ & \quad \quad \quad \quad S_j \leftarrow S_{j-1} \cup \{e\} \\ & \quad \quad \quad \quad \textbf{break} \\ & \quad \quad \quad \textbf{else:} \\ & \quad \quad \quad \quad H.\text{push}(e, \rho_e) \\ & \quad \textbf{return } S_k \end{aligned} }
$$

At each step, Lazy Greedy pops the element $e$ with the highest upper bound from the heap and recomputes its actual marginal gain. If the recomputed gain is still the largest (i.e., $\rho_e \geq$ the next-highest upper bound in the heap), then by submodularity $e$ must have the largest true marginal gain among all remaining elements. The algorithm selects $e$ and moves on. Otherwise, it pushes $e$ back into the heap with its updated bound and tries the next candidate.

The worst-case runtime is still $\mathcal{O}(n \cdot k)$; you might end up recomputing every element at every step. In practice, however, Lazy Greedy often needs only a handful of recomputations per step, because elements whose gains were already low at the previous step are even lower now and stay deep in the heap. The speedup varies depending on the problem instance, but factors of 5x to 100x over standard greedy are common. The approximation guarantee remains $(1 - 1/e)$, unchanged.
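A compact Python sketch of the same idea, using the standard library's `heapq` (a min-heap, so gains are stored negated; function and variable names here are mine, not part of the pseudocode above):

```python
import heapq

def lazy_greedy(f, V, k):
    """Lazy greedy with stale upper bounds on marginal gains.

    f is a value oracle on frozensets. Submodularity guarantees a stale
    bound only overestimates the true marginal gain, so the check below
    is sound.
    """
    S = frozenset()
    # Every element starts with an infinite upper bound (stored as -inf
    # because heapq is a min-heap).
    heap = [(float("-inf"), e) for e in V]
    heapq.heapify(heap)
    for _ in range(k):
        while True:
            _, e = heapq.heappop(heap)
            gain = f(S | {e}) - f(S)  # recompute the actual gain
            # If e still beats the best remaining upper bound, it has the
            # largest true gain among all remaining elements: select it.
            if not heap or gain >= -heap[0][0]:
                S |= {e}
                break
            # Otherwise re-insert e with its tightened bound and retry.
            heapq.heappush(heap, (-gain, e))
    return S
```

Since `heapq` has no decrease-key operation, stale entries are simply popped, re-evaluated, and pushed back with their updated priority.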

If you have worked with lazy evaluation in functional programming or deferred computation in systems engineering, the pattern is familiar: avoid doing work until you know it matters.

When Greedy Fails

The $(1 - 1/e)$ guarantee depends critically on monotonicity. When $f$ is non-monotone (i.e., adding elements can decrease $f$), the greedy algorithm can perform arbitrarily badly.

The thesis chapter I drew this material from contains a concrete counterexample. Consider a ground set $\mathcal{V} = \{1, 2, \dots, n\}$ and define the non-monotone submodular function:

$$
f(S) = \begin{cases} |S| & \text{if } n \notin S \\ 2 & \text{otherwise} \end{cases}
$$

Let $2 \leq k < n$. At the first step, the greedy algorithm adds element $n$, because its marginal gain is $f(\{n\}) - f(\varnothing) = 2$, which is the largest among all singletons (every other element has a gain of $1$). Once $n$ is in the solution, it stays there; the algorithm only adds elements, never removes them. The consequence: no matter what the algorithm picks in the remaining $k-1$ steps, the function value is stuck at $2$, because the presence of $n$ forces $f(S) = 2$ for any $S$ containing $n$.

The optimal solution is to take any $k$ elements other than $n$, yielding $f(S^*) = k$. The greedy solution achieves $f(S_k) = 2$. The approximation ratio is $2/k$, which goes to $0$ as $k$ grows. The algorithm gets trapped by a locally attractive choice that is globally catastrophic.
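The counterexample is small enough to run. A sketch with $n = 10$ and $k = 4$ (concrete values chosen for illustration):

```python
# The trap function: |S| if element n is absent, 2 if it is present.
n, k = 10, 4

def f(S):
    """Non-monotone submodular counterexample function."""
    return 2 if n in S else len(S)

V = list(range(1, n + 1))

# Standard greedy: grabs n first (gain 2 beats gain 1) and gets stuck.
S = set()
for _ in range(k):
    best = max((e for e in V if e not in S), key=lambda e: f(S | {e}) - f(S))
    S.add(best)

print(f(S))                     # 2: the value never recovers
print(f(set(range(1, k + 1))))  # 4: any k elements avoiding n achieve k
```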

The lesson: for non-monotone submodular functions, standard greedy is the wrong tool. Randomized variants like the Random Greedy algorithm (Buchbinder et al. 2014)⁶ recover a constant approximation ratio of $1/e \approx 0.368$ by introducing randomness into the selection step, at the cost of a weaker guarantee.

The type of constraint also matters. With a cardinality constraint, the greedy algorithm achieves $(1 - 1/e)$. With a single matroid constraint (a generalization of cardinality constraints), the $(1 - 1/e)$ ratio is still achievable, but not by plain greedy, which only guarantees $1/2$ there; the algorithms and analysis are considerably more involved⁷. With knapsack constraints, where each element has a cost and the total budget is limited, different algorithms are needed, and the landscape of results becomes considerably more complex.

Looking Ahead

The greedy algorithm requires $\mathcal{O}(n \cdot k)$ function evaluations. Lazy Greedy speeds things up in practice but offers the same worst-case bound. For moderate problem sizes, this is fine. For massive datasets ($n$ in the millions, $k$ in the thousands), even $\mathcal{O}(n \cdot k)$ becomes prohibitive.

In Part 3, I will introduce the Stochastic Greedy algorithm (Mirzasoleiman et al. 2015)⁸, which replaces the full scan over $\mathcal{V}$ at each step with a random subsample of size $\mathcal{O}((n/k) \log(1/\varepsilon))$. The total cost drops from $\mathcal{O}(n \cdot k)$ to $\mathcal{O}(n \cdot \log(1/\varepsilon))$; the factor of $k$ becomes a logarithmic term. The approximation guarantee degrades only slightly, to $(1 - 1/e - \varepsilon)$ in expectation, where $\varepsilon > 0$ is a tunable parameter.

Key Takeaways

  • The greedy algorithm for monotone submodular maximization under a cardinality constraint is simple: at each step, pick the element with the largest marginal gain. Repeat $k$ times.
  • It achieves an approximation ratio of $(1 - 1/e) \approx 0.632$, provably the best any efficient algorithm can achieve for this problem.
  • The proof relies on a geometric argument: each greedy step closes at least a $1/k$ fraction of the remaining gap to the optimum, and after $k$ steps the residual gap is at most $f(S^*)/e$.
  • Lazy Greedy exploits the diminishing returns property to skip redundant computations, often yielding 5x-100x speedups in practice with the same guarantee.
  • The greedy algorithm requires $\mathcal{O}(n \cdot k)$ function evaluations. For massive datasets, even this linear dependence on $k$ is too expensive, motivating the Stochastic Greedy algorithm in Part 3.
  • Monotonicity is critical: without it, greedy can achieve an approximation ratio as bad as $2/k$, which is effectively useless for large $k$.
