Toward developing faster algorithms for minimizing submodular functions


This research paper was presented at the 64th IEEE Symposium on Foundations of Computer Science (FOCS) 2023, a premier forum for the latest research in theoretical computer science.


Submodular functions are versatile mathematical tools, finding diverse applications in real-world scenarios and guiding solutions across complex domains. From dissecting the intricate networks of graphs to deciphering the complexities of economic landscapes through utility functions, and even navigating the enigmatic world of random variables via entropy functions, they offer valuable insights into challenging problems. Their wide-ranging applicability has made them pivotal tools for modeling and optimization in various theoretical computer science domains, including operations research and game theory. In recent years, submodular functions have gained prominence in solving optimization problems within machine learning (ML) applications. These tasks encompass vital areas such as feature selection and clustering, as illustrated in Figure 1. Additionally, submodular functions are instrumental in applications like sensor placement and graphical models. For further exploration, comprehensive resources are available in Bilmes’ insightful survey and Bach’s standard textbook on this subject.

Figure 1. Application of submodular function optimization to feature selection (left) and clustering (right).

Algorithm design for submodular function minimization

In a joint paper with researchers from Stanford University, “Sparse Submodular Function Minimization,” presented at FOCS 2023, we investigate the problem of minimizing a submodular function in the standard model. Here, we assume that the submodular function can be accessed through an evaluation oracle that returns the value \( f(S) \) in response to a query with a set \( S \). This is the most classical and well-studied model for designing algorithms that minimize submodular functions.

Before we discuss our study, it’s important to bear in mind that a submodular function \( f \) is defined on subsets of a finite ground set \( V \) and satisfies a diminishing marginal returns property. That is, for any two subsets \( S \subseteq T \) and any element \( e \in V \setminus T \), the marginal value of \( e \) when added to the smaller set, \( f(S \cup \{e\}) - f(S) \), is at least its marginal value when added to the bigger set, \( f(T \cup \{e\}) - f(T) \).
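To make the definition concrete, here is a minimal sketch (illustrative, not from the paper) of an evaluation oracle for the cut function of an undirected graph, a classical example of a submodular function, together with a brute-force check of the diminishing marginal returns property. The graph, the helper names, and all parameters below are our own illustrative choices.

```python
import itertools

# Illustrative example: the cut function of an undirected graph is a
# classical submodular function. The oracle below returns f(S) for any
# queried subset S of the vertices.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]  # ground set V = {0, ..., 4}
N = 5

def cut_value(S):
    """Evaluation oracle: number of edges with exactly one endpoint in S."""
    S = set(S)
    return sum(1 for u, v in EDGES if (u in S) != (v in S))

def subsets(elements):
    """All subsets of the given collection of elements."""
    elements = sorted(elements)
    return itertools.chain.from_iterable(
        itertools.combinations(elements, r) for r in range(len(elements) + 1))

# Brute-force check of diminishing marginal returns:
# f(S + e) - f(S) >= f(T + e) - f(T) for all S subset of T and e not in T.
for T in map(set, subsets(range(N))):
    for S in map(set, subsets(T)):
        for e in set(range(N)) - T:
            assert cut_value(S | {e}) - cut_value(S) >= \
                   cut_value(T | {e}) - cut_value(T)
print("cut_value satisfies diminishing marginal returns")
```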

In the 1980s, foundational work revealed that submodular functions could be minimized in polynomial time, marking a significant breakthrough. Since then, researchers have made substantial progress in the quest for faster algorithms for submodular function minimization (SFM). Despite these efforts, fundamental questions persist, such as determining the minimum number of queries required to minimize any given submodular function, a concept referred to as the problem’s query complexity.

Currently, the fastest known algorithm makes \( \widetilde{O}(n^2) \) queries to minimize any given submodular function, while the best lower bound is only \( \widetilde{\Omega}(n) \), where \( n \) is the size of the ground set on which the function is defined. This leaves a substantial gap: an \( n \)-fold difference between the existing upper and lower bounds.

Given this considerable difference, a natural question arises: What additional structural assumptions could pave the way for faster SFM algorithms? One prevalent assumption is sparsity, which posits that the size of the set minimizing the submodular function is small. Sparsity is particularly relevant in applications such as signal processing, feature selection, and compressed sensing, where solutions are expected to have few non-zero entries. In these settings, it is important to understand how algorithmic complexity depends on sparsity, as this provides insight into the combinatorial and geometric structure of the problems.

Interestingly, existing algorithmic techniques developed over the past four decades for SFM do not yield improved runtimes even when the solution is sparse. Therefore, it is imperative to develop innovative techniques that can drive advancements in sparse SFM and bridge the existing gap between upper and lower bounds.


Parallel algorithms for submodular function minimization

Exploring beyond SFM’s query complexity, recent research has shed light on the importance of sparse SFM, particularly for understanding the inherent adaptivity of parallel algorithms (known as parallel complexity) designed to solve the problem. Research has shown that any parallel algorithm for SFM requires adaptivity that is polynomial in the size of the ground set.

Our results improve both parallel and sequential algorithms for SFM. For example, consider a scenario where the minimizer of the given submodular function is \(\widetilde{O}(1)\)-sparse. In this context, our parallel algorithm runs in a nearly constant number of rounds, while our sequential algorithm makes a nearly linear number of queries. This achievement stands in stark contrast with the previous best parallel upper bound of \(\widetilde{O}(n)\) and the best query complexity upper bound of \(\widetilde{O}(n^2)\).

Fast first-order methods for exact submodular function minimization

Current fast algorithms for SFM rely on cutting-plane methods, a standard class of convex optimization techniques applied to the Lovász extension, a natural continuous extension of the given submodular function. However, restricting the optimization domain to sparse solutions doesn’t expedite cutting-plane methods by more than a logarithmic factor. To address this, we shifted our approach and employed first-order methods, including stochastic mirror descent, to minimize the Lovász extension. These methods, non-Euclidean generalizations of stochastic gradient descent, are more attuned to the problem’s geometry. Unlike cutting-plane methods, whose complexity depends only polylogarithmically on the desired additive error with respect to the optimal value, first-order methods converge at a rate that is polynomial in the inverse of that error.

This rate of convergence suggests that first-order methods are better suited for approximate submodular function minimization, whereas our goal is to solve the problem exactly. Using the sparsity assumption, we developed a new algorithmic framework for SFM based on a new notion of duality. We used this framework to demonstrate how first-order methods, with substantially reduced accuracy requirements, can be applied to solve SFM exactly.
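As a rough illustration of the first-order approach, the sketch below is a simplified stand-in, not the paper’s algorithm: it computes a subgradient of the Lovász extension from \( n \) oracle queries via the classical greedy ordering, runs plain projected (Euclidean) subgradient descent over \( [0,1]^n \) in place of the stochastic mirror descent used in the paper, and then applies the classical threshold rounding to recover a discrete set. All function and parameter names are illustrative, and we assume the standard normalization \( f(\emptyset) = 0 \).

```python
import numpy as np

def lovasz_subgradient(f, x):
    """A subgradient of the Lovász extension of f at x (greedy ordering).

    Sort coordinates in decreasing order; the entry for the i-th largest
    coordinate is the marginal value f(S_i) - f(S_{i-1}), where S_i holds
    the indices of the i largest coordinates. Uses n oracle queries and
    assumes the normalization f(empty set) = 0.
    """
    order = np.argsort(-x)
    g = np.zeros(len(x))
    S, prev = [], 0.0
    for i in order:
        S.append(i)
        cur = f(S)
        g[i] = cur - prev
        prev = cur
    return g

def minimize_sfm(f, n, steps=2000, lr=0.05):
    """Approximate first-order minimization of the Lovász extension over
    [0,1]^n, followed by classical threshold rounding. A simplified
    stand-in for the paper's stochastic mirror descent scheme."""
    x = np.full(n, 0.5)
    for _ in range(steps):
        g = lovasz_subgradient(f, x)
        x = np.clip(x - lr * g, 0.0, 1.0)  # project back onto the cube
    # Rounding: for any x, the best threshold set {i : x_i >= theta} has
    # value at most the Lovász extension's value at x, so scanning all
    # distinct thresholds (plus the empty set) loses nothing.
    candidates = [[i for i in range(n) if x[i] >= t] for t in np.unique(x)]
    candidates.append([])
    return min(candidates, key=f)

# Usage with the cut-function oracle from the earlier sketch: the minimum
# cut value of 0 is attained, e.g., by the empty set.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
def cut_value(S):
    S = set(S)
    return sum(1 for u, v in EDGES if (u in S) != (v in S))
print(minimize_sfm(cut_value, 5))
```

The rounding step returns a set whose value is at most the Lovász extension’s value at the final iterate, so a more accurate continuous phase yields a better discrete answer; the paper’s contribution, via its new duality, is showing that far less continuous accuracy suffices for an exact answer when the minimizer is sparse.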

Toward faster algorithms for SFM and its applications

These techniques not only promise advancements for sparse SFM but also provide a foundation for tackling other fundamental problems in SFM theory. Our algorithms for sparse SFM serve as valuable starting points for designing improved algorithms for related problems. They offer potential insights into developing polynomial-time algorithms for SFM with lower query and parallel complexity, opening avenues for future research.

Over the past four decades, research on submodular function minimization has traditionally focused on the global properties of the problem. Sparse SFM, in contrast, enables us to explore local and more refined structures of submodular functions. Our work introduces new algorithmic tools that better exploit these structural properties, a vital aspect for applications in ML and operations research, where special structure often arises. Beyond advancing sparse SFM, our paradigm paves the way for the development of improved algorithms for SFM and its diverse applications.
