Consistent Nonparametric Estimation for Heavy-tailed Sparse Graphs
- Christian Borgs
- Jennifer Chayes
- Henry Cohn
- Shirshendu Ganguly
We study graphons as a nonparametric generalization of stochastic block models, and show how to obtain compactly represented estimators for sparse networks in this framework. Our algorithms and analysis go beyond previous work in two ways. First, we relax the usual boundedness assumption for the generating graphon and instead treat arbitrary integrable graphons, so that we can handle networks with long tails in their degree distributions. Second, again motivated by real-world applications, we relax the usual assumption that the graphon is defined on the unit interval, which allows latent position graphs whose latent positions live in a more general space. We also characterize identifiability for these graphons and their underlying position spaces.
We analyze three algorithms. The first is a least squares algorithm, which gives an approximation we prove to be consistent for all square-integrable graphons, with errors expressed in terms of the best possible stochastic block model approximation to the generating graphon. Next, we analyze a generalization based on the cut norm, which works for any integrable graphon (not necessarily square-integrable). Finally, we show that clustering based on degrees works whenever the underlying degree distribution is atomless. Unlike the previous two algorithms, this third one runs in polynomial time.
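To make the degree-based approach concrete, the following is a minimal illustrative sketch, not the algorithm as specified in the paper. It assumes a symmetric 0/1 adjacency matrix without self-loops, sorts vertices by degree, splits them into k groups of nearly equal size, and averages edge indicators within and between groups to obtain a block model estimate. The function name `degree_block_estimate`, the equal-size split, and the omission of any normalization by the overall edge density are assumptions made here for illustration.

```python
import numpy as np


def degree_block_estimate(adj, k):
    """Hypothetical sketch: cluster vertices by degree, then average edges per block pair.

    adj : (n, n) symmetric 0/1 adjacency matrix with zero diagonal (assumed).
    k   : number of blocks (assumed to satisfy 1 <= k <= n).
    Returns (labels, block_means), where labels[i] is the block of vertex i and
    block_means[a, b] is the empirical edge density between blocks a and b.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]

    # Sort vertices by degree; a stable sort keeps ties in index order.
    degrees = adj.sum(axis=1)
    order = np.argsort(degrees, kind="stable")

    # Assign consecutive runs of the sorted vertices to blocks 0..k-1.
    labels = np.empty(n, dtype=int)
    for block, members in enumerate(np.array_split(order, k)):
        labels[members] = block

    # One-hot membership matrix (n x k) gives block edge sums in one product.
    onehot = np.eye(k)[labels]
    edge_sums = onehot.T @ adj @ onehot

    # Number of ordered vertex pairs per block pair, excluding self-pairs.
    sizes = onehot.sum(axis=0)
    pair_counts = np.outer(sizes, sizes) - np.diag(sizes)

    # Empirical density per block pair; blocks with no pairs are set to 0.
    block_means = np.divide(
        edge_sums, pair_counts, out=np.zeros_like(edge_sums), where=pair_counts > 0
    )
    return labels, block_means
```

Sorting dominates the clustering step and the block averages require one pass over the adjacency matrix, so the sketch runs in time polynomial in the number of vertices, in line with the polynomial running time claimed for the degree-based method.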