Generalized Points-to Graphs: A Precise and Scalable Abstraction for Points-to Analysis

ACM Transactions on Programming Languages and Systems (TOPLAS), Vol 42(2): pp. 1-78

Computing precise (fully flow- and context-sensitive) and exhaustive (as opposed to demand-driven) points-to information is known to be expensive. Top-down approaches require repeated analysis of a procedure
for separate contexts. Bottom-up approaches need to model unknown pointees accessed indirectly through
pointers that may be defined in the callers and hence do not scale while preserving precision. Therefore,
most approaches to precise points-to analysis begin with a scalable but imprecise method and then seek to
increase its precision. We take the opposite approach in that we begin with a precise method and increase its
scalability. In a nutshell, we create naive but possibly non-scalable procedure summaries and then use novel
optimizations to compact them while retaining their soundness and precision.
For this purpose, we propose a novel abstraction called the generalized points-to graph (GPG), which views
points-to relations as memory updates and generalizes them using the counts of indirection levels, leaving
the unknown pointees implicit. This allows us to construct GPGs as compact representations of bottom-up procedure summaries in terms of memory updates and control flow between them. Their compactness
is ensured by strength reduction (which reduces the indirection levels), control flow minimization (which
removes control flow edges while preserving soundness and precision), and call inlining (which enhances the
opportunities of these optimizations).
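
To make the idea of reducing indirection levels concrete, the following is a minimal illustrative sketch, not the implementation described in the paper. It assumes a hypothetical encoding in which a memory update is written `src --(i|j)--> dst`, where `i` and `j` count indirection levels on the left- and right-hand sides (so `x = &a` is `1|0`, `x = y` is `1|1`, and `x = *y` is `1|2`). The class name `Update`, the function `compose_target`, and the side condition used for composition are assumptions of this sketch, and control flow between updates is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Update:
    """Hypothetical encoding of a memory update 'src --(i|j)--> dst'."""
    src: str
    i: int  # indirection level on the left-hand side (1 = src itself is assigned)
    j: int  # indirection level on the right-hand side (0 = address of dst)
    dst: str


def compose_target(consumer: Update, producer: Update) -> Optional[Update]:
    """Substitute what `producer` says about a variable into the
    right-hand side of `consumer`, lowering its indirection level.

    Returns None when the two updates do not match up under the
    (assumed) side condition of this sketch.
    """
    if consumer.dst != producer.src:
        return None
    k, l = producer.i, producer.j
    # Assumed condition: the producer must define exactly the location the
    # consumer reads through, and the result must not gain indirection.
    if not (l <= k <= consumer.j):
        return None
    return Update(consumer.src, consumer.i, l + consumer.j - k, producer.dst)


# y = &a  followed by  x = *y  simplifies to  x = a
y_addr_a = Update("y", 1, 0, "a")
x_deref_y = Update("x", 1, 2, "y")
reduced = compose_target(x_deref_y, y_addr_a)
print(reduced)
```

In this sketch, the update for `x = *y` no longer mentions `y` after composition and its right-hand indirection level drops from 2 to 1, which is the kind of simplification the term strength reduction refers to above.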
The effectiveness of GPGs lies in the fact that they discard as much control flow as possible without losing
precision. This is the reason GPGs are very small even for main procedures that contain the effect of the
entire program. This allows our implementation to scale to 158 kLoC for C programs.
At a more general level, GPGs provide a convenient abstraction to represent and transform memory in the
presence of pointers. Future investigations can try to combine them with other abstractions for static analyses
that can benefit from points-to information.