Hashing Modulo Alpha-Equivalence
- Krzysztof Maziarz
- Tom Ellis
- Alan Lawrence
- Andrew Fitzgibbon
- Simon Peyton Jones
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'21)
Published and organized by ACM
In many applications one wants to identify identical subtrees of a program syntax tree. This identification should ideally be robust to alpha-renaming of the program, but no existing technique has been shown to achieve this with good efficiency (better than O(n^2) in expression size). We present a new, asymptotically efficient way to hash modulo alpha-equivalence. A key insight of our method is to use a weak (commutative) hash combiner at exactly one point in the construction, which admits an algorithm with O(n*(log n)^2) time complexity. We prove that the use of the commutative combiner nevertheless yields a strong hash with low collision probability.
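To make the problem statement concrete, here is a minimal Python sketch of the naive baseline, not the algorithm presented in the paper. It assumes a tiny untyped lambda-calculus AST (the `Var`, `Lam`, and `App` classes and all function names are hypothetical, introduced only for illustration). Each subtree is serialized with its bound variables replaced by de Bruijn indices and then hashed, so alpha-equivalent subtrees receive equal hashes. Because every subtree is re-serialized from scratch, the total work is quadratic in expression size in the worst case, which is the cost the abstract says should be improved upon.

```python
import hashlib
from dataclasses import dataclass
from typing import Union


# Hypothetical minimal lambda-calculus AST, used only for this illustration.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Expr"


@dataclass(frozen=True)
class App:
    fn: "Expr"
    arg: "Expr"


Expr = Union[Var, Lam, App]


def _canonical(e, bound=()):
    """Serialize e, replacing bound variables by de Bruijn indices.

    `bound` lists the binders enclosing e *within the subtree being hashed*,
    outermost first. Variables bound outside that subtree count as free and
    keep their names.
    """
    if isinstance(e, Var):
        # Search the innermost binder first so that shadowing is respected.
        for depth, name in enumerate(reversed(bound)):
            if name == e.name:
                return f"b{depth}"
        return f"f:{e.name}"
    if isinstance(e, Lam):
        return "(lam " + _canonical(e.body, bound + (e.param,)) + ")"
    return "(app " + _canonical(e.fn, bound) + " " + _canonical(e.arg, bound) + ")"


def alpha_hash(e):
    """Hash of e that is invariant under renaming of its bound variables."""
    return hashlib.sha256(_canonical(e).encode()).hexdigest()


def subterms(e):
    """Yield e and every subtree of e."""
    yield e
    if isinstance(e, Lam):
        yield from subterms(e.body)
    elif isinstance(e, App):
        yield from subterms(e.fn)
        yield from subterms(e.arg)


def equivalence_classes(e):
    """Group all subtrees of e by alpha_hash.

    Each subtree is hashed independently, so the worst-case cost is
    quadratic in the size of e.
    """
    groups = {}
    for s in subterms(e):
        groups.setdefault(alpha_hash(s), []).append(s)
    return groups
```

For example, `alpha_hash(Lam("x", Var("x")))` equals `alpha_hash(Lam("y", Var("y")))`, while `Lam("x", Var("y"))` hashes differently because `y` is free there. The sketch only illustrates what "identical modulo alpha-renaming" means for subtrees; it does not use the commutative combiner or achieve the O(n*(log n)^2) bound described above.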