Melford: Using Neural Networks to Find Spreadsheet Errors
- Rishabh Singh
- Benjamin Livshits
- Ben Zorn
MSR-TR-2017-5
Spreadsheets are widely used for financial and other types of important numerical computations. Spreadsheet errors have accounted for hundreds of millions of dollars of financial losses, but tools for finding errors in spreadsheets are still quite primitive. At the same time, deep learning techniques have led to great advances in complex tasks such as speech and image recognition. In this paper, we show that applying neural networks to spreadsheets allows us to find an important class of errors with high precision. The specific errors we detect are cases where an author has entered a number where there should be a formula, for example a constant typed into the row that totals the numbers in a column. We use a spatial abstraction of the cells around a particular cell to build a classifier that, whenever a cell contains a number, predicts whether it should instead contain a formula.
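To make the idea of a "spatial abstraction" concrete, here is a minimal sketch, assuming a sheet is a dictionary from `(row, col)` to raw cell text and that each neighbor is reduced to one of four coarse categories (empty, number, formula, text). The category set, the square window, and the names `classify_cell` and `neighborhood` are hypothetical illustrations, not the exact encoding used by Melford.

```python
from typing import Dict, List, Tuple

# Hypothetical coarse categories for each cell (an assumption for illustration).
EMPTY, NUMBER, FORMULA, TEXT = 0, 1, 2, 3

# A sheet modeled as (row, col) -> raw cell contents; missing keys are empty cells.
Grid = Dict[Tuple[int, int], str]


def classify_cell(value: str) -> int:
    """Map raw cell contents to a coarse category."""
    if not value:
        return EMPTY
    if value.startswith("="):
        return FORMULA
    try:
        float(value)
        return NUMBER
    except ValueError:
        return TEXT


def neighborhood(grid: Grid, row: int, col: int, radius: int = 1) -> List[int]:
    """Encode the (2r+1) x (2r+1) window around (row, col), read row by row.

    The center cell itself is skipped so that its own contents (which
    determine the label) do not leak into the features. Cells outside
    the sheet are treated as EMPTY.
    """
    features = []
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            if (r, c) == (row, col):
                continue
            features.append(classify_cell(grid.get((r, c), "")))
    return features


# A column of numbers whose "total" in row 3 is a hard-coded constant.
sheet: Grid = {(0, 0): "10", (1, 0): "20", (2, 0): "30", (3, 0): "60"}
print(neighborhood(sheet, 3, 0))
```

Skipping the center cell is a deliberate design choice in this sketch: the classifier must infer "formula or number" from the surroundings alone, which is what lets it question a number that sits in a formula-like position.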
Our approach requires no labeled data and allows us to rapidly explore potential new classifiers to improve the effectiveness of the technique. Our classifier has a low false positive rate and finds more than 150 real errors in a collection of 70 benchmark workbooks. We also applied Melford to almost all of the financial spreadsheets in the EUSES corpus and within hours confirmed real errors, previously unknown to us, in 26 of the 696 workbooks. We believe that applying neural networks to help individuals reason about the structure and content of spreadsheets has great potential.
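One plausible way to need no hand-labeled data is to let the spreadsheets label themselves: every cell that already holds a formula is a positive example, and every cell that holds a number is a negative one. The sketch below assumes that reading. The helpers `build_training_set` and `flag_suspicious` are hypothetical, and a hand-written rule (number above, empty below) stands in for the trained neural network purely to show how flagged cells would be produced.

```python
from typing import Callable, Dict, List, Tuple

Grid = Dict[Tuple[int, int], str]
EMPTY, NUMBER, FORMULA, TEXT = 0, 1, 2, 3  # hypothetical cell categories


def classify_cell(value: str) -> int:
    """Map raw cell contents to a coarse category."""
    if not value:
        return EMPTY
    if value.startswith("="):
        return FORMULA
    try:
        float(value)
        return NUMBER
    except ValueError:
        return TEXT


def neighborhood(grid: Grid, row: int, col: int, radius: int = 1) -> List[int]:
    """Categories of the cells around (row, col), center excluded, row by row."""
    return [classify_cell(grid.get((r, c), ""))
            for r in range(row - radius, row + radius + 1)
            for c in range(col - radius, col + radius + 1)
            if (r, c) != (row, col)]


def build_training_set(grid: Grid, radius: int = 1) -> Tuple[List[List[int]], List[int]]:
    """Self-labeled examples: formula cells get label 1, number cells label 0."""
    features, labels = [], []
    for (row, col), value in sorted(grid.items()):
        kind = classify_cell(value)
        if kind in (NUMBER, FORMULA):
            features.append(neighborhood(grid, row, col, radius))
            labels.append(1 if kind == FORMULA else 0)
    return features, labels


def flag_suspicious(grid: Grid, predict: Callable[[List[int]], bool],
                    radius: int = 1) -> List[Tuple[int, int]]:
    """Number cells that the classifier believes should hold a formula."""
    return [(row, col) for (row, col), value in sorted(grid.items())
            if classify_cell(value) == NUMBER
            and predict(neighborhood(grid, row, col, radius))]


# Column A ends in a real SUM formula; column B ends in a typed-in total.
sheet: Grid = {
    (0, 0): "10", (1, 0): "20", (2, 0): "30", (3, 0): "=SUM(A1:A3)",
    (0, 1): "5", (1, 1): "15", (2, 1): "40", (3, 1): "60",
}

# Stand-in rule, NOT a neural network: with radius 1, index 1 is the cell
# directly above the center and index 6 is the cell directly below it.
def bottom_of_numeric_column(f: List[int]) -> bool:
    return f[1] == NUMBER and f[6] == EMPTY

X, y = build_training_set(sheet)
print(y)
print(flag_suspicious(sheet, bottom_of_numeric_column))
```

In a real pipeline the `(X, y)` pairs gathered across many workbooks would train the classifier, and the rule above would be replaced by its predictions.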