NL2Formula: Generating Spreadsheet Formulas from Natural Language Queries

EACL 2024

Writing formulas in spreadsheets such as Microsoft Excel and Google Sheets is a widespread practice among users performing data analysis. However, crafting these formulas remains a tedious and error-prone task for many end-users, particularly when complex operations are involved. To reduce the burden of writing spreadsheet formulas, this paper introduces NL2FORMULA, a novel benchmark task that takes a natural language (NL) query as input and generates an executable formula grounded in a spreadsheet table. To support this task, we construct a comprehensive dataset of 70,799 paired NL queries and spreadsheet formulas, covering 21,670 tables and 37 types of formula functions.
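To make the task definition concrete, the following sketch shows what a single NL2FORMULA-style instance might look like and how a predicted formula could be checked by executing it against its table. The table, query, and formulas are hypothetical illustrations, not entries from the dataset. The evaluator is a toy that handles only formulas of the form `=FUNC(range)` for five functions (far fewer than the 37 function types in the benchmark), and comparing execution results is an assumed checking strategy, not necessarily the paper's evaluation protocol.

```python
import re

# Hypothetical instance (illustrative only, NOT taken from the NL2FORMULA dataset).
# The table is an A1-addressed cell map, as a spreadsheet would display it.
table = {
    "A1": "Region", "B1": "Sales",
    "A2": "North",  "B2": 120,
    "A3": "South",  "B3": 95,
    "A4": "East",   "B4": 143,
}
query = "What is the total sales across all regions?"
target_formula = "=SUM(B2:B4)"


def col_to_index(col: str) -> int:
    """Convert column letters such as 'B' or 'AA' to a 1-based index."""
    idx = 0
    for ch in col:
        idx = idx * 26 + (ord(ch) - ord("A") + 1)
    return idx


def index_to_col(idx: int) -> str:
    """Inverse of col_to_index: 1 -> 'A', 27 -> 'AA'."""
    letters = ""
    while idx > 0:
        idx, rem = divmod(idx - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


CELL = re.compile(r"([A-Z]+)(\d+)")


def expand_range(rng: str) -> list[str]:
    """Expand an A1 range such as 'B2:B4' into ['B2', 'B3', 'B4'] (column-major)."""
    start, end = rng.split(":")
    c1, r1 = CELL.fullmatch(start).groups()
    c2, r2 = CELL.fullmatch(end).groups()
    return [
        f"{index_to_col(c)}{r}"
        for c in range(col_to_index(c1), col_to_index(c2) + 1)
        for r in range(int(r1), int(r2) + 1)
    ]


# Toy subset of spreadsheet functions (an assumption for illustration).
# Each function receives only the numeric cells in the range, like Excel's COUNT.
FUNCS = {
    "SUM": sum,
    "AVERAGE": lambda xs: sum(xs) / len(xs),
    "MAX": max,
    "MIN": min,
    "COUNT": len,
}

FORMULA = re.compile(r"=([A-Z]+)\(([A-Z]+\d+:[A-Z]+\d+)\)")


def execute(formula: str, cells: dict):
    """Evaluate a formula of the form =FUNC(range) over the numeric cells in range."""
    m = FORMULA.fullmatch(formula.replace(" ", ""))
    if m is None:
        raise ValueError(f"unsupported formula: {formula}")
    func, rng = m.groups()
    values = [cells[c] for c in expand_range(rng)
              if isinstance(cells.get(c), (int, float))]
    return FUNCS[func](values)  # KeyError for functions outside the toy subset


def execution_match(predicted: str, gold: str, cells: dict) -> bool:
    """Count a prediction as correct if it executes to the same value as the gold formula."""
    try:
        return execute(predicted, cells) == execute(gold, cells)
    except (ValueError, KeyError, ZeroDivisionError):
        return False


print(execute(target_formula, table))                        # 358
print(execution_match("=SUM(B2:B4)", target_formula, table))  # True
print(execution_match("=MAX(B2:B4)", target_formula, table))  # False
```

Checking by execution rather than by string equality would accept formulas that are written differently but compute the same value on the given table; whether that is desirable depends on how strictly one wants to score predictions.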

We address the NL2FORMULA task with a sequence-to-sequence baseline model called fCoder. Experimental results validate the effectiveness of fCoder, showing that it outperforms the other baseline models. We further compare fCoder with an early GPT-3.5 model (text-davinci-003). Finally, through in-depth error analysis, we identify open challenges in the NL2FORMULA task and call for further investigation.