Microsoft at VL/HCC 2023: Focus on co-audit tools for spreadsheets

Publié octobre 12, 2023

Par Jack Williams , Senior Researcher Ian Drosos , Researcher Kasra Ferdowsi , Research Intern

Partagez cette page

These research papers were presented at the IEEE Symposium on Visual Languages and Human-Centric Computing (opens in new tab) (VL/HCC 2023), a premier forum for design, theory, and application of computing technologies for programming, modelling, and communication.

Microsoft at VL/HCC 2023: Focus on co-audit tools for spreadsheets

Large language models (LLMs) have revolutionized the way novice programmers and everyday computer users tap into the capabilities of natural language for programming. Among the tools used in this context, spreadsheets stand out as the preferred choice. The integration of LLMs into spreadsheets promises to substantially enhance their functionality and the user experience. At the same time, it’s well known that spreadsheet users commonly though inadvertently introduce errors (opens in new tab), and this can carry significant risks. For example, in 2010, a spreadsheet used in a Harvard economic analysis (opens in new tab) to inform austerity measures imposed on Greece was discovered to contain multiple errors (opens in new tab).

Microsoft is actively pursuing (opens in new tab) research focused on developing co-auditing tools and techniques, with an initial emphasis on spreadsheets. These tools are designed to help users verify the results generated by LLMs. At VL/HCC 2023 (opens in new tab), we introduce two new spreadsheet tools, ColDeco and FxD, specifically built to help users thoroughly examine and debug their programs within spreadsheets. Additionally, it is worth mentioning that the paper on FxD was awarded the Honorable Mention (opens in new tab).

ColDeco: An end-user inspection tool

Working with tables in spreadsheets is a common task, and the ability to add a calculated column can be incredibly useful. A calculated column not only adds information but also facilitates tasks like filtering and sorting. Generative AI can enable users to create sophisticated calculated columns in tables. However, verification of AI-generated code in this scenario is crucial because AI can misinterpret the user’s intent or overlook important data.

In our paper, “ColDeco: An End User Spreadsheet Inspection Tool for AI-Generated Code,” we introduce ColDeco, a no-code inspection tool for calculated columns. ColDeco uses helper columns and row grouping to help users understand how an AI-generated column works and locate any errors.

To describe how ColDeco works, we’ll use an example table containing people’s first, middle, and last names in separate columns. Our user asks the system to “create a column called ‘Abbreviation’ that takes the first letter of each part of the name.” In this example, there’s an error in the generated code that fails to handle rows with no middle names, causing some Abbreviation cells to be empty.

First, the model generates a program that computes an abbreviation for each row and adds it to the new Abbreviation column. ColDeco’s interface automatically opens as a side panel, as shown in Figure 1.

The Inspect Columns view displays any generated columns, accompanied by a natural language description of the generated code. The Inspect Rows view displays a subset of the table, organized by behavior. The Row Inspection view uses dataflow analysis to group rows, highlighting key distinct execution behaviors. In our example, this view quickly draws the user’s attention to the two rows that fail to calculate an abbreviation.

Two graphics. The first graphic depicts a table with columns: “First Name”, “Middle Name”, “Last Name”, “DoB”, and “Abbreviation”. There are 11 rows. As examples, row 3 contains the information: First Name: Christopher, Middle Name: Michael, Last Name: Fleming, DoB: 11/5/1995, Abbreviation: CMF. Row 9 contains the information: First Name: William, Middle name is empty, Last Name: Smith, DoB: 6/3/1968, Abbreviation is empty. The second graphic depicts a side panel with two sections. The first section is the Inspect Columns view (labelled 1a). A single column named “Abbreviation” and a corresponding description is shown. The second section is the Inspect Rows view (labelled 1b). It contains a table with columns “Index”, “First Name”, “Middle Name”, “Last Name”, and “Abbreviation”. Within the table there are two groups of rows. The first group has an example row: Index: 4716, First Name: William, Middle Name is empty, Last Name: Smith, Abbreviation is empty. The second group has an example row: Index: 8984, First Name: Christopher, Middle Name: Michael, Last Name: Flemming, Abbreviation: CMF. — Figure 1. The initial view of the ColDeco side panel. An Abbreviation program is generated by the AI and added to the table as a new column. The Inspect Columns view (1a) shows the column generated by the AI, including a description of how the code works. The Inspect Rows view (1b) groups rows into different behaviors, indicating that there are errors in two rows.

If our user wants to investigate an error, they can expand a generated column into multiple helper columns, illustrated in Figure 2. These helper columns are visible in both the table (2a) and the side panel (2b), and they show the intermediate values. The user can now see that the missing abbreviations are caused by an error that occurred when the system tried to take the first and middle initials.

Two graphics. The first graphic (labelled 2a) depicts a table with 4 columns: “DoB”, “text concatenation”, “1st letter of Last Name”, “Abbreviation”. As examples, row 3 contains the information: DoB: 11/5/1995, text concatenation: CM, 1st letter of Lan Name: F, Abbreviation: CMF. Row 9 contains the information DoB: 6/3/1968, text concatenation: is empty, 1st letter of Lan Name: S, Abbreviation: is empty. The second graphic (labelled 2b) depicts a side panel showing the Inspect Columns view. A tree view shows “Abbreviation” as the root with two children: “1st letter of Last Name” and “text concatenation”, corresponding to the columns in the table. Each column in the tree view has a corresponding description. — Figure 2. The ColDeco side panel after a user expands the Abbreviation column into two additional helper columns. Each additional column has a description.

FxD: A functional debugger

Not every spreadsheet task involves generating a new table column. Moreover, many users are already well acquainted with spreadsheet formulas. This brings us to our second tool, a spreadsheet formula debugger, introduced in the paper, “FxD: a functional debugger for dysfunctional spreadsheets.”

We employed a user-centered approach when designing FxD, extensively reviewing existing literature on functional programming debuggers. This informed the four key features we implemented into FxD:

Live debugging. FxD dynamically updates as a user edits a formula, allowing for quick formula modification and exploration (Figure 3, image 1).

Hybrid formula tracing. The debugger combines step-based evaluation (Figure 3, image 1) with tree-based derivations (Figure 3, image 3) to provide a step-by-step breakdown of the formula. Substeps are hidden behind expandable cards to prevent user overload.

Subformula coloring. Color coding highlights changes in a formula as FxD evaluates it. This facilitates the tracking of these updates when a user hovers over a step (Figure 3, images 2 and 4).

Information inspector. Context-aware tooltips improve the user experience. One example is table previews when a user hovers over ranges in functions like VLOOKUP. These tooltips offer insights into the range, surrounding context, and the lookup column used by the containing function (Figure 3, image 3).

Four graphics, each graphic describing a different feature of the debugger. The formula being debugged is ‘=IF(G3 < (B1 + B2) * (1 + B3), “low”, “high”)’. The first graphic (labelled 1) shows the formula and its evaluation trace. Each step in the trace shows the formula with some part evaluated. The last step is the value “low” which is the result of the formula. The second graphic (labelled 2) shows a step being highlighted. The step has a before formula and after formula, with multiple parts evaluated. Each part that is evaluated is highlighted with the same color in the “before” and “after” formula. The third graphic (labelled 3) shows a cell range being hovered on and a range information inspector being shown. The inspector shows a preview of the grid for the corresponding range. The fourth graphic (labelled 4) shows a step being highlighted and an evaluated subpart being hovered over. The user hovers over the value 15 in the “after” formula and the corresponding formula “B1 + B2” in the “before” formula is underlined. — Figure 3. The FxD debugger. Image 1 shows the edited formula and evaluation steps. The steps update as a user edits the formula. Image 2 shows subformula coloring, which highlights a subformula and its value upon hovering. Image 3 shows an information inspector that previews the range referenced in a formula. Image 4 shows the concurrent evaluation of multiple subformulas. When the user hovers over a value, the corresponding subformula is underlined.

Growing importance of AI code verification

As the complexity of AI-generated code rises, the need for tools to verify accuracy becomes increasingly critical. In response, we developed these two co-audit tools tailored to spreadsheets. Moving forward, a key consideration lies in managing the complexity of these tools. Our vision is that debugging tools will become infused with generative AI to assist users in both generating and verifying workflows.

Review our paper on co-auditing in general to learn more.