Advancing Excel as a programming language with Andy Gordon and Simon Peyton Jones

已发布

Advancing Excel as a programming language with Andy Gordon and Simon Peyton Jones on the Microsoft Research Podcast

Episode 120 | May 5, 2021 

Today, people around the globe—from teachers to small-business owners to finance executives—use Microsoft Excel to make sense of the information that occupies their respective worlds, and whether they realize it or not, in doing so, they’re taking on the role of programmer. 

In this episode, Senior Principal Research Manager Andy Gordon, who leads the Calc Intelligence team at Microsoft Research, and Senior Principal Researcher Simon Peyton Jones provide an inside account of the journey Excel has taken as a programming language, including the expansion of data types that has unlocked greater functionality and the release of the LAMBDA function, which makes the Excel formula language Turing-complete. They’ll talk specifically about how research has influenced Excel and vice versa, programming as a human-computer interaction challenge, and a future in which Excel is the first language for budding programmers and a tool for incorporating probabilistic reasoning into our decision-making.  

Learn more: 

Subscribe to the Microsoft Research Podcast (opens in new tab):
iTunes (opens in new tab) | Email (opens in new tab) | Android (opens in new tab) | Spotify (opens in new tab) | RSS feed (opens in new tab) | Microsoft Research Newsletter (opens in new tab)  


Transcript 

TEASER (SIMON PEYTON JONES): I don’t think I ever dreamt that we could offer something as exotic as LAMBDA. You could really write literally any program in Excel now. Certainly, it becomes computationally much more powerful. 

[MUSIC BREAK] 

ANDY GORDON: Welcome to the Microsoft Research Podcast.  

[MUSIC BREAK] 

GORDON: My name is Andy Gordon. The new LAMBDA function has been announced in Excel, and I’m here to tell you about that with my colleague Simon Peyton Jones. Simon, would you like to introduce yourself? 

SIMON PEYTON JONES: Yeah, so, I’ve been here at Microsoft Research for 22 years, since 1998. I did my undergraduate degree at Cambridge. I worked for a few years in a small company, and then I worked at University College London and Glasgow University as a professor before moving to Microsoft. And my research interest has always been in functional programming, purely functional programming, as a radical and elegant attack on the entire enterprise of writing programs. But what about you, Andy? How did you get into functional programming and, indeed, Excel? 

GORDON: Well, I was doing a PhD on lazy functional programming back in the late ’80s. In fact, when you guys were starting Haskell, my PhD was on input-output for Haskell, and I was actually delighted you personally invited me to sit on the Haskell committee to help standardize input-output using monads. And I joined Microsoft actually in ’97, so I was one of the first employees at Microsoft Research Cambridge.  

PEYTON JONES: That’s right. You narrowly predated me. [LAUGHTER] 

GORDON: Not just that. I interviewed you. That was a tough decision. [LAUGHTER] 

PEYTON JONES: Well, you made a good call. [LAUGHTER] 

GORDON: I think mostly. Yeah, so, about my research. Yeah, I started out in functional programming, and I’ve done a bunch of other things in, like, security. And then about 10 years, I got into probabilistic programming for machine learning, which funny enough, led me to spreadsheets because we built a system to support writing probabilistic programs of data within spreadsheets. And at that point, I realized that to take things much further, it would be good to join forces with your work with the Excel team. And in fact, you were involved with the Excel team pretty early on in your time at Microsoft. Do you want to tell us about that? 

PEYTON JONES: Yeah, pretty early, because when I first joined Microsoft, I thought to myself, “What can I do that would advance the course of functional programming within Microsoft?” And then I thought, “Well, Excel is the world’s most widely used functional programming language.” It’s not a very powerful one, perhaps, but when you write a formula in a spreadsheet, you are writing in a purely functional language. So, no side effects. You don’t say, you know, “=print3+7”; it wouldn’t make any sense. And moreover, it’s extremely widely used, so there must be a hundred times as many users of formulae in Excel as there are professional programmers in the entire planet. And so, I was thinking maybe we should think about what would it take to start from Excel, but to grow it using the North Star of mainstream functional programming languages and try to see how much more powerful we could make it. Then I started talking to research colleagues, like Alan Blackwell here at Cambridge and Margaret Burnett in Oregon, and we started to come up with some ideas about defining functions of Excel. We had two principal things. The first was, Excel provides 600-odd built-in functions, but it doesn’t provide you with a way to make a new function out of existing ones. So, in every other programming language, you can define a procedure, perhaps, by writing some code, wrapping it up, giving it a name and some parameters, and then you can call it repeatedly. Not so in Excel. If you want to do that, you have to write your procedure in VBA or JavaScript or something, and that’s kind of, like, crazy. We’d like you to be able to define new functions using the existing formula language. Every other programming language lets you do that. Why not Excel? So, we came up with a design for doing that, and we turned it into a research paper. It was published in ICFP in about 2002 or ’03. It was called “User-defined functions in Excel.” So then, we started to realize that it wasn’t just enough to have user-defined functions. They needed to be able to take structured data. It wasn’t enough for them to take scalar data. And this is a part which you’ve been much more involved in since the need for rich data in Excel, which sort of complements the need for user-defined functions. So, maybe you could tell us a bit about that. 

GORDON: Yeah. I mean, you’re right, Simon. I mean, until pretty recently, the only kind of data that you could have in Excel was text or numbers. That’s all you could have sitting in cells, and so, if you’re trying to turn the formula language into a proper functional programming language, you really need a lot more structures like that. There’s only so many functions you can write that just take two scalars as input and produce another one as an output. There’s not many of those. So, for example, you need a raise. 

PEYTON JONES: You mean as arguments and results of functions? 

GORDON: Yes. You want to be able to process big pieces of data in one go, and it’s also important to be able to store them in cells. You know, within a sheet-defined function, you might want to have a computation and have it sort of spread out within the grid and then return a whole array, say, as the result of a sheet-defined function. So, generally, we need first-class arrays in the language. 

PEYTON JONES: Yeah, we should pause a bit just to explain about what is a sheet-defined function. The idea that we put in this paper for defining new functions in Excel was to say imagine you could take a worksheet and nominate certain cells as the input and one cell as the output and give it a name. And then when you call a function with that name, when you call that function, it is as if you had created a fresh copy of that worksheet, filled in the input values, the arguments of the function in those input cells, calc’d the worksheet, and taken the result out. That was our model for defining a new function. Now, if you want to define a function that works over arrays, for example, “Sort this array” or “Pick the smallest element of it,” then you can’t just take an array of a fixed size, like a three-element array or a 30-element array. We’d like to have an array of arbitrary size, which presumably would then land in a single cell in the worksheet that defines the function. 

GORDON: This is a great vision, and you wrote a paper about that. So, how do things develop, Simon? 

PEYTON JONES: Well, then, so having got the idea, I then would travel to Redmond, the Microsoft mothership, on a fairly regular basis, you know, two or three times a year. So I started to become very familiar with the Excel team, and I discovered that they were very warm towards the idea—we had lots of interesting technical conversations about how Excel might work—but Excel is a very big product, and it doesn’t move quickly. It’s like a sort of supertanker. So, for a long time, I felt a bit like a fly bashing against the bow of the supertanker, and there was always a good reason why now is not a good time to do it, and no criticism for that. There were other priorities, and there still are. But what actually happened in the end was that I got a bit discouraged, and I went away for about 10 years. I forget what provoked me to . Maybe it was our lab director, Andrew Blake, who said, “You should really have another go at this.” But it turned out it matched up with their priorities, so we then started a much more active partnership between Microsoft Research Cambridge and Excel on this whole idea of sheet-defined functions and data. But things came out in a very different order than I anticipated. I had started on the sheet-defined function idea, but in fact, the first thing that happened was actually to do with data types and rich structured data involved in functions. In particular, the first thing that came out as a change to the product was called dynamic arrays, and you were quite involved in that, weren’t you? Tell us a bit about it. 

GORDON: So, dynamic arrays are really interesting because if you go way back, you discover that the formula language in Excel could in fact construct arrays. You could refer to bits of grid, a range, and then pull that into a formula and compute with it, but what you couldn’t do was return the whole array in one go. 

PEYTON JONES: You could do things like add two arrays together or add 1 to every element in an array. You could treat arrays as first-class values within a formula. 

GORDON: Right, and so you could lift formulas over whole arrays, so, like, plus 1 and so on, or you could lift over the whole array and you could get the result, but you could only have a scalar in a cell, and so, it was very awkward to return the whole array. There was a notion of a thing called array formulas, and if you were on Windows, the way you entered them was to do control-shift-enter, I think after you’d [LAUGHTER]—yeah, you’d type the formula in, and then you had to remember to go control-shift-enter. It was very complicated. You would select the range, you know, multiple cells that were to hold the array, and you needed to know in advance how big it was, and then you would type in the formula and end it with control-shift-enter, and then that one formula would define the whole array. So, it was really cumbersome. 

PEYTON JONES: [LAUGHTER] It’s sort of hilarious, really, when you think about it. Curly braces would appear around the formula.  

GORDON: So, it was this advanced feature, and there were a few people who knew how to use it. In fact, there was a person who actually wrote a book called Ctrl+Shift+Enter, which was this sort of guidebook to how you could sort of use this feature, but it was very difficult to use. But now, I think since about 2018, we’ve had dynamic arrays. And this is an amazing feature. You don’t have this separate world of array formulas. Ordinary formulas can return arrays, and you don’t have to decide in advance how big the array is going to be. Instead, you just write the formula, and if it computes an array, then we see it spills out from the cell. Maybe you write a formula that, you know, is a 3 × 3 array, and it will just spill out from—you write the formula in the top left corner, and then the rest of the array will fill out. And then if you want to have another calculation somewhere else—if the original array was in cell A1, then if you say “A1 sharp” then that expression, that formula, A1 sharp, refers to the whole array, and then you can compute to it. You can add that whole array to another one or take the sum of it or take the average of it. 

PEYTON JONES: So there you don’t need to know how big the array in A1 sharp is, and indeed, in different calc cycles, it might be different sizes. 

GORDON: Right. 

PEYTON JONES: That’s really important; the chain of formulae will work regardless of the sizes of the arrays involved, which may vary with the data. 

GORDON: When dynamic arrays came out, we didn’t have LAMBDAs, so I think for everyone in the team, we knew that LAMBDAs were the thing and combinators together with LAMBDAs like map and reduce and fold and scan—these kinds of combinators that are familiar from functional languages like Haskell and others—we knew that those would be really important for programming with arrays, but when they first came out, we didn’t have those. So, we’ve been struggling a little bit to really get the full power of dynamic arrays. There’s various tricks you can use to construct dynamic arrays, but certain kind of common patterns are still quite hard to use, which changes with LAMBDAs coming in. 

PEYTON JONES: Dynamic arrays still produced a lot of excitement at the time because they really unlocked a huge amount of functionality. It was only previously available through this very arcane control-shift-enter mechanism, and then only if you knew the size of the array. So, they’re dynamic. They really work over a variable size. It wasn’t just arrays, either. So, arrays are a very natural data type for Excel, but we also talked to our colleagues in the product team a lot about records, and that came out at a similar kind of time, as well, didn’t it? 

GORDON: Yes. So, these are known as entities in Excel, and you can get them from the data tab in the ribbon, and the funny thing is that they are records, but there isn’t actually a function to create a record from scratch. Instead, the way it’s been introduced is that Excel has got this intelligence feature that can recognize certain names of entities that occur in the workbook. So, if you refer to the name of a company or if you refer to a name—say, Cambridge—Excel can recognize Cambridge as a geographic entity, like the city of Cambridge in the UK, where Simon and I are at the moment, and what Excel will do is go out to Bing’s database, fetch facts about the city of Cambridge, wrap them up in a record, and then drop that into the cell, and if it’s the name of a company, then it would also grab facts like the current stock price or the name of the CEO and drop that into the grid. 

PEYTON JONES: You have to do something to make it do that, don’t you? It doesn’t just spot—anytime you say “Cambridge,” it doesn’t just go—yeah, there’s some gesture. 

GORDON: You go into the data ribbon, and actually, one of the really cool things about those entities is they’re known as linked entities. It’s not just a record, but they also have this link to where they came from, sort of a pointer to the web, if you like. So, it means that if you open up a stock data type in a workbook, Excel will actually automatically refresh properties like the price, and then it would recalc anything that depends on that. So, these have become really popular as ways of building financial models or getting properties of geographies. 

PEYTON JONES: But from a programming point of view, the interesting thing about this was the value in the cell was no longer a scalar. It really is a record, and you could programmatically say “A1.price” to extract the field. That’s the first time that kind of rich, structured data of nested records was available. 

GORDON: Yeah, I think that’s the first time Excel had “dot” actually as an operator. So, we now have, like, “dot,” like in many other programming languages, with IntelliSense, so it kind of gives you an option for different fields you might want to pull out of the record.  

PEYTON JONES: It was a funny dynamic that, for a programming languages person, you’d say, “Well, we need an introduction form, which lets you create records, and then an elimination form that lets you take them apart”—that‘s the dot. But the way it came out in the product is we have the elim form but without the intro form. I think we all get an intro form, but it was just an interesting way in which working with a product group makes things come out in a different way, probably a way that’s more useful for users in the first instance than we would have done as academics. I thought that dynamic was interesting. 

GORDON: Yeah, actually, I mean, stepping back, Simon, I think this is one of the really big things that I’ve learned from working with the Excel team—I mean, we’ve been working really intensely on this the last five years—which is the kind of people who use Excel. I mean, it’s a really broad population—it’s a super popular application—but the kind of people who use it are generally not programmers, people who really care about writing code for its own sake. Like most people who are sort of functional programmers, say, Haskell programmers, are really passionate about programming, whereas people who are using Excel, not always, but generally, they are really keen to get some other job done, like they’re an accountant, they’re someone working in finance, maybe a teacher tracking statistics about how their students are doing, and they’re what are known as end-user programmers. They are writing code—they’re writing programs—but really for their own purposes. So, when the Excel team thinks about how best to serve end-user programmers, they often do things a little differently than if you were trying to produce a feature for a very sophisticated programmer. Entities are a good example of that, where the Excel team realized that end users who maybe cared about stocks would just want a very convenient way to go from the name of a company to an entity that would automatically update the stock price rather than having to have maybe a function called “record” that they would call and have to put in arguments to determine which entity to fetch, and I think that’s been quite successful, that end users have really been able to use these features, and they don’t really know what a record is, but they can understand that particular entities matter for what they’re trying to do. 

PEYTON JONES: We also laid a lot of answers on compositionality. I think for the first time, we had not just data that wasn’t a scalar, but data that was arbitrarily nested. So, a record for a country might contain a field that is the states of that country, and that’s an array, an array of states. So, it’s a record that contains an array, and that array contains entities themselves that are descriptors, records about the state. So, you get the sort of compositional, deeply nested form of structures that we’re familiar with in programming languages, but which is really not in Excel at all until this point. 

GORDON: Yeah, it’s really rich. For example, Microsoft has done this amazing deal with Wolfram, the company that does the WolframAlpha knowledge base, and so Microsoft is giving access to a huge range of data types from Wolfram, like chemical compounds and stars and different food items with their calories and so forth, so there’s a great repertoire of data and a lot of it is nested in the way that you’re talking about. 

PEYTON JONES: We started to go further than that, even. After we got arrays and records—those are like two data types—you might think, “Well, what if we want more?” So, we started to think, “What would it mean to make Excel extensible with respect to its data types?” Maybe you could walk up to it as a third-party developer—now not as an end user, but as a real programmer—and plug in something that adds another data type to Excel, let’s say images, right? So, you might have an abstract data type of images with operations you provide that can overlay them or blur them or rotate them, and then you have a little functional programming language whose data types are images, and Excel serves as a sort of orchestration language that lets you write formally they’re connected, but the new data type is provided as a third-party plug-in. That’s not available yet. It’s part of our—you know, your and my vision—and I think the team in Redmond takes it pretty seriously, and we’ve designed the implementation to accommodate that, but we’re starting with some fixed data types. But the idea that ultimately, we could turn Excel into an extensible platform for arbitrary data types that do really quite big, remarkable things in a financial contract, say, is quite an exciting one, I think. 

GORDON: Yeah, that’s gonna be really cool because we’ve seen that customers often have used things like previous extensibility features like VBA to in a way sort of hack up some of these features for representing things like financial contracts or financial instruments and sort of simulate things like arrays and records using strings and various other tricks like that. But once this feature comes out, it’ll be possible for third-party developers to sort of really enrich the product with new data types. It’ll be extremely cool. 

[MUSIC BREAK] 

PEYTON JONES: So, we talked quite a bit about the way in which our partnership has influenced the product, but the reverse has also happened. In fact, we’ve used the credibility with the lab of this very effective product group partnership to justify growing our research program into a group that we call the Calc Intelligence group, which you run now. Maybe you could just tell us a few examples of the broader research-based things we’ve been doing on the back of all this. 

GORDON: Yeah, we’ve done a range of different things, and we’ve got papers out on different topics. So, one example is what we call elastic SDFs. So, Simon, you’d explained the idea of sheet-defined functions, that we can describe a calculation by a piece of grid with some of the cells being inputs and some of them being outputs, and I think one thing in your original paper that you didn’t cover was what would happen if one of the inputs was actually a range, was like an array of items. And this is one of the cool things that we’ve worked on together, is like what if the example that you gave was maybe a row of like three items, but you want to generalize to a sheet-defined function that wouldn’t work just on a row of three items but maybe would work on a row of five items or any sort of array input size. And very often, we can do a sort of analysis of the formulas, and if they are processing those three inputs in a uniform way, typically it would be exactly the same formula as applied to each of them, a sort of parallel computation on the three inputs, and maybe with aggregates at the end where maybe you do some calculations and then you sum the results. Then that calculation can be sort of stretched—it can be elasticized, is what we say—to cover maybe an input of size five. So, we wrote this paper that I think is really cool that figured out a theory for when it is possible to generalize an example calculation of the grid that has fixed sizes to arbitrary sizes, and then even better, we worked with HCI experts who are in the Calc Intel team to do a user study to ask, “Well, how do people actually find this in practice?” And we did a study and found that they loved it, that it was a very natural way to program sheet-defined functions. So, that’s a paper I’m really proud of. 

PEYTON JONES: It was a funny paper, though, wasn’t it, because it straddles everything from user studies right the way through to a proof of most principal generalization, but we found it quite difficult to publish. 

GORDON: We did. We shouldn’t be bitter. Let’s not be bitter, Simon.  

[LAUGHTER] 

PEYTON JONES: We’re not bitter at all. 

GORDON: We’re not bitter. 

PEYTON JONES: And in our field, there are kind of three major programming language conferences— ICFP, POPL, and PLDI—and on this one, we got the clean sweep. We were rejected by ICFP, rejected by POPL, and rejected by PLDI. 

GORDON: Bravo.  

[LAUGHTER] 

PEYTON JONES: That’s right. It was a great paper. It is now published in the Journal of Functional Programming. 

GORDON: We’re laughing now. 

PEYTON JONES: It’s a funny story, but there’s a thing behind it, which is that it is interdisciplinary work, right? It covers the stretch from programming languages theory through to HCI. That means it doesn’t actually fit in any of the existing buckets very neatly. So, that can be a challenge when you’re doing interdisciplinary research. 

GORDON: I think it is a shame that these programming language conferences have a hard time, you know, accepting papers that use HCI methods. I’d love it if we could start to have maybe tracks of HCI research at these conferences or co-locate with HCI conferences. 

PEYTON JONES: Yeah, but after all, what’s the purpose of a programming language? It is the user interface for the computer, isn’t it? The measure of whether a programming language is any good is whether it helps people get programs to work correctly, more quickly, so it’s an HCI problem, programming. 

GORDON: It totally is, and having worked a bit with HCI people, I’m really humbled, actually. I mean, there are some great HCI folks in our team, and I learned such a lot from talking to them, and the thing is they really start from the users, talking to people, and asking them, “What are the sort of problems you’re having that might need some new features?” And then they try and build features and then compare them. In the SDF paper that I’ve just been talking about, we had two different ways of processing arrays, and then the study was actually a comparison of the two. And I’ve really come to respect HCI people for the kind of methods they do to actually get rigorous results when working with people. I think it would be really great if we could have more co-location of HCI research conferences with more kind of technical PL conferences, because I think people would learn from each other in ways that just doesn’t happen at the moment, when they’re separated out at different conferences. 

PEYTON JONES: So, Calc View is another example of such a project, isn’t it, about sort of HCI aspects of programming? You want to just tell us a bit about Calc View? What is it? 

GORDON: Yeah. We actually published this at an HCI conference called VL/HCC. So, it’s just a radically different view of the grid. We have a spreadsheet grid, but then we have this second representation of it, a bit like a programming language. So, we use a textual notation, and we would literally have the text, like, maybe “A1=2”, then maybe “A2=A1+1”. And you would actually see formulas like that, equations like that, sort of visibly as if it was like a program. 

PEYTON JONES: A textual program. 

GORDON: Like a textual program. 

PEYTON JONES: You can edit with Emacs kind of thing. 

GORDON: Exactly. And so, the idea of Calc View is that you have the grid, and then to the right of it, you have this textual view, and certain things are really easy to do in the textual view. Like, for example, we have this nice notation for a formula that has been copied many times. You know, so if the same formula is copied from A1 to A100, you could literally say, “A2:A100=A1+1”. And so, that’s a formula that would basically increment a counter. But textually, it’s a very short program, a bit like a loop, and if you want to sort of change it, you can just change the one copy of that formula, and then every copy in the grid gets updated. And so, we did a user study, and again, we found that people were more effective at certain tasks using Calc View. 

PEYTON JONES: And what I thought was interesting about it was that in programming, we’re so used to the textual view of a program and we execute it later, and in Excel, we’re so used to just seeing the data and having it execute sort of online all the time, you barely see the formulae, and with Calc View, we’re trying to do it both at the same time and show you the data view continuously calc’d on one side of the screen and the textual view on the other side of the screen, and you can edit in either and it affects the other. I thought that was a really interesting project. 

GORDON: So, Simon, when you started working with the Excel team, it wasn’t really clear, was it, what the formula language was? There wasn’t really a sort of programming language style description of the Excel formulas. 

PEYTON JONES: Yeah, that’s right. When I first started, I sort of looked at the formula language and thought, “Hey, how hard can this be? You know, it’s just some formula. It’s pretty simple syntax,” and then the more I dug into what Excel actually does, it’s really hard to find good descriptions about it. You’ll find that, for example, references are first-class values. If you say “F(A1:A3)”, you’re not just passing a little 3 vector of the values in A1 to A3; you’re passing a reference to the range A1 to A3 as a first-class value to a function that can do sort of introspective things like, say, what is the row number of the last element. I didn’t really realize that at all. So it turns out there are quite a lot of wrinkles like this. So, I started to write down a paper called “The Semantics of Excel.” It was meant to be a document that says, “Never mind the implementation. This is the semantics of a formula,” and it was split into two in the end, the semantics of a single formula and then the semantics of calc, which is how is a whole spreadsheet full of formulas calc’d and how do we make sure that it’s always kept up to date and calc’d in the right order with the right cells being made dirty? That led to all sorts of interesting conversations, sometimes with the Excel engineers who implemented particular features, because really, only they knew the honest-to-goodness ground truth about this. Again, it was an interesting dialogue between somebody who’s coming at it completely from a semantics and programming languages point of view, and the engineers who’d actually built it. But there was no independent standalone description in that kind of way. And that semantics then led to, “Oh, let’s implement it. Let’s implement a reference semantics for Excel,” I think then as an F# program. So, that was meant to be, as it were, an Excel formula evaluator look-alike, a standalone, completely separate from the Excel code base, that implemented the reference model, because after all that, it could be a useful standalone entity. We thought that in an abstract way, and then you had the idea it wasn’t just abstractly useful—we might be able to concretely use it. That turned into Calc.ts, didn’t it? 

GORDON: Yes. So, we did a hackathon about three years ago, mid-2017. So, we had this F# implementation that was actually getting pretty complete, at least the formula language. We didn’t have all the worksheet functions, but we could evaluate a lot of formulas, and we were starting to wonder, “Maybe we could take Excel formulas into other products.” And so, Excel and Word have got an add-in model based on JavaScript. So, basically, we did this hackathon where we added Excel formulas to Word. It was sort of this grand hack. We had this code in F#, and we wanted to turn it into JavaScript, so we used—oh, what was it called? Is it WebSharper? Yes, WebSharper is this sort of transpiler that compiles F# code to JavaScript. And so, we took this hunk of code and got it running in the hackathon, and then we made this silly video, and you were actually on the judging panel, but we won the MSR Cambridge hackathon that summer. It was great fun. Basically, we made a silly video that made people laugh, so that was great, but I mean, I think people liked the idea that we could take the formulas on a journey outside Excel, and then a really interesting thing happened, that this vice president of engineering, a gentleman called Aleš Holeček, was visiting Cambridge, was looking at our work, and you know, he has a view about what’s happening right across the different parts of Excel, and there is a web version of Excel, the web browser. 

PEYTON JONES: This is Excel Online, right? 

GORDON: Excel Online, yeah. 

PEYTON JONES: Every time you go to a web browser and wake up Excel, that’s what you get. 

GORDON: Yeah, and the way it works—at least the way it worked until then—was that it actually opened the Excel workbook in Azure inside a virtual machine, and then the client, the browser, was sort of like a dumb client, if you like; it didn’t do any calculation there. It did the UI where you could enter formulas and you could enter data, but everything was calculated in Azure. 

PEYTON JONES: If you entered a number, it would have to send a message back to a machine halfway across the planet to add 3 and 4 and send the result back to the web browser. 

GORDON: Exactly. So, it felt a bit sluggish. And there was a big push to make the product more usable, and so they decided they wanted to evaluate formulas in the browser. But unfortunately, the C++ code base that goes back to, I guess, 1985, when Excel started, it just wasn’t practical to run that code base in the browser, and so Aleš said to us, “You guys can help out. You guys need to go out to Israel because the Web Excel team is based in Herzliya in Israel, and they need a calc engine.” So, we had this amazing work trip out to Israel, met the guys and started collaborating, and we realized pretty early on that we couldn’t ship the F# code, so we did a big rewrite, and we rewrote it in TypeScript and then started a big push to write lots of the worksheet functions in TypeScript. And web Excel, because it’s a website, could move really fast, so the amazing thing was that, like, six months after the hackathon, the initial version of Calc.ts went live, and it was starting to speed up people’s calculations because it was doing the calculations in the browser. We had to do a bit of research to really make this viable, because not all the data in the cloud workbook ends up in the browser, so we had to have a calc engine that worked in the presence of partial data, and then we hit some precision problems because the original C++ code doesn’t use IEEE arithmetic, believe it or not, because it was originally written—  

PEYTON JONES: Predated IEEE arithmetic. This is like code that has been around since forever. [LAUGHTER] 

GORDON: That’s right. But although it wasn’t possible to do the following for all of the C++ code, we could use WebAssembly to take the core of the mathematics part of the calc engine that did 40-point arithmetic, and it was in C++, and we were able to compile that to JavaScript using WebAssembly or using that tool chain—first to WebAssembly, then back to JavaScript. And so now, we’ve got great performance numbers, so accuracy is, like, 99.95 percent. We can evaluate more than 99 percent of formulas, and a slightly goofy statistic is that there’s so many customers of Excel for the web that Calc.ts saves them a total of seven years every single day because it speeds up the response you get without needing to wait for the server. 

PEYTON JONES: Also, it must be helping global warming, right? If you’re gonna add 3 and 4, then that should be one machine instruction. If you’ve gotta send it across the planet to a data center, that must be like bazillions of machine instructions, you know, that are just being wasted.  

[MUSIC BREAK] 

PEYTON JONES: Going back to Excel as a product, that brings us sort of more or less up to date to the, um, the release of the LAMBDA function. Maybe you could say a little bit about what that is. 

GORDON: I think we’re all really super thrilled about having LAMBDAs in Excel. So, LAMBDA, it’s the same LAMBDAs that you have in programming languages. It’s an anonymous function. You can, say, double LAMBDA X—right now, you’d write in Excel as “LAMBDA X, X+X” and then that formula returns a function that if you give it an X, doubles it. Same as in any functional programming language. And this is going to be really great for users of Excel because it’s going to make formulas more readable. Excel formulas are notorious because they tend to be really complicated if you’re trying to do something clever, these notorious megaformulas that are really big things, that are really hard to read but are sort of squished down to a single cell, and LAMBDA will let you give names to parts of those, and we also have LET that let’s you give names to subexpressions within a formula. So, overall, it’s gonna make expressions much more readable and also much more reusable. You can define a computation once and then you can call it from many places, and just like in other programming languages, you get the great benefit that if there is something wrong with your initial definition or it was a preliminary definition for testing and you want to improve it and fix a bug, you just need to fix it in the one place and then that will propagate to all the uses. 

PEYTON JONES: So, this is another funny thing, right? So, when Margaret and Alan and I were initially thinking about user-defined functions, we thought, “Well, we have to define it as a worksheet.” I don’t think I ever dreamt that we could offer something as exotic as LAMBDA, which sounded very geeky and programming-language-y, and yet, it serves the same function. You can define named abstractions, but because it doesn’t require much UI—it’s just another form of formula—it’s been much easier to introduce as the first form of the product. I still hope that we’ll get sheet-defined functions, but it turns out that LAMBDA, and this really is the full-on lexically scoped LAMBDA—you can define Church numerals, you can do the whole thing, you can define the Y combinator—this is real LAMBDA—it is now part of Excel, and that’s amazing. That makes the language Turing complete. Because we can write the Y combinator, you could really write literally any program in Excel now. But that’s a qualitative shift, right? So, certainly, it becomes computationally much more powerful, and because we can name these LAMBDAs, you can have functions that call functions that call functions. We can even write recursive functions. These named LAMBDAs can call each other. 

GORDON: Yes. I mean, an example I love—it’s a very simple one, but just reversing a string. Believe it or not, although Excel has had lots of functions for processing strings, you can’t write a function—before LAMBDA, anyway—within the formula language that can reverse a string. But with LAMBDAs, you can just write a nice, little recursive function that does that. There’s real demand for a bit of functionality like that, but it wasn’t possible to do within the formula language. 

PEYTON JONES: And I love the way this is sort of empowering users, right? Now, users can write these new functions themselves and, in fact, can write functions specific to their particular domain or area of expertise or work group, and then call those functions just as easily as the built-in ones. I think that’s a huge, huge change.  

GORDON: It’s gonna be really interesting to see, you know, the kind of things people do and how we can share them. I’m hoping there’ll be a sort of open-source community build up around LAMBDAs. 

PEYTON JONES: Yeah, I really hope that there’ll be a sort of third-party library and people who say, “Here’s my library for doing operations on”—I don’t know—“tensors” or something. And we’re gonna need to provide better support for libraries and richer data types, but I think that’ll come. And then people can produce libraries themselves. 

GORDON: Yeah. It’s really great. I mean, ’cause we’ve been using LAMBDAs internally for a few years now, and it’s just a different kind of experience. It’s live programming. You’ve got a grid, you’ve got data just right in front of you. It’s just really highly interpretive. You know, you can change one LAMBDA one place and instantly all the changes percolate. 

PEYTON JONES: You’re saying that even as a programming language person, doing functional programming in Excel with LAMBDAs and dynamic arrays feels quite different and more engaging than doing programming in, I don’t know, Haskell or F#. Is that what you mean?  

GORDON: Yeah. Exactly. 

PEYTON JONES: That actually leads me to just—we’re sort of, I guess, drawing towards the end. I was kind of reflecting on what have we learned from all this. One of the things that for me would be a very exciting outcome, if we can really make Excel into a language that doesn’t have this sort of glass ceiling that prevents you getting beyond a certain point, that lets you write functions that can call functions that can call functions that can have data that has data that has data inside it, then we could maybe imagine introducing programming to children for the first time through the medium of Excel. So, the child’s first programming language, maybe alongside something like Scratch, which is wildly popular, could be a spreadsheet, initially experienced in this very direct visceral way through just entering data and visualizing it with a chart, but then some simple data transformations using functions and then the ability to wrap those functions up into functions of your own. It would bring the pieces of my life together—my functional programming life, my Microsoft life, and my education life. It could bring that all together into, “We teach children programming using Excel for the first time.”  

GORDON: I think that’s amazing. I think that’s a fantastic, exciting vision, Simon. Those are things you’ve worked on. I mean, it’s really cool that those things are coming together, but I actually really believe that the spreadsheet environment is a great place to learn because it’s so alive, you know? You make a little change and you can instantly see the results. It’s a really sort of evocative way to program and to learn. 

PEYTON JONES: Anything else that you feel you’ve learned from this experience? 

GORDON: There’s two things. I mean, I’ve touched on HCI and programming languages, so I won’t say so much about that except that I see a lot of excitement now in that interplay. There’s a few, you know, professors—I’m thinking of people like Eleanor Glassman or Amy Coe—professors who are on the intersection of programming languages in HCI and are doing really cool stuff, and there’s summer schools on the topic. So, I think that’s really growing as a research area, and I’m really excited to be part of that direction. The other thing is, like I said near the start, that before getting into spreadsheets, I’d spent quite a while looking at probabilistic programming, which is a way of expressing sort of Bayesian decision-making using code, and I sort of put that aside for a bit because I really, you know, I thought the way to achieve that, to sort of empower people to do Bayesian reasoning, to sort of use probabilistic reasoning to make good decisions, I think the way to get there is to get it into spreadsheets. And so, I think the kind of things that we’ve been talking about are a step in that direction. When I was doing probabilistic programming, that was really aimed at quite sophisticated users, people who are like data scientists, but I think we are seeing in society generally a really big need for the general public to understand probabilities and to understand uncertainty. I’m thinking about things like election forecast. I’m thinking about the various uncertainties around COVID. I think to be sort of numerate about these uncertainties, well, you need to be numerate to deal with them properly, to make good decisions, and to be numerate, you need to take the uncertainty into account, which is usually using probabilities. So, I think that the kind of features that we’re adding to Excel will make easier being able to visualize probabilities, to have charts that take probabilities into account, to sort of package up ways of visualizing probability distributions inside LAMBDAs so we can give textual descriptions of probabilities. People find, you know, a number like, you know, probability of 0.7 a little bit hard to understand; there’s a big idea called natural frequencies, where you basically remove the points and you can say what 0.7 means, and studies have shown that people respond much better to those kinds of natural frequencies—70 out of 100—than probabilities like 0.7. So I think that the programming features we’re adding to Excel will allow people to build libraries that make it easier to construct, to, say, represent distributions using natural frequencies, and eventually to make charts that represent probabilities nicely. And then a bit further out—this is quite a long-term dream—the idea of actually using Excel to build probabilistic models of situations will then be possible given the kind of work we’re doing on adding LAMBDA to Excel. So, that’s my long-term dream of where we’re gonna go with LAMBDAs and these other features in Excel. 

PEYTON JONES: More generally, I’m quite excited about the possibilities of taking end-user programming seriously, by which I mean that because Excel starts from being an end-user tool—that’s what it has always been; by end-users, people who are trying to get some other job done—that if we can bring that constituency of end users on a journey with us to empower them, to give them the tools to express more and do more, it would be quite exciting to see how far we can go. And I think that functional programming using Excel, there’s a really high ceiling there that can go a long way. But I think just as we were talking about with Calc View and your sense that the programming using this just felt different, I think it may take us on a journey that isn’t altogether where we expect. [MUSIC STARTS PLAYING UNDER DIALOGUE] We’ve already seen that in the dynamics that we’ve discussed between us and the product group, so I think the future’s quite exciting there. 

GORDON: Thanks, Simon. It’s been great talking with you this afternoon. We’re so excited about this work, and hope everyone listening is, too. Do check out the Microsoft Research website for more information. Thanks for joining us on the podcast.  

继续阅读

查看所有播客