Program synthesis and the art of programming by intent with Dr. Sumit Gulwani

已发布 2019年11月20日

分享这个页面

Sumit Gulwani on the Microsoft Research podcast

Episode 99 | November 20, 2019

Dr. Sumit Gulwani (opens in new tab) is a programmer’s programmer. Literally. A Partner Research Manager in the Program Synthesis, or PROSE, group at Microsoft Research, Dr. Gulwani is a leading researcher in program synthesis and the inventor of many intent-understanding, programming-by-example and programming-by-natural language technologies – aka, the automation of “what I meant to do and wanted to do, but my computer wouldn’t let me” tasks.

Today, Dr. Gulwani gives us an overview of the exciting “now” and promising future of program synthesis; reveals some fascinating new applications and technical advances; tells us the story behind the creation of Excel’s popular Flash Fill (opens in new tab) feature (and how a Flash Fill Fail elicited a viral tweet that paved the way for new domain investments); and shares a heartwarming story of how human empathy facilitated an “ah-ha math moment” in the life of a child, and what that might mean to computer scientists, educators, and even tech companies in the future.

Microsoft Research Podcast (opens in new tab): View more podcasts on Microsoft.com
iTunes (opens in new tab): Subscribe and listen to new podcasts each week on iTunes
Email (opens in new tab): Subscribe and listen by email
Android (opens in new tab): Subscribe and listen on Android
Spotify (opens in new tab): Listen on Spotify
RSS feed (opens in new tab)
Microsoft Research Newsletter (opens in new tab): Sign up to receive the latest news from Microsoft Research

Transcript

Sumit Gulwani: Ninety–nine percent of people who use computers do not know programming and they get stuck with repetitive, tedious tasks. But these are quite creative people and it is just that programming, as it exists today, creates an artificial barrier to program computers. Program synthesis can allow these end–users to express themselves naturally and program computers as easily as they would command a personal assistant.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: Dr. Sumit Gulwani is a programmer’s programmer. Literally. A Partner Research Manager in the Program Synthesis, or PROSE, group at Microsoft Research, Dr. Gulwani is a leading researcher in program synthesis, and the inventor of many intent-understanding, programming-by-example and programming-by-natural language technologies.

Today, Dr. Gulwani gives us an overview of the exciting “now” and promising future of program synthesis; reveals some fascinating new applications and technical advances; tells us the story behind the creation of Excel’s popular Flash Fill feature (and how a Flash Fill Fail elicited a viral tweet that paved the way for new domain investments); and shares a heartwarming story of how human empathy facilitated an “ah-ha math moment” in the life of a child, and what that might mean to computer scientists, educators and even tech companies in the future. That and much more on this episode of the Microsoft Research Podcast.

(music plays)

Host: Sumit Gulwani, welcome to the podcast!

Sumit Gulwani: Thanks, Gretchen. It’s great to be here.

Host: So I like to situate guests at beginning, and you’re Partner Research Manager in the Program Synthesis, or PROSE group at Microsoft Research and you describe yourself as a scientist seeking connections… between ideas, between research and practice, and with people in varied roles. So in other words, you’re a dot connector! In broad strokes, tell us what gets you up in the morning? What “X” are you and the other members of your group trying to solve for?

Sumit Gulwani: It is simply the opportunity to work with the fantastic team that I am part of. We have richest experts in programming languages, formal methods, software engineering, machine–learning, and even human computer interaction. We have researchers and engineers working closely with program managers and user–experience designers. I’m also terribly excited about the charter of this team, which is to advance the state–of–the art in program synthesis and to deliver these innovations as magical experiences across a wide range of Microsoft products.

Host: Well, we have a lot of ground to cover today on the topic of program synthesis, so let’s set the stage, and operationalize the term. What is program synthesis and why is it significant?

Sumit Gulwani: Program synthesis is any capability to automatically generate programs, or code fragments, from user’s intent expressed to some natural form like input/output examples, demonstrations, partial programs or even keywords or natural language. This has the potential to facilitate two disruptions. First, creation of 100x more programmers, because ninety nine percent of people who use computers do not know programming and they get stuck with repetitive, tedious tasks. But these are quite creative people and it is just that programming, as it exists today, creates an artificial barrier to program computers. Program synthesis can allow these end users to express themselves naturally and program computers as easily as they would command a personal assistant.

Host: Hmm.

Sumit Gulwani: The second disruption relates to improving the productivity of existing developers and data scientists by 10 to 100x in many task domains. Most of the code that these folks are writing is simply boilerplate code.

Host: Hmm.

Sumit Gulwani: And there is very little algorithmic creativity involved. Program synthesis can liberate programmers from having to focus on boring details and enable them to focus on creative aspects of programming.

Host: So let me drill in a little bit there on the claims of 100x and 10x to 100x increase in productivity and so on. What data do you have to support the claims that this is going to make our productivity go way up?

Sumit Gulwani: We did some user studies related to accomplishing a given task. So what took Python programmers around thirty minutes to accomplish was doable, using our program synthesis technologies, in less than a minute. And this was quite representative of the tasks in the space of socalled data wrangling or data cleaning.

Host: Right, right. So is this technology being used and you’re seeing it in action and seeing these increases, or is it just on, sort of, research/study end of things right now?

Sumit Gulwani: In fact, it is very real. The first mass market deployment of the program synthesis technology was in the form of this feature called Flash Fill in Excel, which has been in Excel since 2013. And since then, we have released similar experiences based on example–based interaction across many different Microsoft products.

Host: Well let’s zoom in and talk about Flash Fill, which you just mentioned, and it synthesizes string transformation programs quickly with minimal input/output examples, sometimes even just one? Tell us about Flash Fill. What inspired the idea in the first place? And tell us, technically, how you went about building in those efficiencies.

Sumit Gulwani: Ten years ago, I was flying from Frankfurt to Seattle, and there was a lady sitting next to me in the airplane. She was impressed to know that I have a PhD in computer science and that I work for Microsoft Research, so I ought to help her with the task that she was struggling with. She opens up her laptop, fires up Excel, shows me a column of names in the format first name, space, last name, and asks me, how can she reformat in the form last name, comma, first name? Now at the time, I had no idea about the programming model underneath Excel, so I had excuse myself out of the situation! After returning home, when I searched for the solution to that problem on Excel Help forums, it is then that I realized that there were many, many people who struggled with simple repetitive tasks, and were soliciting the help of an expert on a Help forum while communicating their intent using a few input/output examples. A typical interaction would take place over a course of several days. Now, this inspired me to develop the Flash Fill system that can automate the role of the expert on the Help forum and bring down the interaction time from a few days to a few seconds. The key technical aspect of Flash Fill is the search strategy that it uses. Instead of searching over a general purpose programming language, we restrict the search to an appropriately designed domain-specific language that includes operators for string transformations such as regular expressions, substring, concatenate, and limited form of conditionals to allow for dealing with data in different formats.

Host: Okay.

Sumit Gulwani: Secondly, instead of blindly enumerating programs over this underlying DSL, we leverage logical properties of the operators in this DSL to allow for a goal-directed search. So essentially, we flow the input/output examples down the grammar of the DSL. And this identifies programs that are consistent with the examples. The third idea stems from the observation that we also use several heuristics to guide the search over many choices that remain, even after logical reasoning has pruned most of the possibilities in the search tree. Authoring and maintaining these heuristics has been an expensive proposition, so recently we have actually started using machine learning to learn these heuristics from data that relates to past exploration of the search space over various benchmark tasks. Another defining aspect of Flash Fill, as you mentioned, is the fact that it can learn the user’s intent from very few examples.

Host: Right.

Sumit Gulwani: And it does this by ranking programs to pick a candidate from among the many programs that satisfy the user’s examples. We prefer programs that are smaller, simpler, use fewer constants, use smallersized constants. We also examine the output generated by these programs on the other test inputs that are available. We prefer those programs that generate outputs that are similar and look uniform. For instance, suppose we have one program that generates all outputs that are valid dates and another program there generates mostly dates, but also some gibberish stuff.

Host: Right.

Sumit Gulwani: Then we have an additional reason to prefer the first program.

Host: Let’s talk a bit about a scenario where Flash Fill didn’t work. I’ll call it a Flash Fill Fail. Tell us that story and then tell us what you learned as a researcher from that experience.

Sumit Gulwani: Flash Fill can automate a wide variety of string transformations. However, there are many transformation tasks that it cannot do such as number transformations or date transformations.

Host: Huh.

Sumit Gulwani: But the user–experience is so inviting that people invariably want to give it a try even on tasks that it was never meant for. And then talk about it when it does not work.

Host: Right.

Sumit Gulwani: So there was this recent tweet on October 2018 that made fun of Flash Fill and was reshared more than 10,000 times. Twitter user Darren wrote: “AI is going to take over the world. But look what Excel auto–populated for me today.” Darren gave an example converting D-e-c to December, and Flash Fill autocompleted J-a-n to Janember and O-c-t to Octember. My team found it funny enough to print this tweet on a t-shirt. And now this t-shirt is my dress shirt for my keynotes. The positive thing here is that this kind of feedback has been extremely useful for us to decide what new domains to invest into for program synthesis.

Host: Mmmm. While we’re on the topic of Flash Fill, I don’t need to remind anyone that Microsoft Excel is one of the planet’s most widely–used software programs and, by extension, as Ben Zorn points out, the spreadsheet is among the most widely used programming languages, so we’re firmly in ‘product’ territory here with Excel, and you’re firmly planted in ‘research’ territory. How do you manage the balance between academic research and product engineering?

Sumit Gulwani: So by the way, Ben Zorn has been a great mentor and collaborator and has shaped some of my thinking on this topic. The key is to not treat this as a conflict. The most important bit that facilitates this is to make thoughtful choices behind the problem definitions that we pick as researchers. Problems that will satisfy our intellectual curiosity, but will also justify our corporate funding. Flash Fill was the most important turning point in my career. I went from, instead of searching for the hardest problem I can solve, to searching for the simplest problem that will have the most impact.

(music plays)

Host: So I imagine program synthesis technology applies – or will apply – in many other cases besides Excel. So what other applications have you developed the capabilities of program synthesis for?

Sumit Gulwani: So we have developed program synthesis capabilities for a variety of map transformations. I already talked about string transformations, number transformations, date transformations… we’ve also developed these capabilities for lookup-based transformations and column-splitting transformations.

Host: Hmm.

Sumit Gulwani: Another area that we are heavily invested in is in the space of filter–based transformations, and specifically for file ingestion. Normally you may spend a few hours writing parsers to extract this information, but programming–by–example experiences allow you to simply specify one to two examples of various fields in the output table and the parsers can be automatically synthesized.

Host: Hmm.

Sumit Gulwani: Another related domain that we have looked at is that of re–shaping tables in semi-structured spreadsheets. Turns out that fifty percent Excel spreadsheets are semistructured, meaning the data is logged into some ad hoc format. While this is easy to visualize, it becomes tricky to analyze it. We can enable re–shaping of these semi-structured tables using an example-based experience where the user can simply specify a few example tuples in the intended output table. So all of these capabilities come under the broad umbrella of so-called data wrangling…

Host: Yeah.

Sumit Gulwani: …which is the task of transforming data from one semi-structured format to another to facilitate further downstream processing.

Host: Hmm.

Sumit Gulwani: Data scientists apparently spend eighty percent of their time wrangling data, bringing it into a form that they can then build ML models over. Program synthesis can facilitate easier and faster data wrangling.

Host: Okay.

Sumit Gulwani: Another domain that we are developing some synthesis capabilities for is that of repetitive editing inside documents such as Word, PowerPoint, or even code. We are also investigating development of program synthesis from natural language capabilities for several domains, including querying tables, visualizations…

Host: Huh.

Sumit Gulwani: …and even machine learning workflows. So the scope for program synthesis is quite broad simply given that programming is so broad!

Host: Yeah.

Sumit Gulwani: Any task domain where you can describe your intent naturally is a potentially useful candidate for program synthesis.

Host: Looking to the future then, there are some exciting new developments in program synthesis that leverage recent advances in symbolic reasoning and machine learning. So tell us about these developments, and where we are heading with these innovative threads of research in what you called predictive synthesis and modeless synthesis.

Sumit Gulwani: The first prototype of Flash Fill that I developed would take three to four examples, on average, per scenario. The Excel team told me they cannot ship Flash Fill until I made it work with one example on most simple cases.

Host: No pressure, though…!

Sumit Gulwani: Otherwise users would lose trust in the system or would make fun, you know, like the tweet that I showed you. Recently, I was challenged to do even better. Now you might wonder, how much better can we be than one example?

Host: I… I can’t even imagine.

Sumit Gulwani: Well, how about zero examples? So initially, when this was proposed to me, I thought this was a crazy idea. How can you read the mind of the user from zero examples?

Host: That’s what I was going to say is, you’re reading my mind now. Just go into my brain and decide what I’m thinking…

Sumit Gulwani: But then when I thought more about it, it started to make sense for some domains. Consider, for instance, the task of extracting tabular data from a custom text file or a webpage. Sure enough, you know, giving one to two examples of each field is a much better experience than having to write the parser yourself. But if there are tens fields, then giving one to two examples of each field can itself be a tedious task. But when I show such a semi-structured document to a human, they can often guess the underlining tabular structure. So I thought, why can’t machines do that? So this is what we call predictive synthesis. The idea here is to synthesize an intended program from several inputs as opposed to few input/output examples, and we enable this by learning or synthesizing a structure within the inputs. We have developed predictive synthesis capabilities for a few domains, including splitting a column into multiple columns or extracting tables from text files, web pages, and even PDF documents. Our predictive synthesis capability for extracting tables from text files ships as part of SQL Server Management Studio, where it is used to power the Flat File Import Wizard.

Host: Mmm.

Sumit Gulwani: And apparently now this is the most–used wizard across all of SSMS. We also recently shipped our predictive synthesis capability for extracting tables from PDF documents and from webpages as part of PowerBI. Another new development is what we call modeless synthesis. There are many mundane, repetitive tasks that users may not even naturally think of as something that can be programmed. For instance, converting all red text to green in a PowerPoint slide deck, or all dates in U.S. format to European format in a Word document. And this is where there’s a need for a personalized agent that can non-intrusively watch, learn, and make suggestions. So recently we shipped an agent in preview mode in Visual Studio that looks out for any repetitive code edits. When it identifies a couple of code transformations that can be related, it generalizes them into a more generalized script and uses that to proactively suggest all other places where I might need to make that edit.

Host: Right. We’ve talked a lot about the inner workings of program synthesis, but I’m really interested to know what kinds of form factors might be useful for different applications of the technology. So give us an example of where and how someone might make use of program synthesis and on what kind of device?

Sumit Gulwani: Most of the use cases that I have talked about have been for end–users to help them automate their tasks. One big leap that we are now investing in is to synthesize a readable, editable program in a programming language of choice. This will be especially important when the synthesized program needs to be executed on big data or deployed for future executions. Hence, we have been investing in generating readable code in specific target languages like Python, R, PySpark, and involving user–specific libraries like Pandas. Besides facilitating transparency, this also provides a lot of educational value, and most significantly, would make it possible to incorporate these synthesized programs inside a developer or data scientist’s existing workflow in IDEs or Notebooks. I’m terribly excited about Notebooks in particular. Program synthesis is good about generating small fragments of code which matches the granularity of what users write in Notebook cells. Moreover, program synthesis can generate readable code in various target languages and using various libraries. And this should address the challenge that Notebook users will have around polyglot programming and discoverability issues around new libraries and SDKs. And Notebooks provide an ideal platform for that interactivity.

Host: So cross-disciplinary research is one of four big bets you lay out in a blog post from last year, Sumit. And your own rather prolific publication record in program synthesis spans a surprisingly diverse array of computer science conferences. And you’ve talked already a little about diversity and multi–disciplinarity, but I want you to go a little further here and talk about why it’s important in research, particularly as it relates to program synthesis.

Sumit Gulwani: This is where the Indian adage “one plus one equals eleven,” that refers to the sum being more than its parts, comes true. Now, program synthesis, specifically, is a highly interdisciplinary topic. We need serious crossdisciplinary innovation to build useful technologies and usable experiences over those technologies. This has led to several publications in programming languages and software engineering conferences like POPL, PLDI, OOPSLA and ICSE. Then we use machine learning to learn various heuristics that are difficult to author by hand and maintain. And this has led to publications and conferences like ICML and ICLR. Since all of this work falls under the broad space of AI, we have several publications in AAAI and IJCAI as well. Some killer applications have been in the space of data wrangling. And because of the domain relevance, we have published in data conferences like, SIGMOD, VLDB, and KDD. And last by not the least, we have to pay attention to usability issues as well. And this has led to publications in HCI conferences like UIST and CHI. And if you think about the graduate system that produces us, the emphasis is on creating an individualistic identity, preparing students to…

Host: Right.

Sumit Gulwani: …define hard problems of their own and to come up with solutions of their own. And they tend to become deep experts in a narrow area.

Host: So, very siloed…

Sumit Gulwani: Exactly. And then the natural tendency is to continue exploration in that deep vertical. But I think it requires a different kind of thinking and leap to start appreciating the importance of other research areas and to figure out ways in which you can work together and hence achieve results that you would not have been able to produce by yourself. I think the goal of graduate education should be to make you expert in one narrow area, but also give you enough breadth so that you know what is the right tool and research area to be applied to solving a given problem or a sub–problem, so that you are not always looking at it with your own biased lens.

(music plays)

Host: Whenever we talk about the promises of innovative new technologies, we also have to talk about the perils. And this the part of the podcast where I always ask what could go wrong? I do this because I want to know if there are things we should be aware of, or thinking of, or even concerned about when we contemplate building and using these highly complex systems you’re talking about, that users will have very little understanding of. So is there anything about your work, Sumit, that keeps you up at night?

Sumit Gulwani: So now that program synthesis technologies are becoming mainstream, the increasing worry on my mind has been that of correctness.

Host: Mmm.

Sumit Gulwani: How does the user know that the program that has been synthesized is correct?

Host: Right!

Sumit Gulwani: Essentially, we need to start thinking about the debugging experiences in this new world of programming. And I think the key is going to be to regard program synthesis not as a one-shot process, but as an interactive conversation with the user.

Host: Mmm.

Sumit Gulwani: In fact, it turns out that when you do not commit to a program yourself, but you rather program by intent, we can actually enable some unique debugging experiences that are not going to be there in the standard programming world. We can, for instance, synthesize multiple programs from few examples – each of those programs is consistent with these examples, so they’re userprovided – and run all these programs in parallel on the remaining test inputs. If they all produce the same result, it doesn’t really matter which program you pick. But if these programs generate different results on some test input, it is a sign of ambiguity in the user’s intent on their test input. And we can surface that test input to the user and ask them to provide the correct output on that input. This is one of my favorite ideas and we called it “distinguishing inputs.” You can liken this to active learning in the machine learning terminology. We published this idea in ICSE 2010, and just few hours ago, I learned that this work has been selected for ICSE 2020 as the Most Influential Paper for the ICSE 2010 edition.

Host: Wow…. I love stories. Tell us yours, Sumit. What was your path to computer science and how did you end up doing what you’re doing right now at Microsoft Research?

Sumit Gulwani: I finished my schooling in India and then appeared for the joint entrance examination for IITs. I managed an All–India rank of somewhere between a hundred to two hundred out of around a million students who take this examination. My rank was sufficient to get me into computer science, but not in the city that was closer to where my parents lived, and hence, I had to make due with enrolling in the electrical engineering program at IIT Kanpur. After taking the introductory programming course in my first year there, I developed a love for the subject. The course instructor teased us that two 2*2 matrices can be multiplied with less than seven multiplications, but wouldn’t tell us how unless I was in an advanced class meant only for computer science students. So I retook the IIT entrance examination, and I managed a two-digit rank in thirties. I argued with the university to promote me to the computer science department for the second year, since the first year curriculum was same for students in all the departments, and I had managed to prove that I deserved to study computer science. But they wouldn’t agree. So I decided to drop out and re-enter the university. Now, actually, there’s a ruling that would not permit you to re-enter the university like I did. So I had to repeat the first year and was forced to take the same classes again. But I got to study what I loved and that spurred me towards more excellence in the field, and hence PhD was a natural choice.

Host: Right.

Sumit Gulwani: I was fortunate to get PhD admission offers from many top computer science universities. CMU called me and told me that the cost of living here is so low that I can even buy a house with my grad student salary! I wrote to my advisor, George Necula, at UC Berkeley and asked him about a counter-offer. He said, yes, the cost of living here is high, but this is because everyone in the world wants to live here, so now you make your choice. So that’s how I landed up in Berkeley. And then after my PhD at Berkeley I had several academic offers for faculty positions, but MSR was an easy choice for two reasons: the sheer amount of cross-disciplinary talent under one roof. It is like many university departments put together. And the opportunity to leverage Microsoft’s broad reach to customers to create real practical impact.

Host: Almost every researcher I’ve had in the booth has a side quest, I like to call it, or a personal passion. And I know from talking to you and from reading some of your personal blog posts that one of yours is inculcating human empathy as an important cultural attribute in the next generation. So circling back to your secret identity as a dot connector, tell us how you might bring those two together!

Sumit Gulwani: There are a few principles that are being increasingly embraced around the tech industry. Customer obsession, diverse work place, and work/life balance. These principles relate to the three kinds of relationships that we have in our lives: with our customers whom we serve, with our work colleagues, and with our personal contacts. Empathy holds the key to understanding and nurturing these relationships. Kids are highly attuned to being empathic. I will tell you one story involving my child Sumay when he was in preschool. I wanted to teach him some conceptual content, and in particular, a math theorem. I chose that theorem that relates odd numbers. Odd plus odd equals even. Now, while it’s a simple theorem, the student was a preschooler who had many more fun things to do. And this theorem was his first. The first evening, I made all kinds of diagrams, but unfortunately it didn’t work. The next evening I tried using all the various number-related toys in the house. Still no luck. Finally, I realized I had to meet Sumay where he was and not push to a level that he wasn’t ready for.

Host: Right.

Sumit Gulwani: The following evening, I told him a story: an odd number is like a group of guys that are all paired up, except one. And he’s the lonely guy. When two odd numbers come together, then the two lonely guys can pair up with each other and become friends. And that’s an even number. Suddenly, I could see by the look on his face that he understood and got it. And now I experience this firsthand, when Sumay was busy programming his robot on his Surface in the night, and then I tell him, Sumay, it’s time to go upstairs and sleep. And he’s so engrossed, he doesn’t respond. And then I raise my voice and ask, can you please go upstairs now? Again, no response. And then I tell Sumay in sad voice, I’m going upstairs, but I will be lonely in the odd house up there. Can you join me and make it even? And then he immediately comes to hold my hand and says, yes, Papa. Let’s go up. That’s bliss. Children are naturally terrified of being abandoned or cast out, and hence are very tuned into feelings of either rejection or belonging. Unfortunately, our ability to empathize gathers dust as we grow up. Hence, we need to reinforce this in our school education and corporate trainings. And you know, Gretchen, at the end of the day, the goal of technology should be to make us more humane.

Host: At the end of every podcast, I give my guests a chance to share some advice or wisdom with our listeners, and I think you kind of just did that! So I’m going to move over into the “predict the future” lane. What’s on the horizon for the future of programming, Sumit, and what might be a call to action for our listeners?

Sumit Gulwani: If you look at the history of programming, we went from punched cards and assembly language to high-level languages and beautiful code editors. The next evolution will take programming closer to human conversation, involving use of examples and natural language to express intent. And interactive, when a computer will act as your pair-programming agent. My team is invested in developing an SDK that can facilitate development of program synthesizers for new task domains. I hope that, with frameworks like these, we can take the art and science of developing program synthesizers from Microsoft Research laboratories to developers who are involved with writing libraries or programming tools. The grand question now is, what does this future mean for society? Programming in the future would be much easier and accessible than it is today. Hence there now should be even higher incentive to incorporate so-called computational thinking in the school curriculum. And this can empower people to effectively leverage the computational devices to unleash their creativity and hence achieve more in their lives.