MMLSpark: empowering AI for Good with Mark Hamilton

Published October 2, 2019


Episode 92, October 2, 2019

If someone asked you what snow leopards and Vincent Van Gogh have in common, you might think it was the beginning of a joke. It’s not, but if it were, Mark Hamilton, a software engineer in Microsoft’s Cognitive Services group, budding PhD student and frequent Microsoft Research collaborator, would tell you the punchline is machine learning. More specifically, Microsoft Machine Learning for Apache Spark (MMLSpark for short), a powerful yet elastic open source machine learning library that’s finding its way beyond business and into “AI for Good” applications such as the environment and the arts.

Today, Mark talks about his love of mathematics and his desire to solve big, crazy, core knowledge sized problems; tells us all about MMLSpark and how it’s being used by organizations like the Snow Leopard Trust and the Metropolitan Museum of Art; and reveals how the persuasive advice of a really smart big sister helped launch an exciting career in AI research and development.


Transcript

Mark Hamilton: It’s one thing to count the number of photos of leopards, but how do you tell the difference between one very narcissistic leopard, who likes to get their photo taken, and several incredibly shy leopards? The only data that you have in order to tell the leopards apart is their spots and their spot patterns. Many kinds of patterned creatures, across the ecosystem, their patterns are kind of like human fingerprints in that they’re unique but slightly varied, so it’s a very difficult task. So that’s really the next step.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: If someone asked you what snow leopards and Vincent Van Gogh have in common, you might think it was the beginning of a joke. It’s not, but if it were, Mark Hamilton, a software engineer in Microsoft’s Cognitive Services group, budding PhD student and frequent Microsoft Research collaborator, would tell you the punchline is machine learning. More specifically, Microsoft Machine Learning for Apache Spark (MMLSpark for short), a powerful yet elastic open source machine learning library that’s finding its way beyond business and into “AI for Good” applications such as the environment and the arts.

Host: Mark Hamilton, welcome to the podcast.

Mark Hamilton: Thank you! Thank you for having me.

Host: I always start my podcasts with introductions, and they’re usually pretty straightforward, but you’re differently situated than most researchers I get in the booth. So I’m going to start by letting you tell me and our listeners who you are, what you do, where you work, and in general, what gets you up in the morning.

Mark Hamilton: Yeah, so I’m Mark Hamilton. I’m a software engineer on Microsoft’s Cognitive Services team, and I run an open source machine learning library called Microsoft Machine Learning for Apache Spark and this library has kind of brought me into the more applied research space. So we do a lot of AI for Good-type projects with it and recently, I’ve started my PhD over at MIT. Right now I’m part time at Microsoft. I used to be full time, I’ve been working at Microsoft for the past three years. And during this next chapter of my life I’ll be part time, both getting a PhD in computer science and mathematics, as well as working at Microsoft. Technically, I’m on Microsoft’s Applied AI team and so Microsoft’s Applied AI team creates things like the Cognitive Services so, translation, text analytics, computer vision, these types of projects, and we really heavily collaborate with Microsoft Research. So we collaborate with John Langford and the Vowpal Wabbit team, we collaborate with LightGBM, and the DMTK team in MSR Asia. We also collaborate with the AI for Earth team and Lucas Joppa, his whole side of the organization, so we really have a lot of different projects that kind of span into Microsoft Research where, personally, I really like to work. I really like these kind of research problems, I like working on mathematics and things that aren’t necessarily easy to implement.

Host: Right. So, since you’re at MIT, you’re physically situated in Massachusetts. Are you working with Microsoft there in Massachusetts as well?

Mark Hamilton: I work at the Microsoft New England Research and Development Center, the NERD, um, colloquially put. And MIT is right next door, so it makes it fairly easy to kind of go back and forth between these two, although my team is actually situated in Redmond. So I come out here about every month and visit Sudarshan, my manager, and say hi, you know, and make the rounds, that kind of stuff.

Host: And that’s why I’m looking at your face today instead of doing this remote, which is awesome. Well you’ve been working as a software engineer and a developer, but you just started your PhD at MIT as you’ve just told us. What made you decide to do that and what lines of research are you most interested in right now?

Mark Hamilton: It was a tough decision because I couldn’t imagine working with a better team than I do now at Microsoft, but really what it came down to is that I just missed math! I missed staring at a mathematics textbook all day! And really, that kind of intense learning is what I really missed in my life. And so that’s kind of what drove me back to the PhD, but I didn’t really want to leave my team because they’re an amazing group of people and amazing collaborators, so I definitely wanted to keep that connection alive as I moved on to the PhD. One of the things I really want to tackle in my PhD is to dive deep into the mathematical foundations of deep learning and to use topics from more advanced and algebraic mathematics to really influence the kinds of architectures that we can create to learn. And one of the particular threads that I’m interested in is information theory, because information theory provides these really nice mathematical tools to describe knowledge in a pure and crystallized way, and what’s really exciting is, you know, now we have the ability, through things like adversarial networks, to really control how information flows through deep networks. And that can really yield a lot of interesting techniques and applications. One of the things that I’m particularly interested in is using information theory to help us understand complex systems where we ordinarily would have no idea where to start. So things like our own thoughts, or the thoughts of kind of the larger organism, namely the whole human race. It would be really interesting to see if algorithms could pull out structure and really tell us what it is we’re doing when we’re thinking or when we’re communicating.

Host: Small aspirations there, Mark…

Mark Hamilton: Yeah, no. I’m always driven by the crazy, core-of-knowledge kinds of problems, I like them big.

Host: Right. We hear a lot about AI for Good these days which is sort of an umbrella term for applying artificial intelligence technologies to things we feel good about, as opposed to just, how can I get more clicks on my ads? It manifests itself in fields like medicine, agriculture, the environment, and the arts. So we’ll talk about some of those specific projects shortly. But first let’s wade upstream and talk more philosophically about using our power for good and not for advertising. What are the promises of AI that excite you most?

Mark Hamilton: Yeah, I mean AI is very exciting to me because it’s one of the most powerful tools that a developer has at their disposal in order to create things of unprecedented intelligence and ability to do work in the real world. I think that it’s important that organizations like Microsoft, and organizations across the world, really think about devoting some of their resources to AI for Good-type problems. Because it not only brings a diversity of problems to the table, it also, really, can make an extraordinary impact where it’s not like you’re just optimizing the last three decimal places out of an advertising click-through scenario, you’re really, fundamentally changing the shape of a problem and its solution in some way that may or may not have had a lot of machine learning in the past.

Host: Well, and I want to stop there for a second because usually the problems that get the most attention are the ones that potentially make the most money, or affect the bottom line of some organization or person. And some of these problems under the AI for Good umbrella tend to be important things, but things that don’t have a huge ROI, at least at the outset in people’s minds. And so it’s like, why should we put our minds over there when someone’s going to pay us to put our minds here? So how’s that mapping out? I mean, you see Microsoft putting its muscle and mind and money behind AI for Earth, so what do you think about that?

Mark Hamilton: Yeah, I think it’s a lot like research, or like any other kind of problem, where there’s an incredible value to diversity. Where if your entire business model is parked in a single kind of task, or a single kind of automated solution, you 1) are incredibly sensitive to shocks in the market, and 2) you never really know where the next good idea is going to come from. A lot of times you get a huge boost from diversity. Like, when I used to work in automated theorem proving, the hot topic, when I worked there, was agent based theorem provers and they’re kind of like lots of little tiny algorithms that all try to prove different parts of the theorem. And when you look at how long each one of these little tiny algorithms takes to solve an individual task, it either solves it instantly, or it took years. They would never solve it.

Host: Right.

Mark Hamilton: And so that’s when people kind of started realizing we need a diverse collection of these, and we need to pool them together because, you know, for some lines of research or lines of thinking the problem can immediately be solved. Whereas with other lines of thinking and lines of work the problem will never be solved.

Host: Until quantum…

Mark Hamilton: Yeah.

Host: Billions of years in ten minutes.

Mark Hamilton: Maybe.
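The pooling Mark describes is close to what automated-reasoning researchers call a portfolio approach: run several diverse provers side by side and stop as soon as any one of them succeeds, so a prover stuck on a dead-end line of thinking never blocks the others. Below is a minimal sketch, assuming each hypothetical "solver" is a Python generator that yields None while it works and a result string when it finishes. Real agent-based provers share work far more closely than this simple round-robin loop does.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical model of a prover: a generator that yields None for each unit
# of work and yields a result string once it has found a proof.
Solver = Callable[[], Iterator[Optional[str]]]


def solver_that_finishes(name: str, steps: int) -> Solver:
    """A solver whose line of reasoning succeeds after `steps` units of work."""
    def run() -> Iterator[Optional[str]]:
        for _ in range(steps):
            yield None
        yield f"proof found by {name}"
    return run


def solver_that_never_finishes() -> Solver:
    """A solver stuck on a line of reasoning that will never pay off."""
    def run() -> Iterator[Optional[str]]:
        while True:
            yield None
    return run


def portfolio(solvers: List[Solver], budget: int) -> Tuple[Optional[str], int]:
    """Give every solver one step per round; return the first result and the
    round it arrived in, or (None, budget) if no solver finishes in time."""
    running = [make() for make in solvers]
    for round_no in range(1, budget + 1):
        for gen in running:
            result = next(gen)
            if result is not None:
                return result, round_no
    return None, budget


result, rounds = portfolio(
    [solver_that_never_finishes(),
     solver_that_finishes("A", 1000),
     solver_that_finishes("B", 3)],
    budget=10_000,
)
print(result, rounds)  # proof found by B 4
```

The never-finishing solver costs one step per round but cannot stall the pool. The overall time is governed by whichever member of the diverse collection happens to suit the problem.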

(music plays)

Host: Let’s get more specific and talk foundationally about a really cool machine learning framework that you’ve already alluded to. You gave it its full name, but it’s called MML Spark for short. Give us an overview of MML Spark, and describe some of the features that set it apart.

Mark Hamilton: Yeah, so what Microsoft Machine Learning for Apache Spark, or MML Spark for short, aims to do is really bring together a lot of the different technologies that currently exist in Microsoft and the broader computational ecosystem, and put them all under the same roof with a special set of superpowers. These superpowers are massive distribution, so that you can run it on hundreds to thousands of machines at a time, and elasticity. What elasticity means is that you can add in new computers as you see fit, or kill off some computers if they’re taking up extra resources. And so this lets you make computations that grow or shrink depending on how much data is actually flowing through the computation. We want it to tackle the largest scales that any industry could possibly see, but also be able to scale down to a single node if you don’t have that kind of money lying around. More specifically, what MML Spark brings to the table is deep learning in this kind of large, distributed, big data environment, and efficient gradient-boosted trees with LightGBM. We’ve also recently added Microsoft Research’s work in Vowpal Wabbit on Spark, kind of bringing that into this distributed ecosystem.
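Elasticity is possible because a Spark job is broken into independent tasks over partitions of the data, so the result does not depend on how many machines happen to run those tasks. Below is a minimal single-machine sketch of that property. It uses Python's standard thread pool as a stand-in for a cluster. The captions, partition count, and word-count task are illustrative assumptions, not MMLSpark code.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def partition(records: List[str], num_partitions: int) -> List[List[str]]:
    """Split records round-robin into partitions, like shards of a dataset."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(part: List[str]) -> Counter:
    """One task: it reads only its own partition, so it can run anywhere."""
    counts: Counter = Counter()
    for line in part:
        counts.update(line.split())
    return counts


def run_job(records: List[str], num_workers: int, num_partitions: int = 8) -> Counter:
    """Run every task on a pool of num_workers (threads standing in for
    machines) and merge the partial results."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for partial in pool.map(count_words, partition(records, num_partitions)):
            total.update(partial)
    return total


# Hypothetical camera-trap captions; the job's answer is the same whether the
# "cluster" has one worker or sixteen.
captions = ["snow leopard on rock", "goat on rock", "grass", "snow leopard"] * 250
shrunk = run_job(captions, num_workers=1)
grown = run_job(captions, num_workers=16)
assert shrunk == grown
print(shrunk["leopard"])  # 500
```

Because the merge step only adds up partial counts, workers can be added or removed between runs without changing the answer. Only the wall-clock time changes.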

Host: Talk about Vowpal Wabbit for a minute, and every time I say it, I feel like Elmer Fudd but I suppose that’s the point.

Mark Hamilton: Yeah, so, Vowpal Wabbit is one of the newest additions to MML Spark, and what Vowpal Wabbit provides is really hyper-efficient text analytics, and now, increasingly, it’s broadening into the scope of reinforcement learning and multi-armed contextual bandits. So one thing that VW, colloquially put, is incredibly useful for is working with text classifiers, text regressors, and also optimizing different text-based situations. So, for instance, MSN uses it to optimize all of their ads and do what’s called multi-world testing…

Host: Right.

Mark Hamilton: …and a lot of other companies are starting to use it to kind of automatically update and refine their content.
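Multi-world testing rests on an idea from the contextual-bandit literature. If the deployed policy records the probability with which it chose each action, you can estimate offline how a different policy would have performed on the same traffic, using inverse propensity scoring (IPS). Below is a minimal sketch with a hypothetical four-event log. Vowpal Wabbit has its own estimators and input format, and none of its API is used here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LoggedEvent:
    context: str        # hypothetical feature, e.g. the section a reader is in
    action: str         # the item that was actually shown
    probability: float  # chance the logging policy gave that action
    reward: float       # observed outcome, e.g. 1.0 for a click


def ips_estimate(log: List[LoggedEvent], policy: Callable[[str], str]) -> float:
    """Estimate the average reward `policy` would have earned on this traffic.

    An event contributes only when the new policy agrees with the logged
    action, and is up-weighted by 1 / probability to correct for how often
    the logging policy happened to choose that action.
    """
    total = 0.0
    for event in log:
        if policy(event.context) == event.action:
            total += event.reward / event.probability
    return total / len(log)


# Hypothetical log from a policy that picked "a" or "b" uniformly at random.
log = [
    LoggedEvent("sports", "a", 0.5, 1.0),
    LoggedEvent("sports", "b", 0.5, 0.0),
    LoggedEvent("news", "a", 0.5, 0.0),
    LoggedEvent("news", "b", 0.5, 1.0),
]


def personalized(context: str) -> str:
    return "a" if context == "sports" else "b"


print(ips_estimate(log, personalized))     # 1.0
print(ips_estimate(log, lambda ctx: "a"))  # 0.5
```

One randomized log can score many candidate policies, the "many worlds," without deploying any of them.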

Host: All right so keep going a little bit more on the MML Spark value proposition, if you will, to use a marketing term.

Mark Hamilton: Yeah there’s like a bunch of random, uh, bunch of random different things that are kind of all under the roof. I mean one extra thing that we’ve kind of lit up in the past year is the ability for Spark not just to be a big data platform, like a big cluster computing framework, but to serve as what’s called a microservice orchestrator. And so, in software engineering, it’s really useful to take all of your different components and encapsulate them behind nice little packages, little boxes that talk to each other in the same way that me and you talk to each other. And this kind of provides a nice separation of church and state between different components of your world, makes it easier for teams to collaborate on them and things like this. And so what we do in MML Spark is we give the building blocks to create these kinds of ecosystems of algorithms that all talk to each other, so that you can kind of take your existing Spark computation and turn it into a web service, or you can take your big collection of Spark clusters and use it to talk to a web service.

Host: Wow.

Mark Hamilton: And so it kind of provides these two pieces as well as doing integration with other machine learning frameworks, so…
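The "turn your Spark computation into a web service" half of this idea can be illustrated with nothing but Python's standard library. A scoring function sits behind an HTTP endpoint that accepts and returns JSON, and other components call it over the network. Below is a minimal sketch of that microservice pattern. The score function, its weights, the /score path, and the "features" field are all hypothetical, and this is not MMLSpark's own serving API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen


def score(features):
    """Stand-in model: a hypothetical linear scorer over a list of numbers."""
    weights = [0.5, -0.25, 1.0]
    return sum(w * x for w, x in zip(weights, features))


class ScoringHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/score":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": score(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def start_server():
    """Serve the model on a free local port from a background thread."""
    server = HTTPServer(("127.0.0.1", 0), ScoringHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_service(port, features):
    """Another component talking to the model over HTTP."""
    req = Request(
        f"http://127.0.0.1:{port}/score",
        data=json.dumps({"features": features}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())["score"]


server = start_server()
print(call_service(server.server_address[1], [2.0, 4.0, 1.0]))  # 1.0
server.shutdown()
```

The caller knows only the URL and the JSON shape, which is the separation between components that Mark describes.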

Host: All right, so what’s the origin of it and who’s contributed? Because I’m hearing some academic contributors, some Microsoft Research contributions and even Microsoft proper. It’s a product.

Mark Hamilton: Yeah, yeah, I mean, the product originally came out of what’s called Azure Machine Learning, and what we first started on was the Cognitive Toolkit on Spark, or taking deep learning and trying to parallelize it across hundreds of computers at a time.

Host: Right.

Mark Hamilton: And then the first project that we ever worked on was the Snow Leopard Trust and that really provided the impetus to grow the library and then people saw its potential through working on snow leopard recognition and gave us a lot more time and gave us a lot more ideas and a lot more challenges that we had to kind of solve with the same library and really forced it to grow in a lot of different directions.
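Parallelizing deep learning "across hundreds of computers at a time" is most commonly done with data parallelism. Each worker computes a gradient on its own shard of the data, and the averaged gradient drives a single shared update. Below is a minimal sketch for a one-weight least-squares model. It assumes equal-sized shards, in which case the average of the shard gradients equals the full-batch gradient exactly. It shows the general technique, not how the Cognitive Toolkit on Spark was actually implemented.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y)


def gradient(w: float, shard: List[Point]) -> float:
    """Derivative in w of the mean squared error (w*x - y)^2 over one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w: float, shards: List[List[Point]], lr: float) -> float:
    """Each 'worker' computes a local gradient; the driver averages them."""
    local_grads = [gradient(w, shard) for shard in shards]  # parallel on a cluster
    avg_grad = sum(local_grads) / len(local_grads)
    return w - lr * avg_grad


data = [(float(x), 3.0 * x) for x in range(1, 9)]  # hypothetical data, true w = 3
shards = [data[i::4] for i in range(4)]            # 4 workers, 2 points each

# With equal-sized shards, the averaged gradient matches the full-batch one.
assert abs(sum(gradient(0.0, s) for s in shards) / 4 - gradient(0.0, data)) < 1e-9

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
print(round(w, 4))  # 3.0
```

Only one number per worker crosses the network on each step. Real systems exchange whole gradient vectors, but the communication pattern is the same.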

Host: All right, I want to talk about that right now, the Snow Leopard Trust. There’s a project under AI for Earth that you’ve done with the Snow Leopard Trust, and this is sort of AI technologies in conservation. This is about identifying, counting, and tracking snow leopards, which are very elusive cats, with motion-triggered cameras, or camera traps. Why do we need to track them in the first place? And tell us why MML Spark makes a difference here.

Mark Hamilton: A few years back, you might have heard the news that snow leopards were considered no longer endangered. Like, woo-hoo! Great! Our leopards are no longer endangered! But there is a flip side to that in that it’s not like they discovered a whole new leopard habitat, or a whole new swathe of leopards, they really went back to some of their existing models and they thought about it a little bit more and they changed them a bit. And a lot of organizations, the Snow Leopard Trust included, think that there really needs to be a lot of thought before you take an animal off the endangered species list, because that can be very detrimental to the animal’s protected status. It suddenly means that a whole bunch of rules that were designed to actually preserve and protect the leopard no longer apply. And so, what the Snow Leopard Trust really aims to do is create a much more robust collection of data to really hone in on the true number of snow leopards so that we can accurately say, what’s going on with the population of snow leopards and should they or should they not be endangered? And you might think, like, what do I care about this big cat in the mountains? But, in order to protect the snow leopard, because it’s the apex predator, it’s at the top of the food chain, you kind of need to protect the entire food chain in order to keep that funnel going up there.

Host: Right.

Mark Hamilton: So protecting the apex predator of an ecosystem is incredibly important for the ecosystem’s health, as a whole. If you keep the apex predator alive, some other portions of the ecosystem start to swell. For instance, when they brought the grey wolf back into Yellowstone, suddenly, it was a much lusher place because all of the herbivores, that kind of secondary trophic level, were taken out by the wolves and so suddenly plants could grow again and so, you know, the influence of an apex predator can sometimes have like profoundly interesting and beneficial effects on food and the stability and biodiversity.

Host: Right, and also the snow leopard, being a stealthy cat, you would have some need to really understand how to count them because, are they not there, or are they just hiding?

Mark Hamilton: Yeah, this is a particularly tough challenge because, in the Snow Leopard Trust’s, like, multiple decades of work on these creatures, they’ve really only been able to collar in the tens of leopards.

Host: Oh, wow.

Mark Hamilton: And you can’t really get much data out of ten leopards and so the only real option is to use camera traps and camera traps are these automated systems that, they have a camera and they have an infrared detector that detects motion, and they’ll fire a burst of photos every single time something moves in front of it. And one of the problems with this is that, there’s some snow leopards but there’s also a lot of waving blades of grass, some goats and even the locals that go and dance in front of the camera. There’s a ton of fun stuff in this data set. So there’s a lot to cull through.

Host: That’s a good segue into, why MML Spark? What does machine learning add to the mix here?

Mark Hamilton: It can be incredibly difficult to actually look at one of these photos and rule out that there’s a leopard in it, because leopards are kind of designed for this. They’re designed to hide. They’re designed to be indistinguishable from their surroundings. So for a human being, this can be fairly difficult and time consuming if you really want to be sure that there’s no leopard there. And so we’ve estimated that it would take around twenty thousand hours to cull through the Snow Leopard Trust’s, like, one point two million images that they have in their backlog from all of these fifty to sixty cameras that are out in various different locations. And so what we really need is a system that can handle this incredibly large, upfront cost of processing one point two million images, but then elastically scale back down to handle the kind of day-to-day flow of data through the system.
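The two figures in that answer imply how much human attention each photo takes, and the same arithmetic shows why elastic scaling matters. In the sketch below, the 1.2 million images and twenty thousand hours come from the interview. The per-machine throughput of 5,000 images an hour, the 24-hour deadline, and the daily intake of 2,000 images are hypothetical placeholders.

```python
import math

BACKLOG_IMAGES = 1_200_000  # from the interview
HUMAN_HOURS = 20_000        # from the interview

# 20,000 h * 3,600 s/h / 1,200,000 images = 60 seconds of review per photo.
seconds_per_image = HUMAN_HOURS * 3600 / BACKLOG_IMAGES
print(seconds_per_image)  # 60.0


def machines_needed(images: int, hours: float, images_per_machine_hour: float,
                    min_machines: int = 1) -> int:
    """Smallest cluster that clears `images` within `hours` (hypothetical rates)."""
    return max(min_machines, math.ceil(images / (hours * images_per_machine_hour)))


# Clearing the backlog in a day needs a burst of machines...
print(machines_needed(BACKLOG_IMAGES, hours=24, images_per_machine_hour=5_000))  # 10
# ...while a hypothetical daily intake of 2,000 images fits on one.
print(machines_needed(2_000, hours=24, images_per_machine_hour=5_000))  # 1
```

A fixed-size cluster would either be too small for the one-time backlog or sit mostly idle afterwards. An elastic one can grow for the first case and shrink for the second.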

Host: How’s it going, then? Have they said, this new data set confirms our suspicions that it’s not endangered, or what?

Mark Hamilton: Yeah, so there’s a few extra steps that need to happen before we can really get those concrete numbers. One of them, which is the task that we’re working on now, is that it’s one thing to count the number of photos of leopards, but how do you tell the difference between one very narcissistic leopard, who likes to get their photo taken, and several incredibly shy leopards? The only data that you have in order to tell the leopards apart is their spots and their spot patterns. Many kinds of patterned creatures across the ecosystem, their patterns are kind of like human fingerprints in that they’re unique but slightly varied, so it’s a very difficult task. So that’s really the next step. And what we’ve already provided is kind of a great burden lifted off the Snow Leopard Trust’s shoulders, but this next burden of matching them up, they still have to do it kind of “CSI-style” where they print out all the photos and they plaster them all over whiteboards and try to piece this puzzle together manually in a conference room as opposed to something that is a bit more efficient and automated and scalable.
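Matching individuals by their spots is commonly framed as matching local image features between photos. You describe small patches around distinctive points, then count how many patches in a new photo have one clearly best partner in a known animal's photo. Below is a minimal sketch of that scoring step, using nearest-neighbour matching with Lowe's ratio test. The two-dimensional "descriptors" and leopard names are invented stand-ins for real feature vectors, and HotSpotter's actual scoring differs.

```python
import math
from typing import Dict, List, Sequence

Descriptor = Sequence[float]


def match_score(query: List[Descriptor], candidate: List[Descriptor],
                ratio: float = 0.75) -> int:
    """Count query descriptors whose nearest candidate descriptor is clearly
    closer than the second nearest (Lowe's ratio test)."""
    if len(candidate) < 2:
        return 0
    matches = 0
    for q in query:
        dists = sorted(math.dist(q, c) for c in candidate)
        if dists[0] < ratio * dists[1]:
            matches += 1
    return matches


def best_match(query: List[Descriptor], gallery: Dict[str, List[Descriptor]]) -> str:
    """Name of the known individual whose photo matches the query best."""
    return max(gallery, key=lambda name: match_score(query, gallery[name]))


# Hypothetical spot descriptors for two previously identified leopards.
gallery = {
    "leopard_A": [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)],
    "leopard_B": [(2.5, 2.5), (7.5, 2.5), (5.0, 9.0)],
}
# A new photo of leopard A: the same spots, slightly varied.
query = [(0.1, 0.0), (5.0, 5.1), (9.9, 0.1)]
print(best_match(query, gallery))  # leopard_A
```

The ratio test rejects ambiguous patches, like the middle query point against leopard B, which sits equally far from two of B's spots. This matters when patterns are "unique but slightly varied."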

Host: Okay, so that’s interesting because I was thinking that’s exactly where the machine is going to do better than I would do at telling the difference between the narcissist and the introvert. And, that said, then we’re still back to human labeling and data identification… Yeah? No?

Mark Hamilton: Yeah, so, you know, there are a lot of nice technologies out there now that really aim to solve this problem in a generic way. The one that we’re looking at is called HotSpotter. A lot of researchers in the literature have come up with automated ways to do species identification based on patterns. And a lot of these algorithms really require you to understand not just, is there a leopard in the photo, but where is the leopard in the photo?

Host: Right.

Mark Hamilton: So some of the work that we’ve done to try to address this is that we have this classifier, this thing that can say, yes, there’s a leopard, and no, there’s not a leopard, but it would be ridiculous to think that this classifier didn’t actually know where the leopard was and yet performed well. And so we can actually pull this information out of the classifier by asking it a series of questions. This method is called LIME, or Local Interpretable Model-agnostic Explanations, and it can take any sort of classifier and look into its brain, so to speak, and figure out what is actually causing the classification to occur. And when you do this with a leopard classifier, you find that it kind of hones in on the leopard’s body, and we hope to then pipe this into HotSpotter and really use that to complete the end-to-end pipeline in a way that doesn’t really require any human effort or any human labeling.
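The "series of questions" LIME asks is concrete. It hides random subsets of image regions and asks the black-box classifier for its score on each perturbed image. It then fits a simple linear model, weighted toward perturbations close to the original, that predicts the score from which regions were visible. The largest weights point at the regions the classifier relies on.

Below is a minimal pure-Python sketch. It assumes the photo has already been split into four regions (real LIME uses superpixel segmentation) and uses a toy classifier. The distance measure and the plain weighted least-squares fit are simplifications. This is not the lime package's API.

```python
import math
import random
from typing import Callable, List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lime_explain(classifier: Callable[[Sequence[int]], float], num_regions: int,
                 num_samples: int = 500, kernel_width: float = 0.5,
                 seed: int = 0) -> List[float]:
    """Fit a locally weighted linear surrogate; return one weight per region."""
    rng = random.Random(seed)
    # Each mask marks which regions stay visible (1) or are hidden (0).
    masks = [[1] * num_regions]  # the unperturbed photo itself
    masks += [[rng.randint(0, 1) for _ in range(num_regions)]
              for _ in range(num_samples - 1)]
    k = num_regions + 1  # intercept plus one coefficient per region
    xtwx = [[0.0] * k for _ in range(k)]
    xtwy = [0.0] * k
    for z in masks:
        hidden = 1 - sum(z) / num_regions  # simplified distance to the original
        weight = math.exp(-(hidden ** 2) / kernel_width ** 2)  # proximity kernel
        row = [1.0] + [float(v) for v in z]
        y = classifier(z)  # one "question" put to the black box
        for i in range(k):
            xtwy[i] += weight * row[i] * y
            for j in range(k):
                xtwx[i][j] += weight * row[i] * row[j]
    return solve(xtwx, xtwy)[1:]  # drop the intercept


def leopard_classifier(z: Sequence[int]) -> float:
    """Hypothetical black box: confident mainly when region 2 (the cat) shows."""
    return 0.1 + 0.8 * z[2] + 0.05 * z[0]


weights = lime_explain(leopard_classifier, num_regions=4)
print(max(range(4), key=weights.__getitem__))  # 2
```

The index of the largest weight, region 2, is the kind of location hint that could then be handed to a pattern matcher such as HotSpotter.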

(music plays)

Host: Well, another area that falls under the AI for Good umbrella is a fairly recent trend to deploy AI in cultural preservation and engagement, and these are two sides of an important coin. So you’ve got a great story about how Microsoft, MIT, and the Metropolitan Museum of Art in New York City got together to create what you call Gen Studio and employ GANs, or generative adversarial networks in the world of fine art.

Mark Hamilton: Yeah, so this collaboration came out of an initiative that the Metropolitan Museum of Art started where they took their entire collection, took really high-quality photos of everything from a lot of different angles, put it together with all of the metadata about the artists, and put these online as open access so that anyone from across the world, developers, anyone looking to enjoy and use this art in their projects, could really grab it without needing to worry about the rights and needing to worry about licensing, and…

Host: So, seriously. I can go and use any piece in the Met collection in anything I want to do?

Mark Hamilton: Yeah, they’ve done this for about four hundred thousand different pieces in the Met’s collection.

Host: How many are there?

Mark Hamilton: Yeah, um… I think there might be like a few million, um…

Host: I suppose.

Mark Hamilton: …but can’t quote me on those tough figures.

Host: I’m not gonna.

Mark Hamilton: Yeah. So they released this large open-access catalog and then they employed a team to go through the collection and tag the art with what’s actually being shown in the art. So people and flowers and dogs and cats and turkeys and what have you, and so not only do you have the actual images, you have some semantic knowledge about what’s going on inside.

Host: Okay.

Mark Hamilton: And this semantic knowledge is slightly different than what you’d get from a normal classifier because it’s not that it’s tagging a physical turkey. It might be tagging the engraved metal in the shape of a turkey. So things that wouldn’t normally be picked up by algorithms, so it’s a very useful data set, and what they really wanted to do was create an environment where people could use all this data to create interesting applications or create applications to reach a broader segment of people. And that’s really where we came in is that we took this data and we really wanted to create an experience that got people excited about, not just the Metropolitan Museum of Art’s collection, but the way that we, and algorithms, kind of think and make art in a real platonic and philosophical sense…

Host: Keep going! I mean I love all aspects of this project, especially the idea of “generative adversarial art.” Can you unpack that a little bit? Because, wow!

Mark Hamilton: Yeah, so what a generative adversarial network tries to do, so it tries to model different distributions, it tries to kind of mimic the collection of data that you have available, and the collection of data that we had available was the Metropolitan Museum’s collection. So the particular algorithm that we used learned kind of like a human being does in that it starts out by creating images about the size of a postage stamp, you know eight by eight pixels, and slowly but surely, as it learns, it grows the image. And what this allows you to do is generate very high resolution photos of art. What we really aim to do primarily is to create an application that lets you explore, with this generative adversarial network, and then find related pieces to your creations in the actual collection. So not only do we have a GAN, we have an ability to take created works and reference them via a custom reverse image search and pull them up in the Met’s actual search collection and then find these kinds of connections and give people a starting point to branch out and explore this kind of four hundred thousand work database that they have assembled. Definitely one of our goals, and as we continue on in this vein of work, is to create things that would get schoolkids excited, get people who wouldn’t ordinarily go to the museum, get people who have never been in the same hundred square mile radius as the museum to start to play with these, because it’s hosted online anyone can go to the website and, and start to explore.
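The reverse image search step can be pictured as nearest-neighbour lookup in an embedding space. Each artwork is represented by a feature vector, often taken from a pretrained vision network, and a generated image is matched to the artworks whose vectors point in the most similar direction. Below is a minimal sketch using cosine similarity. The object IDs and three-dimensional vectors are invented for illustration, and the interview does not describe Gen Studio's actual features or index.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def reverse_image_search(query: Sequence[float], index: Dict[str, Sequence[float]],
                         k: int = 2) -> List[Tuple[str, float]]:
    """Return the k catalogue items most similar to the query embedding."""
    scored = [(item, cosine(query, vec)) for item, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical embeddings of collection objects and of one GAN creation.
index = {
    "vase_001": [0.9, 0.1, 0.0],
    "teapot_017": [0.8, 0.3, 0.1],
    "landscape_042": [0.0, 0.2, 0.9],
}
generated = [0.9, 0.15, 0.0]
print([item for item, _ in reverse_image_search(generated, index)])
# ['vase_001', 'teapot_017']
```

A brute-force scan like this is fine for a small index. Collections with hundreds of thousands of works usually rely on an approximate nearest-neighbour index instead.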

Host: What is the website, so we know?

Mark Hamilton: It was gen.studio.

Host: That’s it?

Mark Hamilton: That’s it.

Host: That’s g-e-n dot studio.

Mark Hamilton: Yeah.

Host: Okay I’m going there next… I’m really interested in some common themes I’m detecting here. We’re using AI to preserve, conserve, and save, but also to create, generate, and innovate. Let’s head back to the more philosophical for a minute and talk about creativity and machines in general. I know you have big ideas. Why don’t you share some of your thoughts and then tell us what the world looks like if you’re wildly successful.

Mark Hamilton: Right now I think that AI creativity is a bit limited in that it really mimics what humans have created. And there are ways to think about taking this outside of the box. You know, you can think about encoding the temporal evolution of art and seeing if an algorithm could extrapolate into the future, but it doesn’t really, to me, feel like it addresses that problem. I mean the closest thing that I’ve found to something that I hope exists that we can discover is that, a few years back, there was a lot of hype about algorithms that created their own languages. And in some sense, what this research kind of showed is that a language is not something that is learned or taught, it really emerges out of the physical world in that a language is needed in order to communicate and solve problems in the world, and can emerge by itself in a multiagent system. And so one of the things that I think would be a nice holy grail is if we can create a kind of multi-agent system that re-discovers art, or where art emerges, as a real, pivotal, fundamental concept. You know, it seems like art and language are intimately connected and that visual arts, at least they convey ideas, they beam ideas across from the wall into your brain in the same way that I’m beaming ideas to you, and vice versa, through our speech. And so the hope is that these can kind of be modeled in similar ways and they have similar structures. It will be interesting to see, you know, if there is a way to kind of create art without just mimicking human beings, like, does it require multiple agents, does it require just one? My guess is that it would require multiple and that this is kind of how language has to arise and I would assume that it’s just as complicated as language, at the very least.

Host: Well, that’s a good segue into the next question. I can’t let any podcast go by without asking what could possibly go wrong. And I always ask, what keeps you up at night? When we start talking about generating art from art that already exists, are there going to be any problems with copyright or plagiarism, or even, is it valuable if a machine made it in the same way it’s valuable if a person made it?

Mark Hamilton: Yeah, and I think those are really good questions that we need to tackle in that, you know, you don’t want the things that you use to study the beauty of human creativity to cheapen the beauty of creativity. Personally, I feel that art isn’t just defined by the pixels, and that art has a long creation story and makes you think, and makes you think about the creator and how it was created. Which I think is something that these algorithms will really never have in quite the same way as a human being does. There are problems in generating art, like cheapening the human experience, and also plagiarism, but I think that also, what’s probably more problematic, is adversarial networks when they’re employed by really bad actors. There’s been a huge amount of controversy around, like, deep fakes and these kinds of really just sinister applications of adversarial networks. And it’s very difficult to square this with pursuing research in machine learning because it seems like, you know, once you create something, that you can’t put the genie back in the bottle. You know, the humanities and the social sciences have the Institutional Review Board. I mean, I don’t see why AI shouldn’t have a similar thing where people kind of think about what they’re doing before they make it. You know, it might make growth a little bit slower, but it makes it a little bit stabler, and I’d much rather have a little bit slower, stabler growth than very, very, volatile, potentially fast-paced growth.

(music plays)

Host: All right, it’s story time. Mark you’re only twenty-five so your story’s not a long one yet, but I think you already have a few plot twists. Tell us your background and how you ended up working where you’re working and going where you’re going.

Mark Hamilton: Yeah, so my research life has definitely taken a lot of twists and turns in that, when I originally started research, I was working in a photonics lab, and then I went to particle physics at the LHC, and then astronomy, and then meta-mathematics, kind of trying to chase the foundation of life, and then finally to machine learning. But I think that, along the way, you pick up a lot of different exciting viewpoints, you hear about random other whole branches of thought, and you also get to interact with really powerful mentors in that I think that what’s helped me throughout my life is the help of people like my sister and my teachers, and my current manager who really help form your research directions, your opinions and how you want to conduct your research and work with people and collaborate.

Host: Let's come back to your sister, because it's the first time anyone's cited their sister as, you know, an inspiration…

Mark Hamilton: Yeah, my sister's a really big inspiration for me. I think that, you know, my sister, from the very get-go in high school, was like, no, Mark, you're taking science research. I don't care how nerdy you think it is! I don't care how many friends you're going to lose! You've got to take it.

Host: OK, who’s your sister, what’s she do?

Mark Hamilton: Yeah, my sister, she's eight years older than me. She's just opened her lab at Arizona State University studying Legionnaires' disease…

Host: What’s her name?

Mark Hamilton: It's Carrie Hamilton. She's actually, MSR just, uh, we had a joint project where we looked at figuring out ways to measure Legionella (Legionella is the water-borne bacterium that causes Legionnaires' disease) and figure out how we can take the Legionella data and construct limits, so that the government can say, hey, we don't want to see any more than this amount of Legionella bacteria flowing out of your tap if you want people to be safe. And so…

Host: Are there other Hamiltons we ought to know about?

Mark Hamilton: Those are the two that are in the research sphere, me and Carrie.

Host: OK, well, those are the ones we want to know about! All right, well, where'd you go to school?

Mark Hamilton: Yeah, so I went to Yale in New Haven and studied math and physics there. I started out my work at the Large Hadron Collider, looking for certain alternatives to supersymmetry, namely vector-like quarks. One of the… the most influential advisors in my life was Meg Urry, and she worked on active galactic nuclei. What we aimed to do there was to look at, kind of, the farthest black holes, namely these things called active galactic nuclei, and figure out their distance from a very small number of measurements. And then, from there, I kind of moved into more abstract mathematics that took me to Germany, into this group that created a system of meta-mathematics. So it's math that creates math, and the goal is to, kind of, create things like theorem provers up in this very abstract, meta-mathematical space and then apply them to all the languages that come out of this meta-mathematical language. And so that was a fun little foray into an intense realm of mathematics.

Host: Where, in Germany, is this math haven?

Mark Hamilton: It's at Jacobs University in Bremen, Germany.

Host: OK. All right, so then you came back here, obviously.

Mark Hamilton: Yeah, yeah.

Host: And landed at Microsoft?

Mark Hamilton: Yeah, yeah… I came to Microsoft and was really fortunate to land in Sudarshan Raghunathan's group. He also has, like, a deep love for mathematics and linear algebra and things, and he really has been my guide through Microsoft ever since. I've kind of followed him very closely, as he was really one of the foundational people who built Microsoft Machine Learning for Apache Spark, and he continues to support it and all of its related efforts.

Host: Well, my new favorite question is, what’s one interesting thing about you that people might not know, and how has it influenced your career path?

Mark Hamilton: I guess there are a few things that I do outside of just thinking about math, and one of the things that I've really come to love these days is cooking. You know, I think that cooking is one of the best ways to take your hard elbow grease and turn it into something that's incredibly satisfying without having to wait very long. You know, it's not like coding, where you put in a lot of hours and then you change the color of pixels! You really actually get to enjoy the things that you create.

Host: As we close, since you’re on the kind of front end of your life in research, if that’s where you land, I’m curious about your vision for the future horizon in AI. What parting thoughts would you leave with our listeners in terms of maybe what you’re most looking forward to as you move ahead in the field?

Mark Hamilton: Yeah, I think that one of the things I’m most looking forward to is just getting a large diversity of ideas and, you know, meeting more of the research community and seeing what kind of things are keeping them up at night and seeing, kind of, how they relate. I’m excited to start to uncover more of the beautiful structure underlying intelligence and I think that it’s surprising and it’s complex. And also, one of the things that I hope to do in the PhD is really explore how the math established in the pure math community and abstract algebra and things like this can help influence some of the work we’re doing in machine learning, in that it kind of feels like we are really doing a lot of algebra these days in machine learning and that we’re thinking about the structures of different things and how these structures relate. And so I’d love to understand a little bit more about how these different kinds of mathematical tools can create systems capable of understanding the world to a much better and more interpretable degree.

Host: Real quick before we go, do you have any experience having gone to the areas where snow leopards live or is this just all abstract for you?

Mark Hamilton: Yeah, I mean, one thing that was really exciting was that we actually got to go to Kyrgyzstan, meet up with the Snow Leopard Trust, take a Honda CR-V deep into the mountains, where we got several flat tires, and actually, like, see some of these camera traps. We went to what's called the Shamsi River Valley, and it was really nice in that it made this dataset feel a lot more real: we were actually able to, like, see one of the cameras, and also see how remote and difficult it is to actually get out there. What was particularly exciting about this valley is that, recently, when we had first started doing this work, they only had one camera there, and they found a snow leopard, and it was kind of the first documented evidence of a leopard in this particular valley. And so now they really want to scale out their efforts to understand: is this a single leopard passing through, because they have incredibly large ranges, or is this, um, indicative of a more established leopard population that needs to be properly preserved?

Host: Right.

Mark Hamilton: And so they're really starting to build out an infrastructure around this river valley, and it was really exciting to see them do this. We actually got to meet Koustubh, kind of the head of the Snow Leopard Trust there, and his whole team, and to see that we're not the only people doing really exciting things around snow leopard technology. They have collaborations in Asia creating 3D snow leopard camera traps and all sorts of really exciting things; they've now invested in a series of drones that can do thermal imaging, sweep through the entire ecosystem, and get an understanding of biomass and things like this.

Host: Well, I’m thrilled that you were able to join us in person, so I didn’t have to use a camera trap to see your face. Mark Hamilton, thanks for joining us on the podcast today.

Mark Hamilton: Thank you so much…

(music plays)

To learn more about how researchers are using machine learning for social good, visit Microsoft.com/research