Research in Focus: Using ML to Troubleshoot and Improve Real Time Systems

As systems become increasingly complicated, span large geographical areas, must seamlessly use an incredibly diverse array of computational resources, and serve real-time, safety-critical and mission-critical applications, there is an emerging need for them to be self-aware or self-tuning. Advances in machine learning and artificial intelligence have recently led to algorithms that can learn high-performance policies over extremely large state spaces (e.g. playing games such as Ms. Pac-Man, Go and poker, or learning control policies for self-driving cars, drones, etc.). Just as the growth of cheap, abundant computing and specialized systems (e.g. dedicated accelerators for deep learning) has led to rapid advances in machine learning and artificial intelligence, there is an emerging opportunity for machine learning to help systems in return. In this session we want to explore the technical opportunities and unique challenges that surface when applying machine learning to optimize large-scale distributed systems. Specifically, we want to explore challenges in developing systems that are self-tunable and resource-aware, and that use machine learning to dynamically optimize a running system to achieve desired latency, throughput and other system-dependent utility functions. Making significant progress in this area requires multiple disciplines to come together, namely machine learning, decision-making, distributed systems and optimization.

Date:
Speakers:
Behnaz Arzani, Debadeepta Dey, Besmira Nushi
Affiliation:
Microsoft Research

Series: Microsoft Research Faculty Summit