MonitorAssistant: Simplifying Cloud Service Monitoring via Large Language Models
- Zhaoyang Yu,
- Minghua Ma,
- Chaoyun Zhang,
- Si Qin,
- Yu Kang,
- Chetan Bansal,
- Saravan Rajmohan,
- Yingnong Dang,
- Changhua Pei,
- Dan Pei,
- Qingwei Lin,
- Dongmei Zhang
Foundations of Software Engineering (FSE), organized by ACM
In large-scale cloud service systems, monitoring metric data and detecting anomalies is an important way to maintain reliability and stability. However, a great disparity exists between academic approaches and industrial practice in anomaly detection. Industry predominantly uses simple, efficient methods because of their interpretability and ease of implementation. In contrast, academia favors deep-learning methods, which, despite their advanced capabilities, face practical challenges in real-world applications. To address these challenges, this paper introduces MonitorAssistant, an end-to-end practical anomaly detection system built on Large Language Models. MonitorAssistant automates model configuration recommendation to achieve knowledge inheritance, and interprets alarms with guidance-oriented anomaly reports, facilitating a more intuitive engineer-system interaction through natural language. By deploying MonitorAssistant in Microsoft's cloud service system, we validate its efficacy and practicality, marking a significant advancement in the field of practical anomaly detection for large-scale cloud services.
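As a rough illustration of the kind of LLM-driven workflow described above, the sketch below shows a hypothetical pipeline that asks a language model to recommend a detection configuration for a metric (reusing configurations of similar metrics as a simple form of knowledge inheritance) and to turn a raised alarm into a guidance-oriented report. All names here (`call_llm`, `recommend_configuration`, `interpret_alarm`, the prompt wording) are assumptions for illustration and are not taken from the paper.

```python
import json
from dataclasses import dataclass
from typing import List


def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM API call (hypothetical stub); returns canned
    # responses so the sketch stays self-contained and runnable.
    if "Recommend an anomaly-detection configuration" in prompt:
        return json.dumps({
            "detector": "threshold",
            "window_minutes": 30,
            "sensitivity": "medium",
            "rationale": "Latency for this service spikes briefly during rollouts; "
                         "a 30-minute window reduces false alarms.",
        })
    return ("Report: the metric spiked well above its recent range. "
            "Check recent deployments and downstream dependency health first.")


@dataclass
class MetricProfile:
    name: str
    service: str
    recent_values: List[float]


def recommend_configuration(profile: MetricProfile, past_configs: List[dict]) -> dict:
    # Ask the LLM for a detection configuration, providing configurations of
    # similar metrics as in-context examples (a simple form of knowledge inheritance).
    prompt = (
        "You are assisting with cloud-service monitoring.\n"
        f"Metric: {profile.name} (service: {profile.service})\n"
        f"Recent values: {profile.recent_values}\n"
        f"Configurations used for similar metrics: {json.dumps(past_configs)}\n"
        "Recommend an anomaly-detection configuration as JSON with keys "
        "'detector', 'window_minutes', 'sensitivity', and 'rationale'."
    )
    return json.loads(call_llm(prompt))


def interpret_alarm(profile: MetricProfile, config: dict, anomalous_value: float) -> str:
    # Turn a raised alarm into a short, guidance-oriented report in natural language.
    prompt = (
        f"An alarm fired on metric {profile.name} of service {profile.service}.\n"
        f"Anomalous value: {anomalous_value}; detector config: {json.dumps(config)}\n"
        "Write a brief report: what likely happened, and what the on-call "
        "engineer should check first."
    )
    return call_llm(prompt)


if __name__ == "__main__":
    profile = MetricProfile("request_latency_p99_ms", "frontdoor", [210, 205, 220, 980])
    config = recommend_configuration(profile, past_configs=[{"detector": "threshold"}])
    print(config)
    print(interpret_alarm(profile, config, anomalous_value=980))
```

In a real deployment the stubbed `call_llm` would be replaced by an actual model endpoint, and the recommended configuration would feed an existing detection pipeline rather than being printed; the point of the sketch is only the natural-language interface between engineers, historical configurations, and alarms.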