SPINE: A Scalable Log Parser with Feedback Guidance
- Xuheng Wang
- Xu Zhang
- Liqun Li
- Shilin He
- Hongyu Zhang
- Yudong Liu
- Lingling Zheng
- Yu Kang
- Qingwei Lin 林庆维
- Yingnong Dang
- Saravan Rajmohan
- Dongmei Zhang
2022 Foundations of Software Engineering (FSE)
SIGSOFT Distinguished Paper Award
Log parsing, which extracts log templates and parameters, is a critical prerequisite for automated log analysis. Though existing log parsers have achieved promising accuracy on public log datasets, they still face many challenges when applied in industry. By studying the characteristics of real-world log data and analyzing the limitations of existing log parsers, we identify two problems. First, it is non-trivial to scale a log parser to a vast number of logs, especially in real-world scenarios where the log data is extremely imbalanced. Second, existing log parsers overlook the importance of user feedback, which is imperative for fine-tuning a parser as log data continuously evolves. To overcome these challenges, we propose SPINE, a highly scalable log parser with user feedback guidance. Based on our log parsing model with initial grouping and progressive clustering, we propose a novel log data scheduling algorithm to improve the efficiency of parallelization on large-scale imbalanced log data. In addition, we introduce user feedback so that the parser adapts quickly to evolving logs. We evaluated SPINE on 16 public log datasets. SPINE achieves an average parsing accuracy above 0.9 with the highest parsing efficiency, outperforming state-of-the-art log parsers. We also evaluated SPINE in the production environment of a world-leading cloud system, where it parses 30 million logs in less than 8 minutes with 16 executors, achieving near real-time performance. Finally, our evaluations show that SPINE consistently achieves good accuracy under log evolution with a moderate amount of user feedback.
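For readers new to the task, the following is a minimal sketch of what "extracting log templates and parameters" means. It is not SPINE's algorithm: it assumes a toy grouping by token count and marks any token position that varies within a group as a parameter. The names `extract_template`, `parse`, and the `<*>` wildcard are illustrative choices, not taken from the paper.

```python
from collections import defaultdict

WILDCARD = "<*>"  # placeholder for a parameter slot (illustrative convention)


def extract_template(messages):
    """Merge messages with the same token count into one template.

    A position is kept as a constant if every message has the same token
    there; otherwise it becomes a wildcard.
    """
    token_rows = [message.split() for message in messages]
    template = []
    for column in zip(*token_rows):
        template.append(column[0] if len(set(column)) == 1 else WILDCARD)
    return template


def parse(messages):
    """Return a (template, parameters) pair for each message.

    Toy assumption: messages with the same number of tokens share a template.
    """
    groups = defaultdict(list)
    for message in messages:
        groups[len(message.split())].append(message)

    results = []
    for group in groups.values():
        template = extract_template(group)
        for message in group:
            params = [
                token
                for token, slot in zip(message.split(), template)
                if slot == WILDCARD
            ]
            results.append((" ".join(template), params))
    return results


logs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Disk usage at 91 percent on node7",
]
for template, params in parse(logs):
    print(template, params)
```

The two connection messages share the template `Connection from <*> closed`, with the IP address recovered as the parameter. The third message forms a group of one, so this toy approach finds no variable positions in it; handling such cases well is part of what real parsers have to solve.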