In the age of big data, the challenge is no longer how to collect or store vast quantities of data—it’s how to make sense of it and use it for practical benefit. Scientific researchers, governmental agencies, nonprofits, and businesses of all sizes are among those who struggle to understand the data they have collected instead of just getting buried by it.
Technology companies are no exception. At Microsoft, the quest to incorporate data analysis into the process of improving software and services has been intensifying in recent years, especially as the company has accumulated ever-larger amounts of data during the software-development process. Within the company, the data-analysis vanguard includes a team within Microsoft Research Asia called the Software Analytics Group.
Based in that Beijing lab and led by Dongmei Zhang, the group, 12 people strong, is dedicated, Zhang says, to “utilizing the data-driven approach to help create high-quality, user-friendly, efficiently developed and operated software and services.”
In addition to analyzing key elements of the software-development process, such as source code, code check-in history, and software bugs, the team has extended its research into the areas of software quality and user experience by studying program logs, usage data, and other data collected during the development process.
The challenges have increased, Zhang says, as software has transitioned from shrink-wrapped products to “software as a service” and as users have increasingly become creators and sharers of content through social networks, blogs, and other online media. Those trends have significantly expanded the scale and complexity of software development, even as release cycles have become shorter (days instead of months or years) and testing has increasingly moved from development sites into the real world.
“With all the changes happening in the software domain,” Zhang says, “data is playing an increasingly important role.”
She notes that software analytics has become a popular research focus in academia, as well as in industry. Her group is interested not only in advancing the state of the art in software-analytics research, but also in working closely with product teams to gain practical benefit from the research results.
One of the Software Analytics Group’s recent innovations is targeted at a practice commonly used by developers: copy and paste. Reusing pieces of code is easy and helps speed productivity, but it also means that bugs can be propagated inadvertently. Too many “code clones”—identical or similar pieces of code—can make a code base bloated and difficult to maintain.
“Our thinking,” Zhang says, “was that because copy and paste is a common developer practice, if there’s something we could do about it, the potential impact would be huge.”
The team’s efforts, led by Yingnong Dang, resulted in a service called Code Clone Search, Microsoft’s first internal tool to help the company’s developers find code clones. You simply enter a code snippet, and the tool finds all other snippets that are identical or syntactically similar.
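To make the idea of “syntactically similar” concrete, here is a minimal sketch in Python. It is not how Code Clone Search works internally, which the article does not describe. It assumes one simple approach: replace identifiers and literals with placeholders, then compare the resulting token sequences. The function names `normalize` and `find_clones`, the 0.8 threshold, and the sample snippets are all hypothetical, and the sketch only handles Python snippets that tokenize cleanly.

```python
import io
import keyword
import tokenize
from difflib import SequenceMatcher


def normalize(snippet: str) -> list[str]:
    """Turn source text into tokens, hiding names and literal values.

    Assumption: two snippets count as "syntactically similar" when they
    differ only in identifiers, literals, comments, or layout.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(snippet).readline):
        if tok.type == tokenize.NAME:
            # Keep keywords (def, for, return); collapse every other name.
            tokens.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            tokens.append("NUM")
        elif tok.type == tokenize.STRING:
            tokens.append("STR")
        elif tok.type == tokenize.OP:
            tokens.append(tok.string)
        # Comments, newlines, and indentation tokens are ignored.
    return tokens


def find_clones(query: str, corpus: dict[str, str], threshold: float = 0.8):
    """Return (name, score) pairs for corpus snippets similar to the query."""
    query_tokens = normalize(query)
    hits = []
    for name, code in corpus.items():
        matcher = SequenceMatcher(None, query_tokens, normalize(code), autojunk=False)
        score = matcher.ratio()
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical snippets: "sum_all" is a renamed copy of the query.
QUERY = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
CORPUS = {
    "sum_all": "def sum_all(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n",
    "greet": "def greet(name):\n    print('hello', name)\n",
}

if __name__ == "__main__":
    for name, score in find_clones(QUERY, CORPUS):
        print(name, round(score, 2))
```

A pairwise comparison like this would not scale to hundreds of millions of lines. A production service would need an index over the code base, and this sketch makes no attempt at one.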
The Microsoft Security Response Center (MSRC), which identifies, monitors, resolves, and responds to security incidents and Microsoft software-security vulnerabilities, has come to rely heavily on the service. Since 2009, the Software Analytics Group has worked with the MSRC to index about 600 million lines of source code across multiple Microsoft-product code bases. Now, whenever MSRC security engineers become aware of a piece of vulnerable code in one product, they use Code Clone Search to check for the security bug in many Microsoft products. In the past, the best they could do was to check the various code bases manually to look for related vulnerabilities.
An example of the tool’s value dates to early 2012, when the MSRC used Code Clone Search to scan a range of Microsoft products for a vulnerability that, when exploited by malware, could enable execution of arbitrary code when a user opened a malicious document. Although the MSRC had addressed the original vulnerability with a security update, the center wanted to search for other instances across the Microsoft code bases. Code Clone Search found the vulnerable code elsewhere, leading to a quick, comprehensive response.
Intensive Collaboration
Through an intensive collaboration with the Visual Studio team at Microsoft and the Innovation Engineering group at Microsoft Research Asia, Code Clone Search was incorporated into Visual Studio 2012, making its benefits available to all Windows developers. Cameron Skinner, former general manager of Visual Studio Ultimate, called the collaboration with Zhang’s team “an absolute model of how a Microsoft Research team and a product unit should work.”
Zhang sees other valuable uses for the underlying technology—such as to summarize code changes when new features are added to existing software or bugs are fixed. This information can be used as input to predict the risk level presented by the changes.
In another recent product-team collaboration, Zhang’s group has helped the Office team maintain the performance of Office 365, the latest version of the productivity software suite and the first incarnation of Office as a subscription-based online service. Such a service must pursue 24/7 reliability, and when an incident happens, the service must recover quickly.
“Nowadays, we’re talking about data centers where hundreds and thousands of servers are located,” Zhang says. “A huge amount of service-monitoring data is generated every minute, including performance counters, system events, logs created by different service components, and user requests. The diagnosis of service-performance issues heavily depends on such data.”
The Software Analytics Group, with Jian-Guang Lou leading the project, went to work on improving Office 365’s “time to recovery,” the elapsed time from when an incident occurs to when the service is completely restored.
“We collaborated very closely with Office teams,” Zhang says. “Based on the service-monitoring data we got, we researched and developed a set of technologies that helped them analyze the data. These technologies include anomaly detection, correlation analysis, pattern mining, and case-based reasoning.”
The result is a system called Services Analysis Studio (SAS), which can determine which server is malfunctioning and identify which log messages, out of tens of thousands of lines of messages, to examine for the underlying performance issue.
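As an illustration of the anomaly-detection piece alone, the sketch below flags a server whose latency is far out of line with the rest of the fleet. It is not SAS's method, which the article does not detail. It assumes a common robust statistic, the modified z-score based on the median and the median absolute deviation (MAD). The function name, the server names, the latency figures, and the 3.5 cutoff are all hypothetical.

```python
from statistics import median


def flag_anomalous_servers(
    latency_ms: dict[str, list[float]], threshold: float = 3.5
) -> list[str]:
    """Flag servers whose typical latency deviates strongly from the fleet.

    Uses the modified z-score: 0.6745 * (value - median) / MAD.
    Medians are used so that one bad server does not distort the baseline.
    """
    # Summarize each server by the median of its own samples.
    per_server = {name: median(samples) for name, samples in latency_ms.items()}

    fleet_median = median(per_server.values())
    mad = median(abs(value - fleet_median) for value in per_server.values())
    if mad == 0:
        # At least half the servers sit exactly on the fleet median, so this
        # score is undefined; a fuller version would need a fallback rule.
        return []

    flagged = []
    for name, value in per_server.items():
        score = 0.6745 * (value - fleet_median) / mad
        if abs(score) > threshold:
            flagged.append(name)
    return flagged


# Hypothetical monitoring samples (milliseconds per request).
SAMPLES = {
    "web01": [100, 102, 98],
    "web02": [101, 99, 103],
    "web03": [97, 100, 104],
    "web04": [450, 480, 470],
    "web05": [102, 101, 99],
}

if __name__ == "__main__":
    print(flag_anomalous_servers(SAMPLES))
```

Spotting the outlying server is only the first step. Narrowing tens of thousands of log lines down to the relevant ones would draw on the other techniques Zhang lists, such as correlation analysis and pattern mining, which this sketch does not cover.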
SAS has been deployed at all of the data centers hosting SharePoint Online, a collaboration service that comes with Office 365. Results show that, in its first six months of use, SAS helped solve 76 percent of critical incidents. The use of the technologies behind SAS could expand to other Office 365 services in the future.
Zhang is looking to expand her team as opportunities for its input into Microsoft products and services increase. One of the strengths of her group is its diversity of skills and specialties, which include machine learning, information visualization, and system building. That range, she says, makes it possible for the team not only to conduct interdisciplinary research, but also to take a great research concept and translate it into practical, deployable technologies, often in collaboration with product teams.
“Microsoft is a great place for us to work on software analytics, because it’s the largest software company in the world, and you can imagine the wealth of data that Microsoft has,” Zhang says. “We are fortunate in the sense that we realized the importance of data in the software domain very early on. We have already done some very good work in this area, and we’d like to continue the momentum.”