Collaborating to improve efficiency of the cloud in the increasingly connected world of big data

Posted by George Thomas Jr.

The cloud is getting crowded.

As more devices connect to the Internet and more data flows to and from the cloud, the networking fabric once deemed sufficient to handle such traffic quickly is being stretched thin.

In 2013 alone, the total Internet bandwidth crossing international borders was 100 terabytes per second, according to TeleGeography's recent Global Internet Geography report.

And even as fiber network capacity increases, so, too, will the volume of big data. The challenge of processing it quickly, securely and more cost-effectively remains.

To address such challenges, Microsoft researchers joined collaborators from multiple universities this week at the annual USENIX Symposium on Networked Systems Design and Implementation. Their goal: to recommend solutions that push the architectural boundaries of network services.

"The efficient management and operation of networks and data centers is Microsoft's core strength and priority," said Victor Bahl, a Microsoft distinguished scientist. "These papers represent the best in systems research, a product of close collaboration between Microsoft researchers, engineers and our colleagues in academia, anticipating and taking care of important issues well before they become problems."

Microsoft's many contributions to the conference include Geode and Retro.

Geode aims to reduce the cost of wide-area bandwidth on a global scale. It's a collaboration among the University of Illinois; Microsoft researcher George Varghese; Carlo Curino, a senior scientist in Microsoft's Cloud and Enterprise product division; and Thomas Jungblut, a software engineer with Skype.

Geode specifically targets wide-area analytics: running queries over SQL data that is distributed across data centers around the globe while limiting the bandwidth those queries consume. The researchers call this problem Wide-Area Big Data (WABD).

The expense of wide-area network bandwidth can drive applications to discard valuable data. It also can contribute to privacy concerns regarding raw data storage, depending on the laws or constraints governments may impose.

With Geode, the researchers address the WABD problem by:

  • Optimizing query execution plans and data replication to minimize bandwidth costs
  • Modifying query executions to potentially increase computation within individual data centers without worsening cross-data-center bandwidth
  • Aggressively caching all intermediate results, thereby eliminating data transfer redundancy
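
The second and third points above can be illustrated with a small sketch. It is not Geode's implementation; the site names, the row data, and the `CachingCoordinator` class are all hypothetical, and it assumes append-only data in which grouping keys are never deleted. It compares a centralized baseline that ships every raw row with an approach that aggregates inside each data center and ships only the partial results that changed since the last run.

```python
from collections import Counter

# Hypothetical per-data-center tables: lists of (country, clicks) rows.
SITES = {
    "us-west": [("US", 3), ("CA", 1), ("US", 2)],
    "eu-north": [("DE", 4), ("FR", 1), ("DE", 1)],
}


def centralized_rows_shipped(sites):
    """Centralized baseline: every raw row crosses the wide-area network."""
    return sum(len(rows) for rows in sites.values())


def local_partial_sum(rows):
    """Run the GROUP BY inside one data center; one entry per key."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return dict(partial)


class CachingCoordinator:
    """Remembers the last partial result received from each site."""

    def __init__(self):
        self.cache = {}           # site -> {key: aggregated value}
        self.entries_shipped = 0  # entries sent across data centers so far

    def refresh(self, sites):
        """Recompute the global sum, shipping only changed entries."""
        total = Counter()
        for site, rows in sites.items():
            partial = local_partial_sum(rows)
            cached = self.cache.setdefault(site, {})
            # Only entries that differ from the cached copy are transferred.
            delta = {k: v for k, v in partial.items() if cached.get(k) != v}
            self.entries_shipped += len(delta)
            cached.update(delta)
            total.update(cached)  # Counter.update adds the counts
        return dict(total)


coordinator = CachingCoordinator()
result = coordinator.refresh(SITES)
print(result, coordinator.entries_shipped, centralized_rows_shipped(SITES))
```

In this toy data set the first refresh ships four aggregated entries instead of six raw rows, and a second refresh over unchanged data ships nothing. The gap grows with the number of rows per key, which is the intuition behind trading extra local computation for less cross-data-center traffic.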

The Geode prototype, built on the popular Hive analytics framework, has already demonstrated significant improvements. The researchers report a 250-fold reduction in data transfer compared with the centralized approach on a standard Microsoft production workload, and up to a 360-fold improvement in a range of scenarios across several standard benchmarks, including TPC-CH and Berkeley Big Data.

See also: Mobility and networking research at Microsoft

Another project, Retro, improves management of server resources for big data inside the data center. It's a collaboration between Microsoft researchers Peter Bodik and Madan Musuvathi and researchers from Brown University.

Retro is a new framework that identifies what is causing bottlenecks in cloud systems, then optimizes cloud resources to make cloud operations more cost-effective. That also reduces latency to the customer.

Other papers accepted to NSDI'15

Beyond Sensing: Multi-GHz Realtime Spectrum Analytics
Microsoft contributors: Paramvir Bahl
SpecInsight is a system for acquiring a detailed view of 4 GHz of spectrum in real time. It uses a new scheduling algorithm that maximizes the probability of sensing active signals.

Explicit Path Control in Commodity Data Centers: Design and Applications
Microsoft contributors: Haitao Wu, Chuanxiong Guo
Introducing XPath, a method based on existing commodity switches to implement explicit path control that is readily deployable and scales to large data center networks.

Compiling Packet Programs to Reconfigurable Switches
Microsoft contributors: George Varghese
Exploring the design of a compiler for programmable switching chips and how to map logical lookup tables to physical tables while meeting data and control dependencies in the program.

FastRoute: A Scalable Load-Aware Anycast Routing Architecture for Modern CDNs
Microsoft contributors: Ashley Flavel, Pradeepkumar Mani, David A. Maltz, Nick Holt, Jie Liu, Yingying Chen, Oleg Surmachev
By collocating DNS and proxy services in each node location, FastRoute's high-performance, completely distributed system for routing users to a nearby proxy solves control issues of common content delivery networks.

A General Approach to Network Configuration Analysis
Microsoft contributors: Meg Walraed-Sullivan, Ratul Mahajan
This new approach to detecting network configuration errors combines the benefits of prior techniques and can find errors proactively, before the configuration is applied.

Analyzing Protocol Implementations for Interoperability
Microsoft contributors: Nupur Kothari, Ratul Mahajan
Introducing PIC, a tool that helps developers search for non-interoperabilities in protocol implementations. It has already been shown to find multiple previously unknown non-interoperabilities in large and mature implementations of the SIP and SPDY (v2 through v3.1) protocols.

Checking Beliefs in Dynamic Networks
Microsoft contributors: Nuno P. Lopes, Nikolaj Bjørner, Patrice Godefroid, Karthick Jayaraman, George Varghese
Addressing the shortcomings of existing network verification tools, the Network Optimized Datalog tool (NoD) scales to large header spaces and can check beliefs about network reachability policies in dynamic networks.

CubicRing: Enabling One-Hop Failure Detection and Recovery for Distributed In-Memory Storage Systems
Microsoft contributors: Chuanxiong Guo, Haitao Wu, Yongqiang Xiong
CubicRing is a distributed structure for cube-based networks that exploits network proximity to restrict failure detection and recovery within the smallest possible one-hop range.

Tardigrade: Leveraging Lightweight Virtual Machines to Easily and Efficiently Construct Fault-Tolerant Services
Microsoft contributors: Jacob R. Lorch, Andrew Baumann
Tardigrade replicates the service on several machines so that it continues running even when some of them fail, while keeping the service states synchronized so clients see strongly consistent results.

Scalable Error Isolation for Distributed Systems
Microsoft contributors: Flavio P. Junqueira
Introducing SEI, an algorithm that tolerates Arbitrary State Corruption faults and prevents data corruption from propagating across a distributed system, significantly reducing undetected errors.