Our open source commitment: The proof is in the projects

By Miran Lee, Principal Research Program Manager & Winnie Cui, Senior Research Program Manager, Microsoft Research Asia

Openness allows innovation to evolve in unforeseen, novel and exciting ways, and sometimes even provides solutions that no one ever imagined were possible.

Getting more done with crowdsourcing

One such innovation is GeoMission (geo-location-based mission), a crowdsourcing platform developed by Microsoft Research Asia (MSRA) and a team of researchers from the Hong Kong University of Science and Technology (HKUST). GeoMission lets users share and accept tasks based on where they are located.

Users submit location-based requests through the GeoMission apps, which then push the questions to other users near the target location, provided those users meet any additional criteria in the request.
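To make the idea concrete, here is a minimal sketch of how a request might be matched to nearby users. It is not GeoMission's actual code: the `User` class, the `select_recipients` function, the tag-based criteria and the radius parameter are all hypothetical names chosen for illustration, and the great-circle (haversine) distance is simply one common way to measure "near".

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class User:
    """Hypothetical user record: last known position plus free-form tags."""
    name: str
    lat: float
    lon: float
    tags: set = field(default_factory=set)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def select_recipients(users, target_lat, target_lon, radius_km, required_tags=frozenset()):
    """Return users within radius_km of the target who carry every required tag.

    Assumption: "additional criteria" are modeled here as a set of tags;
    the real platform may express them differently.
    """
    return [
        u for u in users
        if haversine_km(u.lat, u.lon, target_lat, target_lon) <= radius_km
        and set(required_tags) <= u.tags
    ]


if __name__ == "__main__":
    users = [
        User("alice", 22.3370, 114.2660, {"student"}),  # close to the target
        User("bob", 39.9042, 116.4074, {"student"}),    # far from the target
        User("carol", 22.3360, 114.2650),               # close, but no tags
    ]
    for u in select_recipients(users, 22.3364, 114.2655, 2.0, {"student"}):
        print(u.name)
```

A linear scan like this is fine for a handful of users; a production system would more likely use a spatial index, but that is beyond what the post describes.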

The project owner, Professor Lei Chen from HKUST, introduces GeoMission to an audience

Developed for iOS and Android clients, the GeoMission server platform allows users to initiate requests by audio, video, photo or plain old text.

All of GeoMission’s source code is hosted on GitHub, which provides some critical benefits for a research-based project, like more people! Researchers can closely study how users interact with the platform, and users can contribute directly to help make it better. Making it open source also extends the tools to the greatest possible number of spatial crowdsourcing researchers. Most importantly, we believe opening the source code helps us innovate faster and provide more ways to collaborate with other developers or just about anyone else who’s interested in the project. You can find more details about the project at HKUST’s website.

Improving datacenter efficiency with Vortex

In the same spirit of openness, we’ve worked with Professor Byung-Gon Chun from Seoul National University (SNU) to develop Vortex in an effort to address the problem of wasted resources at datacenters. Tapping these sometimes vast computing resources, which remain largely unused outside of peak usage, represents a huge opportunity to improve datacenter efficiency and save energy.

Although current resource managers like Google’s Borg system and Apache Mesos attempt to reclaim idle resources for other tasks, they largely fall short when reclaimed resources are inevitably preempted by latency-critical tasks. The more aggressively resources are reclaimed, the more frequently they’re preempted due to conflict, which makes them transient resources. The upshot is that current data processing systems that rely on transient resources cannot complete jobs efficiently.

Vortex, on the other hand, maintains high performance despite frequent preemptions. It was developed by SNU grad students Yunseong Lee and Youngseok Yang during their internship at MSRA, and the pair are continuing to work on Vortex after returning to school. Joining the project is SNU undergraduate student Geon-Woo Kim, along with contributors from other institutions and Microsoft.

Vortex team at SNU (from left to right): Geon-Woo Kim, Youngseok Yang, Byung-Gon Chun, and Yunseong Lee

Experimental evaluations have been conducted on Microsoft Azure to measure the Vortex system’s effectiveness. The results show that Vortex can scale out much better with frequently preempted transient resources than Apache Spark. In certain cases, Apache Spark failed to complete jobs.

Hosted on GitHub, Vortex has been developed as an application of Apache REEF, an open source library for big data applications, in what has since proved to be a mutually beneficial project. Vortex is succeeding in leveraging the Apache methods of growing open source projects: development issues were openly discussed and pull requests were thoroughly reviewed. Meanwhile, the Apache REEF community was able to closely observe how Vortex uses Apache REEF and to learn about the overall Vortex requirements.

Vortex and GeoMission, as well as other projects like them, clearly have the potential to succeed in the marketplace. However, we believe that releasing them as open source opens the way to greater long-term value for the global community of researchers and developers, whose collaborative efforts can sometimes trigger unimaginable breakthroughs. At Microsoft Research Asia, we see a future that includes many more opportunities to collaborate with the open source community, to the benefit of all.
