The Distributed Social Analytics Platform (DSoAP) project is focused on the “Huge Data” problem in social policy research caused by the breadth of data involved. Using aggregate social media data to investigate and validate social issues (such as employment, health and fiscal policy) requires analyzing many months or years of data. DSoAP is applying intelligent compaction, pre-indexing and distribution of data across a server cluster to achieve responsive query times for online data exploration.
Twitter is much more than cat pictures and what people eat for lunch: it is a treasure trove of data about people’s life events, experiences, and opinions.
Recent research has started to look at how to use broader aggregate data to investigate and validate social issues such as employment, health and fiscal policy. A defining characteristic of this type of social policy research is the timeline and breadth of data involved. While most tweet analysis concentrates on a short sliding time window of the order of hours or days, extracting meaningful social policy trends typically involves looking at many months or even years of data.
With ~500 million new tweets (~2-3TB) being added to the Twitter data corpus daily, creating systems that can efficiently handle that volume of data is a challenging task. In the DSoAP project, we are working on solutions for this “huge data” problem by applying intelligent compaction, pre-indexing and distribution of data across a cluster of machines to achieve reasonable query times for online data exploration.
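As a rough illustration of the idea of compaction, pre-indexing and distribution, the sketch below reduces raw tweets to per-day keyword counts and spreads those counts across a few in-memory "shards". It is a toy under stated assumptions, not the DSoAP implementation: the function names, the keyword-matching rule, the choice of sharding by day, and the shard count are all hypothetical.

```python
from collections import defaultdict
from datetime import date
import zlib

NUM_SHARDS = 4  # hypothetical cluster size, chosen only for illustration


def shard_for(day, num_shards=NUM_SHARDS):
    """Map a day to a shard using a hash that is stable across runs."""
    return zlib.crc32(day.isoformat().encode("utf-8")) % num_shards


def build_index(tweets, keywords, num_shards=NUM_SHARDS):
    """Compact (day, text) tweets into per-shard {(day, keyword): count} tables.

    The raw text is discarded after counting, which is the compaction step;
    the (day, keyword) keys act as the pre-built index.
    """
    shards = [defaultdict(int) for _ in range(num_shards)]
    for day, text in tweets:
        words = set(text.lower().split())
        shard = shards[shard_for(day, num_shards)]
        for keyword in keywords:
            if keyword in words:
                shard[(day, keyword)] += 1
    return shards


def query(shards, keyword, start, end):
    """Sum the pre-aggregated counts for a keyword over [start, end]."""
    total = 0
    for shard in shards:  # in a real cluster each shard would answer in parallel
        for (day, kw), count in shard.items():
            if kw == keyword and start <= day <= end:
                total += count
    return total
```

A query over months or years then touches only the small count tables rather than the raw tweets, which is the property that makes interactive exploration plausible at this scale.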
People
Lidong Zhou
Corporate Vice President, Chief Scientist of Microsoft Asia Pacific R&D Group, Managing Director of Microsoft Research Asia
Emre Kiciman
Senior Principal Research Manager
Scott Counts
Senior Principal Research Manager