Research Focus: Week of January 8, 2024

已发布

作者 , Senior Researcher , Partner Applied Science Manager , Principal Researcher , Senior Principal Research Manager , Senior Researcher , Principal Researcher , Partner Research Manager (全球研究合伙人)

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

Research Focus - week of January 8, 2024

Mixture-of-Linear-Experts for Long-term Time Series Forecasting 

This research was accepted by the 2024 International Conference on Artificial Intelligence and Statistics (AISTATS).

Long-term time series forecasting (LTSF), which aims to predict future values of a series given past values, is an important problem in the machine learning community. It’s useful in areas like weather modeling, traffic flow prediction, and financial forecasting.  

The current state of the art on LTSF is attained in some cases by linear-centric models. However, real-world time series are usually nonstationary. For example, traffic patterns change on different days of the week. The inherent simplicity of linear-centric models makes them unable to capture these patterns. In a recent paper: Mixture-of-Linear-Experts for Long-term Time Series Forecasting, researchers from Microsoft and external colleagues propose Mixture-of-Linear-Experts (MoLE) to address this problem. Instead of training a single model, MoLE trains multiple linear-centric models (i.e., experts) and a router model that weighs and mixes their outputs. While the entire framework is trained end-to-end, each expert learns to specialize in a specific temporal pattern, and the router model learns to compose the experts adaptively. Experiments show that MoLE significantly reduces forecasting error of linear-centric models, and MoLE outperforms state-of-the-art transformer-based approaches in 68% of settings.


A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability

End-to-end (E2E) models are the dominant model structure in automatic speech recognition (ASR) and speech translation (ST). This has led to efforts to develop a unified E2E model for multilingual ASR and multilingual ST tasks. Streaming ASR and ST tasks have extensively utilized neural transducers in the past.  

In a recent paper: A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability, researchers from Microsoft present a streaming multilingual speech model – $SM^2$ – which employs a single neural transducer model for transcribing or translating multiple languages into target languages. $SM^2$ is trained using weakly supervised data created by converting speech recognition transcriptions with a machine translation model. Leveraging 351,000 hours of speech training data from 25 languages, $SM^2$ achieves impressive ST performance. Notably, no human-labeled ST data was employed during training. It was purely weakly supervised ST data generated by converting 351,000 hours of anonymized ASR data from 25 languages using text based machine translation service. 

The researchers also demonstrate the truly zero-shot capability of $SM^2$ when expanding to new target languages, generating high-quality zero-shot ST translation for \{source-speech, target-text\} pairs that were not seen during training.

Microsoft research podcast

What’s Your Story: Lex Story

Model maker and fabricator Lex Story helps bring research to life through prototyping. He discusses his take on failure; the encouragement and advice that has supported his pursuit of art and science; and the sabbatical that might inspire his next career move.

KBFormer: A Diffusion Model for Structured Entity Completion

Deep generative models include large language models (LLMs) for text, plus models for other modalities, such as for vision and audio. In a recent paper: KBFormer: A Diffusion Model for Structured Entity Completion, researchers from Microsoft and external colleagues explore generative modeling of structured entities with heterogeneous properties, such as numerical, categorical, string, and composite. This includes entries in rich knowledge bases (KBs), items in product catalogs or scientific catalogs, and ontologies like the periodic table of elements and the various properties of isotopes. 

Their approach handles such heterogeneous data through a mixed continuous-discrete diffusion process over the properties, using a flexible framework that can model entities with arbitrary hierarchical properties. Using this approach, the researchers obtain state-of-the-art performance on a majority of cases across 15 datasets. In addition, experiments with a device KB and a nuclear physics dataset demonstrate the model’s ability to learn representations useful for entity completion in diverse settings. This has many downstream use cases, including modeling numerical properties with high accuracy – critical for science applications, which also benefit from the model’s inherent probabilistic nature.


A Framework for Exploring the Consequences of AI-Mediated Enterprise Knowledge Access and Identifying Risks to Workers

People are increasingly interacting with, and being affected by, the deployment of AI systems in the workplace. This is a pressing matter for system designers, policy-makers, and workers themselves, which researchers from Microsoft address in a recent paper: A Framework for Exploring the Consequences of AI-Mediated Enterprise Knowledge Access and Identifying Risks to Workers.  

Organizations generate huge amounts of information that raise challenges associated with the maintenance, dissemination, and discovery of organizational knowledge. Recent developments in AI, notably large language models (LLMs), present a shift in what is possible in this domain. Recent advances could enable more extensive mining, knowledge synthesis, and natural language interaction in relation to knowledge.  

The researchers propose the Consequence-Mechanism-Risk Framework to identify risks to workers associated with deploying AI-mediated enterprise knowledge access systems. The goal is to support those involved in the design and/or deployment of such systems to identify the risks they introduce, the specific system mechanisms that introduce those risks, and the actionable levers to reduce those risks.


Large Search Model: Redefining Search Stack in the Era of LLMs

Modern search engines are built on a stack of different components, including query understanding, retrieval, multi-stage ranking, and question answering, among others. These components are often optimized and deployed independently. In a recent paper: Large Search Model: Redefining Search Stack in the Era of LLMs, researchers from Microsoft introduce a novel conceptual framework called large search model, which redefines the conventional search stack by unifying search tasks with one large language model (LLM). All tasks are formulated as autoregressive text generation problems, allowing for the customization of tasks through the use of natural language prompts. This proposed framework capitalizes on the strong language understanding and reasoning capabilities of LLMs, offering the potential to enhance search result quality while simplifying the cumbersome search stack. To substantiate the feasibility of this framework, the researchers present a series of proof-of-concept experiments and discuss the potential challenges associated with implementing this approach within real-world search systems.

相关论文与出版物

继续阅读

查看所有博客文章