Download
Orca-2-13B
January 2024
Orca 2 is a fine-tuned version of LLAMA-2. It is built for research purposes only and provides a single-turn response in tasks such as reasoning over user-given data, reading comprehension, math problem solving, and text summarization. The model…
Orca-2-7B
January 2024
Orca 2 is a fine-tuned version of LLAMA-2. It is built for research purposes only and provides a single-turn response in tasks such as reasoning over user-given data, reading comprehension, math problem solving, and text summarization. The model…
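Both Orca 2 checkpoints can be run with the Hugging Face transformers library for single-turn prompting. The sketch below is a minimal, illustrative example; the model IDs (microsoft/Orca-2-7b, microsoft/Orca-2-13b) and the ChatML-style prompt are assumptions, so consult the model card for the official usage and prompt format.

```python
# Minimal sketch of single-turn inference with an Orca 2 checkpoint.
# Model IDs and prompt template are assumptions; see the model card for specifics.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Orca-2-7b"  # or "microsoft/Orca-2-13b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Single-turn prompt: one system message plus one user question.
prompt = (
    "<|im_start|>system\nYou are a helpful reasoning assistant.<|im_end|>\n"
    "<|im_start|>user\nIf a train travels 60 km in 45 minutes, "
    "what is its average speed in km/h?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens (the single-turn response).
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```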
dp-transformers repository
August 2022
Motivated by our recent work, we are releasing a repository for training transformer models with differential privacy. Our repository is based on integrating the Opacus library into the Hugging Face platform. We aim to help the privacy-preserving ML community utilize state-of-the-art models while…
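To make the Opacus-plus-Hugging Face integration concrete, here is a minimal sketch of DP-SGD fine-tuning of a transformer classifier using Opacus directly. It illustrates the general pattern only; the dp-transformers repository provides its own higher-level wrappers, and the model, dataset, and hyperparameters below are illustrative assumptions.

```python
# Sketch: differentially private fine-tuning of a Hugging Face classifier with Opacus.
# Illustrative only; not the dp-transformers API.
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Tiny toy dataset so the sketch is self-contained.
texts = ["great movie", "terrible plot"] * 16
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([1, 0] * 16)
train_loader = DataLoader(TensorDataset(enc["input_ids"], enc["attention_mask"], labels), batch_size=8)

# Wrap model, optimizer, and data loader for DP-SGD (per-sample clipping + noise).
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,   # privacy/utility trade-off
    max_grad_norm=1.0,      # per-sample gradient clipping bound
)

model.train()
for input_ids, attention_mask, batch_labels in train_loader:
    optimizer.zero_grad()
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=batch_labels)
    outputs.loss.backward()   # Opacus clips and noises per-sample gradients here
    optimizer.step()

print(f"epsilon spent: {privacy_engine.get_epsilon(delta=1e-5):.2f}")
```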
KID: Knowledge Infused Decoding
March 2022
Knowledge Infused Decoding (KID) is a decoding algorithm that infuses knowledge (from Wikipedia) into each decoding step of text generation.
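As a rough intuition for what "infusing knowledge into each decoding step" means, the sketch below biases the next-token logits of a causal LM toward tokens that occur in a retrieved knowledge passage. This is a deliberately simplified illustration of the general idea, not the actual KID algorithm; the model, passage, and bonus weight are assumptions.

```python
# Simplified illustration: bias each decoding step toward tokens from a knowledge passage.
# Not the real KID algorithm, only the "knowledge modifies every step" idea.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

knowledge = "The Eiffel Tower is located in Paris and was completed in 1889."
knowledge_ids = set(tokenizer(knowledge)["input_ids"])  # token ids seen in the knowledge text
bonus = 2.0  # hypothetical strength of the knowledge bias

input_ids = tokenizer("The Eiffel Tower", return_tensors="pt")["input_ids"]
for _ in range(20):
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]      # next-token logits for the current prefix
    logits[list(knowledge_ids)] += bonus             # infuse knowledge at this decoding step
    next_id = torch.argmax(logits).reshape(1, 1)     # greedy choice over the biased distribution
    input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0]))
```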
LiST (Lite Self-Training)
October 2021
We present LiST, a new method for efficient fine-tuning of large pre-trained language models (PLMs) in few-shot learning settings. LiST significantly improves over recent methods that adopt prompt-based fine-tuning by using two key techniques. The first one is the use of…
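For context on the prompt-based fine-tuning that LiST builds on, here is a minimal sketch of the idea: a classification example is recast as a cloze template and a masked LM is fine-tuned to predict a label word at the mask position. The template, verbalizer, and model below are illustrative assumptions, not LiST's actual setup.

```python
# Sketch of prompt-based (cloze-style) fine-tuning for few-shot classification.
# Template and verbalizer are hypothetical; this is not LiST itself.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

verbalizer = {"positive": " great", "negative": " terrible"}  # one label word per class
few_shot = [("The film was a delight.", "positive"),
            ("I walked out halfway through.", "negative")]

model.train()
for text, label in few_shot:
    # Cloze template: the masked LM fills the blank with a label word.
    prompt = f"{text} It was {tokenizer.mask_token}."
    enc = tokenizer(prompt, return_tensors="pt")
    mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    target_id = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(verbalizer[label]))[0]

    logits = model(**enc).logits[0, mask_pos]
    loss = torch.nn.functional.cross_entropy(logits.unsqueeze(0), torch.tensor([target_id]))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```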
Meta Self-training for Few-shot Neural Sequence Labeling [Code]
October 2021
This is the implementation of the paper Meta Self-training for Few-shot Neural Sequence Labeling. MetaST is short for meta-learning for self-training.
Baselines for Multilingual Reply Suggestion (MRS)
August 2021
Data augmentation has proven effective in many NLU tasks, especially those suffering from data scarcity. In this paper, we present a powerful and easy-to-deploy text augmentation framework, Data Boost, which augments data through reinforcement learning…
Meta Representation Transformation for Low-resource Cross-Lingual Learning [Code]
May 2021
This is a source code release for research published at NAACL 2021. Paper Title: MetaXL: Meta Representation Transformation for Low-resource Cross-Lingual Learning. Paper Abstract: The combination of multilingual pre-trained representations and cross-lingual transfer learning is one of the most…
Self-training with Weak Supervision [Code]
April 2021
State-of-the-art deep neural networks require large-scale labeled training data that is often either expensive to obtain or not available for many tasks. Weak supervision in the form of domain-specific rules has been shown to be useful in such settings to…
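To illustrate what weak supervision from domain-specific rules can look like in practice, the sketch below treats each rule as a labeling function that votes on unlabeled text, with a simple majority vote producing weak labels that a student model could then be trained on. The rules, labels, and majority-vote aggregation are generic assumptions for illustration, not the aggregation used in the released code.

```python
# Sketch: domain-specific rules as labeling functions, aggregated by majority vote
# to produce weak labels for unlabeled text. Illustrative only.
from collections import Counter

ABSTAIN = None

def rule_refund(text):
    """Domain rule: mentions of refunds signal a complaint."""
    return "complaint" if "refund" in text.lower() else ABSTAIN

def rule_thanks(text):
    """Domain rule: expressions of thanks signal praise."""
    return "praise" if "thank" in text.lower() else ABSTAIN

RULES = [rule_refund, rule_thanks]

def weak_label(text):
    """Aggregate rule votes by majority; return None if every rule abstains."""
    votes = [label for rule in RULES if (label := rule(text)) is not ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else None

unlabeled = ["I want a refund for this order.",
             "Thank you, the support team was great.",
             "The package arrived on Tuesday."]

print([(t, weak_label(t)) for t in unlabeled])  # texts matched by no rule stay unlabeled
```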