PODCAST
The GPT-x Revolution in Medicine, with Peter Lee
Microsoft Research’s Peter Lee recently sat down to discuss the impact of GPT-4 and large language models in medicine on physician-scientist Eric Topol’s Ground Truths podcast. Drawing from Lee’s recent book, The AI Revolution in Medicine, the conversation covers his early experimentation with GPT-4 and his views on its potential as well as its weaknesses.
For example:
- GPT-4 excels at evaluating and reviewing content, insightfully spotting inconsistencies and missing citations, and perceiving a lack of inclusivity and diversity in terminology
- GPT-4 can help reduce medical errors and coach physicians to consider different diagnoses and show greater empathy toward patients
- GPT-4 has the potential to empower patients with new tools and to democratize access to expert medical information
- AI needs appropriate regulation, particularly in the field of medicine
NEW RESEARCH
SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning
Deploying machine learning models in production may allow adversaries to infer sensitive information about training data. Inference risks range from membership inference to data reconstruction attacks. Inspired by the success of game-based definitions in cryptography for studying security properties, some authors describe privacy inference risks in machine learning using a similar game-based formalism. However, adversary capabilities and goals are often stated in subtly different ways from one presentation to the next, which makes it hard to relate and compose results.
In a new research paper, SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning, researchers from Microsoft present a game-based framework to systematize the body of knowledge on privacy inference risks in machine learning. In the paper, which was presented at the 2023 IEEE Symposium on Security and Privacy, the authors use this framework to (1) provide a unifying structure for definitions of inference risks, (2) formally establish known relations among definitions, and (3) uncover hitherto unknown relations that would have been difficult to spot otherwise.
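To make the game-based style concrete, here is a minimal sketch, in Python, of the classic membership-inference game that such definitions build on; the helper names (data_pool, train, adversary) are placeholders for illustration, not the paper’s formalism. A challenger trains a model on a random subset of a data pool, hands the adversary a record that is either a member or a non-member, and the adversary’s advantage measures how much better than chance it can tell the two apart.

```python
import random

# Minimal sketch of a membership-inference game, the kind of game-based
# definition the paper systematizes. The callables `train` and `adversary`
# are illustrative placeholders for any training procedure and attack.

def membership_inference_game(data_pool, train, adversary, n_train):
    """One round: the challenger trains a model on a random subset, draws a
    challenge record that is either a member or a non-member, and the
    adversary must guess which from the model and the record alone."""
    members = random.sample(data_pool, n_train)
    model = train(members)

    b = random.randint(0, 1)                      # challenger's secret bit
    if b == 1:
        challenge = random.choice(members)        # member record
    else:
        non_members = [x for x in data_pool if x not in members]
        challenge = random.choice(non_members)    # non-member record

    guess = adversary(model, challenge)           # black-box attack
    return int(guess == b)                        # 1 if the adversary wins

def advantage(data_pool, train, adversary, n_train, rounds=1000):
    """Adversary's advantage: how far its win rate exceeds random guessing."""
    wins = sum(membership_inference_game(data_pool, train, adversary, n_train)
               for _ in range(rounds))
    return wins / rounds - 0.5
```

Who selects the challenge record, what the adversary observes, and what it must output are roughly the axes along which published definitions differ, and relating those variants is what the paper’s unified framework is for.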
NEW RESEARCH
Analyzing Leakage of Personally Identifiable Information in Language Models
Language models (LMs) are widely deployed to perform a variety of downstream tasks. However, they have been shown to leak information about training data through sentence-level membership inference and reconstruction attacks. Understanding the risk of LMs leaking personally identifiable information (PII) has received less attention. Dataset curation techniques such as scrubbing reduce, but do not prevent, the risk of PII leakage. In practice, scrubbing is imperfect and must balance the trade-off between minimizing disclosure and preserving the utility of the dataset. It is also unclear to what extent algorithmic defenses such as differential privacy (DP), designed to guarantee sentence- or user-level privacy, prevent PII disclosure.
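As a rough illustration of why scrubbing is imperfect, the following sketch (hypothetical, and not the curation pipeline studied in the paper) replaces PII that matches simple patterns with typed placeholders but leaves behind anything the patterns miss, such as personal names.

```python
import re

# Illustrative pattern-based scrubber: the patterns below are examples,
# not the scrubbing pipeline used in the paper.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    """Replace recognized PII spans with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

record = "Contact Jane Doe at jane.doe@example.com or 555-123-4567."
print(scrub(record))
# -> "Contact Jane Doe at [EMAIL] or [PHONE]."
# The name "Jane Doe" survives scrubbing; residual PII like this is what
# leakage attacks can still recover from a trained model.
```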
In a new research paper, Analyzing Leakage of Personally Identifiable Information in Language Models, researchers from Microsoft introduce rigorous game-based definitions for three types of PII leakage via black-box extraction, inference, and reconstruction attacks with only API access to an LM. In the paper, which was presented at the 2023 IEEE Symposium on Security and Privacy, they empirically evaluate the attacks against GPT-2 models fine-tuned with and without defenses in three domains: case law, health care, and e-mail.
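The sketch below conveys the intuition behind one such black-box attack, using the Hugging Face transformers library: fill a masked PII span with each candidate value, score the completions by the fine-tuned model’s log-likelihood, and return the most plausible one. The function names, the [MASK] convention, and the example context are assumptions for illustration, not the paper’s exact attack or evaluation setup.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Illustrative black-box PII inference attack: the attacker only needs
# sequence likelihoods from the model's API. Names and the [MASK] convention
# are assumptions for this example, not the paper's setup.

def sequence_log_likelihood(model, tokenizer, text):
    """Total log-likelihood the model assigns to a piece of text."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood over the predicted tokens.
    return -out.loss.item() * (ids.size(1) - 1)

def pii_inference_attack(model, tokenizer, context, candidates):
    """Fill the masked PII span with each candidate and pick the one the
    model finds most plausible."""
    scores = {c: sequence_log_likelihood(model, tokenizer,
                                         context.replace("[MASK]", c))
              for c in candidates}
    return max(scores, key=scores.get)

if __name__ == "__main__":
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    context = "The patient, [MASK], was admitted with acute chest pain."
    print(pii_inference_attack(model, tokenizer, context,
                               ["John Smith", "Alice Jones", "Maria Garcia"]))
```

A model that has memorized its fine-tuning data assigns noticeably higher likelihood to PII it has seen, which is the signal such an attack exploits.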
Their findings show that differential privacy can largely, but not completely, mitigate PII leakage. Traditional data curation approaches such as PII scrubbing are still necessary to achieve sufficient protection. The authors advocate for the design of less aggressive PII scrubbing techniques that account for the protection afforded by DP and achieve a better privacy/utility trade-off.
NEW RESEARCH
Automatic Prompt Optimization with “Gradient Descent” and Beam Search
Large language models (LLMs) have shown impressive performance as general-purpose agents, but their abilities remain highly dependent on hand-written prompts, which require onerous trial-and-error work. Automatic or semi-automatic procedures would help people write the best prompts while reducing manual effort.

In a recent research paper, Automatic Prompt Optimization with “Gradient Descent” and Beam Search, researchers from Microsoft propose a simple, nonparametric solution to this problem. Automatic Prompt Optimization (APO) takes inspiration from numerical gradient descent to automatically improve prompts, assuming access to training data and an LLM API. The algorithm uses minibatches of data to form natural language “gradients” that criticize the current prompt. The gradients are then “propagated” into the prompt by editing it in the opposite semantic direction of the gradient. These gradient descent steps are guided by a beam search and bandit selection procedure, which significantly improves algorithmic efficiency. Preliminary results across three benchmark NLP tasks and the novel problem of LLM jailbreak detection suggest that APO can outperform prior prompt editing techniques and improve an initial prompt’s performance by up to 31% by using data to rewrite vague task descriptions into more precise annotation instructions.
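A minimal sketch of that loop, assuming a generic text-completion function llm and a task-specific evaluate scorer (both placeholders rather than the authors’ implementation, which additionally uses bandit-based selection to keep evaluation cheap):

```python
import random

# Hedged sketch of the APO loop described above. `llm` stands in for any
# text-completion API and `evaluate` scores a prompt on labeled examples;
# both are assumptions, not the authors' exact implementation.

def textual_gradient(llm, prompt, minibatch):
    """Ask the LLM to criticize the current prompt on a minibatch of
    examples: this critique plays the role of the 'gradient'."""
    examples = "\n".join(f"input: {x}  expected: {y}" for x, y in minibatch)
    return llm(f"The prompt below made mistakes on these examples.\n"
               f"Prompt: {prompt}\nExamples:\n{examples}\n"
               f"Describe what is wrong with the prompt.")

def apply_gradient(llm, prompt, gradient, n_edits=3):
    """'Propagate' the gradient by asking the LLM to rewrite the prompt so
    that it fixes the criticized problems."""
    return [llm(f"Prompt: {prompt}\nProblems: {gradient}\n"
                f"Rewrite the prompt to fix these problems (variant {i + 1}).")
            for i in range(n_edits)]

def apo(llm, evaluate, init_prompt, train_data,
        beam_width=4, steps=5, batch_size=8):
    beam = [init_prompt]
    for _ in range(steps):
        candidates = list(beam)
        for prompt in beam:
            minibatch = random.sample(train_data, min(batch_size, len(train_data)))
            grad = textual_gradient(llm, prompt, minibatch)
            candidates += apply_gradient(llm, prompt, grad)
        # Beam search: keep only the top-scoring candidates for the next round.
        beam = sorted(candidates, key=lambda p: evaluate(p, train_data),
                      reverse=True)[:beam_width]
    return beam[0]
```

Each iteration expands the beam with LLM-edited prompt variants and keeps only the top scorers; the bandit selection described in the paper reduces how many evaluations that ranking step requires.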