Directions in ML: Automating Dataset Comparison and Manipulation with Optimal Transport

Machine learning research has traditionally been model-centric, focusing on architectures, parameter optimization, and model transfer. Much less attention has been given to the datasets on which these models are trained, which are often assumed to be fixed or subject to extrinsic and inevitable change. However, applying ML successfully in practice often requires substantial effort in dataset preprocessing and manipulation, such as augmenting, merging, mixing, or reducing datasets.

In this talk, I will present some of our recent work that seeks to formalize and automate these and other flavors of dataset manipulation under a unified approach. First, I will introduce the Optimal Transport Dataset Distance, which provides a fundamental theoretical building block: a formal notion of similarity between labeled datasets. In the second part of the talk, I will discuss how this notion of distance can be used to formulate a general framework for dataset optimization by means of gradient flows in probability space. I will end by presenting various exciting potential applications of this dataset optimization framework.
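To make the idea of a transport-based distance between labeled datasets more concrete, here is a minimal sketch in Python. It is not the implementation presented in the talk. The function names (`class_means`, `dataset_ot_cost`), the Sinkhorn-style iteration, the `reg` and `n_iters` parameters, and in particular the label term (squared distance between class means, a crude stand-in for comparing the full distributions of examples sharing a label) are all assumptions made for illustration.

```python
import numpy as np


def class_means(features, labels):
    """Map each label to the mean feature vector of its examples."""
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}


def dataset_ot_cost(xa, ya, xb, yb, reg=0.05, n_iters=500):
    """Entropy-regularized transport cost between two labeled datasets.

    Illustrative assumption: the ground cost between two labeled examples is
    the squared feature distance plus the squared distance between the means
    of their classes.
    """
    # Feature part of the ground cost: pairwise squared Euclidean distances.
    feat = ((xa[:, None, :] - xb[None, :, :]) ** 2).sum(axis=-1)

    # Label part (simplification): distance between per-class mean vectors.
    means_a, means_b = class_means(xa, ya), class_means(xb, yb)
    ca = np.stack([means_a[c] for c in ya])
    cb = np.stack([means_b[c] for c in yb])
    lab = ((ca[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
    cost = feat + lab

    # Uniform weights on the examples of each dataset.
    a = np.full(len(xa), 1.0 / len(xa))
    b = np.full(len(xb), 1.0 / len(xb))

    # Sinkhorn iterations; the regularization is scaled by the largest cost
    # so that the exponentials stay in a safe numerical range.
    scale = cost.max() or 1.0
    kernel = np.exp(-cost / (reg * scale))
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)

    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum()), plan
```

Under these assumptions, a dataset compared against a slightly perturbed copy of itself should yield a much smaller cost than the same dataset compared against a strongly shifted copy. Because of the entropic regularization, the cost between a dataset and itself is small but not exactly zero.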

Learn more about the 2020-2021 Directions in ML: AutoML and Automating Algorithms virtual speaker series: https://aka.ms/diml

Date: November 18, 2020
Speaker: David Alvarez-Melis
Affiliation: Microsoft Research New England

- David Alvarez-Melis, Senior Researcher
Research area
- Artificial intelligence
Research lab
- Microsoft Research Lab - New England
Project
- AutoML
Event
- Directions in ML: AutoML and Automating Algorithms
