Big Data and Bayesian Nonparametrics

Big Data is often characterized by large sample sizes, high dimensions, and strange variable distributions. For example, an e-commerce website records tens to hundreds of millions of observations each week on a huge number of variables, with density spikes at zero and elsewhere and very fat tails. These properties, big and strange, beg for nonparametric analysis. We revisit a flavor of distribution-free Bayesian nonparametrics that approximates the data generating process (DGP) with a multinomial sampling model. This model then serves as the basis for analysis of statistics (functionals of the DGP) that are useful for decision making regardless of the true DGP. The ideas will be illustrated in the indexing of treatment effect heterogeneity onto user characteristics in digital experiments, and in the analysis of decision trees used for fraud prediction. The result is a framework for scalable nonparametric Bayesian decision making on massive data.
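To make the multinomial idea concrete, here is a minimal sketch in Python. It assumes one common construction of this kind, the Bayesian bootstrap, in which the DGP is treated as a multinomial over the observed data points and posterior uncertainty is represented by Dirichlet(1, ..., 1) weights on those points. The abstract does not name this construction, so it is an assumption here; the function names (`bayesian_bootstrap`, `weighted_mean`) and the synthetic `spend` data are hypothetical and chosen only for illustration.

```python
import numpy as np


def bayesian_bootstrap(x, statistic, draws=1000, seed=0):
    """Return posterior draws of statistic(x, w) under Dirichlet(1,...,1) weights w.

    Assumption: the DGP is approximated by a multinomial over the observed
    points in x, with random probability weights attached to each point.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Independent Exponential(1) draws, normalized to sum to one,
    # are distributed as Dirichlet(1, ..., 1).
    w = rng.exponential(1.0, size=(draws, n))
    w /= w.sum(axis=1, keepdims=True)
    return np.array([statistic(x, wi) for wi in w])


def weighted_mean(x, w):
    """The mean, written as a functional of the weighted empirical distribution."""
    return float(np.dot(w, x))


# Hypothetical data: a spike at zero plus a heavy right tail.
rng = np.random.default_rng(1)
n_obs = 5000
is_zero = rng.random(n_obs) < 0.8
spend = np.where(is_zero, 0.0, 10.0 * rng.pareto(3.0, n_obs))

posterior = bayesian_bootstrap(spend, weighted_mean, draws=2000)
lo, hi = np.percentile(posterior, [2.5, 97.5])
print(f"sample mean: {spend.mean():.3f}")
print(f"95% posterior interval for the mean: [{lo:.3f}, {hi:.3f}]")
```

Any other statistic that can be written as a function of the data and the weights (a weighted regression coefficient, for example) could be passed in place of `weighted_mean`. No parametric form is imposed on the spike at zero or on the tail.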

Speaker Details

Matt Taddy is Associate Professor of Econometrics and Statistics at the University of Chicago Booth School of Business. His research is focused on statistical methodology and data mining, driven by applications in business and engineering. He developed and teaches the MBA ‘Big Data’ course at Chicago Booth. Taddy works on building robust solutions for large-scale data analysis problems at the interface of econometrics and machine learning. This involves dimension reduction techniques for massive datasets and the development of models for inference on the output of these algorithms. He has collaborated with both small start-ups and large research agencies, including NASA Ames and the Lawrence Livermore, Sandia, and Los Alamos National Laboratories, and is a research fellow at eBay. Taddy earned his PhD in Applied Math and Statistics in 2008 from the University of California, Santa Cruz, as well as a BA in Philosophy and Mathematics and an MSc in Mathematical Statistics from McGill University. He joined the Chicago Booth faculty in 2008.

Date:
Speaker:
Matt Taddy
Affiliation:
University of Chicago