Measuring Rhetoric: Statistical Language Models in Social Science

Social scientists are embracing the idea of using ‘text as data’ as a way to quantify and evaluate social theories. I’ll discuss a brief history of how this strategy has worked and evolved, and pitch some new approaches for combining social measurement with state-of-the-art natural language processing. We’ll focus on the massive multinomial regression models that serve as a basis for text analysis, and on the distributed computing strategies that allow inference on truly Big Data. I’ll then work through a number of examples of social science questions being asked and answered via statistical NLP, with data from online reviews on Yelp, the US congressional record, and communications between buyers and sellers on eBay.
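As a rough illustration of what a multinomial regression for text looks like, here is a minimal sketch (not the speaker's own implementation). It assumes each document is a vector of word counts over a fixed vocabulary, and that word probabilities depend on a single document-level covariate `v` (for example, a review's star rating) through a softmax link. The function names `word_probs` and `multinomial_loglik` and the per-word parameters `alpha` (intercepts) and `phi` (loadings) are hypothetical choices made for this example. Model fitting, regularization, and the distributed computation mentioned in the abstract are not shown.

```python
import math

def word_probs(alpha, phi, v):
    """Multinomial word probabilities for a document with covariate v.

    Hypothetical parameterization: q_j = exp(alpha_j + phi_j * v) / sum_k exp(alpha_k + phi_k * v).
    """
    eta = [a + p * v for a, p in zip(alpha, phi)]
    m = max(eta)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(e - m) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]

def multinomial_loglik(counts, alpha, phi, v):
    """Log-likelihood of one document's count vector, dropping the multinomial coefficient."""
    q = word_probs(alpha, phi, v)
    return sum(c * math.log(qj) for c, qj in zip(counts, q) if c > 0)

# Usage: a three-word vocabulary where word 0 loads positively on v and word 2 negatively.
alpha = [0.0, 0.0, 0.0]
phi = [1.0, 0.0, -1.0]
print(word_probs(alpha, phi, 0.0))  # equal probabilities when v = 0
print(word_probs(alpha, phi, 2.0))  # mass shifts toward word 0 as v grows
print(multinomial_loglik([2, 1, 0], alpha, phi, 2.0))
```

With thousands of words in the vocabulary and one set of parameters per word, the number of coefficients grows quickly, which is why models of this kind become "massive" on real text corpora.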

Speaker Details

Matt Taddy is Associate Professor of Econometrics and Statistics at the University of Chicago Booth School of Business. His research is focused on statistical methodology and data mining, driven by applications in business and engineering. He developed and teaches the MBA ‘Big Data’ course at Chicago Booth.

Taddy works on building robust solutions for large-scale data analysis problems at the interface of econometrics and machine learning. This involves dimension reduction techniques for massive datasets and the development of models for inference on the output of these algorithms. He has collaborated both with small start-ups and with large research agencies, including NASA Ames and the Lawrence Livermore, Sandia, and Los Alamos National Laboratories. He is also a scientist at eBay Research Labs.

Taddy earned his PhD in Applied Math and Statistics from the University of California, Santa Cruz, in 2008; he also holds a BA in Philosophy and Mathematics and an MSc in Mathematical Statistics from McGill University. He joined the Chicago Booth faculty in 2008.

Date:
Speaker:
Matt Taddy
Affiliation:
University of Chicago