Machine learning, data mining and rethinking knowledge at KDD 2018

Published September 6, 2018


A group of Microsoft employees attending KDD 2018.

KDD 2018, the 24th ACM Conference on Knowledge Discovery and Data Mining, took place on August 19–23 in the heart of London’s historic Royal Docks in the United Kingdom. KDD is one of the top conferences in machine learning and data mining, bringing together researchers and practitioners from across computer science and a wide range of industry verticals. This year’s KDD was the largest ever, with more than 3,400 participants from 99 countries and 1,588 submissions, and it included a strong showing by Microsoft.

In addition to an impressive program featuring peer-reviewed papers, workshops, hands-on tutorials, Deep Learning Day and Health Day (an entire day dedicated to discussing machine learning trends and addressing challenges in healthcare), attendees were treated to outstanding keynote talks by David Hand, Emeritus Professor of Mathematics at Imperial College London; Nobel Laureate Alvin Roth; Jeannette Wing, Director of Columbia University’s Data Science Institute; and Yee Whye Teh, Professor at the University of Oxford.

Professor Hand focused on data science for financial applications and on the importance of understanding the data in this domain. On the reliability of data, one quote in particular stood out: “If data can speak for themselves, they can also lie for themselves.” He distinguished two types of models: data-driven models, which are based on relationships observed in the data and come with a statistical theory; and theory-driven models, which rest on an underlying theoretical model and, once fitted to the data using statistical ideas, can be used to understand that data. He then drew some general lessons about the limitations of models and the importance of understanding those limitations before the models are applied. Any algorithm will produce a number if you throw data at it, so purely data-driven approaches are fragile. Many financial datasets are non-stationary (their statistical properties change over time), which makes many algorithms unsuitable for these use cases. Thought-provoking stuff.
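To make the non-stationarity point concrete, here is a toy sketch (our own illustration, not an example from the talk). A least-squares line is fitted to one regime of a synthetic, hypothetical series; it fits that regime perfectly, yet it still happily produces predictions after the underlying relationship changes, and those predictions are badly wrong. The data and the helper names `fit_line` and `mse` are assumptions made for illustration.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(xs: List[float], ys: List[float], slope: float, intercept: float) -> float:
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical synthetic series whose x-y relationship flips at x = 10 (a regime change).
train_x = list(range(10))
train_y = [2 * x + 1 for x in train_x]   # regime 1: y = 2x + 1
test_x = list(range(10, 20))
test_y = [-x + 31 for x in test_x]       # regime 2: y = -x + 31

slope, intercept = fit_line(train_x, train_y)
in_sample = mse(train_x, train_y, slope, intercept)       # perfect fit on the old regime
out_of_sample = mse(test_x, test_y, slope, intercept)     # still "a number", but far off
print(slope, intercept, in_sample, out_of_sample)  # 2.0 1.0 0.0 256.5
```

Nothing in the fitted model signals that the second regime is different; only knowledge of the data-generating process (or monitoring for drift) would reveal it.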

Professor Chris Ré of Stanford University talked about Software 2.0 and the Snorkel project. In real-world applications, creating labeled training data by hand is slow and expensive and requires domain expertise. Snorkel aims to rapidly create, model and manage the large training sets that are essential to the success of machine learning models. It takes noisy labeling functions written by users and automatically models the labeling process, learning which labeling functions are more accurate.
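The idea of learning labeling-function accuracies without ground truth can be sketched in a few lines. This is not Snorkel’s API or its actual label model (Snorkel fits a generative model over the labeling functions); it swaps in a much simpler, hypothetical heuristic: each labeling function’s accuracy is estimated as its agreement rate with a weighted vote, and the votes are then re-weighted by the log-odds of those estimates. The label matrix and the names `weighted_vote` and `fit_label_model` are illustrative assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

ABSTAIN = None  # a labeling function "abstains" when it has no opinion on an example
LabelRow = Sequence[Optional[int]]


def weighted_vote(row: LabelRow, weights: Sequence[float]) -> Optional[int]:
    """Combine one example's votes; return ABSTAIN if no vote carries positive weight."""
    scores: Dict[int, float] = {}
    for label, w in zip(row, weights):
        if label is not ABSTAIN:
            scores[label] = scores.get(label, 0.0) + w
    if not scores or max(scores.values()) <= 0.0:
        return ABSTAIN
    return max(scores, key=scores.__getitem__)


def fit_label_model(
    L: List[LabelRow], n_iter: int = 10, max_acc: float = 0.95
) -> Tuple[List[Optional[int]], List[float]]:
    """Heuristically estimate labeling-function accuracies without ground-truth labels.

    L[i][j] is the vote of labeling function j on example i (or ABSTAIN).
    Returns (estimated labels, estimated accuracy of each labeling function).
    """
    m = len(L[0])
    weights = [1.0] * m  # the first round is a plain majority vote
    accs = [0.5] * m
    for _ in range(n_iter):
        labels = [weighted_vote(row, weights) for row in L]
        for j in range(m):
            pairs = [(row[j], y) for row, y in zip(L, labels)
                     if row[j] is not ABSTAIN and y is not ABSTAIN]
            # Accuracy proxy: how often function j agrees with the current consensus.
            accs[j] = sum(v == y for v, y in pairs) / len(pairs) if pairs else 0.5
            # Log-odds weight: worse-than-chance functions are clipped to weight 0,
            # and perfect agreement is capped at max_acc so the weight stays finite.
            p = min(max(accs[j], 0.5), max_acc)
            weights[j] = math.log(p / (1.0 - p))
    labels = [weighted_vote(row, weights) for row in L]  # labels under the final weights
    return labels, accs


# Hypothetical label matrix: rows are examples, columns are three labeling functions
# (1 = positive, 0 = negative, ABSTAIN = no vote). The third function is noisy.
L = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
    [1, ABSTAIN, ABSTAIN],
    [ABSTAIN, 0, ABSTAIN],
]
labels, accs = fit_label_model(L)
print(labels)  # [1, 1, 0, 0, 1, 0]
print(accs)    # [1.0, 1.0, 0.25]
```

On this toy matrix the first two labeling functions always agree with the consensus, while the third agrees only a quarter of the time, so its weight drops to zero and it stops influencing the labels. Because agreement is measured against a consensus that each function itself helped form, this heuristic can be biased when few functions vote on an example.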

Microsoft had a strong and dynamic presence at the conference, with multiple oral and poster presentations, tutorials and workshops. Joseph Sirosh, Corporate Vice President and CTO for AI, gave a well-attended invited talk titled “Planet-scale Land Cover Classification with FPGAs”, in which he demonstrated how Azure Machine Learning and Project Brainwave can classify terabytes of aerial land-cover imagery using deep neural networks (DNNs) and tackle use cases such as recognizing wildlife poachers.

For the full list of Microsoft’s contributions at KDD 2018, check out https://www.microsoft.com/en-us/research/event/kdd-2018/ and be sure to watch some of the videos if you were unable to attend!