Women in Data Science (WiDS) started as a one-day conference organized by Stanford University in 2015 and has since blossomed into a movement that brings together women data scientists and aspiring data scientists through a series of more than 150 virtual and in-person events worldwide, culminating in the March 4, 2019 main event at Stanford. Microsoft is a proud partner of WiDS; in addition to supporting the Datathon through a webinar, Microsoft also provided Xboxes as prizes.
One of the main drivers of engagement is the WiDS Datathon, now in its second year, which kicks off in the weeks preceding the conference, with the winners announced at Stanford during the conference. This year’s Datathon had participants working on a classic image classification problem using computer vision techniques. The challenge is an environmental one: rampant deforestation caused by oil palm production (palm oil is a common ingredient in everyday products) has devastated the habitats of many animal and plant species. One way to get ahead of the problem is to identify where the deforestation is taking place. These are remote regions, and satellite imagery is an effective means of detection and intervention. Planet provided a set of high-resolution satellite images, and Figure8 helped annotate them and created training, testing and holdout datasets for the Datathon. The Datathon has led to workshops in several countries, with participants coming together to form teams to solve the challenge.
Datathon rules allow for teams of up to four people, with the requirement that at least half of each team be female or identify as female. Within weeks, the Datathon attracted over 200 teams. I took a shot at solving the problem using Microsoft Custom Vision, one of the Cognitive Services available on Azure. Using the Custom Vision UI, I was able to build a classifier with a handful of training images within minutes. Extending the classifier to include hundreds of images was easy using the Python SDK for Custom Vision. Such is the power of Cognitive Services in Azure: you can build a powerful, transfer learning-based image classifier with fewer than 100 lines of code. The model improved simply by adding more images from the geo-images training dataset to the existing Custom Vision model, a simple and effective demonstration of how more training data leads to higher model accuracy.
| Training image count | Precision | Recall |
|---|---|---|
| 60 | 79.60% | 79.60% |
| 1,800 | 97.50% | 97.10% |
| 5,000 | 99.60% | 99.10% |
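For readers less familiar with the two metrics in the table, here is a minimal sketch of how precision and recall are conventionally defined for a binary classifier. It is not the Custom Vision implementation, and the post does not say how the figures above were computed; the function name `precision_recall` and the toy labels (1 = oil palm plantation present, 0 = absent) are hypothetical and used only for illustration.

```python
from typing import Sequence, Tuple


def precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (precision, recall) for the class labeled `positive`.

    Hypothetical helper for illustration; not part of any SDK.
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Precision: of everything flagged as positive, how much was correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (assumed): 1 = oil palm plantation present, 0 = absent.
actual = [1, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 1, 0, 1]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

In this toy case, three plantations are correctly flagged, one is missed and one clear image is wrongly flagged, so both metrics come out to 75%. Services that report these metrics typically compute them at a chosen probability threshold, so the exact numbers depend on that setting.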
We hosted a WiDS webinar that covered basic machine learning concepts and a tutorial on the Custom Vision solution. The webinar recording and slides are available for those who missed it.
This democratization of machine learning tools is an important factor in opening up the field of data science to a wide audience of data science students and practitioners. The other factor, especially relevant to attracting women to data science, is the focus on socially relevant datasets and problems, such as this year’s oil palm classification problem.
Data science for social good is an important subfield within the data science community, with efforts such as the annual Workshop on Social Impact at KDD and the Data Science for Social Good Summer Fellowship, started at the University of Chicago and now offered by the University of Washington, the University of British Columbia and other universities. The emphasis on leveraging data for altruistic goals is also evident in computer science departments across higher education that are currently pivoting to data science education. For example, the Data Science program offered at the University of California, Berkeley, based on real datasets, has been a great catalyst in getting women into computing in unprecedented numbers: half the enrolled students are women, in contrast to traditional computer science courses. Greater numbers of women skilled in data science will help to fill the data gap that has created a pervasive but invisible bias with a profound effect on women’s lives.
Beyond data science, AI encompasses a burgeoning set of socially relevant subfields that are of interest to a growing demographic of women technologists and students. These include eliminating bias in AI systems through fairness, accountability and transparency; secure machine learning; privacy; ethics; policy; and domain-specific machine learning.
This year, the WiDS Datathon has resulted in regional Datathon workshops around the globe, for example, the WiDS Data Collaboration Day at UC Berkeley and a meetup at the Microsoft New England Research and Development Center.
Congratulations to all participants! Visit the WiDS Datathon page for the full list of winners. We look forward to continuing our engagement with the growing community of data scientists as they tackle challenges that will have a positive, lasting impact on research and technology.