Code Hunt Data Set Release 1 on GitHub
We are delighted to announce the first of the data set releases from the Code Hunt database on GitHub. The data is available for download, along with some helpful tools for analyzing it.
Code Hunt is a serious educational game that has been played by over 140,000 students and enthusiasts over the past year. In the process we have collected over 1.5M programs, which we can link to specific users at specific levels of expertise. We hope that you will embark on research into the data, discovering how coders code and how technology can be used to make the process more accurate and less painful. Although how students code has been studied before, Microsoft Research is offering a unique opportunity to conduct this research on large, common data sets.
This data set contains programs written only by students, from around the world, during a 48-hour contest. It comprises approximately 250 users, 24 puzzles, and about 13,000 programs. The readme file describes the data format. The summary program reports statistics on the number of tries per puzzle, as well as the calculations we performed, which were described in the workshop and in our ICSE 2015 paper. See more on our Code Hunt Research page.
If you download the data, please fill in the survey.