We have selected a set of 1,040 sentences from five Sherlock Holmes novels by Sir Arthur Conan Doyle. In each sentence, an infrequent word is chosen as the focus of the question. Four alternates for each word were chosen by hand from a list of thirty possibilities suggested by an N-gram language model. Both training and test data are available and are described in a companion technical report.
The MSR sentence completion challenge is intended to stimulate research in the area of semantic modeling. The challenge set consists of fill-in-the-blank questions similar to those found on the widely used Scholastic Aptitude Test. The sentence completion questions we focus on test the student's ability to select words that are meaningful and coherent in the context of a complete sentence. In general, this determination cannot be made on the basis of grammatical correctness alone.
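
To make the task format concrete, the following is a minimal sketch of how a system might answer one such question: each of the five candidate words is substituted into the sentence, and the completed sentence that scores highest under a language model is selected. The add-one-smoothed bigram model, the `_____` blank marker, the function names, and the toy sentences are all illustrative assumptions; they are not the challenge's actual data format, nor the N-gram model used to propose the alternates.

```python
import math
from collections import Counter

BLANK = "_____"  # hypothetical blank marker, not the official data format


def train_bigram(sentences):
    """Count unigram contexts and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    vocab_size = len(set(contexts) | {"</s>"})
    return contexts, bigrams, vocab_size


def log_prob(sentence, contexts, bigrams, vocab_size):
    """Add-one-smoothed bigram log-probability of a complete sentence."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def answer(template, candidates, model):
    """Return the candidate whose completed sentence scores highest."""
    return max(candidates, key=lambda w: log_prob(template.replace(BLANK, w), *model))


# Invented toy training text and question, for illustration only.
model = train_bigram([
    "the detective examined the footprint",
    "the detective lit his pipe",
])
best = answer(
    "the detective _____ the footprint",
    ["examined", "pipe", "his", "lit", "footprint"],
    model,
)
print(best)
```

A purely local model of this kind can only reward word sequences it has already seen, which is the limitation the challenge is meant to expose: choosing among alternates that are all locally plausible generally requires some model of sentence-level meaning.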