The Microsoft Academic Graph makes it possible to gain analytic insights about any of the entities within it: publications, authors, institutions, topics, journals, and conferences. In this series, we present analytic insights about current conferences, which we hope will help you prepare for attending each event. All the insights within are derived from the Microsoft Academic Graph and visualized in Microsoft Power BI. You can generate your own insights by accessing the Microsoft Academic Graph through the Academic Knowledge API or through Azure Data Lake Store (please contact us for the latter option). If you would like to learn how we generated the insights below, please see the repository with source code.
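If you want to try the API route, the minimal sketch below queries the Academic Knowledge API's evaluate endpoint for recent SIGIR papers. The query expression and attribute names follow the API's documented conventions; the subscription key is a placeholder that you must replace with your own.

```python
# Minimal sketch: fetch recent SIGIR papers from the Academic Knowledge API.
import requests

API_URL = "https://westus.api.cognitive.microsoft.com/academic/v1.0/evaluate"
SUBSCRIPTION_KEY = "<your-subscription-key>"  # placeholder; use your own key

params = {
    "expr": "Composite(C.CN=='sigir')",  # papers in the SIGIR conference series
    "attributes": "Ti,Y,CC",             # title, year, citation count
    "orderby": "Y:desc",                 # newest first
    "count": 10,
}
response = requests.get(
    API_URL,
    params=params,
    headers={"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY},
)
response.raise_for_status()
for paper in response.json()["entities"]:
    print(paper["Y"], paper["CC"], paper["Ti"])
```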
In this post, we present historical trend analysis about the conference SIGIR – Special Interest Group on Information Retrieval, taking place in Ann Arbor, Michigan, US, from July 8-12, 2018.
Click on each image for current trends and data hosted by Microsoft Academic Graph.
SIGIR paper output
The chart below shows how the number of papers published at the conference has evolved year by year.
In the following chart, the black bars represent the average number of references per conference paper for each year. The data show that recent publications tend to cite more references. The green bars show the average number of citations received by conference papers written in a given year. Note that the citations are raw counts, not normalized by the age of the publications; the “correct” way to normalize citation counts turns out to be a nontrivial problem and may well be application-dependent. Please treat the raw data presented as an invitation to conduct research on this topic!
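To make the aggregation behind this chart concrete, here is a small pandas sketch that computes the average number of references and raw citations per paper, by year, over toy paper records; the column names are illustrative stand-ins, not the actual Microsoft Academic Graph schema.

```python
import pandas as pd

# Toy stand-in for per-paper records extracted from the graph.
papers = pd.DataFrame({
    "Year":           [1994, 1994, 2017, 2017, 2017],
    "ReferenceCount": [12,   15,   30,   28,   35],
    "CitationCount":  [420,  310,  9,    4,    12],
})

# Black bars: average references per paper; green bars: average raw citations.
per_year = (
    papers.groupby("Year")[["ReferenceCount", "CitationCount"]]
          .mean()
          .rename(columns={"ReferenceCount": "AvgReferencesPerPaper",
                           "CitationCount": "AvgRawCitationsPerPaper"})
)
print(per_year)
```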
That being said, a visible trend is that older publications tend to receive more citations, simply because researchers have had more time to recognize their contributions. There are, however, notable exceptions. The first, in 1994, is due to several highly cited papers:
- David D. Lewis and William A. Gale, “A sequential algorithm for training text classifiers.”
- Stephen E. Robertson and Steve Walker, “Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval.”
- Ellen M. Voorhees, “Query expansion using lexical-semantic relations.”
The second exception, in 1998 and 1999, coincides with the introduction of language modeling techniques for information retrieval, which led quite a few papers to be highly cited in the ensuing years. However, in 2000, when the concept of discounted cumulative gain (DCG) was first proposed, most citations of the work went to the journal version published in TOIS two years later. That may explain the sharp decline in the citation counts of SIGIR 2000 relative to adjacent years.
Memory of references
How old are the papers cited by SIGIR papers? Follow a given year’s column to see the age of papers cited in conference papers published that year. For example, in 2017, SIGIR papers collectively cited 683 papers published in 2016, 657 papers published in 2015, and so on.
*If some years appear to cite publications from the future, it is most likely because they cited books. When a new edition of a book appears, it replaces the previous one in the Microsoft Academic Graph, making the citation appear to come from the future. To generate a cleaner view, we removed from this representation all instances of papers citing papers published more than two years in the future.
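The underlying computation is a cross-tabulation of citing year against cited year, with the future-citation filter applied first. A minimal pandas sketch over toy citation edges (column names are illustrative):

```python
import pandas as pd

# Toy citation edges: the year of each citing SIGIR paper and the
# publication year of the paper it references.
refs = pd.DataFrame({
    "CitingYear": [2017, 2017, 2017, 2016, 2016, 2016],
    "CitedYear":  [2016, 2016, 2015, 2015, 2014, 2019],
})

# Drop apparent citations of papers more than two years in the future,
# which mostly stem from newer book editions replacing older ones.
clean = refs[refs["CitedYear"] <= refs["CitingYear"] + 2]

# Rows: publication year of the cited paper; columns: conference year.
memory = pd.crosstab(clean["CitedYear"], clean["CitingYear"])
print(memory)
```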
Outgoing references
What venues do SIGIR papers cite?
The pie chart shows the top 10 venues cited by SIGIR papers over time. SIGIR, CIKM, and WWW emerge as the top three.
The 100 percent stacked bar chart below shows the percent of references given by SIGIR papers to each of the top 20 venues, year by year.
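Both views reduce to counting references by cited venue: overall for the pie chart, and normalized within each citing year for the stacked bars. A sketch on toy data follows; the incoming-citation charts in the next section use the same aggregation with the citing and cited roles swapped.

```python
import pandas as pd

# Toy outgoing references: the venue of each paper cited by a SIGIR paper,
# together with the citing paper's year (names are illustrative).
out_refs = pd.DataFrame({
    "CitingYear": [2016, 2016, 2016, 2017, 2017, 2017, 2017],
    "CitedVenue": ["SIGIR", "CIKM", "SIGIR", "WWW", "SIGIR", "CIKM", "WWW"],
})

# Top cited venues overall (the pie chart).
print(out_refs["CitedVenue"].value_counts())

# Percent of references going to each venue, per year (the stacked bars).
shares = pd.crosstab(out_refs["CitingYear"], out_refs["CitedVenue"],
                     normalize="index") * 100
print(shares.round(1))
```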
Incoming citations
What venues cite SIGIR papers?
The pie chart below shows the top 10 venues of all time that cite SIGIR papers. SIGIR is the top one, followed by CIKM and Information Processing and Management. See the table for year-by-year details of citations coming from each of the top 10 venues.
The 100 percent stacked bar chart below shows the citation distribution from the top 20 citing venues, year by year.
Most-cited authors
Who are the most-cited authors of all time in SIGIR papers? The interactive chart below ranks the most-cited authors by the number of their publications cited by the conference and the number of citations received from it. Authors do not need to have published at SIGIR to appear on this chart.
Who are the rising stars among the top cited authors in SIGIR? The 100 percent stacked bar chart below shows the citation distribution by the top 20 authors, year by year.
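These rankings boil down to two aggregates over the references made by SIGIR papers: the number of distinct publications cited per author and the total citations each author receives. A minimal sketch on toy data, with one row per citation event and cited-paper author (all names and columns are illustrative):

```python
import pandas as pd

# Toy data: paper 101 (by Authors A and B) is cited twice by SIGIR papers;
# paper 102 (by Author C) is cited once.
cited = pd.DataFrame({
    "CitedPaperId": [101, 101, 101, 101, 102],
    "Author": ["Author A", "Author B", "Author A", "Author B", "Author C"],
})

ranking = (
    cited.groupby("Author")
         .agg(PublicationsCited=("CitedPaperId", "nunique"),
              CitationsReceived=("CitedPaperId", "count"))
         .sort_values("CitationsReceived", ascending=False)
)
print(ranking)
```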
Top institutions
The bubble chart visualizes the top institutions at SIGIR by citation count. The size of the bubble is proportional to the total number of publications from that institution at SIGIR.
Get the most current data and explore the top institutions at the conference in more detail by clicking the chart. Once on the underlying Microsoft Power BI dashboard, click on a column to rank the top institutions by publication or citation count.
Top authors
The next three charts show author rankings according to different criteria.
The bubble chart displays SIGIR authors ranked by citation count, with bubble size proportional to publication count.
Get the most current data by clicking the chart. Once on the underlying Microsoft Power BI dashboard, you can explore the top conference authors in more detail: click on a column to rank the top authors by Microsoft Academic rank, publication count, or citation count.
The bubble chart below visualizes author rank, which Microsoft Academic calculates using a formula that is less susceptible to citation counts than similar measures. The X axis shows author rank; the higher an author’s rank, the closer they are to the right side. The Y axis normalizes the rank by publication count, which helps identify impactful authors who might not have a very large number of publications; the closer an author is to the top, the higher their normalized rank. The area of the chart representing the highest rank is, of course, the top right corner.
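The post does not spell out the exact normalization, so the sketch below assumes the simplest reading: dividing an author's rank by their publication count. Treat both the column names and the formula as illustrative assumptions.

```python
import pandas as pd

# Toy author statistics; Rank stands in for the Microsoft Academic author
# rank (higher is better in the chart's framing), PubCount for SIGIR papers.
authors = pd.DataFrame({
    "Author":   ["Author A", "Author B", "Author C"],
    "Rank":     [9500, 9200, 8800],
    "PubCount": [40, 12, 25],
})

# A high rank achieved with few publications yields a high normalized rank,
# surfacing impactful authors without large publication counts.
authors["NormalizedRank"] = authors["Rank"] / authors["PubCount"]
print(authors.sort_values("NormalizedRank", ascending=False))
```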
Stephen Robertson is an interesting case. Although he is one of the most influential authors in the information retrieval field, he is ranked only 19th for the SIGIR conference. It turns out that Stephen’s best-known work was not published at SIGIR. BM25F was published at CIKM in 2004 [1] and then in a monograph in 2009 [2]. He earned his fame mostly from Okapi, published at TREC from 1994 [3] through 1999 [4]. His most-cited works at SIGIR are an approximation to the 2-Poisson model [5] and a paper with the Bing team on pseudo-relevance feedback [6], a technique that is no longer in production. He also co-authored a paper questioning the use of language modeling techniques for IR [7]; contrary to his predictions, those techniques have prevailed to this day.
1. Robertson, Stephen E., et al. “Simple BM25 Extension to Multiple Weighted Fields.” Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, 2004, pp. 42–49.
2. Robertson, Stephen E., and Hugo Zaragoza. “The Probabilistic Relevance Framework: BM25 and Beyond.” Foundations and Trends in Information Retrieval, vol. 3, no. 4, 2009, pp. 333–389.
3. Robertson, Stephen E., et al. “Okapi at TREC.” Overview of the Third Text REtrieval Conference, no. 500207, 1994, pp. 109–123.
4. Robertson, Stephen E., and Steve Walker. “Okapi/Keenbow at TREC-8.” TREC, 1999.
5. Robertson, Stephen E., and Steve Walker. “Some Simple Effective Approximations to the 2-Poisson Model for Probabilistic Weighted Retrieval.” Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1994, pp. 232–241.
6. Cao, Guihong, et al. “Selecting Good Expansion Terms for Pseudo-Relevance Feedback.” Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008, pp. 243–250.
7. Allan, James, et al. “Challenges in Information Retrieval and Language Modeling: Report of a Workshop Held at the Center for Intelligent Information Retrieval, University of Massachusetts Amherst, September 2002.” ACM SIGIR Forum, vol. 37, no. 1, 2003, pp. 31–47.
We hope you have enjoyed the analytic insights into this conference made possible by the Microsoft Academic Graph! Please visit our Microsoft Academic Graph page to learn how you can use our knowledge graph to generate your own custom analytics about an institution, a topic, an author, a publication venue, or any combination of these.
As always, we would like to hear from you either through the feedback link at the bottom right of the website or on Twitter.