The Microsoft Academic Graph makes it possible to gain analytic insights about any of the entities within it: publications, authors, institutions, topics, journals, and conferences. In this series, we present analytic insights about current conferences, which we hope will help you prepare for attending each event. All the insights within are derived from the Microsoft Academic Graph and visualized in Microsoft Power BI. You can generate your own insights by accessing the Microsoft Academic Graph through the Academic Knowledge API or through Azure Data Lake Storage (please contact us for the latter option). If you would like to learn how we generated the insights below, please see the repository with source code.
In this post, we present a historical trend analysis of KDD (Knowledge Discovery and Data Mining), taking place in London, United Kingdom, August 19-23, 2018. We derive insights from 1995 to the latest available year.
Click on each image for current trends and data.
KDD paper output
The chart below shows the evolution of the number of conference papers for each conference year.
In the following chart, the black bars represent the average number of references per conference paper for each year. The data show that recent publications tend to cite more references. The green bars show the average number of citations received by conference papers published in a given year. Note that the citations are raw counts and not normalized by the age of publications. This is because the “correct” way to normalize the citation counts turns out to be a non-trivial problem and may well be application-dependent. Please treat the raw data presented as an invitation to conduct research on this topic!
A visible trend is that older publications tend to receive more citations because they have more time to receive recognition. There is, however, a notable exception in 1996 due to two highly cited papers:
- Ester, Martin, et al. “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.” KDD, 1996, pp. 226–231.
- Heckerman, David, et al. “Learning Bayesian Networks: The Combination of Knowledge and Statistical Data.” Knowledge Discovery and Data Mining, vol. 20, no. 3, 1994, pp. 197–243.
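For readers who want to reproduce the two per-year averages described above, here is a minimal Python sketch. It assumes you have already exported one record per conference paper as a `(year, reference_count, citation_count)` tuple; that record layout, the function name `yearly_averages`, and the sample numbers are all hypothetical and are not taken from the Microsoft Academic Graph schema or from our source code. As in the chart, the citation averages are raw and not normalized by publication age.

```python
from collections import defaultdict
from statistics import mean


def yearly_averages(papers):
    """Return {year: (avg references per paper, avg citations per paper)}.

    `papers` is an iterable of (year, reference_count, citation_count)
    tuples. This layout is an assumption made for illustration.
    """
    refs_by_year = defaultdict(list)
    cites_by_year = defaultdict(list)
    for year, reference_count, citation_count in papers:
        refs_by_year[year].append(reference_count)
        cites_by_year[year].append(citation_count)
    # Raw averages only: no normalization by the age of the publication.
    return {
        year: (mean(refs_by_year[year]), mean(cites_by_year[year]))
        for year in sorted(refs_by_year)
    }


# Made-up sample data, not real KDD numbers.
sample_papers = [
    (1996, 10, 500),
    (1996, 20, 100),
    (2017, 40, 3),
    (2017, 50, 5),
]
averages = yearly_averages(sample_papers)
```

A single very highly cited paper can pull a year's average far above its neighbors, which is the effect described for 1996.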
Memory of references
How old are papers cited by KDD papers? Follow a given year’s column to see the age of papers cited in conference papers published that year. For example, in 2017, KDD papers collectively cited 657 papers published in 2016, 709 papers published in 2015, and so on.
*If some years appear to cite publications from the future, it is most likely because they cited books. When a new edition of the book appears, it replaces the previous one in the Microsoft Academic Graph, and the citation appears to be from the future. In this representation, we remove all instances of papers citing papers more than two years in the future so that we can generate a cleaner view.
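The counting and the future-citation filter described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code behind the chart: the input is assumed to be one `(citing_year, cited_year)` pair per reference, the function name `reference_age_matrix` is hypothetical, and the sketch counts references rather than distinct cited papers.

```python
from collections import Counter


def reference_age_matrix(pairs, max_future_years=2):
    """Count references per (citing_year, cited_year) cell.

    `pairs` is an iterable of (citing_year, cited_year) tuples, one per
    reference (an assumed layout). References that point more than
    `max_future_years` into the future are dropped.
    """
    counts = Counter()
    for citing_year, cited_year in pairs:
        # Skip references more than two years in the future, which are
        # most likely books whose newer edition replaced the cited one.
        if cited_year - citing_year > max_future_years:
            continue
        counts[(citing_year, cited_year)] += 1
    return counts


# Made-up sample data.
sample_pairs = [
    (2017, 2016),
    (2017, 2016),
    (2017, 2015),
    (2017, 2019),  # two years ahead: kept
    (2017, 2020),  # three years ahead: removed
]
matrix = reference_age_matrix(sample_pairs)
```

Reading `matrix[(2017, 2016)]` then corresponds to following the 2017 column down to the 2016 row in the chart.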
Outgoing references
What venues do KDD papers cite?
The bar chart shows the top 10 venues cited by KDD papers. KDD, ICML, and NIPS emerge as the top 3.
The 100 percent stacked bar chart below shows the percent of references given by KDD papers to each of the top 10 venues, by year.
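A 100 percent stacked bar chart of this kind rescales each year's counts so that the bars for that year sum to 100. The Python sketch below shows one way to compute those shares. It assumes the references have already been restricted to the top 10 venues and are available as `(year, venue)` pairs; the function name `venue_share_by_year` and the sample data are hypothetical.

```python
from collections import defaultdict


def venue_share_by_year(references):
    """Return {year: {venue: percent of that year's references}}.

    `references` is an iterable of (year, venue) tuples, one per
    reference, assumed to be pre-filtered to the top 10 venues.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for year, venue in references:
        counts[year][venue] += 1

    shares = {}
    for year, per_venue in counts.items():
        total = sum(per_venue.values())
        # Each year's shares sum to 100, which is what makes the
        # stacked bars the same height.
        shares[year] = {
            venue: 100.0 * n / total for venue, n in per_venue.items()
        }
    return shares


# Made-up sample data.
sample_references = [
    (2017, "KDD"), (2017, "KDD"), (2017, "ICML"), (2017, "NIPS"),
    (2016, "KDD"), (2016, "ICML"),
]
shares = venue_share_by_year(sample_references)
```

Because each year is normalized on its own, the chart shows how the mix of cited venues shifts over time, not how the total number of references grows.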
Incoming citations
What venues cite KDD papers?
The bar chart below shows the top 10 venues that cite KDD papers. Again, KDD is at the top, followed by ICDM and CIKM. See the table for year-by-year details of citations coming from each of the top 10 venues.
The 100 percent stacked bar chart below shows the citation distribution from the top 10 citing venues, by year.
Most-cited authors
Who are the most-cited authors of all time by KDD papers? The chart below ranks the most-cited authors by using the number of publications cited by the conference and the number of citations received from the conference. Authors do not need to have published in KDD to appear on this chart.
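As a rough illustration of the two measures mentioned above, the following Python sketch computes, for each author, the number of distinct publications cited by the conference and the total number of citations received from it. The input layout (one `(cited_paper_id, author)` row per citation and per author of the cited paper), the function name `rank_cited_authors`, and the sort order (citations first, then cited publications) are all assumptions made for this example; the chart may combine the two measures differently.

```python
from collections import defaultdict


def rank_cited_authors(rows):
    """Return [(author, cited_publications, citations)], best first.

    `rows` is an iterable of (cited_paper_id, author) tuples, one per
    citation from a conference paper (an assumed layout).
    """
    cited_papers = defaultdict(set)
    citation_counts = defaultdict(int)
    for cited_paper_id, author in rows:
        cited_papers[author].add(cited_paper_id)
        citation_counts[author] += 1

    ranking = [
        (author, len(cited_papers[author]), citation_counts[author])
        for author in cited_papers
    ]
    # Assumed ordering: most citations first, then most cited
    # publications, then author name to break remaining ties.
    ranking.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return ranking


# Made-up sample data with placeholder author names.
sample_rows = [
    ("p1", "Author A"), ("p1", "Author A"), ("p2", "Author A"),
    ("p3", "Author B"), ("p3", "Author B"),
    ("p3", "Author B"), ("p3", "Author B"),
    ("p4", "Author C"),
]
ranking = rank_cited_authors(sample_rows)
```

Nothing in this computation requires the cited publication to be a KDD paper, which is why authors who never published at KDD can still appear in the ranking.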
Who are the rising stars among the top cited authors in KDD? The area chart below shows the number of KDD citations received by the four most-cited authors, by year.
Top institutions
The bubble chart visualizes the top institutions at KDD by citation count. The size of the bubble is proportional to the total number of publications from that institution at KDD.
See the most current data and explore the top institutions at the conference in more detail by clicking the chart. Once you are at the underlying Microsoft Power BI report, click on a column to rank the top institutions by publication or citation count.
Top authors
The next three charts show author rankings, according to different criteria.
The bubble chart displays KDD authors ranked by citation count, with bubble size being relative to publication count.
See the most current data and explore the top authors at the conference in more detail by clicking the chart. Once you are at the underlying Microsoft Power BI report, click on a column to rank the top authors by Microsoft Academic rank, publication count, or citation count.
The bubble chart below visualizes author rank, which is calculated by Microsoft Academic by using a formula that is less susceptible to citation counts than similar measures. The X axis shows author rank. The higher an author’s rank, the closer they are to the right side. The Y axis normalizes the rank by publication count and enables us to identify impactful authors who might not have had a very large number of publications. The closer an author is to the top, the higher their normalized rank. The area of the chart that represents the highest rank is the top right corner.
We hope you have enjoyed the analytic insights into this conference made possible by the Microsoft Academic Graph! Please visit Microsoft Academic Graph to learn how you can use our knowledge graph to generate your own custom analytics about an institution, a topic, an author, a publication venue, or any combination of these.
As always, we would like to hear from you either through the feedback link at the bottom right of the website or on Twitter. You can also find our project home page with this blog on the Microsoft Research site at aka.ms/msracad.