Sergei Ivanov

Summary

This article presents an analysis of the authors, organizations, and countries that participated in ICML 2020, one of the most important conferences in Machine Learning.

Abstract

ICML 2020 accepted 1088 papers out of 4990 submissions, a 21.8% acceptance rate. The analysis highlights the top authors, organizations, and countries contributing to the conference. Masashi Sugiyama from RIKEN and the University of Tokyo had 11 accepted papers. Google dominates the list of top organizations, participating in approximately 1/10 of the papers published at ICML. The USA participated in 728 accepted papers, roughly two thirds of all papers. The analysis also explores collaborations between organizations, building a graph with 426 nodes and 1206 edges.

Opinions

  • Publishing at ICML is incredibly hard, making it even more impressive for authors who have multiple accepted papers.
  • Google dominates the list of top organizations, but the author warns that it's not fair to say that Google and DeepMind published 114+51 papers, as many of these papers were done in collaboration.
  • The USA has a huge number of organizations, in both industry and academia, with a significant number of papers, while the UK's performance is led by DeepMind, followed by universities.
  • Most top-publishing organizations are universities, except for the USA and China.
  • The author notes that creating a mapping for all possible affiliations is a nightmare due to abbreviations, typos, and different names for the same institutions, so the provided mapping may not be perfect.
  • The author encourages readers to play with the code on GitHub and a Colab notebook for further exploration and analysis.
  • The author shares additional resources and publications on Medium and invites interested readers to explore career opportunities in Product, Research & Development at Criteo.

ICML 2020. Comprehensive analysis of authors, organizations, and countries.

ICML is one of the most important conferences in Machine Learning, so it’s interesting to see who publishes there. I looked at the accepted papers for ICML 2020 and analyzed the authors, organizations, and countries that participated this year. The conference will take place virtually from 13 to 18 July 2020.

This year there are 1088 accepted papers out of 4990 submissions, a 21.8% acceptance rate.

Before we dive in: the code can be found in the GitHub repo, and you can build your own plots in the Colab notebook (no installation required).

Authors

Let’s first take a look at the top authors.

Publishing at ICML is incredibly hard, so it’s even more impressive to see that so many authors published several papers. Masashi Sugiyama from RIKEN and the University of Tokyo has an astonishing 11 accepted papers. He is followed by Michal Valko (DeepMind), Michael Jordan (UC Berkeley), and Dale Schuurmans (Google / U. of Alberta).

Organizations

Let’s now look at the global ranking by organization. For each organization, I count the set of all papers it participated in. Here are the top 30 organizations.

Google dominates the list, participating in approximately 1/10 of the papers published at ICML. It is followed by three universities: MIT, Stanford, and Berkeley. Alphabet’s DeepMind rounds out the top 5. One note of caution: it’s not fair to say that Google and DeepMind together published 114+51 papers, because many of these papers were done in collaboration, as we will see next.
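To make the counting rule and the note of caution concrete, here is a minimal sketch. It is not the code from the repo: the toy `papers` dictionary and all variable names are made up for illustration. It counts each paper once per organization and shows why adding two organizations' counts overstates their joint output.

```python
from collections import defaultdict

# Hypothetical toy data: paper id -> organizations of its authors.
papers = {
    "p1": ["Google", "DeepMind"],
    "p2": ["Google", "MIT"],
    "p3": ["DeepMind"],
    "p4": ["Google", "Google", "Stanford"],  # two Google authors, one paper
}

# For each organization, collect the *set* of papers it participated in,
# so a paper with several authors from one organization is counted once.
papers_by_org = defaultdict(set)
for paper_id, orgs in papers.items():
    for org in orgs:
        papers_by_org[org].add(paper_id)

counts = {org: len(ids) for org, ids in papers_by_org.items()}

# Ranking by number of papers (ties broken alphabetically).
ranking = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Summing the two counts double-counts joint papers; the set union does not.
naive_sum = counts["Google"] + counts["DeepMind"]
joint = len(papers_by_org["Google"] | papers_by_org["DeepMind"])
print(ranking[0], naive_sum, joint)
```

In this toy data the naive sum is larger than the union because `p1` is shared, which is the same effect that makes 114+51 an overestimate.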

Countries

Here is the fun part. I created a mapping from each author’s affiliation to its country, so we can see which countries publish the most.

As a disclaimer, creating a mapping for all possible affiliations is a nightmare: people abbreviate, make typos, and refer to the same institution by different names. I did my best to build a decent mapping, which has countries for ~7K affiliations, but it is not perfect. If you see missing mappings, feel free to edit the mapping file yourself.
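A rough sketch of what such a lookup could involve is shown below. The dictionaries, the `normalize` and `country_of` helpers, and the alias entries are all hypothetical; the real mapping file in the repo may be organized quite differently.

```python
import re

# Hypothetical mapping from a normalized affiliation to a country.
AFFILIATION_TO_COUNTRY = {
    "university of tokyo": "Japan",
    "riken": "Japan",
    "eth zurich": "Switzerland",
    "uc berkeley": "USA",
}

# Hypothetical aliases: different spellings of the same institution.
ALIASES = {
    "the university of tokyo": "university of tokyo",
    "u tokyo": "university of tokyo",
    "eth zürich": "eth zurich",
    "university of california berkeley": "uc berkeley",
}


def normalize(affiliation: str) -> str:
    """Lowercase, drop dots and commas, collapse spaces, resolve aliases."""
    text = affiliation.lower()
    text = re.sub(r"[.,]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return ALIASES.get(text, text)


def country_of(affiliation: str):
    """Return the mapped country, or None when the affiliation is unknown."""
    return AFFILIATION_TO_COUNTRY.get(normalize(affiliation))


print(country_of("University of California, Berkeley"))
print(country_of("Some Unlisted Lab"))  # unknown affiliations map to None
```

Returning `None` for unknown affiliations makes the gaps easy to list, which is useful when the mapping is maintained by hand.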

Let’s take a look at the breakdown by country.

Wow! The USA participated in 728 accepted papers, roughly two thirds of all 1088 papers. That is a huge lead over other countries.

Here is another warning: the country is attributed based on the organization’s headquarters, not on the author’s location. So if an author works at Google Zurich, the paper is counted for the USA, not for Switzerland.

However, despite the warning above, the numbers are not too far from reality. If we just consider universities, i.e. organizations that have only a single presence in the world, the plot would look as follows:

That is, even without companies, the USA still participates in more than half of the papers at ICML. If we added all the industrial researchers who work in the USA, the numbers would be close to the previous plot.
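The two country breakdowns above differ only in whether companies are included. A minimal sketch of that counting, on made-up data, is given below. `ORG_INFO`, the toy `papers`, and `papers_per_country` are illustrative names, not taken from the repo.

```python
from collections import Counter

# Hypothetical table: organization -> (headquarters country, is_company).
ORG_INFO = {
    "Google": ("USA", True),
    "DeepMind": ("UK", True),
    "MIT": ("USA", False),
    "ETH Zurich": ("Switzerland", False),
    "University of Oxford": ("UK", False),
}

# Hypothetical toy data: paper id -> organizations of its authors.
papers = {
    "p1": ["Google", "ETH Zurich"],  # a Google Zurich author still counts for the USA
    "p2": ["MIT", "Google"],
    "p3": ["DeepMind", "University of Oxford"],
    "p4": ["Google"],
}


def papers_per_country(papers, universities_only=False):
    """Count papers per country; each paper counts once per country."""
    counts = Counter()
    for orgs in papers.values():
        countries = set()
        for org in orgs:
            country, is_company = ORG_INFO[org]
            if universities_only and is_company:
                continue  # skip companies for the universities-only view
            countries.add(country)
        counts.update(countries)
    return counts


print(papers_per_country(papers))
print(papers_per_country(papers, universities_only=True))
```

Because a paper can involve several countries, the per-country counts do not sum to the number of papers.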

Another interesting observation is that the UK and China published approximately the same number of papers. As we will see next, DeepMind accounts for approximately 40% of the UK’s total.

Let’s look at each country individually. The following are the top 10 organizations (with 3+ papers) for each of the top 15 countries:

The USA has a huge number of organizations, in both industry and academia, with a significant number of papers. In contrast, the UK’s performance is led by DeepMind, which is followed by universities.
China has strong academic institutions, but companies such as Huawei, Alibaba, and Baidu are catching up. In Canada, it’s almost all universities that publish papers.
Criteo (France) is among the top 2 companies in Europe by number of published papers.
École Polytechnique Fédérale de Lausanne and ETH Zurich are the top performers for Switzerland.

So it seems that except for the USA and China, most of the top-publishing organizations are universities. Globally, universities published 3 times more than companies.

Among non-US companies, only companies from the UK (DeepMind), France (Criteo), China (Huawei, Baidu, Alibaba), Russia (Yandex), and South Korea (Samsung) published more than 5 papers.

Collaboration

We can also look at how different organizations collaborate with each other. I built a graph of collaborations between organizations, which has 426 nodes and 1206 edges in total. If we plot it, we see a bunch of points connected by edges. You can interact with it in the Colab notebook.

If we just take a subgraph of nodes that have at least 30 collaborations, then we get a more appealing graph.
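For readers who want to see how such a graph can be assembled, here is a minimal standard-library sketch on made-up data. It assumes that an edge links two organizations that share at least one paper, and that "collaborations" of a node means the total weight of its edges. The article does not spell out either definition, so treat both as assumptions. The toy threshold is 3 rather than 30.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: paper id -> organizations of its authors.
papers = {
    "p1": ["Google", "MIT"],
    "p2": ["Google", "MIT", "Stanford"],
    "p3": ["Google", "Stanford"],
    "p4": ["Criteo", "MIT"],
    "p5": ["DeepMind"],  # single-organization paper, adds no edge
}

# Edge weight = number of papers two organizations co-authored.
edge_weights = Counter()
for orgs in papers.values():
    for a, b in combinations(sorted(set(orgs)), 2):
        edge_weights[(a, b)] += 1

# Nodes here are organizations with at least one collaborator (an assumption).
nodes = {org for edge in edge_weights for org in edge}

# Total collaborations per organization (weighted degree).
collaborations = Counter()
for (a, b), weight in edge_weights.items():
    collaborations[a] += weight
    collaborations[b] += weight


def subgraph(min_collaborations):
    """Keep only edges whose two endpoints both reach the threshold."""
    keep = {org for org, n in collaborations.items() if n >= min_collaborations}
    return {e: w for e, w in edge_weights.items() if e[0] in keep and e[1] in keep}


print(len(nodes), len(edge_weights))
print(subgraph(3))
```

Sorting the organizations before pairing them keeps each edge under a single key, so `("Google", "MIT")` and `("MIT", "Google")` are never counted separately.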

We can also take a look at individual organizations. For example, for Google and MIT the graphs look as follows:

It’s interesting to see that Google does not collaborate with other companies as much as it does with universities. MIT, on the other hand, has a large number of collaborators from industry.

Finally, let’s look at the overall number of authors and organizations per paper.

Most of the papers have 3–4 authors, but some rare exceptions have as many as 15 authors.

Two papers have 15 authors each: Stochastic Flows and Geometric Optimization on the Orthogonal Group, by 15 researchers from Google and the universities of Oxford, Cambridge, Columbia, and Berkeley; and Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising, by 15 researchers from Tianjin University, Alibaba, Tsinghua University, and Shanghai Jiao Tong University.

And if we look at the number of different organizations per paper, then it’s as follows:

One or two organizations per paper is the most common scenario, but some papers involve as many as 7 organizations.

Two papers have authors from 7 different organizations: How Good is the Bayes Posterior in Deep Neural Networks Really?, a collaboration between Google, Microsoft, University of Warsaw, University of Amsterdam, UC Irvine, ETH Zurich, and Imperial College London; and Learning to Navigate in Synthetically Accessible Chemical Space Using Reinforcement Learning, a collaboration between 99andBeyond, University of Montreal, IIIT Hyderabad, MIT, Mila, University of Delaware, and LinkedIn.
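The two per-paper distributions discussed above can be tallied in a few lines. This is a minimal sketch on made-up records. The `authors` and `orgs` field names are hypothetical, and `orgs` is assumed to hold one affiliation per author.

```python
from collections import Counter

# Hypothetical toy records: one affiliation in "orgs" per author in "authors".
papers = [
    {"authors": ["A", "B", "C"], "orgs": ["Google", "Google", "MIT"]},
    {"authors": ["D", "E", "F", "G"], "orgs": ["Stanford"] * 4},
    {"authors": ["H", "I", "J"], "orgs": ["DeepMind", "UCL", "DeepMind"]},
]

# Histogram: number of authors -> number of papers with that many authors.
authors_hist = Counter(len(p["authors"]) for p in papers)

# Histogram: number of distinct organizations -> number of papers.
orgs_hist = Counter(len(set(p["orgs"])) for p in papers)

print(sorted(authors_hist.items()))
print(sorted(orgs_hist.items()))
```

Using `set` on the affiliations matters here: a paper with four authors from one university counts as a single-organization paper.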

I will stop here. I think we now understand much more about which authors, organizations, and countries publish the most, but I bet you have even more ideas and questions, so feel free to play with the code on GitHub and in the Colab notebook.

P.S. If you like this story, consider following me on Medium or subscribing to my Telegram channel or my Twitter.
