What should be the order of authors in your ML paper?
Answering it is sometimes harder than the P=NP question.

Fuzzy, subjective, controversial: these are some of the epithets people used to describe their logic for putting authors in the right order. Indeed, each author has to estimate the credit everyone else deserves, preferably without saying it out loud, because that creates tension between authors and distracts them from research. Then everyone somehow has to silently agree that yes, this order makes sense.
So I asked the big bosses, people with hundreds of publications, how they resolve situations like this. From what I gathered, everyone has their own ad hoc approach, and there are few things that everyone agrees on. Overall, I still do not have the ultimate methodology for assigning credit and then ordering authors by the credit they deserve, so if you know one, please shoot me a message. But at least I know I am not the only one who has asked this question, and below are the answers to some questions worth asking once and for all.
Does the order matter in ML papers?
It kind of does, but in a confusing way. First of all, everyone agrees that the first author of an ML paper is the most deserving guy: the guy who did most of the work, the experiments, the figures, the tables. He touched them all; this is his baby. Not only has he done most of the work before submission, but after acceptance he is also responsible for answering emails, fixing bugs, and open-sourcing the code. This is unlike other disciplines such as math, where authors are listed alphabetically and individual contributions stay hidden unless they are described in a dedicated section.
Now, things would be easy if there were a single author (ah, the dream), but in reality there are usually 2, 3, 5, or 10 other co-authors. So what about them? Well, let’s first handle the supervisors, a.k.a. “guys who paid for the show”. In general, the last position seems to be the sweet spot reserved for the supervisor, unless the paper has several supervisors. So what do you do if there are many of them?
Opinions differ. Some people say the last spot goes to the first author’s supervisor. Others say supervisors should be ordered by seniority, with the most senior one last. For some, this question should be settled privately between the supervisors. The bottom line was that it probably does not matter much for people of that caliber.
What does seem to matter, though, is how people perceive the number of authors on a paper. Some say that 2, 5, or 50 people makes no difference: the paper is done, and its value is not diluted. For such people, it does not even matter if some random weirdo who was hanging around got included because his jokes were funny and, hey, he is a nice guy after all. Conversely, some people believe that the fewer the authors, the more value each of them gets. For them, 2 to 4 authors is optimal, unless a Large Hadron Collider was built to test some crazy hypothesis. So it depends on which camp you are in, left or right. Either you think “yeah, let’s invite everyone to the party” and sort the middle authors by their contributions, or you think “the world should know it’s our paper” and are very precise about who did what.
Okay, so much for the order. Now, what about credit assignment?
This is again a highly biased opinion, but informally, each author of a paper is either a chef (e.g., a supervisor) or a worker (e.g., a student). Each role comes with its own set of responsibilities, for which its holder earns credit.

Chefs are responsible for having a clear picture of what should be accomplished, providing resources, synchronizing the workers, resolving conflicts, steering the group in the right direction, and so on. Chefs go at the end of the list, so although we could compute their credit in each particular case, their positions would not change much.
For workers, credit assignment is more subtle. There are experiments, small and big; there are figures, text, ideas, theorems to prove, scheduling, and tons of other details. Gosh, it’s hard to be a worker. It is also hard to say what matters most, but here is a little heuristic I use to roughly estimate credit.
Model and main experiments = 40%
Supporting claims and minor experiments = 10%
Text and presentation = 15%
Theory and analysis = 30%
Ideas, scheduling, motivation = 5%
Of course, not all papers have a theoretical part, in which case the experiments carry more weight and deserve more of the credit. And in other papers, the theorem is what rocks, and the experiments merely support it. But you get the idea. Split the work into buckets and assign a weight to each bucket. Then sum the weights of the buckets each co-author worked on, splitting a bucket’s weight when several people shared it, to get that co-author’s credit.
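To make the bucket arithmetic concrete, here is a minimal sketch in Python. It assumes that each bucket’s weight is split among co-authors in proportion to their share of that bucket. When a paper lacks a bucket (e.g., no theory part), the remaining weights are renormalized so that credits still sum to 1. That renormalization is an assumption of this sketch rather than part of the heuristic itself. The function name `compute_credit`, the bucket keys, and the author names are all hypothetical.

```python
# Bucket weights from the heuristic above (they sum to 1.0).
BUCKET_WEIGHTS = {
    "model_and_main_experiments": 0.40,
    "supporting_claims_and_minor_experiments": 0.10,
    "text_and_presentation": 0.15,
    "theory_and_analysis": 0.30,
    "ideas_scheduling_motivation": 0.05,
}


def compute_credit(shares):
    """Estimate each co-author's credit from per-bucket contribution shares.

    `shares` maps a bucket name to {author: fraction of that bucket they did};
    the fractions inside one bucket must sum to 1. Buckets missing from
    `shares` are dropped and the remaining weights are renormalized, so the
    credits always sum to 1 (an assumption of this sketch).
    """
    total_weight = sum(BUCKET_WEIGHTS[bucket] for bucket in shares)
    credit = {}
    for bucket, split in shares.items():
        fraction_sum = sum(split.values())
        if abs(fraction_sum - 1.0) > 1e-9:
            raise ValueError(f"shares for {bucket!r} sum to {fraction_sum}, expected 1")
        for author, fraction in split.items():
            # Each author earns the bucket's weight in proportion to their share.
            credit[author] = credit.get(author, 0.0) + BUCKET_WEIGHTS[bucket] * fraction
    # Rescale so that only the buckets this paper actually has count.
    return {author: c / total_weight for author, c in credit.items()}


# Hypothetical three-author paper; names and shares are made up.
example_shares = {
    "model_and_main_experiments": {"alice": 0.9, "bob": 0.1},
    "supporting_claims_and_minor_experiments": {"alice": 0.5, "bob": 0.5},
    "text_and_presentation": {"alice": 0.6, "carol": 0.4},
    "theory_and_analysis": {"bob": 1.0},
    "ideas_scheduling_motivation": {"carol": 1.0},
}
credit = compute_credit(example_shares)
print({author: round(c, 2) for author, c in credit.items()})
# {'alice': 0.5, 'bob': 0.39, 'carol': 0.11}
```

In this made-up example, bob never touched the main model, but doing all the theory still gets him to 39%.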
Again, depending on your camp, you may set a threshold for including a co-author on the paper (or get a formal reason to remove that poor guy who kept asking too many questions). You scored 20% in total? You are in. You made it to 50%? Take first place. More likely, though, the first author will have done close to 80%, and the remaining 20% will be split among 5 fans.
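The inclusion threshold and ordering rules can be sketched as below. The sketch assumes each worker’s credit has already been estimated as a fraction of the total. Qualifying workers are sorted by descending credit, so anyone above 50% necessarily lands in first place. Supervisors are appended at the end in whatever order the caller passes, for example by increasing seniority so that the most senior one is last. The function `order_authors`, its parameters, and the names in the example are hypothetical illustrations, not a standard procedure.

```python
def order_authors(worker_credit, supervisors, threshold=0.20):
    """Build an author list: qualifying workers by credit, then supervisors.

    worker_credit: {name: estimated credit as a fraction of the total}.
    supervisors: supervisor names in the order they should close the list,
        e.g. by increasing seniority so the most senior one is last.
    threshold: minimum credit a worker needs to be listed. 0.20 follows the
        example above; 0.0 corresponds to the "invite everyone" camp.
    """
    included = [name for name, c in worker_credit.items() if c >= threshold]
    # Highest credit first; Python's sort is stable, so ties keep input order.
    included.sort(key=lambda name: worker_credit[name], reverse=True)
    # Supervisors ("chefs") always go at the end, regardless of credit.
    return included + list(supervisors)


# Hypothetical paper: one first author at 80% and five "fans" sharing the rest.
credit = {"alice": 0.80, "bob": 0.06, "carol": 0.05, "dave": 0.04, "erin": 0.03, "frank": 0.02}
print(order_authors(credit, ["prof_x"]))
# ['alice', 'prof_x']
print(order_authors(credit, ["prof_x"], threshold=0.0))
# ['alice', 'bob', 'carol', 'dave', 'erin', 'frank', 'prof_x']
```

With a 20% threshold, none of the five fans makes the cut, while the inclusive camp’s threshold of zero lists everyone.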
In the end, perhaps the most important thing is that your paper gets accepted and well cited; for a single paper, your position probably does not matter much. What matters is building a trustworthy group around yourself over time, and that will seamlessly resolve questions like this. So build your network, and good luck with your research!
P.S. I will keep writing about machine learning, papers, and all that jazz, so if you are interested, follow me on Medium or subscribe to my Telegram channel or my Twitter.
