How Medium Curates Articles

How does Medium curate articles? It is a question that perplexes every writer enrolled in the Medium Partner Program.

More broadly, it is the holy grail question for any writer who writes for a curated platform.

Curation guidelines surely help, but when the funnel has a very narrow waist, it is quite easy for the selection to become subjective.

The few that make it are the ones that offer unique content, a unique perspective on an existing phenomenon, or both.

That’s what all writers think.

In reality, with user-generated content at large scale, it is nearly impossible to judge uniqueness and perspective (even going by voice and a lucid writing style).

You could have a panel discussion about a piece to be published. However, only big publishing houses can afford it. And even their wastebaskets are full of rejected masterpieces.

Content platforms with enormous volumes of user-generated content cannot afford that kind of effort.

Not All Curation Eligible Articles Are Curated:

Remember this leaf from the curation guidelines scroll?

The guidelines above can be considered a bare minimum — curators are looking for stories that surpass this minimum. Your story might not be curated even if it has not been disqualified by any of these criteria. Reasons for this include curators’ individual quality judgment and a writer’s past curation acceptance.

So where do they go?

Who eats them up?

Enter the curation bot.

What Is A Curation Bot?

A piece of software sitting between you and the human content curators. Its job is to filter out as many pieces as possible and present N articles to the human curators per day.

Let’s say, on a given day, a platform receives M articles (10,001 in the figure example). M could vary with time.

Human curators can evaluate, let’s say, N articles per day.

Depending on N (100 in the figure example), platform capabilities, the number of readers, and the distribution of their individual preferences, X articles are curated every day.

Those X articles enjoy the curation limelight. The authors of exactly those X articles receive that much-awaited email from the curation team.

In the figure example, X could reasonably be 10–20, but the actual number is obviously much higher for Medium.

But what about the authors who are not fortunate enough (at least that day) to receive such an email? They belong to two distinct sets.

Manual rejection: Their writing was great, but somehow it didn’t appeal to the curators as much as other articles did. This is the standard editorial rejection route. They belong to (N - X).
Automatic rejection: Their submissions belonged to the set of (M - N) articles.

Anyone can guess which is the more unfortunate (and bigger) lot here. It’s the authors whose articles belonged to (M - N).

They didn’t get read by human curators at all.
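
The funnel described above can be written down as a small sketch. M = 10001 and N = 100 are the figure's example values; X = 15 is an assumed value picked from the 10–20 range, and the function name is hypothetical.

```python
def funnel_counts(m: int, n: int, x: int) -> dict:
    """Split M daily submissions into the three groups described above."""
    if not (0 <= x <= n <= m):
        raise ValueError("expected 0 <= X <= N <= M")
    return {
        "curated": x,                  # X: picked by human curators
        "manual_rejection": n - x,     # N - X: read by a human, then rejected
        "automatic_rejection": m - n,  # M - N: never reached a human
    }


# Example values: M and N from the figure, X assumed for illustration.
counts = funnel_counts(m=10001, n=100, x=15)
print(counts)
```

With these numbers, the automatically rejected group is by far the largest of the three.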

Not that all of them had great writing. They could suck at content, they could suck at writing style / voice, they could even suck grammatically.

But I will cover those points later. For now, read on.

Why Do I Think Medium Uses A Curation Bot?

By the number of Reads at the moment an article’s status changes to “Not distributed in Topics.”

Several times, I have seen the transition from “We are processing this story. Hang tight!” to “Not distributed in topics” happen when my Read count is 1, and I know the one person who read it in full.

Sometimes, the time of the status transition does not coincide with the Read count update. While this is an indication of non-human rejection, it could also mean that a human read the article earlier and updated the status later.

However, more often it is the former case. The occurrences on that side are simply too numerous not to suggest that a curation bot is present.

No such Read count discrepancy exists when my article gets successfully curated. Why? Because a human being read it before circulating it under topics.

Hence, a bot is present to select the ones that make it to the human curation team.

What Could A Medium Curation Bot Look Like?

Disclosure: I am not a Medium insider. This article is purely speculative.

However, as a long-time software developer, I know enough about the need for an automated bot to filter out the chaff.

Again, I am not a data scientist. But what I know qualifies me to describe, in general terms, how a curation bot could use AI to filter content.

For a start, a curation bot could include:

Simple grammar + spelling checks
Checks for Medium curation guidelines, i.e. rule checks such as title case for titles, sentence case for subtitles etc.
Checks to detect plagiarism (in the manner of Copyscape)
Checks to detect offensive content (outright use of racist, sexist or abusive words)
Checks for content length (pre-decided, based on readers’ historical preferences in read ratio, e.g. the highest read ratio is achieved when an article is 4 minutes long)
Checks for better voice, i.e. use of active voice vs. passive voice, and so on.

However, none of the above features requires any AI.
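
A few of these rule checks can be sketched in plain Python with no AI involved. Everything here is an assumption made for illustration: the word list, the length window, the rough title-case rule, and the function names are hypothetical and say nothing about how Medium actually works.

```python
import re

# Hypothetical placeholders; a real system would use a maintained word list.
BLOCKLIST = {"badword1", "badword2"}
MIN_WORDS, MAX_WORDS = 300, 2500  # assumed acceptable length window

# Short words that may stay lowercase inside a title (rough approximation).
SMALL_WORDS = {"a", "an", "the", "and", "but", "or", "for", "nor",
               "on", "at", "to", "of", "in", "by", "vs"}


def is_title_case(title: str) -> bool:
    """Rough rule: the first word and every non-small word is capitalized."""
    words = title.split()
    if not words:
        return False
    for i, word in enumerate(words):
        if i > 0 and word.lower() in SMALL_WORDS:
            continue
        if word[0].isalpha() and not word[0].isupper():
            return False
    return True


def rule_violations(title: str, body: str) -> list:
    """Return the list of rule checks that the article fails."""
    words = re.findall(r"[a-z0-9']+", body.lower())
    problems = []
    if not is_title_case(title):
        problems.append("title is not in title case")
    if not MIN_WORDS <= len(words) <= MAX_WORDS:
        problems.append("length outside the preferred window")
    if BLOCKLIST.intersection(words):
        problems.append("contains blocklisted words")
    return problems


print(rule_violations("How Medium Curates Articles", "word " * 500))
print(rule_violations("how medium curates articles", "badword1 here"))
```

An article with an empty violation list is a "non-negative": it passes on to whatever comes next in the pipeline.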

AI will come into the picture when the bot needs to work on past articles’ popularity and engagement data to select eligible articles for human curators to consider.

In other words:

AI plays its part not in filtering out the error-prone content, but in filtering in the popularity-prone content.

AI will shortlist the likely popular positives from the non-negatives, after other modules (grammar checks etc.) have already filtered out the clear negatives.

Such usage of AI in curation is any writer’s worst nightmare.

How AI Plays A Role Inside Curation Bot:

Simply put, it works on pre-decided factors that human curators think were vital to the popularity of a certain number of articles.

Let’s call that set of historically popular articles H.

Those H sample articles were popular on the Medium platform before the curation bot was implemented.

Each of those H articles had the following in common:

They all had 10K+ views.
They all had 40%+ Read Ratio.
They all had 1K+ claps.
They all had at least 1 comment.
They all had at least 3 highlights.
They were shared on social media from Medium.com and its app at least 20 times.

Let’s not forget that all those numbers are just examples. I am no insider, but if I were the bot creator, those are the numbers I would play around with. Again, this is based on the data that is available to me.
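
As a sketch, picking the H sample articles is a simple threshold filter. The thresholds are the example numbers listed above; the class, field names, and sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArticleStats:
    views: int
    read_ratio: float  # fraction between 0 and 1
    claps: int
    comments: int
    highlights: int
    shares: int  # shares from Medium.com and its app


def is_historically_popular(a: ArticleStats) -> bool:
    """Apply the example thresholds; all numbers are illustrative only."""
    return (a.views >= 10_000 and a.read_ratio >= 0.40
            and a.claps >= 1_000 and a.comments >= 1
            and a.highlights >= 3 and a.shares >= 20)


# Made-up sample data for illustration.
articles = [
    ArticleStats(25_000, 0.52, 3_100, 14, 9, 40),
    ArticleStats(12_000, 0.35, 1_500, 2, 5, 22),  # read ratio too low
]
H = [a for a in articles if is_historically_popular(a)]
print(len(H))
```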

Next, as a bot developer, I would choose characteristics that I think were instrumental in leading up to that popularity for those H articles.

All of them had an SEO-worthy title. All articles whose titles get more than 100 monthly Google searches will get p points. Digital marketers know how to measure this.
All of them had a 50%+ correlation between focus keyword and content. (The focus keyword is not something the author gets to enter, but it could be derived from the combination of tags and title.) All eligible articles get q points.
Their images (content derived through OpenCV) and captions had a strong correlation with the content and with each other: r points.
They had citations, e.g. “Scientific survey / journal (ABC / XYZ / PQR …) claims that” sort of content. One point per citation, s points in total.
They had relevant word density, e.g. an article with an “Einstein” tag must contain the word “relativity” at least 10 times per 1000 words, and so on. For perfectly dense articles, allot t points. (It’s stupid, yes, but building a related tag cloud is possible.)
They had certain voice characteristics, i.e. a narrative style based on the writing standards of the most famous literary / non-fiction publications. This is algorithmically possible if you feed a bot some grammar and style rules. Gmail currently does it quite well, although it dumbs down the email composing experience.

While it is a great standard in itself, it favors writers with a relevant literature education or journalistic experience. The highest resemblance gets u points.
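
To make one of these characteristics concrete, here is a minimal sketch of the word density check. The threshold of 10 occurrences per 1000 words is the “Einstein” example from the list; the function names and the all-or-nothing award of t points are assumptions.

```python
import re


def occurrences_per_1000_words(text: str, term: str) -> float:
    """Count how often `term` appears, normalized to a 1000-word volume."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return 1000 * words.count(term.lower()) / len(words)


def density_points(text: str, term: str, t: int, threshold: float = 10) -> int:
    """Award t points if the related word is dense enough, else nothing."""
    return t if occurrences_per_1000_words(text, term) >= threshold else 0


# Synthetic 1000-word text containing "relativity" exactly 10 times.
sample = "relativity " * 10 + "filler " * 990
print(occurrences_per_1000_words(sample, "relativity"))
print(density_points(sample, "relativity", t=5))
```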

Finally, an article score will be summed up as:

Article Score = p + q + r + s + t + u

As you can guess, the N highest-scoring articles will reach the human curators.
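
The scoring and shortlisting step could be sketched as follows. The point values and article names are made up; only the sum p + q + r + s + t + u and the "top N" selection come from the text.

```python
import heapq


def article_score(points: dict) -> float:
    """Article Score = p + q + r + s + t + u (missing components count as 0)."""
    return sum(points.get(key, 0) for key in "pqrstu")


def shortlist(candidates: dict, n: int) -> list:
    """Return the names of the N highest-scoring articles, best first."""
    return heapq.nlargest(
        n, candidates, key=lambda name: article_score(candidates[name])
    )


# Hypothetical candidates with made-up point values.
candidates = {
    "article_a": {"p": 5, "q": 3, "s": 2},
    "article_b": {"p": 1, "t": 1},
    "article_c": {"q": 4, "r": 4, "u": 4},
}
print(shortlist(candidates, n=2))
```

Anything that falls outside the shortlist lands in the (M - N) set and is never seen by a human.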

Where The Bot Corrects Itself:

Wherever the bot fails, i.e. an article succeeds in getting popular despite not being curated, that article could be fed back to the bot (H = H + 1) as a sample, though not without the human curators’ review.

So all forthcoming articles with similar characteristics will have higher chances of getting curated.

However, this can happen mostly in the case of authors who are already popular. The bot becomes more likely to choose articles by popular authors than unique / meritorious content.

On the other hand, the bot will have no way to correct itself if truly readable content got dropped and then failed to gain popularity through other channels (followers + social media).
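
A minimal sketch of that feedback step, under stated assumptions: popularity is reduced to a single view threshold (10K, reusing the earlier example number), and the function and parameter names are hypothetical.

```python
def feed_back(samples: list, article_id: str, views: int,
              was_curated: bool, human_approved: bool,
              popularity_threshold: int = 10_000) -> bool:
    """Add an uncurated but popular article to the sample set H.

    Returns True when the article was added (H = H + 1).
    """
    missed_hit = (not was_curated) and views >= popularity_threshold
    if missed_hit and human_approved and article_id not in samples:
        samples.append(article_id)
        return True
    return False


H = ["old_sample"]
added = feed_back(H, "surprise_hit", views=15_000,
                  was_curated=False, human_approved=True)
print(added, len(H))
```

Note that an article which never gets popular never triggers this path, which is exactly the blind spot discussed in this section.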

Why The Bot Could Be Near Perfect One Day, Yet It’s Bad For Writers, And Worse For Writing:

Simply because it ends up experimenting with current content instead of evaluating it. It kills uniqueness.

A lot of good content might simply be lost because it failed to resemble anything popular that was previously published.

Even software-governed grammar rules, while completely valid from a purist’s view, could reject slang (“Yay”, “Aye”) and accents popular only within some geographies, and flatten the writing to the point of being devoid of essence.

New authors (with fewer followers) already face stiff competition from established authors with 10K+ followers.

The competition is compounded by Medium-backed publications such as Elemental, OneZero et al.

Newer writers will be hard pressed to push out more content to improve their curation count against a gradually depleting view count.

This will clearly favor quantity over quality in the long run.

As a result, writing will suffer.

Conclusion:

A curation bot is any writer’s worst nightmare. Whether Medium implements one or not is beyond the scope of this article; what we know for sure is that its curation team is made up of 35 experts.

I do not know the number of articles they have to go through to select X eligible articles for front page readers’ eyeballs.

But the next time your article does not make it to that list, do not get disappointed. Keep experimenting intelligently with your content. You could strike an amazing curation rate once you have figured out, for your own content niche, how Medium curates articles.