Kaggle 1st place winner cheated, $10,000 prize declared irrecoverable
How a team obtained private data, constructed a fake AI model, and got away with the money from a platform for adopting neglected pets

Kaggle just announced that the 1st Place Team, Bestpetting[1], has been disqualified from the Petfinder.my competition for cheating. The team crawled the pet adoption site to collect the private leaderboard answers, and hid this data in their submission to win the first prize on 2019–04–09. The first prize was $10,000 out of the $25,000 prize pool. In this post, you’ll find background on Kaggle, the darker side of its competitions, and links to the related resources.
Links
- An analysis of the code used to encode and decode the obfuscated answers
- The competition page
- Further discussion on reddit
What is a Kaggle competition
Kaggle, a subsidiary of Google, is an online community built around competitions to build machine learning models. With prize pools as high as $1,500,000, the platform has attracted a diverse following. Each competition presents a dataset and the metric that will be used to decide the winning submission. Competitors analyze the given data, build models to match the desired outcome, and submit their results (often alongside their code). To prevent cheating, machine learning competitions include test data whose labels are withheld from competitors, and this data is used in two phases:
- During the competition, team submissions are ranked by their score on the part of the test data dedicated to the “leaderboard”. Competitors have this data, but with no labels.
- When the competition is over, the “private” part of the test data, which competitors likewise have only as unlabeled data, is used to choose a winner. This data ideally represents how the model would perform on never-before-seen data. Teams that optimize for the leaderboard metric alone will often lose because of a poor fit for the “private” part of the dataset.
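To make the two phases concrete, here is a minimal sketch of how an organizer might score one submission against both parts of the test data. It is an illustration only: the function names, the 30% split, and the use of plain accuracy as the metric are my assumptions, not Kaggle's actual scoring code.

```python
import random


def split_test_ids(test_ids, public_fraction=0.3, seed=0):
    """Partition test row ids into a leaderboard ("public") and a private subset.

    The fraction and the seed are illustrative assumptions.
    """
    ids = sorted(test_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    cut = int(len(ids) * public_fraction)
    return set(ids[:cut]), set(ids[cut:])


def accuracy(predictions, labels, ids):
    """Fraction of rows in `ids` whose prediction matches the hidden label."""
    if not ids:
        raise ValueError("cannot score an empty subset")
    correct = sum(1 for i in ids if predictions.get(i) == labels[i])
    return correct / len(ids)


def score_submission(predictions, labels, public_ids, private_ids):
    """Score one submission on both subsets.

    Only the "public" number would be shown while the competition runs;
    the "private" number decides the winner once it closes.
    """
    return {
        "public": accuracy(predictions, labels, public_ids),
        "private": accuracy(predictions, labels, private_ids),
    }
```

Competitors submit predictions for every test row without knowing which subset a row belongs to. A team holding the true labels for the private rows would score perfectly on the number that matters, whatever its model does.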
Cheating with the private data
The result of this competition format is that a team that obtains the ground-truth answers to the private test dataset is effectively guaranteed to win. The models produced by the cheating team would be ineffective, invalidating the competition.
In this case, the cheaters packaged the private answers alongside their submission. Other attacks could be more difficult to detect. One such method would be to optimize hyperparameters using the full dataset, creating a model that seems as though it’s coincidentally more effective. Perhaps the cheating team chose a more detectable approach because they weren’t capable of creating a model that was leaderboard worthy at all, or they were too brazen to bother.
A mitigation to these issues could be to exclude the private data from the competition entirely. Submissions would have to include code that provides an API to generate predictions. This would also prevent competitors from knowing the distribution of the features in the private and leaderboard data.
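As a rough sketch of what such a setup could look like, the organizer would run the competitor's code against rows that never leave the organizer's machine. Everything here is hypothetical: the `predict` interface, the `age_months` feature, and the accuracy metric are placeholders I made up for illustration.

```python
from typing import Callable, Mapping, Sequence, Tuple

Features = Mapping[str, float]
PredictFn = Callable[[Features], int]


def evaluate_submission(
    predict: PredictFn,
    private_rows: Sequence[Tuple[Features, int]],
) -> float:
    """Score a submitted predict() function on rows the competitor never receives."""
    if not private_rows:
        raise ValueError("no private rows to score")
    correct = 0
    for features, label in private_rows:
        # Hand over a copy so the submission cannot modify the organizer's data.
        if predict(dict(features)) == label:
            correct += 1
    return correct / len(private_rows)


def example_submission(features: Features) -> int:
    """A toy competitor model: predict class 1 for young pets (hypothetical rule)."""
    return 1 if features["age_months"] < 12 else 0
```

A harness like this keeps the private rows off competitors' machines. On its own it would not stop a submission from carrying a lookup table if the same records can be collected elsewhere, so it is a partial mitigation at best.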
The dark side of Kaggle
There are many potential problems with Kaggle competitions. I stumbled across an example in a competition to detect credit card fraud. A popular model was being trained using information from the future, which would make it unusable in practice — the bank doesn’t have a crystal ball. Many models use the datasets in ways which produce higher scores, but make the model useless to the competition organizer. These models can still win the competition as they don’t break any rules.
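The "information from the future" problem is easier to see in code. Below is a small sketch, with made-up field names (`card_id`, `timestamp`), of the two habits that avoid it: splitting by time rather than at random, and computing each feature only from transactions that happened before the one being scored. This is not the code from that competition, just an illustration of the idea.

```python
from datetime import datetime


def time_ordered_split(rows, cutoff):
    """Train on transactions before `cutoff`, evaluate on those at or after it."""
    train = [r for r in rows if r["timestamp"] < cutoff]
    test = [r for r in rows if r["timestamp"] >= cutoff]
    return train, test


def prior_transaction_count(rows, card_id, as_of):
    """Count a card's transactions strictly before `as_of`.

    Counting over the whole dataset instead would leak the future into the
    feature, which a bank scoring a live transaction could never reproduce.
    """
    return sum(
        1
        for r in rows
        if r["card_id"] == card_id and r["timestamp"] < as_of
    )
```

A model built this way will usually score lower than one that peeks ahead, which is exactly why the leaky version is tempting in a competition.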
As a result of these loopholes and the potential for useless results, competition organizers have to be extra vigilant with their data and rules. Some require victory in multiple rounds of competition, and some distribute the prize money almost evenly between a larger number of top submissions.
Repeat Offender
This wasn’t the first time Pavel Pleskov, a member of the Bestpetting team, defeated the purpose of a Kaggle competition or was accused of cheating. It seems Kaggle failed to denounce or act on these tactics in the past.


These misgivings weren’t just ignored by Kaggle: Pavel was a celebrated grandmaster, as can be seen in this interview Kaggle produced.

Consequences
Pavel was stripped of his winnings and banned from the Kaggle platform. I also reached out to his employer, H2O.ai, for comment. This was Ingrid Burton’s response:
We were made aware of the situation earlier today. He is no longer affiliated with H2O.ai effective immediately. We will also be reaching out to Petfinder.my to see how we can help with their cause.
I did not find further information on Fedor Dobryanski, who was also banned from Kaggle. Following its investigation, Kaggle chose not to ban Narek Maloyan.
A brighter future for Kaggle
Kaggle has been a great force in pushing what’s possible with machine learning. Winning submissions often shine a light on the best tools and practices, and inspire the invention of new techniques. Although impractical models and cheating have hurt the competition organizers, the Kaggle brand, and the ecosystem at large, the good and the potential far outweigh the bad.
Perhaps this is a sign of a new chapter for Kaggle. They have a lot of ground to cover if they want to rebuild the lost trust in the platform and its community. A push towards reproducible, fair, and useful model-building competitions is exactly what machine learning needs. Let’s hope we get it.

[1] The Bestpetting team included: Pavel Pleskov, Narek Maloyan and Fedor Dobryanski.
Edit 2020–06–05: Petfinder.my announced on 2020–01–28 that the Bestpetting team issued an apology and returned the stolen funds.






