Kaushik Sureshkumar


How does TikTok’s algorithm know me so well?

Photo by Hello I'm Nik 🎞 on Unsplash

The phrase “knows me better than I know myself” came up a few times when I discussed TikTok’s algorithm with friends. This is the mark of a great recommendation system, so I wanted to look at what makes it so good from a data point of view.

The data science team at TikTok have no doubt worked on many different recommendation systems involving complex machine learning techniques. This post, however, will focus on how the data that TikTok uses for this exercise, along with the format of content on the platform, already give it a massive advantage.

To give you a better understanding of why this is, let me briefly introduce how recommendation systems work. These algorithms are built on what’s called a user-item matrix. This is essentially a table where each row represents an individual user and each column represents an individual item that the algorithm needs to recommend. For example, at Spotify these items would be songs, at Netflix they’d be films and at Tinder they’d be other users.

The value in each cell of the table represents some sort of interaction between the user and the item. This could be whether a user liked a post (Instagram), a star rating of a product after use (Amazon) or whether they finished listening to a song (Spotify). We call this feedback since it gives us information on whether the user liked the item or not and, in some cases, how much they liked it.

Using this table, given a user, we can find other similar users and recommend items that those users have watched or bought. Likewise, given an item, we can recommend other items based on what the users who bought it have also bought.
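
The idea of finding similar users and recommending what they interacted with can be sketched in a few lines. This is a minimal illustration, assuming a tiny binary user-item matrix and cosine similarity as the similarity measure; the user names, item names and the helper functions `cosine_similarity` and `recommend` are all made up for this example and are not taken from any real system.

```python
from math import sqrt

# Toy user-item matrix: rows are users, columns are items.
# 1 means the user interacted with the item, 0 means no interaction.
# All names and values are hypothetical.
ITEMS = ["video_a", "video_b", "video_c", "video_d"]
MATRIX = {
    "alice": [1, 1, 0, 0],
    "bob":   [1, 1, 1, 0],
    "carol": [0, 0, 0, 1],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two interaction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(user, matrix, items):
    """Recommend items that the most similar other user has interacted
    with but `user` has not."""
    target = matrix[user]
    others = [name for name in matrix if name != user]
    nearest = max(others, key=lambda name: cosine_similarity(target, matrix[name]))
    return [
        item
        for item, mine, theirs in zip(items, target, matrix[nearest])
        if mine == 0 and theirs > 0
    ]


print(recommend("alice", MATRIX, ITEMS))  # → ['video_c']
```

Here alice and bob share two items, so bob is alice’s nearest neighbour and the one item only bob has seen is recommended to her. Real systems use many neighbours and far larger matrices, but the principle is the same.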

Photo by Celpax on Unsplash

How does this relate to quality of recommendations?

The quality of feedback has a big impact on the quality of recommendations. Higher quality information on the user’s preferences leads to higher quality recommendations. In particular, there are two main factors that affect the quality of feedback: whether it is implicit or explicit, and how granular it is.

Implicit feedback is what we call subconscious and natural user interactions that we can use as a proxy for the user’s true preferences. Whether they reacted to a post (LinkedIn), whether they clicked on a playlist (Spotify) and whether they followed another user (Twitter) are some examples of implicit feedback.

Explicit feedback refers to actual ratings that users have given to items. This conveys more information than implicit feedback since we don’t need to guess what the user’s true preferences are. User ratings of products (Amazon), films (Netflix) and apps (App Store) are some examples of explicit feedback. Although explicit feedback gives us more information, it’s much harder to acquire this data. Since rating items requires more effort from the user, fewer of them will actually do it. The result is a very sparse user-item matrix with few entries, leading to a less than optimal recommendation algorithm.

Granularity of the feedback refers to how much information we get from it. A rating scale out of 10 gives us more information than a rating scale out of 5, which in turn gives us more information than a rating scale out of 2. Finer granularity allows us to understand not just whether a user liked or disliked an item, but also how much they liked or disliked it.
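
One rough way to put numbers on this is to count how many bits a single rating can carry at most. A scale with k distinct levels can convey at most log2(k) bits, a bound that is only reached if every level is used equally often. The function name `max_bits_per_rating` is made up for this sketch.

```python
from math import log2


def max_bits_per_rating(levels):
    """Upper bound on the information (in bits) carried by one rating
    on a scale with `levels` distinct values."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    return log2(levels)


for levels in (2, 5, 10):
    print(levels, round(max_bits_per_rating(levels), 2))
```

A like/dislike scale tops out at 1 bit per rating, while a scale out of 10 can carry a little over 3 bits.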

Most implicit feedback data is binary (did a user perform a specific action on this item or not) and is therefore less granular than explicit feedback data. However, it’s much easier to acquire, as it requires less effort from the user.

Photo by Joey Huang on Unsplash

TikTok’s main advantages

TikTok is able to use the time a user spends on each video as a form of granular implicit feedback. Not only is it granular, but because it is measured in time it is also continuous, which makes it a much richer proxy for the user’s true preferences than a binary signal. Since gathering this data doesn’t require any extra effort from the user, TikTok is able to do this at scale, leading to very accurate recommendations.
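
To make the contrast with binary feedback concrete, here is a small sketch that turns watch time into a continuous score. Dividing watch time by video length and capping the result at 1 is an assumption made for this illustration; how TikTok actually processes watch time is not stated here. The function names `watch_time_feedback` and `binary_feedback` are hypothetical.

```python
def watch_time_feedback(watch_seconds, video_seconds):
    """Map watch time to a score in [0, 1]: the fraction of the video watched.

    Rewatches (watch time longer than the video) are capped at 1.0.
    """
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    fraction = watch_seconds / video_seconds
    return max(0.0, min(1.0, fraction))


def binary_feedback(watch_seconds):
    """Binary implicit feedback: did the user watch the video at all?"""
    return 1 if watch_seconds > 0 else 0


# (seconds watched, video length in seconds); values are made up.
views = [(2.0, 20.0), (10.0, 20.0), (20.0, 20.0)]
for watched, length in views:
    print(binary_feedback(watched), watch_time_feedback(watched, length))
```

All three views look identical as binary feedback, while the watch-time score separates a quick skip from a full watch.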

This kind of granular implicit feedback is something that Netflix and Spotify could also use, and no doubt they do. However, this is where TikTok’s second major advantage makes a difference: the format of content on its platform. Short-form videos are much easier to consume than films, episodes and even songs. In a given TikTok session, a user is likely to scroll through 20 different videos, creating 20 new entries in the user-item matrix, whereas in a Netflix session a user is likely to watch only 1 film or TV show, creating just 1 new entry. This lets TikTok gather far more feedback per session, which in turn supports a more accurate algorithm.

How does Machine Learning fit into this?

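Building, storing and computing on this user-item matrix is extremely computationally expensive. So we use ML methods like Matrix Factorisation (https://towardsdatascience.com/paper-summary-matrix-factorization-techniques-for-recommender-systems-82d1a7ace74) or neural net methods like Deep Collaborative Filtering (https://towardsdatascience.com/deep-learning-based-recommender-systems-3d120201db7e) to learn user and item representations (matrices or embeddings, respectively) which are easier to store and compute on.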
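
As a rough illustration of matrix factorisation, the sketch below learns a short vector for each user and each item so that their dot product approximates the observed feedback. It uses plain stochastic gradient descent on squared error with L2 regularisation, a common textbook variant and not necessarily what TikTok runs. The data, the hyperparameters and the names `factorise`, `predict` and `mean_squared_error` are all assumptions made for this example.

```python
import random


def factorise(ratings, n_users, n_items, n_factors=2, epochs=1000,
              learning_rate=0.05, regularisation=0.02, seed=0):
    """Learn user and item factor vectors from observed
    (user, item, value) triples with stochastic gradient descent."""
    rng = random.Random(seed)
    users = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
             for _ in range(n_users)]
    items = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
             for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, value in ratings:
            error = value - predict(users, items, u, i)
            for f in range(n_factors):
                # Read both old values first so the two updates are consistent.
                p, q = users[u][f], items[i][f]
                users[u][f] += learning_rate * (error * q - regularisation * p)
                items[i][f] += learning_rate * (error * p - regularisation * q)
    return users, items


def predict(users, items, u, i):
    """Predicted feedback: dot product of user and item vectors."""
    return sum(p * q for p, q in zip(users[u], items[i]))


def mean_squared_error(users, items, ratings):
    return sum((v - predict(users, items, u, i)) ** 2
               for u, i, v in ratings) / len(ratings)


# (user index, item index, fraction of the video watched); values are made up.
OBSERVED = [
    (0, 0, 1.0), (0, 1, 0.9),
    (1, 0, 0.8), (1, 2, 0.1),
    (2, 2, 1.0), (2, 3, 0.9),
]

untrained = factorise(OBSERVED, n_users=3, n_items=4, epochs=0)
trained = factorise(OBSERVED, n_users=3, n_items=4)
print("error before training:", mean_squared_error(*untrained, OBSERVED))
print("error after training:", mean_squared_error(*trained, OBSERVED))
# Score a cell that was never observed: user 1 on item 1.
print("predicted feedback:", predict(*trained, 1, 1))
```

Instead of storing every cell of the user-item matrix, we only keep two small sets of vectors, and any missing cell can be estimated with a single dot product.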

Data Scientists at TikTok use these Machine Learning and Deep Learning techniques on the user-item matrix to create great recommendation algorithms. However, granular implicit feedback, along with the fact that the product is built around short form videos, the easiest form of media to consume, is what allows them to gather this vast wealth of data and use it so efficiently to perfect their feed algorithms.

Thanks for reading this article! I hope it helped you get a better understanding of how TikTok is able to serve very accurate recommendations.

If you enjoy reading my articles, would like to support my writing and are thinking of getting a Medium subscription, feel free to use my referral link (https://medium.com/@kaushsk12/membership). I’d get a percentage of your subscription fees.
