Two minutes NLP — Tips for Recommender Systems with NLP
Content-based and User-based Filtering, Collaborative Filtering, and Hybrid Approaches

There are several types of recommender systems, but not all of them lend themselves to NLP techniques. Suppose we are building a recommender system for Medium, where our goal is to suggest articles to users.
Content-based filtering

Content-based filtering methods are based on descriptions of the items to be recommended. They are best suited to situations where data about the items is available (like name, description, etc.) but data about the users is not (like their previously read articles). These algorithms try to recommend items similar to those that a user liked in the past or is examining in the present.
A key advantage of content-based filtering is that it doesn’t need the list of items the user has interacted with in the past, which is usually collected only after the user has been using the service for some time. As a consequence, content-based filtering works from a new user's first day.
If we have text data describing the items, we can leverage NLP techniques to compute item similarity with document embeddings, like the ones obtained with Doc2vec.
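To make the idea concrete, here is a minimal, dependency-free sketch of item-to-item similarity over article texts. It is not the Doc2vec approach: to keep the example self-contained, plain TF-IDF vectors stand in for document embeddings (in practice the vectors could come from a trained Doc2vec model, for example gensim's `Doc2Vec`). The article IDs and texts are hypothetical, and `tfidf_vectors`, `cosine`, and `most_similar` are illustrative helpers, not a library API.

```python
import math
from collections import Counter

# Hypothetical article descriptions; a real system would use Medium article texts.
articles = {
    "a1": "intro to word embeddings and doc2vec for text similarity",
    "a2": "training doc2vec document embeddings on news text",
    "a3": "a beginner guide to sourdough bread baking",
}

def tokenize(text):
    """Naive whitespace tokenizer (a stand-in for real NLP preprocessing)."""
    return text.lower().split()

def tfidf_vectors(docs):
    """Return a sparse TF-IDF vector (dict term -> weight) for each document id."""
    n = len(docs)
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[doc_id] = {
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(item_id, vectors, k=2):
    """Rank the other items by similarity to item_id and keep the top k."""
    scores = [
        (other, cosine(vectors[item_id], vec))
        for other, vec in vectors.items()
        if other != item_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

vectors = tfidf_vectors(articles)
print(most_similar("a1", vectors))
```

A reader who just finished `a1` would be shown `a2` first, since both articles share vocabulary about embeddings and Doc2vec; no reading history from other users is needed.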
Collaborative filtering

Collaborative filtering is based on the assumption that people who agreed in the past will agree in the future, and that they will like similar kinds of items as they liked in the past.
A key advantage of the collaborative filtering approach is that it does not rely on item descriptions and can therefore accurately recommend complex items such as movies without requiring an “understanding” of the item itself. However, this approach needs the list of items each user has interacted with in the past, so it suffers from the cold-start problem: it cannot make good recommendations for new users who have no interaction history yet.
Because collaborative filtering does not rely on descriptive features of items or users, there is no text for NLP techniques to work on.
In my experience, collaborative filtering outperforms the content-based approach when enough interaction data is available.
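The sketch below illustrates one common flavour of collaborative filtering, item-based filtering on implicit feedback, using only reading histories and no item text at all. The users, article IDs, and helper names are hypothetical, and cosine similarity over binary "who read this" vectors is just one reasonable choice of similarity measure.

```python
import math

# Hypothetical implicit feedback: the set of articles each user has read.
interactions = {
    "alice": {"a1", "a2", "a3"},
    "bob": {"a1", "a2"},
    "carol": {"a2", "a3", "a4"},
    "dave": {"a3", "a4"},
}

def item_users(interactions):
    """Invert the user -> items map into an item -> set-of-users map."""
    result = {}
    for user, items in interactions.items():
        for item in items:
            result.setdefault(item, set()).add(user)
    return result

def item_similarity(users_a, users_b):
    """Cosine similarity between two binary item vectors represented as user sets."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / math.sqrt(len(users_a) * len(users_b))

def recommend(user, interactions, k=2):
    """Score unseen items by their summed similarity to the items the user read."""
    by_item = item_users(interactions)
    seen = interactions.get(user, set())
    scores = {}
    for candidate, candidate_users in by_item.items():
        if candidate in seen:
            continue
        scores[candidate] = sum(
            item_similarity(candidate_users, by_item[item]) for item in seen
        )
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

print(recommend("bob", interactions))   # a3 ranks first: often read alongside a1 and a2
print(recommend("erin", interactions))  # unknown user: every score is 0 (cold start)
```

The second call shows the cold-start problem directly: a user with no history gets all-zero scores, so the ranking carries no information.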
User-based filtering

It is also possible to build recommender systems based on similarities between users, though they commonly perform worse than content-based and collaborative filtering.
As with content-based filtering, if we have text data describing the users, we can use it to compute user similarity with embeddings.
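As a rough illustration, the sketch below compares users through a text description (a hypothetical profile bio) and recommends what the most similar users have read. Bag-of-words vectors stand in for learned embeddings to keep it self-contained; the bios, reading histories, and function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical user bios and reading histories.
user_bios = {
    "alice": "data scientist interested in nlp and machine learning",
    "bob": "machine learning engineer working on nlp models",
    "carol": "home baker who writes about bread and pastry",
}
reading_history = {
    "alice": {"a1", "a2"},
    "bob": {"a2", "a5"},
    "carol": {"a3"},
}

def bow(text):
    """Bag-of-words vector (a stand-in for a user embedding)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two Counters (missing terms count as 0)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend_from_similar_users(user, k_users=1):
    """Recommend unread articles that the k most similar users have read."""
    target = bow(user_bios[user])
    neighbours = sorted(
        (other for other in user_bios if other != user),
        key=lambda other: cosine(target, bow(user_bios[other])),
        reverse=True,
    )[:k_users]
    seen = reading_history.get(user, set())
    recs = set()
    for other in neighbours:
        recs |= reading_history.get(other, set()) - seen
    return sorted(recs)

print(recommend_from_similar_users("alice"))
```

Here Alice's bio is closest to Bob's (both mention NLP and machine learning), so she is offered the article Bob read that she has not.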
Hybrid filtering

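Given the pros and cons of the different types of recommender systems, it is very common to combine them in a hybrid approach, for example relying on content-based filtering for new users and giving more weight to collaborative filtering as their interaction history grows.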
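One simple way to build such a hybrid is a weighted blend whose weights shift with the amount of interaction data. The sketch below assumes both recommenders already produce scores on a comparable scale; the `ramp` parameter and its linear-in-ratio weighting scheme are hypothetical choices for illustration, not a standard formula.

```python
def hybrid_scores(content_scores, collab_scores, n_interactions, ramp=10):
    """Blend content-based and collaborative scores for one user.

    The collaborative weight alpha grows with the number of items the user
    has interacted with, so brand-new users rely on content-based scores.
    `ramp` (hypothetical) is the interaction count at which both get weight 0.5.
    """
    alpha = n_interactions / (n_interactions + ramp)
    items = set(content_scores) | set(collab_scores)
    return {
        item: (1 - alpha) * content_scores.get(item, 0.0)
        + alpha * collab_scores.get(item, 0.0)
        for item in items
    }

# Hypothetical scores from the two recommenders for the same two articles.
content = {"a1": 0.9, "a2": 0.2}
collab = {"a1": 0.1, "a2": 0.8}

new_user = hybrid_scores(content, collab, n_interactions=0)
regular_user = hybrid_scores(content, collab, n_interactions=90)
print(max(new_user, key=new_user.get))          # a1: content-based scores dominate
print(max(regular_user, key=regular_user.get))  # a2: collaborative scores dominate
```

A smooth weight avoids a hard switch between the two systems; other common hybrids instead switch outright at a threshold or feed content features into a single learned model.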
Thank you for reading! If you are interested in learning more about NLP, remember to follow NLPlanet on Medium, LinkedIn, and Twitter!