Ismael Araujo

Summary

PyTrends is a Python library that enables users to retrieve and analyze Google Trends data, enhancing the depth and predictive capabilities of data-driven projects.

Abstract

PyTrends is a versatile Python library that interfaces with Google Trends to provide users with the ability to extract and analyze search trend data. This tool is particularly useful for adding context to data analysis, as it can reflect public interest and potential market trends. The library simplifies the process of accessing historical search data, which can be used for various applications such as predicting housing prices based on search query popularity. PyTrends allows for daily data retrieval over long periods, for which the Google Trends website only offers weekly or coarser data, thus offering a more granular level of analysis. Additionally, it provides keyword suggestions and related topics through Freebase IDs, ensuring precise data extraction. The library is easy to install and use, and it can be applied to a wide range of subjects, making it an invaluable resource for data analysts and scientists looking to enrich their projects with real-world trend data.

Opinions

  • The author believes that PyTrends can significantly enhance the quality of data projects by providing access to Google Trends data, which can reveal the popularity and story behind specific keywords.
  • PyTrends is considered a "must-have" for projects that require an understanding of keyword popularity, as it can explain trends and potentially predict outcomes in various fields such as real estate or health.
  • The author emphasizes the advantage of PyTrends in obtaining daily trend data over long periods, which the Google Trends website only provides at weekly or coarser granularity, thus allowing for more accurate and detailed analysis.
  • The library is praised for its ability to streamline the data extraction process, eliminating the need for manual data downloads and imports, which can be time-consuming and cumbersome.
  • The author suggests that PyTrends can be applied to virtually any subject matter, highlighting its versatility and broad utility in data science and analytics.
  • The use of Freebase IDs for precise keyword searches is highlighted as a valuable feature for obtaining targeted trend data.
  • The article concludes with a strong endorsement of PyTrends, encouraging readers to explore its capabilities and integrate it into their data projects for improved insights and predictive modeling.

PYTHON

PyTrends: A Python Library That You Should Know About

PyTrends will bring your next project to the next level. Here’s why.

Photo by Max Duzij on Unsplash

I have been writing for Medium for over a year, and Python libraries are my favorite topic to write about. I love learning about them and about new ways to do things. However, I have to admit that I sometimes focus on super cool, impressive libraries that do a lot with one click and forget about simpler but still essential ones.

Well, today I will change that and talk about an impressive library called PyTrends. It will take your project to the next level!

What’s PyTrends?

PyTrends is an unofficial Python API for Google Trends that makes it easy to retrieve its data. For those not familiar with Google Trends, it’s a website owned by Google that shows the popularity of queries in Google Search over time.

Screenshot by the author

Ok, but why is PyTrends a must-have for my next project?

I’m glad you asked. Through Google Trends, you can add information about how popular a keyword is, which can explain a lot about a specific topic. The more popular a keyword is, the more story it might have behind it. For example, let’s say you want to predict housing prices. If more people are looking to buy a house, it’s reasonable to think that house prices would increase. However, if fewer people are looking for homes, we can assume that prices would be lower.

If you are working on a scientific project, such as a health-related one, you can add information that helps people understand the topic and its importance. There are no disadvantages to adding information that helps you, and other people, understand a project.

Why not get the data straight from the website?

Although we can download data from the Google Trends website, there is a caveat: for longer periods, the website only offers weekly (or even coarser) data, not daily. If you want your analysis to be more accurate, daily data is the best option.
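To illustrate why daily granularity matters, here is a small sketch with made-up numbers (not real Google Trends data): a one-day spike stands out in daily data but is diluted once the days are averaged into a week. Google computes its weekly figures on its own side, so plain averaging is only an assumption used here to show the loss of detail, and weekly_mean is a hypothetical helper.

```python
import pandas as pd


def weekly_mean(daily: pd.Series) -> pd.Series:
    """Average a daily series into weeks ending on Sunday."""
    return daily.resample('W').mean()


# Made-up daily popularity: flat at 10, with a one-day spike to 100
dates = pd.date_range('2021-01-04', periods=14, freq='D')  # Mon 4 Jan to Sun 17 Jan
daily = pd.Series(10.0, index=dates)
daily.loc[pd.Timestamp('2021-01-06')] = 100.0

weekly = weekly_mean(daily)
print(daily.max())   # 100.0: the spike is obvious day by day
print(weekly.max())  # about 22.9: averaged into its week, it is much flatter
```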

Also, you would have to leave your project notebook, search for the keyword, download the data, and then import it into your notebook using Pandas… Lots of steps and no fun! With that said, let’s learn how to use PyTrends.

Installation

Installing PyTrends is simple. Just type pip install pytrends in your Terminal, and we are good to go. Now, let’s set up some parameters for the PyTrends function. This step is optional because you can pass the values directly inside the function, but for organization purposes, we will define them outside the function.

First, let’s pick a subject. Why don’t we start with COVID, since we should be able to see its rise in searches? I will add it to the kw_list variable, which can hold a list of keywords. Then, choose the frequency: get_historical_interest works with hourly or daily data. Let’s use daily. Next, I will select the region; today we will check the US. Lastly, you can choose the language with hl. It is passed to TrendReq when connecting rather than to the query function, and it doesn’t matter for this search, but you can set it.

Then, we select the start date and end date. Note that the year, month, day, and hour are set separately.

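from pytrends.request import TrendReq
import pandas as pd
kw_list = ['covid']
frequency = 'daily'  # get_historical_interest accepts 'hourly' or 'daily'
geo = 'US'
hl = 'en-US'  # interface language, passed to TrendReq below
# Select Start Date
year_start = 2017
month_start = 6
day_start = 1
hour_start = 0
# Select End Date
year_end = 2020
month_end = 6
day_end = 30
hour_end = 0
# Connect to Google Trends (tz is the timezone offset in minutes; 360 is US CST)
pytrends = TrendReq(hl=hl, tz=360)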
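Because get_historical_interest takes the year, month, day, and hour as separate arguments, it can be handy to start from two datetime objects instead. This is a minimal sketch with a hypothetical helper of my own (datetime_kwargs is not part of PyTrends) that unpacks them into the same keyword arguments used above.

```python
from datetime import datetime


def datetime_kwargs(start: datetime, end: datetime) -> dict:
    """Hypothetical helper: split two datetimes into the separate
    year/month/day/hour keyword arguments used in this example."""
    if end <= start:
        raise ValueError('end must be after start')
    return {
        'year_start': start.year, 'month_start': start.month,
        'day_start': start.day, 'hour_start': start.hour,
        'year_end': end.year, 'month_end': end.month,
        'day_end': end.day, 'hour_end': end.hour,
    }


params = datetime_kwargs(datetime(2017, 6, 1), datetime(2020, 6, 30))
print(params)
```

The resulting dictionary can then be unpacked into the call, for example pytrends.get_historical_interest(kw_list, **params, cat=0, geo=geo, gprop='', sleep=0, frequency=frequency).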

Now, let’s run the function and retrieve the data. Here we pass the variables we already created into the function. It looks more complicated than it is: in the code below, I’m just passing the parameters we set up into the PyTrends function and then making some optional changes to the DataFrame that won’t influence our results.

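# Download the data for the chosen keyword, dates, and region
google_trends = pytrends.get_historical_interest(kw_list,
    year_start=year_start,
    month_start=month_start,
    day_start=day_start,
    hour_start=hour_start,
    year_end=year_end,
    month_end=month_end,
    day_end=day_end,
    hour_end=hour_end,
    cat=0,
    geo=geo,
    gprop='',
    sleep=0,
    frequency=frequency)
# Move the dates from the index into a column and rename the columns
# (three names fit a single keyword: date, its values, and isPartial)
google_trends = google_trends.reset_index()
google_trends.columns = ['date', 'keyword', 'partial']
google_trends['date'] = pd.to_datetime(google_trends['date'])
# Drop the partial flag, which we won't use
google_trends = google_trends.drop(columns='partial')
google_trends.head()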
GIF by the author

Cool, we ran PyTrends. As a result, we got three columns: the date, the Google Trends result, and whether the result is partial, which I dropped. Google Trends results are numbers from 0 to 100 that show how popular a keyword was on a given date. Please note that these numbers are not search volumes: they measure popularity relative to the highest point in the selected region and period, which is set to 100. Let’s visualize the data to understand the results better.
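Before plotting, a quick illustration of that 0 to 100 scale. Google explains that each data point is first divided by the total searches for its place and time, and the results are then scaled so that the highest point becomes 100. The sketch below shows only that last scaling step, on made-up counts; scale_to_peak is a hypothetical helper, and Google's exact sampling and rounding are not public. Two keywords with a 100-fold difference in volume end up with identical scores.

```python
def scale_to_peak(counts: list[float]) -> list[int]:
    """Rescale values so the maximum becomes 100, rounded to whole numbers."""
    peak = max(counts)
    if peak == 0:
        return [0 for _ in counts]
    return [round(100 * c / peak) for c in counts]


# Made-up raw counts for two keywords with very different volumes
popular = [2000, 5000, 10000, 4000]
niche = [20, 50, 100, 40]

print(scale_to_peak(popular))  # [20, 50, 100, 40]
print(scale_to_peak(niche))    # [20, 50, 100, 40]
```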

import matplotlib.pyplot as plt
import seaborn as sns
# Plot Google Trends popularity over time
sns.set(rc={"figure.figsize": (14, 6)})
sns.lineplot(data=google_trends, x='date', y='keyword')
plt.show()
Screenshot by the author

Above, we can see the popularity of the covid keyword from January 2020 to May 2022. Not many people were searching for it at the beginning of 2020, the numbers exploded in the second half of March 2020, and there were several peaks and drops after that. We can also see peaks in July 2021 (Delta variant) and December 2021-January 2022 (Omicron variant). As I mentioned at the beginning, we can see the story being told through numbers. Take a moment and think about all the stories you can tell using the same logic.
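Instead of spotting the peaks by eye, we can also flag them in code. Here is a minimal sketch using only pandas, with a hypothetical local_peaks helper: a day counts as a peak if it equals the highest value in a centered window around it and is at least min_value. The window and threshold defaults are arbitrary choices for illustration, not values from the analysis above.

```python
import pandas as pd


def local_peaks(series: pd.Series, window: int = 61, min_value: float = 50) -> pd.Series:
    """Return the points that equal the maximum of a centered rolling
    window and are at least min_value. With daily data, window=61
    compares each day with about a month on either side."""
    rolling_max = series.rolling(window, center=True, min_periods=1).max()
    is_peak = (series == rolling_max) & (series >= min_value)
    return series[is_peak]


# Made-up daily values with two clear spikes
s = pd.Series([0, 10, 80, 10, 0, 0, 0, 0, 5, 90, 5, 0],
              index=pd.date_range('2021-01-01', periods=12, freq='D'))
print(local_peaks(s, window=5, min_value=50))
```

On the COVID data, that would be something like local_peaks(google_trends.set_index('date')['keyword']). If several neighboring days share the same maximum, all of them are returned, so you may want to keep only the first day of each run.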

Let’s look at other keywords and see what we can find. This time, let’s imagine that you are trying to predict the price of a house. How can we leverage Google Trends for that? We can start by looking at how the term has performed over the past few years. We will use the same code we used to check COVID searches, changing the keyword to buy house and the dates to 2015 through 2022.
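Since we are repeating the same steps with a different keyword and dates, it can help to wrap them in a function. This is my own refactoring of the code above, not part of PyTrends: fetch_daily_interest is a hypothetical name, and it takes the TrendReq object as an argument so it can also be tried with a stand-in object that doesn't call Google.

```python
import pandas as pd


def fetch_daily_interest(client, keyword: str, start: pd.Timestamp,
                         end: pd.Timestamp, geo: str = 'US') -> pd.DataFrame:
    """Download daily interest for one keyword with client (a TrendReq
    object) and tidy it into 'date' and 'keyword' columns."""
    raw = client.get_historical_interest(
        [keyword],
        year_start=start.year, month_start=start.month,
        day_start=start.day, hour_start=0,
        year_end=end.year, month_end=end.month,
        day_end=end.day, hour_end=0,
        cat=0, geo=geo, gprop='', sleep=0, frequency='daily')
    df = raw.reset_index()
    df.columns = ['date', 'keyword', 'partial']  # single keyword only
    df['date'] = pd.to_datetime(df['date'])
    return df.drop(columns='partial')


# Usage with the real client (needs network access; example dates):
# from pytrends.request import TrendReq
# google_trends = fetch_daily_interest(TrendReq(hl='en-US', tz=360), 'buy house',
#                                      pd.Timestamp('2015-01-01'), pd.Timestamp('2022-12-31'))
```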

Screenshot by the author

Interesting. It looks like the query buy house has some ups and downs over time. Maybe there is some seasonality? We can check that using the code below:

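import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
# Daily popularity values indexed by date (period=365 looks for a yearly cycle)
series = google_trends.set_index('date')['keyword']
result = seasonal_decompose(series, model='additive', period=365)
result.seasonal.plot()
plt.show()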
Screenshot by the author

Look at this! We can clearly see the seasonality of buying a house. People are more interested in buying a home in the first half of the year, while in the second half, the numbers are smaller. If we think about it, it makes sense. In the US, school starts in July-August, so people want to move before their kids go back to school. Also, winter might not be the best season to move. I’m sure there are even more possible explanations. So far, we have learned a lot about a couple of queries. Let’s move on to other cool features that PyTrends provides.
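Before moving on, we can put numbers on the first-half versus second-half pattern by averaging the seasonal component by calendar month. This is a minimal sketch, assuming result.seasonal from the decomposition above is a pandas Series with a DatetimeIndex; seasonal_by_month and half_year_gap are hypothetical helpers, and the example data is synthetic, not the buy house series.

```python
import numpy as np
import pandas as pd


def seasonal_by_month(seasonal: pd.Series) -> pd.Series:
    """Average a seasonal component by calendar month (1 = January)."""
    return seasonal.groupby(seasonal.index.month).mean()


def half_year_gap(seasonal: pd.Series) -> float:
    """Mean seasonal effect in January-June minus July-December."""
    months = seasonal.index.month
    return seasonal[months <= 6].mean() - seasonal[months > 6].mean()


# Synthetic seasonal component: +5 in the first half of each year, -5 in the second
dates = pd.date_range('2019-01-01', '2020-12-31', freq='D')
synthetic = pd.Series(np.where(dates.month <= 6, 5.0, -5.0), index=dates)
print(seasonal_by_month(synthetic))
print(half_year_gap(synthetic))  # 10.0
```

With the real data, seasonal_by_month(result.seasonal) would give one average per month, and a positive half_year_gap would back up the first-half pattern seen in the chart.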

Keyword suggestions

We can also get keyword suggestions with PyTrends. If you have a subject in mind and want some ideas, the code below does that. Let’s see how it works.

from pytrends.request import TrendReq
import pandas as pd
# Get Google keyword suggestions
pytrend = TrendReq()
keywords = pytrend.suggestions(keyword='buy house')
df = pd.DataFrame(keywords)
df
Screenshot by the author

We got some interesting information. First, there’s a column called mid that we haven’t discussed yet, so let’s take a step back. When we search for something on Google, it automatically corrects misspellings and takes related terms into account. For example, if we search for COVID, Google also considers related terms such as COVID-19, coronavirus, pandemic 2020, etc. Google Trends can do the same through topics, but PyTrends is more specific: when you pass it a plain keyword, it looks for the exact term you typed. How can we then look at a keyword the way Google does?

Well, there’s the Freebase ID, which is the code you see under mid. If we use that code instead of the keyword, we look at the whole topic the way Google groups it. So, if we want to search for everything related to Coronavirus disease 2019, we use the code /g/11j2cc_qll. Going back to the DataFrame, we also have the title of the subject and its type, which is very helpful because it gives us a better understanding of what results we will get.

Let’s see one more example.

Screenshot by the author

Above, I checked for keywords related to the United States, and we can see that it returned related topics such as the American currency, the country, the current president, and even the mail company. With the Freebase ID, you can make sure you are looking for the term that interests you the most.
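Since suggestions() returns a list of dictionaries with the mid, title, and type keys we saw in the DataFrame, picking an ID can also be done in code. This is a minimal sketch with a hypothetical helper (pick_mid is my own name, not part of PyTrends) that filters by type; the sample list is made up for illustration and is not a real API response.

```python
from typing import Optional


def pick_mid(suggestions: list[dict], wanted_type: str) -> Optional[str]:
    """Return the mid of the first suggestion whose type matches
    wanted_type (case-insensitive), or None if nothing matches."""
    for item in suggestions:
        if item.get('type', '').lower() == wanted_type.lower():
            return item['mid']
    return None


# Made-up sample in the same shape as pytrend.suggestions(...) output
sample = [
    {'mid': '/m/aaa', 'title': 'Example A', 'type': 'Topic'},
    {'mid': '/m/bbb', 'title': 'Example B', 'type': 'Country'},
]
print(pick_mid(sample, 'country'))  # /m/bbb
```

The chosen ID can then be queried like any keyword, for example pytrend.build_payload([mid], timeframe='today 5-y', geo='US') followed by pytrend.interest_over_time(). The type label to filter on is an assumption; check the type column your own search returns.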

Final Thoughts

Today we went over some of the capabilities of PyTrends, a library that many data analysts and data scientists easily overlook. It’s a simple but powerful tool, and I can’t stress enough how helpful it can be for your next project. You will be one step ahead of everyone who focuses only on the main dataset for their project. I can’t think of a subject in which PyTrends wouldn’t be helpful.

You can find the code I used in this blog here. I also included some additional code that can be useful for your next project. If you decide to give PyTrends a try, let me know. I can’t wait to hear all the fantastic projects you will be able to develop with the help of PyTrends. Happy Coding!
