Generate SRT File (Subtitles) using Google Cloud’s Speech-to-Text API
The code used in this article can be found here.
Ever watched a movie or series with subtitles and been amazed at how they pop up at just the right moment? Or wondered how you can add subtitles to your own videos? Follow along, and by the end of this article you’ll be able to generate subtitles programmatically.
First Things First — Why prefer SRT File Format for Subtitles?
SRT is a widely accepted subtitle format that is compatible with most media players and offers SEO benefits. The blog post Understanding and Creating SRT Files explains SRT files and their benefits well, and also shows how to add them to your videos on platforms such as YouTube and Facebook.
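To make the format concrete: an SRT file is a sequence of numbered cues, each made of an index, a time range written as HH:MM:SS,mmm --> HH:MM:SS,mmm, the subtitle text, and a blank line. Here is a minimal Python sketch of how one cue could be formatted. The function names format_timestamp and format_cue are illustrative and are not taken from the article's repository.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT timestamp form HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def format_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT cue: index line, time range line, then the subtitle text."""
    time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
    return f"{index}\n{time_range}\n{text}\n"


# Cues are separated by a blank line when written to the .srt file.
cues = [
    format_cue(1, 0.0, 2.5, "Hello there."),
    format_cue(2, 2.5, 5.0, "Welcome to the demo."),
]
srt_text = "\n".join(cues)
```

Writing srt_text to a file with the .srt extension gives a subtitle file that a media player can load alongside the video.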

If you followed the blog post mentioned above, you will have noticed that, even in the age of automation, creating an SRT file by hand involves a lot of manual labor. Can we somehow reduce this effort?

There are two ways to do this:
- Train your own ML model: This requires a lot of data, manual labor to annotate that data (ironically), time, and, frankly, a lot of money. It isn’t worth it for small-scale applications or organizations.
- Use pre-trained APIs: Several pre-trained APIs can do this job efficiently. They take less time to set up, are easy to learn, and are cost-efficient. We will use one such API to generate subtitles: Google Cloud’s Speech-to-Text API.
Let’s Get Started!
Pre-requisites
- You need Git, Python 3.7, and ffmpeg installed on your system.
- You need a Google Cloud project with billing enabled. Follow Creating and managing projects to set this up.
- You also need a service account with permission to use the Speech-to-Text API. Download the service account credentials as credentials.json. Follow Creating and managing service accounts to set this up.
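Before moving on, the local prerequisites above can be sanity-checked with a few lines of Python. This is only a sketch: the helper name missing_prerequisites and its parameters are hypothetical, and it checks only that the tools are on your PATH and that the credentials file exists, not that the Google Cloud project or service account is configured correctly.

```python
import os
import shutil
import sys


def missing_prerequisites(tools=("git", "ffmpeg"),
                          min_python=(3, 7),
                          credentials_path="credentials.json"):
    """Return a list of human-readable names for whatever is missing."""
    # shutil.which returns None when an executable is not found on PATH.
    missing = [tool for tool in tools if shutil.which(tool) is None]

    # Compare the running interpreter against the minimum version.
    if sys.version_info < min_python:
        missing.append("python>={}.{}".format(*min_python))

    # Pass credentials_path=None to skip the credentials file check.
    if credentials_path is not None and not os.path.isfile(credentials_path):
        missing.append(credentials_path)

    return missing


problems = missing_prerequisites()
if problems:
    print("Missing prerequisites:", ", ".join(problems))
else:
    print("All local prerequisites found.")
```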
Setting Up the Environment
Enable the Speech-to-Text API in your Google Cloud project. From the navigation bar, go to APIs & Services > Library > Cloud Speech-to-Text API and click Enable.
Now, run the commands below from your terminal:
- Clone the repository:
git clone git@github.com:darshan-majithiya/Generate-SRT-File-using-Google-Cloud-s-Speech-to-Text-API.git
- Install the requirements:
cd Generate-SRT-File-using-Google-Cloud-s-Speech-to-Text-API
pip install -r requirements.txt
- Move your credentials.json here and then export the credentials:
export GOOGLE_APPLICATION_CREDENTIALS="credentials.json"
Data Preparation
I’m a Suits fan so I’ll use this video for the demonstration. Feel free to use any other video.
I’ll download this video using the pytube3 module.
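A download step along these lines could look like the sketch below. It assumes pytube3 is installed (it is imported as pytube) and uses its YouTube(...).streams.filter(...).first().download(...) interface. The helper name download_video is hypothetical and is not necessarily how the article's repository does it.

```python
def download_video(url: str, output_dir: str = ".") -> str:
    """Download a progressive MP4 stream of `url` into `output_dir`.

    Returns whatever pytube's download() returns, which is the path
    of the saved file.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"Expected a full video URL, got: {url!r}")

    # Imported lazily so the URL check above works without pytube3 installed.
    from pytube import YouTube

    # Progressive streams carry audio and video together in a single file.
    stream = (
        YouTube(url)
        .streams
        .filter(progressive=True, file_extension="mp4")
        .first()
    )
    if stream is None:
        raise RuntimeError("No progressive MP4 stream found for this video")

    return stream.download(output_path=output_dir)
```

Call it with the full URL of the video you picked, for example download_video("https://www.youtube.com/watch?v=<VIDEO_ID>"), where <VIDEO_ID> is a placeholder you replace with your own video's ID.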




