An Introduction to pyttsx3: A Text-To-Speech Converter for Python

A Python-based alternative for text-to-speech conversion that works offline

This article is a beginner’s guide to text-to-speech conversion in Python. By reading it, you’ll learn the proper steps to set up the module and get to know some of its basic functionality. According to the official site for pyttsx3:

“ … is a text-to-speech conversion library in Python. Unlike alternative libraries, it works offline, and is compatible with both Python 2 and 3.”

In addition, this module has been tested and is known to work on the following systems:

SAPI5 on Windows XP, Windows Vista, and Windows 8, 8.1, 10
NSSpeechSynthesizer on Mac OS X 10.5 (Leopard) and 10.6 (Snow Leopard)
eSpeak on Ubuntu Desktop Edition 8.10 (Intrepid), 9.04 (Jaunty), and 9.10 (Karmic)

I’ll demonstrate the capabilities of this module using the following languages:

English (male, female)
Chinese (female)
Japanese (female)

There are four sections in this tutorial:

Setup
Basic API
Results
Conclusion

Let’s move on to the next section to find out more.

1. Setup

The proper way to install this module is by cloning the repository and installing via the setup Python file.

This is mainly because the version published on PyPI wasn’t the latest version at the time of this writing. You can do a pip install in the future if the developer has published the latest version on PyPI. EDIT: The package has been updated as of July 2020. You can now install it easily via the following command:

pip install pyttsx3

Python module

First, download the repository from the official GitHub site as a ZIP file (or clone it), and unzip it to a directory of your choice.

Then, open up a command line, and activate the virtual environment you prefer. Once you’re done, change the directory to the root of the master folder. It should have a file called setup.py. Run the following command:

python setup.py install

Once the installation has completed, check that the module is installed:

pip show pyttsx3

If you’re using Windows, you need to verify that the pypiwin32 module is installed as well.

pip show pypiwin32

Install it using the following command if the module is not present:

pip install pypiwin32
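If you’d rather check from inside Python, here is a minimal sketch that uses only the standard library. The helper names is_installed, missing_modules, and report are my own, and I’m assuming that pypiwin32 shows up under the import name win32com, so adjust the list if your setup differs.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate `module_name`."""
    return importlib.util.find_spec(module_name) is not None


def missing_modules(module_names):
    """Return the names from `module_names` that cannot be located."""
    return [name for name in module_names if not is_installed(name)]


def report():
    """Print which of the modules used in this tutorial are missing."""
    # "win32com" is an assumed import name for pypiwin32 (Windows only).
    missing = missing_modules(["pyttsx3", "win32com"])
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Calling report() inside your virtual environment should tell you whether anything still needs to be installed. On macOS or Linux, win32com will be reported as missing, which is expected.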

Text to speech

The text-to-speech features for this module are based on languages installed in your operating system.

By default, a language pack is installed together with the operating system. You need to install additional language packs manually if you intend to use other languages.

For Windows users, head over to the Language settings. You should be able to see the following user interface.

Apart from the default language pack (English), I have installed additional languages (such as Chinese and Japanese). Click on the Add button to add a preferred language. You should see the following pop-up.

Make sure the language pack has the text-to-speech icon.

In the example given above, we can see French (France) supports text to speech in the language pack. You can select it and click Next to install the language pack.

Let’s proceed to the next section to start testing the available API.

2. Basic API

In this section, we’ll explore some of the useful functionality provided by the module. If you installed an older release via pip (from before the July 2020 update mentioned in the Setup section), some of the features won’t be available. Refer to the previous section to do a proper installation before you continue.

Import

Let’s start with a simple import statement:

import pyttsx3

Initialization

The initialization is pretty straightforward — you can just type the following code:

engine = pyttsx3.init()

If you encounter errors related to a missing driver, you can try passing in the driver’s name. There are four built-in drivers:

dummy.py - Test driver that does nothing. You can use its structure as a reference to implement your own driver.
espeak.py - Driver for other platforms, such as Ubuntu
nsss.py - Driver for macOS
sapi5.py - Driver for the Windows operating system

An example of initialization with the driver’s name is as follows:

engine = pyttsx3.init("sapi5")
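If your script has to run on more than one operating system, you may not want to hard-code "sapi5". The library normally chooses a driver for you when you call init() without arguments, so treat the following as a sketch that simply makes the mapping from the list above explicit. The function names pick_driver and init_engine are hypothetical helpers of mine, not part of pyttsx3.

```python
import sys


def pick_driver(platform_name=None):
    """Map a sys.platform value to one of the driver names listed above."""
    name = sys.platform if platform_name is None else platform_name
    if name.startswith("win"):
        return "sapi5"   # Windows
    if name == "darwin":
        return "nsss"    # macOS
    return "espeak"      # other platforms, such as Ubuntu


def init_engine():
    """Create an engine using the driver chosen for the current platform."""
    # Imported here so that pick_driver() can be used without pyttsx3 installed.
    import pyttsx3
    return pyttsx3.init(pick_driver())
```

You can then write engine = init_engine() instead of passing the driver’s name by hand.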

Text to speech

Let’s try out the following code to say something simple. Save it in a Python file, and run it.

import pyttsx3
engine = pyttsx3.init()

engine.say('Welcome to Medium')
engine.runAndWait()

You should be able to hear the voice say, “Welcome to Medium.”
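The say function only queues the text, and runAndWait is what actually processes the queue. That means you can queue several phrases and play them in one go. Here is a minimal sketch of that idea. The helper name speak_all is my own, and it accepts any engine object that has say and runAndWait methods.

```python
def speak_all(engine, phrases):
    """Queue every non-empty phrase, then play the whole queue once.

    Returns the number of phrases that were queued.
    """
    queued = 0
    for phrase in phrases:
        if phrase.strip():          # skip empty or whitespace-only strings
            engine.say(phrase)      # queues the phrase, does not speak yet
            queued += 1
    if queued:
        engine.runAndWait()         # blocks until the queue has been spoken
    return queued
```

For example, speak_all(engine, ["Welcome to Medium", "Enjoy your stay"]) should read both phrases one after another.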

Property

The engine object has the following variables that can be fine-tuned:

rate — Integer speech rate in words per minute. The base value is 200.
voice — String identifier of the active voice
volume — Floating point volume in the range of 0.0 to 1.0 inclusive
voices — List of pyttsx3.voice.Voice descriptor objects

You can set them via the setProperty function.

engine.setProperty('rate', 150)

You can get the value of the property using the getProperty function.

engine.getProperty('volume')
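If you adjust these values in several places, a small wrapper can keep them within the documented ranges. The following is only a sketch: apply_settings is a hypothetical helper of mine, and it returns the values it passed to setProperty rather than reading them back from the engine.

```python
def apply_settings(engine, rate=None, volume=None):
    """Set the rate and/or volume on `engine` and return what was applied.

    `rate` is converted to an integer (words per minute).
    `volume` is clamped to the range 0.0 to 1.0 inclusive.
    """
    applied = {}
    if rate is not None:
        applied["rate"] = int(rate)
    if volume is not None:
        applied["volume"] = max(0.0, min(1.0, float(volume)))
    for name, value in applied.items():
        engine.setProperty(name, value)
    return applied
```

For example, apply_settings(engine, rate=150, volume=1.5) sets the rate to 150 and quietly caps the volume at 1.0.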

Voice property

Each Voice object in the voices list has the following attributes:

age - Integer age of the voice in years. Defaults to None if unknown.
gender - String gender of the voice: male, female, or neutral. Defaults to None if unknown.
id - String identifier of the voice. Used to set the active voice via pyttsx3.engine.Engine.setProperty(). This attribute is always defined.
languages - List of string languages supported by this voice. Defaults to an empty list if unknown.
name - Human-readable name of the voice. Defaults to None if unknown.

You can use the following code to identify all the available voices in your system. Add the following code after the init function.

voices = engine.getProperty('voices')
for voice in voices:
    print("Voice: %s" % voice.name)
    print(" - ID: %s" % voice.id)
    print(" - Languages: %s" % voice.languages)
    print(" - Gender: %s" % voice.gender)
    print(" - Age: %s" % voice.age)
    print("\n")

You should see a list of the voices installed on your system when you run it (the exact output depends on the language packs you’ve installed).

Let’s try changing the voice to Zira (female version). You need to use the ID as a parameter, and the order will be exactly what we’ve printed out just now. I will be passing 1 as the index for Zira based on the following order:

David (English male)
Zira (English female)
Haruka (Japanese)
Huihui (Chinese)

Let’s add the following code after the for loop and before the say function.

engine.setProperty("voice", voices[1].id)

Rerun the file, and you should hear a female voice saying “Welcome to Medium.” Feel free to input your own text and test the text-to-speech conversion.
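Picking a voice by index works on my machine, but the order can differ depending on which language packs you have installed. A slightly more robust approach is to search by name. This is a minimal sketch with two hypothetical helpers of my own, find_voice and use_voice. It assumes that the name of each voice contains a recognizable keyword such as "Zira".

```python
def find_voice(voices, keyword):
    """Return the first voice whose name contains `keyword`, or None.

    The comparison ignores case. A voice name may be None if unknown.
    """
    keyword = keyword.lower()
    for voice in voices:
        if keyword in (voice.name or "").lower():
            return voice
    return None


def use_voice(engine, keyword):
    """Activate the first matching voice on `engine` and return it."""
    voice = find_voice(engine.getProperty("voices"), keyword)
    if voice is not None:
        engine.setProperty("voice", voice.id)
    return voice
```

With this in place, use_voice(engine, "zira") replaces the index-based call, and it returns None instead of picking the wrong voice when Zira isn’t installed.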

Save to a file

The developer has added new functionality to save the audio stream to a file on macOS and Windows. You can call it via the save_to_file function. Make sure that you place it before the engine.runAndWait() call. Let’s try it out!

engine.save_to_file("How do you do?", "output.mp3")

The function accepts two parameters:

text — The text for text-to-speech conversion
filename — The name of the file. You can use either the mp3 or wav extension.
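Since the call only takes effect once runAndWait is executed, it can be handy to wrap both steps in one function. Here is a sketch under that assumption. The helper name synthesize_to_file and the extension check are my own additions, based on the two extensions mentioned above.

```python
import os

# The two file extensions named in this article.
ALLOWED_EXTENSIONS = (".mp3", ".wav")


def synthesize_to_file(engine, text, filename):
    """Queue `text` for saving to `filename`, then process the queue."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError("expected an .mp3 or .wav file name, got %r" % filename)
    engine.save_to_file(text, filename)   # queues the save command
    engine.runAndWait()                   # the file is written during this call
    return filename
```

For example, synthesize_to_file(engine, "How do you do?", "output.mp3") performs both steps and hands back the file name for later use.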

3. Results

I tested the program on the following kinds of text in various languages:

Simple greetings
Random text
A paragraph from a short story or light novel
A paragraph from a news article

I am linking the results to my GitHub repository. Feel free to check them out.

English (male)

English (female)

Japanese

Chinese

Based on the results, this module works well for news and professional articles. However, the emotion is quite lacking, and the pacing isn’t that great, especially for Japanese. I purposely placed quite a lot of commas between sentences to break them up further. Feel free to test it on your own.

4. Conclusion

Let’s recap what we’ve learned today. First, we started off with installing the Python module and the language pack for the language we desired.

We then moved on to some of the functions provided by the module. We learned how to set and get the engine’s properties and how to use the text-to-speech function. We also listed the voices available in the operating system.

Finally, I linked a few examples that I tested on four different kinds of text in three languages.

Have fun testing it on your own. I hope you enjoyed this article. See you again in the next one.
