Summary

The article provides a comprehensive guide on using the Montreal Forced Aligner (MFA) for a low-resource language, specifically Sinhalese, by detailing the creation of a speech corpus and pronunciation dictionary, and the installation and usage of MFA with Miniconda.

Abstract

The author of the article shares their experience in adapting the Montreal Forced Aligner (MFA) for use with the Sinhalese language, which is not natively supported by MFA. The process involves sourcing a speech corpus from an existing Sinhalese Text To Speech dataset and modifying it to fit the required format. Additionally, the article discusses the creation or acquisition of a pronunciation dictionary, which is crucial for forced alignment. The author notes the availability of a pre-existing dictionary for Sinhalese, simplifying this step. The article also provides step-by-step instructions for installing MFA using Miniconda, validating the speech corpus and dictionary, and training the acoustic model. The author concludes by sharing their personal experience with MFA's performance on different operating systems and provides resources for further reading.

Opinions

The author acknowledges the challenge of working with a low-resource language like Sinhalese in MFA and provides a centralized resource to streamline the process.
They express gratitude for the pre-existing Sinhalese pronunciation dictionary, which significantly eases the task of forced alignment for Sinhalese.
The author notes that MFA is primarily used by researchers and suggests that it may have more robust support and fewer bugs on macOS compared to other operating systems.
They share their own difficulties in running MFA on Ubuntu machines and Google Colab, implying that these platforms may not be as reliable for this tool.
The article encourages reader engagement by inviting them to applaud, comment, and share the article if they find it helpful.

How I used Montreal Forced Aligner for a New Language (Sinhalese)

Hi There,

Let’s see how to use the Montreal Forced Aligner (MFA) with a low-resource language or a completely new language. You can find everything online, like I did, but that takes time, so I am going to give you all the resources in one place here.

MFA is a fantastic tool for forced alignment. According to the MFA user guide, forced alignment is a technique that takes an orthographic transcription of an audio file and generates a time-aligned version, using a pronunciation dictionary to look up phones for words. I found a good article for understanding forced alignment here.

The MFA documentation lists four use cases for this tool. If your language is listed in the MFA acoustic models, MFA dictionaries, and MFA G2P models, your life will be easy: you can follow use case 1 or 2. But what if your language is not listed in any of those places? That makes your life harder, and it is the topic I am going to discuss today. I recently used this tool with the Sinhalese language. Sinhalese is not listed in any of the above links, so my only option was the third use case in the MFA documentation.

If your language is not listed in the above links, you have to follow the third use case. That means you must have:

A Speech Corpus. (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/corpus_structure.html#corpus-structure)
A Pronunciation Dictionary. (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/dictionary.html#dictionary-format)

Speech Corpus

The supported speech corpus formats are described in the link above. I used the Prosodylab-aligner format, which means each audio file is a .wav file and its transcription is in a .lab file with the same name. I found a Sinhalese Text To Speech data set here, and I am going to use it for this forced alignment. You can find the transcriptions of the audio files in the prompts.txt file in this data set. I modified this file in Excel and created a metadata.csv file. You can find it here. Then I wrote a simple Python script that writes each transcription to its own file, which gives the required speech corpus format. I did this in Google Colab here. Download the transcriptions.zip file, extract it, and paste the files into your corpus.
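As a rough sketch of that kind of script (not the exact Colab notebook), the function below writes one .lab file per row of a metadata file. It assumes each row holds an audio file ID (without the .wav extension) and its transcription, separated by a "|" character. That layout, the delimiter, and the function name write_lab_files are assumptions for illustration, so adjust them to match your own metadata.csv.

```python
import csv
from pathlib import Path


def write_lab_files(metadata_csv, corpus_dir, delimiter="|"):
    """Write one <file_id>.lab file per metadata row; return how many were written.

    Assumed row layout (hypothetical): file_id<delimiter>transcription
    """
    corpus = Path(corpus_dir).expanduser()
    corpus.mkdir(parents=True, exist_ok=True)
    written = 0
    with open(Path(metadata_csv).expanduser(), encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside transcriptions untouched.
        reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        for row in reader:
            # Skip blank or malformed rows instead of failing halfway through.
            if len(row) < 2 or not row[0].strip():
                continue
            file_id, text = row[0].strip(), row[1].strip()
            # The .lab file must share its name with the matching .wav file.
            (corpus / f"{file_id}.lab").write_text(text + "\n", encoding="utf-8")
            written += 1
    return written
```

Calling write_lab_files("metadata.csv", "~/mfa_data/speech_corpus") would place the .lab files next to the .wav files, provided the IDs in the first column match the audio file names.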

Pronunciation Dictionary

You have to find a pronunciation dictionary for your language. This link provides a sample dictionary for the English language. If one is not available for your language, you have to generate the pronunciation dictionary on your own, which might be challenging. There is an online tool, XPF, that can be used for generating transcriptions. It has a pretty cool web interface, and if your language is there, you can generate the transcriptions with it. The other alternative, if the orthography is pretty transparent, is to write your own rule-based G2P script (e.g. all ක symbols go to k a). For the Sinhalese language, the pronunciation dictionary already exists at this link. Luckily, previous researchers have done that for my language. 😇

Okay, now we have the speech corpus and the pronunciation dictionary. Next, you need to install Miniconda on your machine. All the instructions for downloading Miniconda can be found in the Conda documentation.
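To make the rule-based G2P idea concrete, here is a toy sketch for a handful of Sinhala characters. The symbol tables, the phone labels ("k", "a", "aa" and so on), and the function names are all assumptions for illustration, and the rule that every bare consonant carries an inherent "a" is a simplification. A real script would need the full character inventory and more careful vowel rules.

```python
# Toy mapping tables (hypothetical and deliberately incomplete).
CONSONANTS = {
    "\u0d9a": "k",  # ක
    "\u0db8": "m",  # ම
    "\u0dbd": "l",  # ල
}
VOWEL_SIGNS = {
    "\u0dcf": "aa",  # ා
    "\u0dd2": "i",   # ි
}
INDEPENDENT_VOWELS = {
    "\u0d85": "a",   # අ
    "\u0d86": "aa",  # ආ
}
HAL = "\u0dca"       # ් removes the inherent vowel
INHERENT_VOWEL = "a"


def g2p(word):
    """Return a list of phone labels for one word, using the toy tables above."""
    phones = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            if nxt == HAL:
                i += 2  # consonant with no vowel
            elif nxt in VOWEL_SIGNS:
                phones.append(VOWEL_SIGNS[nxt])
                i += 2  # vowel sign replaces the inherent vowel
            else:
                phones.append(INHERENT_VOWEL)
                i += 1
        elif ch in INDEPENDENT_VOWELS:
            phones.append(INDEPENDENT_VOWELS[ch])
            i += 1
        else:
            raise ValueError(f"No rule for character {ch!r} in {word!r}")
    return phones


def dictionary_line(word):
    """Format one entry as: the word, a tab, then space-separated phones."""
    return f"{word}\t{' '.join(g2p(word))}"
```

Running dictionary_line over every unique word in the transcriptions and writing the lines to a text file would give a first draft of a dictionary, which still needs checking by someone who knows the language.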

Install MFA with Miniconda

1. Create a new environment and install MFA:

conda create -n aligner -c conda-forge montreal-forced-aligner

2. Make sure you are in the newly created environment:

conda activate aligner

3. Check whether MFA is installed successfully.

mfa --help

The above command should list the options available in MFA.

The rest of the article assumes your speech corpus is in the following location

~/mfa_data/speech_corpus

and your pronunciation dictionary is in the following location.

~/mfa_data/pronunciation_dictionary.txt
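Before going further, it can help to confirm that every .wav file has a matching .lab file and vice versa. The sketch below is my own helper, not part of MFA, and the name find_unpaired is hypothetical. It compares file names (including any speaker subfolders) with the extensions removed.

```python
from pathlib import Path


def find_unpaired(corpus_dir):
    """Return (wav files without a .lab, lab files without a .wav).

    Paths are relative to corpus_dir and have their extension removed.
    """
    corpus = Path(corpus_dir).expanduser()
    wavs = {p.relative_to(corpus).with_suffix("") for p in corpus.rglob("*.wav")}
    labs = {p.relative_to(corpus).with_suffix("") for p in corpus.rglob("*.lab")}
    missing_lab = sorted(str(p) for p in wavs - labs)
    missing_wav = sorted(str(p) for p in labs - wavs)
    return missing_lab, missing_wav
```

If find_unpaired("~/mfa_data/speech_corpus") returns two empty lists, every recording has a transcription with the same name.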

First, the data set should be validated to make sure it is in the proper format for MFA. To validate it, run the following command.

mfa validate ~/mfa_data/speech_corpus ~/mfa_data/pronunciation_dictionary.txt

This command will look through the corpus and make sure that MFA is parsing everything correctly. When it finishes, it gives a summary of the validation.

Summary of the corpus and the dictionary
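One thing worth watching in a validation summary is words that appear in the transcriptions but not in the dictionary. The sketch below is a rough, independent check of my own and not how MFA computes it. It assumes the dictionary has one entry per line with the word as the first whitespace-separated field, and it splits transcriptions on whitespace only, so its counts may differ from MFA's, which applies its own text normalization. The function names are hypothetical.

```python
from collections import Counter
from pathlib import Path


def load_dictionary_words(dictionary_path):
    """Collect the first field of every non-empty dictionary line."""
    words = set()
    with open(Path(dictionary_path).expanduser(), encoding="utf-8") as handle:
        for line in handle:
            parts = line.split()
            if parts:
                words.add(parts[0])
    return words


def count_oov_words(corpus_dir, dictionary_path):
    """Count transcription words that have no dictionary entry."""
    known = load_dictionary_words(dictionary_path)
    oov = Counter()
    for lab in Path(corpus_dir).expanduser().rglob("*.lab"):
        for word in lab.read_text(encoding="utf-8").split():
            if word not in known:
                oov[word] += 1
    return oov
```

Calling count_oov_words("~/mfa_data/speech_corpus", "~/mfa_data/pronunciation_dictionary.txt").most_common(20) would show the most frequent missing words, which are usually the first ones worth adding to the dictionary.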

Then we can run the commands to generate acoustic models and training alignments.

To export just the trained acoustic model, run the following command

mfa train ~/mfa_data/speech_corpus ~/mfa_data/pronunciation_dictionary.txt ~/mfa_data/new_acoustic_model.zip

To export just the training alignments, run the following command

mfa train ~/mfa_data/speech_corpus ~/mfa_data/pronunciation_dictionary.txt ~/mfa_data/my_corpus_aligned

To export both the trained model and the alignments, run the following command

mfa train ~/mfa_data/speech_corpus ~/mfa_data/pronunciation_dictionary.txt ~/mfa_data/new_acoustic_model.zip --output_directory ~/mfa_data/my_corpus_aligned

The above commands are referenced from here.

In my experience, MFA is hard to run on Ubuntu machines. I also tried Google Colab, but it did not run as expected there either. MFA runs smoothly on macOS. This tool is mainly used by researchers and its bugs seem to have been fixed on macOS, but I have doubts about other operating systems.

For further reading, refer to the following

I hope you enjoyed this article and found it helpful. If so, give it a clap, comment, and share.

Thank you! Happy Coding!