Fabio Chiusano


Two minutes NLP — Speech Recognition options with Python

DeepSpeech, SpeechBrain, SpeechRecognition, Speech-to-Text APIs

Photo by Soundtrap on Unsplash

Speech-related tasks overview

Automatic Speech Recognition (ASR) is the task of transforming speech to text. Other common speech-related tasks are:

  • Spoken Language Understanding: extracting meaning from speech (speech-to-semantics).
  • Speaker Recognition: identifying or verifying speaker identities from speech recordings.
  • Speech Enhancement: improving the quality of the speech signal by removing noise.
  • Speech Separation: separating multiple speakers speaking at the same time.
  • Speaker Diarization: detecting who spoke when.
  • Multi-microphone signal processing: combining the information recorded by multiple microphones.

Open-source Speech Recognition

The biggest drawback of open-source solutions is that the computing power for speech recognition has to come from your own hardware. Open-source options are also usually less accurate than cloud-based APIs, so if accuracy is critical to your project, you’re probably better off with a cloud solution.

  • CMU Sphinx: builds on over 20 years of CMU research. Its tools are designed specifically for low-resource platforms, have a flexible design, and focus on practical application development rather than research.
  • DeepSpeech: Mozilla’s open-source speech recognition engine, based on the Deep Speech paper by Baidu’s research team. It can run offline and on-device, on hardware ranging from a Raspberry Pi to the GPU machines used to train models in industry.
  • SpeechBrain: an open-source, all-in-one speech toolkit designed to make research and development of neural speech processing technologies easier by being simple, flexible, user-friendly, and well-documented. It integrates with HuggingFace transformers.
  • SpeechRecognition: an open-source wrapper around various speech recognition engines and APIs, covering both open-source engines and closed-source cloud services.

You can find more comparisons of open-source speech recognition libraries at https://www.assemblyai.com/blog/the-state-of-python-speech-recognition-in-2021/.
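As a rough illustration of what a wrapper like SpeechRecognition looks like in practice, here is a minimal sketch that transcribes an audio file with either an offline or an online backend. The function name `transcribe_file` and the `BACKENDS` labels are my own naming for this example, not part of the library. It assumes the package is installed (`pip install SpeechRecognition`), that the offline backend additionally has `pocketsphinx` available, and that the input is a WAV, AIFF, or FLAC file.

```python
# Maps a backend label (naming chosen for this sketch) to the name of the
# corresponding method on speech_recognition.Recognizer.
BACKENDS = {
    "sphinx": "recognize_sphinx",  # offline, requires the pocketsphinx package
    "google": "recognize_google",  # online, Google Web Speech API
}


def transcribe_file(path: str, backend: str = "sphinx") -> str:
    """Transcribe an audio file with the chosen backend and return the text."""
    if backend not in BACKENDS:
        raise ValueError(
            f"unknown backend {backend!r}, expected one of {sorted(BACKENDS)}"
        )

    # Imported lazily so the module still loads when the library is missing.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    recognize = getattr(recognizer, BACKENDS[backend])
    return recognize(audio)
```

A call might look like `transcribe_file("meeting.wav", backend="google")`, where `meeting.wav` is a hypothetical recording. The library raises `sr.UnknownValueError` when the speech is unintelligible and `sr.RequestError` when an online service cannot be reached, so real code would likely catch both.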

Cloud-based Speech Recognition

Cloud solutions for building a speech recognition project have the big advantages of being easy to use, being more accurate than open-source options, and not requiring you to host any models on your own hardware. The main drawback of some cloud solutions is cost.

Examples of closed-source cloud solutions are Google Cloud Speech-to-Text API, Wit.ai, Microsoft Azure Speech, Houndify API, and IBM Speech to Text.
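Accuracy comparisons between open-source and cloud options are commonly expressed as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of reference words. The sketch below is one simple way to compute it, so you can compare candidate solutions on your own recordings. The function name and the lowercase-and-split normalization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

For the made-up pair `"the cat sat on the mat"` (reference) and `"the cat sit on mat"` (transcript), there is one substitution and one deletion over six reference words, so the function returns 2/6, roughly 0.33. Lower is better, and the value can exceed 1.0 when the transcript contains many extra words.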
