Summary

The article outlines the process of integrating Alan's speech-to-text engine into the medical chatbot Doctor.ai to enhance its speech recognition capabilities, particularly for medical terminology.

Abstract

The integration of Alan's speech-to-text engine into Doctor.ai, a medical chatbot, significantly improves the accuracy of speech recognition during medical conversations. The article, authored by Sixing Huang and Liang Li, details the shortcomings of Chrome's speech-to-text engine in the context of Doctor.ai, which struggled with accurately transcribing medical jargon. Alan AI's technology, with its specialized Domain Language Model, is shown to be superior in understanding both general and technical language, including complex biomedical terms. The authors provide a step-by-step guide on how to incorporate the Alan button into Doctor.ai's frontend, demonstrating the ease of extending Doctor.ai's functionality due to its modular design and API-driven communication. The article also touches on the cost-effectiveness of the project under Alan's Developer Plan and the potential to create multilingual versions of the chatbot, with instructions for setting up an English and German interface.

Opinions

  • The authors express dissatisfaction with Chrome's speech-to-text engine, citing its tendency to drop and misinterpret words, leading to incoherent sentences.
  • They are impressed with Alan AI's voice capturing accuracy, noting its ability to correctly interpret complex medical terms and even self-correct minor errors.
  • The authors advocate for the modularity and flexibility of Doctor.ai, emphasizing its ease of enhancement and the potential for developers to contribute to its improvement.
  • There is a clear endorsement of Alan AI as a solution for developers seeking to improve speech recognition in their applications, especially for specialized domains like healthcare.
  • The article suggests that the integration of Alan AI not only improves user experience but also represents a significant step forward in the practicality and efficiency of voice chatbots in medical settings.

How to Integrate Alan’s Speech-to-Text Engine into Doctor.ai

Improve Doctor.ai’s speech recognition with the highly accurate Alan AI

By Sixing Huang and Liang Li

Photo by Gritte on Unsplash

A voice chatbot's ability to capture the speaker's utterances accurately can make or break the user experience. An intelligent transcriber is fun to talk to, and it boosts productivity because speaking is about two to three times faster than typing.

In contrast, a choppy speech-to-text engine, such as the one in Chrome, quickly frustrates users. They soon realize that correcting the transcript takes more time than dictation saves. As a result, they go back to typing, and that defeats the whole purpose of a voice chatbot.

During the development of our medical chatbot Doctor.ai (1, 2, 3, 4, 5, and 6), we noticed early on that the speech-to-text engine in Chrome is not great. It dropped and misinterpreted words, and it formed incoherent sentences. Its performance in medical conversations, which Doctor.ai needs the most, was abysmal.

So we went looking for a better engine. The new engine should not only excel in everyday conversations but also understand common medical jargon, such as the names of diseases, drugs, and pathogens. Our search was rewarded with Alan AI.

Figure 1. Doctor.ai with Alan button in action. Image by the author.
Figure 2. Alan captures medical jargon, such as frontal sinus and Doxepin. Image by the author.

According to its website, Alan is a conversational voice AI platform. Its Spoken Language Understanding (SLU) is designed to process the error-prone output of Automatic Speech Recognition (ASR). It also has a so-called Domain Language Model that helps it recognize specialized language and adapts dynamically to the user's conversational style.

We are impressed by Alan’s highly accurate voice capture (Figure 1). It outperformed Chrome in both everyday and technical conversations. It sailed through many biomedical terms such as “photosynthesis”, “frontal sinus”, “Doxepin” and “cowpox” (Figure 2). What amazes us the most is that, as you can see in Figure 1, even when Alan got a few words wrong along the way, it was able to correct the mistakes and form coherent sentences in the end.

Figure 3. The architecture of Doctor.ai with Alan. Image by author.

In this article, we are going to show you how to integrate the Alan button into Doctor.ai’s frontend to improve the user experience (Figure 3). By default, the Alan button understands English. If you have the Enterprise version of Alan, you can make a German version, too. The project costs nothing under Alan’s Developer Plan. The code for this project is hosted on the GitHub repository here.

1. Get Alan’s SDK key

Go to Alan’s website, sign up, and get the “Developer Plan” by following the instructions. This plan gives you over ten thousand free interactions in Alan. Once you are in the Alan Studio, click Create Voice Assistant and give it a name such as doctorai_en.

Figure 4. How to create an Alan project. Image by author.

Once inside the Studio, create an “Alan Integrations” Project. Delete all the original scripts in the left panel and create a new one. Copy and paste Code 1 into the new script. This script captures the user’s speech and silences Alan’s voice responses. Finally, click the </> Integrations button.

Figure 5. How to edit the Alan script. Image by author.

On the Integrations page, change the Microphone timeout to 3 seconds so that Alan will not capture the voice response from Doctor.ai. You can change the language, too. Finally, copy the SDK Key.

Figure 6. Configure the Integration parameters in Alan. Image by the author.

2. Change the frontend of Doctor.ai

We have updated the code in Doctor.ai’s frontend to accommodate the Alan button. You can find this new code in the repository. Here is a brief rundown of the changes.

Figure 7. Changes in Doctor.ai’s app.js. Image by the author.

The chatbot is initialized as in the previous articles, except for a new ref. Next, we use Alan’s parsed event to capture the speech and transfer the transcript to the chatbot’s input field. Finally, we add the Alan button to the DOM.
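
The parsed-event step can be sketched roughly as follows. This is a minimal sketch and not the repository's actual app.js: it assumes that the Alan Web SDK delivers events to an onEvent callback as objects with name and text fields, and the names makeAlanEventHandler, setInputText, and inputField are hypothetical, introduced only for illustration.

```javascript
// Hypothetical helper: builds a handler for Alan's onEvent callback.
// Assumption: each event is an object such as { name: "parsed", text: "..." }.
function makeAlanEventHandler(setInputText) {
  return function handleAlanEvent(event) {
    // Skip interim results and Alan's own replies; only the final
    // transcript of an utterance (the "parsed" event) is forwarded.
    if (!event || event.name !== "parsed") {
      return false;
    }
    const transcript = String(event.text || "").trim();
    if (transcript.length === 0) {
      return false;
    }
    // Place the transcript in the chatbot's input field so the user can
    // review or correct it before sending.
    setInputText(transcript);
    return true;
  };
}

// Stand-in for the chatbot input field, so the sketch runs without a browser.
const inputField = { value: "" };
const handleAlanEvent = makeAlanEventHandler((text) => {
  inputField.value = text;
});

handleAlanEvent({ name: "recognized", text: "I have a pain in my front" });
handleAlanEvent({ name: "parsed", text: "I have a pain in my frontal sinus" });
console.log(inputField.value);
```

In a React frontend, a handler like this would presumably be passed as the onEvent option when the Alan button is created, with setInputText writing into the chatbot's input through the new ref. Forwarding only the final transcript keeps half-finished interim text out of the input field.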

3. Start the Amplify frontend

Now let’s set up the Amplify frontend and test Alan. Fork our repository to your own GitHub account. Set up Doctor.ai as described in this article. Add the following environment variables during the setup:

The Configure build settings page should then look like this:

Figure 8. Setting the environment variables in Amplify. Image by the author.
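
As a rough illustration of how a React frontend might pick up such settings, here is a minimal sketch. Only REACT_APP_LANGUAGE is named in this article; the variable name REACT_APP_ALAN_KEY, the helper readAlanConfig, and the English fallback are assumptions made for the example and may differ from the actual repository.

```javascript
// Hypothetical helper: reads the Alan-related settings from an env object.
// REACT_APP_LANGUAGE appears in the article; REACT_APP_ALAN_KEY is an
// assumed name used only for illustration.
function readAlanConfig(env) {
  const sdkKey = String(env.REACT_APP_ALAN_KEY || "").trim();
  if (sdkKey === "") {
    throw new Error("Missing Alan SDK key (assumed variable: REACT_APP_ALAN_KEY)");
  }
  // Assumption: fall back to English when no language is configured.
  const language = String(env.REACT_APP_LANGUAGE || "English").trim();
  return { sdkKey, language };
}

// Example with placeholder values; a real build would pass process.env.
const config = readAlanConfig({
  REACT_APP_ALAN_KEY: "your-sdk-key-from-alan-studio",
  REACT_APP_LANGUAGE: "English",
});
console.log(config.language);
```

Failing early on a missing key makes a misconfigured Amplify build easier to spot than a button that silently never connects.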

Once Amplify finishes the setup, head over to your chatbot’s URL and have fun! Notice that after Alan captures your speech, the transcript will appear in the input field. You can examine or correct the transcript before sending it to the chatbot.

Figure 9. Doctor.ai with Alan button in English. Image by the author.

4. The German version

If you have Alan’s Enterprise version, you can also easily set up the German version. Create a new project in Alan Studio and change the language to de (Figure 10). Copy and save the SDK Key.

Figure 10. Change the language in Alan to German. Image by the author.

During the setup in Amplify, don’t forget to set REACT_APP_LANGUAGE to German.

Figure 11. Change the language to German in Amplify. Image by the author.
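
A frontend that serves both versions has to translate the REACT_APP_LANGUAGE value into something it can act on. The following is a minimal sketch of one way to do that, assuming the values English and German and the language codes en and de; the helper toLanguageCode and the fallback to English are hypothetical and not taken from the repository.

```javascript
// Hypothetical mapping from the REACT_APP_LANGUAGE value to a language code.
// "de" matches the language chosen in Alan Studio; "en" is an assumption.
const LANGUAGE_CODES = new Map([
  ["english", "en"],
  ["german", "de"],
]);

function toLanguageCode(languageName) {
  const key = String(languageName || "").trim().toLowerCase();
  // Assumption: unknown or missing values fall back to English.
  return LANGUAGE_CODES.get(key) || "en";
}

console.log(toLanguageCode("German"));
```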

Now, you can chat in German, too.

Conclusion

In this article, we showed you how to integrate the speech-to-text engine from Alan into Doctor.ai. This engine captures our speech with a much higher success rate than Chrome’s. Even though Alan sometimes misinterprets a word, we can easily correct it in the input field. This addition boosts the efficiency of our interactions with Doctor.ai.

This project also shows how easy it is to extend Doctor.ai. Doctor.ai is modular and communicates via APIs, so developers can swap individual components as needed. We can also easily add elements to its Amplify frontend to enhance the user experience. We encourage you to try Doctor.ai and tell us about your experience with it.

The Chinese version of this article is here.
