Ronni Souers


AI Voice Cloning is About to Shake Up the Audiobook Industry

Companies like Resemble.AI will make audiobook creation more affordable and less complex for indie authors

Photo by Karolina Grabowska: https://www.pexels.com/photo/woman-narrating-story-while-recording-audiobook-4476366/ (edited by the author)

Audiobooks have skyrocketed in popularity in recent years, for various reasons. With engaging narration and added sound effects, they can be far more immersive than text-based books. People can listen to audiobooks while doing other (usually non-language-based) tasks, which makes them a great choice for busy people or for people who want to squeeze in more books than they have time to read. And for blind and visually impaired readers, they are a cheap alternative to the oh-so-expensive Braille books.

I used to be an audiobook naysayer who felt that reading should be confined to, you know, reading, but I’ve changed my mind. I’ve learned to love audiobooks and now count the active listening involved in consuming them as reading. Audiobooks have become one of my top defenses against insomnia: I use them to get through sleepless nights or to head off insomnia entirely, either distracting myself from my midnight waking thoughts or lulling myself to sleep with soothing epic-fantasy narrator voices.

My love for audiobooks means that the number of books I read annually has nearly tripled.

This is all to say that I recognize the greatness of audiobooks, and this explains why, as an indie author, I’ve always wanted to turn one of my own fiction novels into an audiobook. But the endeavor (the creation plus production part) is not cheap!

If you want to create an audiobook, you must pay for a voice actor or voice actors to narrate your story. You must pay for a recording studio or recording equipment (unless the voice actor handles that for you as part of their fee, which they often do). You must pay for an audio engineer/producer to edit the book, and costs go up if you want to add other features, like additional sound effects (again, many narrators also work as producers and edit the audio as part of their fee). If you care about the way your book is read, you must put time and effort into adapting your book into an audio script that provides the narrator(s) with instructions on how the book should be narrated.

You can do some or all of this work yourself, but narrating and audio editing are both difficult, time-consuming feats with their own learning curves and costs.

Modern platforms like ACX have greatly simplified this process, but even with the simplifications, the process can still be quite costly, and it isn’t without complications.

Voice AI and audiobooks

Generative voice AI has major implications for audiobook creation, and there are many companies currently working on this tech.

Resemble.AI is one such company, self-proclaimed to be “taking generative voice AI to a new level.” (This is the company that recreated Andy Warhol’s voice for the Netflix documentary series The Andy Warhol Diaries.)

While Resemble.AI’s generative voice AI toolkit has many applications, one of the use cases is audiobook creation for authors and publishers. That’s right: AI-generated audiobooks.

The increasing availability of tech like theirs means that audiobook creation will become far less cost-prohibitive and far less complicated.

Resemble.AI’s website says, “With Resemble AI’s digital narration technology, millions of books that were previously unheard can now be easily converted into audiobooks. Independent authors and small publishers who previously couldn’t afford the cost and complexity of audiobook production can now make their books available to the growing audience of audiobook listeners.”

By streamlining the production process, Resemble.AI will significantly reduce the time and monetary investments traditionally associated with audiobook creation. With the ability to create AI voices within minutes after data submission, authors (including indie authors) can swiftly transform their stories into captivating audiobooks, which will hopefully allow them to reach a wider audience without breaking the bank.

Resemble.AI’s tech is currently open only to select users, but get ready: with increasing access to tech like theirs, the audiobook industry is about to see a massive increase in AI-narrated books.

What could the AI-narrated audiobook creation process look like?

Picture this: I want to turn my contemporary fiction novel into an audiobook that uses AI narration.

To begin, I have two options: I can clone my own voice, or I can use one or more of the other AI voices available.

I decide to use my own voice clone to narrate the audiobook. With Resemble.AI, the process is simple: I can either record my voice using the web recorder or upload existing audio of my voice. I choose the former; recording the 25 sentences needed to train the AI takes me just minutes.

Generating a complete audiobook from my voice clone with Resemble.AI’s text-to-speech software takes only a little longer.

Voilà! In no time at all, I have the initial recordings for my audiobook. To make sure they’re to my satisfaction, I listen to them.

If I’m satisfied, then yay! I’ve just completed a good chunk of my audiobook.

If I’m not satisfied, then I might decide to use a different voice from Resemble.AI’s voice marketplace. After all, their website promises me that their “wide range of digital voices optimized for different genres” will ensure that my “audiobook will sound just as [I] imagined it.”

I can also use their programming interface to dramatize my voice clone’s performance or the performances of the other AI narrators available in their marketplace.

One of the tools available in their API is the emotion gradient, through which I can add emotion, emphasis, and dramatic pauses. Emotion involves expressiveness (pitch), aggressiveness (loudness), and pace (speed), and each of these factors can be adjusted to various levels. Using this feature, I can highlight whole passages, individual sentences, or even individual words and adjust how they’re delivered. This gives me a good amount of control over what the AI narrator sounds like.
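Resemble.AI sets these emotion controls through its own editor and API, whose exact parameters aren’t covered here. As a rough illustration of the same idea, the sketch below uses standard W3C SSML instead, assuming an engine that accepts it: `<prosody>` adjusts pitch, volume, and rate; `<emphasis>` stresses a word; and `<break>` inserts a dramatic pause. The `Segment` class and `to_ssml` function are hypothetical names, not part of any Resemble.AI SDK.

```python
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Segment:
    """A stretch of text plus the delivery settings applied to it (hypothetical structure)."""
    text: str
    pitch: Optional[str] = None   # expressiveness, e.g. "+15%" or "low"
    volume: Optional[str] = None  # aggressiveness, e.g. "loud" or "+6dB"
    rate: Optional[str] = None    # pace, e.g. "slow" or "90%"
    emphasize: bool = False       # stress this word or phrase
    pause_after_ms: int = 0       # dramatic pause after the segment, in milliseconds


def to_ssml(segments: list[Segment]) -> str:
    """Render the segments as a single SSML <speak> document."""
    parts = []
    for seg in segments:
        body = escape(seg.text)  # escape &, <, > so the markup stays well-formed
        if seg.emphasize:
            body = f'<emphasis level="strong">{body}</emphasis>'
        # Emit only the prosody attributes that were actually set.
        settings = (("pitch", seg.pitch), ("volume", seg.volume), ("rate", seg.rate))
        attrs = " ".join(f"{name}={quoteattr(value)}" for name, value in settings if value is not None)
        if attrs:
            body = f"<prosody {attrs}>{body}</prosody>"
        if seg.pause_after_ms > 0:
            body += f'<break time="{seg.pause_after_ms}ms"/>'
        parts.append(body)
    return "<speak>" + " ".join(parts) + "</speak>"


if __name__ == "__main__":
    print(to_ssml([
        Segment("She opened the letter", rate="slow"),
        Segment("and froze.", pitch="+15%", volume="loud", pause_after_ms=800),
        Segment("Gone.", emphasize=True),
    ]))
```

Keeping the delivery settings in data rather than hand-editing markup makes it easy to tweak one passage’s pace or loudness and regenerate just that part.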

Some readers might worry about the AI narrator mangling the author’s words. This is a valid concern: the AI can misread certain words, including homographs (words that are spelled the same but pronounced differently). One of the examples Resemble.AI gives (shown in one of its demo videos) is the sentence, “The dove was flying low and dove into the ocean.” Given this sentence, the AI might pronounce both instances of “dove” the same way, as either the noun or the verb, so Resemble.AI has a feature where I can click on a word to reveal a dropdown menu and select the correct part of speech. (I’m not sure this would work for “tomayto, tomahto,” though!)
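Resemble.AI handles this with a click-to-choose dropdown. Engines that accept standard SSML offer a comparable, text-based route: the `<phoneme>` tag pins a word to an explicit pronunciation. The sketch below assumes that approach; the two-entry lexicon and the `disambiguate` function are hypothetical, and the IPA strings are the usual American pronunciations of the noun (/dʌv/) and the past-tense verb (/doʊv/).

```python
from xml.sax.saxutils import escape

# Hypothetical mini-lexicon: (word, part of speech) -> IPA pronunciation.
HOMOGRAPHS = {
    ("dove", "noun"): "dʌv",   # the bird
    ("dove", "verb"): "doʊv",  # past tense of "dive"
}


def disambiguate(words: list[str], choices: dict[int, str]) -> str:
    """Pin chosen homographs to one pronunciation with SSML <phoneme> tags.

    `choices` plays the role of the dropdown: it maps a word's position in
    `words` to the part of speech the author picked for that occurrence.
    """
    out = []
    for i, word in enumerate(words):
        pos = choices.get(i)
        ipa = HOMOGRAPHS.get((word.lower(), pos))
        if ipa is not None:
            out.append(f'<phoneme alphabet="ipa" ph="{ipa}">{escape(word)}</phoneme>')
        else:
            out.append(escape(word))  # leave unambiguous words to the engine
    return "<speak>" + " ".join(out) + "</speak>"


if __name__ == "__main__":
    words = "The dove was flying low and dove into the ocean.".split()
    print(disambiguate(words, {1: "noun", 6: "verb"}))
```

Only the occurrences the author explicitly marks get a fixed pronunciation; every other word is left for the engine to read as usual.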

If, after doing all of the above steps, I’m still unsatisfied — if I think the narration sounds too robotic, for example — I have even more options.

Resemble.AI’s speech-to-speech engine might be what I need. According to Resemble.AI, the engine can create “AI voiceovers that truly match human-like performances, including all the imperfections of human speech.” Those imperfections include every “subtlety, chatty expression, accent, and inflection” of the original speech, and all of them carry over into the AI-generated speech.

Of course, because the speech-to-speech engine works from a recorded human performance, using it will add a substantial amount of time to the audiobook creation process, but the payoff is far more natural-sounding narration.

This engine also lets me modulate my voice, so I can “change the intonation, add inflection, and modify the pitch.” I can use voice skins (like the angry, aggressive skin or the super-friendly skin) to add emotion.

What’s more: Resemble.AI users can seamlessly combine speech-to-speech voice generation with text-to-speech voice generation, enabling them to “create unique human-like vocalizations without compromising automation, quality, or the speed” of the text-to-speech system. This means that I can use text-to-speech for certain parts of my book and speech-to-speech for other parts of my book (the parts in which I want the narration to sound a specific way).

My contemporary fiction novel uses three alternating points of view, so I am very pleased with Resemble.AI’s offerings, as I want the narrator for each point of view to sound different. With the tech’s modulating features, voice skins, and synthetic voices, I can easily make each of my point-of-view characters sound distinct from one another, even if I use my own voice clone as the foundation for all three voices.
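This mixed approach can be pictured as a small production plan: each point-of-view narrator gets its own settings layered on the same voice clone, and each chapter is flagged for text-to-speech or speech-to-speech. The sketch below is purely illustrative. The character names, skin names, pitch offsets, and the `VoiceProfile` and `Chapter` types are hypothetical and are not taken from Resemble.AI’s API; the plan only describes what would be sent to each engine.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    TTS = "text-to-speech"    # generated straight from the manuscript text
    STS = "speech-to-speech"  # generated from my own recorded performance


@dataclass(frozen=True)
class VoiceProfile:
    base_voice: str               # hypothetical ID of the shared voice clone
    skin: str                     # hypothetical voice-skin name
    pitch_shift_semitones: float  # hypothetical pitch offset


@dataclass(frozen=True)
class Chapter:
    number: int
    pov: str
    mode: Mode


# All three narrators share one clone and differ only in skin and pitch.
PROFILES = {
    "Avery": VoiceProfile("my-clone", "neutral", 0.0),
    "Jonah": VoiceProfile("my-clone", "aggressive", -2.0),
    "Mira": VoiceProfile("my-clone", "friendly", 2.0),
}

CHAPTERS = [
    Chapter(1, "Avery", Mode.TTS),
    Chapter(2, "Jonah", Mode.TTS),
    Chapter(3, "Mira", Mode.STS),  # a scene I want to perform myself
]


def render_plan(chapters: list[Chapter]) -> list[str]:
    """Describe, chapter by chapter, which voice settings and engine would be used."""
    lines = []
    for ch in chapters:
        p = PROFILES[ch.pov]
        lines.append(
            f"Ch. {ch.number}: {ch.pov} via {ch.mode.value} "
            f"({p.base_voice}, skin={p.skin}, pitch {p.pitch_shift_semitones:+.1f} st)"
        )
    return lines


if __name__ == "__main__":
    for line in render_plan(CHAPTERS):
        print(line)
```

Keeping the plan in one place makes it easy to move a chapter from text-to-speech to speech-to-speech later without touching any narrator’s voice settings.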

Furthermore, Resemble.AI’s localization feature means I can translate my book into as many as 60 languages, making it accessible to readers around the world. With Localize, I can actually hear myself (meaning: my voice clone) speak Japanese. Whoa.

“We work closely with authors and publishers to ensure that the final product is true to the original work and does justice to the author’s voice,” Resemble.AI promises on their website’s page on audiobooks. I, for one, am curious to see just how much companies like theirs contribute to the changing audiobook landscape.

Ethics of voice cloning technology

Of course, Resemble.AI’s technology raises various ethical questions.

The ethics of voice cloning in general are already very murky, as the potential misuses and abuses of voice clones could have terrible consequences (which is perhaps why Resemble.AI has recently unveiled Resemble Detect, which can supposedly detect deepfake audio). For example, voice clones can be used for various forms of deception, including manipulating children, engaging in fraudulent activities, and creating deepfake content to spread hate speech and incite violence.

Because the downsides of voice clones are so incredibly ugly, concerns about companies like Resemble.AI — which are making this tech easy to access and use — are absolutely valid.

Resemble.AI does have an entire page devoted to ethics statements, and on that page, they provide a list of forbidden uses for their AI:

You can not use AI Voices built by Resemble for:

· claiming to be from any person, company, administration, or entity without explicit authorization to make this statement and/or impersonating to gain illegal information or privileges;

· propagating hate speech;

· discrimination, libel, terrorism, or violent activities;

· spreading unattributed content or misrepresenting sources.

· exploiting or manipulating children;

· making unsolicited phone calls, vast communications, postings, or messages;

· deceiving or deliberately misleading people.

This list can’t outright prevent people from using their tech for these purposes; however, users who get caught misusing the tech will probably have their accounts suspended (or possibly face worse consequences if legal agencies are involved; such legal consequences might depend on the evolution of regulations surrounding voice cloning).

There are other ethical concerns, too (some less ugly than those mentioned above), that relate directly to creating audiobooks with tech like Resemble.AI’s.

If you use Resemble.AI to create an audiobook using your own voice, then once you upload your voice clone onto Resemble.AI’s site, it could potentially be stolen via data breach and used for other (potentially malicious) purposes. (Still, depending on how many recordings of your voice are already publicly available or taken via illegal or covert means, other AI voice generators could be trained on your voice whether you offer it up freely or not.)

Have people whose voices have been used to train the company’s AI voices basically relinquished control of their voices? Must they worry about their voices becoming forever associated with terrible books (and, possibly, books with harmful content/agendas)? This seems to be the case.

There will be lost jobs and lost income, although voice cloning itself represents a new source of income. Resemble.AI has paired up with various voice actors to create the available synthetic human-based voices in their voice AI marketplace, so presumably, the voice actors whose voices were used to train the AI are making passive income off their AI counterparts. This might serve some voice actors well in the long run, but I predict that the ease of selling one’s voice clone (and the lack of true voice acting required in this process) might quickly saturate the AI voice market and make it difficult for new voice actors to take advantage of the passive income available to voice cloners. Meanwhile, what will happen to the voice actors who currently gain a lot of their income from endeavors like audiobook creation? Will AI voice clones greatly reduce the available jobs in this industry?

I’d imagine that some celebrities might jump on this bandwagon to make some extra income (as long as they don’t take issue with their voices being used to narrate content they have no control over). If, say, Morgan Freeman’s voice is available to read nonfiction titles, how will that shake up the pricing models of software like Resemble.AI’s? How will this affect the competition between voice clones? Will audiobook creators favor more-established voices over those of lesser-knowns?

How much will AI-generated books saturate the audiobook market? My guess is: quite a lot. I firmly believe that the publishing industry doesn’t always care about good writing and that a lot of good writers never achieve traditional publication despite their best efforts. Meanwhile, a lot of crap is published. Tech like that of Resemble.AI could help some of those hidden-gem authors get their books into the hands of more readers, but it will also pave the way for even more crap — crap which may end up further burying the hidden-gem authors, consequently preventing them from reaching more readers.

We’ll also have to see how audiobook publishing platforms respond to the upcoming influx of AI-generated audiobooks. Will they create any new regulations surrounding AI-generated audiobooks? How will such books be marketed and sold in comparison to human-narrated audiobooks?

Finally, I’m sure some readers will completely denounce AI-narrated books in favor of their human-narrated counterparts. Similarly, real-life narrators who sound “too robotic” might be shunned alongside AI narrators. (If “human-sounding” humans become the new thing, we can all laugh.)

I’ve had a difficult time naming my own position on AI, as I often feel conflicted about new AI technologies. I think AI has the potential to do great things, but I also fear it, particularly the way humans will abuse it.

I think I’d best describe myself as an AI defeatist: I realize AI is here to stay and will change the world in irreversible ways, so why not make the most of it and focus on its positive impacts?

If, for example, I want to use AI to cheaply and efficiently fulfill my dream of creating an audiobook, then why not? As I see it, I had better get on the bandwagon quickly, before everyone and their uncle decides to release an AI-narrated book. That day might be here sooner than we think.

  • Please note that I am in no way affiliated with Resemble.AI. I’ve been following their company closely because I’m apprehensive and excited about what they’re doing. This article is not an ad.

More From The Generator
