Daniel Atitienei

Use Google Cloud Text-to-Speech in Jetpack Compose

Hey! Grab your coffee ☕️ and let’s see how to implement Google Cloud Text-to-Speech (TTS) in Jetpack Compose.

Set up

Before writing any code, we need to create a Google Cloud project and enable the Cloud Text-to-Speech API.

After that, click Create Credentials and complete the fields according to your needs.

This redirects us to creating a service account, which the app needs in order to call the TTS API. Complete these steps based on your needs.

Now click Credentials, then click the service account you’ve created.

Here, open the Keys tab and create a key for the service account.

Select JSON as the key type, then click Create. We generate a JSON key because we will bundle it into the application.

Open your Android Studio project and create a new Android Resource Directory inside res. Change the resource type to raw.

After that, drag your key into it.
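
Assuming you rename the key file to credentials.json (raw resource names may contain only lowercase letters, digits, and underscores), it ends up at the path below, and we’ll reference it in code as R.raw.credentials.

app/src/main/res/raw/credentials.json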

Dependencies

Let’s open :app/build.gradle.kts and add these to your dependencies block.

dependencies {
    implementation("com.google.cloud:google-cloud-texttospeech:2.19.0")
    implementation("com.google.auth:google-auth-library-oauth2-http:1.16.0")
    implementation("io.grpc:grpc-okhttp:1.55.1")
}

Also, you must exclude these packaged resources, otherwise the build will fail with duplicate META-INF files pulled in by these libraries.

android {
    // ...
    packaging {
        resources {
            excludes += "/META-INF/{AL2.0,LGPL2.1}"
            excludes += "META-INF/INDEX.LIST"
            excludes += "META-INF/DEPENDENCIES"
        }
    }
}

Now let’s create a class called TextToSpeech. Its constructor takes a Context, which is used to read the key from the raw folder.

class TextToSpeech(private val context: Context) {}
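
Before filling it in, here are the imports the following snippets rely on, assuming the v1 client types from the dependencies above:

import android.content.Context
import com.google.api.gax.core.FixedCredentialsProvider
import com.google.api.gax.rpc.ApiException
import com.google.auth.oauth2.GoogleCredentials
import com.google.cloud.texttospeech.v1.AudioConfig
import com.google.cloud.texttospeech.v1.AudioEncoding
import com.google.cloud.texttospeech.v1.SynthesisInput
import com.google.cloud.texttospeech.v1.TextToSpeechClient
import com.google.cloud.texttospeech.v1.TextToSpeechSettings
import com.google.cloud.texttospeech.v1.VoiceSelectionParams
import io.grpc.StatusRuntimeException
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import java.io.ByteArrayOutputStream
import java.io.InputStream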

Let’s start by creating the synthesize function. It receives the text that we want to transform into speech and returns the synthesized audio bytes.

suspend fun synthesize(text: String): ByteArray? = withContext(Dispatchers.IO) {
    try {
        // Code
    } catch (e: ApiException) {
        // Handle the ApiException
        println("An API error occurred: ${e.message}")
        throw e
    } catch (e: StatusRuntimeException) {
        // Handle the StatusRuntimeException
        println("An error occurred: ${e.message}")
        throw e
    }
}

First, let’s create the credentials.

try {
    val stream: InputStream = context.resources.openRawResource(R.raw.credentials)
    val credentials: GoogleCredentials = GoogleCredentials.fromStream(stream)
        .createScoped(listOf("https://www.googleapis.com/auth/cloud-platform"))

    // ...
}

Now let’s create the settings and the client.

try {
    // ...

    val settings = TextToSpeechSettings.newBuilder()
        .setCredentialsProvider(FixedCredentialsProvider.create(credentials))
        .build()

    val client = TextToSpeechClient.create(settings)

    // ...
}
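
One thing to keep in mind: TextToSpeechClient holds gRPC resources and implements AutoCloseable, so it should be closed once we’re done with it. Kotlin’s use {} is a convenient way to guarantee that; a minimal sketch:

// Sketch: scoping the client with use {} closes it even if a request throws.
TextToSpeechClient.create(settings).use { client ->
    // ... make synthesizeSpeech calls with `client` here ...
}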

Here is how you can set a specific voice and a type of audio encoding.

try {
    // ...

    val voiceBuilder = VoiceSelectionParams.newBuilder()
        .setName("en-US-Studio-M")
        .setLanguageCode("en-US")
    
    val audioConfig = AudioConfig.newBuilder()
        .setAudioEncoding(AudioEncoding.MP3).build()

    // ...
}
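
If you don’t want to hard-code a specific studio voice, the API can also pick a voice by language and gender, and other encodings are available. A small sketch (SsmlVoiceGender lives in the same com.google.cloud.texttospeech.v1 package):

// Alternative: select a voice by language + gender and request uncompressed LINEAR16 audio.
val voiceByGender = VoiceSelectionParams.newBuilder()
    .setLanguageCode("en-GB")
    .setSsmlGender(SsmlVoiceGender.FEMALE)

val wavConfig = AudioConfig.newBuilder()
    .setAudioEncoding(AudioEncoding.LINEAR16)
    .build()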

Before continuing with the synthesize function, we need to create another one called splitText. As the name suggests, it splits the text into a list of smaller strings. We do this because the TTS API caps how much text a single synthesis request may contain, so a long text has to be sent in chunks.

private val maxTextLength = 500 // Maximum length for each synthesis input

private fun splitText(text: String): List<String> {
    val inputTexts = mutableListOf<String>()
    var startIndex = 0
    var endIndex = maxTextLength

    while (startIndex < text.length) {
        if (endIndex >= text.length) {
            endIndex = text.length
        } else {
            // Walk back to the nearest whitespace so we don't cut a word in half.
            while (endIndex > startIndex && !text[endIndex].isWhitespace()) {
                endIndex--
            }
            // Fallback: the window holds one very long token with no whitespace,
            // so hard-split at the maximum length instead of emitting an empty chunk.
            if (endIndex == startIndex) {
                endIndex = startIndex + maxTextLength
            }
        }

        val inputText = text.substring(startIndex, endIndex)
        inputTexts.add(inputText.trim())

        // Skip the separator only when we actually stopped on whitespace.
        startIndex = if (endIndex < text.length && text[endIndex].isWhitespace()) {
            endIndex + 1
        } else {
            endIndex
        }
        endIndex = startIndex + maxTextLength
    }

    return inputTexts
}
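
As a quick sanity check (run from inside the class, since the function is private), a long whitespace-separated text splits into chunks that all respect the limit:

// Hypothetical sample input: "word " repeated 300 times = 1,500 characters.
val chunks = splitText("word ".repeat(300))
println(chunks.map { it.length }) // [499, 499, 499] — every chunk ≤ 500, split on spaces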

Now let’s continue with the synthesize function. We call the splitText function we’ve just created, then iterate over the resulting list, making one API request per chunk of text.

try {
    // ...

    val inputTexts = splitText(text)
    val audioResults = mutableListOf<ByteArray>()
    
    for (inputText in inputTexts) {
        val input: SynthesisInput = SynthesisInput.newBuilder()
            .setText(inputText)
            .build()
    
        val response = client.synthesizeSpeech(input, voiceBuilder.build(), audioConfig)
        audioResults.add(response.audioContent.toByteArray())
    }

    // ...
}

The last thing we have to do in this function is return the ByteArray holding the audio synthesized by the API, and close the client.

try {
    // ...

    // Concatenate the audio chunks into a single byte array.
    val byteArrayOutputStream = ByteArrayOutputStream()

    for (audioResult in audioResults) {
        byteArrayOutputStream.write(audioResult)
    }

    // Release the client's gRPC resources now that every request is done.
    client.close()

    return@withContext byteArrayOutputStream.toByteArray()
}
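
Putting the fragments together, the complete function looks roughly like this. This is a sketch assembled from the pieces above, using use {} for cleanup instead of a manual close(); R.raw.credentials assumes the key file is named credentials.json.

suspend fun synthesize(text: String): ByteArray? = withContext(Dispatchers.IO) {
    try {
        // Load the service-account key from res/raw and scope the credentials.
        val stream: InputStream = context.resources.openRawResource(R.raw.credentials)
        val credentials: GoogleCredentials = GoogleCredentials.fromStream(stream)
            .createScoped(listOf("https://www.googleapis.com/auth/cloud-platform"))

        val settings = TextToSpeechSettings.newBuilder()
            .setCredentialsProvider(FixedCredentialsProvider.create(credentials))
            .build()

        val voice = VoiceSelectionParams.newBuilder()
            .setName("en-US-Studio-M")
            .setLanguageCode("en-US")
            .build()
        val audioConfig = AudioConfig.newBuilder()
            .setAudioEncoding(AudioEncoding.MP3)
            .build()

        // use {} closes the gRPC client even if a request throws.
        TextToSpeechClient.create(settings).use { client ->
            val byteArrayOutputStream = ByteArrayOutputStream()
            for (inputText in splitText(text)) {
                val input = SynthesisInput.newBuilder().setText(inputText).build()
                val response = client.synthesizeSpeech(input, voice, audioConfig)
                byteArrayOutputStream.write(response.audioContent.toByteArray())
            }
            byteArrayOutputStream.toByteArray()
        }
    } catch (e: ApiException) {
        println("An API error occurred: ${e.message}")
        throw e
    } catch (e: StatusRuntimeException) {
        println("An error occurred: ${e.message}")
        throw e
    }
}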

Create the UI

Let’s create a new composable and get the context. Then declare a nullable MediaPlayer state that will hold the player used to play the sound. We also need a coroutine scope.

@Composable
fun MyScreen() {
    val context = LocalContext.current

    var mediaPlayer by remember {
        mutableStateOf<MediaPlayer?>(null)
    }

    val coroutineScope = rememberCoroutineScope()
}

Now let’s create a simple Row that shows a Text next to an IconButton.

Row(
    modifier = Modifier.fillMaxSize(),
    verticalAlignment = Alignment.CenterVertically,
    horizontalArrangement = Arrangement.Center
) {
    Text(text = "Hey, my name is Daniel!")
    IconButton(
        onClick = {
            // Code
        }
    ) {
        Icon(
            imageVector = Icons.Rounded.VolumeUp,
            contentDescription = null
        )
    }
}

In the onClick lambda, launch a coroutine in the scope and synthesize our text.

coroutineScope.launch {
    val audioTask = async {
        TextToSpeech(context = context)
            .synthesize(
                text = "Hey, my name is Daniel!"
            )
    }

    val audio = audioTask.await()

    // ...
}
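
Since we await the result immediately, the async wrapper adds nothing here; synthesize is already a suspend function, so a direct call is equivalent:

coroutineScope.launch {
    // Equivalent, without the Deferred indirection.
    val audio = TextToSpeech(context = context)
        .synthesize(text = "Hey, my name is Daniel!")

    // ...
}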

Now let’s write the audio into a file that will be used by the MediaPlayer to play it out loud.

coroutineScope.launch {
    // ...

    // synthesize may return null, so bail out if there is no audio.
    if (audio == null) return@launch

    val outputFile = File(
        context.getExternalFilesDir(null),
        "output.mp3"
    )

    // use {} closes the stream even if the write fails.
    FileOutputStream(outputFile).use { it.write(audio) }

    // ...
}

The last thing we have to do is create the MediaPlayer and play the sound.

coroutineScope.launch {
    // ...
    mediaPlayer = MediaPlayer.create(
        context,
        Uri.fromFile(outputFile)
    )

    mediaPlayer?.start()
}
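
One last cleanup note: MediaPlayer holds native resources, so it’s worth releasing the player once playback finishes (a DisposableEffect is another good spot for this if the composable can leave the screen mid-playback). A minimal sketch:

// Free the native player when the clip ends.
mediaPlayer?.setOnCompletionListener { player ->
    player.release()
    mediaPlayer = null
}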

I hope this article helped in your development journey. Remember to stay updated on my latest content by following me and subscribing to the newsletter. Thank you for reading!

I also run a YouTube channel dedicated to Android Development where I share informative content. If you’re interested in expanding your knowledge in this field, be sure to subscribe to my channel.

If you want to support me, I would appreciate a coffee! ☕️
