Use Google Cloud Text-to-Speech in Jetpack Compose
Hey! Grab your coffee ☕️ and let's see how to implement Google Cloud Text-to-Speech (TTS) in Jetpack Compose.

Setup
Before writing any code, we need to create a Google Cloud project and enable the Cloud Text-to-Speech API.
After that, click Create Credentials and complete the fields as needed.

This redirects us to creating a service account, which we need in order to use the TTS feature in the app. Fill in the fields based on your needs.

Now click on Credentials and then click on the service account you’ve created.

Here we need to open the Keys tab and create a key for the service account.
Select JSON as the key type, then click Create. We generate a JSON key because we will bundle it into our application. Keep this file private and out of version control, since anyone holding it can call the API against your project.
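For reference, the downloaded key is a JSON file shaped roughly like this (values redacted; the exact fields come from your own project, and the account name below is made up):
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "…",
  "private_key": "-----BEGIN PRIVATE KEY-----\n…\n-----END PRIVATE KEY-----\n",
  "client_email": "tts-demo@your-project-id.iam.gserviceaccount.com",
  "client_id": "…",
  "token_uri": "https://oauth2.googleapis.com/token"
}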

Open your Android Studio project and create a new Android Resource Directory under res, setting the resource type to raw.

After that, drag your key file into it. Raw resource names drop the file extension, so a file named credentials.json is referenced in code as R.raw.credentials.
Dependencies
Let’s open :app/build.gradle.kts and add these to your dependencies block.
dependencies {
    implementation("com.google.cloud:google-cloud-texttospeech:2.19.0")
    implementation("com.google.auth:google-auth-library-oauth2-http:1.16.0")
    implementation("io.grpc:grpc-okhttp:1.55.1")
}
Also, you must add these exclusions, otherwise the build will fail because several of these dependencies ship duplicate META-INF files.
android {
    // ...
    packaging {
        resources {
            excludes += "/META-INF/{AL2.0,LGPL2.1}"
            excludes += "META-INF/INDEX.LIST"
            excludes += "META-INF/DEPENDENCIES"
        }
    }
}
Now let’s create a class called TextToSpeech. Its constructor takes a Context, which is used to access the raw folder that holds the key.
class TextToSpeech(private val context: Context) {}
Let’s start by creating the synthesize function. It receives the text that we want transformed into speech.
suspend fun synthesize(text: String): ByteArray? = withContext(Dispatchers.IO) {
    try {
        // Code
    } catch (e: ApiException) {
        // Handle the ApiException
        println("An API error occurred: ${e.message}")
        throw e
    } catch (e: StatusRuntimeException) {
        // Handle the StatusRuntimeException
        println("An error occurred: ${e.message}")
        throw e
    }
}
First, let’s create the credentials from the key stored in res/raw.
try {
    val stream: InputStream = context.resources.openRawResource(R.raw.credentials)
    val credentials: GoogleCredentials = GoogleCredentials.fromStream(stream)
        .createScoped(listOf("https://www.googleapis.com/auth/cloud-platform"))
    // ...
}
Now let’s create the settings and the client.
try {
    // ...
    val settingsBuilder: TextToSpeechSettings.Builder = TextToSpeechSettings.newBuilder()
    val settings = settingsBuilder
        .setCredentialsProvider(FixedCredentialsProvider.create(credentials))
        .build()
    val client = TextToSpeechClient.create(settings)
    // ...
}
Here is how you can set a specific voice and the audio encoding.
try {
    // ...
    val voiceBuilder = VoiceSelectionParams.newBuilder()
        .setName("en-US-Studio-M")
        .setLanguageCode("en-US")
    val audioConfig = AudioConfig.newBuilder()
        .setAudioEncoding(AudioEncoding.MP3)
        .build()
    // ...
}
Before continuing with the synthesize function, we need to create another one called splitText. As the name suggests, it splits the text into a list of strings. We do this because the API caps how much text a single request can synthesize (about 5,000 bytes at the time of writing), so a long text must be sent in chunks.
private val maxTextLength = 500 // Maximum length for each synthesis input

private fun splitText(text: String): List<String> {
    val inputTexts = mutableListOf<String>()
    var startIndex = 0
    var endIndex = maxTextLength
    while (startIndex < text.length) {
        if (endIndex >= text.length) {
            endIndex = text.length
        } else {
            // Walk back to the nearest whitespace so we don't cut a word in half
            while (endIndex > startIndex && !text[endIndex].isWhitespace()) {
                endIndex--
            }
        }
        val inputText = text.substring(startIndex, endIndex)
        inputTexts.add(inputText.trim())
        startIndex = endIndex + 1
        endIndex = startIndex + maxTextLength
    }
    return inputTexts
}
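As a quick, hypothetical sanity check (run from inside the class, since splitText is private), a long input is split on whitespace into chunks of at most maxTextLength characters:
val longText = "lorem ipsum ".repeat(100) // 1,200 characters
val chunks = splitText(longText)
println(chunks.size)                     // 3
println(chunks.all { it.length <= 500 }) // true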
Now let’s continue with the synthesize function. Call the function we’ve just created, then iterate through the resulting list, making one request to the API per piece of text.
try {
    // ...
    val inputTexts = splitText(text)
    val audioResults = mutableListOf<ByteArray>()
    for (inputText in inputTexts) {
        val input: SynthesisInput = SynthesisInput.newBuilder()
            .setText(inputText)
            .build()
        val response = client.synthesizeSpeech(input, voiceBuilder.build(), audioConfig)
        audioResults.add(response.audioContent.toByteArray())
    }
    // ...
}
The last thing we have to do in this function is return the byte array holding the audio pieces synthesized by the API.
try {
    // ...
    // Concatenate the audio results
    val byteArrayOutputStream = ByteArrayOutputStream()
    for (audioResult in audioResults) {
        byteArrayOutputStream.write(audioResult)
    }
    return@withContext byteArrayOutputStream.toByteArray()
}
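For reference, here is the whole function assembled. This is a sketch rather than a definitive implementation (some local names differ from the snippets above); the one addition is closing the client with use, since TextToSpeechClient is AutoCloseable and would otherwise keep its gRPC resources alive:
suspend fun synthesize(text: String): ByteArray? = withContext(Dispatchers.IO) {
    try {
        // Build credentials from the key bundled in res/raw
        val stream: InputStream = context.resources.openRawResource(R.raw.credentials)
        val credentials = GoogleCredentials.fromStream(stream)
            .createScoped(listOf("https://www.googleapis.com/auth/cloud-platform"))
        val settings = TextToSpeechSettings.newBuilder()
            .setCredentialsProvider(FixedCredentialsProvider.create(credentials))
            .build()
        // use {} closes the client even if a request throws
        TextToSpeechClient.create(settings).use { client ->
            val voice = VoiceSelectionParams.newBuilder()
                .setName("en-US-Studio-M")
                .setLanguageCode("en-US")
                .build()
            val audioConfig = AudioConfig.newBuilder()
                .setAudioEncoding(AudioEncoding.MP3)
                .build()
            // Synthesize each chunk and concatenate the MP3 bytes
            val output = ByteArrayOutputStream()
            for (inputText in splitText(text)) {
                val input = SynthesisInput.newBuilder().setText(inputText).build()
                val response = client.synthesizeSpeech(input, voice, audioConfig)
                output.write(response.audioContent.toByteArray())
            }
            return@withContext output.toByteArray()
        }
    } catch (e: ApiException) {
        println("An API error occurred: ${e.message}")
        throw e
    } catch (e: StatusRuntimeException) {
        println("An error occurred: ${e.message}")
        throw e
    }
}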
Create the UI
Let’s create a new composable and get the context. Then, create a MediaPlayer state that is used to play the sound. We also need a coroutine scope.
@Composable
fun MyScreen() {
    val context = LocalContext.current
    var mediaPlayer by remember {
        mutableStateOf<MediaPlayer?>(null)
    }
    val coroutineScope = rememberCoroutineScope()
}
Now let’s create a simple Row that has a Text near an IconButton.
Row(
    modifier = Modifier.fillMaxSize(),
    verticalAlignment = Alignment.CenterVertically,
    horizontalArrangement = Arrangement.Center
) {
    Text(text = "Hey, my name is Daniel!")
    IconButton(
        onClick = {
            // Code
        }
    ) {
        Icon(
            imageVector = Icons.Rounded.VolumeUp,
            contentDescription = null
        )
    }
}
In the onClick lambda, launch a coroutine and synthesize our text.
coroutineScope.launch {
    val audioTask = async {
        TextToSpeech(context = context)
            .synthesize(
                text = "Hey, my name is Daniel!"
            )
    }
    // Bail out if synthesis failed, since synthesize returns a nullable ByteArray
    val audio = audioTask.await() ?: return@launch
    // ...
}
Now let’s write the audio into a file that the MediaPlayer will use to play it out loud.
coroutineScope.launch {
    // ...
    val outputFile = File(
        context.getExternalFilesDir(null),
        "output.mp3"
    )
    val outputStream = FileOutputStream(outputFile)
    outputStream.write(audio)
    outputStream.close()
    // ...
}
The last thing we have to do is create the MediaPlayer and play the sound.
coroutineScope.launch {
    // ...
    mediaPlayer = MediaPlayer.create(
        context,
        Uri.fromFile(outputFile)
    )
    mediaPlayer?.start()
}
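Here is the whole click handler assembled, as a minimal sketch. One addition over the snippets above: releasing the previous MediaPlayer before creating a new one, so repeated taps don’t leak player instances.
IconButton(
    onClick = {
        coroutineScope.launch {
            val audio = TextToSpeech(context = context)
                .synthesize(text = "Hey, my name is Daniel!")
                ?: return@launch
            // Write the synthesized MP3 bytes to app-specific external storage
            val outputFile = File(context.getExternalFilesDir(null), "output.mp3")
            FileOutputStream(outputFile).use { it.write(audio) }
            // Release the previous player before creating a new one
            mediaPlayer?.release()
            mediaPlayer = MediaPlayer.create(context, Uri.fromFile(outputFile))
            mediaPlayer?.start()
        }
    }
) {
    Icon(
        imageVector = Icons.Rounded.VolumeUp,
        contentDescription = null
    )
}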
I hope this article helped in your development journey. Remember to stay updated on my latest content by following me and subscribing to the newsletter. Thank you for reading!
I also run a YouTube channel dedicated to Android Development where I share informative content. If you’re interested in expanding your knowledge in this field, be sure to subscribe to my channel.
If you want to support me, I would appreciate a coffee! ☕️
