Text to speech for Chinese Mandarin

:information_source: Attention Topic was automatically imported from the old Question2Answer platform.
:bust_in_silhouette: Asked By Ragnar Brynjúlfsson

I’m working on a simple Mandarin Chinese memory game, and I was wondering if there is a way to get text-to-speech working for Mandarin in Godot. For now I’m only targeting PCs (Linux, Windows, macOS), so I don’t need it to work on mobile.

I’m using a list of 5000 words (HSK 1-6), so shanghaiing a Chinese friend into recording them all and then organizing the audio files sounds a bit daunting, to be honest. I would prefer to use text-to-speech if possible.

I’m new to Godot and this forum, so: hi all (wave) :slight_smile:

p.s. Oh, and I don’t speak Chinese, so please answer in English.

:bust_in_silhouette: Reply From: davidoc

You need:
a) A TTS library that supports Chinese and works on your target platforms
b) To import it into Godot (as a module, using GDNative, or using C#)

or, online:

a) A TTS service (like Google Cloud)
b) To consume the service from Godot, or to create an application that automates creating and organizing the audio files.

I created a Lipsync project and released the source code. I commented out the TTS code because Mono doesn’t support System.Speech, but you may find it helpful.

Thanks for the info. I looked a bit into different solutions, but I’m going to leave it out for now (or at least put it at the very end of my ToDo list) as it quickly became a bit too involved for my simple little game. :slight_smile:

I did find some useful stuff, though. The sample in the Godot documentation on how to link to an external library actually uses the Festival text-to-speech library. Unfortunately Festival doesn’t support Mandarin, but Ekho (an open source TTS library that can be linked to Festival) does. Google Cloud Text-to-Speech also looks interesting, but costs a bit for each character spoken.

Ragnar Brynjúlfsson | 2020-05-21 11:56

:bust_in_silhouette: Reply From: Ragnar Brynjúlfsson

I found a solution that should work for my use case. :slight_smile:

My game is not reading out arbitrary sentences, but just a fixed list of words, so it made sense for me to use text-to-speech to pre-generate all the .ogg files I needed. It does add a bit to the size of the game, but saves me from having to link to external libraries or services, making it easier to port the game to any OS or device.

I wound up using Google Cloud, doing basically what they do here in Python and simply looping over my word list. You can find a list of available languages and voices here. I did this on Linux, setting up a virtualenv and installing the Google Cloud Python API using pip (Google has docs on how to set this up; I don’t remember all the steps I took).
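A minimal sketch of such a loop, assuming the `google-cloud-texttospeech` package (version 2 or later) is installed and credentials are already configured. The `cmn-CN` language code, the numbered file names, and the function names `make_google_synthesizer` and `generate_all` are assumptions for illustration, not necessarily what the original script used.

```python
from pathlib import Path


def make_google_synthesizer(language_code="cmn-CN"):
    """Return a function that maps text to WAV bytes via Google Cloud TTS."""
    # Imported lazily so the rest of the script loads without the package.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    # Assumption: "cmn-CN" selects a Mandarin voice; check the voice list.
    voice = texttospeech.VoiceSelectionParams(language_code=language_code)
    # LINEAR16 returns uncompressed PCM audio with a WAV header.
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16
    )

    def synthesize(text):
        response = client.synthesize_speech(
            input=texttospeech.SynthesisInput(text=text),
            voice=voice,
            audio_config=audio_config,
        )
        return response.audio_content

    return synthesize


def generate_all(words, out_dir, synthesize):
    """Write one WAV file per word, named by its position in the list."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, word in enumerate(words, start=1):
        # Hypothetical naming scheme: 0001.wav, 0002.wav, ...
        path = out_dir / f"{index:04d}.wav"
        path.write_bytes(synthesize(word))
        written.append(path)
    return written
```

With a word list loaded into `words`, something like `generate_all(words, "audio", make_google_synthesizer())` would then produce the files. Passing the synthesizer in as a plain function keeps the loop independent of the TTS service, so it can be tried out with a stub first.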

I did get a random StatusCode.UNAVAILABLE every now and then, so I had to resume the process manually from where it left off, removing the words it had already finished from my list. It wasn’t too bad, as I only had 5000 words to go through (a try/except with a long pause and a retry would probably work too).
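The try/except idea could look roughly like the sketch below. It is untested against the real service; the helper names `with_retry` and `pending`, the pause lengths, and the numbered `.wav` naming are assumptions for illustration.

```python
import time
from pathlib import Path


def with_retry(func, attempts=5, pause=30.0, sleep=time.sleep):
    """Wrap func so a failed call is retried after a pause that doubles each time."""

    def wrapper(*args, **kwargs):
        delay = pause
        for attempt in range(1, attempts + 1):
            try:
                return func(*args, **kwargs)
            except Exception:
                # Catching everything keeps the sketch generic. In real use,
                # narrow this to the client's "service unavailable" error
                # (likely google.api_core.exceptions.ServiceUnavailable).
                if attempt == attempts:
                    raise
                sleep(delay)
                delay *= 2

    return wrapper


def pending(words, out_dir):
    """Return (index, word) pairs whose output file does not exist yet."""
    out_dir = Path(out_dir)
    return [
        (index, word)
        for index, word in enumerate(words, start=1)
        # Hypothetical naming scheme: 0001.wav, 0002.wav, ...
        if not (out_dir / f"{index:04d}.wav").exists()
    ]
```

Wrapping the synthesis call in `with_retry` handles short outages, and looping over `pending(words, out_dir)` instead of the full list lets a rerun pick up where the previous one stopped without editing the word list by hand.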

NOTE! Don’t use the OGG_VORBIS format, as the OGG files generated by the text-to-speech engine don’t work with Godot (see my bug report for details). You can instead download them as WAV (LINEAR16) and then batch convert them to OGG afterwards, which works fine. I haven’t tested MP3, so it might or might not work.
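One way to do the batch conversion is sketched below. It assumes `ffmpeg` is installed, on the PATH, and built with the `libvorbis` encoder; the post does not say which converter was actually used, and the quality setting and function names are illustrative.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(wav_path):
    """Build the ffmpeg call that converts one WAV file to Ogg Vorbis."""
    wav_path = Path(wav_path)
    ogg_path = wav_path.with_suffix(".ogg")
    return [
        "ffmpeg",
        "-y",                  # overwrite an existing output file
        "-i", str(wav_path),   # input file
        "-c:a", "libvorbis",   # encode the audio as Vorbis
        "-q:a", "4",           # assumed quality level, adjust to taste
        str(ogg_path),
    ]


def convert_all(wav_dir, run=subprocess.run):
    """Convert every .wav file in wav_dir, returning the .ogg paths."""
    converted = []
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        # check=True raises if ffmpeg exits with an error.
        run(ffmpeg_command(wav_path), check=True)
        converted.append(wav_path.with_suffix(".ogg"))
    return converted
```

Calling `convert_all("audio")` would write an `.ogg` file next to each `.wav` file, after which the WAV files can be deleted to keep the project small.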