Create a speaking Linux shell script with Google
After using Google to translate text and to decode audio into text, we are now going to build a small text-to-speech engine that relies on Google as well. It is easy and pretty useful. Besides, it supports a wide range of languages (it is Google, right?)
Let's agree that the script should accept two arguments: the text to speak (required) and the language of that text (optional). If no language is given, English is used as the default. The text is going to be sent to Google's servers in a GET request, so it has to be URL-encoded first. Here is how:
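rawurlencode() {
  local LC_ALL=C              # work byte by byte so multi-byte UTF-8 text is encoded per byte
  local string="${1}"
  local strlen=${#string}
  local encoded="" pos c o
  for (( pos=0 ; pos<strlen ; pos++ )); do
    c=${string:$pos:1}
    case "$c" in
      [-_.~a-zA-Z0-9] ) o="${c}" ;;       # unreserved characters pass through
      * ) printf -v o '%%%02x' "'$c" ;;   # everything else becomes %xx
    esac
    encoded+="${o}"
  done
  echo "${encoded}"
}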
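A quick way to convince yourself that the encoder does what it should is to decode its output again. The helper below is only a sketch and not part of the original script: urldecode is a name I made up for illustration, and it relies on bash's printf '%b' to turn \xNN escapes back into characters.

```bash
#!/usr/bin/env bash
# urldecode: hypothetical helper (not in the original script) that reverses
# percent-encoding by rewriting every %XX as \xXX and letting printf expand it.
urldecode() {
  printf '%b\n' "${1//%/\\x}"
}

# Encoded forms of the two sample sentences used later in this post:
urldecode "damn%20i%27m%20a%20geek"   # prints: damn i'm a geek
urldecode "j%27aime%20google%21"      # prints: j'aime google!
```

If the decoded text matches what you typed, the encoded form is safe to drop into the query string.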
rawurlencode doesn't rely on any external tools, and it does the job well. The next step is to check the inputs, that is, the text and the language:
if [ -z "$1" ]
then
  echo "No text specified, exiting" >&2
  exit 1
else
  TEXT=$( rawurlencode "$1" )
fi

if [ -z "$2" ]
then
  echo "No language supplied, using en"
  LANG="en"
else
  LANG="$2"
fi
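The same checks can be written a little more compactly with shell parameter expansion. This is just a sketch of the idea: parse_args, TTS_TEXT and TTS_LANG are names I made up, and I picked TTS_LANG on purpose, because LANG is also the standard locale environment variable and overwriting it can change how other commands behave. URL-encoding the text would still happen afterwards.

```bash
#!/usr/bin/env bash
# parse_args: hypothetical helper that validates the two script arguments.
# Sets TTS_TEXT (required) and TTS_LANG (optional, defaults to "en").
parse_args() {
  if [ -z "${1:-}" ]; then
    echo "No text specified" >&2
    return 1
  fi
  TTS_TEXT="$1"
  TTS_LANG="${2:-en}"   # ${2:-en} expands to "en" when $2 is unset or empty
}

parse_args "hello world" && echo "$TTS_TEXT / $TTS_LANG"   # hello world / en
parse_args "bonjour" fr && echo "$TTS_TEXT / $TTS_LANG"    # bonjour / fr
```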
The script expects the text as its first parameter and stores it, URL-encoded, in the TEXT variable. If no text is given, the script exits. It then checks whether a language was specified and stores it in the LANG variable; otherwise LANG is set to en. Now let's see how to talk to the Google TTS engine. After a minute with the Google Translate web page, I realised the URL was not hard to find at all. It looks like this:
API="http://translate.google.com/translate_tts?ie=UTF-8&tl=$LANG&q=$TEXT"
As you can see, the API variable holds the URL we will request later. That URL returns an MP3 file, so we need to save it temporarily. A convenient way to pick a file name is to use the MD5 hash of the text:
# md5sum prints "<hash>  -", so keep only the first field
hash="$(printf '%s' "$TEXT" | md5sum | cut -d' ' -f1)"
Before proceeding, I should tell you that this URL rejects wget's default requests and only answers what looks like a browser. But when was that ever a problem? Let's define the User-Agent string to use:
UA="Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.2 Safari/537.36"
This was the latest Google Chrome version at the time of writing; it was not even stable yet, lol. Time to make the long-awaited request!
wget -o /dev/null --user-agent="$UA" -O "/tmp/$hash.mp3" "$API"
This tells wget to grab the audio file and store it in the /tmp folder under the hash name we created earlier, while -o /dev/null discards wget's log output. Thanks to the --user-agent option, the request looks as if it came from a browser. Next we need a tool that plays a sound file from the command line. I chose a lightweight one called mpg123; go ahead and install it if you haven't already:
sudo apt-get install mpg123
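Since the script now depends on two external programs, it can be worth failing early with a clear message when one of them is missing. Here is a minimal sketch, where require_cmd is a hypothetical helper name and command -v is the standard shell way to look a command up:

```bash
#!/usr/bin/env bash
# require_cmd: hypothetical helper that returns non-zero (with a message on
# stderr) when the named command cannot be found in PATH.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "Missing required command: $1" >&2
    return 1
  fi
}

# Report which of the script's dependencies are available.
for tool in wget mpg123; do
  if require_cmd "$tool"; then
    echo "$tool found"
  fi
done
```

In the real script you would more likely write something like require_cmd mpg123 || exit 1 near the top.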
Back to the talking script. We know exactly where we left the audio file, so it's time to give mpg123 a try:
mpg123 -q "/tmp/$hash.mp3"
Hurray! This is the part of the script that plays the sound, and it is exciting! The last thing to do is remove the temporary audio file:
rm "/tmp/$hash.mp3"
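One caveat with a plain rm at the end: if the player fails, the file may be left behind in /tmp. A trap on EXIT covers that case. The sketch below is one possible way to do it; play_and_cleanup and the PLAYER variable are hypothetical names, with PLAYER defaulting to mpg123 so that another player could be swapped in.

```bash
#!/usr/bin/env bash
# play_and_cleanup: hypothetical helper that plays a file and always deletes it.
# PLAYER is an assumed override variable; it defaults to mpg123.
play_and_cleanup() {
  local file="$1"
  (
    # The EXIT trap fires when this subshell ends, whether or not playback worked.
    trap 'rm -f "$file"' EXIT
    "${PLAYER:-mpg123}" -q "$file"
  )
}

# Intended use in the script (not executed here):
#   play_and_cleanup "/tmp/$hash.mp3"
```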
Now the script is ready to use! Check my gist here for the full script file, then download it and test it. You will probably want to hear things like:
./googleTTS.sh "damn i'm a geek"
./googleTTS.sh "j'aime google!" fr
Don't forget to make the script file executable before running it:
chmod +x googleTTS.sh
This would be really powerful combined with the previous voice-to-text decoder. There is plenty of room for improvement; a caching system is an obvious one. I'll leave you to play around with it: you can pair it with a spell checker (I can explain how if you want), or hook it into everything on your system and feel like Tony Stark, lol.
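To give an idea of what such a cache could look like, here is a rough sketch rather than a finished feature. Everything in it is an assumption of mine: the CACHE_DIR location, the cache_path and cached_fetch helper names, and the convention that the downloader command receives the output file as its last argument. The key includes the language as well as the text, so the same sentence in two languages does not collide.

```bash
#!/usr/bin/env bash
# Hypothetical cache location; override by setting CACHE_DIR beforehand.
CACHE_DIR="${CACHE_DIR:-$HOME/.cache/googleTTS}"

# cache_path LANGUAGE TEXT: prints the cache file name for that pair.
cache_path() {
  local key
  key="$(printf '%s' "$1:$2" | md5sum | cut -d' ' -f1)"
  printf '%s/%s.mp3\n' "$CACHE_DIR" "$key"
}

# cached_fetch LANGUAGE TEXT DOWNLOADER...: runs DOWNLOADER with the target
# file appended as last argument, but only when the file is not cached yet.
# Prints the path of the (possibly freshly downloaded) audio file.
cached_fetch() {
  local lang="$1" text="$2" file
  shift 2
  file="$(cache_path "$lang" "$text")"
  mkdir -p "$CACHE_DIR"
  if [ ! -s "$file" ]; then
    "$@" "$file" || { rm -f "$file"; return 1; }
  fi
  printf '%s\n' "$file"
}

# Intended use in the script (not executed here), with a small wrapper:
#   download() { wget -o /dev/null --user-agent="$UA" -O "$1" "$API"; }
#   mpg123 -q "$(cached_fetch "$LANG" "$TEXT" download)"
```

With this in place the final rm would go away, since keeping the files is the whole point of the cache.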