Hacking Google Voice API in Linux
You have probably noticed the voice-enabled input fields that came with the new Google Chrome release about a month ago. Yeah, it's a cool way to enter text without typing for ages, with the bonus chance of getting search results for "laughable clothes" when you say "fashionable clothes". Seriously, I cannot see how this is useful, especially on desktop PCs.
But there's a good guy on the internet who made good use of it. He wrote a shell script that listens to your voice and uses the Google Voice API to convert it to text. I will explain his hack so you can all make good use of it too.
First, we need the URL of the API, so we define the API variable:
API="http://www.google.com/speech-api/v1/recognize?lang=en"
Note the lang parameter at the end of the URL. Our script would be more flexible if it could handle multiple languages, so let's put the language in a variable, or better, take it as an argument :)
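# Use the first argument as the language code, or fall back to English.
# (Named lang rather than LANG so we don't overwrite the system locale variable.)
if [ -z "$1" ]; then
    echo "No language supplied, using en"
    lang="en"
else
    echo "Using $1 as language"
    lang="$1"
fi
API="http://www.google.com/speech-api/v1/recognize?lang=$lang"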
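Shell parameter expansion can express the same default more compactly: ${1:-en} expands to the first argument, or to en when the argument is unset or empty (the same cases the -z test catches). This is a minimal sketch; the function name build_api_url is hypothetical. The variable is deliberately not called LANG, because LANG is the standard locale variable and is usually exported, so overwriting it would also change the locale seen by arecord, flac and wget.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the recognizer URL for an optional language code.
build_api_url() {
    # ${1:-en}: use the first argument, or "en" if it is unset or empty
    local lang="${1:-en}"
    echo "http://www.google.com/speech-api/v1/recognize?lang=$lang"
}

API=$(build_api_url "$1")
```

Running the script with fr as its argument would then target the French recognizer, and running it with no argument keeps English.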
Now we need to send a sound file containing our voice to this URL. Of course it's not that simple; we need:
- arecord to record our voice from the microphone
- flac to encode the recording in FLAC format
- wget to send it to the API
Make sure these three tools are installed; if not, you can always install them with your package manager, such as apt-get (on Debian and Ubuntu, arecord comes in the alsa-utils package). We convert the recording to FLAC because the API itself requires that format. Now let's put it all together!
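Since the script depends on these three tools, it can check for them up front instead of failing halfway through. A minimal sketch; the function name missing_tools is hypothetical, and it relies only on the standard command -v lookup.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of any listed tools not found in $PATH.
missing_tools() {
    local tool missing=""
    for tool in "$@"; do
        # command -v succeeds if the name resolves to a command, builtin or function
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    echo "${missing# }"   # drop the leading space (empty output = nothing missing)
}
```

At the top of the script, missing=$(missing_tools arecord flac wget) followed by a test for a non-empty result lets it stop early with a "Please install: ..." message.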
# Record 3 seconds (-d 3) from the mic, encode it to out.flac, then POST it to the API
JSON=$(arecord -f cd -t wav -d 3 -r 16000 | flac - -f --best --sample-rate 16000 -o out.flac
  wget -O - -o /dev/null --post-file out.flac --header="Content-Type: audio/x-flac; rate=16000" "$API")
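The same record, encode and upload step can be wrapped in a reusable function. This is a minimal sketch, not the original author's code: the name recognize is hypothetical, it writes to a temporary file from GNU mktemp instead of out.flac, it replaces -f cd with explicit 16-bit, mono, 16 kHz flags (mono is my assumption), and it uses -q and -s to silence arecord and flac. It prints the raw JSON on success and returns 1 if nothing was recorded.

```bash
#!/usr/bin/env bash
# Hypothetical helper: record for N seconds, encode to FLAC, POST to the given URL.
# Prints the raw JSON response; returns non-zero on failure.
recognize() {
    local api_url="$1" duration="${2:-3}" tmp rc
    tmp=$(mktemp --suffix=.flac) || return 1

    # Record 16-bit mono audio at 16 kHz from the default capture device
    # and encode it straight to FLAC (-f: overwrite the empty temp file).
    arecord -q -f S16_LE -c 1 -r 16000 -t wav -d "$duration" \
        | flac -s -f --best -o "$tmp" -

    # Bail out if nothing was recorded or encoded.
    if [ ! -s "$tmp" ]; then
        rm -f "$tmp"
        return 1
    fi

    # Upload the FLAC file; -O - writes the response body to stdout.
    wget -q -O - --post-file "$tmp" \
        --header="Content-Type: audio/x-flac; rate=16000" "$api_url"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

With it, the line above becomes JSON=$(recognize "$API" 3), and no stray out.flac is left in the current directory.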
So far so good: the script receives the response in JSON format, so we need to parse it with sed and awk. I already wrote an article about sed here if you want to check it out. This may look freaky, but it does the job:
# Strip braces, split on commas, keep the 3rd piece, take its 3rd colon-separated field, drop quotes
UTTERANCE=$(echo "$JSON" \
  | sed -e 's/[{}]//g' \
  | awk '{ n = split($0, a, ","); for (i = 1; i <= n; i++) print a[i]; exit }' \
  | awk -F: 'NR==3 { print $3; exit }' \
  | sed -e 's/"//g')
echo "utterance: $UTTERANCE"
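To see what the pipeline does, here it is run on a sample response. The sample is an assumption about the v1 reply's shape (status, id, then a hypotheses list); the real reply may differ. For comparison, parse_utterance is a different technique, not the author's: a single sed substitution that pulls out the value of the "utterance" key directly, so it no longer depends on the order of the fields. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Assumed sample response; the real reply's fields may differ.
SAMPLE='{"status":0,"id":"abc","hypotheses":[{"utterance":"hello world","confidence":0.9}]}'

# The article's pipeline wrapped in a function.
parse_pipeline() {
    echo "$1" \
      | sed -e 's/[{}]//g' \
      | awk '{ n = split($0, a, ","); for (i = 1; i <= n; i++) print a[i]; exit }' \
      | awk -F: 'NR==3 { print $3; exit }' \
      | sed -e 's/"//g'
}

# Alternative: print only the text between "utterance":" and the next quote.
parse_utterance() {
    echo "$1" | sed -n 's/.*"utterance":"\([^"]*\)".*/\1/p'
}

parse_pipeline "$SAMPLE"    # prints: hello world
parse_utterance "$SAMPLE"   # prints: hello world
```

After the brace removal and comma split, the third line is "hypotheses":["utterance":"hello world", and its third colon-separated field is the quoted text. If jq happens to be installed, jq -r '.hypotheses[0].utterance' does the same job with a real JSON parser.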
Now our script echoes the text! That's pretty geeky, but how can it be useful? Controlling our PC, maybe? Why not! To do that, we define strings that the script compares with the recognized text; if the text matches one of them, the script runs the corresponding command.
CMD_LIST_DIRECTORY="list directory"
CMD_WHOAMI="who am i"

# grep -i ignores case; ^...$ requires the whole utterance to match
if echo "$UTTERANCE" | grep -qi "^$CMD_LIST_DIRECTORY$"; then
    ls .
elif echo "$UTTERANCE" | grep -qi "^$CMD_WHOAMI$"; then
    whoami
fi
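The if/elif chain with grep can also be written as a case statement, which avoids starting a grep per command. A minimal sketch under the same exact-match, case-insensitive rule; the function name run_command is hypothetical, and lower-casing is done with tr so it stays portable.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the command whose phrase matches the utterance exactly.
run_command() {
    local utterance
    # Normalise to lower case so "Who Am I" matches too (like grep -i).
    utterance=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$utterance" in
        "list directory") ls . ;;
        "who am i")       whoami ;;
        *) echo "No command matches: $1" >&2; return 1 ;;
    esac
}
```

In the script, run_command "$UTTERANCE" replaces the whole if/elif block.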
We can define as many commands as we like. I will be working on using arrays for this (maybe one of you can do it for us :) ). You can find the complete script here if you are too lazy to create a new file :p
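One way to do the array version is a bash associative array mapping each spoken phrase to a command string. This is a sketch, assuming bash 4 or later (for declare -A and ${var,,}); the names COMMANDS and dispatch are hypothetical. eval is acceptable here only because the command strings are written in the script itself; the recognized text is used purely as a lookup key and is never executed.

```bash
#!/usr/bin/env bash
# Requires bash 4+. Keys are spoken phrases (lower case); values are commands.
declare -A COMMANDS=(
    ["list directory"]="ls ."
    ["who am i"]="whoami"
)

# Hypothetical helper: look up the utterance and run its command.
dispatch() {
    local key="${1,,}"   # lower-case the utterance
    # Empty keys are invalid subscripts, so check for them first.
    if [ -z "$key" ] || [ -z "${COMMANDS[$key]+set}" ]; then
        echo "No command for: $1" >&2
        return 1
    fi
    # Safe only because the values are fixed strings defined above.
    eval "${COMMANDS[$key]}"
}
```

Adding a new voice command is then a single line in the array, and dispatch "$UTTERANCE" handles the rest.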
Guess what, we just made good use of the Google Voice API! I will leave it to you to test it, improve it and, why not, share it. Your comments are welcome.