Monday, January 14, 2013

TTS - Free Text to Speech for Digium Switchvox

The natural progression for a new telephone system will typically find its way to some form of dynamic IVR that looks to a database for its information.  We've had our Digium Switchvox for a couple of years now and feel pretty good about writing basic IVRs. Now it's time for some cool text-to-speech.

Why do we need TTS (text to speech)?  Simple.  In a basic IVR, you can record someone saying things that never change, like "Press one to reach customer service."  A recording can't cover text that changes from caller to caller, like "Your account is in great standing.  Your account balance is $123.45 and you have one pending claim...."  For that, you need TTS.

After a few weeks of research, I found a handful of options.
  1. Buy a TTS engine that runs on your server.  This will cost $1,000 and up.
  2. Use a free TTS web service.  I couldn't find any voices that actually could be understood so this wasn't good for us. 
  3. Use a paid TTS web service.  Well, cost is always an issue but more importantly we didn't want to rely on the performance of an internet-based web service to feed a dynamic IVR.
  4. Use the TTS built into the Microsoft .NET Framework.  Sounds great, but as far as we could tell it requires a physical server with a sound card. Not good for us since we're 100% virtual server based.
  5. Write our own. 

Now option 5 seems a bit daunting.  I mean, how do you write a TTS engine?  The simple answer is you don't.  What you can do is package some free components to make it all work.  Here's what we did:

We got the free TTS engine called eSpeak from here: http://espeak.sourceforge.net/
...and installed it on our IIS server.  If you want to see how it works, just install it on your PC and try out the command line.

Then we used Visual Studio to write [or enhance in our case] our HTTP listener for the TTS requests.  We wanted a REST-like solution so it would integrate well into the IVR on the Switchvox.

Some code would be nice, right? Here you go. This is what the ASP.NET listener looks like:

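 Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load

     If Not IsPostBack Then

         ' Text to speak. Drop double quotes and leading dashes so the text
         ' cannot break out of its quoted argument or be read as an eSpeak option.
         Dim myWords As String = If(Request.QueryString("myWords"), "").Replace("""", "").TrimStart(" "c, "-"c)

         ' Nothing to say: report a bad request rather than running eSpeak.
         If myWords.Length = 0 Then
             Response.StatusCode = 400
             Response.End()
         End If

         ' One output file per request. The temp folder is just our choice here;
         ' any folder the IIS application pool can write to will do.
         Dim FileName As String = IO.Path.Combine(IO.Path.GetTempPath(), Guid.NewGuid().ToString("N") & ".wav")

         Dim proc As New Diagnostics.Process
         ' eSpeak options:
         '   -v = voice
         '   -s = speed in words per minute
         '   -p = pitch (not set here)
         '   -a = amplitude or volume
         '   -w = write the speech to a WAV file
         Dim args As String = "-v en-us -s 150 -a 120 -w """ & FileName & """ """ & myWords & """"
         proc.StartInfo.Arguments = args
         proc.StartInfo.FileName = "d:/data/espeak/command_line/espeak.exe"
         proc.StartInfo.UseShellExecute = False
         proc.StartInfo.CreateNoWindow = True
         proc.StartInfo.RedirectStandardError = True

         proc.Start()

         ' Read stderr before waiting so a full pipe cannot block eSpeak.
         Dim ttsErrors As String = proc.StandardError.ReadToEnd
         proc.WaitForExit()

         Response.Clear()
         Response.ClearHeaders()
         Response.ContentType = "audio/wav"
         Response.AddHeader("Content-Disposition", "inline; filename=test.wav")
         Response.TransmitFile(FileName)
         Response.End()

     End If

 End Sub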
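
One thing to watch: the listener writes a new WAV file for every request and never deletes it, so the folder slowly fills up.  Here is a rough sketch of a cleanup routine you could run on a schedule.  The module name, the function name, the "tts-demo" folder and the one hour cutoff are all made up for illustration, so adjust them to wherever your listener actually writes its files.

```vbnet
Imports System
Imports System.IO

Module TtsCleanup

    ' Deletes *.wav files in the given folder whose last write time is older
    ' than maxAge. Returns the number of files removed.
    Function DeleteOldWavFiles(ByVal folder As String, ByVal maxAge As TimeSpan) As Integer
        Dim cutoff As DateTime = DateTime.UtcNow.Subtract(maxAge)
        Dim removed As Integer = 0

        For Each wavPath As String In Directory.GetFiles(folder, "*.wav")
            If File.GetLastWriteTimeUtc(wavPath) < cutoff Then
                File.Delete(wavPath)
                removed += 1
            End If
        Next

        Return removed
    End Function

    Sub Main()
        ' Hypothetical folder: point this at the folder your listener writes to.
        Dim folder As String = Path.Combine(Path.GetTempPath(), "tts-demo")
        Directory.CreateDirectory(folder)

        Dim count As Integer = DeleteOldWavFiles(folder, TimeSpan.FromHours(1))
        Console.WriteLine("Removed {0} file(s)", count)
    End Sub

End Module
```

A scheduled task that runs this once an hour would be plenty for a low-volume IVR.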



Pretty easy.  If your listener is called tts.aspx,  you just call it with:

tts.aspx?myWords='Hello World'

...and it returns a WAV file.
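
If the text comes from a database, it will have spaces, dollar signs and other characters that need encoding before they go into a query string.  Here is a minimal sketch of a console test client that builds the URL and checks that what comes back really is a WAV file.  The module and function names are made up for illustration, and the host name is a placeholder you would swap for your own IIS server.

```vbnet
Imports System
Imports System.Net
Imports System.Text

Module TtsClient

    ' Builds the request URL, percent-encoding the text so spaces, "&", "$"
    ' and similar characters survive the query string.
    Function BuildTtsUrl(ByVal listenerUrl As String, ByVal words As String) As String
        Return listenerUrl & "?myWords=" & Uri.EscapeDataString(words)
    End Function

    ' True when the bytes start with the RIFF/WAVE header that WAV files carry.
    Function LooksLikeWav(ByVal data As Byte()) As Boolean
        If data Is Nothing OrElse data.Length < 12 Then Return False
        Dim riffTag As String = Encoding.ASCII.GetString(data, 0, 4)
        Dim waveTag As String = Encoding.ASCII.GetString(data, 8, 4)
        Return riffTag = "RIFF" AndAlso waveTag = "WAVE"
    End Function

    Sub Main()
        ' Placeholder host name: replace with your own server.
        Dim url As String = BuildTtsUrl("http://mysite.mydomain.com/tts.aspx", "Your account balance is $123.45")
        Console.WriteLine(url)

        Try
            Using client As New WebClient()
                Dim audio As Byte() = client.DownloadData(url)
                Console.WriteLine("WAV returned: " & LooksLikeWav(audio).ToString())
            End Using
        Catch ex As WebException
            Console.WriteLine("Request failed: " & ex.Message)
        End Try
    End Sub

End Module
```

Running this from a workstation is a quick way to prove the listener works before you touch the IVR.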


How do you integrate it into the Switchvox? Simple.  In your IVR add an action type of 'Play Sound From URL' and add the line we just made:

http://mysite.mydomain.com/tts.aspx?myWords='Hello World'

Pretty cool? YES
Free? YES
Supports VMware servers? YES
LAN based? YES
Works well with ASP.NET, Visual Studio and IIS? YES


Peace.

Dan
