Progress on VoiceXML autotest using Bladeware

This is a continuation of this entry: http://jananuraga.blogspot.mx/2013/04/voicexml-autotest-switching-to.html

So, today I managed to get BladewareVxml running, tested with the two sample VXML files provided by BladewareVxml: helloworld.vxml and inline_grammar.vxml. The whole set of projects (bladeware, pcre, curl, xerces, spidermonkey) has also been updated and uploaded to Box.net, here: https://www.box.com/s/l9yniyq9pcbgb4nhro90

If you want to compile them and work with them on your PC, you will need Microsoft Visual Studio 2010. Just unpack the file in C:\ (using 7-zip), and set the following environment variables:
  • BW_CURLDIR = C:\BladewareProject\curl-7.18.0
  • BW_JSDIR = C:\BladewareProject\bwjs-1.61.3
  • BW_PCREDIR = C:\BladewareProject\pcre-7.1
  • BW_VXMLDIR = C:\BladewareProject\bladeware-vxml
  • BW_XERCESDIR = C:\BladewareProject\xerces-c-src_2_8_0
Just open C:\BladewareProject\bladeware-vxml\vxml\BladewareVXML.sln in Visual Studio 2010, and we will be on the same page. Press F5 to start debugging (or Ctrl-F5 to run without the debugger).

Now, the sample VXML files are available under C:\BladewareProject\bladeware-vxml\vxml\samples\ . Just copy them to your favorite webserver. In my case I copied them to my Jetty webserver under C:\jetty-distribution-9.0.2.v20130417\webapps\ROOT\samples , so those files are accessible through the web under http://localhost:8080/samples/

The launcher of Bladeware Vxml is in the project named "client" (already set as the default StartUp project), in a cpp file named runvxml.cpp. It takes various arguments for the execution, which you can set by right-clicking on the "client" project > Properties > Configuration Properties > Debugging > Command Arguments. Currently its value is: -config C:\BladewareProject\bladeware-vxml\vxml\etc\bvxml\runvxml_test.cfg -url http://localhost:8080/samples/inline_grammar.vxml -channels 1 -calls 1

Now, from the execution of those 2 samples, what did we obtain? Hints as to where we could possibly put our hooks. Have a look at these two screenshots. The next task is to hunt down where those lines (marked with a yellow box) are printed from. Those lines correspond to the events we're interested in.

I will be able to continue working on this next weekend. So, if you want to beat me to it before then, please do (and please share your findings). In the meantime I will be working on getting the Pentaho BI stack running on my PC, as part of my foray into big data analytics :). Cya!




Warming up..., for my foray into big data analytics

Part of preparations for my foray into big data analytics is... mapping out the field. There are so many things to learn, so I have to get organized. I also need to bind together nicely in one single place things I've read so far (for further reference / simply to offload it from my head). This is just a note of my current understanding, which will evolve (corrected / refined) along the way. Without further ado, here it is:

---
A few additional words (added 1 May 2013)
---

As the name suggests, this is a study note. It has inaccuracies and gaps, and some of the reasoning is not as strong as I would like it to be. But I don't want to fall into analysis paralysis, so here is my write-up, the distilled result of lots of googling :).

My exercise laundry list:

(1) In datawarehousing, there's Pentaho. My reference is Pentaho Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL ( http://amzn.to/104lh8k )

(2) In the area of NoSQL: my reference is "Seven Databases in Seven Weeks" ( http://amzn.to/10XxTWe ). Already finished reading the book. Now on to doing the exercises in the book. Will be focusing on PostgreSQL (as reference point, from the familiar SQL standpoint), and MongoDB (the most popular NoSQL db).

(3) In machine learning: my reference is "Data Mining: Practical Machine Learning Tools and Techniques" ( http://amzn.to/13OjqbM ). I'm using Weka and Knime. I have to be very pragmatic, because there's a lot of mathematical rigor in the book. In the short term, I only need to familiarize myself with the underlying ideas behind those algorithms, and take advantage of already-implemented libraries (such as Weka and Mahout, to name a few). This, like any technique that is closer to art, is not something I can acquire in a short time, because it depends on intuition, which takes time and lots of drills to develop.

(4) Data cleansing: key skill in data analysis. Thanks Google for giving Google Refine for free  ( https://code.google.com/p/google-refine/ ). My reference: "Bad Data Handbook: Cleaning Up The Data So You Can Get Back To Work" ( http://amzn.to/10Xz1ZI ).

Now... I need to switch back this weekend to finishing my VoiceXML autotest, before taking on that laundry list :D . Cya!

---

Note: the PDF can be downloaded from https://www.box.com/s/6lbafigfkbrta5cnll5u

VoiceXML Autotest: Switching to Bladeware VXML

UPDATE: Progress as of 5 May 2013 is available at: http://jananuraga.blogspot.mx/2013/05/progress-on-voicexml-autotest-using.html

---
Yep, I decided to switch to BladewareVxml to use as the basis for the VoiceXML Autotest I'm working on. Pragmatic reasons:

  • BladewareVxml is VoiceXML 2.1 compliant.
  • It's based on OpenVXI, which to my knowledge is used by many commercial products out there. So..., hopefully people would be more likely to use this tool.

What have I done so far? Very basic: get BladewareVxml compiled :) No brainer, I only had to fix a few build configurations, and one header file, VxiCommon.h, adding #include <iterator>. Download it from here: https://www.box.com/s/l9yniyq9pcbgb4nhro90

Next: I'm going to write a C++ interface and process it with SWIG, as a first step toward integration with Python. The idea: software testers will write their test scripts in Python, and that C++ interface will act as a bridge between the test script and the BladewareVXML interpreter object.

The C++ class that implements the interface will:
  • Accept (through its constructors) callback-function pointers written in Python, such as:
    • readInput
    • renderPrompts
    • inputRecognized
    • inputNomatch
    • inputNoinput
    • transferPerformed
    • submit
  • Accept (through its constructors) VoiceXML platform properties.
  • Have methods that can be called from test script. These methods will interact with Bladeware VoiceXML interpreter object created during the construction:
    • runVxml
    • feedInputSpoken
    • feedInputDTMF
    • feedNoInput
    • hangup
  • Be implemented as a singleton, allowing me to obtain a reference to it from any point in the Bladeware Vxml code.
Before I get to that, I will have to study the Bladeware Vxml code in debug mode. Now... where is the starting point(s)? :D Maybe I can start from these ones.... Ok, that's all for now, until next weekend.
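
To make the plan above a bit more concrete, here is a minimal sketch (in Python) of what a test script might look like from the tester's side. Everything in it is an assumption: FakeVxmlBridge is a hypothetical stand-in for the SWIG-wrapped C++ class, it replays a canned dialog instead of talking to Bladeware, and only the callback and method names are taken from the list above.

```python
from typing import Callable, Dict, List, Optional, Tuple


class FakeVxmlBridge:
    """Hypothetical stand-in for the SWIG-wrapped C++ bridge; no Bladeware API is used."""

    def __init__(self, callbacks: Dict[str, Callable[[str], None]],
                 properties: Optional[Dict[str, str]] = None) -> None:
        self.callbacks = callbacks
        self.properties = properties or {}
        self.pending_inputs: List[str] = []

    def _fire(self, name: str, payload: str) -> None:
        # Call the Python callback registered under this name, if any.
        handler = self.callbacks.get(name)
        if handler is not None:
            handler(payload)

    def feedInputDTMF(self, digits: str) -> None:
        # Queue the digits; the fake interpreter consumes them in runVxml.
        self.pending_inputs.append(digits)

    def runVxml(self, url: str) -> None:
        # A real bridge would hand the URL to the interpreter. This fake
        # replays a canned dialog so that the test script can be exercised.
        self._fire("renderPrompts", "Press 1 for coffee, press 2 for tea.")
        if self.pending_inputs:
            self._fire("inputRecognized", self.pending_inputs.pop(0))
        else:
            self._fire("inputNoinput", "")


events: List[Tuple[str, str]] = []
bridge = FakeVxmlBridge({
    "renderPrompts": lambda text: events.append(("prompt", text)),
    "inputRecognized": lambda value: events.append(("recognized", value)),
    "inputNoinput": lambda _: events.append(("noinput", "")),
})
bridge.feedInputDTMF("1")
bridge.runVxml("http://localhost:8080/samples/inline_grammar.vxml")
```

The test script only sees callbacks and feed methods, so the same script could later run against the real bridge once it exists.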





GuiceXML & VoiceXML Autotest, (r.e).u.n.i.t.e.d. !

STOP PRESS :)  20 April 2013: I decided to use Bladeware Vxml instead of JVoiceXML as the engine for this tool. Click here to find out more about it.

***********************

Ok, quickie :)

GuiceXML: I already explained it in previous blog entry. This weekend I had a chance to make some improvement in the code, and it's available here for download: https://www.box.com/s/3bph4o8096489spnhip2 . Don't forget to download the sample VoiceXML files (just deploy them to your webserver), here: https://www.box.com/s/v0tmupsij9ogscullh1k .

I also had a chance to take a screencast of GuiceXML, so you can have a better idea of what it is. Here's the vid:



VoiceXML Autotest: it was explained in this old blog entry (actually it precedes GuiceXML). With that library you can check the VoiceXML flow against your expectations (scenario) expressed the following way:


The good thing is: both things now share the same code base. :)

Well, that's all for tonight. I'm sorry I don't have a chance tonight to write a bit about the code and where it's heading. I hope I'll be able to do it tomorrow night. Cya!

GuiceXML

UPDATE: GuiceXML and VoiceXML Autotest (re)united. This way. New code, a screencast to give you a (more) concrete idea of what GuiceXML is about, and a screenshot of code that expresses a VoiceXML automated-test scenario.

----- Original Entry -----

After a long hiatus, time to resurrect old projects. One of them is something I named... GuiceXML. It's a wordplay on GUI + VoiceXML.... This is a cool (and useful) thing. Several companies are doing it, check out this one: http://fonolo.com/blog/2012/11/what-does-the-future-hold-for-visual-ivr/

VoiceXML is a language for describing flows of dialogs. Originally it was meant for interactive VOICE response (IVR). But it's quite easy to see that these same flows of dialogs can be presented VISUALLY on a smartphone. VoiceXML-based IVRs have proliferated: the language has been around since 2000, and nowadays the majority of IVRs are written in VoiceXML, so it's pretty standard, just as HTML is for websites. Also, smartphones are commonplace. Connect the dots, and you can quickly see the opportunity for this kind of thing (an adaptor, VoiceXML app -> mobile app).

Of course in practice that wouldn't be as straightforward as it sounds. To begin with, inherently there's a difference between visual and audible user interaction. What are those fundamental differences? Well, I still have to read some more about it :), so can't write much now on that matter. This link in the meantime can provide some hints: http://www.informit.com/articles/article.aspx?p=26669 . So..., a good adaptor must provide a good API & tool to overcome this first challenge. I guess a significant amount of tinkering and work would be on tackling this challenge (note to product manager :)).

Maybe this technical description can illustrate some points. In a GUI, users are accustomed to seeing all the fields that belong to the same form on the same screen. Additionally, in most cases users are free to decide the order in which they fill in the input fields. But that's normally not the case with a VUI, where users are kind of "forced" to give input one by one, in a predefined (and inflexible) sequence, even though there isn't any dependency between the inputs. This is not a limitation of VoiceXML per se, but more of UI design (most VoiceXML applications are coded in that simplistic way). VoiceXML actually provides a way to implement mixed-initiative dialog. Check out this link: http://www.vxml.org/t_20.htm

Alright, you might ask: why not simply apply some XSL transformations on the VoiceXML documents to produce HTML then?

Well, it's not that simple, and one of the reasons is... something I would call the dynamism embedded in the VoiceXML spec itself, without even requiring JavaScript (which is another thing that can be used inside a VoiceXML document). For example: an input field (<field>) can have a "cond" attribute that specifies a Boolean expression, which must evaluate to true in order for the content to be visited and executed. Furthermore, this expression can involve the value of another field (or fields). Meaning: in that case, there _is_ a dependency between fields. Contrast this with HTML, where you would have to use some JavaScript to provide similar behavior.

A good adaptor must know how to figure out whether or not there are dependencies between fields in a form, and adjust the way it asks for the inputs accordingly. We can think of parsing the fields in a form to see if any of them has a "cond" attribute, for example; if not, then it might be safe to present all the fields in the form on a single screen. Now, it's easy to imagine that the task of adapting can be even more challenging if JavaScript code exists in the VoiceXML document.
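
The "cond" check described above could be sketched like this in Python, using only the standard library. This is a rough heuristic under stated assumptions, not what GuiceXML does: the function name and the sample document are made up for illustration, and it looks at the "cond" attribute only, so dependencies introduced through scripts or other attributes would go unnoticed.

```python
import xml.etree.ElementTree as ET

# VoiceXML documents may or may not declare the standard namespace.
VXML_NS = "{http://www.w3.org/2001/vxml}"


def form_fields_are_independent(form: ET.Element) -> bool:
    """Return True if no <field> in the form carries a "cond" attribute."""
    fields = [el for el in form.iter() if el.tag in ("field", VXML_NS + "field")]
    return all(field.get("cond") is None for field in fields)


# Made-up sample: the "milk" field depends on the value of "drink".
SAMPLE = """
<vxml version="2.1">
  <form id="order">
    <field name="drink"><prompt>Coffee or tea?</prompt></field>
    <field name="milk" cond="drink == 'coffee'"><prompt>With milk?</prompt></field>
  </form>
  <form id="contact">
    <field name="phone"><prompt>Your phone number?</prompt></field>
    <field name="zip"><prompt>Your zip code?</prompt></field>
  </form>
</vxml>
"""

root = ET.fromstring(SAMPLE)
result = {form.get("id"): form_fields_are_independent(form)
          for form in root.iter("form")}
```

With this sample, the "contact" form is a candidate for a single screen, while the "order" form still has to be asked field by field.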

The crux of the matter: a simple, static transformation (such as the one with XSL) wouldn't work. The adaptor _must_ execute the VoiceXML (and the embedded JavaScript, if any). We don't want to require programmers to tweak existing VoiceXML code just to make it presentable as a GUI. We want to provide 100% assurance that existing VoiceXML will be processed the same way as it is on an existing VoiceXML browser, and that no logic inside existing VoiceXML code that might impact business operation will be ignored / dropped. In short, a VoiceXML-compliant adaptor.

For that reason, I decided to pick up JVoiceXML, an open-source VoiceXML interpreter, as the base for this adaptor. What needs to be done is extending it through various mechanism (Subscriber/Listener, injection of alternative Strategy, wrapper + delegation, etc). In some situations I even had to make modification directly in the JVoiceXML code, and I tried not to break its VoiceXML-compliance. It takes a lot of reverse engineering / debugging tricks to understand how it works and know where & how to make the modifications / extensions (link to list of modifications in JVoiceXML code). GuiceXML alpha 0.1 is based on JVoiceXML ver 0.7.5.

Here's the result so far (link to the code). You will need NetBeans to open the project (I'm using the latest version, 7.3). To run the project, create a Run configuration like this one:



The GuiceXML code belongs to a package named net.raka.jvxmltd.

This sequence of screens will give you a picture of what to expect from this alpha version. It still does things in the most naive way, like I mentioned above, asking for input one field at a time. This example requires you to deploy two VoiceXML documents (sample_01.vxml and sample_02.vxml) to your webserver. You just have to change the URL of the landing VoiceXML page (sample_01.vxml) to point to your webserver.



Also, of course this is a very rough version of GuiceXML, where I basically put hooks here and there, in order to be able to intercept and react to a few events that I'm interested in, such as announcement and input collection. In fact, in one place I have to do a dirty trick of using the execution stack trace, just to figure out whether a prompt should be played as part of input collection (i.e.: inside a <field>), or as an informational message (i.e.: inside a <block>). If it's part of input collection, then GuiceXML prints it in the text area inside the frame. Otherwise, it displays a blocking message dialog.



Actually, that dirty trick, along with the modifications I made directly in the JVoiceXML code, leads me to think that the JVoiceXML code itself needs some refactoring. Ideally, to implement an adaptor / extension like GuiceXML, one shouldn't need to modify the core library (JVoiceXML). The hooks provided by the library should be sufficient. However, that's not the case with JVoiceXML 0.7.5.

Also, refactoring is needed in JVoiceXML to abstract away several classes that the current GuiceXML depends on. For example, JVoiceXML's TextTelephony (used by GuiceXML's TextDriverTelephony) communicates with its peer classes using sockets. Well, that's not desired in GuiceXML, because the idea is to have GuiceXML running in a mobile environment (e.g.: J2ME / Android phone) and to minimize the use of network resources (only HTTP will be needed to fetch the VoiceXML documents from the app server that hosts the IVR application). I think TextTelephony should be coded in such a way that the I/O channel it uses can be injected, and the I/O channel itself is described as a Java interface, where a network socket is only one of the possible implementations. Click here to understand the role of Telephony in JVoiceXML.
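
The injection idea above can be sketched as follows. This is an illustration in Python rather than Java, and every name in it (IoChannel, InMemoryChannel, TextTelephonySketch) is hypothetical; none of them exists in JVoiceXML. The point is only that the telephony class talks to an abstract channel, so a socket becomes just one possible implementation.

```python
from collections import deque
from typing import Deque, List, Protocol


class IoChannel(Protocol):
    """Hypothetical channel abstraction; JVoiceXML does not define this interface."""

    def send(self, text: str) -> None: ...

    def receive(self) -> str: ...


class InMemoryChannel:
    """Channel backed by plain method calls and queues, with no sockets involved."""

    def __init__(self) -> None:
        self.sent: List[str] = []
        self.inbox: Deque[str] = deque()

    def send(self, text: str) -> None:
        self.sent.append(text)

    def receive(self) -> str:
        return self.inbox.popleft()


class TextTelephonySketch:
    """Illustrative stand-in for a telephony class that receives its channel."""

    def __init__(self, channel: IoChannel) -> None:
        self.channel = channel  # injected, so the transport can be swapped

    def play(self, prompt: str) -> None:
        self.channel.send(prompt)

    def record(self) -> str:
        return self.channel.receive()


channel = InMemoryChannel()
channel.inbox.append("1")  # simulated user input waiting in the channel
telephony = TextTelephonySketch(channel)
telephony.play("Press 1 for coffee, press 2 for tea.")
user_input = telephony.record()
```

A socket-backed class with the same two methods could be passed in instead without touching the telephony code.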

Additional note: this visual IVR idea can be combined with callback. Many times I get frustrated dealing with a contact center on the phone (voice), for two reasons: (1) that tedious IVR and (2) the long wait to speak to an agent (when the call gets transferred at the end of the IVR interaction). With visual IVR, the interaction can be sped up (and companies can reuse their existing IVR-app investment) and the caller doesn't have to wait to be attended by the agent. The contact center agent is the one who is supposed to make the call back.

Here's a picture of a callback app concept I sketched a while back, and mentioned in this blog: http://www.nojitter.com/post/240001084/camping-with-avaya?pgno=2

Alright, that's all for now. Drop me a line if you're interested in the project. ( raka.angga _at_ gmail.com ). Cya!

VoiceXML Automated Testing, using JVoiceXML

UPDATE: GuiceXML and VoiceXML Autotest (re)united. This way. New code, a screencast to give you a (more) concrete idea of what GuiceXML is about, and a screenshot of code that expresses a VoiceXML automated-test scenario.

----- Original Entry -----

Okay, just some quick & drafty entry here.

I needed a way to automate the testing of IVR applications (VoiceXML). Quick googling will take us to this (Microsoft) and this (Empirix). I can't make any comment on either of those options, because I haven't used any of them; Tellme retired its free-developer account service, so I can't access Tellme Studio any longer. As for Empirix, based on what I read on its webpage, I don't think it's exactly the one I was looking for (it talks about recognition error rate, prompt quality, etc., while my concern is mainly about the flow of the dialogs).


UPDATE: I guess this product named "Voiyager" is close to what I'm looking for. Link: http://www.syntellect.com/pages/products/voiyager_eng.aspx

What I want is really simple (to begin with): I want to verify (quickly) that if I press "1" in a dialog that asks "What do you want to drink? Press 1 for coffee, Press 2 for tea.", then the next prompt would be "You selected coffee. What type of coffee? Press 1 for cappuccino, press 2 for espresso".

This is based on my observations on how clients specify their expectations..., as a theater script with two actors in it, the IVR and the user. Like this:

IVR: What do you want to drink? Press 1 for coffee, Press 2 for tea. (or: audio_01.wav)
User: Press 1
IVR: You selected coffee. What type of coffee? Press 1 for cappuccino, press 2 for espresso. (or: audio_02.wav)
... (and so on).
So I thought it would be nice if we could make a little program that takes that script and checks it against a running IVR.

I've seen some people attempt to automate the test using an automation tool like AutoIt, which basically (1) starts a softphone, (2) dials the IVR, (3) inputs DTMF (by pressing buttons in the softphone app), and that's it.

The problem with that is:
  1. There's no (easy) way to verify the prompts. A tool like AutoIt is a GUI test tool, designed for testing desktop applications by checking properties of GUI elements in the application. You can't use it to capture audio, let alone to compare the audio against our expectation (which would be expressed in a text format).
  2. Without a way to verify the prompts, the test is useless.
And then I came across JVoiceXML, an open-source VoiceXML interpreter written in Java. Somebody else came up with the idea for the test tool ( described here: http://sourceforge.net/apps/mediawiki/jvoicexml/index.php?title=UnitTest ). I simply took that idea and implemented part of it. I started with something really simple: I want to be able to express each scenario in the following format (plain text file):
i.Please enter 1 to go to formB, or 2 to go to formC%1
a.You are in prompt B
i.Please enter 34 followed by # to go to formD, or 35 followed by # to go to formE%34#
a.You are in prompt D
A line that starts with "i." means: an input collection is expected, where the user will be prompted with the question to the right of the first "." and to the left of "%", and the user will respond by pressing the sequence of digits specified to the right of "%". A line that starts with "a." means: an announcement is expected, where the text to the right of the first "." is played to the user.
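
As an illustration of the format just described, here is a small parser sketch in Python. It is not the code of my test tool; the Step class and parse_scenario function are hypothetical names, and I'm assuming that the last "%" on an "i." line separates the prompt from the digits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    kind: str                     # "input" or "announcement"
    prompt: str                   # text the IVR is expected to play
    digits: Optional[str] = None  # keys the user presses (input steps only)


def parse_scenario(text: str) -> List[Step]:
    """Turn the lines of a scenario file into a list of expected steps."""
    steps: List[Step] = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        marker, dot, rest = line.partition(".")  # split at the first "."
        if dot and marker == "i":
            # Assumption: the last "%" separates the prompt from the digits.
            prompt, percent, digits = rest.rpartition("%")
            if not percent:
                raise ValueError(f"line {number}: input step has no '%'")
            steps.append(Step("input", prompt, digits))
        elif dot and marker == "a":
            steps.append(Step("announcement", rest))
        else:
            raise ValueError(f"line {number}: unknown step {line!r}")
    return steps


SCENARIO = """\
i.Please enter 1 to go to formB, or 2 to go to formC%1
a.You are in prompt B
i.Please enter 34 followed by # to go to formD, or 35 followed by # to go to formE%34#
a.You are in prompt D
"""
steps = parse_scenario(SCENARIO)
```

Each Step can then be compared, in order, against what the interpreter actually plays and collects.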

And, to run the test (scenario), I would only have to type this command in the console:
java JVoiceXmlTest http://mywebserver/index.jsp scenario_01.txt
Where the first parameter (http://mywebserver/index.jsp) is the URL of the landing page of the IVR, and the second parameter (scenario_01.txt) is the name of the text file that contains the scenario.

So, here's what I got so far (video below). Nothing interesting :) yet, just some scrolling text in a console. What's more interesting is some findings I made when I was modifying JVoiceXml source code.



Ok, now the findings (I hope this can be a useful feedback for JVoiceXml team in refactoring effort):
  1. JVoiceXml has a dependency on RMI (i.e.: it binds itself to the JNDI during startup). That may be fine for the intended use of JVoiceXml (as a networked application). However, for a testing tool like this one, I just want to run it as a standalone component. In particular, I'm only interested in the VoiceXML interpreter core. For now, I simply commented out the lines of code related to JNDI and RMI. I hope future versions will be refactored to let us use the VoiceXML interpreter as a plain Java object.
  2. JVoiceXML has this architecture that allows you to change the "platform factory". A platform factory is basically an object that creates other objects that know how to obtain and process the (spoken) input and (audio) output. Inputs and outputs go through an instance of "Telephony" (it's the channel).

    The good thing is that JVoiceXML comes with a "text platform factory" that takes inputs as text and produces outputs as text. A slight modification was needed to that "text platform factory", because its telephony reads input from / writes output to a server socket. I don't need that; I needed to bypass it and use simple method invocations. So I created a wrapper around it; that's the PruebaPlatformFactory.
Another finding (a lengthy one, and the most challenging): the issue with semantic interpretation of grammars.

Currently the implementation of its GrammarChecker does not support semantic interpretation. So, you cannot associate a (custom) value with the phrases supported by your grammar. This limits the use of JVoiceXML.

In my case, for example, I need to capture a 4-digit PIN. Well, for that you can simply take the "utterance" and treat it as the value. Let's suppose the user is required to complete the input by pressing # after the fourth digit. For that you can also simply take the utterance and drop the trailing # in your "business logic". You don't need semantic interpretation for such a case.

Anyway, semantic interpretation is important; any decent VoiceXML browser must support it (conforming to a specification like SISR). So I set out to solve this, as an exercise. I ran into some difficulties, because I still don't have a good grasp of how the GrammarChecker and (to a lesser degree) the SrgsXmlGrammarParser work, or of the principles and logic behind them. I guess the difficulties stem from the fact that the tree structure in the static model (the SRGS grammar) is transformed into a linear structure when the input is checked against the grammar.

So I just put my modifications in some sensible places in the code where I can intercept the event of "<tag> node is visited". My code simply collects the content of those nodes (each is basically a line of JavaScript code), and stitches those lines of JS code together (in some order) when the walk is completed. The code then feeds those lines of JS code to the embedded JS engine (Rhino), and I simply take the return value of the execution of the JS code and assign it as the "semantic interpretation".

Take the following grammar for instance.










If the input from the user is "34#", then the JS code that the GrammarChecker produces would be:
var digits = new Object();
digits.MEANING='D';
var root_rule = new Object();
root_rule.MEAN2=digits.MEANING;
root_rule.MEAN1='form';
root_rule.whereToGo=root_rule.MEAN1+root_rule.MEAN2;
The return value from the JS execution is always the value of the last line, so effectively we will get the string "formD".

My current fix is kind of hackish. I just did the minimal thing to make the cases listed in http://box.net/files#/files/0/f/0/1/f_945790295 pass. The ideal solution would be the one that passes SISR 1.0 conformance tests. For that I will need to take a closer look at the SrgsXmlGrammarParser and GrammarChecker, and the related classes & interfaces. I feel the need for refactoring in that area. The way GrammarNodes and SrgsNode are (currently) structured doesn't make it easy to navigate through the tree / walk, which might be required for efficient implementation of semantic interpreter. I was also thinking, why not use ANTLR for generating the bulk of the grammar interpreter? I guess that would be easier & produce cleaner code.

Last finding: this time about notification. I need a way to get notified whenever any of these two things occurs:
  1. The interpreter is waiting for input, so that I can put in code that programmatically feeds the input.
  2. The interpreter is playing a prompt, so that I can put code that compares the prompt with the one specified in the scenario file.
JVoiceXml employs the strategy design pattern (see the interface TagStrategy), which I exploit here to achieve the two things mentioned above; I simply implement a TagStrategy that wraps around the default strategy, so I can do some interceptions and fire the notifications from there. Hmm, well, I was lying. I mean, that would be the right way to do it, but for now (because I don't have much time), I simply modified the implementation of the default strategies. Told you, it was hackish :).
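
The "right way" mentioned above (a strategy that wraps the default one and fires notifications) can be sketched like this. It is written in Python for brevity, and PromptStrategy and NotifyingStrategy are hypothetical names; JVoiceXML's real TagStrategy interface has a different signature, which I don't reproduce here.

```python
from typing import Callable, List


class PromptStrategy:
    """Hypothetical default strategy; it stands in for a JVoiceXML TagStrategy."""

    def execute(self, text: str) -> str:
        return f"played: {text}"


class NotifyingStrategy:
    """Wraps another strategy and notifies a listener before delegating to it."""

    def __init__(self, wrapped: PromptStrategy,
                 listener: Callable[[str], None]) -> None:
        self.wrapped = wrapped
        self.listener = listener

    def execute(self, text: str) -> str:
        self.listener(text)                # interception point for the test tool
        return self.wrapped.execute(text)  # default behavior stays untouched


# Prompts expected by the scenario file vs. prompts the (fake) IVR plays.
expected = ["You are in prompt B", "You are in prompt D"]
played_by_ivr = ["You are in prompt B", "You are in prompt D"]

heard: List[str] = []
strategy = NotifyingStrategy(PromptStrategy(), heard.append)
results = [strategy.execute(text) for text in played_by_ivr]
mismatches = [(want, got) for want, got in zip(expected, heard) if want != got]
```

Because the wrapper only delegates, the default strategy needs no modification, which is exactly what the hackish version could not offer.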

The modified JVoiceXml is available here (it is based on JVoiceXml 0.7.4.1). Actually, it's of little use for public right now (it's really yucky!); I will have to modify it anyway sometime later, only after I get a firm understanding of the grammar interpreter, in order to make it SISR 1.0 compliant.

Okay, that's all for now!