GuiceXML & VoiceXML Autotest, (r.e).u.n.i.t.e.d. !

STOP PRESS :)  20 April 2013: I decided to use Bladeware Vxml instead of JVoiceXML as the engine for this tool. Click here to find out more about it.

***********************

Ok, quickie :)

GuiceXML: I already explained it in a previous blog entry. This weekend I had a chance to make some improvements to the code, and it's available here for download: https://www.box.com/s/3bph4o8096489spnhip2 . Don't forget to download the sample VoiceXML files (just deploy them to your webserver), here: https://www.box.com/s/v0tmupsij9ogscullh1k .

I also had a chance to record a screencast of GuiceXML, so you can get a better idea of what it is. Here's the vid:



VoiceXML Autotest: it was explained in this old blog entry (it actually precedes GuiceXML). With that library you can check a VoiceXML flow against your expectations (a scenario) expressed the following way:
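i.Please enter 1 to go to formB, or 2 to go to formC%1
a.You are in prompt B
i.Please enter 34 followed by # to go to formD, or 35 followed by # to go to formE%34#
a.You are in prompt D
("i." lines are expected input-collection prompts, with the DTMF to send after the "%"; "a." lines are expected announcements. The format is explained in full in the original entry below.)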


The good thing is: both now share the same code base. :)

Well, that's all for tonight. I'm sorry I don't have a chance tonight to write a bit about the code and where it's heading. I hope tomorrow night I'll be able to do it. Cya!

GuiceXML

UPDATE: GuiceXML and VoiceXML Autotest (re)united. This way: new code, a screencast to give you a (more) concrete idea of what GuiceXML is about, and a screenshot of the code for expressing your VoiceXML automated-test scenario.

----- Original Entry -----

After a long hiatus, time to resurrect old projects. One of them is something I call... GuiceXML. It's a wordplay on GUI + VoiceXML.... This is a cool (and useful) thing. Several companies are doing it; check out this one: http://fonolo.com/blog/2012/11/what-does-the-future-hold-for-visual-ivr/

VoiceXML is a language for describing flows of dialogs. Originally it was meant for interactive VOICE response (IVR). But... it's quite easy to conceive that these same flows of dialogs could be presented VISUALLY on a smartphone. VoiceXML-based IVR has proliferated: it's been around since 2000, and nowadays the majority of IVRs are written in VoiceXML, so it's as standard to IVR as HTML is to websites. Smartphones are also commonplace. Connect the dots and you can quickly see the opportunity for this kind of thing (an adaptor: VoiceXML app -> mobile app).

Of course, in practice it wouldn't be as straightforward as it sounds. To begin with, there are inherent differences between visual and audible user interaction. What are those fundamental differences? Well, I still have to read some more about it :), so I can't write much on that matter now. In the meantime, this link can provide some hints: http://www.informit.com/articles/article.aspx?p=26669 . So... a good adaptor must provide a good API & tools to overcome this first challenge. I guess a significant amount of the tinkering and work would go into tackling it (note to product manager :)).

Maybe this technical description can illustrate some points. In a GUI, users are accustomed to seeing all the fields that belong to the same form on the same screen. Additionally, in most cases users are free to decide the sequence in which they fill in the input fields. That's normally not the case with a VUI, where users are kind of "forced" to give input one by one, in a predefined (and inflexible) sequence, even though there may be no dependency between the inputs. This is not a limitation of VoiceXML per se, but rather of UI design (most VoiceXML applications are coded in that simplistic way). VoiceXML actually provides a way to implement mixed-initiative dialogs; check out this link: http://www.vxml.org/t_20.htm
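For readers who haven't seen one, a mixed-initiative form looks roughly like this (a minimal sketch, not taken from a real app): the grammar is attached at the form level, and <initial> lets the caller fill several fields in a single utterance.

<form id="trip">
  <grammar src="trip.grxml" type="application/srgs+xml"/>
  <initial name="start">
    <prompt>Where do you want to travel, and on what day?</prompt>
  </initial>
  <field name="city">
    <prompt>Which city?</prompt>
  </field>
  <field name="day">
    <prompt>On what day?</prompt>
  </field>
</form>

A caller who answers "to Paris, tomorrow" fills both fields at once; one who only says "Paris" gets prompted just for the missing day.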

Alright, you might ask: why not simply apply some XSL transformations to the VoiceXML documents to produce HTML, then?

Well, it's not that simple, and one of the reasons is... something I would call the dynamism embedded in the VoiceXML spec itself, without even requiring JavaScript (which is another thing that can be used inside a VoiceXML document). For example: an input field (<field>) can have a "cond" attribute that specifies a Boolean expression, which must evaluate to true in order for the content to be visited and executed. Furthermore, this expression can involve the value of another field (or fields). Meaning: in that case, there _is_ a dependency between fields. Contrast this with HTML, where you would have to use some JavaScript to provide similar behavior.
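A minimal sketch of what that looks like (the field names are made up):

<field name="accountType">
  <prompt>Say checking or savings.</prompt>
  <grammar src="account-type.grxml"/>
</field>
<!-- visited only when the earlier answer makes it relevant -->
<field name="overdraftLimit" cond="accountType == 'checking'">
  <prompt>What overdraft limit would you like?</prompt>
  <grammar src="amount.grxml"/>
</field>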

A good adaptor must know how to figure out whether or not there are dependencies between the fields in a form, and adjust the way it asks for the inputs accordingly. We can think of parsing the fields in a form to see if any of them has a "cond" attribute, for example; if not, then it might be safe to present all the fields in the form on a single screen. Now, it's easy to imagine that the task of adapting becomes even more challenging if JavaScript code exists in the VoiceXML document.
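In Java, that naive check could look something like this (illustrative only; it ignores <script> blocks, "expr" attributes, and everything else that could introduce a dependency):

import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public final class FormAnalyzer {
    // Returns true if no field in the form declares a "cond" attribute,
    // in which case it might be safe to render all fields on one screen.
    public static boolean fieldsLookIndependent(Element form) {
        NodeList fields = form.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            if (field.hasAttribute("cond")) {
                return false; // this field depends on some earlier value
            }
        }
        return true;
    }
}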

The crux of the matter: a simple, static transformation (such as one with XSL) wouldn't work. The adaptor _must_ execute the VoiceXML (and the embedded JavaScript, if any). We don't want to require programmers to tweak existing VoiceXML code just to make it presentable as a GUI. We want to provide 100% assurance that existing VoiceXML will be processed the same way as it is in an existing VoiceXML browser, and that no logic inside existing VoiceXML code that might impact business operations will be ignored / dropped. In short: a VoiceXML-compliant adaptor.

For that reason, I decided to pick JVoiceXML, an open-source VoiceXML interpreter, as the base for this adaptor. What needs to be done is extend it through various mechanisms (Subscriber/Listener, injection of alternative Strategies, wrapper + delegation, etc.). In some situations I even had to make modifications directly in the JVoiceXML code, and I tried not to break its VoiceXML compliance. It takes a lot of reverse engineering / debugging tricks to understand how it works and to know where & how to make the modifications / extensions (link to list of modifications in JVoiceXML code). GuiceXML alpha 0.1 is based on JVoiceXML ver 0.7.5.

Here's the result so far (link to the code). You will need NetBeans to open the project (I'm using the latest version, 7.3). To run the project, create a Run configuration like this one:



The GuiceXML code belongs to a package named net.raka.jvxmltd.

This sequence of screens will give you a picture of what to expect from this alpha version. It still does things in the most naive way, as I mentioned above: asking for input one field at a time. This example requires you to deploy two VoiceXML documents (sample_01.vxml and sample_02.vxml) to your webserver. You just have to change the URL of the landing VoiceXML page (sample_01.vxml) to point to your webserver.



Also, of course, this is a very rough version of GuiceXML, where I basically put hooks here and there in order to be able to intercept and react to the few events I'm interested in, such as announcements and input collection. In fact, in one place I had to do a dirty trick using the execution stack trace, just to figure out whether a prompt should be played as part of input collection (i.e.: inside a <field>) or as an informational message (i.e.: inside a <block>). If it's part of input collection, GuiceXML prints it in the text area inside the frame. Otherwise, it displays a blocking message dialog.
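Roughly, the trick looks like this (the class name I match against is illustrative, not the exact JVoiceXML one):

public final class PromptContextSniffer {
    // Dirty trick: infer the prompt's role from who is executing us.
    public static boolean isPartOfInputCollection() {
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            // If a field-related form item is somewhere up the stack, the
            // prompt is being queued as part of input collection.
            if (frame.getClassName().contains("FieldFormItem")) {
                return true;
            }
        }
        return false; // otherwise treat it as an informational <block> message
    }
}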



Actually, that dirty trick, along with the modifications I made directly in the JVoiceXML code, leads me to think that the JVoiceXML code itself needs some refactoring. Ideally, to implement an adaptor / extension like GuiceXML, one shouldn't need to modify the core library (JVoiceXML); the hooks provided by the library should be sufficient. However, that's not the case with JVoiceXML 0.7.5.

Also, refactoring is needed in JVoiceXML to abstract away several classes on which the current GuiceXML is based / depends. For example, JVoiceXML's TextTelephony (used by GuiceXML's TextDriverTelephony) communicates with its peer classes using a socket. That's not desirable in GuiceXML, because the idea is to have GuiceXML running in a mobile environment (e.g.: a J2ME / Android phone) and minimize the use of network resources (only HTTP will be needed, to fetch the VoiceXML documents from the app server that hosts the IVR application). TextTelephony, I think, should be coded in such a way that the I/O channel it uses can be injected, with the I/O channel itself described as a Java interface, where a network socket is only one of the possible implementations. Click here to understand the role of Telephony in JVoiceXML.
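Something along these lines (a sketch of the suggested refactoring, not actual JVoiceXML API):

import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// The I/O channel TextTelephony would use, described as a plain interface.
interface TextChannel {
    void send(String text) throws IOException;
    String receive() throws IOException;
}

// In-process implementation for GuiceXML: plain method calls, no socket.
// A socket-backed implementation would just be another TextChannel.
final class DirectTextChannel implements TextChannel {
    private final BlockingQueue<String> fromUser = new LinkedBlockingQueue<String>();

    public void send(String text) {
        // hand the interpreter's output straight to the GUI layer here
    }

    public String receive() throws IOException {
        try {
            return fromUser.take(); // blocks until the GUI posts user input
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        }
    }

    public void postUserInput(String text) {
        fromUser.add(text); // called by the GUI when the user submits a value
    }
}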

Additional note: this visual IVR idea can be mixed with callback. Many times I get frustrated dealing with a contact center on the phone (voice), for two reasons: (1) that tedious IVR and (2) the long wait to speak to an agent (when the call gets transferred at the end of the IVR interaction). With visual IVR, interaction can be sped up (and companies can reuse their existing IVR-app investment), and the caller doesn't have to wait to be attended by an agent: the contact center agent is the one who is supposed to make the call back.

Here's a picture of a callback app concept I sketched a while back and mentioned in this blog: http://www.nojitter.com/post/240001084/camping-with-avaya?pgno=2

Alright, that's all for now. Drop me a line if you're interested in the project. ( raka.angga _at_ gmail.com ). Cya!

VoiceXML Automated Testing, using JVoiceXML

UPDATE: GuiceXML and VoiceXML Autotest (re)united. This way: new code, a screencast to give you a (more) concrete idea of what GuiceXML is about, and a screenshot of the code for expressing your VoiceXML automated-test scenario.

----- Original Entry -----

Okay, just a quick & drafty entry here.

I needed a way to automate the testing of IVR applications (VoiceXML). Quick googling will take us to this (Microsoft) and this (Empirix). I can't comment on either of those options, because I haven't used them; Tellme retired its free developer-account service, so I can't access Tellme Studio any longer. As for Empirix, based on what I read on its webpage, I don't think it's exactly what I was looking for (it talks about recognition error rates, prompt quality, etc., while my concern is mainly the flow of the dialogs).


UPDATE: I guess this product named "Voiyager" is close to what I'm looking for. Link: http://www.syntellect.com/pages/products/voiyager_eng.aspx

What I want is really simple (to begin with): I want to verify (quickly) that if I press "1" in a dialog that asks "What do you want to drink? Press 1 for coffee, Press 2 for tea.", then the next prompt would be "You selected coffee. What type of coffee? Press 1 for cappuccino, press 2 for espresso".

This is based on my observations of how clients specify their expectations... as a theater script with two actors in it, the IVR and the user. Like this:

IVR: What do you want to drink? Press 1 for coffee, Press 2 for tea. (or: audio_01.wav)
User: Press 1
IVR: You selected coffee. What type of coffee? Press 1 for cappuccino, press 2 for espresso. (or: audio_02.wav)
... (and so on).
So I thought it would be nice if we could make a little program that takes that script and checks it against a running IVR.

I've seen some people attempt to automate the test using an automation tool like AutoIt, which basically (1) starts a softphone, (2) dials the IVR, (3) inputs DTMF (by pressing buttons in the softphone app), and that's it.

The problem with that is:
  1. There's no (easy) way to verify the prompts. A tool like AutoIt is a GUI test tool, designed for testing desktop applications by checking properties of GUI elements in the application. You can't use it to capture audio, let alone compare it against our expectation (which would be expressed in text format).
  2. Without a way to verify the prompts, the test is useless.
And then I came across JVoiceXML, an open-source VoiceXML interpreter written in Java. Somebody else had already come up with the idea for the test tool (described here: http://sourceforge.net/apps/mediawiki/jvoicexml/index.php?title=UnitTest ). I simply took that idea and implemented part of it. I started with something really simple: I want to be able to express each scenario in the following format (plain text file):
i.Please enter 1 to go to formB, or 2 to go to formC%1
a.You are in prompt B
i.Please enter 34 followed by # to go to formD, or 35 followed by # to go to formE%34#
a.You are in prompt D
The line that starts with "i." means: an input collection is expected, where the user will be prompted with the question between the first "." and the "%", and will respond by pressing the sequence of digits specified to the right of the "%". The line that starts with "a." means: the user will be prompted with the question to the right of the first ".".
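Parsing a scenario line is then trivial. A minimal sketch:

// One step of a scenario: the expected prompt, plus the DTMF to send
// afterwards (null for "a." lines, where there is nothing to send).
final class ScenarioStep {
    final String expectedPrompt;
    final String dtmfToSend;

    ScenarioStep(String expectedPrompt, String dtmfToSend) {
        this.expectedPrompt = expectedPrompt;
        this.dtmfToSend = dtmfToSend;
    }

    static ScenarioStep parse(String line) {
        String body = line.substring(2); // skip the "i." / "a." marker
        if (line.startsWith("i.")) {
            int sep = body.lastIndexOf('%');
            return new ScenarioStep(body.substring(0, sep), body.substring(sep + 1));
        }
        return new ScenarioStep(body, null);
    }
}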

And, to run the test (scenario), I would only have to type this command in the console:
java JVoiceXmlTest http://mywebserver/index.jsp scenario_01.txt
Where the first parameter (http://mywebserver/index.jsp) is the URL of the landing page of the IVR, and the second parameter (scenario_01.txt) is the name of the text file that contains the scenario.

So, here's what I got so far (video below). Nothing interesting yet :), just some scrolling text in a console. What's more interesting are some findings I made while modifying the JVoiceXml source code.



Ok, now the findings (I hope this can be useful feedback for the JVoiceXml team in their refactoring effort):
  1. JVoiceXml has a dependency on RMI (i.e.: it binds itself to JNDI during startup). That may be fine for the intended use of JVoiceXml (as a networked application). However, for a testing tool like this one, I just want to run it as a standalone component; in particular, I'm only interested in the VoiceXML interpreter core. For now, I simply commented out the lines of code related to JNDI and RMI. I hope future versions will be refactored to let us use the VoiceXML interpreter as a plain Java object.
  2. JVoiceXML has an architecture that allows you to change the "platform factory". A platform factory is basically an object that creates other objects that know how to obtain and process the (spoken) input and (audio) output. Inputs and outputs go through an instance of "Telephony" (it's the channel).

    The good thing is that JVoiceXML comes with a "text platform factory" that takes inputs as text and produces outputs as text. A slight modification was needed to that "text platform factory", because its telephony reads input from / writes output to a server socket. I don't need that; I needed to bypass it and use simple method invocations. So I created a wrapper around it; that's the PruebaPlatformFactory.
Another finding (a lengthy one, and the most challenging): the issue of semantic interpretation of grammars.

Currently, the implementation of its GrammarChecker does not support semantic interpretation, so you cannot associate a (custom) value with the phrases supported by your grammar. This limits the usefulness of JVoiceXML.

In my case, for example, I need to capture a 4-digit PIN. For that, you can actually just take the "utterance" and treat it as the value. Now suppose the user is required to complete the input by pressing # after the fourth digit. For that, you can also simply take the utterance and (in your "business logic") drop the trailing #. You don't need semantic interpretation for such cases.
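i.e., something as dumb as this would do:

static String pinFromUtterance(String utterance) {
    // the digits themselves are the value; just drop the "#" terminator
    return utterance.endsWith("#")
            ? utterance.substring(0, utterance.length() - 1)
            : utterance;
}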

Anyway, semantic interpretation is important; any decent VoiceXML browser must support it (conforming to a specification like SISR). So I set out to solve this, as an exercise. I ran into some difficulties, because I still don't have a good grasp of how the GrammarChecker and (to a lesser degree) the SrgsXmlGrammarParser work, what their principles & logic are. I guess the difficulty stems from the fact that the tree structure in the static model (the SRGS grammar) is transformed into a linear structure when the input is checked against the grammar.

So I just put my modifications in some sensible places in the code where I can intercept the event of "<tag> node is visited". My code simply collects the contents of those tags (each of which is basically a line of JavaScript code) and stitches those lines of JS code together (in some order) when the walk is completed. The code then feeds those lines of JS code to the embedded JS engine (Rhino), and I simply take the return value of the execution of the JS code and assign it as the "semantic interpretation".
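That last step, with Rhino, looks roughly like this (stitchedJs being the lines collected during the walk):

import org.mozilla.javascript.Context;
import org.mozilla.javascript.Scriptable;

public final class SemanticEvaluator {
    // Evaluates the stitched <tag> contents; Rhino returns the value of the
    // last statement, which we take as the semantic interpretation.
    public static String evaluate(String stitchedJs) {
        Context cx = Context.enter();
        try {
            Scriptable scope = cx.initStandardObjects();
            Object result = cx.evaluateString(scope, stitchedJs, "sisr", 1, null);
            return Context.toString(result);
        } finally {
            Context.exit();
        }
    }
}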

Take the following grammar, for instance.
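(A sketch of such a grammar; the rule and tag names here are taken from the generated JavaScript shown below, so the exact details may differ:)

<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         mode="dtmf" root="root_rule" tag-format="semantics/1.0">
  <rule id="digits">
    <one-of>
      <item>3 4 #<tag>MEANING='D'</tag></item>
      <item>3 5 #<tag>MEANING='E'</tag></item>
    </one-of>
  </rule>
  <rule id="root_rule">
    <tag>MEAN1='form'</tag>
    <ruleref uri="#digits"/>
    <tag>MEAN2=digits.MEANING; whereToGo=MEAN1+MEAN2</tag>
  </rule>
</grammar>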
If the input from the user is "34#", then the JS code that the GrammarChecker produces would be:
var digits = new Object();
digits.MEANING='D';
var root_rule = new Object();
root_rule.MEAN2=digits.MEANING;
root_rule.MEAN1='form';
root_rule.whereToGo=root_rule.MEAN1+root_rule.MEAN2;
The return value from the JS execution is always the value of the last line, so effectively we will get the string "formD".

My current fix is kind of hackish; I just did the minimal thing to make the cases listed in http://box.net/files#/files/0/f/0/1/f_945790295 pass. The ideal solution would be one that passes the SISR 1.0 conformance tests. For that I will need to take a closer look at the SrgsXmlGrammarParser and GrammarChecker, and the related classes & interfaces. I feel the need for refactoring in that area: the way GrammarNodes and SrgsNode are (currently) structured doesn't make it easy to navigate through the tree / walk it, which might be required for an efficient implementation of a semantic interpreter. I was also thinking: why not use ANTLR to generate the bulk of the grammar interpreter? I guess that would be easier & produce cleaner code.

Last finding, this time about notifications. I need a way to get notified whenever either of these two things occurs:
  1. The interpreter is waiting for input, so that I can put in code that programmatically feeds the input.
  2. The interpreter is playing a prompt, so that I can put in code that compares the prompt with the one specified in the scenario file.
JVoiceXml employs the Strategy design pattern (see the interface TagStrategy), which I exploit here to achieve the two things mentioned above: simply implement a TagStrategy that wraps around the default strategy, so I can do some interception and fire the notifications from there. Hmm, well, I was lying. I mean, that would be the right way to do it, but for now (because I don't have much time) I simply modified the implementation of the default strategies. Told you it was hackish :).
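For the record, that wrapper idea would look something like this (the interfaces here are simplified stand-ins, not JVoiceXML's real TagStrategy signature):

interface TagStrategy {
    void execute(Object node) throws Exception;
}

interface ScenarioListener {
    void beforeTag(Object node); // e.g. detect "playing a prompt" and compare it
    void afterTag(Object node);  // e.g. detect "waiting for input" and feed DTMF
}

final class NotifyingTagStrategy implements TagStrategy {
    private final TagStrategy delegate;
    private final ScenarioListener listener;

    NotifyingTagStrategy(TagStrategy delegate, ScenarioListener listener) {
        this.delegate = delegate;
        this.listener = listener;
    }

    public void execute(Object node) throws Exception {
        listener.beforeTag(node);
        delegate.execute(node); // the stock behavior stays intact
        listener.afterTag(node);
    }
}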

The modified JVoiceXml is available here (it is based on JVoiceXml 0.7.4.1). Actually, it's of little use to the public right now (it's really yucky!); I will have to modify it again sometime later anyway, once I get a firm understanding of the grammar interpreter, in order to make it SISR 1.0 compliant.

Okay, that's all for now!


Hmm.... Pretziiii....

Just toying around with Prezi....

Microsoft LifeCam Cinema Component for Windows Embedded Standard 2009

I need to get the Microsoft LifeCam Cinema (a high-definition PC camera) working on Windows Embedded Standard 2009.

I'm aware that several people on the internet are also looking for a solution to this. Well, here it is: download the component design file (SLD) here. You can then simply import it into your component repository and use it in your target design (search for "LifeCam" in the component list). Download the zipped folder that contains the LifeCam's driver here.

In your target design, don't forget to add the following components as well: PNP (User-Mode) and DirectShow Capture. I still have to analyze whether they both should go into the SLD file.

I'm writing a "from the ground up" document on building a lean custom image of Windows Embedded Standard 2009 that supports the Microsoft LifeCam Cinema. Stay tuned; I'll have it ready this weekend. Cya! In the meantime, enjoy the music :) -- a composition by Colin McPhee (read this one as well) from his work Tabuh-Tabuhan.



UPDATE: here it is!
Microsoft Lifecam on Windows Embedded Standard 2009

Continuing DBConan Zeta....

Alright,

I changed the implementation of the domain model of DBConan Zeta. In the previous entry I mentioned that the domain objects (Schema, Table, Column, etc.) would make use of QExplicitlySharedDataPointer. Well... this weekend I decided to use QSharedDataPointer instead. More on that later.

Second, on the trick to avoid that "field has incomplete type" error: in the previous entry I said "use a pointer". Now I've decided to get rid of the pointers altogether. They look so un-Qt-ish :) to me. I might be wrong here... but the * signs hurt my eyes :).
Oh no, I'm slowly becoming a Qt zealot. It reminds me of my early days learning Java, 12 years ago....
Source code here: http://www.box.net/shared/8zt5p6u9vz

Ok, on the jump from QExplicitlySharedDataPointer to QSharedDataPointer. Why? At first I thought QExplicitlySharedDataPointer was the way to go, because -- the way I understand it -- it practically gives the effect of "pass-by-reference", which is the default mode in Java / C#.

For those who are not familiar with Qt: QSharedDataPointer means implicit sharing, while QExplicitlySharedDataPointer means explicit sharing.

In practical terms, implicit sharing means: when you modify an object from within a function (that the object gets passed into), the caller of the function -- the one that passed the object in -- will not see the change you made to that object. Surprise (!). The reason: at the point a non-const function is invoked on that object, the sharing is automatically broken; a deep copy is made, which means the object now has its own set of data, so that any changes made by the function won't ripple (inadvertently) to other parts of the program.

Hmm... actually I'm not very happy with the way I explained it :). So please head over to the authoritative source (in the Qt reference) instead.

Ok, so implicit sharing sounds unintuitive (at least for Java / C# programmers). In Java / C#, by default, we expect a change we make to an object to ripple through the program (and we do some tricks when we want otherwise).

Here, with implicit sharing, it's exactly the opposite. So why not use explicit sharing instead? Well, in a few words: I found it hard to achieve consistency with explicit sharing, partly because my domain objects have member fields of types that use implicit sharing (e.g.: QString and QList). Mixing implicit and explicit sharing, in my opinion, is not a good thing; it only leads to confusion.

Ok, let's suppose the Schema object is explicitly shared. Suppose you pass that Schema to a function (xxxFunc). Inside the function you obtain the list of Tables that belong to the schema. Then you append another Table instance to that list. You would expect that change to be visible from the place where you invoked xxxFunc. Chances are: that's not the case, because QList is implicitly shared.
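In code, the surprise looks like this (Schema and Table are simplified stand-ins for my domain classes):

#include <QExplicitlySharedDataPointer>
#include <QList>
#include <QSharedData>
#include <QString>

class Table {
public:
    explicit Table(const QString &name) : m_name(name) {}
private:
    QString m_name;
};

class SchemaData : public QSharedData {
public:
    QList<Table> tables;
};

class Schema {  // explicitly shared: copies point at the same SchemaData
public:
    Schema() : d(new SchemaData) {}
    QList<Table> tables() const { return d->tables; }
private:
    QExplicitlySharedDataPointer<SchemaData> d;
};

void xxxFunc(Schema schema)
{
    // schema still points at the caller's data (explicit sharing)...
    QList<Table> tables = schema.tables();
    // ...but QList is implicitly shared: append() detaches this copy,
    // so the Schema's own list never sees the new Table.
    tables.append(Table("audit_log"));
}

The change is only visible if you write it back through the Schema itself, which is exactly the kind of bookkeeping I didn't want to manage.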

So... I thought: "I don't want to mess with that. Let's stick with one approach only: implicit sharing." The (only) problem -- or I'd rather say "consequence" -- of that decision is that it requires a shift in my way of thinking (in the rest of the program). I will have to keep in mind, every time I pass an object to a function: it's a pass-by-value, it's a pass-by-value.

That may not be as bad as it sounds. I think it has something to do with immutability. I remember Joshua Bloch, in his book Effective Java, saying "favor immutability". Maybe this pass-by-value will push us to lean more toward that principle. I don't know... we'll see. I'll report back on that later.

In the meantime you can download the C++ source code. Compare it with dbconan_schema.py below (complete Python source: here). You'll see there are many checks -- available in the Python version -- that are currently not implemented in the C++ port. I'll get to that later. Ciao!
UPDATE: I also decided to change the target database from Oracle to PostgreSQL. The reason: I don't have time to (re)compile Qt on my Ubuntu Linux box in order to get Qt's Oracle SQL driver. The PostgreSQL SQL driver, on the other hand, comes out of the box (when we install the Qt SDK from the repository).

In the Python+Oracle version, the program makes heavy use of the Oracle Data Dictionary (that's how it learns the structure of the database).

The equivalent in PostgreSQL is the "information schema". Here are a few views in the information_schema that I think will be useful for the program:

select * from information_schema.tables where table_schema = 'public' -- public is the default schema name
select * from information_schema.columns where table_name = 'table_a'
select * from information_schema.constraint_column_usage
select * from information_schema.key_column_usage
select * from information_schema.referential_constraints
select * from information_schema.constraint_table_usage
select * from information_schema.table_constraints
dbconan_schema.py