Wibbly Wobbly WBXML

Ok, so I couldn’t think of a title that’s as wilfully obscure as the usual ones.  Whatever.

For reasons of Commerce, I need to be able to generate WBXML messages within the guts of the mighty Python/Zope engine that powers the Mobile Phone Project[0].  What, I hear you ask, the blinking flip is WBXML?  Well, if you don’t know, you probably want to keep it that way, but you did ask.

WBXML is a binary encoding of XML.  XML is, of course, a textual encoding of data… some of which may originally have been binary.  So it’s sort of an extra level of complication added to something that’s already complicated, but hey, that’s what geeks do, isn’t it?  The reason that it’s a binary encoding is that XML is bulky.  Most of the time that bulk doesn’t matter that much; I’ll trade bandwidth, memory or CPU time for explicitness any day of the week.  But if you’re trying to pack XML over a slow, laggy, prone-to-being-interrupted-by-trees-or-birds wireless link to a phone, bulk is bad.  It’s even worse if you’re trying to pack an XML Service Indication (essentially, a pushed URL) into the tiny size of a single SMS message.  Hence the binaryness.

WBXML isn’t anything as simple as, say, a gzipped version of the XML stream.  Instead, it’s a carefully rigorous specification of how individual single byte values map to either XML or text strings.  For example, the XML <SI> maps to the binary value 0x05, and <INDICATION> maps to 0x06.  But it’s clever; if the HREF attribute of the INDICATION starts with “http://&#8221;, then the whole attribute-starting-http maps to 0x0C.  If the HREF starts with “http://www&#8221; then it’s mapped to 0x0D, saving another three bytes, and so on.  The more common the string, the more likely it is to have a fixed mapping.  There’s also a neat string-table option; commonly used string can be folded into single-byte offsets into a string table (in effect, any repeated string longer than three bytes is worth string-table-izing).

This is non-trivial stuff to knock up in a hurry, so it’s just as well that there’s the libwbxml open-source library to handle it all.  That library, however, is in C, and I’m working in Python.  There appear to be no published Python binding to libwbxml, so it was time to dust off my ancient experience of #include <Python.h> and get to it.

Here’s the C code that allows a Python call to libwbxml’s xml2wbxml function:

static PyObject *wbxml_xml2wbxml(PyObject *self, PyObject *args) {
/*A WB_UTINY is an unsigned char, so we can allow conversion directly from the Python string*/
WB_UTINY *xml;
WB_UTINY *wbxml;
WB_ULONG wbxmllen;
int status;
WBXMLConvXML2WBXMLParams    params;
WB_UTINY *errstr;
PyObject *result;
 
    /* Verify and read a string arg (xml) */
    if (!PyArg_ParseTuple(args, "s", &xml))
        return NULL;
 
    /* Pass that to libwbxml2 */
    params.keep_ignorable_ws = FALSE;
    params.use_strtbl = TRUE;
    params.wbxml_version = WBXML_VERSION_11;
    status = wbxml_conv_xml2wbxml(xml,&wbxml,&wbxmllen,&params);
    if(status == WBXML_OK) {
        errstr = NULL;  /*we return None to mean no error*/
    } else {
        errstr = (WB_UTINY *)wbxml_errors_string(status);
    }
 
    /* Build the return tuple of wbxml, error.
    The wbxml string is binary, so we need to convert it with a z#
    rather than a z.*/
 
    result = Py_BuildValue("(z#z)",wbxml,wbxmllen,errstr);
 
    /* Free the buffer that came back from the converter (thanks, Bob!) */
    wbxml_free((void *)wbxml);
 
    return result;
}

For details, I recommend you to the excellent on-line reference to the Python/C API (link goes directly to the section that explains conversion of values between C and Python).

[edit – removed dead link to the source for this… I don’t have it anymore, sorry!]

Right now, I don’t need to convert WBXML back to XML, so there’s no link to the reverse wbxml2xml function.  Like they say, open-source software scratches one’s own itches.  And if you did want to send the resulting SI in one or more SMS messages to a mobile, you’d also need to wrap the binary WBXML in a WSP (Wireless Session Protocol) header, which is another topic entirely[1].

[0] I suppose I should start calling this the Mobile Phone Content Project, since that’s a more accurate name.  Maybe when the tv adverts start…
[1] I do have working code to do this, so if anyone needs to wrangle WSP, feel free to drop me an email (or ask via blog comment) and I’ll share what I know.

Advertisements

6 thoughts on “Wibbly Wobbly WBXML

    • Gah… I’m assuming you spotted that I didn’t free the buffer coming back from xml2wbxml. Pah. I spent so long looking up the exact usage of Py_INCREF et al that I plain forgot about the boring old C semantics.

      Of course, maybe you spotted something else entirely ๐Ÿ™‚

      cheers
      b

      • Here’s the C code that allows a Python call to libwbxml’s xml2wbxml function: static PyObject *wbxml_xml2wbxml(PyObject *self, PyObject *args) { /*A WB_UTINY is an unsigned char, so we can allow conversion directly from the Python string*/ WB_UTINY *xml; WB_UTINY *wbxml; WB_ULONG wbxmllen; int status; WBXMLConvXML2WBXMLParams params; WB_UTINY *errstr; PyObject *result; /* Verify and read a string arg (xml) */ if (.

  1. Re: src code link invaid

    Unfortunately not. The question of ownership of some of that code is complex and I don’t want to go near it. However, the example source code in the post was developed on my time first and belongs to me (I’m careful about that sort of thing), so feel free to use it. All you really need is that snippet of C, the ability to compile and link it, and libwbxml.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s