Wibbly Wobbly WBXML

Ok, so I couldn’t think of a title that’s as wilfully obscure as the usual ones.  Whatever.

For reasons of Commerce, I need to be able to generate WBXML messages within the guts of the mighty Python/Zope engine that powers the Mobile Phone Project[0].  What, I hear you ask, the blinking flip is WBXML?  Well, if you don’t know, you probably want to keep it that way, but you did ask.

WBXML is a binary encoding of XML.  XML is, of course, a textual encoding of data… some of which may originally have been binary.  So it’s sort of an extra level of complication added to something that’s already complicated, but hey, that’s what geeks do, isn’t it?  The reason that it’s a binary encoding is that XML is bulky.  Most of the time that bulk doesn’t matter that much; I’ll trade bandwidth, memory or CPU time for explicitness any day of the week.  But if you’re trying to pack XML over a slow, laggy, prone-to-being-interrupted-by-trees-or-birds wireless link to a phone, bulk is bad.  It’s even worse if you’re trying to pack an XML Service Indication (essentially, a pushed URL) into the tiny size of a single SMS message.  Hence the binaryness.

WBXML isn’t anything as simple as, say, a gzipped version of the XML stream.  Instead, it’s a carefully rigorous specification of how individual single-byte values map to either XML or text strings.  For example, the XML <SI> maps to the binary value 0x05, and <INDICATION> maps to 0x06.  But it’s clever; if the HREF attribute of the INDICATION starts with “http://”, then the whole attribute-starting-http maps to 0x0C.  If the HREF starts with “http://www.” then it’s mapped to 0x0D, saving another four bytes, and so on.  The more common the string, the more likely it is to have a fixed mapping.  There’s also a neat string-table option; commonly used strings can be folded into single-byte offsets into a string table (in effect, any repeated string longer than three bytes is worth string-table-izing).
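To make the prefix trick concrete, here’s a minimal sketch in C of how an encoder might pick the attribute-start token for an HREF.  The 0x0C and 0x0D values are the ones mentioned above; the bare-href (0x0B) and https (0x0E, 0x0F) values are from my reading of the SI token table and should be checked against the spec before you rely on them.  The struct and function names are made up for illustration and are not part of libwbxml.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a (hypothetical) lookup table: a URL prefix and the
   single byte that stands in for href="<prefix>". */
struct href_token {
    const char *prefix;
    unsigned char token;
};

/* Longest prefixes first, so the first match is also the best match.
   0x0C and 0x0D come from the text above; 0x0B, 0x0E and 0x0F are
   assumed values, to be verified against the SI specification. */
static const struct href_token HREF_TOKENS[] = {
    { "https://www.", 0x0F },
    { "http://www.",  0x0D },
    { "https://",     0x0E },
    { "http://",      0x0C },
    { "",             0x0B }   /* plain href= with nothing folded in */
};

/* Return the token for this URL and store in *consumed the number of
   leading bytes of the URL that the token already accounts for. */
static unsigned char href_start_token(const char *url, size_t *consumed)
{
    size_t i;

    assert(url != NULL && consumed != NULL);
    for (i = 0; i < sizeof HREF_TOKENS / sizeof HREF_TOKENS[0]; i++) {
        size_t n = strlen(HREF_TOKENS[i].prefix);
        if (strncmp(url, HREF_TOKENS[i].prefix, n) == 0) {
            *consumed = n;
            return HREF_TOKENS[i].token;
        }
    }
    /* Not reached: the empty prefix in the last row always matches. */
    *consumed = 0;
    return 0x0B;
}
```

Calling href_start_token("http://www.example.com/", &used) returns 0x0D and sets used to 11, so only “example.com/” still has to be written out as an inline string.  A real encoder does far more than this (tag tokens, string tables, terminators), which is exactly why I’d rather lean on a library.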

This is non-trivial stuff to knock up in a hurry, so it’s just as well that there’s the libwbxml open-source library to handle it all.  That library, however, is in C, and I’m working in Python.  There appears to be no published Python binding to libwbxml, so it was time to dust off my ancient experience of #include <Python.h> and get to it.

Here’s the C code that allows a Python call to libwbxml’s xml2wbxml function:

static PyObject *wbxml_xml2wbxml(PyObject *self, PyObject *args) {
    /* A WB_UTINY is an unsigned char, so we can allow conversion directly
       from the Python string */
    WB_UTINY *xml;
    WB_UTINY *wbxml = NULL;     /* NULL until the converter hands back a buffer */
    WB_ULONG wbxmllen = 0;
    int status;
    WBXMLConvXML2WBXMLParams params;
    WB_UTINY *errstr;
    PyObject *result;

    /* Verify and read a string arg (xml) */
    if (!PyArg_ParseTuple(args, "s", &xml))
        return NULL;

    /* Pass that to libwbxml2 */
    params.keep_ignorable_ws = FALSE;
    params.use_strtbl = TRUE;
    params.wbxml_version = WBXML_VERSION_11;
    status = wbxml_conv_xml2wbxml(xml, &wbxml, &wbxmllen, &params);
    if (status == WBXML_OK) {
        errstr = NULL;  /* we return None to mean no error */
    } else {
        errstr = (WB_UTINY *)wbxml_errors_string(status);
    }

    /* Build the return tuple of wbxml, error.
       The wbxml string is binary, so we need to convert it with a z#
       rather than a z.  z# reads its length as an int (unless
       PY_SSIZE_T_CLEAN is defined), hence the cast.  A NULL wbxml
       comes back to Python as None. */
    result = Py_BuildValue("(z#z)", wbxml, (int)wbxmllen, errstr);

    /* Free the buffer that came back from the converter (thanks, Bob!) */
    if (wbxml != NULL)
        wbxml_free((void *)wbxml);

    return result;
}

For details, I recommend you to the excellent on-line reference to the Python/C API (link goes directly to the section that explains conversion of values between C and Python).

[edit – removed dead link to the source for this… I don’t have it anymore, sorry!]

Right now, I don’t need to convert WBXML back to XML, so there’s no link to the reverse wbxml2xml function.  Like they say, open-source software scratches one’s own itches.  And if you did want to send the resulting SI in one or more SMS messages to a mobile, you’d also need to wrap the binary WBXML in a WSP (Wireless Session Protocol) header, which is another topic entirely[1].

[0] I suppose I should start calling this the Mobile Phone Content Project, since that’s a more accurate name.  Maybe when the tv adverts start…
[1] I do have working code to do this, so if anyone needs to wrangle WSP, feel free to drop me an email (or ask via blog comment) and I’ll share what I know.

He’s More Machine Now Than Man

Or, combining Exchange and PostFix to form a hideous cyborg being.

This useful article popped up on OSNews about avoiding the need to pay mucho dinero to upgrade ageing Exchange 5 setups by using PostFix (or some other secure MTA) to insulate Exchange from the wild and woolly Internet.

At our main office we have an old NT server (SBS to be exact) running Exchange 5.5.  It came as part of an original IT installation I inherited when I joined, together with a web-proxy-only Net connection over ISDN for which the installers charged an extortionate fee per month (as well as ISDN dialup charges).  One day I’ll have a little rant about how unsurprising it is that many small businesses don’t trust IT companies when ripoffs like that are so common… but not today.  Anyway, the problem I had was that the users were thoroughly wedded to Outlook, and when we switched to a sensible Net connection I had no intention of having the NT server and Exchange directly connected to it.  My previous job had included a huge mix of NT and Unix servers and I’d had the unpleasant experience of watching the Microsoft kit fall before the onslaught of vulnerabilities like sandcastles under an incoming tide.  I wanted something reliable and robust between the Net and NT.

The first job was to liberate an old machine and put RedHat on it (this was so long ago that RedHat 6.1 was current).  Next, Squid proxying to make the most of the (initially limited) bandwidth.  Then PostFix to deal with all incoming and outgoing email.  I used the redirect facility in ipchains to force all outgoing SMTP connections to port 25 to be rerouted through PostFix (thus giving me a way to at least track any trojans with built-in MTAs).  The NT server was moved behind this firewall system onto the LAN and Exchange was set to use PostFix for all outgoing mail.  All incoming mail was also routed to Exchange (after spam and virus filtering).  The users all keep their Outlook mailboxes and shared calendars.  All is well.  I’d migrate everyone to IMAP or even POP mail access, but frankly there’s no benefit to them and a lot of work for me.  So Exchange can stay, at version 5.5.

One of the many criticisms of MS operating systems is how often they need to be rebooted, but after this migration the NT server has actually been extremely stable.  It’s been rebooted after the odd IE update[0], but otherwise it’s run alongside a brace of Linux machines quite happily.  If only I could manage it by command line instead of VNC-over-VPN, I’d be even happier.

[0] Worth pointing out that it’s never used for web browsing, except to download the occasional update from Microsoft.  That in itself reduces the risk of exposure considerably.