You Have Mail

I’ve spent much of the last week or so in an office where the Internet is accessible only through the tightest of firewalls.  There’s no complaint implied here; the value of some of the IP in the building might exceed the value of the entire company, so paranoia is amply justified, but it does mean that I can’t fetch my usual POP3 email.  This gave me the perfect excuse to see how GMail is going.

I should really refer to it as GoogleMail since I’m in the UK where, amusingly enough, some other outfit has been using the name “GMail” for a while.  I do enjoy it when Big American Corporations forget that the rest of the world exists when they’re looking at trademarks and patents.  Any road up, whatever you call it, a couple of IMs later I had an invitation to sign up and a shiny new address.  Many others have written with far more skill and judgement than I on the subject of how G[oogle]Mail does what it does, so I’ll refrain from cluttering up the RSS feeds with yet more.  What struck me as worth commenting on was the contrast with other web mail interfaces… specifically Exchange’s.

For reasons of not-getting-around-to-it, I have only web access to an email account at the place I’m working (it’s my laptop, it’s not in their domain, etc, etc).  This gives me the sort of web interface that takes one back to the heady, pre-Ajax days of five years ago, when Hotmail was king.  I mean: it’s awful.  Pages refresh for any change, and looking up any data whilst in the middle of writing an email involves an endless dance of Open Link In New Tab (this is all in Firefox, but IE doesn’t add anything).  Google, in contrast, have really worked hard and come up with a web interface that’s arguably better than some PC-based mail clients.

I think it’s a question of attitude.  Google have jumped headfirst into the whole Ajax & web thing (with the exception of Google Earth), so for them the web interface is the primary way of doing anything.  In the eyes of any company that sees the world in terms of PC-based applications, by contrast, the web interface is a poor relation.

Plus Ca Change

The lack of updates (for which many will have found their RSS feeds to be the cleaner and more informative) is for a good reason; I’m just emerging from a period of transition from one job to another.  During such a time, Those Who Blog can find it appropriate to stop saying anything new in public lest it affect the impression that potential employers have of them.  Thus the silence.
Of course, it’s daft to start worrying about what your blog says when you move jobs; you should worry when you make any entry.  In fact, you should worry just before you click “Post” or the equivalent and consider: What will that entry say about me in five years’ time?  Google caching and the Wayback Machine, amongst others, mean that every youthful indiscretion that one blogs is, potentially, there forever (and that, as Prince said, is a mighty long time).
I don’t think there’s anything on here that’s counted against me, though one contact did ask me why there was so much about programming when I don’t really market myself as a programmer any more.  My response was to point out that one should never lose touch with the basic skills of one’s profession, whatever one’s level in it.  And that there’s no good reason I shouldn’t mess around with bits of Python and Java if I want to, even if I’m now moving on to do a job even more removed from the basic bit-shuffling that we laughingly term software engineering.  For better or worse, whether I’m presenting a proposal to a board or trying to define a mobile product strategy, I am at heart still someone who sees systems in terms of lines of code; I don’t think that’ll ever change and I wouldn’t want it to.  Vive, if you will, la provenance.

Absolument Disparu, Like Mother’s Mink

The title today being a quotation from Nigel Molesworth’s serious and worthwhile autobiographies, which you either know or you don’t.

The guys over at Penny Arcade experience the Lesson About Backups that hits us all, eventually.  I’ve long believed that there’s an axiom of computing (of the practice, not the theory) that states:

You will only truly understand how important backups are when you realise that you don’t have one, and that because you don’t, you have lost something now irretrievable.

At that moment, one becomes enlightened, though it is probably not nirvana that you reach.  However, experience, it is said, is cheap at any price and doubly so when someone else is paying.  So I took a quarter of an hour to quickly run through the many and varied backup systems here and at the office, verifying that they’re working as they should.  Conveniently enough, there’s a gap of ninety miles between the two; sufficiently large that any natural disaster capable of affecting both would leave me with other priorities than recovering backups, like basic survival in a post-holocaust wilderness, for example.  Thus backups run between the two sites, overnight, replicating all critical data.  That’s in addition to the suite of tape drives in the office, and the Linux mirrored array that holds duplicates of the family photos and digital documents from the working PCs here.

I don’t think I’ve ever heard a genuine geek say that a backup system is too redundant.  At least, nobody who’s been through their own Lesson About Backups.

Wibbly Wobbly WBXML

Ok, so I couldn’t think of a title that’s as wilfully obscure as the usual ones.  Whatever.

For reasons of Commerce, I need to be able to generate WBXML messages within the guts of the mighty Python/Zope engine that powers the Mobile Phone Project[0].  What, I hear you ask, the blinking flip is WBXML?  Well, if you don’t know, you probably want to keep it that way, but you did ask.

WBXML is a binary encoding of XML.  XML is, of course, a textual encoding of data… some of which may originally have been binary.  So it’s sort of an extra level of complication added to something that’s already complicated, but hey, that’s what geeks do, isn’t it?  The reason that it’s a binary encoding is that XML is bulky.  Most of the time that bulk doesn’t matter that much; I’ll trade bandwidth, memory or CPU time for explicitness any day of the week.  But if you’re trying to pack XML over a slow, laggy, prone-to-being-interrupted-by-trees-or-birds wireless link to a phone, bulk is bad.  It’s even worse if you’re trying to pack an XML Service Indication (essentially, a pushed URL) into the tiny size of a single SMS message.  Hence the binaryness.

WBXML isn’t anything as simple as, say, a gzipped version of the XML stream.  Instead, it’s a rigorous specification of how individual single-byte values map to either XML or text strings.  For example, the XML <SI> maps to the binary value 0x05, and <INDICATION> maps to 0x06.  But it’s clever; if the HREF attribute of the INDICATION starts with “http://”, then the whole attribute-starting-http maps to 0x0C.  If the HREF starts with “http://www.” then it’s mapped to 0x0D, saving another four bytes, and so on.  The more common the string, the more likely it is to have a fixed mapping.  There’s also a neat string-table option; commonly used strings can be folded into single-byte offsets into a string table (in effect, any repeated string longer than three bytes is worth string-table-izing).

This is non-trivial stuff to knock up in a hurry, so it’s just as well that there’s the libwbxml open-source library to handle it all.  That library, however, is in C, and I’m working in Python.  There appears to be no published Python binding to libwbxml, so it was time to dust off my ancient experience of #include <Python.h> and get to it.

Here’s the C code that allows a Python call to libwbxml’s xml2wbxml function:

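/* Python.h must be included first; the libwbxml header name/path
   varies between library versions, so adjust to suit your install */
#include <Python.h>
#include <wbxml.h>
#include <string.h>

static PyObject *wbxml_xml2wbxml(PyObject *self, PyObject *args) {
    /* A WB_UTINY is an unsigned char, so we can allow conversion directly from the Python string */
    WB_UTINY *xml;
    WB_UTINY *wbxml = NULL;
    WB_ULONG wbxmllen = 0;
    int status;
    WBXMLConvXML2WBXMLParams params;
    const char *errstr;
    PyObject *result;

    /* Verify and read a string arg (xml) */
    if (!PyArg_ParseTuple(args, "s", (char **)&xml))
        return NULL;

    /* Pass that to libwbxml2; zero the params first so that any
       fields we don't set explicitly aren't left uninitialised */
    memset(&params, 0, sizeof(params));
    params.keep_ignorable_ws = FALSE;
    params.use_strtbl = TRUE;
    params.wbxml_version = WBXML_VERSION_11;
    status = wbxml_conv_xml2wbxml(xml, &wbxml, &wbxmllen, &params);
    if (status == WBXML_OK) {
        errstr = NULL;  /* we return None to mean no error */
    } else {
        errstr = (const char *)wbxml_errors_string(status);
    }

    /* Build the return tuple of wbxml, error.
       The wbxml string is binary, so we need to convert it with a z#
       rather than a z.  Without PY_SSIZE_T_CLEAN (Python 2), z# takes
       an int length, so cast the unsigned long explicitly.  (On Python 3
       you'd want y#, to get bytes, plus PY_SSIZE_T_CLEAN and Py_ssize_t.)
       If the conversion failed, wbxml is NULL and z# yields None. */
    result = Py_BuildValue("(z#z)", (char *)wbxml, (int)wbxmllen, errstr);

    /* Free the buffer that came back from the converter (thanks, Bob!) */
    if (wbxml != NULL)
        wbxml_free((void *)wbxml);
    return result;
}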

For details, I’d point you to the excellent on-line reference to the Python/C API (link goes directly to the section that explains conversion of values between C and Python).

[edit – removed dead link to the source for this… I don’t have it anymore, sorry!]

Right now, I don’t need to convert WBXML back to XML, so there’s no binding for the reverse wbxml2xml function.  Like they say, open-source software scratches one’s own itches.  And if you did want to send the resulting SI in one or more SMS messages to a mobile, you’d also need to wrap the binary WBXML in a WSP (Wireless Session Protocol) header, which is another topic entirely[1].

[0] I suppose I should start calling this the Mobile Phone Content Project, since that’s a more accurate name.  Maybe when the tv adverts start…
[1] I do have working code to do this, so if anyone needs to wrangle WSP, feel free to drop me an email (or ask via blog comment) and I’ll share what I know.

He’s More Machine Now Than Man

Or, combining Exchange and PostFix to form a hideous cyborg being.

This useful article popped up on OSNews about avoiding the need to pay mucho dinero to upgrade ageing Exchange 5 setups by using PostFix (or some other secure MTA) to insulate Exchange from the wild and woolly Internet.

At our main office we have an old NT server (SBS to be exact) running Exchange 5.5.  It came as part of an original IT installation I inherited when I joined, together with a web-proxy-only Net connection over ISDN for which the installers charged an extortionate fee per month (as well as ISDN dialup charges).  One day I’ll have a little rant about how unsurprising it is that many small businesses don’t trust IT companies when ripoffs like that are so common… but not today.  Anyway, the problem I had was that the users were thoroughly wedded to Outlook, but when we switched to a sensible Net connection I had no intention of having the NT server and Exchange directly connected to the Internet.  My previous job had included a huge mix of NT and Unix servers and I’d had the unpleasant experience of watching the Microsoft kit fall before the onslaught of vulnerabilities like sandcastles under an incoming tide.  I wanted something reliable and robust between the Net and NT.

The first job was to liberate an old machine and put RedHat on it (this was so long ago that RedHat 6.1 was current).  Next, Squid proxying to make the most of the (initially limited) bandwidth.  Then PostFix to deal with all incoming and outgoing email.  I used the redirect facility in ipchains to force all outgoing SMTP connections to port 25 to be rerouted through PostFix (thus giving me a way to at least track any trojans with built-in MTAs).  The NT server was moved behind this firewall system onto the LAN and Exchange was set to use Postfix for all outgoing mail.  All incoming mail was also routed to Exchange (after spam and virus filtering).  The users all keep their Outlook mailboxes and shared calendars.  All is well.  I’d migrate everyone to IMAP or even POP mail access, but frankly there’s no benefit to them and a lot of work for me.  So Exchange can stay, at version 5.5.

One of the many criticisms of MS operating systems is how often they need to be rebooted, but after this migration the NT server has actually been extremely stable.  It’s been rebooted after the odd IE update[0], but otherwise it’s run alongside a brace of Linux machines quite happily.  If only I could manage it by command line instead of VNC-over-VPN, I’d be even happier.

[0] Worth pointing out that it’s never used for web browsing, except to download the occasional update from Microsoft.  That in itself reduces the risk of exposure considerably.

Have It Your Way

…As Burger King allegedly say.  Pondering customizability.

The inscrutable Raymond Chen has a blog called The Old New Thing, in which he (as a Microsoft person) posts on many detailed and interesting topics related to the internals of Windows.  Even if you’re a Unixite so fervent that your car has a command line, it’s worth reading to see why certain things in Windows are the way that they are.  He has a recent entry on Why can’t the default drag/drop behavior be changed? (in Explorer) which highlights something I was thinking about recently; the twin and somewhat opposed worldviews regarding interfaces.

When, in the past, I’ve run Linux desktops[0], I’ve spent happy hours playing with the myriad subtle and singular configuration options that let me set it up just how I like it, with all my favourite key and mouse combinations spread across all the applications that I need to use.  Windows, naturalmente, doesn’t let me do that.  It works how it works.  The rationale for this, according to Raymond, is:

[customization] removes some of the predictability from the user interface. One of the benefits of a common user interface is that once you learn it, you can apply the rules generally. But if each user could customize how drag/drop works, then the knowledge you developed with drag/drop wouldn’t transfer to other people’s machines.


Infinite customizability also means that you can’t just sit down in front of somebody’s machine and start using it. You first have to learn how they customized their menus, button clicks, default drag effects, and keyboard macros.

I’m not sure where I’d stand on this point.  I like things the way I like them, but I also hate sitting down at an unfamiliar system and not having things ready-to-hand.  Perhaps it’s a question of ownership of the machine in question…

[0] No axe to grind here, it’s just that all my Linux systems are servers these days.

Misidentification cards

Things spread via links.  Thus, at the risk of annoying the places where this blog is syndicated, may I offer any other UK citizens who have concerns about ID cards this link to a pledge to refuse to register for them.
I now return you to your regularly scheduled programmes of useful Python/Java-related information.

Those Sodding Puzzles

The perceptive Tom Hume posts about getting puzzles onto mobile phones.  Our own mobile project(s) are beginning to ramp up heavily (we’re even recruiting, so if you know a ZPT-literate web designer who might want a job, let me know), but not in the direction of puzzles.  I think it’ll be interesting to see how someone approaches the problem of making a phone interface do anything as nice as a piece of paper with a puzzle on.

Like many other Brits, I’ve been caught up to some extent in the frenzy of Sudoku.  And being, at the very core of my head, a programmer, I’ve been pondering algorithms for solving them.  I’m not about to post any Python or Java to help here (not yet, at any rate), but what might be useful to others is this worksheet.  The idea’s very simple; the bottom sheet has all the possible numbers for every square shown.  Cross them out with a pen/pencil as they become evidently impossible.  As Sherlock Holmes said, when you eliminate the impossible, whatever’s left must be true.  Although he was a sociopathic junkie, really.  And didn’t do sudoku.
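The worksheet is pencil-mark candidate elimination, and it translates naturally into code.  A minimal C sketch, assuming a 9x9 grid of ints with 0 for a blank square: each square’s candidates are a 9-bit mask, digits already placed in the same row, column or 3x3 box are crossed out, and any square left with exactly one candidate gets filled in.  The function names are hypothetical, and this applies only that simplest rule (a “naked single”), so harder puzzles will stall with blanks remaining.

```c
#include <stdint.h>

typedef uint16_t cands_t;   /* bit (d-1) set means digit d is still possible */
#define ALL_DIGITS 0x1FFu

static int count_bits(cands_t m) {
    int n = 0;
    while (m) { n += m & 1u; m >>= 1; }
    return n;
}

/* Only called on masks with exactly one bit set. */
static int only_digit(cands_t m) {
    int d = 1;
    while (!(m & 1u)) { m >>= 1; d++; }
    return d;
}

/* Fill in the worksheet: every blank starts with all nine digits, then
   each placed digit is crossed out of its row, column and box. */
static void build_worksheet(int grid[9][9], cands_t cand[9][9]) {
    for (int r = 0; r < 9; r++)
        for (int c = 0; c < 9; c++)
            cand[r][c] = grid[r][c] ? (cands_t)(1u << (grid[r][c] - 1))
                                    : (cands_t)ALL_DIGITS;

    for (int r = 0; r < 9; r++)
        for (int c = 0; c < 9; c++) {
            if (grid[r][c] == 0) continue;
            cands_t bit = (cands_t)(1u << (grid[r][c] - 1));
            int br = r / 3 * 3, bc = c / 3 * 3;
            for (int i = 0; i < 9; i++) {
                int rr = br + i / 3, cc = bc + i % 3;
                if (grid[r][i] == 0) cand[r][i] &= (cands_t)~bit;    /* same row */
                if (grid[i][c] == 0) cand[i][c] &= (cands_t)~bit;    /* same column */
                if (grid[rr][cc] == 0) cand[rr][cc] &= (cands_t)~bit; /* same box */
            }
        }
}

/* Repeatedly fill squares that have a single candidate left.
   Returns how many squares were filled; stops when a pass finds none. */
int fill_singles(int grid[9][9]) {
    cands_t cand[9][9];
    int filled = 0, progress = 1;
    while (progress) {
        progress = 0;
        build_worksheet(grid, cand);
        for (int r = 0; r < 9; r++)
            for (int c = 0; c < 9; c++)
                if (grid[r][c] == 0 && count_bits(cand[r][c]) == 1) {
                    grid[r][c] = only_digit(cand[r][c]);
                    filled++;
                    progress = 1;
                }
    }
    return filled;
}
```

For a valid puzzle, filling several singles in one pass is safe: each was forced by the grid as it stood, so two of them can’t claim the same digit in one row, column or box.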

A Matter Of Questions

The PlayStation Project revealed.  Another case study of Python and open-source tools on another Interesting Problem.

A Little Bit Of Background
E3 thunders towards us on the calendar[0] and mortal development teams quail before the onrushing juggernauts of deadlines.  Apart from our team, for whom the E3 deadline was a while back.  Now we have a whole new set of amusing comedy deadlines for beta tests, but that’s not important right now.  E3 is a big deal for us because it’s where the PlayStation Project is now exposed to the nerveless, searching gaze of the Eye of Sauron… no, wait, I mean The Games Industry Press.  Same thing, smaller spiked helmets, as I understand it.

I’m a regular reader of Penny Arcade.  Not that I’m really a player in any sense these days; my favourite game is still Homeworld2, which is pretty much gathering dust on the shelves of most gamers’ rooms.  But I like Tycho’s writing, and Gabe’s style of art.  A while ago, they posted some comments from Geoffrey Zatkin, on the subject of new ideas for games.  Here’s a quote:

At PAX this year I was a judge for their “pitch your idea for a game” sit-in. I got to break a lot of hearts by telling the audience a very sad fact – that in my 8+ years as a professional game designer, not once has any boss of mine ever asked me for an idea for a new game. Not once. Again, unless you own the company, you get assigned a project (or jump ship to another company working on a game that sounds interesting). Sure, I’ve helped flesh out any number of games from concept to fully realized design. And that’s the hard part. Coming up with a good idea for a game is like coming up with a good idea for a novel. Everybody does that. But very few people have the discipline to sit down and write the book. The ideas are easy – the execution of the idea is the hard part.

But that’s what we’ve done; we came up with a new idea, something that genuinely hasn’t been done before[3], and we’ve done a massive great chunk of the work of executing that idea.  And it’s been interesting, in the best senses of the word.

It isn’t a first-person shooter.  It’s not a racing game.  No busty women swing over pits and solve puzzles.  It’s something a bit new, in a number of ways.  The game’s called Buzz![1], and it is, at heart, a music quiz.  There are nearly 1200 different clips of tracks involved, with a total of over 47,000 questions, in ten languages.  Our development partners do the clever 3D interface work (and a damn fine job they’ve done of it as well).  It’s been our job to gather, generate, edit, collate, audit, process and provide all of this data on which the quiz is based.  So, since this blog is (ostensibly) chiefly about techie things, I thought it might be interesting to explain the set of open-source software and tools that we’ve used to manage all of this.

Of Databases And Babel
Let’s start with the database engine (since that’s the core of it all).  The requirements that I drew up[2] were pretty much these:

  1. Open-source database.  There is no religious reason for this.  It’s a budget thing.
  2. That interfaces well with Python.  More on this below.
  3. And that supports Unicode properly.  That is, it has tools that support input and output of international character sets, and it does Unicode via the Python interface.
  4. Supports transactions.
  5. Accessible from Windows tools (ODBC, Python)
  6. Runs on Linux.  All our serious development servers run Fedora.

Given the above, it was pretty much a question of MySQL or PostgreSQL.  I chose MySQL for two reasons.  First; I’d used it many times before.  Second; the various Python/PostgreSQL packages all seemed lacking in one way or another, especially when it got down to sorting out Unicode.  It may be that there are neat solutions to any/all of the issues I found, but there seemed to be no great advantage to swapping a database I was familiar with for another I wasn’t.  MySQL (as of version 4.1.1) also has excellent Unicode support and speaks unto Python via the truly excellent MySQLdb package, courtesy of Andy Dustman, to whom I shall one day build a small shrine.

None of this would be much use if the general query tools for MySQL (like the Query Browser) didn’t work properly with international strings on Windows.  All our desktop machines run XP, and it’s critical that display, edit and copy/paste of strings in any language work.  Well, as long as you choose the right tools, they do, but that’s the nature of Unicode work for you.

The Language That Gets Everywhere
Why, you might ask, purely because it would help me move on to the next point, do you need a database that links with Python?  Thanks for asking.  Early on in the whole development process, I gave a lot of thought to the jobs that we’d have to do.  There were some guiding principles I followed.

  1. Whatever we think we’re going to be doing, it probably won’t happen the way we think it will.  Columns will change meaning.  Entities will be discarded and new ones appear.  Be flexible.
  2. We’re going to be gathering data from a zillion different sources.  Any data we capture is going to be dirty and need auditing and cleaning.
  3. There’s going to be so much data that any manual task applied to the thousands of records we’ll have might mean that we’ll run out of time.  Automate.
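As an illustration of the kind of automated audit those principles call for, here’s a hypothetical C check for one common sort of dirt in multilingual data: text that isn’t valid UTF-8 (say, a Latin-1 string pasted in from a spreadsheet).  This is my sketch rather than one of the project’s scripts, which were Python; the rules follow RFC 3629, rejecting overlong forms, UTF-16 surrogates and anything above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical audit helper: true if buf[0..len) is well-formed UTF-8. */
bool is_valid_utf8(const unsigned char *buf, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t need;       /* number of continuation bytes expected */
        unsigned int cp;   /* code point being decoded */

        if (b < 0x80) { i++; continue; }                        /* plain ASCII */
        else if (b >= 0xC2 && b <= 0xDF) { need = 1; cp = b & 0x1F; }
        else if (b >= 0xE0 && b <= 0xEF) { need = 2; cp = b & 0x0F; }
        else if (b >= 0xF0 && b <= 0xF4) { need = 3; cp = b & 0x07; }
        else return false;  /* stray continuation byte, 0xC0/0xC1 or 0xF5+ */

        if (len - i <= need) return false;                      /* truncated */
        for (size_t k = 1; k <= need; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80) return false;               /* not 10xxxxxx */
            cp = (cp << 6) | (c & 0x3F);
        }
        if ((need == 2 && cp < 0x800) || (need == 3 && cp < 0x10000))
            return false;                                       /* overlong */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;         /* surrogate */
        if (cp > 0x10FFFF) return false;                        /* out of range */
        i += need + 1;
    }
    return true;
}
```

Run over every text column after each import, a check like this flags the bad rows early, before a mangled track title turns up on screen in a beta build.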

It would be nice, one day, to work again on the sort of project for which BigDesignUpFront would be applicable (that doesn’t, of course, mean that I’d do it; I’m pro-iterative myself).  Unfortunately, when you’re starting to gather data well in advance of knowing how that data will be used, you need to come up with something that’ll handle Big Change.  Thus “the database” in our little world doesn’t really mean the MySQL repository in which the data’s kept.  It’s MySQL plus thirty or so Python scripts and modules that Do Things To Data.

It’s always seemed to me that there are two approaches to database work.  I shall, for the purposes of discussion[4] class them as Database-centric and Code-centric development.

The database-centric approach was typified for me by a database developer I worked with at breathe, several years ago.  His approach to starting the working day was to (a) sit down at PC, (b) fire up a SQL Server client.  And that was him done; everything (and I mean everything) else was done from within the client.  List a directory?  Do some file copying?  It could all, apparently, be done from within the Database That God Gave Him.  A nice enough guy, but a tad fixated.

Of course, there are lesser examples of the approach and there’s much to be said for keeping the business logic next to the data in funky stored procedures.  But I never liked the languages in which they were written, nor the way in which the rectangular nature of the data forced its way into every corner of the design.  I am a dyed-in-the-wool code-centric guy.  And, naturally, the code-centric approach works thusly; your database holds your data.  Operations on that data are carried out by code, which fetches the data (whether that be into simple structures, objects that relate to the schema or objects that relate to whatever the hell they want to), Does Stuff to the data and rewrites the data back to the database.

The big point for me, though, that favours code-centricity is that it’s proven in the past to be more flexible in the ways that matter to me.  Maybe it’s a sign that I came to databases after high level programming languages (and to those after assembler).  Whatever; I like the code paradigm.  And in this case, it meant that I didn’t even attempt to create a vast and all-encompassing ERD that encapsulated every last semantic of the data.  We just created a base set of simple tables that the data gathering team could begin to populate.  Over time and as prototypes of the game were created and revised, the schema grew to reflect the uses of the data; now it’s pretty complex.

So for this project, the phrase “the database” usually refers to the set of tables in the MySQL server plus the code that operates on it.  And that’s all in Python.

There were a couple of other options open to us.  We could have started in Access… but apart from the obvious scale issues, we needed a proper multi-user database that could be efficiently hosted on a remote server, with replication.  Also, I’m not a great fan of VB as a development language, though for quick-and-dirty user interface jobs it’s great.  Slapping together forms to edit data?  Access can be the solution you need, using linked tables to get at the MySQL data where necessary.
We could have used Java for the main development, but working in a dynamic language has rather spoiled me for Java in odd ways.  Most of the supporting code for the database is all in a single Python module and the convenience of firing up a Python interpreter and being able to do ad-hoc processing at the command line was too good to pass up.  In fact, as we’ve approached the last few deadlines there have been many, many audits and sensibility checks all run from inside a Python interpreter (often Quasi).  The Java development cycle is just that bit too slow, and the definitions of objects a little too fixed to match that convenience.

The Swiss Army Chainsaw
There’re many audio files involved in this project, which means that sox has been a complete necessity at times.  Same rule applies as with the data; for a small company, even the simplest, fastest manual processing can be impossible when multiplied up by thousands of files.  Thus automation’s been key here, allowing all the samples to be crossfaded and normalised in batches.  For a lot of editing work, though, there is really no substitute for having the waveform visible on a PC and here, for once, proprietary tools have really won out.  Two of us in the company are musicians with home studios, and applications like Cool Edit or Wavelab have done a lot of the work.
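sox did the real work, but the two batch operations are simple enough to sketch.  Below is a minimal C illustration on 16-bit PCM, assuming mono sample buffers already in memory: peak normalisation scales a clip so its loudest sample hits a target level, and a linear crossfade ramps one clip down while the next ramps up.  It is not sox’s implementation, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round half away from zero and clamp into the 16-bit sample range. */
static int16_t to_sample(double v) {
    long r = (long)(v < 0 ? v - 0.5 : v + 0.5);
    if (r > 32767) r = 32767;
    if (r < -32768) r = -32768;
    return (int16_t)r;
}

/* Peak normalisation: scale n samples in place so the largest absolute
   value becomes target.  Returns the gain applied (1.0 for silence). */
double normalise_peak(int16_t *s, size_t n, int target) {
    int peak = 0;
    for (size_t i = 0; i < n; i++) {
        int a = abs((int)s[i]);
        if (a > peak) peak = a;
    }
    if (peak == 0) return 1.0;   /* silence: nothing to scale */
    double gain = (double)target / peak;
    for (size_t i = 0; i < n; i++)
        s[i] = to_sample(s[i] * gain);
    return gain;
}

/* Linear crossfade over `overlap` samples: tail (the end of the outgoing
   clip) fades 1 -> 0 while head (the start of the next clip) fades 0 -> 1. */
void crossfade(const int16_t *tail, const int16_t *head, int16_t *out, size_t overlap) {
    for (size_t i = 0; i < overlap; i++) {
        double t = overlap > 1 ? (double)i / (double)(overlap - 1) : 1.0;
        out[i] = to_sample(tail[i] * (1.0 - t) + head[i] * t);
    }
}
```

A linear ramp is the simplest choice; equal-power curves are often preferred because they avoid an audible dip in loudness around the midpoint of the fade.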

So that’s it.  Perhaps it’s not an in-depth industry exposé of the use of open-source stuff to create the next Duke Nukem or Doom, but there’s a real project here resulting in a real game and that’s a snapshot of some of the ways we did it.  And are continuing to do it… there’s still work to do.  Back to Eclipse and QueryBrowser for me…

[0] RSS feeds and individual workloads being what they are, you may well be reading this after E3 2005 has come and gone.  In which case, try to imagine yourself back in the heady old days of May 2005.  Feel the period atmosphere.  Good, isn’t it?  Do they have flying cars yet in your time?
[1] Yes, the exclamation mark’s part of the name.  Like Yahoo!
[2] This was way back at the end of 2003.  It’s taken a long time to get this thing under way, and I might do another blog entry to explain why, and what happened along the way.
[3] True.  Yes, there have been quizzes.  Yes, there have been quizzes with clips of music (like the DVD Pepsi Challenge game, or the CD-based Spot The Intro).  But nobody’s ever done one with 1200 tracks on it.
[4] As opposed to the Porpoises of Discursion.  I had a small spelling checker issue that was too good to delete entirely…