A Matter Of Questions

The PlayStation Project revealed.  Another case study of Python and open-source tools on another Interesting Problem.

A Little Bit Of Background
E3 thunders towards us on the calendar[0] and mortal development teams quail before the onrushing juggernauts of deadlines.  Apart from our team, for whom the E3 deadline was a while back.  Now we have a whole new set of amusing comedy deadlines for beta tests, but that’s not important right now.  E3 is a big deal for us because it’s where the PlayStation Project is now exposed to the nerveless, searching gaze of the Eye of Sauron… no, wait, I mean The Games Industry Press.  Same thing, smaller spiked helmets, as I understand it.

I’m a regular reader of Penny Arcade.  Not that I’m really a player in any sense these days; my favourite game is still Homeworld 2, which is pretty much gathering dust on the shelves of most gamers’ rooms.  But I like Tycho’s writing, and Gabe’s style of art.  A while ago, they posted some comments from Geoffrey Zatkin, on the subject of new ideas for games.  Here’s a quote:

At PAX this year I was a judge for their “pitch your idea for a game” sit-in. I got to break a lot of hearts by telling the audience a very sad fact – that in my 8+ years as a professional game designer, not once has any boss of mine ever asked me for an idea for a new game. Not once. Again, unless you own the company, you get assigned a project (or jump ship to another company working on a game that sounds interesting). Sure, I’ve helped flesh out any number of games from concept to fully realized design. And that’s the hard part. Coming up with a good idea for a game is like coming up with a good idea for a novel. Everybody does that. But very few people have the discipline to sit down and write the book. The ideas are easy – the execution of the idea is the hard part.

But that’s what we’ve done; we came up with a new idea, something that genuinely hasn’t been done before[3], and we’ve done a massive great chunk of the work of executing that idea.  And it’s been interesting, in the best senses of the word.

It isn’t a first-person shooter.  It’s not a racing game.  No busty women swing over pits and solve puzzles.  It’s something a bit new, in a number of ways.  The game’s called Buzz![1], and it is, at heart, a music quiz.  There are nearly 1200 different clips of tracks involved, with a total of over 47,000 questions, in ten languages.  Our development partners do the clever 3D interface work (and a damn fine job they’ve done of it as well).  It’s been our job to gather, generate, edit, collate, audit, process and provide all of this data on which the quiz is based.  So, since this blog is (ostensibly) chiefly about techie things, I thought it might be interesting to explain the set of open-source software and tools that we’ve used to manage all of this.

Of Databases And Babel
Let’s start with the database engine (since that’s the core of it all).  The requirements that I drew up[2] were pretty much these:

  1. Open-source database.  There is no religious reason for this.  It’s a budget thing.
  2. That interfaces well with Python.  More on this below.
  3. And that supports Unicode properly.  That is, it has tools that support input and output of international character sets, and it handles Unicode via the Python interface.
  4. Supports transactions.
  5. Accessible from Windows tools (ODBC, Python).
  6. Runs on Linux.  All our serious development servers run Fedora.

Given the above, it was pretty much a question of MySQL or PostgreSQL.  I chose MySQL for two reasons.  First: I’d used it many times before.  Second: the various Python/PostgreSQL packages all seemed lacking in one way or another, especially when it came to sorting out Unicode.  It may be that there are neat solutions to any or all of the issues I found, but there seemed to be no great advantage in swapping a database I was familiar with for one I wasn’t.  MySQL (as of version 4.1.1) also has excellent Unicode support and speaks unto Python via the truly excellent MySQLdb package, courtesy of Andy Dustman, to whom I shall one day build a small shrine.

None of this would be much use if the general query tools for MySQL (like the Query Browser) didn’t work properly with international strings on Windows.  All our desktop machines run XP, and it’s critical that display, edit and copy/paste of strings in any language work.  Well, as long as you choose the right tools, they do, but that’s the nature of Unicode work for you.

The Language That Gets Everywhere
Why, you might ask, purely because it would help me move on to the next point, do you need a database that links with Python?  Thanks for asking.  Early on in the whole development process, I gave a lot of thought to the jobs that we’d have to do.  There were some guiding principles I followed.

  1. Whatever we think we’re going to be doing, it probably won’t happen the way we think it will.  Columns will change meaning.  Entities will be discarded and new ones appear.  Be flexible.
  2. We’re going to be gathering data from a zillion different sources.  Any data we capture is going to be dirty and need auditing and cleaning.
  3. There’s going to be so much data that any manual task applied to the thousands of records we’ll have might mean that we’ll run out of time.  Automate.
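Principles 2 and 3 together argue for scripts rather than eyeballs.  Purely as an illustration, here is a minimal sketch of the sort of clean-and-audit pass that might be run over freshly captured records.  The record layout (a list of dicts) and the `title` field are hypothetical, not the project’s actual schema.  It normalises Unicode to NFC, so that a decomposed "e" plus combining accent matches a precomposed "é".  It also collapses stray whitespace and reports missing and duplicated values.

```python
import unicodedata
from collections import Counter


def clean_text(value):
    """Normalise one captured string: Unicode NFC form, collapsed whitespace."""
    value = unicodedata.normalize("NFC", value)
    return " ".join(value.split())


def audit_records(records, key="title"):
    """Clean string fields in place; return a list of (index, message) problems.

    `key` is a hypothetical field every record is expected to have.
    Duplicates are reported with index None because they span records.
    """
    problems = []
    for index, record in enumerate(records):
        for field, value in record.items():
            if isinstance(value, str):
                record[field] = clean_text(value)
        if not record.get(key):
            problems.append((index, f"missing {key}"))
    counts = Counter(r[key] for r in records if r.get(key))
    for value, n in sorted(counts.items()):
        if n > 1:
            problems.append((None, f"duplicate {key}: {value!r} x{n}"))
    return problems


records = [
    {"title": "  Cafe\u0301   del Mar "},  # decomposed accent, stray spaces
    {"title": "Caf\u00e9 del Mar"},         # precomposed accent
    {"title": ""},                          # missing value
]
print(audit_records(records))
```

The point of running it as a script is that it costs the same whether there are three records or thirty thousand.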

It would be nice, one day, to work again on the sort of project for which BigDesignUpFront would be applicable (that doesn’t, of course, mean that I’d do it; I’m pro-iterative myself).  Unfortunately, when you’re starting to gather data well in advance of knowing how that data will be used, you need to come up with something that’ll handle Big Change.  Thus “the database” in our little world doesn’t really mean the MySQL repository in which the data’s kept.  It’s MySQL plus thirty or so Python scripts and modules that Do Things To Data.

It’s always seemed to me that there are two approaches to database work.  I shall, for the purposes of discussion[4], class them as Database-centric and Code-centric development.

The database-centric approach was typified for me by a database developer I worked with at breathe, several years ago.  His approach to starting the working day was to (a) sit down at PC, (b) fire up a SQL Server client.  And that was him done; everything (and I mean everything) else was done from within the client.  List a directory?  Do some file copying?  It could all, apparently, be done from within the Database That God Gave Him.  A nice enough guy, but a tad fixated.

Of course, there are lesser examples of the approach, and there’s much to be said for keeping the business logic next to the data in funky stored procedures.  But I never liked the languages in which they were written, nor the way in which the rectangular nature of the data forced its way into every corner of the design.  I am a dyed-in-the-wool code-centric guy.  And, naturally, the code-centric approach works thusly: your database holds your data.  Operations on that data are carried out by code, which fetches the data (whether into simple structures, objects that relate to the schema, or objects that relate to whatever the hell they want to), Does Stuff to the data and writes it back to the database.
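In Python, that fetch, Do Stuff, write-back cycle can be very small.  The sketch below is illustrative only.  It uses the standard library’s sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the `questions` table and `apply_to_column` helper are hypothetical.  MySQLdb follows the same DB-API 2.0 pattern, though it uses `%s` rather than `?` as its parameter placeholder.

```python
import sqlite3


def apply_to_column(conn, table, column, func):
    """Fetch every row, transform one column in Python, write back the changes.

    Runs as one transaction: `with conn` commits on success and rolls back
    if anything raises.  Table and column names are interpolated into the
    SQL, so they must come from trusted code, never from user input.
    """
    with conn:
        rows = conn.execute(f"SELECT id, {column} FROM {table}").fetchall()
        changed = []
        for row_id, value in rows:
            new_value = func(value)  # the "Does Stuff" step, in plain Python
            if new_value != value:
                changed.append((new_value, row_id))
        conn.executemany(
            f"UPDATE {table} SET {column} = ? WHERE id = ?", changed
        )
    return len(changed)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO questions (text) VALUES (?)",
    [("who sang this? ",), ("Name the band",)],
)
updated = apply_to_column(conn, "questions", "text",
                          lambda s: s.strip().capitalize())
print(updated, conn.execute("SELECT text FROM questions ORDER BY id").fetchall())
```

Because the transformation is an ordinary function, it can be tried out interactively on a handful of rows before being let loose on the whole table.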

The big point in favour of code-centricity, for me, is that it has proven in the past to be more flexible in the ways that matter to me.  Maybe it’s a sign that I came to databases after high-level programming languages (and to those after assembler).  Whatever; I like the code paradigm.  And in this case, it meant that I didn’t even attempt to create a vast and all-encompassing ERD that encapsulated every last semantic of the data.  We just created a base set of simple tables that the data gathering team could begin to populate.  Over time, as prototypes of the game were created and revised, the schema grew to reflect the uses of the data; now it’s pretty complex.

So for this project, the phrase “the database” usually refers to the set of tables in the MySQL server plus the code that operates on it.  And that’s all in Python.

There were a couple of other options open to us.  We could have started in Access… but apart from the obvious scale issues, we needed a proper multi-user database that could be efficiently hosted on a remote server, with replication.  Also, I’m not a great fan of VB as a development language, though for quick-and-dirty user interface jobs it’s great.  Slapping together forms to edit data?  Access can be the solution you need, using linked tables to get at the MySQL data where necessary.

We could have used Java for the main development, but working in a dynamic language has rather spoiled me for Java in odd ways.  Most of the supporting code for the database lives in a single Python module, and the convenience of firing up a Python interpreter and doing ad-hoc processing at the command line was too good to pass up.  In fact, as we’ve approached the last few deadlines there have been many, many audits and sanity checks, all run from inside a Python interpreter (often Quasi).  The Java development cycle is just that bit too slow, and its object definitions a little too fixed, to match that convenience.

The Swiss Army Chainsaw
There are many audio files involved in this project, which means that sox has been a complete necessity at times.  The same rule applies as with the data: for a small company, even the simplest, fastest manual processing can be impossible when multiplied up by thousands of files.  Thus automation’s been key here, allowing all the samples to be crossfaded and normalised in batches.  For a lot of editing work, though, there is really no substitute for having the waveform visible on a PC, and here, for once, proprietary tools have really won out.  Two of us in the company are musicians with home studios, and applications like Cool Edit or Wavelab have done a lot of the work.
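Batch automation of this kind is mostly a matter of generating command lines.  Here is a minimal sketch, assuming SoX 14’s `norm` effect (which normalises the peak level to the given dB value) and a flat directory of WAV files.  The directory names, the helper functions and the default of -1 dB are all hypothetical.  Crossfading is left out because it needs per-clip timing data.

```python
import subprocess
from pathlib import Path


def sox_normalise_commands(src_dir, dst_dir, level_db=-1.0):
    """Build one SoX command per WAV file, normalising its peak to level_db."""
    commands = []
    for src in sorted(Path(src_dir).glob("*.wav")):
        dst = Path(dst_dir) / src.name
        commands.append(["sox", str(src), str(dst), "norm", str(level_db)])
    return commands


def run_batch(commands, dry_run=True):
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Something like `run_batch(sox_normalise_commands("raw", "normalised"), dry_run=False)` would then process a whole directory, provided the output directory already exists.  Keeping command generation separate from execution makes it easy to eyeball a dry run before committing thousands of files to it.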

So that’s it.  Perhaps it’s not an in-depth industry exposé of the use of open-source stuff to create the next Duke Nukem or Doom, but there’s a real project here resulting in a real game, and that’s a snapshot of some of the ways we did it.  And are continuing to do it… there’s still work to do.  Back to Eclipse and Query Browser for me…

[0] RSS feeds and individual workloads being what they are, you may well be reading this after E3 2005 has come and gone.  In which case, try to imagine yourself back in the heady old days of May 2005.  Feel the period atmosphere.  Good, isn’t it?  Do they have flying cars yet in your time?
[1] Yes, the exclamation mark’s part of the name.  Like Yahoo!
[2] This was way back at the end of 2003.  It’s taken a long time to get this thing under way, and I might do another blog entry to explain why, and what happened along the way.
[3] True.  Yes, there have been quizzes.  Yes, there have been quizzes with clips of music (like the DVD Pepsi Challenge game, or the CD-based Spot The Intro).  But nobody’s ever done one with 1200 tracks on it.
[4] As opposed to the Porpoises of Discursion.  I had a small spelling checker issue that was too good to delete entirely…

We Do Not Live In Textbooks

Firefox lead Ben Goodger posts on the well-explored topic of perfection vs. the real world, or Pragmatism vs. Perfection as it might more snappily be captioned.  Apple chose to cut corners in the way they fixed bugs in the Safari renderer.  This did not meet with approval from the KDE team trying to re-merge the fixes back into their tree.  Apple chose getting code to customers over doing the job perfectly.  That’s their prerogative, based on their needs.  The KDE team are likewise entitled to their opinions, which are based on their priorities.  Friction occurs when one team assumes that its own worldview is axiomatic; that it is a Rule Of The Universe, rather than just one way of seeing the world.

A while ago, I was talking over database design with a contact.  I talked about the way in which the database for one of our projects was defined (this is something I’ll post more about; it’s a big project announced at E3 on the 17th, so when real-time passes that date I’ll be allowed to say more).  He was pretty scathing about it.  “This is all wrong”, he said, “it’s not normalized properly”.

My response was to point out the reasons for its existence; that it held the data for a project which had started more than eighteen months ago and at that time had been heading in a rather different direction.  That the requirements for the data had changed radically.  That the budget for the whole project was tight and that, most importantly of all, the system had delivered what was required when it was required.  In other words, it works, and works well, despite differing from his preferred Right Way To Do It.  That it doesn’t meet his textbook definition of the perfect database structure is not really that important a factor.  Sure, given the time to redo the whole thing from scratch we’d have used a different approach; but this is true of every project, everywhere.  Until you’re done, you won’t know the best way to do it.  The real world, which pays the bills, gets the casting vote.

Beer being the excellent social facilitator that it is, we moved on from the point of potential disagreement and ended up considering the many ways in which the pursuit of perfection can lead software engineers astray.  The perfect toolkit that is unusable because of its complexity.  The perfect operating system that tries to be all things to all users and ends up having to violate its own axioms[0].

I’m a great believer in pragmatism, having worked (in my younger days) on a system that strove to be perfect and grew to be a monster, never quite meeting the actual needs of the users because of striving for the stars.  Aim at the heavens by all means, but that’s a direction, not a goal.  Because we all fall short of perfection.

[0] Just to forestall the usual rash of rude emails, I’m referring to Windows here – the GUI is the axiom and Microsoft’s recent conceding that a command-line shell is a good (and maybe even better) way to control remote servers is the violation of that axiom.  You can, of course, insert your own Un*x-based example, should your personal opinions be more suited to it.

An Ever-Rollin’ River Of Rant

When I posted my lil’ case study of open-source software a week or so ago, a number of individuals were kind enough to contact me and unload their negative opinions of my comments.  I had, in the process of discussing the pros of various OSS projects, also mentioned a couple of cons, and this met with disfavour amongst a subsection of the “community”.  The positive comments I made apparently met with universal approval.  I found it a touch one-sided.  Maybe it was the crack about zealotry that did it… I didn’t intend to cause offence then, and I don’t now.

Anyway, in the hope that it will cause distress, upset, spontaneous combustion and a wave of myocardial incidents amongst the more highly-strung of those who peruse the RSS feeds, here is a link to the excellent Bile Blog (courtesy of Miguel de Icaza’s blog).  The particular entry linked is about Apache’s new Harmony project, but I encourage you to read down for more.  Enjoy.

I should point out that I offer this link in exactly the same spirit as Miguel did; because it’s funny. The BileBlog is the same sort of thing as Old Man Murray; the author hates almost everything and everyone. It’s done for effect, and shouldn’t be taken seriously 🙂

Greater Than The Sum Of Its Parts

A little case study of Python and open-source tools on a big, complex and yet oddly routine sort of problem.

The Floofs project continues to grow, with distributors signing up in many different countries.  This means that we have the job of producing many, many different sizes of any given animation to suit their requests.  And every animation can have several different watermarked variants; web preview, WAP preview, distributors logo, no logo, etc., all of which may need to be regenerated if the source image is updated.  It all builds up into one huge asset management job: over 153 000 files at the last count.

So how does a geek approach a task like this?  Firstly from the standpoint that has stood us in good stead for many years and shall continue to be our watchword into the future. Do as little work as possible by automating everything.  The second principle, given that this is a self-funding project, is to avoid the use of fancy content-management “solutions” and build as much as possible from open-source software.

The overall job of finding that which has changed or is new, building that which needs to be built and uploading that which needs to be uploaded is an absolutely canonical task for make; no prizes for guessing that.  But the sheer complexity of the makefiles (and the need to keep several hundred of them up to date) seemed to imply a mammoth task of rule creation and macro generation.  Being (a) lazy and (b) a programmer at heart, I opted for a better solution: write a Python script that creates the Makefiles.

The overall Python script takes around twenty seconds to run per group of around nine animations.  Given that this is on a 1Gb dual-processor build server, that might give you an idea of the large number of targets and dependencies involved.  It turns out that it makes more sense to write very big explicit makefiles, in which all the dependencies and commands are laid out in full, than to play with clever Make rules; they save time for humans, but when it’s code writing code, there’s little to be gained.  Essentially, the script gathers a list of the source images and then builds a huge list of targets, dependencies and commands that’s finally sorted and spat out into a Makefile.
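The shape of such a generator is simple enough to sketch.  This is an illustrative sketch only, not the project’s script.  The `out/WIDTHxHEIGHT/` layout, the `resize_gif.py` helper and the function names are hypothetical.  Each rule is spelled out in full, with no pattern rules or macros, and the rules are sorted so the generated file is stable from run to run.

```python
from pathlib import PurePosixPath


def rule(target, deps, commands):
    """Format one explicit Make rule; recipe lines must begin with a tab."""
    lines = [f"{target}: {' '.join(deps)}".rstrip()]
    lines += [f"\t{cmd}" for cmd in commands]
    return "\n".join(lines)


def build_makefile(sources, sizes, script="resize_gif.py"):
    """Return Makefile text with one explicit rule per (source, size) output.

    `script` is a hypothetical resizing helper; listing it as a dependency
    means that editing it forces every output to be rebuilt.
    """
    rules, outputs = [], []
    for src in sources:
        stem = PurePosixPath(src).stem
        for width, height in sizes:
            out = PurePosixPath("out") / f"{width}x{height}" / f"{stem}.gif"
            outputs.append(str(out))
            rules.append(rule(
                str(out),
                [src, script],
                [f"mkdir -p {out.parent}",
                 f"python {script} {src} {width} {height} {out}"],
            ))
    # "all" comes first so that it is make's default goal.
    header = [rule("all", sorted(outputs), []), rule(".PHONY", ["all"], [])]
    return "\n\n".join(header + sorted(rules)) + "\n"


print(build_makefile(["src/cat.gif"], [(176, 144), (128, 128)]))
```

Watermark variants (web preview, WAP preview, distributor logo and so on) would just be another loop dimension, multiplying the rule count.  That is exactly the kind of growth that is painless when code writes the makefile.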

In order to make the process more flexible, several of the commands that make will eventually invoke are themselves Python scripts.  Consider the job of resizing an animated GIF.  In theory it’s simple: take the GIF apart into component frames, resize each one, then reassemble.  In practice[0], it’s more complex than that.  GIF frame-sequence compression works best when pixels remain the same between frames, so the resizing process needs to try to ensure that happens even if the set of colours used varies between frames (most single-frame image resizing tools don’t handle this well).  Also, GIF in-frame compression doesn’t work well with fuzzy edges and gradients, so anti-aliasing can result in big images.  But then again, non-anti-aliased images look terrible.  So there’s a set of Python scripts designed specifically to handle the seemingly easy job of resizing images without also making them twice the file size[2].
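As a rough illustration of how such a step might be driven from Python, here is a sketch that builds command lines rather than manipulating pixels itself.  It assumes ImageMagick’s `convert` (the ImageMagick 6 command name) and gifsicle are on the PATH.  `-coalesce` expands each frame to a full image before resizing, and `-layers Optimize` re-optimises the frames afterwards.  gifsicle’s `-O2` then squeezes the result in place (`-b`).  Picking the largest master follows footnote [2]’s rule about scaling down.  The function names and master-image table are hypothetical, and this is not the project’s own colour-handling logic.

```python
def pick_master(masters, width, height):
    """Pick the largest master image, refusing to scale up.

    `masters` maps a (hypothetical) file path to its (width, height).
    """
    path, (mw, mh) = max(masters.items(), key=lambda item: item[1][0] * item[1][1])
    if width > mw or height > mh:
        raise ValueError(f"{width}x{height} would need upscaling {path}")
    return path


def resize_commands(src, dst, width, height, antialias=True):
    """Return [ImageMagick command, gifsicle command] for one resized GIF."""
    convert = ["convert", src, "-coalesce"]
    if not antialias:
        # Nearest-neighbour sampling: hard edges, which GIF compresses well.
        convert += ["-filter", "Point"]
    # WxH fits the image inside the box while preserving its aspect ratio.
    convert += ["-resize", f"{width}x{height}", "-layers", "Optimize", dst]
    # -b rewrites dst in place; -O2 is gifsicle's stronger optimisation level.
    gifsicle = ["gifsicle", "-b", "-O2", dst]
    return [convert, gifsicle]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` in turn.  The `antialias` switch is there because, as above, the right trade-off between smooth edges and file size varies from image to image.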

All of this image manipulation must be command-line; there are nothing like enough resources (whether you count time, money or people who grok PSP) to do the work manually in a GUI tool like Photoshop.  So it’d all collapse if it weren’t for gifsicle and ImageMagick.

The first is the best command-line GIF manipulation tool, bar none.  Runs on Windows and Unix.  Free.  And damn good at everything (except resizing, at which it does a non-anti-aliased quick and dirty job).  But for exploding, optimizing, commenting or running a soft polishing cloth over your GIFs, nothing comes close.

The second is the sort of toolset that free software zealots ought to parade down the street as a shining example[1].  ImageMagick tools can perform operations on images for which you’d normally expect to have to fire up the Gimp, Photoshop or PSP, but from the command line.  Which means that once you’ve sorted out your commands and source materials, doing 153 000 images is as easy as doing one.  Its support for animated GIFs is not as good as for static images, but given that gifsicle can explode a GIF into separate frames and then reconstitute the original after those frames have been modified, the combination of the two is all you need.  Really.

And finally, I’d be nowhere without the language with which the IT systems of Paradise are no doubt built: Python.  “Your mileage” (as the Americans like to say) “may vary”, but there are damn few languages that are so completely cross-platform, scalable, supported by decent IDEs and object-oriented.  The ftplib module’s been used to build all the uploaders.  The very funky paramiko module does the same for SFTP.  The only thing that let me down was… the damn PIL.  An imaging library that has some of the worst GIF support I’ve yet seen.  Yes, I know all about the GIF patent issues[3], but de-emphasising support for a de facto standard because of ideological convictions doesn’t work in the real world.  GIFs are what we’re stuck with; one works with what one has, not what one would wish for in an ideal world.  Still, if that’s the only fly in the Python soup, then I’ll keep eating.
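The uploaders themselves need not be elaborate.  Here is a minimal sketch, assuming a JSON manifest of content hashes (a hypothetical file, not something described above) to decide which files have changed since the last run.  `ftplib.FTP`, `login`, `cwd` and `storbinary` are the standard library’s own calls.  The helper names are hypothetical.  In an SFTP version, paramiko’s `SFTPClient.put(localpath, remotepath)` could stand in for `storbinary`.

```python
import ftplib
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Content hash used to detect changed files."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def load_manifest(path):
    """Read {local path: digest} from a JSON file, or start empty."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}


def save_manifest(path, manifest):
    Path(path).write_text(json.dumps(manifest, indent=1, sort_keys=True))


def upload_changed(ftp, paths, manifest):
    """Upload files whose digest differs from the manifest; update it in place.

    `ftp` is any object with ftplib.FTP's storbinary() method, already
    logged in and positioned in the right remote directory.
    """
    sent = []
    for path in paths:
        digest = file_digest(path)
        if manifest.get(str(path)) == digest:
            continue  # unchanged since the last successful upload
        with open(path, "rb") as fh:
            ftp.storbinary(f"STOR {Path(path).name}", fh)
        manifest[str(path)] = digest
        sent.append(path)
    return sent


def sync(host, user, password, remote_dir, paths, manifest_path):
    """Connect, upload whatever has changed, then record the new digests.

    The manifest is only saved after the connection closes cleanly, so a
    failed run simply re-uploads on the next attempt.
    """
    manifest = load_manifest(manifest_path)
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        sent = upload_changed(ftp, paths, manifest)
    save_manifest(manifest_path, manifest)
    return sent
```

Taking the connection as a parameter, rather than opening it inside `upload_changed`, keeps the change-detection logic testable without a real server.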

So there; that probably wasted less than five minutes of your day on a brief description of how we manage several hundred thousand images with one command line.  Now excuse me; I must go and type make and watch it do my job for me…

[0] In theory, there’s no difference between theory and practice, but in practice, there is.
[1] Though to be honest, I can live without any more free software zealots, thank you very much.
[2] Part of the secret is dead obvious; always scale down from a larger size to a smaller.  Always.
[3] The biggest issue being that they’re no longer an issue in any area.  And they never were a barrier to writing a decent GIF reader.

Variations On A Theme By Adams

Naturally, nobody is the slightest bit interested in my opinions of the HitchHiker movie.

So here they are.

First of all, gratuitous and unnecessary metaphor time.  I see the HHG as a theme, rather like a Jazz standard, that’s been reworked in several media.  There were the original radio plays, the books, the BBC TV series and now the movie.  In most cases, Douglas A himself was at least partially involved in each, which is all to the good – it was his creation, after all.  Stretching the metaphor a little, let’s also consider that the HHG has, at its core, certain key attributes (let’s say these are like the basic chord structure or tensions of a tune).  I’d argue that these are:

  • Verbal humour
  • A sparkling, surreal and tangential wit (and by tangential I mean “goes off at tangents for the sheer fun of it”)
  • A slightly cynical and rather British attitude to life (the Universe, and Everything)

The media in which the HHG has worked best are those where the key attributes have shone.  The radio plays were all about words (though the sound and music were a fantastic support for it all).  Ditto the books, where DA could spend even more time on his love of verbal trickery and prose designed to flick the reader’s mind from image to image.  In both cases, the plot (such as it was) was a stream on which to hang jokes.  The TV series and the movie were visual, because they’re visual media.  In general, I thought the TV series was the worst incarnation.

That’s thought, past tense.  And then I went and saw the movie.  With my wife, an intelligent and thoughtful person who has never been exposed to the HHG in any depth before.  And I wanted to love it, I really did, and I wanted her (at the very least) to enjoy it… but both of us, for different reasons, were hideously disappointed.

She saw a movie with no real plot, with a few sketchy characters that made little attempt to explain the context of a series of jokes that left her cold.  It meant nothing to her; it didn’t even seem to want to try to explain itself to her.  It preached to the faithful, not to the neophyte.

I saw something that had been a part of the way I think since I first heard the plays at the age of 13… but via a medium that had torn it up, shredded it, picked up some of the juicier bits and crammed them into a structure that didn’t have much to do with the things that had entranced me.  Gone was much of the verbosity, the tangential exposition, the love of irrelevant but wonderful detail.  Even the characters weren’t the same; true, the Vogons looked exactly like Vogons should look, but Arthur… wasn’t.  Arthur Dent should not be brave, should not risk all for the woman he lost.  I saw elements that I recognised (the jewelled scuttling crabs and the beautiful creatures of Vogsphere) that made no sense at all to anyone who wasn’t steeped in HHG lore.  I came out bitterly disappointed.  I well appreciated the problems involved in turning the HHG into a movie, but this failed both me and my wife on so many levels.

The elucidatory and expositional Yoz Graham (whose merest blog entries I am not worthy to footnote) puts it thus:

The secret of Hitchhiker’s success is that it means something different to everyone.  That something could be a mood, or a scene, or even a single line. If your particular something is included in the film, you’ll probably love it. If it isn’t, you probably won’t.

But some of my favourite ever lines were in that movie, and love was the emotion furthest from my mind as I emerged into the light of the Trafford Centre.  There went two hours of my life which I had not only lost forever, but invested in something that had lost me far more: part of the love that I had felt for what the HHG had meant to me.  Chewed up and spat out, with the best possible intentions, by movie makers.

I think you ought to know I’m feeling very depressed.