Greater Than The Sum Of Its Parts

A little case study of Python and open-source tools on a big, complex and yet oddly routine sort of problem.

The Floofs project continues to grow, with distributors signing up in many different countries.  This means that we have the job of producing many, many different sizes of any given animation to suit their requests.  And every animation can have several different watermarked variants; web preview, WAP preview, distributors logo, no logo, etc., all of which may need to be regenerated if the source image is updated.  It all builds up into one huge asset management job: over 153 000 files at the last count.

So how does a geek approach a task like this?  Firstly from the standpoint that has stood us in good stead for many years and shall continue to be our watchword into the future. Do as little work as possible by automating everything.  The second principle, given that this is a self-funding project, is to avoid the use of fancy content-management “solutions” and build as much as possible from open-source software.

The overall job of finding that which has changed/is new, building that which needs to be built and uploading that which needs to be uploaded is an absolutely canonical task for make; no prizes for guessing that.  But the sheer complexity of the makefiles (and the need to keep several hundred of them up to date) seemd to imply a mammoth task of rule creation and macro generation.  Being (a) lazy and (b) a programmer at heart, I opted for a better solution.  Write a Python script that creates the Makefiles.

The overall Python script takes around twenty seconds to run per group of around nine animations.  Given that this is on a 1Gb dual-processor build server, that might give you an idea of the large number of targets and dependencies involved.  It turns out that it makes more sense to write very big explicit makefiles, in which all the dependencies and commands are laid out in full, than to play with clever Make rules; they save time for humans, but when it’s code writing code, there’s little to be gained.  Essentially, the script gathers a list of the source images and then builds a huge list of targets, dependencies and commands that’s finally sorted and spat out into a Makefile.

In order to make the process more flexible, several of the commands that make will eventually invoke are themselves Python scripts.  Consider the job of resizing an animated GIF.  In theory it’s simple; take the GIF apart into component frames, resize each one, then reassemble.  In practise[0], it’s more complex than that.  GIF frame-sequence compression works best when pixels remain the same between frames, so the resizing process needs to try and ensure that happens even if the set of colours used between frames varies (most single-frame image resizing tools don’t work too well on this).  Also, GIF in-frame compression doesn’t work well with fuzzy edges and gradients, so anti-aliasing can result in big images.  But then again, non-antialiased images look terrible.  So there’s a set of Python scripts designed specifically to handle the seemingly easy job of resizing images without also making them twice the file size[2].

All of this image manipulation must be command-line; there are nothing like enough resources (whether you count time, money or people who grok PSP) to do the work manually in a GUI tool like Photoshop.  So it’d all collapse if it weren’t for gifsicle and ImageMagick.

The first is the best command-line GIF manipulation tool, bar none.  Runs on Windows and Unix.  Free.  And damn good at everything (except resizing, at which it does a non-anti-aliased quick and dirty job).  But for exploding, optimizing, commenting or running a soft polishing cloth over your GIFs, nothing comes close.

The second is the sort of toolset that free software zealots ought to parade down the street as a shining example[1].  ImageMagick tools can perform operations on images for which you’d normally expect to have to fire up the Gimp, PhotoShop or PSP, but from the command line.  Which means that once you’ve sorted out your commands and source materials, doing 153 000 images is as easy as doing one.  Its support for animated GIFs is not as good as for static images, but given that gifsicle can explode a GIF into separate frames and then reconstitute the original after those frames have been modified, the combination of the two is all you need.  Really.

And finally, I’d be nowhere without the language with which the IT systems of Paradise are no doubt built; Python.  “Your mileage” (as the Americans like to say) “may vary”, but there are damn few languages that are so completely cross-platform, scalable, supported by decent IDEs and object-oriented.  The ftplib module’s been used to build all the uploaders.  The very funky paramiko module does the same for SFTP.  The only thing that let me down was… the damn PIL.  An imaging library that has some of the worst GIF support I’ve yet seen.  Yes, I know all about the GIF patent issues[3], but de-emphasising support for a de-facto standard because of ideological convictions doesn’t work in the real world.  GIFs are what we’re stuck with; one works with what one has, not what one would wish for in an ideal world.  Still, if that’s the only fly in the Python soup, then I’ll keep eating.

So there; that probably wasted less than five minutes of your day on a brief description of how we manage several hundred thousand images with one command line.  Now excuse me; I must go and type make and watch it do my job for me…

[0] In theory, there’s no difference between theory and practise, but in practise, there is.
[1] Though to be honest, I can live without any more free software zealots, thank you very much.
[2] Part of the secret is dead obvious; always scale down from a larger size to a smaller.  Always.
[3] The biggest issue being that they’re no longer an issue in any area.  And they never were a barrier to writing a decent GIF reader.

Advertisements