Greater Than The Sum Of Its Parts

A little case study of Python and open-source tools on a big, complex and yet oddly routine sort of problem.

The Floofs project continues to grow, with distributors signing up in many different countries.  This means that we have the job of producing many, many different sizes of any given animation to suit their requests.  And every animation can have several different watermarked variants; web preview, WAP preview, distributors logo, no logo, etc., all of which may need to be regenerated if the source image is updated.  It all builds up into one huge asset management job: over 153 000 files at the last count.

So how does a geek approach a task like this?  Firstly from the standpoint that has stood us in good stead for many years and shall continue to be our watchword into the future. Do as little work as possible by automating everything.  The second principle, given that this is a self-funding project, is to avoid the use of fancy content-management “solutions” and build as much as possible from open-source software.

The overall job of finding that which has changed/is new, building that which needs to be built and uploading that which needs to be uploaded is an absolutely canonical task for make; no prizes for guessing that.  But the sheer complexity of the makefiles (and the need to keep several hundred of them up to date) seemed to imply a mammoth task of rule creation and macro generation.  Being (a) lazy and (b) a programmer at heart, I opted for a better solution: write a Python script that creates the makefiles.

The overall Python script takes around twenty seconds to run per group of around nine animations.  Given that this is on a 1 GB dual-processor build server, that might give you an idea of the large number of targets and dependencies involved.  It turns out that it makes more sense to write very big explicit makefiles, in which all the dependencies and commands are laid out in full, than to play with clever make rules; those save time for humans, but when it’s code writing code, there’s little to be gained.  Essentially, the script gathers a list of the source images and then builds a huge list of targets, dependencies and commands that’s finally sorted and spat out into a makefile.
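To give a flavour of the "code writing code" idea, here is a minimal sketch of such a generator.  It is not the project's actual script: the size list, the variant names, the output layout and the `resize_gif.py` command are all made up for illustration, and it assumes the output directories already exist.

```python
from pathlib import PurePosixPath

# Hypothetical sizes and watermark variants; a real list would be far longer.
SIZES = [(128, 128), (96, 96)]
VARIANTS = ["nologo", "webpreview"]


def rules_for(source):
    """Return (target, dependency, command) tuples for one source GIF."""
    src = PurePosixPath(source)
    rules = []
    for width, height in SIZES:
        for variant in VARIANTS:
            target = f"out/{variant}/{width}x{height}/{src.name}"
            # resize_gif.py stands in for a project-specific script.
            command = (
                f"python resize_gif.py --size {width}x{height} "
                f"--variant {variant} {src} {target}"
            )
            rules.append((target, str(src), command))
    return rules


def render_makefile(sources):
    """Lay out every target, dependency and command explicitly, sorted."""
    rules = sorted(rule for source in sources for rule in rules_for(source))
    lines = ["all: " + " ".join(target for target, _, _ in rules), ""]
    for target, dependency, command in rules:
        lines.append(f"{target}: {dependency}")
        lines.append(f"\t{command}")  # make requires a leading tab here
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_makefile(["src/cat.gif", "src/dog.gif"]))
```

No pattern rules, no macros: every target gets its own explicit rule, which is dull to read but trivial to generate and easy for make to evaluate.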

In order to make the process more flexible, several of the commands that make will eventually invoke are themselves Python scripts.  Consider the job of resizing an animated GIF.  In theory it’s simple; take the GIF apart into component frames, resize each one, then reassemble.  In practice[0], it’s more complex than that.  GIF frame-sequence compression works best when pixels remain the same between frames, so the resizing process needs to try to ensure that happens even if the set of colours used between frames varies (most single-frame image resizing tools don’t work too well on this).  Also, GIF in-frame compression doesn’t work well with fuzzy edges and gradients, so anti-aliasing can result in big images.  But then again, non-antialiased images look terrible.  So there’s a set of Python scripts designed specifically to handle the seemingly easy job of resizing images without also making them twice the file size[2].

All of this image manipulation must be command-line; there are nothing like enough resources (whether you count time, money or people who grok PSP) to do the work manually in a GUI tool like Photoshop.  So it’d all collapse if it weren’t for gifsicle and ImageMagick.

The first is the best command-line GIF manipulation tool, bar none.  Runs on Windows and Unix.  Free.  And damn good at everything (except resizing, at which it does a non-anti-aliased quick and dirty job).  But for exploding, optimizing, commenting or running a soft polishing cloth over your GIFs, nothing comes close.

The second is the sort of toolset that free software zealots ought to parade down the street as a shining example[1].  ImageMagick tools can perform operations on images for which you’d normally expect to have to fire up the GIMP, Photoshop or PSP, but from the command line.  Which means that once you’ve sorted out your commands and source materials, doing 153 000 images is as easy as doing one.  Its support for animated GIFs is not as good as for static images, but given that gifsicle can explode a GIF into separate frames and then reconstitute the original after those frames have been modified, the combination of the two is all you need.  Really.
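The explode, modify, reassemble round trip can be sketched roughly as below.  This is the naive version, not the project's resizing scripts: it assumes `gifsicle` and ImageMagick's `convert` are on the PATH, the function names, the working-directory layout and the fixed frame delay are invented for the example, and it does nothing about the palette and anti-aliasing problems described earlier.

```python
import subprocess
from pathlib import Path


def explode_cmd(src, frame_prefix):
    # gifsicle writes one GIF per frame: <prefix>.000, <prefix>.001, ...
    return ["gifsicle", "--unoptimize", "--explode", str(src),
            "--output", str(frame_prefix)]


def resize_frame_cmd(frame_in, frame_out, size):
    # The gif: prefix tells ImageMagick the format, since the exploded
    # frames end in .000, .001, ... rather than .gif.
    return ["convert", f"gif:{frame_in}", "-resize", size, f"gif:{frame_out}"]


def assemble_cmd(frames, dst, delay):
    # delay is in hundredths of a second and applies to every frame
    # (an assumption made to keep the example short).
    return (["gifsicle", "--delay", str(delay), "--loopcount=forever", "-O2"]
            + [str(frame) for frame in frames]
            + ["--output", str(dst)])


def resize_animation(src, dst, size, workdir, delay=10):
    """Explode src into workdir, resize each frame, rebuild into dst."""
    workdir = Path(workdir)
    subprocess.run(explode_cmd(src, workdir / "frame"), check=True)
    frames = sorted(workdir.glob("frame.[0-9]*"))
    for frame in frames:
        subprocess.run(resize_frame_cmd(frame, frame, size), check=True)
    subprocess.run(assemble_cmd(frames, dst, delay), check=True)
```

Something like `resize_animation("cat.gif", "cat_64.gif", "64x64", "tmp")` would then run the whole round trip, given an existing `tmp` directory.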

And finally, I’d be nowhere without the language with which the IT systems of Paradise are no doubt built; Python.  “Your mileage” (as the Americans like to say) “may vary”, but there are damn few languages that are so completely cross-platform, scalable, supported by decent IDEs and object-oriented.  The ftplib module’s been used to build all the uploaders.  The very funky paramiko module does the same for SFTP.  The only thing that let me down was… the damn PIL.  An imaging library that has some of the worst GIF support I’ve yet seen.  Yes, I know all about the GIF patent issues[3], but de-emphasising support for a de-facto standard because of ideological convictions doesn’t work in the real world.  GIFs are what we’re stuck with; one works with what one has, not what one would wish for in an ideal world.  Still, if that’s the only fly in the Python soup, then I’ll keep eating.
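For the curious, an uploader built on ftplib needs very little code.  The following is a minimal sketch rather than the real uploaders: the function names, host, credentials and remote directory are placeholders, and there is no retry or error handling.

```python
from ftplib import FTP
from pathlib import Path


def upload_files(ftp, local_paths, remote_dir):
    """Upload each local file into remote_dir over an open FTP connection."""
    ftp.cwd(remote_dir)
    uploaded = []
    for path in map(Path, local_paths):
        with path.open("rb") as handle:
            # STOR stores the file under its own name in the current directory.
            ftp.storbinary(f"STOR {path.name}", handle)
        uploaded.append(path.name)
    return uploaded


def upload_to_host(host, user, password, local_paths, remote_dir):
    """Connect, log in, upload, and close the connection afterwards."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        return upload_files(ftp, local_paths, remote_dir)
```

Keeping the connection handling separate from the upload loop means the loop can be exercised with a stand-in object instead of a live server.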

So there; that probably wasted less than five minutes of your day on a brief description of how we manage several hundred thousand images with one command line.  Now excuse me; I must go and type make and watch it do my job for me…

[0] In theory, there’s no difference between theory and practice, but in practice, there is.
[1] Though to be honest, I can live without any more free software zealots, thank you very much.
[2] Part of the secret is dead obvious; always scale down from a larger size to a smaller.  Always.
[3] The biggest issue being that they’re no longer an issue in any area.  And they never were a barrier to writing a decent GIF reader.

9 thoughts on “Greater Than The Sum Of Its Parts”

    • Hmmm, hadn’t even heard of it. Looks interesting, but make does the job right now, so we’re unlikely to change. Processor time or memory usage in make itself isn’t really an issue – we run almost everything as “make -j” so that all jobs run in parallel. All that image processing is I/O bound…
      Thanks for the link anyway!

  1. de-emphasising support for a de-facto standard

    because of ideological convictions doesn’t work in the real world.

    It does work if you are the people writing the no cost software. If you were paying for PIL then I think you could whine or take your cash elsewhere.

    • Re: de-emphasising support for a de-facto standard

      Sure, it works in that you can skimp on the support and feel good. Which then leaves the software you’ve written lacking in that area. I’d love to use PIL, (and to contribute to it; in fact, I did start out rewriting the GIF support and may complete that one day, pressure of work permitting). Unfortunately it doesn’t work as a comprehensive GIF reader & writer.

      When I say “doesn’t work in the real world”, that’s what I mean. Software that costs money fails to earn money if it doesn’t work. People who write free software (which includes me, by the way) don’t do it to earn money, they do it and *publish* it so that others will use it. And if it’s lacking in functionality, it doesn’t get used.

      The whole “you didn’t pay for it so you have no grounds for complaint” argument is fairly weak. There are other ways of paying than in money, and if one never points out the weaknesses in software, it’s unlikely that they’ll be fixed. I welcome people who have slagged off Quasi (Python shell that I wrote, available for free on sourceforge) for lacking features; prompts me to put them in.

  2. Support and Patents

    The only thing that let me down was… the damn PIL. An imaging library that has some of the worst GIF support I’ve yet seen.

    Didn’t you rant about this before? Free/open source software is all about dialogue, you know, not just free stuff to trash in public after the fact.

    Yes, I know all about the GIF patent issues[3], but de-emphasising support for a de-facto standard because of ideological convictions doesn’t work in the real world.

    Yeah, well since the members of the mobile ‘phone manufacturer cartel are all indulging in some promiscuous cross-licensing patent orgy, ancient modem-era Compuserve formats encumbered by submarine patents still live on in the “real world”. As a result, those with “ideological convictions” seem to be the only ones looking honest.

    • Re: Support and Patents

      I’ve mentioned PIL before, yes. I wouldn’t call it a rant; no swearing, no stating that it “sux0rs”. It does have bad GIF support. I’m making the comment that it does. Isn’t that dialogue? Or should I have entered into some Slashdot-style diatribe?

      Perhaps, in focusing on the single point about PIL, you missed the praise for ImageMagick, gifsicle, GNU make and Python? Is it ok to praise free software, but never ok to point out shortcomings?

      Regarding patents; couldn’t agree more. But there never was an ideological reason to avoid GIFs. There were damn good reasons to point out that Unisys behaved badly and that the patent only covered one aspect of the GIF format, but no reason to make some overtly principled stand against all GIFs. My point is more that we’re stuck with the GIF format, unwieldy as it is, and that’s the world we have to work in. I believe we should deal with the world as it is, not how we wish it would be.

      And as I also pointed out in a previous comment, I write free software. I’ve submitted changes and patches to free software that I use. I’ll continue to do that. And I’ll also continue to notice when something’s been done badly, irrespective of whether it’s proprietary or F/OSS.

  3. PIL/GIF

    How about using gdk-pixbufloaders via PyGTK instead?
    Is that for some reason impracticable?
    As a full replacement for PIL you could check out Cairo/PyCairo.
    It is yet incomplete, certainly. But what is there seems to work quite nicely already. PyCairo has a Gtk bridge.

    • Re: PIL/GIF

      That’s an interesting idea, though the code that does the work runs on Windows and a headless Linux system, neither of which have GTK installed. I’ll also check out PyCairo; thanks.
