We Do Not Live In Textbooks

Firefox lead Ben Goodger posts on the well-explored topic of perfection vs. the real world, or Pragmatism vs. Perfection as it might more snappily be captioned.  Apple chose to cut corners in the way they fixed bugs in the Safari renderer.  This did not meet with approval from the KDE team trying to remerge the fixes back into their tree.  Apple chose getting code to customers over doing the job perfectly.  That’s their prerogative, based on their needs.  The KDE team are likewise entitled to their opinions, which are based on their priorities.  Friction occurs when one team assumes that their own worldview is axiomatic; that it is a Rule Of The Universe, rather than just one way of seeing the world.

A while ago, I was talking over database design with a contact.  I talked about the way in which the database for one of our projects was defined (this is something I’ll post more about; it’s a big project announced at E3 on the 17th, so when real-time passes that date I’ll be allowed to say more).  He was pretty scathing about it.  “This is all wrong”, he said, “it’s not normalized properly”.

My response was to point out the reasons for its existence; that it held the data for a project which had started more than eighteen months ago and at that time had been heading in a rather different direction.  That the requirements for the data had changed radically.  That the budget for the whole project was tight and that, most importantly of all, the system had delivered what was required when it was required.  In other words, it works, and works well, despite differing from his preferred Right Way To Do It.  That it doesn’t meet his textbook definition of the perfect database structure is not really that important a factor.  Sure, given the time to redo the whole thing from scratch we’d have used a different approach; but this is true of every project, everywhere.  Until you’re done, you won’t know the best way to do it.  The real world, which pays the bills, gets the casting vote.

Beer being the excellent social facilitator that it is, we moved on from the point of potential disagreement and ended up considering the many ways in which the pursuit of perfection can lead software engineers astray.  The perfect toolkit that is unusable because of its complexity.  The perfect operating system that tries to be all things to all users and ends up having to violate its own axioms[0].

I’m a great believer in pragmatism, having worked (in my younger days) on a system that strove to be perfect and grew to be a monster, never quite meeting the actual needs of the users because of striving for the stars.  Aim at the heavens by all means, but that’s a direction, not a goal.  Because we all fall short of perfection.

[0] Just to forestall the usual rash of rude emails, I’m referring to Windows here – the GUI is the axiom and Microsoft’s recent conceding that a command-line shell is a good (and maybe even better) way to control remote servers is the violation of that axiom.  You can, of course, insert your own Un*x-based example, should your personal opinions be more suited to it.

An Ever-Rollin’ River Of Rant

When I posted my lil’ case study of open-source software a week or so ago, a number of individuals were kind enough to contact me and unload their negative opinions of my comments.  I had, in the process of discussing the pros of various OSS projects, also mentioned a couple of cons, and this met with disfavour amongst a subsection of the “community”.  The positive comments I made apparently met with universal approval.  I found it a touch one-sided.  Maybe it was the crack about zealotry that did it… I didn’t intend to cause offence then, and I don’t now.

Anyway, in the hope that it will cause distress, upset, spontaneous combustion and a wave of myocardial incidents amongst the more highly-strung of those who peruse the RSS feeds, here is a link to the excellent Bile Blog (courtesy of Miguel de Icaza’s blog).  The particular entry linked is about Apache’s new Harmony project, but I encourage you to read down for more.  Enjoy.

I should point out that I offer this link in exactly the same spirit as Miguel did; because it’s funny. The BileBlog is the same sort of thing as Old Man Murray; the author hates almost everything and everyone. It’s done for effect, and shouldn’t be taken seriously 🙂

Greater Than The Sum Of Its Parts

A little case study of Python and open-source tools on a big, complex and yet oddly routine sort of problem.

The Floofs project continues to grow, with distributors signing up in many different countries.  This means that we have the job of producing many, many different sizes of any given animation to suit their requests.  And every animation can have several different watermarked variants; web preview, WAP preview, distributor’s logo, no logo, etc., all of which may need to be regenerated if the source image is updated.  It all builds up into one huge asset management job: over 153 000 files at the last count.

So how does a geek approach a task like this?  Firstly from the standpoint that has stood us in good stead for many years and shall continue to be our watchword into the future. Do as little work as possible by automating everything.  The second principle, given that this is a self-funding project, is to avoid the use of fancy content-management “solutions” and build as much as possible from open-source software.

The overall job of finding that which has changed/is new, building that which needs to be built and uploading that which needs to be uploaded is an absolutely canonical task for make; no prizes for guessing that.  But the sheer complexity of the makefiles (and the need to keep several hundred of them up to date) seemed to imply a mammoth task of rule creation and macro generation.  Being (a) lazy and (b) a programmer at heart, I opted for a better solution.  Write a Python script that creates the Makefiles.

The overall Python script takes around twenty seconds to run per group of around nine animations.  Given that this is on a 1Gb dual-processor build server, that might give you an idea of the large number of targets and dependencies involved.  It turns out that it makes more sense to write very big explicit makefiles, in which all the dependencies and commands are laid out in full, than to play with clever Make rules; they save time for humans, but when it’s code writing code, there’s little to be gained.  Essentially, the script gathers a list of the source images and then builds a huge list of targets, dependencies and commands that’s finally sorted and spat out into a Makefile.
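As a rough illustration of the "code writing makefiles" idea, here is a minimal sketch in Python 3 (the real scripts are not shown in this post, so everything here is an assumption: the size list, the out/ directory layout and the resize_gif.py command are all hypothetical).  It spells out one explicit rule per output file rather than relying on pattern rules.

```python
# Python 3 sketch: build an explicit Makefile with one rule per output file.
# SIZES, the out/ layout and "resize_gif.py" are hypothetical placeholders.

SIZES = [(64, 64), (128, 128)]  # assumed output sizes


def make_rules(sources, sizes=SIZES):
    """Return sorted (target, dependency, command) tuples, one per source/size pair."""
    rules = []
    for src in sources:
        stem = src.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        for width, height in sizes:
            target = "out/%dx%d/%s.gif" % (width, height, stem)
            command = "python resize_gif.py %s %d %d %s" % (src, width, height, target)
            rules.append((target, src, command))
    return sorted(rules)


def render_makefile(rules):
    """Lay every rule out in full; no pattern rules or macros."""
    lines = ["all: %s" % " ".join(target for target, _, _ in rules), ""]
    for target, dependency, command in rules:
        lines.append("%s: %s" % (target, dependency))
        lines.append("\t%s" % command)  # make requires a leading tab here
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_makefile(make_rules(["src/cat.gif", "src/dog.gif"])))
```

Redirect the output to a file called Makefile and make will take it from there; the generated file is dull to read, but nobody is meant to read it.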

In order to make the process more flexible, several of the commands that make will eventually invoke are themselves Python scripts.  Consider the job of resizing an animated GIF.  In theory it’s simple; take the GIF apart into component frames, resize each one, then reassemble.  In practice[0], it’s more complex than that.  GIF frame-sequence compression works best when pixels remain the same between frames, so the resizing process needs to try and ensure that happens even if the set of colours used between frames varies (most single-frame image resizing tools don’t work too well on this).  Also, GIF in-frame compression doesn’t work well with fuzzy edges and gradients, so anti-aliasing can result in big images.  But then again, non-antialiased images look terrible.  So there’s a set of Python scripts designed specifically to handle the seemingly easy job of resizing images without also making them twice the file size[2].

All of this image manipulation must be command-line; there are nothing like enough resources (whether you count time, money or people who grok PSP) to do the work manually in a GUI tool like Photoshop.  So it’d all collapse if it weren’t for gifsicle and ImageMagick.

The first is the best command-line GIF manipulation tool, bar none.  Runs on Windows and Unix.  Free.  And damn good at everything (except resizing, at which it does a non-anti-aliased quick and dirty job).  But for exploding, optimizing, commenting or running a soft polishing cloth over your GIFs, nothing comes close.

The second is the sort of toolset that free software zealots ought to parade down the street as a shining example[1].  ImageMagick tools can perform operations on images for which you’d normally expect to have to fire up the Gimp, PhotoShop or PSP, but from the command line.  Which means that once you’ve sorted out your commands and source materials, doing 153 000 images is as easy as doing one.  Its support for animated GIFs is not as good as for static images, but given that gifsicle can explode a GIF into separate frames and then reconstitute the original after those frames have been modified, the combination of the two is all you need.  Really.
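For the curious, the naive "in theory" pipeline (explode, resize each frame, reassemble) might be sketched like this in Python 3.  This is not the production code, which does considerably more to keep file sizes down; it only builds the command lines, and the file names and the resize_commands helper are hypothetical.  It assumes gifsicle’s default naming for exploded frames (the input name followed by .000, .001 and so on).

```python
# Python 3 sketch of the naive explode / resize / reassemble pipeline.
# It only builds argv lists; nothing is executed.  Names are hypothetical.


def resize_commands(src, dst, width, height, frame_count):
    """Return the command lines for the three stages, in order."""
    # gifsicle --explode names its output frames src.000, src.001, ...
    frames = ["%s.%03d" % (src, index) for index in range(frame_count)]
    resized = ["%s.small" % frame for frame in frames]

    commands = [["gifsicle", "--explode", src]]
    for before, after in zip(frames, resized):
        # The "gif:" prefix states the format explicitly, since the
        # exploded frame names do not end in ".gif".
        commands.append(["convert", "gif:" + before,
                         "-resize", "%dx%d" % (width, height),
                         "gif:" + after])
    # Reassemble the resized frames and let gifsicle optimise the result.
    commands.append(["gifsicle", "-O2"] + resized + ["-o", dst])
    return commands


if __name__ == "__main__":
    for argv in resize_commands("anim.gif", "anim_small.gif", 64, 64, 3):
        print(" ".join(argv))
```

Each list could be handed to subprocess.call() as it stands.  Note that this version resizes every frame independently, which is precisely the approach that tends to bloat the output.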

And finally, I’d be nowhere without the language with which the IT systems of Paradise are no doubt built; Python.  “Your mileage” (as the Americans like to say) “may vary”, but there are damn few languages that are so completely cross-platform, scalable, supported by decent IDEs and object-oriented.  The ftplib module’s been used to build all the uploaders.  The very funky paramiko module does the same for SFTP.  The only thing that let me down was… the damn PIL.  An imaging library that has some of the worst GIF support I’ve yet seen.  Yes, I know all about the GIF patent issues[3], but de-emphasising support for a de-facto standard because of ideological convictions doesn’t work in the real world.  GIFs are what we’re stuck with; one works with what one has, not what one would wish for in an ideal world.  Still, if that’s the only fly in the Python soup, then I’ll keep eating.
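To give a flavour of how little code an ftplib uploader needs, here is a minimal Python 3 sketch.  It is not the uploader used on the project; the function names, host and credentials are hypothetical, and error handling is left out.

```python
# Python 3 sketch of an ftplib-based uploader.  upload_files() and
# upload_to_host() are hypothetical names, not the project's own code.
import ftplib
import os


def upload_files(ftp, local_paths):
    """Upload each local file in binary mode; return the remote names used."""
    uploaded = []
    for path in local_paths:
        remote_name = os.path.basename(path)
        with open(path, "rb") as handle:
            ftp.storbinary("STOR " + remote_name, handle)
        uploaded.append(remote_name)
    return uploaded


def upload_to_host(host, user, password, local_paths):
    """Open a session, upload everything, and close the session afterwards."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        return upload_files(ftp, local_paths)
```

Keeping the connection handling separate from the upload loop means the loop can be exercised with a stand-in object, without a real server.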

So there; that probably wasted less than five minutes of your day on a brief description of how we manage several hundred thousand images with one command line.  Now excuse me; I must go and type make and watch it do my job for me…

[0] In theory, there’s no difference between theory and practice, but in practice, there is.
[1] Though to be honest, I can live without any more free software zealots, thank you very much.
[2] Part of the secret is dead obvious; always scale down from a larger size to a smaller.  Always.
[3] The biggest issue being that they’re no longer an issue in any area.  And they never were a barrier to writing a decent GIF reader.

Variations On A Theme By Adams

Naturally, nobody is the slightest bit interested in my opinions of the HitchHiker movie.

So here they are.

First of all, gratuitous and unnecessary metaphor time.  I see the HHG as a theme, rather like a Jazz standard, that’s been reworked in several media.  There were the original radio plays, the books, the BBC TV series and now the movie.  In most cases, Douglas A himself was at least partially involved in each, which is all to the good – it was his creation, after all.  Stretching the metaphor a little, let’s also consider that the HHG has, at its core, certain key attributes (let’s say these are like the basic chord structure or tensions of a tune).  I’d argue that these are:

  • Verbal humour
  • A sparkling, surreal and tangential wit (and by tangential I mean “goes off at tangents for the sheer fun of it”)
  • A slightly cynical and rather British attitude to life (the Universe, and Everything)

The media in which the HHG has worked best are those where the key attributes have shone.  The radio plays were all about words (though the sound and music were a fantastic support for it all).  Ditto the books, where DA could spend even more time on his love of verbal trickery and prose designed to flick the reader’s mind from image to image.  In both cases, the plot (such as existed) was a stream on which to hang jokes.  The TV series and the movie were visual; because they’re visual media.  In general, I thought the TV series was the worst incarnation.

    That’s thought, past tense.  And then I went and saw the movie.  With my wife, an intelligent and thoughtful person who has never been exposed to the HHG in any depth before.  And I wanted to love it, I really did; and for her to (at the very least) enjoy it… but both of us, for different reasons, were hideously disappointed.

    She saw a movie with no real plot, with a few sketchy characters that made little attempt to explain the context of a series of jokes that left her cold.  It meant nothing to her; didn’t even seem to want to try to explain itself to her.  It preached to the faithful, not to the neophyte.
    I saw something that had been a part of the way I think since I first heard the plays at the age of 13… but via a medium that had torn it up, shredded it, picked up some of the juicier bits and crammed them into a structure that didn’t have much to do with the things that had entranced me.  Gone was much of the verbosity, the tangential exposition, the love of irrelevant but wonderful detail.  Even the characters weren’t the same; true, the Vogons looked exactly like Vogons should look but Arthur… wasn’t.  Arthur Dent should not be brave, should not risk all for the woman he lost.  I saw elements that I recognised (the jewelled scuttling crabs and the beautiful creatures of Vogsphere) that made no sense at all to anyone who wasn’t steeped in HHG lore.  I came out bitterly disappointed.  I well appreciated the problems involved in turning the HHG into a movie, but this failed both me and my wife on so many levels.

    The elucidatory and expositional Yoz Graham (whose merest blog entries I am not worthy to footnote) puts it thus:
    The secret of Hitchhiker’s success is that it means something different to everyone.  That something could be a mood, or a scene, or even a single line. If your particular something is included in the film, you’ll probably love it. If it isn’t, you probably won’t.

    But some of my favourite ever lines were in that movie, and love was the emotion furthest from my mind as I emerged into the light of the Trafford Centre.  There went two hours of my life which I had not only lost forever, but invested in something that had lost me far more; part of the love that I had felt for what the HHG had meant to me.  Chewed up and spat out, with the best possible intentions, by movie makers.

    I think you ought to know I’m feeling very depressed.

    The Unbearable Importance Of Detail

    Or, The Importance Of Being Accurate

    Every so often, I find myself reading something that I know I really shouldn’t.  It’s rather like food; the realisation that you’re reaching for the fourth unhealthy fat-filled biscuit, the knowledge that you should stop right now, the dread of knowing that you won’t.  With me and books it’s happened twice recently.

    The first was with Tom Clancy’s recent slab of nonsense, The Teeth Of The Tiger.  I’m sure you’re familiar with the routine involved when approaching any book like this.  First, suspend your disbelief.  No, higher than that; as high as if you were watching a Bond movie, or anything starring Arnold Schwarzenegger.  Next, let go of your brain.  Convince yourself, against all reason, that complex geopolitical problems and deeply involved cultural clashes have simple solutions.  If you’re not American, adapt yourself now to that curiously parochial viewpoint, in which the concerns of the President are the concerns of the planet.  Now you may read a Tom Clancy.

    But partway through I felt the suppressed parts of my brain trying to stir.  A muffled voice in my subconscious pointed out that Jack Ryan’s son (around whom the book is formed) is nothing like Ryan Snr who was (or could have been read as) a reasonably decent man.  Ryan Jnr is a prat.  He prefers classical music to rock.  He always wears a blazer.  He has no apparent life.  But I suppressed that voice, and told it that the suspension of disbelief is key.  And I continued to read.

    Then two of Clancy’s characters ended up in the UK.  And here’s where the straws began to pile up on the back of the camel.  First, Clancy’s research seems to have consisted of a couple of visits to London, which has equipped him with an in-depth understanding of the local culture.  Not.  The final straw, which led me to throw the book across the room, adopt the position of the stork and hop around mouthing obscenities, was trivial; his two characters have a drink in a pub.  With waitress service.  And when they get up, they leave money to pay.  Pubs don’t work like that.  They have never worked like that.  That is the way in which American bars work.  And it’s easy to find this out; just take a look at this excellent guide to pub etiquette.

    Of course, I’m over-reacting hopelessly.  But then again; accuracy matters.  If Clancy’s so off-target on something that I know about (ie, what the UK is like), then I have to assume he’s equally wrong about, say, the culture of Russia.  Or Germany.  Or Italy, or various Arab nations.  And it brings into sharper focus how sketchy his understanding is of computers, or networking.  About all I’d grant is that he’s accurate about weapons, since he seems to have a rather fetishistic fascination with the damn things.

    But Mr Clancy’s works are as the writings of Stephen Hawking for accuracy and clarity compared with the demon that Beelzebub has inflicted upon the reading public; the craven creature that unknowing mortal man names… Dan Brown.

    Dan Brown, who believes (Da Vinci Code) that GPS tracking can work indoors, in a room with no windows, and to an accuracy of centimetres.  Dan Brown who believes (ditto) that a tiny solar cell can power a hard-drive recorder, as long as it gets a few minutes of sunlight.  Oh, and it can power a transmitter that can be received clearly at vast distance.  Dan Brown, who tells us that the Chief Constable of a British police force carries a sidearm.

    Dan Brown, whose portrayal of the NSA, computer viruses and the Internet (Digital Fortress) is now accepted by millions of people.  Fear for the enlightenment, my people, for the rise of Ignorance is at hand.

    Zope, Threads and Things That Are Not Modules

    Thread-local storage & thread-safe locking in Zope External Methods

    You may, like me, have been misled by part of the Zope management interface into believing something that’s not true; that External Methods live in a module.  This would be a perfectly natural assumption; when you add an External Method, one of the fields in the form you fill out is titled “Module Name”.  If you read the Zope Book, you’ll also see phrases like “You’ve created a Python function in a Python module”, “You can define any number of functions in one module” or “put this code in a module called my_extensions.py”.

    But they’re not modules; at least, not in key senses of the word.  What actually happens is that the source file is loaded, compiled and the resulting code object is then used to get access to the methods as needed.  This is clever, but has some unfortunate side-effects, one of which is that it isn’t possible to rely on certain module-level semantics.

    One of the things that doesn’t appear to work is a trick like this:

    import thread

    my_module_level_lock = thread.allocate_lock()

    def my_mutex_method():
        """A method that uses the lock to enforce thread-safety"""
        my_module_level_lock.acquire()
        try:
            f = open("myfile.log","a")
            f.write("Yowza!\n")
            f.close()
        finally:
            my_module_level_lock.release()

    Obviously, what we’re doing here is sharing the one module-level lock amongst all threads.  But because of the way Zope uses external methods, there may in fact be more than one module-level lock.  And thus, of course, you won’t get proper thread exclusion.  Even worse – it fails silently, so you may not even be aware that you’re not locking around access to shared resources.
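The effect is easy to imitate outside Zope.  The following Python 3 sketch is not Zope’s loader; it simply assumes the behaviour described above (source compiled once, then executed into a fresh namespace on each load) and shows that every load produces its own "module-level" lock.  The file name my_extensions.py is only a label passed to compile().

```python
# Python 3 sketch: a compiled source file executed into a fresh namespace
# each time yields a new "module-level" lock on every load.
# This imitates the behaviour described in the text; it is not Zope's code.

SOURCE = '''
import threading
my_module_level_lock = threading.Lock()
'''

code = compile(SOURCE, "my_extensions.py", "exec")


def load_namespace():
    """Run the compiled source in a new, empty namespace and return it."""
    namespace = {}
    exec(code, namespace)
    return namespace


first = load_namespace()["my_module_level_lock"]
second = load_namespace()["my_module_level_lock"]

# Two loads, two locks: holding one does nothing to stop a holder of the other.
print(first is second)  # prints False
```

A real import, by contrast, runs the module body once and caches the result in sys.modules, which is why a lock kept in a genuine module is shared.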

    Need proof?  Try this trick.  Create a “module level” class in your External Method file (ie, the file that contains the source to your External Methods; call it an External Method file).  Create a “module-level” instance of it that, in the __init__ method, dumps its id and the thread ident to stdout:

    import thread

    class Noddy:
        def __init__(self):
            print "Noddy %s" % self
            print "Created by thread %d" % thread.get_ident()

        def __del__(self):
            print "Noddy %s is being deleted" % self

    noddy = Noddy()  #Allocate a module-level Noddy()

    Use bin/runzope to run the Zope instance and catch the output.  Here’s some I collected earlier:

    Noddy __builtin__.Noddy instance at 0x40fa8f6c
    Created by thread 1026
    Noddy __builtin__.Noddy instance at 0x4130a3cc
    Created by thread 1026
    Noddy __builtin__.Noddy instance at 0x40fa8f6c is being deleted
    Noddy __builtin__.Noddy instance at 0x40fa8c6c
    Created by thread 1026
    Noddy __builtin__.Noddy instance at 0x4131afec
    Created by thread 1026
    Noddy __builtin__.Noddy instance at 0x4130202c
    Created by thread 1026
    Noddy __builtin__.Noddy instance at 0x40fd076c
    Created by thread 1026
    Noddy __builtin__.Noddy instance at 0x40fe084c
    Created by thread 1026
    Noddy __builtin__.Noddy instance at 0x41153fac
    Created by thread 1026
    Noddy __builtin__.Noddy instance at 0x4114ceac
    Created by thread 1026

    Yow.  The module level object is being created multiple times, all by the one thread.  Imagine if we’d put some useful resource-eating allocation at the module level.  However, if we put the Noddy stuff into a module that the External Method file imports, we get:

    Noddy ThreadShared.Noddy instance at 0x40fa976c
    Created by thread 1026

    Just the one.  As expected.  And even if we run parallel requests to get multiple Zope threads working, we still get only the one instance of Noddy.

    I ran into this problem whilst trying to allocate a single MySQLdb database connection per thread; for this, I wanted to keep a mapping of thread idents to MySQLdb Connection objects.  But whatever I did, the mapping refused to behave as a module-level object.  Thus, from necessity, was born the ThreadShared module, which returns a TLS (thread-local-storage) object per thread.

    And it looks like this:

    #!/usr/bin/env python
    #Shared resource module for Zope-level ExternalMethod thread-local storage

    import thread

    #A TLS is just an object on which arbitrary attributes may
    #be set.  In a Zope system, you could make this a
    #Products.PythonScripts.standard.Object
    #You can add methods on it to do anything you need.
    #If you have a thing about preferring dicts rather than arbitrary objects,
    #then use a dict.
    class TLS:
        pass

    tlsMap = {} #dict that maps thread idents to TLS objects
    tlsLock = thread.allocate_lock()    #global lock object over map

    def getTLS():
        """Return the thread-local-storage object for the current thread.
        If there isn't one, create one."""

        #Lock the tlsLock.
        tlsLock.acquire()

        #Obtain or create the object
        try:
            tls = tlsMap[thread.get_ident()]
        except KeyError:
            tls = TLS()
            tlsMap[thread.get_ident()] = tls

        #Release the lock
        tlsLock.release()

        return tls

    Okay, so you have a TLS object, how might you use it?  Well, for anything that needs to be allocated once per thread.  For example, assuming you’ve imported ThreadShared:

    tls = ThreadShared.getTLS()
    conn = getattr(tls,'DatabaseConnection',None)
    if not conn:
      conn = MySQLdb.connect( etc etc )
      setattr(tls,'DatabaseConnection',conn)
    #And here we have a connection for this thread.
    

    In this case, it might well be worth adding a method to the TLS class to generate the connection, but you get the general idea.
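For comparison, the standard library’s threading.local offers the same one-object-per-thread behaviour without a hand-rolled map and lock.  The sketch below is Python 3 and is not part of ThreadShared; the get_connection helper and its connect argument are hypothetical, with connect standing in for something like a wrapper around MySQLdb.connect.

```python
# Python 3 sketch: per-thread storage with the standard library's
# threading.local.  get_connection() is a hypothetical helper.
import threading

_tls = threading.local()  # each thread sees its own attributes on this object


def get_connection(connect):
    """Return this thread's connection, creating it on first use.

    `connect` is any zero-argument callable that opens a connection; it is
    passed in so that the sketch has no database dependency.
    """
    conn = getattr(_tls, "connection", None)
    if conn is None:
        conn = connect()
        _tls.connection = conn
    return conn
```

The same caveat applies as for ThreadShared: the _tls object has to live in a real, imported module, not in the External Method file itself.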

    Another problem that comes from the same source is this; how do you get a module-level lock in an External Method file?  The answer is – put it in another module (a real module) that your External Method file imports.  Be careful with the import path, though – by default the Extensions instance directory isn’t on the import path, so if you like all your External Method code to live in the one place, you’ll need to mess with sys.path to add it.
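Messing with sys.path can be as small as the following Python 3 sketch.  The add_to_import_path helper is hypothetical, as is the directory in the usage comment; substitute wherever your instance keeps its Extensions.

```python
# Python 3 sketch: make a directory importable, adding it only once.
# add_to_import_path() is a hypothetical helper name.
import os
import sys


def add_to_import_path(directory):
    """Put `directory` at the front of sys.path unless it is already there."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


# Hypothetical usage from an External Method file (the path is an assumption):
#   add_to_import_path("/path/to/instance/Extensions")
#   import ThreadShared
```

The membership check matters here: an External Method file may be executed many times, and without it the path would grow on every load.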

    Of course, all this will come as no surprise to seasoned Zopistas, and I expect the usual level of flames informing me of my rank ignorance and stupidity in not having worked this out ages ago (presumably by reading the source code).  Yet given that this is a pretty key point, you’d expect that, at the very least, External Method files weren’t called “modules” in so many places.  Because they’re not modules, are they?

    Reality, modified

    Fascinating article in Wired about a guy called Graham Flint, whose Gigapxl Project [sic] is capturing huge, um, gigapixel images of places in America.  But I found this odd; when talking about a picture of a nude-sunbathing beach, he’s quoted as saying: “We might have to add fig leaves in Photoshop, it’s that good”.

    Why capture reality in as dense detail as you can… and then edit it to fit the retarded moral sensibilities of those who think the human body is something to be ashamed of?

    Self-Starting Singleton Scripts

    Stupid Python Tricks, No. 3453

    I wrote some nonsense a while back about how to make a Python script run one instance of itself only.  This works very well for me, with scripts being (re)started at regular intervals by cron jobs, the new instances detecting that there’s an old instance and dying off.  But it’s sometimes a bit of a pain to have to manually kill off a task when I’ve updated the actual Python script.  I don’t want to arbitrarily kill it, because it might be in the middle of something that I want to run to completion – a lot of the database stuff that goes on is time-consuming and whilst a kill will roll back anything done so far, that’s not efficient.

    So a few minutes’ thought brought this to light (this includes the run-one-of-me-only code):

    import os
    import sys
    import time

    #Make a note of the time we start running.
    startTime = long(time.time())

    #Get the current PID of this process
    pid = os.getpid()

    #Use an external command to see if any other process is running this script
    #Derive the scriptname from sys.argv.  Note that if there are scripts with
    #the same name in different directories, this check will fail.
    scriptName = os.path.basename(sys.argv[0])
    ps = os.popen('ps --no-headers -elf|fgrep %s' % scriptName)
    psl = ps.readlines()
    ps.close()
    for p in psl:
            #Only consider python processes that are running this script
            if p.find('python')>-1 and p.find(scriptName)>-1:
                    otherpid = int(p.split()[3])
                    #Another instance is already running, so this one dies
                    if otherpid != pid: sys.exit()
    
    

    Then, in the main loop of the script…

    while True:
            #Sleep for a bit...
            time.sleep(SLEEPTIME)
     
            #Check to see if this script has been updated since the last
            #iteration.  If so, die now so that the new version can take over
            #and be restarted by cron.  Assumes that the working directory
            #is that of the script (or that from which the script was started).
            x = os.stat(sys.argv[0])
            #Check the modification time of the script against the time we started running
            if x.st_mtime > startTime:
                    print "Script has changed; will exit to allow restart"
                    sys.exit(0)
    

           
    Cheap and cheerful stuff, but useful.  And hey, I didn’t even mention static typing once…
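The modification-time test is easy to pull out into a helper of its own.  This is a Python 3 sketch rather than the script above; script_changed_since is a hypothetical name.

```python
# Python 3 sketch of the "has my own source changed?" check.
# script_changed_since() is a hypothetical helper name.
import os
import time


def script_changed_since(path, start_time):
    """True if the file at `path` was modified after `start_time` (epoch seconds)."""
    return os.stat(path).st_mtime > start_time


# Possible usage at start-up and in the main loop:
#   script_path = os.path.abspath(sys.argv[0])  # resolve once, before any chdir
#   start_time = time.time()
#   ...
#   if script_changed_since(script_path, start_time):
#       sys.exit(0)
```

Resolving the path to an absolute one at start-up removes the assumption about the working directory that the loop above has to make.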

    Total Eclipse Of The Python

    Eclipse & Python, as the title suggests.  I’m not always elliptical in my references, you know.

    I apologise if the title brings back memories of one of the several records with which Jim “I Produced Meatloaf, You Know” Steinman ruined Bonnie “Gravel Is For Gargling” Tyler’s 80s career, but if I have it in my head, I see no reason why it shouldn’t also end up in yours.  A problem shared is a problem halved.

    Most of my development these days is done in a mixture of Java and Python, each language being used for the tasks for which it is best suited.  For a while there, I was doing Java in Eclipse 3.0 and Python in Boa Constructor, but even with 512Mb and not much else open, the laptop was beginning to groan.  Eclipse is approximating wonderful, but it eats memory like Americans eat corn-syrup-enhanced snacks.  Boa also likes to take up a fair amount of heap, so one of them had to go.  It had reached the point at which even partially exposing an Eclipse window after half an hour of ignoring it would slow everything down to a grind for a minute.  So I did some digging and discovered that Eclipse does support Python development.  Well, support is a relative term; it’s possible, if you’re willing to give up some things, or find workarounds.

    First off, how to get Eclipse to handle Python at all.  There are at least five Eclipse Python plugins at some stage of development; after some testing (and not a few Eclipse crashes) I chose pydev.  The most obvious effect of installing this is that Eclipse recognises .py files and opens them in a Python-aware editor, with the Outline showing a nice list of methods, imports and classes, including any if __name__ == '__main__': block (no module-level variables shown, though).  The editor follows the Eclipse style of syntax colouring; might not be to your taste, but at least it’s more-or-less consistent between languages.

    The most obvious lack is fully-Python-style code completion.  Many and better people than I have written on how damn difficult it is to implement this in Python anyway, so I suppose it’s a benefit to get anything.  Code completion’s off by default, and the PyDev Preferences tab for it contains dire warnings about how it works and what the drawbacks are.  In short, it “executes the code you write on the top level on the module” – I assume this means that it “imports” your code.  It looks like code completion’s then done with dir().  There are no tips for parameters to methods or functions, but you do at least get the docstring displayed.
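To make the guess concrete, here is what import-then-dir() completion could look like in a few lines of Python 3.  This is a sketch of the technique as I have assumed it works, not PyDev’s implementation; complete and doc_for are hypothetical names.

```python
# Python 3 sketch of completion by importing a module and calling dir().
# This illustrates the assumed technique; it is not PyDev's code.
import importlib


def complete(module_name, prefix):
    """Import the named module and list its attributes starting with `prefix`.

    Importing runs the module's top-level code, which is the drawback
    the preferences page warns about.
    """
    module = importlib.import_module(module_name)
    return sorted(name for name in dir(module) if name.startswith(prefix))


def doc_for(module_name, attribute):
    """Return the first line of an attribute's docstring, or '' if it has none."""
    module = importlib.import_module(module_name)
    doc = (getattr(module, attribute).__doc__ or "").strip()
    return doc.splitlines()[0] if doc else ""


if __name__ == "__main__":
    print(complete("os.path", "is"))
    print(doc_for("json", "dumps"))
```

It also shows why parameter tips are harder: dir() gives names and docstrings for free, but nothing about what a half-written expression will evaluate to.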

    Something else I missed pretty soon – a built-in Python shell.  I’ve impressed programmers in other languages before now by flipping between code-in-progress and an interactive shell to test out the statements I’m writing (avoids so many fence-post errors).  I’ve always found it especially good for regular expressions.  Also, of course, when you’re done coding, you have a bunch of handy test data and statements to hand to build the unit tests.  The quickest Eclipse workaround I’ve found is just to keep a Cygwin or Quasi window open running Python and to keep handy with the Windows console cut-and-paste.  I suppose SPE or PyCrust would do as well…

    PyDev debugging has never worked for me, at all, though this is probably a blessing; meaning I tend to do much more test-driven development.  Since there are a bunch of other Python environments around if I really need to step through code, I haven’t bothered to dig into the depths of exactly why the PyDev Debug option never even appears for me.  Anyway, most of the Python code runs in Zope, and debugging Zope interactively is just… don’t go there.  Really.

    A nice touch is that PyDev does its best to implement the Eclipse refactoring support, including renaming and extracting code to methods, though there are limitations on how far the renaming search appears to go.  Also, it sometimes seems to get slightly confused as to whether the source has changed or not.

    Less obvious features: if you have the Subclipse plugin for Subversion integration with Eclipse, it’s now as easy to keep Python scripts and modules under control as it is Java (or any other file).  I especially like the ability to check in a set of Java and Python files all together when changes to both are needed to implement some system-wide change.  The Eclipse Local History (a sort of “persistent Undo”) also applies to Python scripts.

    In short, it’s mature enough to do real serious work with, and the benefits of having all my code in one environment outweigh the niggles.  After all, there’s no other Python IDE I’ve found that has everything I want in one package, let alone Subversion integration.  In this world we are destined always to strive for perfection and never to find it.  I’ll settle for Eclipse & PyDev for the moment.

    More info at http://www.python.org/moin/EclipsePythonIntegration

    Sublunary Paths

    Zope alternatives to GET/POST parameters.

    Before you ask, sublunary paths comes from a poem by Philip Larkin (it’s Many famous feet have trod, partway down that page). It has nothing much to do with web programming, nor Zope, but every time I see the identifier subpath it reminds me. Besides, I’m sure you appreciate the break from IT once in a while?

    A subpath in Zope terms is that part of a URL that comes after the script, method or whatever that’s actually executed. Bear in mind that Zope is all about inheritance and object-orientedness, so if we have a script available at:

    http://www.myserver.com/scripts/myScript

    …then it’s possible to access it via URLs like:

    http://www.myserver.com/scripts/myScript/part1/part2/part3

    …and you can consider that the “part3” URL “inherits” myScript.  Those extra parts (part1/part2/part3 here) don’t need to refer to actual directories[1]; they can be pretty much arbitrary.  They’re made available in the subpath element of the REQUEST object, and here’s where that comes in handy.

    If you point a browser at a URL that results in the contents of a file being returned, then the question arises of what the file name should be.  HTTP allows the server to be specific about the type of the data that’s sent, but there’s no facility to supply a suitable name.  This makes sense; naming conventions are extremely varied across platforms, so there’s no guarantee that a given name will be appropriate.  Instead, what usually happens is that the browser deduces a name from the URL.  And here problems may occur.

    Consider a URL of the form:

    http://www.myserver.com/scripts/downloadFile?customer=144876ab6&transaction=76yghj576xz7

    …which downloads a GIF file.  There are browsers in existence[0] that will propose the filename downloadFile?customer=144876ab6&transaction=76yghj576xz7.gif, which is less than useful.  The worst part is that the parameters to the request get added to the filename.  The request could be made as a POST rather than a GET to avoid this, but there’s still no way to specify the filename, meaning that all the files downloaded from this URL may end up with the same name.  Not good, especially if they’re being paid for and later ones overwrite previous ones.

    The subpath trick, however, can be used in a couple of ways to work around this.
    First, since the subpath is ignored by Zope, it can be used to suggest a filename when a POST request is made:

    http://www.myserver.com/scripts/downloadFile/myNewFile.gif
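
    To see why the trailing path element helps, here is a rough sketch of the sort of guesswork a browser might do.  It is an assumption about typical behaviour, not a description of any particular browser; naive_filename and path_filename are names invented for the example.

```python
import posixpath
from urllib.parse import urlsplit


def naive_filename(url, extension=""):
    """Take everything after the last slash, query string and all."""
    return url.rsplit("/", 1)[-1] + extension


def path_filename(url):
    """Take only the last element of the URL's path, ignoring any query."""
    return posixpath.basename(urlsplit(url).path)


query_url = ("http://www.myserver.com/scripts/downloadFile"
             "?customer=144876ab6&transaction=76yghj576xz7")
subpath_url = "http://www.myserver.com/scripts/downloadFile/myNewFile.gif"

if __name__ == "__main__":
    print(naive_filename(query_url, ".gif"))
    print(path_filename(query_url))
    print(naive_filename(subpath_url))
    print(path_filename(subpath_url))
```

    With the subpath form, both the careless and the careful approach arrive at myNewFile.gif.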

    That’s good, but we can go further.  The subpath can be used itself to pass the parameters to the script.  Consider this:

    http://www.myserver.com/scripts/downloadFile/customer=144876ab6/transaction=76yghj576xz7/myNewFile.gif

    In Python terms, we need to look along the subpath and extract those parameters.  Here’s a suitable method to do it:

    import re

    #Regular expression to spot parameters of the form identifier=value,
    #where value may be empty.
    paramRe = re.compile(r"(\w+)\s*=\s*(.*)")
    def getParametersFromRequest(self):
        """Extract parameters from the subpath and return them in a dict."""
        params = {}
        #The subpath is a list of path elements; in other words, it's
        #been split by '/' characters for us already.
        for p in self.REQUEST.subpath:
            #Check if it's a parameter.
            m = paramRe.match(p)
            if m:
                #It is a match.  group(1) is the identifier,
                #group(2) is the value, which we strip.
                params[m.group(1)] = m.group(2).strip()
        return params
    

    You can call this from any ExternalMethod or Script (Python), passing either self or the container respectively, and it’ll return a dict of parameters.
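
    If you want to try the parsing outside Zope, the sketch below repeats the same logic as a plain function and feeds it a hand-built subpath list.  The function name parse_subpath and the example list are stand-ins of my own; inside Zope the list would come from the REQUEST as described above.

```python
import re

# Same pattern as before: identifier=value, where value may be empty.
param_re = re.compile(r"(\w+)\s*=\s*(.*)")


def parse_subpath(subpath):
    """Return a dict of the identifier=value elements in a subpath list."""
    params = {}
    for element in subpath:
        m = param_re.match(element)
        if m:
            params[m.group(1)] = m.group(2).strip()
    return params


# Hand-built stand-in for the subpath of
# .../downloadFile/customer=144876ab6/transaction=76yghj576xz7/myNewFile.gif
example_subpath = ["customer=144876ab6",
                   "transaction=76yghj576xz7",
                   "myNewFile.gif"]

if __name__ == "__main__":
    print(parse_subpath(example_subpath))
```

    The trailing myNewFile.gif has no equals sign, so it is skipped and only the two real parameters end up in the dict.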

    [0] This is an issue that crops up a lot on mobile phone browsers, but wget shows it too.
    [1] This depends on what it is that you append the subpath to; it works for ExternalMethods, but other types of Zope object require the subdirectories to exist.  Which is a pain.