Theft Is Non-Commutative

Apparently, Apple have stolen at least one idea in their new OS.  Stolen is a harsh word, but it’s a perfectly good interpretation of the term “plundered”.  Yet; hang on a moment… I don’t remember any accusations of theft being levelled at open source projects that take ideas from one platform and implement them on another, whether they call it “inspiration” or are as open and honest as the xmms people about their borrowing from Winamp.  So why is it a problem if Apple do it?

It’d be a misrepresentation of the worst kind to state that the open-source community is against intellectual property; hell, I’m an OSS advocate and I make my living from IP.  But there is, I think, a general agreement that patents and other forms of restriction should apply only to certain classes of IP; that patenting look-and-feel, for example, is a Bad Idea.  It’s always seemed to me that redoing someone else’s Cool Application (only doing it better, as one always tries to) is well within the bounds of acceptability.  Unless Apple do it.  Maybe I’m missing something.

God Help Us; We’re In The Hands Of Engineers

Dr Ian Malcolm, in Jurassic Park, of course.  Yet I think the good Doctor (and possibly, by extension, Dr Crichton himself) is confusing the nature of the engineer with that of the control freak.  Two different subsets of people, though, of course, there may be overlap.  What I understand the quote to mean (and this is pretty much informed by watching the film and reading the book[1]) is that it is Engineers who believe they can control the Park, and who fail to appreciate that such control is impossible; hubris, baby, to the max.  I come from a line of engineers (in that my father is also proud to bear that title) and, as you might expect, I know a few others, here and there.  I don’t think any of them believe in absolute control; rather the opposite.

Let’s take a subset of the species Engineer; the Software Engineer, or to use the common name, Programmer.  Let’s take a look at the code that such a person produces.  I’ll lay a small amount of money[2] that you’ll see defensive coding; exceptions caught and errors handled.  Throughout the source there will be an implicit understanding made evident; things can go wrong.  This tends to show up especially where the program interacts with its environment.  Opening a file?  It might not be there.  Writing data to it?  Beware of disk failures or lack of space.  Sending data over the Internet?  Prepare to retry.

But the recognition that errors are a necessary and inevitable part of code doesn’t imply that they can always be handled.  Every program has some level of error at which it will give up and yield to fate, throwing its hands in the air and commending itself to the mercy of the operating system.  What the engineer does is to minimize the risk of that happening, no more than that, by balancing what can occur against the cost of handling it.
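As a minimal sketch of that balance, assuming present-day Python 3: retry what might plausibly recover, and give up once the budget is spent. The names `with_retries` and `flaky_send` are invented for illustration and don't come from any real project.

```python
import time

def with_retries(action, attempts=3, delay=0.0):
    """Call action(); retry on OSError, re-raising after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError:
            if attempt == attempts:
                raise          # out of budget: yield to fate
            time.sleep(delay)  # pause before trying again

calls = []

def flaky_send():
    """Hypothetical stand-in for a network send: fails twice, then works."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("network hiccup")
    return "sent"

result = with_retries(flaky_send)
print(result, "after", len(calls), "calls")
```

The retry budget is where the cost-versus-risk judgement lives; three attempts is an arbitrary choice here, not a recommendation.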

If you’ve ever put up wallpaper, you’ll remember this; after the paper meets the wall, there comes the smoothing, where the bubbles trapped underneath are pushed out to the edges.  There are always some left, forever trapped as the paste dries around them.  All you can do is minimize them to a point where they don’t get in the way of the main purpose of wallpaper; to look nice from a reasonable distance.  Engineers are like interior decorators; we massage the bubbles of risk out to the margins of what is likely.  Luckily we have less need to climb precariously perched ladders.

Personally, despite him being torn apart by the T-Rex, I’d blame the lawyer.

[1]I’m unreasonably proud that my old paperback copy of Jurassic Park was bought in the US from an airport news-stand and has no mention or tie-in with the movie at all; no red/black/yellow logo, no T-Rex, nada.  I have no idea why this should matter to me.
[2]Not my money, necessarily.

The Nagging Doubts

Tuples, asserts Guido, are for heterogeneous data.  The C programmer I used to be would, of course, throw together a struct (almost certainly typedef’d) to be the equivalent.  As that same C programmer I was burned sufficiently often by rogue constants and editing errors to adopt a firm rule of thumb; if I could avoid hard-wiring of constants, I would.  This extended such that wherever possible, my code would be controlled by constants #defined at the top of the source file; change those, and the code changed with them.

Which brings me back to tuples.  Imagine that I construct a tuple thusly, to represent, say, a book on Amazon:

record = (author, title, ISBN, price)

To access it, I can write code like:

print "Author is %s" % record[0]

Yet the C-programmer-within spots that stray [0] and nags me: “rogue hardwired constant!”.  There’s a strong urge to replace it with a constant, something like:

AUTHOR=0
TITLE=1

print "Author is %s, Title is %s" % (record[AUTHOR], record[TITLE])

Yet even that doesn’t satisfy him; he worries that someone will write an assignment to the tuple and inadvertently transpose a couple of elements.  There is, he says, no way to hardwire the relationship between the order in which items are stuffed into a tuple and the indexes used to retrieve them.

To which I say; well, if you’re that worried use an object.  Yet still I find those stray remaining indexes… disturbing.
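One way to quiet the nagging, sketched in Python 3: `collections.namedtuple`, which only arrived in Python 2.6 and so wasn't an option for the code above. It ties each field name to its position in a single declaration. The `Book` type and the sample values below are made up for the example.

```python
from collections import namedtuple

# Field order is declared exactly once, here
Book = namedtuple("Book", ["author", "title", "isbn", "price"])

# Keyword construction means a transposed argument cannot go unnoticed
record = Book(author="A. N. Author", title="Some Title",
              isbn="0-000-00000-0", price=9.99)

print("Author is %s, Title is %s" % (record.author, record.title))

# It is still a real tuple, so index-based code keeps working
print(record[0] == record.author)
```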

Just now I wrote (apropos of some database code):

# Read a record, take the first element, strip off whitespace, lower-case it and split it into words.
value = cr.readone()
if value and value[0]: return value[0].strip().lower().split()

Now that really sets off the deep, C-derived danger instincts.  What, he cries, if anything were to return NULL in that wild and assumptive sequence of function calls?  I think I need to go and make him a cup of tea and settle his nerves; he’s still not entirely used to exceptions.
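For the nervous, here is a rough Python 3 sketch of what the exceptions actually buy. `first_words` is a hypothetical helper, and the rows passed to it are stand-ins for whatever a database cursor might return, not the real database code.

```python
def first_words(value):
    """Return the lower-cased words of value[0], or [] if nothing is usable."""
    try:
        return value[0].strip().lower().split()
    except (TypeError, IndexError, AttributeError):
        # None is not subscriptable (TypeError), an empty row has no [0]
        # (IndexError), and a None field has no .strip() (AttributeError)
        return []

print(first_words(("  Hello World ",)))
print(first_words(None))
print(first_words(()))
print(first_words((None,)))
```

Nothing in the chain can quietly hand back a NULL and carry on; each failure surfaces as a named exception at the point where it happens.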

u"In The Beginning"

I have to blog this; the story of where UTF-8 came from.  Linked from Joel Spolsky’s excellent article on Unicode, which has been in Favorites\Unicode like, forever (well, since late 2003).

On the subject of Python and Unicode; I find that none of the IDEs that don’t cost real money can handle Unicode paste on Windows XP.  Boa Constrictor, IDLE, Pythonwin, PyCrust etc – all fail when submitted to the Москва test (and if that word appears as a set of question marks or boxes, then some feed that doesn’t grok Unicode has mangled this posting).  The test is simple; copy the word Москва from Notepad (which handles UTF-8 files very nicely, thank you) and paste into the Python environment of your choice, in a command such as:

a = 'Москва'

or

a = u'Москва'

Several fail at this point, replacing the Unicode pasted string with question marks.  Those that pass then get subjected to Part 2, in which I grill them mercilessly with:

print a

None have so far succeeded in printing the string as it should be shown.  In the case of Pythonwin I’ve tracked through the source looking for how pasting is handled and become mired in a swamp of win32 integration, locale and pywin32 interactions.

Feel free to try different settings of the default encoding in site.py and if you get it to work, please, post it somewhere!

Let me not be misunderstood here; Python’s Unicode support is excellent.  The mismatch appears to be where the Python rubber meets the win32 road.
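To see the language side of that in isolation, here is a small sketch that takes the clipboard out of the picture by spelling the word with escape sequences. It assumes Python 3, where every string is Unicode; the Python 2 of this post would need the u'' prefix.

```python
# The test word, written as escapes so no non-ASCII character
# has to survive a paste into the editor
a = "\u041c\u043e\u0441\u043a\u0432\u0430"

encoded = a.encode("utf-8")   # each Cyrillic letter takes two bytes in UTF-8
print(len(a), len(encoded))

# Round-tripping through UTF-8 gives back the same string
print(encoded.decode("utf-8") == a)
```

Whether `print(a)` then shows the word correctly still depends on the console or IDE, which is exactly where the trouble described above lives.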

Programming For Klingons

The Spanish for “business” is negocios, which I like.  The French seem to use the rather straight translation le business or the more formal affaires.  Do the Spanish tend to see a business deal as primarily about the negotiations?  I don’t think so – I’m no big fan of the Sapir-Whorf Hypothesis (something I first heard of as the “Whorfian Heresy”, which might give you a clue as to the opinion of the linguist who I was talking to).

I think there’s another Whorfian Hypothesis, that Klingons would have a different approach to programming; in their version of Python, del would become obliterate, remove would become abduct-and-return-for-questioning and there would be no co-operatively multi-tasking systems; everything would compete aggressively for resources.  There’s a whole blog entry in there, about the co-operative environment within computers but the Darwinistic competition outside… though not today.

But enough of such babblings; the Sapir-Whorf Hypothesis does apply, I think, in the area of programming languages.  Most programmers I know tend to think via a particular programming idiom[0]; that is, they tend to address a problem in the ways that their main languages allow.  It goes further than languages too; as a Unix-esque person I’ll tend to think of networks of processes piping stuff to each other.  A Windows developer might tend to see everything in terms of COM objects.  And so it goes.

I’m not arguing that a programmer is blinkered by her choice of language; that would be extreme and I don’t believe it to be true.  What I do think is that different languages afford different ways of doing things (and I mean afford in the Heideggerean sense[1]); they make various possible ways to solve a problem either easier or harder.  For example, any C programmer who’s knocked around a bit will have dealt with a fair number of C strings and the complex and tedious code that often attends them.  Therefore solving a problem in a string-oriented manner might not be the first approach they’d take.  Conversely, someone whose world revolves around Perl (or Python) would consider complex string operations easy.  One of the biggest differences is probably in the nature of the data structures that a language supports.  I find myself constructing far more complex arrangements in Python than I’d ever have considered in C, simply because the language makes it easy and safe to do so; the potential for error is vastly reduced.

As someone whose predominant language is Python these days, I use mapping objects and lists far more than I ever would have in my C days.  I was a good C programmer; no corner of the language was a mystery to me and I could generate or parse obfuscated examples with the best of them.  But far too much of my time was spent in (a) being clever for the sake of it and (b) writing the same code over and over to address the same mismatch between the essential high-level-assembler-ness of C and the abstractions that the project at hand actually required.  One day, deep in a debugger, I realised that pointer errors just weren’t fun anymore; probably the start of my long and all-too-slow growing up from wild hacker to something that might one day be called “software engineer”.

But let’s take one particular language vs. language debate familiar to most Pythonistas; that of C++ vs Python.  For me, the single biggest difference between the two is not their abilities in the abstruse and specialised areas like metaclasses, nor the particular semantics of their class and object mechanisms, but the low-level and almost invisible (to a Python developer) issue of garbage collection and object disposal.  It’s this, I think, that has at least the potential to make Python such a productive system; the removal of the C/C++ need to continually concern oneself with the question of who is to free allocated memory[2]; that whole notion of memory “ownership” that the programmers have to track themselves through comments and documentation[3].  Sure, if I’m going to think carefully about the nature of my data structures I may need to choose to use weak references at times, or use del as appropriate, but even at its worst there’s nothing like the weight that’s imposed on someone thinking in the medium of C++.
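For the occasional case where weak references do come into it, a minimal Python 3 sketch of how light that burden is. The `Node` class is a hypothetical payload, and the immediate clean-up relies on CPython's reference counting; the `gc.collect()` call is there for interpreters that work differently.

```python
import gc
import weakref

class Node:
    """Hypothetical payload object."""
    def __init__(self, name):
        self.name = name

node = Node("leaf")
ref = weakref.ref(node)       # a weak reference does not keep the object alive

alive_before = ref() is node  # True while a strong reference exists
del node                      # drop the only strong reference
gc.collect()                  # belt and braces for non-refcounting interpreters
alive_after = ref() is not None

print(alive_before, alive_after)
```

Nobody had to decide who "owns" the node or remember to free it; dropping the last strong reference was the whole of the bookkeeping.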

[0] That is, they think in terms of what a language can do rather than in the specific details of its syntax.  I’m not aware of any Python distributions that can be cranially installed (yet).
[1] Readiness-to-hand or utility of tools and objects, to be a little more specific.
[2] See the section “Automatic Transmissions Win The Day”.
[3] Before a flood of C++ enthusiasts bombard me with links to articles like this (which argues that explicit memory management is a feature, not a defect) or this (which proposes solutions to the issue, written in more C++), let me say that (a) it’s only a blog entry and you’re free to disagree and that (b) the wealth of results that a Google search for “C++ memory management” returns is an illustration of how big an issue this is. 🙂

You Can’t Change Me

Mutability is interesting.  A while ago, someone pointed me to these discussions of the essential differences between tuples and lists.  Guido’s quote: Tuples are for heterogeneous data, list are for homogeneous data. Tuples are *not* read-only lists.  On the other hand, one can find plenty of Python references such as this which includes the statement “A tuple is an immutable ordered list of objects”.  Ditto this from DiveIntoPython; “A tuple is an immutable list”.  Does “list” in that sentence mean the list type, or the list concept?  Perhaps it depends on your point of view, and how much you revere Guido 🙂

Anyway, a general Van Rossum principle that I tend to like[1] is that where a function returns an object as a result, that object should be immutable.  Obviously, like all sweeping generalisations, this is demonstrably wrong in various circumstances, but it’s a useful rule of thumb when pondering architectures.  Thus I find mutability interesting.

What I’d really like to find is an elegant way to produce a list subclass that could be made immutable easily (and doing tuple(myList) is cheating).  The most interesting approach would be a class that proxied around an object that starts as a list but then converts that to a tuple when set immutable.  Perhaps overriding __getattr__ and catching calls for __methods__ might help… but then there’s the risk of someone being clever and saving a reference to a bound method object.  More on this, I fear, later.
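A first, rough sketch of that proxy idea in Python 3 follows. `FreezableList` and `freeze()` are invented names, and it is a proxy rather than a true list subclass, so it falls short of the stated goal. One wrinkle: special methods such as `__len__` are looked up on the class and never reach `__getattr__`, so the ones needed here are spelled out by hand.

```python
class FreezableList:
    """Hypothetical proxy: wraps a list until freeze() swaps in a tuple."""

    def __init__(self, items=()):
        self._data = list(items)

    def freeze(self):
        # tuple() copies, so the old list is simply abandoned
        self._data = tuple(self._data)

    def __getattr__(self, name):
        # Only reached for names not found normally; guard against recursion
        if name == "_data":
            raise AttributeError(name)
        return getattr(self._data, name)   # e.g. append, index, count

    # Special methods bypass __getattr__, so delegate them explicitly
    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]

    def __setitem__(self, index, value):
        self._data[index] = value          # TypeError once frozen

    def __iter__(self):
        return iter(self._data)


nums = FreezableList([1, 2])
nums.append(3)               # fine while mutable
saved_append = nums.append   # someone being clever
nums.freeze()
saved_append(4)              # mutates the abandoned list, not the frozen tuple
print(list(nums))
```

In this sketch the saved bound method turns out to be harmless, because freezing copies the data and the method stays attached to the abandoned list. A version that froze in place would not get off so lightly.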

Meanwhile, here’s a snippet of my playing around; a class that uses the inspect module to do a clever thing; it will only allow its attributes to be set by code defined in the same module as the class itself:

#File semimutable.py
#Note that this module imports itself
import inspect, semimutable

class AssignmentError(TypeError):
    pass

class SelfMutable(object):
    """An object whose values can only be changed by
    its own methods."""
    def __setattr__(self, attr, value):
        #get code object of caller
        c = inspect.currentframe(1).f_code
        if not inspect.getmodule(c) == semimutable:
            raise AssignmentError, "object is immutable"
        object.__setattr__(self, attr, value)

    def setTest(self, value):
        self.test = value

And here’s a little test script that demonstrates it (you’d need to put this in some other module or run it interactively).

import semimutable

d = semimutable.SelfMutable()
d.setTest(2)
print "test is %d" % d.test
d.test = 1

It gives the output:

test is 2
Traceback (most recent call last):
  File "C:\Python23\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Documents and Settings\ben\My Documents\ben\python\mutabilitytest.py", line 8, in ?
  File "semimutable.py", line 83, in __setattr__
    raise AssignmentError, "object is immutable"
AssignmentError: object is immutable

[1] But for which I have no reference I can find at the moment…

__init__ Considered Harmful

Of course, I don’t really think __init__ is harmful, but why let logic stand in the way of a good title?

On the other hand, there are times when one would like objects that don’t need to be explicitly initialized; the need to remember to invoke all the appropriate __init__ methods for all superclasses in an appropriate order is a source of error, in that I forget, from time to time.  Hey, I’m only human.  Python’s dynamic nature means that we can write objects that only set up their values if any of their methods are actually called.  Useful for mixin classes that provide some functionality that may or may not be used.

Here’s such a class:

class Snapshottable(object):
    """When mixed into an object, provides a way to snapshot the value of some
    or all attributes into a log.  The log is only created if at least one snapshot is taken."""

    def snapshot(self, attributes=None, note=None):
        """Record the str() values of all attributes in the
        log (or the contents of the object's __dict__ if
        attributes is None).  If note is passed, prepend to
        the log entry."""
        if not attributes:
            attributes = [x for x in self.__dict__.keys() if not x.startswith('__')]

        if note:
            logentry=note+'\n'
        else:
            logentry=''
        for a in attributes:
            if hasattr(self, a):
                logentry += "%s=%s\n" % (a,getattr(self,a))
        try:
            self.log.append(logentry)
        except AttributeError:
            self.log = [logentry]

    def getlog(self):
        try:
            return self.log
        except AttributeError:
            self.log = []
            return self.log

    def clearlog(self):
        self.log = []

There is no __init__.  A call to snapshot(), getlog() or clearlog() will create the log as needed.  Rather than test for the presence of the log attribute on entry to any method (which is less efficient), we let access to it throw an exception.  Nothing profound, but it’s an interesting idea.

Self-initialising objects can create only as much state as they need to, on demand; thus more complex examples may need to deal with being partially initialised.  There comes a point (and I’d say it’s earlier rather than later) when it becomes more straightforward to have a proper initialisation so that one can be more sure of always having a consistent state.
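To show the mixin in use, here is a trimmed Python 3 rendering with a hypothetical host class, `Account`, which never calls any mixin initialiser. This version keeps only snapshot() and getlog() and requires the attribute list to be passed explicitly; those simplifications are mine.

```python
class Snapshottable:
    """Trimmed Python 3 rendering of the mixin (snapshot and getlog only)."""

    def snapshot(self, attributes, note=None):
        logentry = note + "\n" if note else ""
        for a in attributes:
            if hasattr(self, a):
                logentry += "%s=%s\n" % (a, getattr(self, a))
        try:
            self.log.append(logentry)
        except AttributeError:       # first snapshot: no log exists yet
            self.log = [logentry]

    def getlog(self):
        try:
            return self.log
        except AttributeError:
            self.log = []
            return self.log


class Account(Snapshottable):
    """Hypothetical host class; note it never initialises the mixin."""
    def __init__(self, balance):
        self.balance = balance


acct = Account(10)
had_log_before = "log" in vars(acct)   # no log until it is first needed
acct.snapshot(["balance"], note="opening")
acct.balance = 25
acct.snapshot(["balance"])
print(had_log_before, acct.getlog())
```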

The Swiss Army Application

Many blog entries have pointed me at Ten Must-Have Tools Every Developer Should Download Now, so I eventually got around to clicking.  Interesting, but possibly unsurprising, that the MSDN definition of “developer” is “C# .NET developer”[1], as though there were only the one sort of programmer.  However… moving along… I see that it includes:

  • “Snippet Compiler to compile small bits of code”.  What, you mean like the interactive window in PythonWin?  Or the compile() builtin?  Or the Python interpreter in interactive mode?  Or python -c, perhaps?
  • “Regulator to build regular expressions”.  I guess an interactive window and re.compile() would do much the same thing; certainly that’s how I tend to interactively build up regexps, running them over a list of candidate strings to check for matches.
  • “.NET Reflector to examine assemblies”.  This looks quite useful, though I would score Python’s built-in introspection capabilities very highly.  Yesterday I was trying to discover the capabilities of a COM object (that drove a camera) and I longed for the simple power of dir() and __doc__.  And because one tends to get the source of any Python modules one imports, there’s always that ability to go directly there for the definitive answers.
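The regexp and introspection habits from the list above look roughly like this at a Python 3 prompt. The date pattern and the candidate strings are made-up examples.

```python
import re

# Hypothetical pattern under construction: ISO-style dates such as 2004-12-31
pattern = re.compile(r"^\d{4}-\d{2}-\d{2}$")

# Run it over a list of candidate strings and see which ones match
candidates = ["2004-12-31", "31/12/2004", "2004-1-1", "1999-01-01"]
matches = [s for s in candidates if pattern.match(s)]
print(matches)

# Introspection: what can a compiled pattern do, and what does one method say?
public_names = [n for n in dir(pattern) if not n.startswith("_")]
print(public_names)
print(pattern.sub.__doc__)
```

Tweak the pattern, re-run the list comprehension, repeat; that loop is most of what a dedicated regexp builder provides.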

Some of the other tools look interesting, though scarcely packed with amazing novelty.  It’s not a bad checklist of the sort of tools that a developer would find of use, rather than the sole definitive “must-have” collection.  Makes me ponder about knocking up a Python list; I’m sure there’s a few equivalents of NUnit out there already.

[1] I’m being unfair for effect… they include Visual Basic .NET as well.

A Human In The Loop

The Python twain module has a shortcoming.  It’s summed up by this quote in the documentation; “The twain module can only be used in an application with a message loop”.  This is (as far as my limited research can find out) inherent in the design of the whole TWAIN system on Windows; as image acquisition progresses, the calling application is notified of progress via a series of messages.

On Windows, that means you need a window, since messages are sent to windows.  Which means Windows has to be logged on, more or less.  Now, maybe I’m just a tired old Unix hound who’s spent too much of his life writing daemon code that executes in the dark bowels of the machine where no human could be or would want to be, but I rather like systems that can automate stuff without assuming there’s someone sat in front of the damn box.  What if there’s been a powerfail and the system isn’t logged in?  Should all Windows servers require a human whose job is to scurry around them and check for rogue dialogue boxes?

No, of course not.  Yet it’s tedious and annoying that I can’t write my little Python script to make my camera take a picture every hour without messing in wxWindows.  Hey ho.

Oh, and if you don’t read the excellent Rupert Goodwins’ Diary on a Friday, you should.

Get Your Kicks On Nokia’s Sixty-Six (Hundred)

Naturellement, all Pythonistas will have seen the stuff about Python, Nokia and Series 60.  Oddly enough, I’m heavily involved in the Mobile Phone Project which has a strong Series 60 flavour… and I’m working with a bunch of people who are almost certainly in the Forum Nokia PRO or better.  Only I have a Sony Ericsson phone, a P800.  So I didn’t bother too much.

Until this afternoon when my network provider called me to offer me a free upgrade… to a 6600.  Hmmm.  Gimme the NDA, Nokia, and I’ll sign…