I Counted Them All Back In…

A while ago I mused on programming languages (as programmers are wont to do), and concluded that the best thing about languages like Python (and, I now admit, Java) is the freedom from all the hideous memory management that Some Other Languages have to do.

Raymond Chen’s blog has an interesting example of querying information from an Explorer window.  Interesting not for the way in which it introspects down through the multiple-interface objectery[1] of Windows to get at the data, but because it exemplifies, for me, the reasons I dislike C++, and Windows C++, so much.

Take a look (should you feel like it) at his final example (at the end of that page).  He excuses the ugliness by saying “because I put everything into one huge function instead of breaking them out into subfunctions”, and indeed, this is partly an excuse.  But look at the actual code; that ever-deepening cascase of nested ifs matched by the sudden flurry of CoTaskMemFree and Release calls.  I have to wonder how much simpler this would be in a dynamic language, with grown-up memory management.  But, as Raymond often points out, that’s legacy for ya.

The syntax of C++ also grates, after a good couple of years now with the simpler, more streamlined Python; stuff like if (SUCCEEDED(pwba->QueryInterface(IID_IServiceProvider, (void**)&psp))) just… looks ugly (though I can still read it perfectly well), but I think it’s the sheer effort required to do anything via COM and the Windows API that I remember – the slow trudge through looking up IIDs, playing with QueryInterface and trying to work out who owned what.  It was a lot of work for what was often very little effect and I don’t miss it at all.

But then I’m a curmudgeonly old Hector at the best of times…

[1] I don’t know if this is a real word, but it should be, shouldn’t it?  Like an orangery is to oranges.

Do What I Mean

Quasi (0.8) variable substitution has, as the Americans would say, “gotten smarter”.  What an odd language they do speak.

Anyway, for a good while, there have been two forms of variable substitution in Quasi.  Given a variable called a, you can write either $a or $(a).  Example:

>>>a = "*.py"
>>>os ls $a
['bag.py', 'bomreader.py', 'fix.py', 'floofinstall.py', 'grabbit.py', 'listproxy.py', 'mutabilitytest.py'...

or

>>>os ls $(a)
['bag.py', 'bomreader.py', 'fix.py', 'floofinstall.py', 'grabbit.py', 'listproxy.py', 'mutabilitytest.py'...

The original idea was that the short form (no parentheses) would be for simple identifiers, while the longer form allowed the parser to recognise Python expressions and the like.  That’s still true, so if you wanted to do something clever with a list of filenames, you might write:

os ls $(map(lambda x: x.split('.')[0]+'.pyc', files))

…and that’d all work.  In the context-centric Quasi way of looking at the world, the $() construction escapes out of the os context back into Python.

However, the short form now has added intelligence (“New! Improved! With added context-awareness!”).  Since one can always write $(a) instead of $a, there’s no real downside to this; it addeth, but it taketh not away.

In general, I’m not a fan of languages that try to do what they think you want (*cough* Perl *cough), as opposed to requiring you to state it explicitly.  Explicit is better than implicit, in fact.  But context-sensitive substitution is plain Interesting, and has a certain “cool!” factor to it, as much as a mundane subject like coding ever can.

Consider this; I have the result of a SQL query in variable res.  In Quasi fashion, it’s a self-describing object, an instance of quasi.Bag() which records the column names (as keys), the values (as, er, values) and allows dict or object semantics to access them.  Let’s say that it has the effective value:

{ "ColumnA" : 1, "ColumnB" : 2, "ColumnC", "Eric" }

Now, here are some SQL statements with clever variable substitutions and their results.

Ben@KANDINSKY2 ~/python
$ ./quasi.py
Welcome to Quasi, the multiple-context Python Shell.
    (c) 2004 Ben Last, and released under a BSD license.  Enjoy.

Version 0.8

>>>res = quasi.Bag()
>>>res["ColumnA"]=1
>>>res["ColumnB"]=2
>>>res["ColumnC"]="Eric"
>>>
>>>#Check the values
>>>res
(1, 2, 'Eric')
>>>res.keys()
['ColumnA', 'ColumnB', 'ColumnC']
>>>
>>>#use res as a set of column names
>>>explain `sql select $res from MyTable`
"select 'ColumnA','ColumnB','ColumnC' from MyTable"
>>>
>>>#use res as a set of values
>>>explain `sql insert into MyTable set $res`
"insert into MyTable set ColumnA=1,ColumnB=2,ColumnC='Eric'"
>>>
>>>#use res as both
>>>explain `sql update MyTable ($res) values ($res)`
"update MyTable ('ColumnA','ColumnB','ColumnC') values (1,2,'Eric')"
>>>

I think that’s rather excellent.  SQL is just regular enough that, by looking backwards along the command from the point where the variable substitution occurs, the parser can figure out how to insert the variable.  It also works (in a different way) for the os and shell contexts – strings get quoted and lists get comma-separated on Windows.  The explain command (as in the examples above) also lets you see what a substitution will do, without having to actually do it.

The Secret Is To Bang The Rocks Together, Guys

Ponderings on the relationship between programming, systems engineering and systems administration.

‘Nuff respect, as the Young People say, to the late Douglas Adams for the title line.  There is a link of sorts to the topic; I got an email from my talented and beloved brother a few days ago, pointing me at reports on the Snowdon Race.  The company he works for did a wireless link from the summit of the mountain to get the results out; tres impressive.  He’s sort of also in the IT thing, but the big difference between us (other than the fact he came to it later in life and I’ve always geeked out) is this; he doesn’t program.
Last time I was over at his house he was showing me all sorts of monitoring he’s doing on the huge point-to-point wireless WAN that they run over much of North Wales.  All Windows-based stuff, pulling useful information from traffic monitoring.  And all monolithic big programs.  A Unix system might do the same, but chances are that a lot of the same data could easily be gathered using built-in tools plus judicious amounts of scripting.  But he doesn’t program, so he can’t do that.
I find it a difficult thing to imagine – how one can install, maintain and operate complex mixes of kit from multiple vendors without the glue that binds them together – the ability to knock up the sort of tiny script that texts you when the disk space is low, or creatively purges older files, or scans logs for attack attempts.  I suppose one way is to pay; someone out there, especially in the Windows World, will always sell you an application that does the job for you.  But with that, you lose flexibility – the application will do what the designer intended it to do and nothing more.
I’m responsible for around five servers and (depending on how you count) seven desktops or laptops.  These live on a variety of networks, some wired, some wireless, connected together via VPNs over ADSL connections.  The whole thing would have cost a lot more and take a damn sight more work to administer if it weren’t for scripts (and, since I am who I am, if it weren’t for Python).
Consider this here laptop on which I’m typing.  Every couple of hours it does a backup.  The script wakes, checks IP addresses to work out where it is (all the networks use different ranges in the 10.x.x.x reserved space), connects to the appropriate server and kicks off a low-priority trawl through selected directories, copying modified files over to the backup area.  Where another script runs nightly, squirrelling away older material into tar.bz2 files and synchronizing them between work and home.  Nobody could ever sell me an application that’d do the job in exactly the right way, yet it only took a morning’s work to set up.
Programming, I argue, is the equivalent of banging the rocks together.  It’s the difference between using the machines that you watch over and controlling them.  Without the ability to program, to speak their language, you’re condemned to use the tools you were given.  Without the ability to bang the rocks together to make better flint axes, all you can do is pick up what’s lying around.  That’s no way to evolve.

Backslash And Burn

Grr.  I wrote a lovely long blog, all about working on trains and software development in the shower.  Then Semagic went and lost it.  A pox, I say, upon blogging software that doesn’t autosave.

Ah well.  There’s a new version of Quasi out, for those (nearly a hundred!) who have downloaded it so far.  This one has, at last, proper cd, ls, pwd and dir commands; including proper handling of backslashes so the Windows-ites can type

cd c:\temp

or even

cd d:\

and have their backslashes treated with the respect they deserve.  Good enough for an interim release.  Next, a decent web page for it, and maybe even an RSS feed for updates.  Hoorah and celebrations.  I’m off for a beer.

Autopoesis, Databases And Reference

On the nature of structural coupling between companies engaged in a development project.  And before you ask, this is what it means.  Unless you knew already.
The PlayStation Project proceeds apace, along two parallel lines.  One is the visionary path – the place where people sit around table and discuss the grand Shared Idea, the form (if you will) of the Thing we are creating.  All essential and necessary, no doubt, but the second path is of more interest to me; the Way of Function; what it is that the Thing will do.
Much of that is associated with data; many thousands of records of manually typed, carefully checked, thoughfully constructed data that lives in a MySQL database, surrounded by an attendant court of Python scripts that validate, transform and generally munge it, as well as overcoming some of the limitations of the current MySQL release[1], like replacing triggers with continually running background daemons.
Our development partner, the people who are writing the bulk of the actual code that will run on PS/2s in living rooms and bedrooms worldwide, need access to this data.  So, of course, does the actual game code itself, to do what it has to do; it needs a runtime version of it, a translation from the context in which the data lives now to some other, subtly different environment and set of usages.  And so it begins, with me sending a snapshot of representative rows and tables, the output of a mysqldump.
However often I come across it, the number and depth of unspoken assumptions that lie in any code, whether it be a program or a database definition, still amaze me.  Just as the overall architecture of a system, the UML-level conceptions of objects and relationships, isn’t visible from the source-code (and therein lies the subject of a rant about certain weaknesses in open source, but that’s for another day), the context in which a database operates is invisible from the dry recitation of column names and attributes that define it.  Passing a database to someone else involves whiteboards and a good session with a set of coloured pens.  Why isn’t there anything about that in the methodologies?[2]
A table is not just a table.  Nor is it just a set of records that record attributes of some abstract entity that maps to others in a regular and defined manner – one-to-many, one-to-one and so forth.  No, a record is the capturing of a related set of data about something in the world, that messy, human-dependent network of relationships and fuzzy concepts.  Without being embedded, either for real or by proxy, though conversation, in that world, misunderstandings about what the data represents inevitably arise.
But our partner is based Down South, on the coast below London where old, rich people go to retire.  Or something like that; we’re nominally located in the Midlands and I spend most of my time even further North.  We don’t have a whiteboard that big, so we fall back on email and, while writing a long and careful set of comments on someone else’s SQL, I remembered the Zope reference pages.  In the user comments, there’s this little exchange:

Anonymous User – June 26, 2002 1:43 pm:
The reference describes nearly nothing when you don’t know the solution!
Examples are absolutely necessary.

Anonymous User – July 14, 2002 5:21 am:
Strongly disagree.  This is a reference, not a tutorial or explanatory text.

Anonymous User – Aug. 20, 2002 11:13 pm:
A reference doesn’t contain explanatory text?

Anonymous User – Nov. 12, 2003 10:52 am:
Look at http://www.php.net : it’s both a precise API reference for PHP developers, and a tutorial full of
interesting samples.
Apart from the Zope Book (which efficiently sums up the ZMI) I have rarely seen such a poor pedagogical and
practical approach as among the zope documentation.

The deficiencies of the Zope project’s approach to educating those who seek to learn it are well documented elsewhere, and I don’t propose to go over it again, only to point out that I’m in agreement with the first, third and last comments.  Firstly, you cannot sufficiently explain something without reference to how it’s supposed to fit into the world.  And secondly; examples reveal assumptions and contexts – the way in which the designers thought about how a system should operate.  A reference document, or a SQL definition, may be exact, precise and to-the-point.  Yet without hearing from the humans who wrote them, one can understand only the bare minimum from them.

[1]No, please don’t email me links to PostGreSQL.  Everything in life has pros and cons.
[2]That’s unfair, of course 🙂

Quasi Mojo

Two things… first, someone was kind enough to point out that starting blog entries with irrelevant rubbish means that readers miss what the posting’s actually about.  Good point.  So, yesterday’s entry was about a new Python shell called Quasi, and you can read it if you’re interested in that sort of thing.
Second thing; Quasi is now at version 0.2, with a bunch of additions (mostly in the SQL stuff) and some bugfixes very kindly contributed by Mark (eichin), who’s tested it a little on MacOS X; many thanks, Mark.  Changelog added (oops!) 🙂

The Shells, The Shells, Esmerelda!

You know, given his thpeeth impedimenth, you’d have thought Quasimodo would have found a girlfriend with an easier name, maybe a Betty or an Anna (probably not a Thuthan).  Anyway, that has nothing to do with the subject of today’s content-free babblings, which is the crossover point between shells and environments.

Jeremy Zawodny writes on using the MySQL client like a shell, according to this Forever Geek post.  Now, having rattled on about using interactive Python as an environment for working on MySQL a while back, I thought this could be interesting, until I found that it was just a short blog on aliasing the “cd” and “ls” commands to “use” and “show tables”.  Sort of useful, I guess.

But it got me thinking about shells.  Rather than attempt to gradually retrofit mysql with the basic commands that a shell has, why not take a programmable environment like, oh, I dunno, let me take something at random, say, Python and extend that to hook better into MySQL, or whatever other environment you happen to be in.

There are a few Python shells around, such as IPython and PyCrust or the extremely bashalike PySH (which I like to pronounce to rhyme with fish, just because I can).  I took a look at each one, and found that the whole subject is packed with Interesting Questions.

First off, what’s a Python Shell?  To some people, it’s an interactive Python environment.  To others, it’s something more like bash or cmd, an environment for interacting with an operating system, but which has a degree of programmability (in Python, naturally).  My definition would be the latter – I like command-line shells.  To me, bash is Unix, in the sense that the underlying OS is rather less important than the pipe/small commands/vi nature of the environment, which is why I like cygwin.

Second in the list of Interesting Things To Ponder, it seems to me, is that of the context within which commands are executed.  For instance, all OS shells that I have ever used have a notion of a current working directory, whether this is set explicitly by cd or as some sort of default value for command expansion, as in VMS’s set def.  The mysql client has a notion of the current database which is arguably similar; also it has a connection to a given database server, specified at startup.  These are its context, thus to integrate MySQL into a Python shell, we need to extend the “current context” to include both the server and the current database.

But context-dependence is a double-edged sword.  As well as the working context within which Stuff Happens, there’s the deep and complex question of how a given line of input is interpreted, when there are multiple contexts in which the various elements of it can be assigned semantics.

But that’s putting it too formally.  What I did next was to write some commands that might work in my ideal shell.  First off, I want the ability to do something like:

for x in `ls *.py`: cp x x.replace('.py',.py.sav')

For now, ignore the mix of bash and Python; I want to be able to use ls to get me a list of names and use that as the sequence over which to iterate a for loop.  Then I want to be able to apply Python string operations to the variables to do clever stuff with names, and I want the operating system commands, like cp to be invoked for me.

Here’s another:

for a in "Table1 Table2".split():
  rows = select * from a
  print "%s has %d rows" % (a,len(rows))

After all, the original article that sparked this off was about extending MySQL, so why shouldn’t I be able to mix in SQL commands as well as OS stuff?

One way to do this is to try adding new keywords to Python, but that way madness lies; also that way involves C programming, and I do this sort of stuff for fun, so why drop into another language?  No, there had to be a way to do this in pure Python.

Have you ever had a problem work like a catalyst in your head?  You know the way that grit acts on an oyster, provoking the formation of a pearl, or a seed crystal dropped into a solution can kick off the rapid growth of a bigger structure?  This was like that – over two nights I lost sleep whilst my brain tickled away at the problem.  I dreamed in code, something that hasn’t happened in a long time.  Finally, after two days of nibbling away at the issue without going near a keyboard, I thought I had a breakthrough.  Take a look at the second example above.  How does the interpreter know that the “select” is a SQL expression?  That the “*” is to be passed to the SQL interpreter, where a Unix-esque shell would glob it in place.  I decided that the solution had two parts.  The first is disambiguation: the context in which words are to be interpreted has to be more explicit.  The second is that the shell should be a preprocessor – it would rewrite the input lines so that the Python interpreter always executes pure Python.

The next question – what was an appropriate syntax to use?  I chose to use the backtick, “`”, for these reason:

  • It’s used in Unix shells for a roughly similar purpose – to bracket a command that’s to be executed in a subshell and its output captured, rather like that first example command above.
  • It’s used in Python only as syntactic sugar – one can replace `a` with str(a), so a Python in which the meaning of “ has been overridden loses nothing.

Thus the shell can take any sequence of characters within backticks and rewrite it to be legal Python that will have the desired effect.
Let’s go back to that example:

for x in `ls *.py`

It’s more-or-less equivalent to:

for x in os.popen("ls *.py")[1].readlines():

Python that has the desired effect.

However, I also want to be able to mix in SQL commands, so the shell has to be able to support multiple different contexts in which stuff can be understood.  I also want to be able to add new contexts as I feel like it; choosing a new alternative to backticks for each one would get difficult very fast, so I chose to stay with backticks and disambiguate some other way.

Finally, I chose a name for the shell: Quasi.  This is because:

  • It let me make terrible jokes; when asked how I thought of it, I could reply ‘I had a hunch…’
  • I liked this title of this entry too much not to make the link
  • Nobody will ever be sure how to pronounce it – does the A rhyme with “day” and the I with “eye”, or does the whole rhyme with “Ozzy”?

And after a weekend during which I wrote altogether too much code to be healthy, version 0.1 exists.  You can download the source, if you really want to, here..

So what can it do?  Here are some examples of commands that really work:

>>>#Get a list of all Python files in the current directory and copy them to backup versions.
>>>for x in `os ls *.py`: `cp $x $(x.replace('.py','.py.sav'))`
>>>#connect to a MySQL server.  The keyword "sql" at the start of the line
>>>#is syntactic sugar that lets us leave off the `` ticks.
>>>sql connect host=myHost, user=MyUser, passwd=MyPassword, db=myDb
>>>#do a query and capture the results
>>>table = 'Sample'
>>>rows = `sql select * from $table`
>>>rows
(('Some', 'All'), ('Here', 'There'), ('A', 'B'))

There are three contexts:

  • The SQL context, marked by the keyword “sql” at the start of the line or after the opening backtick (or the character “&”, so `& select * from Sample` is legal).
  • The OS context, marked by the keyword “os”.  OS commands return their output (captured from popen) as a list of strings.
  • The shell context, marked by the keyword “shell” or “!”.  Shell commands return their status, but not their output, so you can do `shell vi $x` to edit the file whose name is in x and it’ll work.

There’s an extra trick for the OS context – if you put a variable at the start, before a “|” (pipe), then that variable will be printed to the stdin of the process.  It handles lists appropriately.
Variable substitution is done with the syntax $ (where the end of the name is recognized sensibly) or $(), where everything between the parentheses is evaluated by Python.  So the following are all legal:

$a
$(a)
$(a.lower())
$(a.strip().lower())
$(getattr(a,"test",None))
$(os.system(a.strip()+".exe")[1].readlines()))

Those of a curious disposition may like to type “trace 3” as their first command – you can see the Python that your command expands to.

Next entry – how the damn thing works, and musings on the interesting nature of shells that have built-in interpreters… code that executes in two different namespaces.

Or not.

We Built This City On Hacks And Code

Paul Ford, whose writing should be required reading for anyone even considering the possibility of joining the blogosphere, writes on outsourcing, saying:

…mostly I’m struck by the irony that the tools, networks, and protocols built over the last 40 years by programmers are the exact mechanism that allows these jobs to move overseas. It’s fast Internet connections and carefully designed software that makes it possible for a programmer in Bangalore to work for a company in Palo Alto. The programmers are being eaten by the creature they made themselves.

It’s a neat, ourosbouros-esque, hoist-by-their-own-petard conclusion.  But it’s not true.

What makes offshoring possible (and I’m assuming here that it’s this, the cross-border-to-cheap-labour way of losing your job, to which Paul’s really referring) isn’t programming, except in the general sense that programming is important as an element in most modern technology.  It’s communications; the international direct dial phone system.  It’s convenient, cheap air travel.  It’s the international system of law that allows contracts to operate between organisations in different countries.  It’s the mobility of capital.  And, in amongst these, it’s the success of the ideas that are the Internet protocols.

If TCP/IP had not taken off as it did, and the huge unwieldy bulk of the ISO layered model had continued to balkanize the Connected Globe[1], perhaps it wouldn’t be this way.  Instead, we’d have a world of Compuserves and Prodigies, where MSN and AOL fought battles over exclusivity of content.  There would be no ISPs, merely relationships with one or other subsidiary of this or that multinational entertainment conglomerate.  But the idea of a net based on standards that everyone followed persisted, not (I submit) through any innate virtue in the engineers or device designers who built it, but from a sort of momentum – the open standards were the grit around which the oyster formed the pearl.  The bigger the pearl, the more obvious the benefits of joining.

But this is going off topic, and not in a good way.  The charmingly closured conclusion that programmers are the architects of their own demise is my target.  Let’s consider it from two angles.

First, the personal.  I know many programmers, as one might expect.  I’m not aware that any of those who are facing aspects of their livelihood going overseas have actively been part of that by writing code.  Perhaps their jobs have changed to include managerial duties that make it easier for someone else to do the coding, but that’s not the same.  In most cases, it’s the guys I know working on the big commercial systems who have suffered, and their code has nothing whatsoever to do with networks; it’s all accounts, balances and reports.  I suppose one could, with sufficient diligence, find someone working in one of the big phone equipment providers, or maybe Cisco, who’s lost out to a programmer in Pakistan or a hacker in Hyderabad, but that doesn’t make a general case.

Second, the general.  Programmers are not a community, they’re a set, with more or less fuzzy boundaries depending on how one defines a “programmer”.  There is no accepted set of agreed targets and goals to which a community might aspire; in fact, ten minutes spent trawling through the areas of USENET wherein programmers debate shows that disagreement happens as often between programmers as between politicians, and sometimes with more venom.  The image of a mass of swarming humanity building a great Cathedral to a Master Plan is wrong.

As a group, programmers write Code to make Systems do Stuff.  Engineers are another set (with a degree of overlap) who build Things that Do Stuff, possibly the same Stuff.  We can proceed to do the same for any of the groups whose contributions to the Way The World Is have made offshoring (a) possible and (b) commercially attractive.  In general, one could argue for hours over the exact details of whose contributions are more relevant than whose, but to little overall effect.  Like the Industrial Revolution ushered in the Age of the Factories and led to the general demise of the village and the cottage industry, the Network Revolution has led to a world in which many things are possible, but was itself dependent on other Revolutions.  Only a few of those are technical.

[1] I was writing Ethernet device software Back Then, and have memories of tediously reverse-engineering ICL printer protocols to try and deduce how the damn things worked.  Proprietary sucked then as now.

Book, Cover, Judge

Shrek 2, bless its little green ears, is as multilayered as an ogre[1], with the “moral” strand being in some way related to the value of appearances as opposed to the “inner nature” of whatever it is you’re currently judging.  Very noble.  On the other hand, I think humans are wired to reach initial conclusions based on how something looks.  As the blue food phenomenon shows, we have some strong inbuilt prejudices for and against certain aesthetic qualities of objects.

To which rule programming languages may not be an exception.  For instance, every so often I play with a SmallTalk environment, something like Squeak for example.  I can get over the way the thing looks, the way the workspace is laid out, the way the nonstandard controls do what they do.  But when I look at SmallTalk code – somehow I know that deep down, it and I are not destined to be more than acquaintances.

Something of the same feeling hangs, for me, around C++.  My first deep and meaningful programming language relationship, if you don’t count the adolescent flirtations with assembler and BBC Basic, was with C, and C syntax still works for me on some basic level of my soul.  But I never could get on with C++, not in the same way; I never managed to get into the zone.  Where does the ampersand go again?  How do you write that odd template thing?  It’s all a bit too arbitrary.

In recent years I’ve done my share of Perling (Perl Diving?) but found, in the end, that the sheer overloaded-punctuation-frenzy of the code began to get to me.  I used to read the Obfuscated C entries for fun, so scary mixtures of characters from the top row of the keyboard don’t frighten me, but Perl seemed to be following a path to a place where the meaning of any single character from the ASCII set became heavily context-dependent; a sort of celebration of semantic overloading.  I’ll still read most anything Larry Wall writes for the sheer pleasure of his wit, but the language that needs two characters to denote object attribute access is no longer my first choice for anything.  Brevity may be the soul of wit, but Perl is too brief – positively hysterical, my dears.

Which brings me, naturellement, to Python.  Being as cynical as anyone (and more so than most), the idea of a language that used whitespace for actual meaningful source semantics seemed… wrong to me, in the same sort of Lovecraftian way that the geometry of Ryleh did to the sailors.  But when I dug deeper, I found that the simplicity and regularity of Python spoke to me and called me on.  Perhaps it’s the call of a Siren, but the neat way in which list comprehensions reused the syntax of lists, loops and if told me that this was a language that worked in the same way I did.  An absence of baroque encumbrances.  Long may it continue.

[1]Or a cake.  If you don’t get that, where were you when the first Shrek came out?

Open-heart Surgery For Fun And Profit

A distraught instant-message comes in from the office, wherein a team of dedicated researchers are populating a vast database to do with The Playstation Project.  He’s inadvertently edited a whole bunch of entries in one table… edited the keys that relate that table to another.  Hence, suddenly, half the entries don’t relate.  How to reconstruct?  This on a live database that’s being edited the whole time.

I could fire up the mysql client and start building huge great selects and subselects to re-insert all the records from A that do not have existing entries in B… you know the kind of thing.  I used to know a guy who did everything in Windows 2000 from an interactive SQL prompt – he lived in the database, so to speak.  Or I could do it in Python, since all the scripts that process the data are Python.

I chose to use SPE, which I’ve been trying out as a sort of alternative to PythonWin.  It’s… nice but flakey, rather like an apple turnover.  When I first installed it I couldn’t save modified files; looks like that was because the version of wxWindows was a bit old.  After fixing that, the only real problem has been that when it executes long-running code there’s no output, so you can write a big for loop that takes a couple of minutes to run, including nice progress-reporting prints and you get the output all in one chunk at the end.  Sub-optimal.  But this isn’t an SPE review.

Python in interactive mode is rather nice for this sort of thing.  You import MySQLdb and proceed, using idioms like:

conn = MySQLdb.connect(host='myhost', user='myuser', passwd='mypasswd', compress=1, db='theDatabase')
cr = conn.cursor()

For basic playing around you can do everything with the one cursor.

#get a list of the keys that have been screwed up.  When building
#SQL strings, get into the habit of using double-quotes, since SQL
#usually likes single ones.
cr.execute("select distinct Key from A where Key like '0%%';")
#fetchall returns a tuple of values, so to get the actual keys, map it.
#"suc" is for Screwed Up Keys...
suc = map(lambda x:x[0], cr.fetchall())

#get a list of the keys that exist in B cr.execute("select distinct Key from B;") existing = map(lambda x:x[0], cr.fetchall())

#Find all the keys in A that don't exist in B fixers = [x for x in suc if not x in existing]

And so on.  Of course, much of this could be done in large and clever SQL constructs, but that wouldn’t give me the ability to check, at each stage, what results I have and what effect they will have.  I also get the ability to do more clever iterative processing, look things up on the fly and generally think about the way in which I solve the problem rather than the exact syntax that one needs to use to express a left join.  And when I’m done, the session tab in SPE gives me a list of all the statements I executed, so I can snarf the clever stuff into a script if I want to.

Things to remember – use MySQLdb.escape_string when building SQL queries (you never know when that rogue apostrophe will strike), use the description tuple of field names attached to a cursor after a select is executed – very handy when you don’t know the order of fields (element 0 of each entry of the tuple is the field name).  In fact, that warrants an example:

cr.execute("select * from SomeTable;")
#Fetch a row
r = cr.fetchone()
data = {} #create a dict
#iterate through the columns
for i in range(len(cr.description)):
 f = cr.description[i][0] #field (column) name
 data[f] = r[i] #set the value in the dict, indexed by the column name

Nothing fancy or groundbreaking, but it’s nice to see a real, practical value for the interpreted nature of Python.