Open-heart Surgery For Fun And Profit

A distraught instant-message comes in from the office, wherein a team of dedicated researchers are populating a vast database to do with The Playstation Project.  He’s inadvertently edited a whole bunch of entries in one table… edited the keys that relate that table to another.  Hence, suddenly, half the entries don’t relate.  How to reconstruct?  This on a live database that’s being edited the whole time.

I could fire up the mysql client and start building huge great selects and subselects to re-insert all the records from A that do not have existing entries in B… you know the kind of thing.  I used to know a guy who did everything in Windows 2000 from an interactive SQL prompt – he lived in the database, so to speak.  Or I could do it in Python, since all the scripts that process the data are Python.

I chose to use SPE, which I’ve been trying out as a sort of alternative to PythonWin.  It’s… nice but flakey, rather like an apple turnover.  When I first installed it I couldn’t save modified files; looks like that was because the version of wxWindows was a bit old.  After fixing that, the only real problem has been that when it executes long-running code there’s no output, so you can write a big for loop that takes a couple of minutes to run, including nice progress-reporting prints and you get the output all in one chunk at the end.  Sub-optimal.  But this isn’t an SPE review.

Python in interactive mode is rather nice for this sort of thing.  You import MySQLdb and proceed, using idioms like:

conn = MySQLdb.connect(host='myhost', user='myuser', passwd='mypasswd', compress=1, db='theDatabase')
cr = conn.cursor()

For basic playing around you can do everything with the one cursor.

#get a list of the keys that have been screwed up.  When building
#SQL strings, get into the habit of using double-quotes, since SQL
#usually likes single ones.
cr.execute("select distinct Key from A where Key like '0%%';")
#fetchall returns a tuple of values, so to get the actual keys, map it.
#"suc" is for Screwed Up Keys...
suc = map(lambda x:x[0], cr.fetchall())

#get a list of the keys that exist in B cr.execute("select distinct Key from B;") existing = map(lambda x:x[0], cr.fetchall())

#Find all the keys in A that don't exist in B fixers = [x for x in suc if not x in existing]

And so on.  Of course, much of this could be done in large and clever SQL constructs, but that wouldn’t give me the ability to check, at each stage, what results I have and what effect they will have.  I also get the ability to do more clever iterative processing, look things up on the fly and generally think about the way in which I solve the problem rather than the exact syntax that one needs to use to express a left join.  And when I’m done, the session tab in SPE gives me a list of all the statements I executed, so I can snarf the clever stuff into a script if I want to.

Things to remember – use MySQLdb.escape_string when building SQL queries (you never know when that rogue apostrophe will strike), use the description tuple of field names attached to a cursor after a select is executed – very handy when you don’t know the order of fields (element 0 of each entry of the tuple is the field name).  In fact, that warrants an example:

cr.execute("select * from SomeTable;")
#Fetch a row
r = cr.fetchone()
data = {} #create a dict
#iterate through the columns
for i in range(len(cr.description)):
 f = cr.description[i][0] #field (column) name
 data[f] = r[i] #set the value in the dict, indexed by the column name

Nothing fancy or groundbreaking, but it’s nice to see a real, practical value for the interpreted nature of Python.

2 thoughts on “Open-heart Surgery For Fun And Profit

  1. escape_string

    Just happened to try using MySQLdb for the first time yesterday, for a similar one-off kind of manipulation, and had same sort of satisfied experience. Regarding yr tip to use escape_string when building queries, doesn’t DB API purport to take care of quoting for you, as long as you let cr.execute fill in the parameters? In other words, in my limited experience, by doing:

    values = {‘a’:”A ‘string’ with quotes”}
    cr.execute(“SELECT * FROM tbl WHERE a = %(a)s”, values)

    all was well without any extra escaping (or quoting, for that matter).

    -Andy Boyko

    • Re: escape_string

      Yes, I believe it does, if you use parameter substitution like that. However, I have this weakness for building all the SQL commands using literals, since I like to print them out to be damn sure what’s going to be done before I invoke execute() (especially on a live system!).


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s