On Patterns In Python

I open-sourced some code today. The iterstuff package for Python is a tool for working with iterables and generators, and allows for some interesting tricks. All very nice, and open-sourcing anything is good. But that’s not what I wanted to write about.

The package consists of a main class (called Lookahead) and some example recipes – bits of code that use the Lookahead. One of the recipes, batch, yields slices of an iterable. A non-Lookahead version of it would be something like this:

from itertools import islice, chain
def batch(iterable, size):
    """
    Yield iterables for successive slices of `iterable`, each
    containing up to `size` items, with the last being less than
    `size` if there are not sufficient items in `iterable`.
    Pass over the input iterable once only.
    @param iterable: an input iterable.
    @param size: the maximum number of items yielded by any output
    iterable.
    """
    it = iter(iterable)
    while True:
        batchiter = islice(it, size)

        # The call to next() is done so that when we have sliced
        # to the end of the input and get an empty generator,
        # StopIteration will be raised, and bubble up to be raised
        # in batch(), and thus iteration over the whole batch will
        # stop.
        yield chain([next(batchiter)], batchiter)

What interests me about this code is that it demonstrates a blind spot that I’ve seen in my own code several times. See where the code calls batchiter.next()? It reads the first element from the iterable, and then uses itertools.chain to ‘push’ that element back onto the front of the iterable. It’s clever, but a little clumsy.

And later it occurred to me that patterns are to blame. Many of the examples that explain generators have the same pattern: a loop that yields values. For example:

def count_from_one(limit):
    for i in range(limit+1):
        yield i + 1

This pattern says “a generator does some work and then yields each element from a loop”. But there’s no rule that says a generator can only yield once.

So we can rewrite batch in a neater way:

def batch(iterable, size):
    it = iter(iterable)

    # Instances of this closure generator are
    # yielded as iterables.
    def slicer(first_element, slice_of_it):
        # Yield the first element
        yield first_element

        # Yield the rest of the iterable
        for element in slice_of_it:
            yield element

    while True:
        slice = islice(it, 0, size)
        # If slice is empty, then calling
        # next(slice) will raise StopIteration
        # and exit the loop.
        yield slicer(next(slice), slice)

Here we use a nested generator that contains two yields, one for the first element and then a loop for the rest. It avoids the use of chain because it doesn’t try to follow the ‘usual’ pattern for generators.

Design patterns are good things, but they can lead you astray.

SQLAlchemy Magic

So I wrote a post over on the Mobify blog about reducing the memory requirements of SQLAlchemy when doing big queries. It’s interesting writing for a company blog instead of my own – I’d be a lot more quirky and use funnier examples if it were just for me. On the other hand, it’s good sometimes to have the time to dig deep into a subject and work out proper examples and actual stats, to make a better post.

Anyway, if you have to use SQLAlchemy to pull big data sets into Python code, you might find it interesting.

 

Switching to WordPress

So I moved blog hosts, and left Livejournal. This was not the result of some long soul-searching process, but the outcome of scratching one particular itch: I write posts that contain code or sometimes guitar tab and I was getting irritated with how complex Livejournal made it to do any sort of formatting.

Plus… I love markdown. You can get used to it very quickly when writing descriptions in github, and we use big, detailed markdown README files at Mobify. So I want that power when writing blogposts.

Plus… WordPress.com promised to import my old blog, comments and all.

The only downside is that some other Ben Last, somewhere in the world, has registered benlast.wordpress.com and put an empty, unused blog there, meaning that I can’t use that blog name. The name I used, Kajikazawa, is from a Hokusai picture, a copy of which I once owned, before I sold almost everything and moved to Canada. Let the new blog start be as cathartic as the new life start.

Image

Store your AWS keys in your KeyChain.

So you're on your Macbook, and you want to run some AWS utility, or reference your AWS keys in your code. Of course, you could wire them into the environment, with something like:

export AWS_ACCESS_KEY_ID=my-aws-key
export AWS_SECRET_ACCESS_KEY=my-super-secret-key

But you're security-conscious, and you don't want to do that. Enter the power of the MacOS KeyChain: you can run a command to look up the keys from the KeyChain.

First, add both the AWS key and the AWS secret to the keychain, as Passwords:

  • For the AWS key, use "AWS" as the name, "AWS_KEY" as the account, and put the key in the password.
  • For the AWS secret key, use "AWS" as the name, "AWS_SECRET_KEY" as the account, and put the secret key in the password.

Now define an alias in your .bash_profile:

alias with_aws='env AWS_ACCESS_KEY_ID=$(security find-generic-password -a AWS_KEY -w) AWS_SECRET_ACCESS_KEY=$(security find-generic-password -a AWS_SECRET_KEY -w) bash -c'

This alias lets you run any bash command in a subshell with those environment variables set, and when the command ends, the subshell exits and the values are forgotten.

To just run a subshell with the environment variables set, use with_aws bash.

If you'd rather have the variables defined for any bash session (which is not that secure, but it's your call), then add this to your .bash_profile:

export AWS_ACCESS_KEY_ID=$(security find-generic-password -a AWS_KEY -w)
export AWS_SECRET_ACCESS_KEY=$(security find-generic-password -a AWS_SECRET_KEY -w)

More Grimacing

Following on from my previous post about grimace, a Python package that allows generation of regular expressions using fluent syntax, I’ve added some more tricks.

In the previous post, I showed how one can use grimace to write regular expressions in this form:


RE().any_number_of().digits().followed_by().dot().then().at_least_one().digit()

All good, but do we really need to invoke each method and have all those parentheses cluttering up the line? Because grimace is written using functional principles, each method returns a new RE object, based on the previous one but modified. Thus in theory we must call a method because we need to do some work, not just return an attribute.

Python descriptors will save us. A descriptor is a class that defines one or more of the __get__ , __set__ or __delete__ methods. Like properties in C#, they allow code to be executed when a descriptor is accessed as through it were an attribute.

Previously in grimace, we had methods like:

def start(self):
    """Return a new RE with the start anchor '^' appended to the element list"""
    return RE(self, '^')

Now we can define an Extender class and use it to rewrite start().

class Extender(object):
    """An Extender is a descriptor intended for use with RE objects, whose __get__ method returns a
    new RE based on the RE on which it was invoked, but with the elements extended with a given
    element, set when the Extender is initialized. It exists so that methods like start() and end()
    can be invoked as attributes or methods."""
    def __init__(self, element=None):
        self.element = element

    def __get__(self, instance, owner):
        if isinstance(instance, RE):
            if self.element is not None:
                return RE(instance, self.element)
            else:
                return RE(instance)
        return None

#... and here's how start() is now defined in the RE class...
class RE(object):
    start = Extender('^')

So far so good. We can now use grimace as in this example:

RE().any_number_of.digits.followed_by.dot.then.at_least_one.digit

But what happens if a poor programmer gets confused and invokes digits or dot as methods? We can fix this easily. The result of getting digits or dot or any other Extender is a new RE object, so executing digits() will get that new RE and then try to call it. All we need to do is make a RE instance callable, by adding this __call__ method:


def __call__(self): return self

That’s all we need. It means that the execution of digit() becomes get the digit attribute, which is a new RE instance, then call it, which returns itself.

There is one more nicely functional trick. We can use an Extender to append more than just strings; we can also use it to append special objects, like the grimace Repeater that defines how many repetitions of a character are required. Because all objects in grimace are functional, they’re immutable, and that means any instance may be shared between any number of threads or other bits of code. So I can write the not_a method (which adds a Not instance to the regular expression, to invert the meaning of the next match) as:


not_a = Extender(Not())

That instance of Not() is then shared between every RE() object that uses it. Functional programming is an excellent tool.

(The documentation for grimace is now in the github wiki page)

Expressing Oneself, Fluently

I use regular expressions[1] in Python often enough that I know many of the character classes and syntax tricks.

I use regular expressions in Python seldom enough that I forget many of the character classes and syntax tricks.

This is annoying, but I have lived with it. Then I came across a little article in Dr Dobb’s Journal (which used to be great, many years ago, but is now a mere ghost of its former glory), in which Al Williams writes about (ab)using the C/C++ compiler to allow regular expressions to be written as:

start + space + zero_or_more + any_of("ABC") + literal(":") + group(digit + one_or_more)

rather than

^\s*[ABC]:(\d+)

I quite like the idea of fluent syntax (though I appreciate that it’s not necessarily so appealing if your native language isn’t English), but I spend marginally more time writing Python than C++ these days. Also, I like the idea of trying to build a fluent interface in a functional way. So, I started out by writing some examples of what I would want to be able to do:

start().end() would give the minimal empty string regex "^$". Easy enough.

Or how about

any_number_of().digits().followed_by().dot().then().at_least_one().digit()

You get the general idea: the fluent syntax describes the expression and results in a string that matches what it describes. One really good thing is that it avoids the “backslash plague” that can confuse those new to writing regular expressions in Python.

This now exists, and is on github at https://github.com/benlast/grimace and it will do the above and more. Time for some more intense code examples:

The grimace.RE object is our starting point; any method we call on it returns a new RE object. Let’s get the regex to match an empty string.

>>> from grimace import RE
>>> print RE().start().end().as_string()
^$

The as_string() call turns the generated expression into a string that can then be used as the argument to the standard Python re module. There’s also as_re() which will compile the regular expression for you and return the resulting pattern-matching object.


>>> #Extract the extension of a short DOS/FAT32 filename, using a group to capture that part of the string
>>> regex = RE().start().up_to(8).alphanumerics().dot().group(name="ext").up_to(3).alphanumerics().end_group().end().as_string()
>>> print regex
^\w{0,8}\.(?P\w{0,3})$
>>> #Use the re module to compile the expression and try a match
>>> import re
>>> pattern = re.compile(regex)
>>> pattern.match("abcd.exe").group("ext")
'exe'
>>>
>>> #We can do that even more fluently...
>>> RE().start().group(name="filename").up_to(8).alphanumerics().end_group() \
... .dot().group(name="ext").up_to(3).alphanumerics().end_group() \
... .end().as_re().match("xyz123.doc").groups()
('xyz123', 'doc')('xyz123', 'doc')
>>>
>>> #The cool example that I wrote out as a use case
>>> print RE().any_number_of().digits().followed_by().dot().then().at_least_one().digit().as_string()
\d*\.\d+
>>>

You can also split a complex regex over several lines:

# My python module
from grimace import RE

def is_legal_number(number):
    #Match a US/Canadian phone number - we put the RE() stuff in parentheses so that we don't
    #have to escape the ends of lines
    north_american_number_re = (RE().start()
      .literal('(').followed_by().exactly(3).digits().then().literal(')')
      .then().one().literal("-").then().exactly(3).digits()
      .then().one().dash().followed_by().exactly(4).digits().then().end()
      .as_string())
    number_re = re.compile(north_american_number_re)
    return number_re.match("(123)-456-7890") is not None

There is more to do: control over greedy matching, and ways to express some of the more complex tricks like backreferences and multiple matching subexpressions. And I’ll also package it properly for installation via pip. But for now, it’s available and it works.

(The documentation for grimace is now in the github wiki page)

[1] I can’t write any article on regular expressions without quoting Jamie Zawinski: Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

Memento Mori

A discussion at work took place about design patterns, and it prompted me to go back and look at something. One of the behavioural patterns is Memento, which (from Wikipedia) “provides the ability to restore an object to its previous state (undo)”. The implementation described uses three objects: a caretaker, an originator and a memento object that the caretaker can use to restore the originator to a previous state.

From a functional programming viewpoint, I find this overly complex. Consider a system in which an object follows the functional principle of immutability. To “change state”, a caller calls a method on that object, but the original object is unchanged: the method returns a new object with a different state. There’s then no need to have the complexity of undo – the previous state is available via the original instance.

Of course, this is already the case with immutable strings in many languages. Consider this C#:

    string s1="This String is Mixed Case";
    string s2=s1.ToLower();
    //Oops - we need to undo the ToLower() operation.
    s2=s1;  //This is effectively an "undo"

The only principle to keep in mind is to save a reference to a previous version of the object if there is a potential need to undo it.

Benefits: huge saving of complexity, and therefore of testing. I find that the testing load is often forgotten when developers assess how much work is involved in an implementation, unless TDD is in the blood of the team.
Downside: potential extra use of memory to keep hold of previous objects. With intelligent sharing of expensive resources between objects, this can be minimal.

Scales And Patterns And Modes, Oh My!

I wrote this article a while back, for WholeNote.com, but I thought I'd revisit it. I've been doing some improvisation practice recently, and I wanted to refresh my memory on this quick way to find the scales for different modes.

This little chart gives you an easy fretboard pattern that'll help you quickly remember the scale for any of the modes listed below:

  • Min – natural minor
  • Dor – Dorian
  • Ion – Ionian (major)
  • Loc – Locrian
  • Mix – Mixolydian
  • Lyd – Lydian
  • Phr – Phrygian
 
|-----|-----|-----|-----|-----|-----|-----|
|-----|-----|-----|-----|-----|-----|-----|
|-----|-Min-|-Dor-|-----|-Ion-|-Loc-|-----|
|-----|-----|-Mix-|-----|-Lyd-|-Phr-|-----|
|-----|-----|--C--|-----|-----|-----|-----|
|-----|-----|-----|-----|-----|-----|-----|

First, you need to know the major (Ionian) scale patterns – you need to know how to play a major scale anywhere on the neck, given the root note. Take your time and come back when you know how to do that. It shouldn't take long: there's only one scale to learn.

Ready? Good. What the chart gives you is a way to find out which major scale contains the same notes as the mode you want to play. Confused? Let's try and explain it this way.

The Ionian mode (the major scale) starts at the root note and progresses up in steps as follows: tone, tone, semitone, tone, tone, tone, semitone. So the Ionian mode of C contains C (root), then a tone up to D, then a tone up to E, then a semitone to F, a tone to G, a tone to A, a tone to B and a final semitone to C again. So C Ionian contains the notes C, D, E, F, G, A, B.

Conveniently enough, those same notes also form the Aoelian (natural minor) scale for A, though you'd start with A as the root note, giving A, B, C, D, E, F, G. Those same notes also form the notes of the D Dorian scale: D, E, F, G, A, B, C. Get the idea? If you want to know the notes that make up a modal scale for a given note, they'll be the same as the major scale for another note.

For example, here are some more equivalents:

  • C Mixolydian has the same notes as F Ionian (major)
  • C Lydian has the same notes as G Ionian
  • C Phrygian has the same notes as G#/Ab Ionian
  • C Dorian has the same notes as A#/Bb Ionian
  • C Locrian has the same notes as C#/Db Ionian

The chart gives you an easy way to remember these. See the C note? To find the scale for, say, the Mixolydian mode, you look to see where Mix is (it's on F, one string above). That tells you that the Mixolydian scale for C has the same notes as F major. So to play your groovy mixolydian improv, just play the notes in the F major scale.

Learn the pattern on the chart (it's easy, only two strings to think about). Remember the names (I use the order in which they occur if you played the notes ascending): "Mix, Lyd, Phryg, Dor, Ion, Loc". Or Mixolydian, Lydian, Phrygian, Dorian, Ionian, Locrian. Of course, the Ionian in there isn't that useful: it tells you that to play C Ionian, you play, er, C Ionian. I left it there because it makes the pattern easier to remember.

Of course, you can move the pattern up and down the fretboard. For example, if you do it based on D (two frets up from where it's shown), it'll tell you that D Mixolydian is G Ionian, D Lydian is A Ionian, etc.

For an extra trick, if you know the natural minor scale patterns, you can use those to give you alternate ways to play the modes. For this, you need to know how to find the relative minor for any major scale. For example, the relative minor of C is A (the Am scale has all the same notes as C). If you look at the chart, you can see the Min, which gives you a way to work out the relative minor: the chart shows that for C, the relative minor is Am.

So, if you want to play in, say C Dorian, you can play (look at the chart) either Bb Ionian, or its relative minor, which would be Gm. They both contain the same notes as C Dorian.

This plays a lot better than it reads.  If you truly want to get it, find yourself a backing track that's all on one note (if you have a keyboard, pick a note, hold the key down and let it play).  Then try different scales over the top and get used to how they sound.  Then you can go out and start playing improv jazz…

Immutability in C++

The guys who work with me know all too well that I have a bit of a thing for functional programming.  Given the slightest opportunity I can ramble on about the advantages of immutable value types and side-effect free functions. But I find that too many introductions to FP focus on abstract problems like generating to Fibonacci series. Interesting, but I don't think too many developers face that challenge on an everyday basis.

So I thought I'd write about applying some FP principles to everyday languages, such as C++. I chose C++ because it's often considered amongst the least functional languages. Many of the same principles can apply in Java or C#, but we'll start here, and we’ll start with immutable value types.

An type is immutable if its value (and for a class instance that means any values of any members) can’t change after it’s been created. This gives us several advantages:

  • No functions that takes a value type as an input can change it, so it reduces side-effects
  • There’s no need to lock an immutable object for access by multiple threads, since no thread can change the value
  • You can use the value of an immutable object as a key in a hashed data structure, since the hash value can’t ever change after creation
  • You can compare value types using simple value equivalence – if two value type instances have the same value, then for all practical purposes they are the same instance
  • Incidentally, it can be easier for C# developers to understand immutability because the System.String class is a good example. All the methods that "modify" a string return a new string – you can’t actually modify the contents of a string (unless you get very clever with reflection, but that’s just breaking the rules).

    Now let's take a C++ example and think about immutability.

    //A non-immutable Employee class
    
    #include 
    
    class EmployeeNF {
    public:
    	//The constructor body is in EmployeeNF.cpp so that
    	//there's something for the linker to use.
    	EmployeeNF(const std::string& aGivenName,
    		   const std::string& aFamilyName,
    		   const int	aNumber,
    		   const double &aSalary);
    
    	void SetNumber(const int aNumber) {
    		number=aNumber;
    	}
    
    	int Number() const {
    		return number;
    	}
    
    	void SetSalary(const double& aSalary) {
    		salary=aSalary;
    	}
    
    	double Salary() const {
    		return salary;
    	}
    
    	void SetGivenName(const std::string& aGivenName) {
    		givenName=aGivenName;
    	}
    
    	std::string GivenName() const {
    		return givenName;
    	}
    
    	void SetFamilyName(const std::string& aFamilyName) {
    		familyName=aFamilyName;
    	}
    
    	std::string FamilyName() const {
    		return familyName;
    	}
    
    private:
    	std::string	givenName;
    	std::string	familyName;
    	int		number;
    	double		salary;
    };

    Here we have a class that any C++ developer might write. It observes some good OO principles: the members are private and all access is via setters and getters. But it’s fully mutable: any code can change any value.

    Now let’s rewrite it so that it’s immutable. Here’s a first draft:

    //An immutable Employee class
    
    #include 
    
    class EmployeeF {
    public:
    	//The constructor body is in EmployeeF.cpp so that
    	//there's something for the linker to use.
    	EmployeeF(const std::string& aGivenName,
    		  const std::string& aFamilyName,
    		  const int	aNumber,
    		  const double&	aSalary);
    
    	const int Number() const {
    		return number;
    	}
    
    	const double& Salary() const {
    		return salary;
    	}
    
    	const std::string& GivenName() const {
    		return givenName;
    	}
    
    	const std::string& FamilyName() const {
    		return familyName;
    	}
    
    private:
    	const std::string	givenName;
    	const std::string	familyName;
    	const int		number;
    	const double		salary;
    };

    Some C++ points to note:

  • The member variables are now all marked const. This means that they can only be initialised in a constructor, using initialization lists. This is the most fundamental way of making them immutable.
  • The getter methods mostly return const references. There are long debates about whether this is a good idea, or prevents compiler optimizations (RVO or NRVO, if you wanted to Google), but I’ve chosen to do to illustrate that since the member variables don’t change, it’s possible to return references to them.
  • The getters return const values – whether this is appropriate depends on design intentions, but it prevents some common errors (such as use of = rather than == in a comparison). It also reflects the fact that we’re dealing with an immutable type.
  • Okay, this gives us immutability, but what if I need to change the salary for an Employee? To do this efficiently, we can write “setters” that copy most of the values for an instance, such as:

    	const EmployeeF SetNumber(const int aNumber) const {
    		return EmployeeF(givenName,familyName,aNumber,salary);
    	}
    
    	const EmployeeF SetSalary(const double& aSalary) const {
    		return EmployeeF(givenName,familyName,number,aSalary);
    	}
    
    

    (Here I’m assuming the default copy constructor will do a good enough job for us – in fact it will lead to duplication of the strings, but avoiding that would be a whole different problem and require strings with reference counting).

    So a “setter” like this allows me to get a new EmployeeF instance that is the same as the one on which I called the method, but with a new value for the number member.

    Let’s dig a little further. How would we use such an immutable type?

    //Let's create our first EmployeeF
    EmployeeF e("Ben","Last",1,1000.0);
    
    //Great, let's now change the Number
    EmployeeF e2 = e.SetNumber(2);
    
    //Now let's alter the salary too
    e2 = e2.SetSalary(1100.00);

    But if you try this… that line won’t compile. Why? Because you’re trying to modify e2, and e2 is an immutable type. The compiler will try and build you a default operator= and it’ll fail, because all the member fields of e2 are const. Instead, you have to adopt a functional programming pattern and use variables that you don’t modify:

    //Let's create our first EmployeeF
    const EmployeeF e("Ben","Last",1,1000.0);
    
    //Great, let's now change the Number
    const EmployeeF e2 = e.SetNumber(2);
    
    //Now let's alter the salary too
    const EmployeeF e3 = e2.SetSalary(1100.00);

    Every time we make a change to an immutable EmployeeF, we need a new variable to hold the new value. This isn’t actually a bad thing. A basic principle of functional programming is that you don’t modify state. Your code takes values and applies transformations that result in new values – it doesn’t modify the original ones. And that’s exactly what this immutable class forces you to do, and why I declared the variables themselves as const in that example above.

    Okay, so what have I tried to show you here?

  • C++ supports you in defining types that are immutable once created
  • A key principle of immutable types is that functions operating on them return new values and leave the old ones unmodified
  • Variables holding immutable types are also immutable
  • And that’ll do for today…

    Counting down in Python

    I have events coming up in my life; some are good and some are not so good. Anticipating the good ones is positive, and one way to do that is to have a little countdown visible on my desktop. I run Ubuntu Linux on my home laptop, and so I wanted a way to drop a countdown into the menu bar (where Unity shows indicator output).

    Unity (the Ubuntu window manager/interface) does this using indicators (yet another term to learn for gadgets). It took about 30 minutes to work out how to knock up a little indicator in Python, and here it is for anyone who needs it. I based it on a bare-bones clock indicator: the link to the original is in the comments.

    #!/usr/bin/env python
    # Unity indicator for countdown to a fixed date
    # author: ben last
    # based on Phil Ayres' clock indicator
    # (see https://gist.github.com/philayres/1309895)
    # 17 Mar 2013
     
    import gobject
    import gtk
    import appindicator
    import os, sys
    import time
    
    #Define the target time as a time_t
    TARGET=time.mktime((2013,12,24,23,59,0,0,0,0))
    
    def on_left_click(widget,ind):
        """Quit"""
        ind.set_label("")
        sys.exit(0)
      
    def on_timer(ind):
      """Update the label - we always use local time."""
    
      #Get the local time as a time_t (seconds since the epoch)
      t=time.mktime(time.localtime())
      
      #Get the offset to the target such that a positive value
      #means the target is in the future, in seconds.  Also save
      #in offset, so we can spot the end of the countdown.
      offset=seconds=TARGET-t
    
      #If we're at or past the target, reset to zero
      if seconds =0
    
    if __name__ == "__main__":
        #Create the indicator with a clock icon
        ind = appindicator.Indicator("simple-countdown",
                                     "clock",
                                     appindicator.CATEGORY_APPLICATION_STATUS)
        ind.set_status (appindicator.STATUS_ACTIVE)
    
        #Set the label
        on_timer(ind)
    
        #Creata a menu and an item
        menu = gtk.Menu()
        item = gtk.MenuItem("Quit")
    
        item.connect("activate", on_left_click, ind)
        item.show()
        menu.append(item)
        ind.set_menu(menu)
    
        #Start a one-second tick
        gtk.timeout_add(1000, on_timer, ind)
    
        gtk.main()