More Grimacing

Following on from my previous post about grimace, a Python package that allows generation of regular expressions using fluent syntax, I’ve added some more tricks.

In the previous post, I showed how one can use grimace to write regular expressions in this form:


All good, but do we really need to invoke each method and have all those parentheses cluttering up the line? Because grimace is written using functional principles, each method returns a new RE object, based on the previous one but modified. Thus in theory we must call a method because we need to do some work, not just return an attribute.

Python descriptors will save us. A descriptor is a class that defines one or more of the __get__ , __set__ or __delete__ methods. Like properties in C#, they allow code to be executed when a descriptor is accessed as through it were an attribute.

Previously in grimace, we had methods like:

def start(self):
    """Return a new RE with the start anchor '^' appended to the element list"""
    return RE(self, '^')

Now we can define an Extender class and use it to rewrite start().

class Extender(object):
    """An Extender is a descriptor intended for use with RE objects, whose __get__ method returns a
    new RE based on the RE on which it was invoked, but with the elements extended with a given
    element, set when the Extender is initialized. It exists so that methods like start() and end()
    can be invoked as attributes or methods."""
    def __init__(self, element=None):
        self.element = element

    def __get__(self, instance, owner):
        if isinstance(instance, RE):
            if self.element is not None:
                return RE(instance, self.element)
                return RE(instance)
        return None

#... and here's how start() is now defined in the RE class...
class RE(object):
    start = Extender('^')

So far so good. We can now use grimace as in this example:


But what happens if a poor programmer gets confused and invokes digits or dot as methods? We can fix this easily. The result of getting digits or dot or any other Extender is a new RE object, so executing digits() will get that new RE and then try to call it. All we need to do is make a RE instance callable, by adding this __call__ method:

def __call__(self): return self

That’s all we need. It means that the execution of digit() becomes get the digit attribute, which is a new RE instance, then call it, which returns itself.

There is one more nicely functional trick. We can use an Extender to append more than just strings; we can also use it to append special objects, like the grimace Repeater that defines how many repetitions of a character are required. Because all objects in grimace are functional, they’re immutable, and that means any instance may be shared between any number of threads or other bits of code. So I can write the not_a method (which adds a Not instance to the regular expression, to invert the meaning of the next match) as:

not_a = Extender(Not())

That instance of Not() is then shared between every RE() object that uses it. Functional programming is an excellent tool.

(The documentation for grimace is now in the github wiki page)

Memento Mori

A discussion at work took place about design patterns, and it prompted me to go back and look at something. One of the behavioural patterns is Memento, which (from Wikipedia) “provides the ability to restore an object to its previous state (undo)”. The implementation described uses three objects: a caretaker, an originator and a memento object that the caretaker can use to restore the originator to a previous state.

From a functional programming viewpoint, I find this overly complex. Consider a system in which an object follows the functional principle of immutability. To “change state”, a caller calls a method on that object, but the original object is unchanged: the method returns a new object with a different state. There’s then no need to have the complexity of undo – the previous state is available via the original instance.

Of course, this is already the case with immutable strings in many languages. Consider this C#:

    string s1="This String is Mixed Case";
    string s2=s1.ToLower();
    //Oops - we need to undo the ToLower() operation.
    s2=s1;  //This is effectively an "undo"

The only principle to keep in mind is to save a reference to a previous version of the object if there is a potential need to undo it.

Benefits: huge saving of complexity, and therefore of testing. I find that the testing load is often forgotten when developers assess how much work is involved in an implementation, unless TDD is in the blood of the team.
Downside: potential extra use of memory to keep hold of previous objects. With intelligent sharing of expensive resources between objects, this can be minimal.