Python monkey patching (for readability)

When preparing a Jupyter notebook for a workshop on recommendation engines, which I presented with a colleague, I was faced with the following problem:

“How to break a large class definition into several cells so it can be presented step-by-step.”

Having the ability to declare a rather complex (and large) Python class in separate cells has several advantages, the obvious one being the ability to fully document each method’s functionality with Markdown, rather than comments. Python does allow for functionality to be added to classes after their declaration via the assignment of methods through attributes. This is commonly known as “monkey patching” and hinges on the concepts of bound and unbound methods.

I will give a quick, general overview of the tools that Python puts at our disposal for dynamic runtime object manipulation; for a more in-depth treatment, please consult the official Python documentation.

Bound and unbound methods

Let’s first look at bound methods. Assume a class called Class and an instance called instance, with an instance method bound and a static method unbound such that

class Class:
    def bound(self):
        return "I'm a bound method"

    @staticmethod
    def unbound():
        return "I'm an unbound method"


instance = Class()

Then bound is a bound method and unbound is an unbound method. In practice, this can be seen in the standard way of calling .bound(), which is

instance.bound()
: I'm a bound method

which in turn is equivalent to

Class.bound(instance)
: I'm a bound method

The standard way of calling unbound is, similarly,

instance.unbound()
: I'm an unbound method

This, however, is equivalent to

Class.unbound()
: I'm an unbound method

In the unbound case, we can see there’s no need to pass the class instance. unbound is not bound to the class instance.
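
The distinction can also be inspected directly. The following is a minimal sketch that redefines the same Class so it runs on its own: looking a function up through an instance yields a method object that remembers the instance in its `__self__` attribute, whereas a static method comes back as the plain function.

```python
class Class:
    def bound(self):
        return "I'm a bound method"

    @staticmethod
    def unbound():
        return "I'm an unbound method"


instance = Class()

# Looked up through the instance, `bound` is wrapped in a method object
# that remembers which instance it belongs to.
bound_method = instance.bound
print(bound_method.__self__ is instance)  # True

# Looked up through the class, it is still a plain function,
# which is why the instance has to be passed explicitly.
print(Class.bound(instance))  # I'm a bound method

# A static method is never wrapped: both lookups return the same function.
print(instance.unbound is Class.unbound)  # True
```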

As mentioned before, Python allows us to change class attributes at runtime. If we consider a function such as

def newBound(self):
    return "I'm a (new!) bound method"

we can then add it to the class, even after the class has been declared. For instance:

Class.newBound = newBound
instance = Class()
instance.newBound() # Class.newBound(instance)
: I'm a (new!) bound method

It is interesting to note that any type of function definition will work, since functions are first-class objects in Python. As such, if the method body can be written as a single expression, a lambda could also be used, i.e.

Class.newBound = lambda self: "I'm a lambda"

instance.newBound()
: I'm a lambda
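
Coming back to the original notebook problem, the same assignment can be wrapped in a small decorator so that each cell reads like a regular method definition. This is only a sketch: `Recommender`, `add_method` and `mean_rating` are hypothetical names invented for the illustration, not part of the workshop material.

```python
# Cell 1: declare the class with only its constructor.
class Recommender:
    def __init__(self, ratings):
        self.ratings = ratings


def add_method(cls):
    """Hypothetical helper: attach the decorated function to `cls`."""
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func
    return decorator


# Cell 2: add a method later, e.g. after a Markdown cell explaining it.
@add_method(Recommender)
def mean_rating(self):
    return sum(self.ratings) / len(self.ratings)


# Cell 3: instances see the method like any other.
model = Recommender([4, 5, 3])
print(model.mean_rating())  # 4.0
```

Because the function is stored on the class, instances created before the later cell was run would pick up the new method as well.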

A limitation of the “monkey patching” approach is that plain assignment only binds methods at the class level. Although possible, it is not as trivial to add the .newBound() method to a single instance: a plain function stored on an instance is not bound, so it would not receive self.

A solution is to either call the function’s descriptor method (__get__, which performs the binding) or declare the instance attribute as a MethodType.

To illustrate this in our case:

import types

instance.newBound = types.MethodType(newBound, instance)

instance.newBound() # prints "I'm a (new!) bound method"
: I'm a (new!) bound method
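
The descriptor route mentioned above can be sketched in the same way. The block below redefines a bare Class so it is self-contained; it relies on the fact that functions implement `__get__`, which returns the function bound to the given instance.

```python
import types


class Class:
    pass


def newBound(self):
    return "I'm a (new!) bound method"


instance = Class()

# Functions are descriptors: __get__ binds the function to an instance
# and returns a method object, just as attribute lookup would.
instance.newBound = newBound.__get__(instance, Class)

print(instance.newBound())  # I'm a (new!) bound method
print(isinstance(instance.newBound, types.MethodType))  # True

# The binding lives in the instance's own __dict__, not on the class.
print("newBound" in vars(instance))  # True
print(hasattr(Class, "newBound"))  # False
```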

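This approach, as mentioned, changes the attribute for one specific instance only. If we access newBound from another instance, anotherInstance, the instance-level method is not found. Because Class.newBound was replaced with a lambda earlier, the call falls back to that class-level attribute; had the class never been patched, it would fail with an AttributeError.

anotherInstance = Class()
anotherInstance.newBound() # falls back to the class-level lambda
: I'm a lambda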

Abstract classes

Python supports abstract classes, i.e. the definition of “blueprint” classes for which we delegate the concrete implementation of abstract methods to subclasses. In Python 3.x this is done by subclassing ABC and marking methods with the @abstractmethod decorator. If we declare a class such as

from abc import ABC, abstractmethod
class AbstractClass(ABC):
	@abstractmethod
	def abstractMethod(self):
		pass

we can then implement abstractMethod in all of AbstractClass’s subclasses:

class ConcreteClass(AbstractClass):
	def abstractMethod(self):
		print("Concrete class abstract method")

We could, obviously, do this in Python without abstract classes, but this mechanism offers greater safety, since implementing the abstract methods is mandatory: a subclass that leaves one out cannot be instantiated. With regular classes, not implementing abstractMethod would simply mean silently using the parent’s definition.
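
To make the safety argument concrete, here is a small sketch in which `IncompleteClass` is a hypothetical subclass that forgets to implement abstractMethod. The exact error message depends on the Python version, so it is printed rather than assumed.

```python
from abc import ABC, abstractmethod


class AbstractClass(ABC):
    @abstractmethod
    def abstractMethod(self):
        pass


class IncompleteClass(AbstractClass):
    # Hypothetical subclass that does not implement abstractMethod.
    pass


try:
    IncompleteClass()
except TypeError as error:
    # Instantiation is refused while abstract methods remain unimplemented.
    print(f"TypeError: {error}")
```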

Unfortunately, monkey patching of abstract methods is not supported in Python. We could monkey patch the concrete class:

ConcreteClass.newBound = lambda self: print("New 'child' bound")
c = ConcreteClass()
c.newBound() # prints "New 'child' bound"
: New 'child' bound

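And we could even add a new bound method to the superclass, which will be available to all subclasses that do not define it themselves. Here, ConcreteClass already has its own newBound from the previous step, so that definition still takes precedence:

AbstractClass.newBound = lambda self: print("New 'parent' bound")
c = ConcreteClass()
c.newBound() # still prints "New 'child' bound": the subclass attribute shadows the parent's
: New 'child' bound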

However, we can’t add abstract methods with monkey patching. This is a documented limitation, and the abc module documentation specifically warns that

Dynamically adding abstract methods to a class, or attempting to modify the abstraction status of a method or class once it is created, are not supported. The abstractmethod() only affects subclasses derived using regular inheritance; “virtual subclasses” registered with the ABC’s register() method are not affected.
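
The effect of that warning can be observed with a short experiment. This is a sketch, with `lateAbstract` as a hypothetical method name: the set of abstract names is computed when each class is created, so decorating a function with abstractmethod() and assigning it afterwards does not make existing subclasses abstract.

```python
from abc import ABC, abstractmethod


class AbstractClass(ABC):
    @abstractmethod
    def abstractMethod(self):
        pass


class ConcreteClass(AbstractClass):
    def abstractMethod(self):
        return "Concrete class abstract method"


# Patch a "new abstract method" onto the parent after the fact.
# lateAbstract is a hypothetical name used only for this illustration.
AbstractClass.lateAbstract = abstractmethod(lambda self: None)

# The set of abstract names was computed when each class was created,
# so ConcreteClass is still considered fully implemented.
print(ConcreteClass.__abstractmethods__)  # frozenset()
c = ConcreteClass()  # no TypeError is raised
print("lateAbstract" in AbstractClass.__abstractmethods__)  # False
```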

Private methods

We can dynamically add and replace inner methods, such as:

class Class:
	def _inner(self):
		print("Inner bound")
	def __private(self):
		print("Private bound")
	def callNewPrivate(self):
		self.__newPrivate()

Class._newInner = lambda self: print("New inner bound")
c = Class()
c._inner() # prints "Inner bound"
c._newInner() # prints "New inner bound"
: Inner bound
: New inner bound

However, private methods behave differently. Python applies name mangling to identifiers with two leading underscores that appear inside a class definition. As specified in the documentation:

Since there is a valid use-case for class-private members (namely to avoid name clashes of names with names defined by subclasses), there is limited support for such a mechanism, called name mangling. Any identifier of the form __spam (at least two leading underscores, at most one trailing underscore) is textually replaced with _classname__spam, where classname is the current class name with leading underscore(s) stripped. This mangling is done without regard to the syntactic position of the identifier, as long as it occurs within the definition of a class.

We can then still access the private methods (although we probably shouldn’t), but monkey patching won’t work as before due to the above.

c._Class__private() # Private bound
Class.__newPrivate = lambda self: print("New private bound")
c = Class()
c._Class__newPrivate() # fails with AttributeError

We have defined a new method called __newPrivate() but interestingly, this method is not private. We can see this by calling it directly (which is allowed) and by calling the new “private” method from inside the class as self.__newPrivate():

c.__newPrivate() # prints "New private bound"
c.callNewPrivate() # fails with AttributeError (can't find _Class__newPrivate)

It is possible to perform some OOP abuse and declare the private method by mangling the name ourselves. In this case we could then do:

Class._Class__newPrivate = lambda self: print("New private bound")
c = Class()
c._Class__newPrivate() # prints "New private bound"
c.callNewPrivate() # prints "New private bound"
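
Inspecting the class namespace shows where each name actually ends up. The sketch below uses a reduced version of Class and assumes it is run at module level (outside any class body), where no mangling takes place.

```python
class Class:
    def __private(self):
        return "Private bound"


# Inside the class body, __private was stored under its mangled name.
print("_Class__private" in vars(Class))  # True
print("__private" in vars(Class))  # False

# Outside a class body no mangling happens, so this creates an
# attribute literally called "__newPrivate".
Class.__newPrivate = lambda self: "New private bound"
print("__newPrivate" in vars(Class))  # True
print("_Class__newPrivate" in vars(Class))  # False
```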

Builtins

Is it possible to monkey patch builtin classes in Python, e.g. int or float? In short, yes, it is.

Although the usefulness is arguable and I strongly urge you not to do this in any production scenario, we’ll look at how to achieve this, for the sake of completeness. A very interesting and educational read is available from the Forbidden Fruit Python module.

Primitive (or builtin) classes in Python are typically written in C and as such some of these meta-programming facilities require jumping through extra hoops (as well as being a Very Bad Idea™). Let’s first look at the integer class representation, int.

An int doesn’t allow bound methods to be added dynamically as before. For instance:

p = 5
type(p) # int

We can try to add a method to int to square the value of the instance:

int.square = lambda self: self ** 2

This fails with the error TypeError: can't set attributes of built-in/extension type 'int' (the exact wording varies between Python versions). The solution (as presented in Forbidden Fruit) is to first create classes that mirror the C layout of a builtin object. We subclass ctypes.Structure, the Python representation of a C struct in native byte order, to hold the reference count (a signed integer) and a pointer to PyObject.

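import ctypes
class PyObject(ctypes.Structure):
	pass

# Mirror the C object header: a Py_ssize_t reference count followed by
# a pointer to the type (layout assumed for a standard CPython build).
PyObject._fields_ = [
	('ob_refcnt', ctypes.c_ssize_t),
	('ob_type', ctypes.POINTER(PyObject)),
]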

Next we create a holder for the Python object’s slots, which extends the header above with a pointer to the underlying dictionary:

class SlotsProxy(PyObject):
	_fields_ = [('dict', ctypes.POINTER(PyObject))]

The final step is to extract the PyProxyDict from the object referenced by the pointer. Ideally, we should get the builtin’s namespace so we can freely set attributes as we did previously. A helper function to retrieve the builtin’s (mutable) namespace can then be:

def patch(klass):
	name = klass.__name__
	target = klass.__dict__
	proxy_dict = SlotsProxy.from_address(id(target))
	namespace = {}

	ctypes.pythonapi.PyDict_SetItem(
		ctypes.py_object(namespace),
		ctypes.py_object(name),
		proxy_dict.dict,
	)
	return namespace[name]
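
To see why the helper has to go through ctypes at all, it may help to look at what the interpreter allows from pure Python. The sketch below does not modify int; it only shows that both the class `__dict__` (a read-only mappingproxy) and direct attribute assignment are rejected. The error messages differ between Python versions, so they are printed rather than assumed.

```python
import types

# A class's __dict__ is exposed through a read-only mappingproxy view.
print(isinstance(int.__dict__, types.MappingProxyType))  # True

try:
    int.__dict__["square"] = lambda self: self ** 2
except TypeError as error:
    # The proxy does not support item assignment.
    print(f"TypeError: {error}")

try:
    int.square = lambda self: self ** 2
except TypeError as error:
    # Builtin (C-level) types refuse attribute assignment as well.
    print(f"TypeError: {error}")
```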

We can now easily patch builtin classes. Let’s try to add the square method again by first retrieving the namespace (stored below in d) and setting the method on it directly:

d = patch(int)
d["square"] = lambda self: self ** 2

p.square() # 25

All other instances of int will now have the square method as well:

(2 + p).square() # 49

Conclusion

“Monkey patching” is usually, and rightly so, considered a code smell, due to the increased indirection and the potential for unwanted surprises. However, having the ability to “monkey patch” classes in Python allows us to write Jupyter notebooks in a more literate, fluid way rather than presenting the user with a “wall of code”. Thank you for reading. If you have any comments or suggestions, please drop me a message on Mastodon.