Part 1: Avoiding callback hell in Python

Friday, 19 August 2016

Here it is, my first blog post! I wanted to write about a few migration steps I've gone through with pySNMP. After I’d written the introduction to that article I noticed it had become too large, so I had to split it up. More on pySNMP later.

Back in time...

At Transceptor Technology we’ve been using asynchronous code from the very beginning. Back when Node.js was still shiny and new (even before I started working for Transceptor Technology) I showed it to my colleagues. One of them, Art, said to me: "Ah, Python has had this functionality for ages in a library called Twisted". I hadn't programmed much in Python at that point. Since I like reading about the differences between languages I knew the basics, but I had never done anything serious with it. Most of my code was either VBA, C# or STEP7 (yep, I used to program PLCs). After I started at Transceptor Technology and we’d finally decided what our main language was going to be, I started to look into twisted. I think this was version 11 or 12.

First, let's talk about callback hell in JavaScript.

Callback hell in JavaScript

Wikipedia defines a callback as follows:

In computer programming, a callback is a piece of executable code that is passed as an argument to other code, which is expected to call back (execute) the argument at some convenient time. The invocation may be immediate as in a synchronous callback, or it might happen at a later time as in an asynchronous callback. In all cases, the intention is to specify a function or subroutine as an entity that is, depending on the language, more or less similar to a variable.

So let's start with a contrived example:

requestSomeData(function(data){
  var modified = doSomething(data);
  storeData(modified, function(response){
      if (response == 'Ok'){ /* ... */ }
  });
});

Here we have two nested functions with callbacks:

  • requestSomeData(function(data){ ... })
  • storeData(function(response){ ... })

We have "invented" indentation styles that try to recreate a natural program flow: "Code should read like a narrative from top to bottom". We're used to having a relation between decision logic and indentation. Reading from top to bottom, with the top things happening first and the bottom things happening last, is how we've been brought up ever since we learned to read.

An example solution

Having multiple nested layers helps you land in callback hell. Alternatives have already been developed in JavaScript, for example the Q library:

Q.fcall(promisedStep1)
.then(promisedStep2)
.then(promisedStep3)
.then(function (value4) {
   // Do something with value4
}).catch(function (error) {
   // Handle any error from all above steps
});

Python equivalents

Even though there are some differences in semantics, the twisted variant of a promise is a Deferred (described in the twisted docs):

from twisted.internet import reactor
from twisted.internet.defer import Deferred

dfd = Deferred()

# fake a result
reactor.callLater(10, dfd.callback, 'Ok done')

That last line tells the twisted reactor to call the function dfd.callback after 10 seconds with the parameter 'Ok done'.
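
For readers who don't have twisted installed, the same idea can be sketched with only the standard library. This is an asyncio analogue of the Deferred example above, not a twisted API: the function name fire_later and the short delay are made up for illustration. One difference to keep in mind is that a Deferred callback receives the result itself, while an asyncio done-callback receives the future.

```python
import asyncio

async def fire_later(received, delay=0.01):
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    # Roughly the role of dfd.addCallback: run this once a result is available.
    fut.add_done_callback(lambda f: received.append(f.result()))
    # fake a result, comparable to reactor.callLater(delay, dfd.callback, 'Ok done')
    loop.call_later(delay, fut.set_result, 'Ok done')
    await fut
    # Give any still-pending callbacks a chance to run before returning.
    await asyncio.sleep(0)

received = []
asyncio.run(fire_later(received))
print(received)
```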

If I were to write an equivalent of the Q library example, I'd write:

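def doSomething(value4):
    # Do something with value4
    pass

def onError(error):
    # Handle error
    pass

dfd = promisedStep1()
dfd.addCallback(promisedStep2)
dfd.addCallback(promisedStep3)
dfd.addCallback(doSomething)
dfd.addErrback(onError)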

You could chain the callbacks if you like:

promisedStep1().addCallback(promisedStep2).addCallback(promisedStep3) ... .addErrback(onError)

Nesting in Python

The equivalent of a nested function declaration like those shown in JavaScript would be the following:

def apiCallX(param, someParamNeededForFormatting):
    def _cb(result):
        newResult = doSomeFormatting(result, someParamNeededForFormatting)
        return newResult

    dfd = databaseQuery1(param)
    dfd.addCallback(_cb)
    return dfd

One could argue a lambda would be closer to an anonymous function in JavaScript. The point, however, is that one should avoid nesting these scopes. In Python the overhead of functions is relatively high compared to other languages. Having to keep the outer function's variables in memory all the while you have a (perhaps) long-running database query is an unnecessary burden. Just as in JavaScript one should avoid building a nest of functions (function spaghetti, or lasagna if you will), you'd better avoid those in Python as well. The following variant is cleaner and more memory efficient.

def apiCallX(param, someParamNeededForFormatting):
    dfd = databaseQuery1(param)
    dfd.addCallback(_apiCallXCb, someParamNeededForFormatting)
    return dfd

def _apiCallXCb(result, someParamNeededForFormatting):
    newResult = doSomeFormatting(result, someParamNeededForFormatting)
    return newResult

Now you're only keeping the variables that you need in memory, which are the parameters to addCallback: the function you want to call and the additional parameters to that function.
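
To see what a nested callback actually holds on to, you can inspect its closure. The sketch below uses made-up names (nested_style, flat_cb) and plain strings instead of a real database query. Note that CPython only captures the outer names the inner function references, so the cost depends on what the callback touches.

```python
def nested_style(param, formatting_param):
    def _cb(result):
        # Referencing formatting_param makes Python capture it in a closure cell.
        return "{}:{}".format(formatting_param, result)
    return _cb

def flat_cb(result, formatting_param):
    # Everything this function needs arrives through its arguments.
    return "{}:{}".format(formatting_param, result)

cb = nested_style("query", "fmt")
closure_names = cb.__code__.co_freevars                      # names taken from the outer scope
closure_values = [cell.cell_contents for cell in cb.__closure__]

print(closure_names, closure_values)
print(flat_cb.__closure__)                                   # a module-level function has no closure
```

The flat variant makes the kept state explicit: it is exactly the extra arguments you hand to addCallback.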

For quite some while twisted has had a way around this with the inlineCallbacks decorator (I've found an article about it from 2008). You could write the following:

from twisted.internet.defer import inlineCallbacks, returnValue

@inlineCallbacks
def apiCallX(param):
    result = yield databaseQuery1(param)
    newResult = doSomeFormatting(result)
    returnValue(newResult)

@inlineCallbacks does some generator magic (explained in more detail in a separate article) which makes your code more concise. At first I was slow to adopt this because @inlineCallbacks has some overhead. But after a while, as I got more comfortable with it and realized my application was more IO and memory bound than CPU bound, I started "inlining" most stuff. Notice however that returnValue actually raises an exception that gets caught by @inlineCallbacks.
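
The "generator magic" and the exception trick can be illustrated with a toy driver. This is not twisted's implementation: toy_inline, ReturnValue and return_value are hypothetical names, and every yielded value is treated as if its result were already available, so nothing here is actually asynchronous.

```python
class ReturnValue(Exception):
    """Toy stand-in for the exception that carries a generator's result."""
    def __init__(self, value):
        super().__init__(value)
        self.value = value

def return_value(value):
    # Raising is the only way to hand a value back from an old-style generator.
    raise ReturnValue(value)

def toy_inline(gen_func):
    """Drive a generator to completion, sending each yielded value straight back in."""
    def wrapper(*args, **kwargs):
        gen = gen_func(*args, **kwargs)
        to_send = None
        try:
            while True:
                yielded = gen.send(to_send)
                # A real framework would wait here for an asynchronous result;
                # this toy pretends the yielded value is that result.
                to_send = yielded
        except ReturnValue as exc:
            return exc.value
        except StopIteration as exc:
            return exc.value
    return wrapper

@toy_inline
def api_call(param):
    result = yield "row for " + param
    new_result = result.upper()
    return_value(new_result)

print(api_call("x"))
```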


Then along came asyncio, and now in Python 3.5 you can write the function above with less magic. Well, I'm not sure about that, but now the magic is inside the Python language. Python 3.4 has @asyncio.coroutine but I have mixed feelings about that one. It feels like (and is) generator magic all over again. It's a good thing that asynchrony is supported by the language instead of by a framework, if only because then you don't have multiple async frameworks all having their own implementations. The frameworks still exist, but I think in order to remain relevant they have to adopt or come closer to asyncio. Tornado has built a bridge between tornado and asyncio, and twisted is working on bridging the gap.

So in Python 3.5 our API call would look like:

async def apiCallX(param):
    result = await databaseQuery1(param)
    newResult = doSomeFormatting(result)
    return newResult

And since generators can return a value as of Python 3.3 (and an async def function simply uses return), we don't need a returnValue equivalent. Keep in mind that nesting callback functions generally remains a bad idea regardless of which framework you're using. There are certainly situations where it would be easier or better to do it anyway, but I'm talking about the general case here.
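
A small standard-library sketch of why no returnValue equivalent is needed: when a generator returns a value, Python attaches it to the StopIteration that ends the generator. The generator name gen_with_return is made up for this example.

```python
def gen_with_return():
    received = yield "waiting"
    return received * 2          # allowed in generators since Python 3.3

g = gen_with_return()
first = next(g)                  # run up to the first yield
try:
    g.send(21)                   # resume; the generator then returns
except StopIteration as stop:
    result = stop.value          # the returned value travels on the exception

print(first, result)
```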

There's another difference which is not obvious when you're switching from twisted to asyncio. In twisted, as soon as you start your request (if the reactor is running) the request will be running. In asyncio, however, the request only starts once you await it or pass it to asyncio.ensure_future.

So where @inlineCallbacks also starts the generator, with asyncio you'll have to explicitly ask for it.
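
The difference in start-up behaviour can be observed with a short asyncio sketch. The request coroutine and the log list are hypothetical. The point is only the ordering: calling a coroutine function runs nothing, while asyncio.ensure_future schedules it on the loop.

```python
import asyncio

log = []

async def request(name):
    log.append(name + " started")
    await asyncio.sleep(0)
    return name

async def main():
    coro = request("lazy")                          # creates a coroutine object; no code has run yet
    log.append("coroutine created")
    task = asyncio.ensure_future(request("eager"))  # scheduled to start on the loop
    log.append("task scheduled")
    await asyncio.sleep(0)                          # yield to the loop so the task can begin
    log.append("after yield")
    await coro                                      # only now does the lazy request start
    await task

asyncio.run(main())
print(log)
```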

Stay tuned for a second and even a third part!