Staying sharp as a developer

Being a professional programmer is, for the most part, extremely fulfilling: it stretches you, challenges you and encourages you to venture to places you otherwise might not go. But sometimes work can lull you into a false sense of security where, instead of pushing you to find new ways of approaching a problem, you find yourself following the same old patterns.

In his PyWaw 2015 Keynote "Stopping to Sharpen Your Tools", Brandon Rhodes gave the advice that every now and then a programmer should take stock of the software they use — be it Vim, Emacs, an IDE, or a whole host of other supporting software — and consider what their tools are capable of and whether they are using those capabilities to their full potential.

I also believe that when you use a programming language day in, day out, it is important to keep your mind keen by constantly solving new and interesting problems.

Being paid to write code is a dream come true, but quite a lot of the time it involves maintaining someone else's code, submitting bug fixes, changing configuration files, or merging minor features. Rarely do you get the satisfaction of starting a project from scratch, or solving a problem from start to finish.

It is for this reason that I advise anyone who writes code for a living to "work" outside of work. It will remind you why you like programming and it will almost certainly improve your abilities as a programmer.

I recently followed my own advice by setting myself the challenge of solving the Project Euler problems in Python 3. To make it more interesting I set some rules: stick to the standard library (which in Python isn't much of a limitation!), no peeking at solutions until I had one of my own, use iterators and generators wherever possible, and unit test everything! The main purpose of all of this was to become more familiar with creating a Python package from scratch, with a comprehensive test suite, and to use features of Python 3 that I don't usually get to use in my production code.
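
To give a flavour of what "unit test everything" looks like in practice, here is a minimal sketch of the kind of test I mean. The solve function and the test case are illustrative stand-ins rather than code from my actual package.

import unittest

def solve(limit=1000):
    # Project Euler problem 1: sum the multiples of 3 or 5 below `limit`.
    return sum(n for n in range(limit) if n % 3 == 0 or n % 5 == 0)

class TestProblem001(unittest.TestCase):
    def test_example_from_problem_statement(self):
        # The problem statement gives 23 as the answer for a limit of 10.
        self.assertEqual(solve(10), 23)

if __name__ == '__main__':
    unittest.main()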

After a few weeks of solving problems I found, to my surprise, that the good practices I had developed for Project Euler started to seep into my work. An example of this occurred when I was writing some monitoring code. The solution included an AWS Lambda function to pull access metrics from a number of log files in S3 and submit them to CloudWatch. Each line of each log file represented at least one metric (maybe more), which needed to be processed before being submitted to CloudWatch.

Boto, the AWS Python client, can batch up to 20 metrics into a single request, which vastly improves performance by minimising expensive network I/O. But how should each batch be collected?

One approach would be to set up an empty list for the batch, take each log file, process each line one at a time, append metrics to the batch and, when the batch contains 20 metrics, send it. The code for that would look like this:

import boto3

def lambda_handler(event, context):
    client = boto3.client('cloudwatch')
    s3_data = event['Records'][0]['s3']
    bucket = s3_data['bucket']['name']
    key = s3_data['object']['key']

    namespace = 'access-metrics'

    batch = []
    for event in parse_log(bucket, key):
        for metric in process(event):
            batch.append(metric)
            if len(batch) == 20:
                client.put_metric_data(
                    Namespace=namespace,
                    MetricData=batch,
                )
                batch = []
    # Send whatever is left over in the final, partial batch.
    if batch:
        client.put_metric_data(
            Namespace=namespace,
            MetricData=batch,
        )

(The parse_log and process generators are not included in this snippet for brevity; essentially, parse_log retrieves a log file and parses each line into events, and process turns each event into one or more metrics.)
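
For the curious, here is a rough sketch of what those two generators might look like. The log format, field names and metric shape are invented for illustration and will differ from the real thing.

import boto3

def parse_log(bucket, key):
    # Fetch a log file from S3 and yield one parsed event per line.
    s3 = boto3.client('s3')
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    for line in body.decode('utf-8').splitlines():
        # Invented format: space-separated timestamp, path and status code.
        timestamp, path, status = line.split()
        yield {'timestamp': timestamp, 'path': path, 'status': status}

def process(event):
    # Yield one or more CloudWatch metrics for a single log event.
    yield {
        'MetricName': 'RequestCount',
        'Dimensions': [{'Name': 'Path', 'Value': event['path']}],
        'Value': 1,
        'Unit': 'Count',
    }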

One disadvantage of this solution is that it is very rigid: it is specific to this problem and cannot be reused elsewhere without significant refactoring; it can't be split off into a separate module or package, for example. Another is just how untidy it looks. Appearances count for a lot in code, and an ever-increasing indent in loops and conditionals is a good sign there could be a better way. The duplication of client.put_metric_data to mop up any leftover metrics in the last batch should also be a red flag that something isn't as good as it could be.

The opposite approach, and the one I went with, is to treat the log lines as a potentially endless stream of data. It doesn't matter where the stream is coming from: you just have to process it as it arrives and worry about batching later. The code for this approach looks something like this:

from itertools import islice

import boto3

def batches_of(size, iterable):
    # Lazily yield lists of up to `size` items from `iterable`.
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def lambda_handler(event, context):
    client = boto3.client('cloudwatch')
    s3_data = event['Records'][0]['s3']
    bucket = s3_data['bucket']['name']
    key = s3_data['object']['key']

    namespace_prefix, _, _ = key.partition('/')
    namespace = namespace_prefix + '-access-metrics'

    # A lazy pipeline: events and metrics are only produced as they are consumed.
    metric_data = (
        metric
        for event in parse_log(bucket, key)
        for metric in process(event)
    )

    for batch in batches_of(20, metric_data):
        client.put_metric_data(
            Namespace=namespace,
            MetricData=batch,
        )

The advantage here is that the problems of processing log lines and creating batches are encapsulated in separate generators, which makes the approach extremely flexible. If the requirement changed and we needed all of the metrics at once, it would be trivial to remove the batching code altogether. And the batching itself is independent of the processing code: it knows nothing about it, so you could apply that function to any iterable and it would still do its job.
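
As a quick illustration of that independence, the same function will happily batch any iterable, here a plain range:

for batch in batches_of(3, range(8)):
    print(batch)

# [0, 1, 2]
# [3, 4, 5]
# [6, 7]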

In fact, since writing this code I have found other places where batching is useful and have applied the batches_of function to them, despite the fact that they are slightly different problems.

So I guess the moral of all of this is not to expect your work to push you towards all the solutions that are out there. Go out and read about your programming language of choice, listen to talks and watch videos about it, but most of all, use the tools you work with, both in and out of work hours!
