How to write readable code

Posted on February 2, 2021

These are some of the things I think about when trying to write clean, readable code.

Prioritize clarity

There are many ways to write any piece of code. Some will run faster, some will take less memory, some will be easier to test. And some will be more clear.

The first step to writing clear code is to make it a priority.

This means you have to deprioritize other aspects, like speed. There’s no such thing as prioritizing one thing without deprioritizing something else (when everything is a priority, nothing is).

Develop a sense for clarity

Writing well requires knowing what good writing looks like, and creating clear code requires knowing what clear code looks like. Reading well-regarded code can give you a sense of what good can look like.

A good sense for clear code won’t keep you from writing unreadable code, but it will tell you what parts don’t smell right.

Edit

Your first idea for how to write the code will rarely be the most clear.

It’s often easier to find a readable way to write code after you’ve finished the mental work of getting the first version written down. Reading back over what you just wrote will give you ideas for how to improve it.

Start by explaining

If you’re not sure how to organize the code, start by explaining what needs to be done as though you are telling it to another person (or rubber duck). Write it down: “Well, we need to skip it if the user is deleted, or if the order is already in progress…” Take that explanation and transform it into code.

When laying out the code, it’s better to be thinking in terms of human communication rather than machine abstractions.
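As a rough sketch of what that transformation might look like, here is the spoken explanation above turned into a function almost word for word. The `User` and `Order` classes, their fields, and the `"in_progress"` status value are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class User:
    is_deleted: bool = False


@dataclass
class Order:
    user: User
    status: str = "new"  # hypothetical status values: "new", "in_progress"


def should_skip(order: Order) -> bool:
    # "We need to skip it if the user is deleted..."
    if order.user.is_deleted:
        return True
    # "...or if the order is already in progress."
    if order.status == "in_progress":
        return True
    return False
```

Each branch lines up with one clause of the explanation, so a reader can check the code against the sentence it came from.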

Comments

Add comments that explain why the code is doing what it is doing, or is structured the way that it is structured.

Just reading the logic won’t tell you why the author thought that was the right logic. There might be some business reason you don’t know about - perhaps users outside the US sometimes put the street number at the end of the first line of the address. Or maybe there is some little technical detail - this query is structured in this weird way to convince Postgres to optimize it correctly. These are added bits of context that don’t exist in the code itself.

Code can’t self-document if it isn’t there. If you decide to not write some code and don’t leave a comment explaining why, there will be nothing left to explain what you were thinking!

Even if it is possible to understand the reasoning from just reading the code, it’s hard mental work that can be very easily prevented.
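Here is a small sketch of the address example, assuming a hypothetical `split_street_number` helper. The regular expressions and the reason given for not guessing are invented for illustration; the point is that the comments carry context the logic alone can't.

```python
import re


def split_street_number(address_line: str) -> tuple[str, str]:
    """Return (street_number, street_name) from the first address line."""
    # Why check both ends: users outside the US sometimes put the street
    # number at the end of the line ("Hauptstrasse 12"), so looking only
    # at the start would miss it.
    leading = re.match(r"^(\d+\w*)\s+(.+)$", address_line)
    if leading:
        return leading.group(1), leading.group(2)
    trailing = re.match(r"^(.+?)\s+(\d+\w*)$", address_line)
    if trailing:
        return trailing.group(2), trailing.group(1)
    # Why there is no further fallback: we deliberately don't guess here
    # (assumed business rule: a wrong number is worse than a missing one).
    return "", address_line
```

The last comment explains code that isn't there, which is exactly the kind of thing a future reader can't reconstruct from the logic.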

Don’t mix levels

Don’t mix levels of abstraction within a method.

This mixes levels of abstraction:

def welcome(self):
  # Look in the email log for an earlier welcome email to this user.
  results = db.query(
    'SELECT EXISTS (SELECT 1 FROM emails WHERE kind = ? AND user = ?)',
    'welcome_email', self.user.id,
  )
  if results[0]:
    return
  self.send_welcome_email()

This doesn’t:

def welcome(self):
  if not self.has_sent_welcome_email():
    self.send_welcome_email()

Mixing levels of abstraction makes the reader jump between thinking about what is being done and how it is implemented.

When you talk about what code does, you are talking about the current level of abstraction. When you talk about how the code does it, you are talking about the next level of abstraction down.

In the welcome method, what it does is send a welcome email if it has not already been sent. How it determines if the email was already sent is to query the database of past email records. Notice that the second version of welcome moves the ‘how’ to a separate method. It’s only concerned with the ‘what’, meaning it stays at one level of abstraction.

Make each function live at one level of abstraction, and delegate lower-level details to methods at lower levels of abstraction. Methods with a single level of abstraction tend to read like a story about what is going on.
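To make the split concrete, here is one way the lower-level method could look. This is only a sketch: it uses Python's built-in `sqlite3` module, and the `Welcomer` class, the `emails` table layout, and the `user_id` column name are assumptions. The "sending" just records a row so the example can run.

```python
import sqlite3


class Welcomer:
    def __init__(self, conn: sqlite3.Connection, user_id: int):
        self.conn = conn
        self.user_id = user_id

    # The "what": stays at one level of abstraction.
    def welcome(self) -> None:
        if not self.has_sent_welcome_email():
            self.send_welcome_email()

    # The "how": one level down, where the query lives.
    def has_sent_welcome_email(self) -> bool:
        row = self.conn.execute(
            "SELECT EXISTS (SELECT 1 FROM emails WHERE kind = ? AND user_id = ?)",
            ("welcome_email", self.user_id),
        ).fetchone()
        return bool(row[0])

    def send_welcome_email(self) -> None:
        # Stand-in for real delivery: only record that the email went out.
        self.conn.execute(
            "INSERT INTO emails (kind, user_id) VALUES (?, ?)",
            ("welcome_email", self.user_id),
        )
```

If the storage changes later, only `has_sent_welcome_email` has to change; `welcome` still reads the same.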

Break out functions

Big functions can (sometimes!) be made more readable by breaking them up into smaller functions.

Sometimes the function acts like a series of steps, in which case it works well to extract a function for each step. Other times, there are different decisions to be made, each of which could be made in a different function. Perhaps there are parts of the function that make a decision and parts that take action. There are many dimensions along which you can split up a function. It takes practice to get good at seeing the right one to use.
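Here is a small sketch of the "steps and decisions" split. The order format (`"name,quantity"` strings) and all function names are made up for the example.

```python
def parse_line(line: str) -> tuple[str, int]:
    """Step: turn 'name,quantity' into a (name, quantity) pair."""
    name, quantity = line.split(",")
    return name.strip(), int(quantity)


def is_valid(item: tuple[str, int]) -> bool:
    """Decision: keep only items with a name and a positive quantity."""
    name, quantity = item
    return bool(name) and quantity > 0


def total_quantity(items: list[tuple[str, int]]) -> int:
    """Step: add up the quantities."""
    return sum(quantity for _, quantity in items)


def summarize_order(lines: list[str]) -> int:
    """Reads as a list of steps; the details live one level down."""
    items = [parse_line(line) for line in lines]
    valid_items = [item for item in items if is_valid(item)]
    return total_quantity(valid_items)
```

Written as one big function, the parsing, the validity rule, and the summing would all share one scope and have no names of their own.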

Smaller functions have some advantages:

Each bit of logic is given a name. This makes it easier to know what each bit of logic is for and helps you find where a bit of logic lives.
There are fewer variables in scope.
It’s easier to tell what the program was “thinking” when you look at a stack trace or run a debugger.
The small functions can be tested separately.

Computers would work fine with no functions at all. Functions exist for the sake of programmers, so make good use of them.

Don’t break out functions

The Don’t Repeat Yourself (DRY) idea is often taken too far.

Now, it’s a very good idea to extract magic numbers to constants and have one copy of the logic for making a particular decision. Repeating those bits of code is a bad idea.

DRY starts to go too far when two functions that happen to share a handful of lines become a target for deduplication. Completely avoiding duplicated lines means that you’ll end up with confusing, nonsense abstractions that exist only to hold those few shared lines. This makes the code weirdly hard to change because the structure of two unrelated pieces of code will be tied together.

The test for whether some pieces of code should be deduplicated is simple: would anything bad happen if one was changed without also changing the other? If the answer is yes, then make a single source of truth for it. If not, consider leaving it alone.
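A sketch of how that test might play out, using an invented shop example (the threshold, the prices, and the function names are all hypothetical):

```python
# Single source of truth: if the checkout and the cart banner disagreed
# about this rule, customers would be promised shipping they don't get.
FREE_SHIPPING_THRESHOLD_CENTS = 5000


def qualifies_for_free_shipping(subtotal_cents: int) -> bool:
    return subtotal_cents >= FREE_SHIPPING_THRESHOLD_CENTS


def shipping_cost_cents(subtotal_cents: int) -> int:
    return 0 if qualifies_for_free_shipping(subtotal_cents) else 499


def cart_banner(subtotal_cents: int) -> str:
    if qualifies_for_free_shipping(subtotal_cents):
        return "You get free shipping!"
    remaining = FREE_SHIPPING_THRESHOLD_CENTS - subtotal_cents
    return f"Add ${remaining / 100:.2f} more for free shipping."


# Incidental duplication: these two happen to share a shape, but nothing
# bad happens if one changes without the other, so they stay separate.
def format_invoice_lines(items: list[tuple[str, int]]) -> list[str]:
    return [f"{name}: ${cents / 100:.2f}" for name, cents in items]


def format_packing_lines(items: list[tuple[str, int]]) -> list[str]:
    return [f"{name}: {count} pcs" for name, count in items]
```

The threshold passes the test (changing one copy without the other would be a bug), while the two formatting functions don't, even though they look alike.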

The point of DRY isn’t to run a manual compression process on the codebase; it’s to avoid a dependency where two parts of the code need to be manually kept in sync. Remember, deduplicating code is not the same thing as creating an abstraction.

Avoid configurable functions

Prefer many functions to a few, configurable functions.

I’m sure you’ve seen stories like this one: you start off with a clean function that’s called in three different places. You want to use it in a fourth place, but it needs to do something slightly different, so you add a configuration parameter. Then the first caller gets a new feature, requiring two more configuration parameters. A fifth use case is added with its own special parameter. Caller #2 is too slow, so you add yet another parameter to skip part of the work.

Somehow, your clean function that started life doing one thing now has 5 configuration parameters and does potentially 2^5 = 32 different things (or more)!

It’s much better to have multiple functions, each of which does just one thing.
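As a toy illustration (the `user` dictionary shape and every function name here are invented), compare a flag-driven function with its special-purpose replacements:

```python
# Before: one function, steered by flags. Every caller has to know which
# combination of options it wants, and every reader has to trace them all.
def render_name(user: dict, *, uppercase: bool = False,
                include_title: bool = False,
                initials_only: bool = False) -> str:
    first, last = user["first"], user["last"]
    if initials_only:
        text = f"{first[0]}.{last[0]}."
    else:
        text = f"{first} {last}"
    if include_title and user.get("title"):
        text = f"{user['title']} {text}"
    return text.upper() if uppercase else text


# After: one small function per use case, each doing a single thing.
def full_name(user: dict) -> str:
    return f"{user['first']} {user['last']}"


def formal_name(user: dict) -> str:
    return f"{user['title']} {full_name(user)}"


def initials(user: dict) -> str:
    return f"{user['first'][0]}.{user['last'][0]}."
```

With only three flags, `render_name` already has eight possible combinations, and some of them (uppercase initials with a title?) were probably never intended by anyone.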

Once you have separate functions, there will of course be duplication. When those shared parts need to be kept in sync, apply DRY and extract them to shared functions. This is easier if the function is already broken down into subfunctions for decisions and steps.

Remember, a few duplicated lines are fine! If each of the separate functions has its own for-loop over a list, that’s very acceptable duplication.
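For instance, in this sketch (the order dictionaries and the billing rule are made up), the decision that must stay in sync is shared, while each function keeps its own loop:

```python
def is_billable(order: dict) -> bool:
    # Shared decision: must stay in sync everywhere, so it lives in one place.
    return order["status"] == "shipped" and not order["is_test"]


def monthly_revenue_cents(orders: list[dict]) -> int:
    total = 0
    for order in orders:  # own loop: fine to repeat
        if is_billable(order):
            total += order["total_cents"]
    return total


def billable_customer_ids(orders: list[dict]) -> set[int]:
    customers = set()
    for order in orders:  # own loop: fine to repeat
        if is_billable(order):
            customers.add(order["customer_id"])
    return customers
```

Trying to merge the two loops into one generic "walk the billable orders and do something" helper would save a couple of lines and cost a layer of indirection.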

One advantage of this approach is that when one use case goes away, you can easily delete the relevant function. You don’t have to dig around in the logic of a complicated function to tease out the parts that were for that particular set of options.

Readers of a special-purpose function will find it much easier to understand what it does.

(Note that this is only the right approach if you control all the callers of the function. If your function is part of a public API, then the reasoning here doesn’t apply because you don’t know what all the use cases are or will be.)

Don’t prematurely optimize

Race cars go faster than normal cars, but at the expense of having hard seats, making lots of noise, and lacking A/C. If you don’t know that your function is going to need to be a race car, don’t strip out the A/C yet. Leave the creature comforts in - focus on writing code that is easy for humans to read instead of easy for computers to run.

The same is true about premature generalization. You wouldn’t buy a dump truck if you don’t need to haul huge loads of things, so you also shouldn’t make your code able to serve all kinds of needs that may never happen.