Tech debt gets worse before it gets better

Posted on January 29, 2022

Projects to pay down tech debt tend to start by making more tech debt. The level of tech debt gets worse before it gets better. The new, cleaner code is just adding to the mess until it allows you to finally delete the old code.

This has two implications. First, don’t wait too long to start paying off tech debt. You don’t want to be in a situation where temporarily making it any worse is unbearable. Second, don’t stop clean-up projects in the middle. If you do, you may be left worse off than when you started.

It gets worse before it gets better

Have you ever tried to reorganize the stuff on some really messy shelves? You probably found it hard to put any item where it should go because another object was intruding on its home (and that object’s place is also occupied, and so on). Before long, you’ve resorted to pulling everything off of a shelf, moving it to a pile on the floor. This gives you a clear shelf to start filling with the things that belong there. Now progress can begin, and you can put that one shelf in order.

As you started the work of organizing the shelf, things initially got worse, not better. What started as a messy set of shelves became a messy set of shelves plus piles on the floor. It wasn’t until near the end that things started to take a turn for the better. If you graph the amount of mess over time, the line climbs above its starting level first and only drops below it near the end.

The very same thing happens when you take on a project to start paying down tech debt. As you start to replace the old thing with the new thing, you initially start making the mess worse, not better. It’s only as you get towards the end of the project that the amount of tech debt finally goes below the original level.

It is reasonable to ask: Why does this happen?

Rebuilding a car while driving it

What about tech debt makes it get worse before it gets better?

It helps to start with thinking about why the tech debt is there in the first place. Systems tend to accumulate tech debt as they are changed. With each tweak or new feature, the old structures become slightly less correct. Hacks and work-arounds creep in. Like adding layers upon layers of paint, each change introduces only a small amount of tech debt, but it adds up.

People don’t change systems frequently for no reason. You don’t go out in the garage and rearrange the box of Christmas lights several times a week. You only make changes to things when you expect to get some value out of those changes. Frequent changes imply that there’s a lot of value to be had (otherwise it wouldn’t be worth the effort). There is a correlation between the rate at which a system accumulates tech debt and the value that the system provides¹.

That value depends on the system continuing to work. The fact that you need the system to keep working means you can’t just delete it and write a new clean piece of code from scratch. The business wouldn’t be too happy if the signup system was just gone for six months while you work on writing a shiny new one. Other developers would suffer if you deleted code that’s needed to build and test their systems. You have to keep the old thing working while you are busy making it cleaner.

The siren song of the ground-up rewrite calls to us, trying to make us forget the fact that it’s much harder to replace an existing system than it is to create one from scratch². In reality, rewrites feel more like trying to rebuild a car while driving it down the road.

You have to keep supporting old APIs until every client has moved to the new APIs. Existing data has to be migrated to new models. You have to research (and often replicate) the quirks and undocumented features of the existing system. All these different issues have a complicated web of dependencies between them, making it hard to even know where to start.

Without a map

Any time you are paying down non-trivial tech debt, you are probably restructuring the system to make it a better fit for today’s needs. The good news is that you should get a system that’s cleaner and easier to reason about. The bad news? The structure is different, so the components of the old system - the classes, database tables, services, pipelines - won’t map 1:1 to components in the new design.

Say you are redesigning a system involving products and prices. In the old system, there’s one giant messy class with all the price calculation logic. In the new system, each product’s class is responsible for price calculations for that one product. While you are working on writing those new product classes, you’ll still have to use the old messy class for price calculations (at least when working with the products that you haven’t migrated yet). You can only safely delete that old code when every single product is ready to calculate prices for itself. In the meantime, you have a lot of duplication between the new product classes and the old price calculation class. This duplication happened because the components of the new system didn’t map 1:1 to the components of the old system - you couldn’t just drop in a new price calculation class with the same interface as the old one. Instead, you had to wait until all the different things it mapped to (all the different product classes, in this case) were in place before you could finally delete it.
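Here is a minimal Python sketch of that in-between state. All of the names (`LegacyPriceCalculator`, `Book`, `price_for`) and the pricing rules are made up for illustration; the point is only that the book rule now lives in two places, and the old class can't go away while gadgets still depend on it.

```python
from dataclasses import dataclass


class LegacyPriceCalculator:
    """Stand-in for the old, giant class that prices every product."""

    def price(self, kind: str, base: float) -> float:
        if kind == "book":
            return round(base * 0.90, 2)  # hypothetical rule: books get 10% off
        if kind == "gadget":
            return round(base + 5.00, 2)  # hypothetical rule: flat handling fee
        raise ValueError(f"unknown product kind: {kind}")


@dataclass
class Book:
    """New design: the product prices itself."""

    base: float

    def price(self) -> float:
        # Duplicates the legacy book rule until the old class can be deleted.
        return round(self.base * 0.90, 2)


# Only migrated kinds appear here; everything else still needs the old class.
MIGRATED = {"book": Book}
_legacy = LegacyPriceCalculator()


def price_for(kind: str, base: float) -> float:
    """Use the new product class if it exists, else fall back to the old code."""
    product_cls = MIGRATED.get(kind)
    if product_cls is not None:
        return product_cls(base).price()
    return _legacy.price(kind, base)
```

Calling `price_for("book", 20.0)` goes through the new class, while `price_for("gadget", 20.0)` still goes through the old one. Only once every kind is in `MIGRATED` can `LegacyPriceCalculator` and the fallback be removed.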

You may often find yourself in a situation where there’s basically no piece of the old system that you can safely delete until nearly all of the new system is in place.

So, tech debt accumulates due to changes. Changes are made because the system has value. That value still needs to be provided while the tech debt is being fixed. That makes life difficult and you end up with duplicated systems.

Now there are two of them

That duplication of systems is why the amount of tech debt gets worse. You are literally adding the tech debt in the new version of the system to the tech debt that already existed in the old version. It’s like pulling the contents of a shelf out into a pile on the floor.
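To make the shape of that curve concrete, here is a toy model in Python. The numbers are invented and the assumptions are deliberately crude: the old system's debt stays in full until the migration is complete, and the new system's (smaller) debt grows linearly as it is built. It is a sketch of the argument, not a measurement of anything.

```python
def total_debt(progress: float, old_debt: float = 100.0, new_debt: float = 30.0) -> float:
    """Debt carried at a given migration progress (0.0 to 1.0).

    Assumes the old system can only be deleted once the migration is
    complete, and that the new system's debt grows linearly as it is built.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be between 0.0 and 1.0")
    old_part = 0.0 if progress == 1.0 else old_debt
    return old_part + new_debt * progress


# Print the debt level at 0%, 25%, 50%, 75% and 100% progress.
for step in range(5):
    progress = step / 4
    print(f"{progress:4.0%}  {total_debt(progress):6.1f}")
```

Under these assumptions the total keeps climbing for the whole project and only falls below the starting level at the very end, when the old system is finally deleted.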

Many things become more difficult when you have these duplicated systems. For example, what if you need to redact some data? Now you have to remove it from two tables. If there’s a security update to be done, you probably have to do it twice. To find out why we incorrectly sent someone an email, you have to look at how both the old and the new system might have sent that email. This additional tech debt makes it harder to debug problems, harder to make changes, and harder to operate the system.
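The redaction case might look something like this sketch. The two dictionaries are hypothetical in-memory stand-ins for an old and a new table, and the column names are assumptions; a real system would be talking to a database.

```python
# Hypothetical in-memory stand-ins for the old and new tables.
old_users = {42: {"name": "Ada", "email": "ada@example.com"}}
new_profiles = {42: {"display_name": "Ada", "contact_email": "ada@example.com"}}

# Every place the email lives while both systems exist: (table, column).
EMAIL_LOCATIONS = [(old_users, "email"), (new_profiles, "contact_email")]


def redact_email(user_id: int) -> int:
    """Blank the email everywhere it is stored; return how many rows changed."""
    changed = 0
    for table, column in EMAIL_LOCATIONS:
        row = table.get(user_id)
        if row is not None and row.get(column) is not None:
            row[column] = None
            changed += 1
    return changed
```

Forget one entry in `EMAIL_LOCATIONS` and the data quietly survives in the other system, which is exactly the kind of mistake duplicated systems invite.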

Fight the entropy

What can you do about it? Don’t wait too long to start paying off technical debt, and once you’ve started paying it off, don’t stop in the middle.

As tech debt accumulates it goes from beneath notice to annoying to intolerable and finally show-stopping. If you wait until it becomes intolerable - or even close to intolerable - before you start paying it off, I have some bad news for you. The process of paying it off will (temporarily!) make it even worse. It might push things over the edge from annoying to intolerable or intolerable to completely show-stopping. If you wait too long, you’ll find yourself between a rock and a hard place. Either you start a project to pay it off (which makes it worse), or you leave it alone, and it will keep getting worse by itself.

Tech debt becomes more difficult to pay off the worse it is. Projects for paying off a big piece of tech debt take longer and incur more temporary debt than projects to pay off smaller amounts of tech debt. It’s hard to make changes in systems with a lot of tech debt - and that applies to changes intended to fix that tech debt just like any other changes.

You also really, really don’t want to stop in the middle of a project to pay off tech debt. You don’t want to start with the “things get worse” part and never get to the “before they get better” part!

The importance of not stopping in the middle has a few implications. First, try to get the tech debt fixed as quickly as possible so that new priorities don’t have a chance to override it. Second, if you do end up stopping in the middle, remember that there’s now an extra large amount of value to be had from restarting the project! Restarting a stalled migration project can be better than starting a new project since it will both take less time (part of the work has already been done) and have a bigger payoff (the [re]starting point includes the tech debt of the partial new system).

Now if you’ll excuse me, I see a bookcase that needs rearranging.

¹ There’s an interesting double effect here: increasing the rate at which a system is changed both increases the amount of tech debt that accumulates and increases how often that tech debt is encountered. A corollary is that if people never need to touch a system, they probably won’t care about how much tech debt it contains.

² Joel on Software, “Things You Should Never Do, Part I”