Technical debt, part 1:
Definition and causes
If you have worked in the industry for a few years, there’s a good chance that you have heard the term “technical debt”. It is a very important concept, but one that can be hard to define. It is nevertheless important to understand it, so that it can be recognized and dealt with; otherwise, it could ruin your project and cause great harm to the business.
Author
Miroslav Lazovic is an experienced software engineer from Belgrade. Over the past 15 years, he has worked on many different projects of every size, from small applications to mission-critical services used by hundreds of thousands of users on a daily basis, both as a consultant and as part of the development team.
Since 2016, he has been focused on building and managing high-performing teams and helping people do their best work. He likes discussing various important engineering and management topics, and that’s the main reason for this blog. Besides that, Miroslav has quite a few war stories to tell and probably far too many books and hobbies. There are also rumors of him being a proficient illustrator, but the origin of these rumors remains a mystery.
– Backend TL@Neon and Dev. Advocate @Holycode
The Wikipedia definition of technical debt is: “…the implied cost of additional rework caused by choosing an easy (limited) solution now instead of using a better approach that would take longer”. The concept is attributed to Ward Cunningham (who is known for creating the first wiki). While this definition is certainly true, I think it does not take some very important factors into consideration: it is too focused on the technical side of things (although Ward does hint at some much larger issues in his explanation). We also have to keep in mind that the IT industry has changed significantly since the term was coined. Traditionally, technical debt was attributed mostly to application code. Today, code is everywhere: it’s not only your application anymore, but also your testing frameworks and various other tools, your CI/CD pipelines, even your infrastructure. This means that the problems caused by choosing a quick or sub-optimal solution can surface at any of these levels (or, to be blunt, you can mess up in so many ways). But that’s just one part of the story.
One of the more fitting descriptions of technical debt that I have come across in recent years is this: “…the incremental cost and loss of agility to a company as a result of prior decisions that were made to save time or money when implementing new systems or maintaining existing ones” [1].
There are several reasons why I believe this definition is more fitting. First, it focuses on decisions. This is very important, because there is much more to technical debt than bad or poorly maintained code. All kinds of technical decisions are made during the application lifecycle (architectural and design decisions, choice of tools, approach to deployment and monitoring, etc.), and all of them combined may (and probably will) have an impact on the final outcome. Second, it acknowledges that the company as a whole will feel the consequences of a large technical debt. That is because technical debt is really project or even organizational debt: it is the result of the actions and processes of the company in general [2]. In every company, there is always some tension between business and engineering. This tension is natural, because the business wants to increase the speed of delivery, while engineering wants more stability [2]. The way this tension is managed can lead to a lot of bad decisions, which in turn can produce a high level of technical debt.
Let’s now discuss the conditions under which technical debt is created. Martin Fowler described them with a concept he called the Technical Debt Quadrant [3], which classifies debt along two axes: deliberate versus inadvertent, and reckless versus prudent.
– Deliberate and reckless: This type of debt happens when the team shows complete disregard for good design and knowingly makes decisions that will create technical debt. It is very likely that many important topics have not been taken into consideration: the long-term impact of the decisions, a plan for addressing the issues later, or ways of limiting the damage resulting from these particular choices. This type of debt is very likely to happen when speed is the only factor that matters.
– Deliberate and prudent: In this situation, the team understands the impact of its technical decisions. It is creating debt (and knows it), but it also has plans for addressing the possible consequences. Usually, this means that once the goals or deadlines are met, the team will pay off the technical debt (by fixing production issues, refactoring the code, modifying the architecture, etc.).
– Inadvertent and reckless: This type of debt happens when the team does not have the required level of knowledge or experience, but still implements the solution, completely unaware of the consequences of its actions. This can lead to very large issues later in the project lifecycle, because it is often assumed that team members possess a certain level of knowledge, and by the time it becomes obvious that this was not the case, the damage has already been done.
– Inadvertent and prudent: This type of debt happens despite the team being knowledgeable and doing its best to come up with a properly designed solution. In this case, the debt originates from the fact that the team learns new (and better) ways to do certain things over the project’s lifetime, or has to react to completely new (and unforeseen) use cases.
Let’s now address the cost of technical debt. Here’s a chart [4] that shows the cost of developing a new feature over time, with and without proper design:
This means that with proper (evolutionary) design, adding features later in the project lifecycle will be relatively easy (and the cost will stay low). If, however, you simply keep building without any plan and never go back to address the existing issues, you may end up in a situation where any addition to the existing mess is very expensive in every possible way. That is because, as J.B. Rainsberger puts it in his talk “The Economics of Software Design”: “…having difficult design to work with makes everything you do with that design more difficult which then actually encourages you to do the wrong thing, which adds more stuff that becomes difficult to deal with” [4]. Bad design compounds, and if you keep working on a foundation made of bad decisions and neglect, you will eventually end up in a place where making any change is a Sisyphean undertaking. I will present an example of this later in this series, but it should not be hard to imagine the impact that such a situation would have on any company that relies on software to conduct its business activities.
If you are wondering just how expensive technical debt can be, there is some data that gives a pretty good idea. CISQ (the Consortium for Information & Software Quality) is an organization that provides open-source standards for measuring software risk. It also publishes technical reports, one of which is “The Cost of Poor Software Quality in the US” [5]. At the time of writing, the report for 2022 was not fully available, so let’s take a look at the data for 2020:
As you can see, the estimated cost of technical debt (which, as the report specifies, is the cost residing in defects that have to be addressed) is 1.31 trillion USD, which is staggeringly high. This is a good indicator of how devastating poor software quality can be for businesses that rely on software.
So far, we have covered the definition, the origin and the cost of technical debt, but how do you recognize it? What are the usual signs and symptoms? Are there any clues that might lead you to believe that your team or project is suffering because the level of technical debt is too high? There certainly are some tell-tale signs: technical atrophy, an inability to react, and problems within the team.
Technical atrophy
The most common manifestation of this issue is a gradual increase in the time required to develop new features or fix bugs. Slowly, work that used to take 3 days now takes a week, what used to take a week now takes two, and larger modifications can take months to complete. The main reason for this is a codebase that is not properly maintained: there are no coding standards, no project structure, no internal architecture. Tests may not exist, or they may not be maintained at all (and yes, tests are also code, which means they can become technical debt as well). Another way this problem manifests itself is an inability to add new components or features at all. You may also have issues with old libraries that can’t be updated without significant modifications to the existing code or, even worse, can’t be updated at all (usually because they are no longer maintained or supported). The same goes for outdated tools used for specific operations (like deployment pipelines), or even whole software solutions written in unsupported languages.
Inability to react
How you react when production issues show up is very important, and your business may suffer greatly if this is not handled properly. Usually, once you detect a production issue, you need to fix it and then issue an update or release a new version of the application (containing the corresponding fix, of course). If you can do this reasonably quickly, that’s great; if not, you might suffer monetary losses, customer dissatisfaction or even reputational damage. For example, what if you do not have proper monitoring and alerting in place, so you are not even aware that the issue exists? Then, once you actually learn that something is wrong (maybe from customer support or through some other channel), you start working on the issue, but you cannot replicate it easily, maybe because the application does not have proper logging in place (or there is some logging, but without a sufficient level of detail). So you lose a lot of time trying to figure out what the hell is wrong, and then comes the fun part: actually fixing the issue. If the codebase is in a poor state, development and testing may be quite a challenge, for the reasons explained in the previous section. And when the fix is finally done, you need to release a new version of the application. If your release/deploy pipeline is in bad shape (or maybe doesn’t exist), or requires planned outages and many manual steps to complete, you again lose precious time and even risk failing the process completely. And then you need to repeat all of this for every single production issue.
Problems within the team
At some point, the state of the project will affect the mental wellbeing of the team. Having some issues is OK, but having many (or, even worse, most) of the issues listed above on the project you are working on is simply a horrific scenario. Such teams struggle with whatever they try to do, be it developing new features, extending existing systems, maintaining the codebase or resolving production issues. Whatever motivation or enthusiasm there is will soon be drowned in an ocean of problems. If no effort is made to improve the situation (at least partially), engineers will simply leave (or may refuse to join your company if they suspect that they will end up working on such a project).
Conclusion
This concludes the first part of the story about technical debt. This article tried to provide a better explanation of the term itself, as well as to cover the reasons and circumstances that lead to the creation of technical debt. Our next article will cover the anatomy of debt and discuss strategies for managing technical debt (as well as provide some real-world examples).
References:
1. The Financial Implications of Technical Debt: https://www.toptal.com/finance/part-time-cfos/technical-debt?_hsmi=61728738
2. Technical debt and why it will ruin your software: https://labcodes.com.br/blog/en-us/development/tech-debt/
3. Technical Debt Quadrant: https://martinfowler.com/bliki/TechnicalDebtQuadrant.html
4. The Economics of Software Design (J.B. Rainsberger): https://www.youtube.com/watch?v=TQ9rng6YFeY
5. The Cost of Poor Software Quality in the US (2020 Report): https://www.it-cisq.org/the-cost-of-poor-software-quality-in-the-us-a-2020-report/