Note: Ryan Kent recently published a related article at LinkedIn. I’ve included excerpts from that article herein with Ryan’s permission and involvement. Ryan's contribution to this article includes an editorial eye and some of his own text.
Sub-title: Not Every Defect is a ‘Bug’
“Name It to Tame It.” Dr. Dan Siegel coined that phrase as a way of handling difficult emotional responses. Applying a name to an emotion can help a person manage or conquer it. In the same way, the labels we apply in work environments, the language we use, can help teams conquer complexity. James Shore wrote succinctly on this topic in The Art of Agile Development. Paraphrased below:
Speak the language of domain experts to avoid miscommunication, delays, and errors. To avoid mental translation between domain language and code, design software to use the language of the domain. Reflect in code how users of the software think and speak about their work. The ubiquitous language is a living language. Users and domain experts influence design and code, which in turn influences users and domain experts. Discrepancies are an opportunity for conversation and joint discovery.
This article is the story of such a discovery.
I met with my colleague, Ryan, to discover his experience with Buggy Definitions (his term) in his environment. Ryan is a Scrum Master currently working with Agile teams in a multi-national insurance enterprise.
Me: I know you have recently worked through a great challenge with your team. You’ve coined the term ‘Buggy Definitions’ – can you describe the circumstance? How did you know there was a problem?
Ryan: The problem arose when reporting to one of our key business stakeholders. We were asked to report on “bugs” each Sprint. I was at odds with this. Flaws at a Sprint level (problems found and solved within a Sprint) are part of the work and reporting these details could give the wrong impression if context isn’t understood. Through discussion, we learned that our stakeholders weren’t concerned about the problems we’d find and solve within a Sprint — they were only concerned about quality gaps in the product. Terms like ‘bugs’ and ‘defects’ were being thrown around interchangeably and it was creating a lot of confusion. If we were confused, it would stand to reason that our stakeholders were too! We didn’t have a shared understanding of what a bug is.
Me: As Scrum Master, what did you do to help your team?
Ryan: Well, not all deficiencies are equal. A few helpful terms are needed to name and tame the complexity of product development.
Me: I understand. Let’s take ‘bug’ for example: given that your team has grown more careful with their word choice, how do they define ‘bug’ now and what do they do when they find one?
Ryan: The team settled on this definition.
- A problem found in production that prevents software from functioning as we encoded it.
- How do we handle it?
- Study it immediately!
- Document the undesirable behavior.
- Bugs are added to the Product Backlog and prioritized in consultation with the Product Owner. If it’s critical, we may address it right away and it is understood this will displace other Sprint work.
- Size it (or decide a suitable timebox for a Spike) and treat it like Sprint work.
- Why was it important to define it?
- Bugs are flaws found in pilot or production that prevent a product from working and cannot be labelled ‘Defect’ or ‘Debt’.
‘Bug’ – In Depth
In digital product delivery, the term ‘bug’ is often used to describe every issue or flaw. Broad use of the term is a problem, but the pattern starts naturally and understandably. As a team works through technical difficulty, not thinking too much about it, they apply the term to everything from minor typos to major malfunctions. When a function produces an unpredictable result, when a test fails, even when implementation of a requirement is deliberately deferred for perfectly good reason, all these issues may be reported as bugs.
Because they work closely together, individuals within the team can classify and distinguish on-the-fly. They easily use one word for a variety of cases. And the problem goes unnoticed.
The trouble arises as stakeholders gradually become anxious about the quality of the software. Their concern is inevitable! After all, the word ‘bug’ is so frequent in conversation, what else are they to think? Even if the quality of software is high, frequent and broad use of the word ‘bug’ will cause confusion and misinformation.
To begin taming this problem: what is a bug, exactly?
The most famous ‘bug’ is on display at the Smithsonian. In 1947, the term was used by computer pioneer Grace Hopper, who documented a malfunction in a Mark II computer at Harvard’s Computation Laboratory. Operators of the Mark II traced the problem to a moth trapped in a relay (literally, a bug). The moth was carefully removed and taped to their logbook, which is preserved by the National Museum of American History in Washington, DC.
This account is not the first use of the term ‘bug’ to describe a technical glitch or misbehavior, but it certainly illustrates that the word ‘bug’ is most applicable when a glitch is not caused by human error. Here are three examples:
Artificial Intelligence often produces undesirable, unpredictable results. Of course, those misbehaviours are part of the training process, but they may be as mysterious and uncontrollable as Ms. Hopper’s moth-infested Mark II.
Imagine the experience of using an app on a mobile phone with intermittent connectivity: latency is unpredictable; synchronous processes may never complete; data cannot reliably be transmitted. As a result, users may describe the app as “buggy”. These complications are not the result of human error – rather, the software is being used in conditions so unreliable that all contemporary engineering practices fail in unpredictable ways.
The calculation of very large numbers, such as large primes or the digits of pi, requires computer hardware that pushes the limits of industrial design. The challenge is not “How shall we calculate very large numbers and verify their primality?” The math is not a mystery. The challenge is instead, “How can we build a computer with plastics and precious metals that won’t melt before our math formula concludes?” Heat dissipation, electrical current, melting points of various plastics and metals – these factors are complex and can cause software to behave in “buggy” ways in extreme use cases.
These examples each have (or will have) solutions as we make technological advancements in the field of software design and computer science. But each example illustrates how our systems sometimes behave as though a moth has fused itself to a circuit: unpredictable; impossible/hard to reproduce; impossible/difficult to overcome. We ought to reserve the word ‘bug’ for these conditions.
It is amazing how few bugs (of that sort) we encounter. Modern systems are designed to overcome tremendous strain. For example, data centres around the globe operate without incident at remarkable rates. Truly, one of many unsung human achievements!
I checked in recently with Ryan to hear more about language usage in the team.
Me: Ryan, what other terms were troublesome?
Me: How did you help the team and stakeholders through this?
Ryan: I started by looking at various terms used in the industry. I discovered a great article by Katy Sherman with clear definitions for distinct types of product flaws. You may disagree with the actual definitions and may want to use different terms for your team – these terms work for our team. In addition to ‘bug’, our team has established new agreements around the terms ‘defect’, ‘escaped defect’, and ‘technical debt’.
Ryan then described each term as follows.
‘Defect’ – In Depth
- Ryan’s team’s definition?
- A flaw found mid-Sprint while testing functionality that prevents a Product Backlog Item from being released.
- How they handle it?
- Fix it! Immediately.
- Minimum documentation is required.
- No sizing (the work to fix it is already reflected in the Product Backlog Item’s estimate).
- Why it was important to them to define this?
- Ryan explains, “Part of the value proposition of Agile & Scrum is that we own quality assurance within the development team. We are building brand new things each Sprint and we may not get it right the first time. When a defect is discovered while testing, we can address it immediately. With this common definition, we know as a team that defects are an expected part of the process and we know how and when to address them.”
As Ryan’s team explained their terminology to me, I observed two sub-types of ‘defect’.
One category of defects is where faulty code leads to undesirable behaviour. And please note: faulty code does not imply human fault; rather, human fallibility – making mistakes is often the best strategy for learning.
The second category of defects is where code handles some but not all possible conditions. In these cases, the code is not faulty; rather, it’s incomplete – a requirement was missed. Requirements are missed either by mistake or because of the limits of current knowledge/information.
‘Escaped Defect’ – In Depth
- Ryan’s team’s definition?
- A ‘defect’ which escaped notice until after the team had thought it was ‘done’ (e.g. discovered after the Sprint or discovered in production).
- How they handle it?
- Analyze it! (Just enough to Prioritize It!)
- Label as ‘Escaped Defect’.
- Perform root cause analysis.
- Address root cause through quality, communication, technology, etc.
- Why it was important to define this?
- Ryan explains, “These types of flaws are nasty. Ideally, we want to catch these flaws within the Sprint but for some reason we missed it. When flaws escape notice during development and testing, we may have problems not only with our quality but with our workflow. Escaped defects are usually found by end users when we miss a test or when we inadvertently break something (like a regression).”
When defects escape notice, it’s often evidence of a knowledge gap or a procedural flaw. These quality problems are excellent seeds for retrospective discussions. Why was this missed? What does this tell us about our process – what might we have done differently to have caught this? Does this reveal a skills deficit in our team? Have we learned something that will help us catch similar issues in future?
‘Technical Debt’ – In Depth
- Ryan’s team’s definition?
- A choice to implement something we believe to be satisfactory in the near term that we know will require rework. (e.g. A loan to buy a house is a debt we accept consciously, knowing it will need to be repaid.)
- How they handle it?
- Negotiate/collaborate with Product Owner.
- Report ‘code health’ to stakeholders.
- “Leave the code better than you found it.”
- Why it was important to them to define this?
- Ryan explains, “Our stakeholders needed to understand that some deficiencies in our code are deliberate, strategic, and necessary to produce timely business outcomes; and, other deficiencies are the result of stakeholder pressure to go (artificially) fast.”
Unlike other quality problems, stakeholders play an important role in the creation and/or repayment of technical debt. Accountability is expropriated if technical debt is called ‘bugs’ or ‘defects’.
When I met Ryan, I observed that his team was using the term ‘technical debt’ almost interchangeably with ‘bug’ and ‘defect’. It seemed to me the word ‘bug’ had caused unfortunate expectations among stakeholders who, reasonably, were questioning the quality of the software. The team, I feel, began using the term ‘technical debt’ as an attempt to reset those expectations. As if to say to their stakeholders, “we all know there are quality problems, we all had a hand in this, and we must commit to fixing things”.
The term, ‘technical debt’, has utility. It serves well to help the team buy some time to fix/rework things. But like many, Ryan's stakeholders still didn’t understand how the term differs from ‘bug’, ‘defect’, ‘glitch’, or others.
The history of the phrase can help us understand it better. The term ‘technical debt’ is widely attributed to Ward Cunningham, facilitator of one of the internet’s oldest Wikis (where you will find a page on the topic of Technical Debt). Ward explains, “shipping first-time code is like going into debt. A little debt speeds development so long as it is paid back promptly with refactoring.”
Notice that his statement acknowledges that benefit can be realized early by taking on small amounts of debt. Real-life examples of this are everywhere: a small loan to buy a car; shipping ‘beta’ software to attract early adopters; staying up late to watch a movie. In each case, some amount of benefit is borrowed from the future, and until the loan is repaid or a good night’s sleep is had, there is interest to pay.
Ward warned, however, “the danger occurs when the debt is not repaid. Every minute spent on code that is not quite right…counts as interest on that debt.” (An excellent essay about ‘Technical Debt’ including input from Ward Cunningham is online at AgileAlliance.org.)
So long as a team and their stakeholders are open with each other about the costs and benefits of their technical debt, stakeholders can understand that their teams are not building buggy code but are responsibly taking calculated risks. Why have we taken on this debt? Do we know the interest rate? Are we borrowing responsibly?
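For readers who think in code, the four terms can be written as a small decision rule. The Python below is only a rough sketch under stated assumptions, not something Ryan’s team built: the names `FlawKind`, `Flaw`, `classify`, and `HANDLING` are hypothetical, and the order of the checks is one possible reading of the definitions above. The handling steps are abbreviated from the lists in this article.

```python
from dataclasses import dataclass
from enum import Enum


class FlawKind(Enum):
    DEFECT = "defect"
    ESCAPED_DEFECT = "escaped defect"
    TECHNICAL_DEBT = "technical debt"
    BUG = "bug"


@dataclass(frozen=True)
class Flaw:
    """Hypothetical record of a reported flaw (illustrative fields only)."""
    summary: str
    deliberate_tradeoff: bool   # team knowingly chose a near-term solution
    found_within_sprint: bool   # found mid-Sprint, before the item was 'done'
    traceable_to_code_or_requirements: bool  # faulty or incomplete code


def classify(flaw: Flaw) -> FlawKind:
    """Apply the definitions in order; 'bug' is whatever is left over."""
    if flaw.deliberate_tradeoff:
        return FlawKind.TECHNICAL_DEBT
    if flaw.traceable_to_code_or_requirements:
        if flaw.found_within_sprint:
            return FlawKind.DEFECT
        return FlawKind.ESCAPED_DEFECT
    # Assumption: a flaw that is neither debt nor a defect is a 'bug'.
    return FlawKind.BUG


# Handling steps, abbreviated from the lists in this article.
HANDLING = {
    FlawKind.DEFECT: ["Fix it immediately", "Minimum documentation", "No sizing"],
    FlawKind.ESCAPED_DEFECT: ["Analyze it", "Label as 'Escaped Defect'",
                              "Perform root cause analysis", "Address root cause"],
    FlawKind.TECHNICAL_DEBT: ["Negotiate with Product Owner",
                              "Report 'code health' to stakeholders",
                              "Leave the code better than you found it"],
    FlawKind.BUG: ["Study it immediately", "Document the undesirable behavior",
                   "Add to Product Backlog and prioritize", "Size it or timebox a Spike"],
}

if __name__ == "__main__":
    # A made-up report, used only to show the flow.
    report = Flaw("Regression found by an end user after release",
                  deliberate_tradeoff=False,
                  found_within_sprint=False,
                  traceable_to_code_or_requirements=True)
    kind = classify(report)
    print(kind.value, "->", "; ".join(HANDLING[kind]))
```

The point of the sketch is the same as the poster on the wall: the questions are asked in a fixed order, so two people looking at the same flaw should arrive at the same label. A real team would likely need more nuance than three yes/no fields.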
Me: Ryan, how’s the team doing now?
Ryan: It is amazing, the power of simple definitions. Our team now speaks more carefully when it comes to flaws. We have these definitions up on the wall, poster format, and we reference them when we get stuck or when misunderstandings occur. We have also found the “Technical Debt Quadrants” model published by Martin Fowler helpful – the model helps us negotiate with our stakeholders and minimize reckless debt loads:
Martin Fowler’s Tech Debt Quadrants. (See original blog article.)
Me: And beyond the team, have you observed any change in the stakeholder community?
Ryan: Yes. We now know what our stakeholders want to see in terms of reporting, and we are speaking the same language. Having our definitions displayed in poster format also helps whenever we get a bit lost in our discussions.
Me: Last question, can you share any advice with the readers?
Ryan: Do you find yourself challenged by ambiguous terms? Consider naming and taming the problems so everyone is speaking the same language.
Unclear language leads to incomplete transparency and ambiguity. The careful use of words like ‘bug’ will improve decision-making in your organization.
While I was a Product Owner with Orium (formerly Myplanet) in Toronto, we discovered that each of these terms [‘bug’, ‘escaped defect’, ‘technical debt’] could be used to clarify the balance of accountability and liability between us and our clients. Our team adopted a policy of “zero defects”, which meant we promised our clients we would never deliver defective software. While this may sound insane to many, this promise to our clients helped us then develop explicit policies with respect to escaped defects, technical debt, and bugs.
An escaped defect would be a breach of our “zero defects” policy; and so, we would remedy the situation at no cost to the client. No questions asked; no further negotiation required. “Our mistake, we’ll correct it as soon as possible.”
Technical debt was different. We did our best to be transparent with our client, both with respect to code quality and their influence. “[Client], you're pushing pretty hard toward a deadline; we’ve been clear with you we cannot release [x features] by that date without cutting either scope or quality. We advise you to cut scope or move the date; but, the team has creatively devised a way to derive the benefits of [x] early; so long as you know there’s a hidden tax that must be paid asap. We can go artificially fast for a while, but we’ll have to go slow later to backfill the gaps in quality.” In other words, we were clear with the client that they were borrowing from their future and they were accountable to pay the relevant interest.
Bugs were extremely rare, though we certainly found a few. (Or they found us.) Our clients learned that we would pay all costs to correct our escaped defects; so, when we would explain that a particular problem was not a defect (and the sort of problem no human could have avoided) they trusted our assessment and would understand that solving the problem (if desirable) was their cost to bear. “This is truly a bug. It’s nobody’s fault. Complex work is unpredictable and we are tackling problems nobody has faced before in all of human history. If solving this problem has business value, then we’re ready to begin as soon as you make a commitment to finance the work.”
When we applied discipline (and policy, in the examples above) to these terms, we achieved more clarity and clearer accountabilities between the teams and their stakeholders.
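Put another way, the three policies amount to a lookup from term to accountable party. Here is a minimal sketch in Python; the names `Term`, `Payer`, `COST_POLICY`, and `who_pays` are illustrative assumptions, and the rationale strings paraphrase the policies described above.

```python
from enum import Enum


class Term(Enum):
    ESCAPED_DEFECT = "escaped defect"
    TECHNICAL_DEBT = "technical debt"
    BUG = "bug"


class Payer(Enum):
    TEAM = "team"
    CLIENT = "client"


# Hypothetical policy table: who bears the cost of the fix, and why.
COST_POLICY = {
    Term.ESCAPED_DEFECT: (Payer.TEAM,
                          "Breach of 'zero defects'; remedied at no cost to the client."),
    Term.TECHNICAL_DEBT: (Payer.CLIENT,
                          "Client chose to borrow from the future and pays the interest."),
    Term.BUG: (Payer.CLIENT,
               "Nobody's fault; fixed only if the client finances the work."),
}


def who_pays(term: Term) -> Payer:
    """Return the party accountable for the cost of addressing the flaw."""
    payer, _rationale = COST_POLICY[term]
    return payer


if __name__ == "__main__":
    for term in Term:
        payer, rationale = COST_POLICY[term]
        print(f"{term.value}: {payer.value} pays. {rationale}")
```

Writing the policy down this plainly is what made the conversations easy: once the label was agreed, the question of who pays was already answered.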
This article also appears at Scrum.org.