Dangers of the Habit of Ignoring Developer Error Messages

Being aware of and mitigating the risks of Habituation in System Maintenance and Design

Jefferey Cave
15 min read · Mar 14, 2023

I am going to give several examples from my career demonstrating the dangers of error-message fatigue and habituation leading us to ignore vital signals. They will also show how easily humans fall prey to habituation. Finally, I will conclude with specific techniques and modern tools that can reduce how often it happens.

I have three cases rolling around in my head, one from 2004, one from 2012, and one from about three months ago; so it seems I’m doomed to relearn this lesson about once a decade.

As the newly promoted Principal Developer at my first company, I inherited a product that was successful but suffering from growing pains. The previous Lead had been a graphic designer and had done a wonderful job of building a usable and popular interface, but some of the more engineering-minded aspects had been forgotten along the way. This was reaching the point where it was hampering product growth.

I brought a new perspective focused on quality and reproducibility, turning the tool from a “website” into an administrative “product”. This change in focus meant I had years of engineering neglect to catch up on.

A few months later, it wasn’t a surprise to me when the owner of the company stormed angrily into the development department and started to rant about a serious defect where data was being dropped. It was a real edge case, and the scenario he was talking about was something I had seen myself, but only rarely and never reproducibly. Focused on known issues, and on bringing the system into a state capable of expanding, I had easily written it off as a ghost in the machine, thrown it on the bug list, and pushed it way down. Unfortunately, a customer had now seen it and could reproduce the issue reliably enough that it had become an embarrassment to the owner, so it went to the top of the priority list and became the focus of my exploration.

44 thousand messages. I wonder if one is relevant (Original Content)

While I don’t remember the exact defect, what I do remember was following it into the web server’s logs (Windows Event Viewer). Previously, my focus had been on the system’s external user behaviour, but here in the logs were thousands upon thousands of lines of warnings … every minute. Warnings about uninitialised variables, unsafe typecasts, and … well, a little bit of everything. I had never given it much thought because it was so overwhelming as to be meaningless, but when I managed to isolate an interaction with the form in question and filter the logs down to just that time frame, I was able to consistently see one warning that seemed related.

I’d found my needle in the haystack, but it wasn’t an error; it was just a warning among the thousands of warnings I had been ignoring. As I dug through the logs, I could see this message appeared regularly when that form was accessed (not always, but regularly), and it went back to the very founding of the system (before I had even started my programming education; I was still a nurse). This warning had been reported for almost a decade, and nobody had seen it for two reasons:

  1. it was a warning, not an error
  2. the signal had been lost in a sea of noise

I learned a valuable lesson that day. “No Warnings, No Errors” became my mantra.

Naturally, I first fixed the issue that had been spotted, but this defect had been signalled by a warning in the system logs all along; had someone addressed that warning, the defect would have been fixed almost a decade earlier. So I started to address all the warnings in the logs.

Most were relatively benign, identifying (perhaps) that a variable had not been explicitly initialised before use; since null was treated as a zero or an empty string, this didn’t affect the behaviour of the system. But as I made minor corrections and the log volume shrank, some of the remaining warnings started to take on more ominous tones. More system defects were identified (and corrected), and, most significantly to me, an actual error started to present regularly … one that had been missed in the sheer volume of the logs.

So what was the core of the lesson?

Seeing a large volume of errors can make us insensitive to them. When we ignore significant messages, we train ourselves not to pay attention, and that’s when bad things happen.

(Wikimedia, CC)

The term “habituation” is used in several related senses in medical, social, and psychological contexts, but the general summary is a diminishing response to a stimulus due to repeated (habitual) exposure to it [Wikipedia].

We see this all around us in our day-to-day lives. People get habituated to being yelled at by a peer, becoming numb to the exposure. Physically, a carpenter may become desensitised to getting slivers, simply pulling them out at the end of the day rather than flinching at each one. I like a hot shower, but it usually takes a moment for my skin to get used to the hot water. It has even been documented in plants: Mimosa pudica is known to “flinch” when touched, but stops flinching with repeated touching.

This helps us get on with life.

Flinching is an important reflexive reaction that protects us from bad things happening. Stubbing your toe should produce an immediate “protect your toe” response; an unexpected cut on your hand is dangerous; I should jerk my hand back from scalding water. But sometimes the cut is minor and expected (slivers), and for the most part just part of the job … life has to go on. A hot shower is a big temperature change, but it isn’t harmful and is rather pleasant once I get used to it. Habituation allows us to maintain our high-alert state while learning to moderate it under various conditions.

Humans are biologically cued to become habituated. It is part of our survival strategy as a species. It’s built into you.

Therefore, we cannot ignore the risk habituation poses to our systems.

Regularly receiving an error signal (error messages, warnings, failing tests), evaluating it as “safe to ignore”, and taking no action psychologically prepares you to ignore the signal at a later date. It begins to habituate you to the error signal, placing it on the pile of things that can be ignored.

Ignore at your peril (Wikimedia, CC)

Common Examples of Programmer Error Habituation

At every organisation I have ever worked at, in every role I have filled, I have found examples of error habituation. They do not always present the same way, but they are pervasive throughout the industry, even presenting themselves as “Best Practices” to the untrained eye.

Errors and Warnings

This is the most obvious since it's right in the name, but that makes it a good place to start.

  • Compiler warnings
  • System Log Warnings
  • Pager notifications

These were my first exposure to the problem. We learn through practice at school that compiler errors prevent us from submitting our assignments, but warnings do not. In the short, intense bursts of student life, ignoring warnings becomes a habitual survival strategy. As mature professionals, we learn that each of these messages was put in place to convey meaning to us and to offer us protection.
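One mechanical defence is to make warnings impossible to ignore by letting the build fail on them. Here is a minimal sketch, assuming a plain javac toolchain (Maven and Gradle expose equivalent compiler settings); the class is invented for illustration. With -Xlint:all enabling all lint checks and -Werror promoting warnings to errors, the kind of noise described above becomes a hard stop:

// Compile with: javac -Xlint:all -Werror RawListExample.java
// Under -Werror, the warnings below fail the build instead of scrolling past.
import java.util.ArrayList;
import java.util.List;

public class RawListExample {
    public static void main(String[] args) {
        List names = new ArrayList();  // warning: [rawtypes] raw use of List
        names.add("hello");            // warning: [unchecked] unchecked call
        System.out.println(names.get(0));
    }
}

The specific flags matter less than the policy: the toolchain itself refuses to let a warning become background noise.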

Known Software Defects

Defects in software are inevitable; discovering and correcting them is the art form. In the words of Robert Glass:

43. Maintenance is a solution, not a problem

Facts and Fallacies of Software Engineering

It is important to realise that defects are to be expected and must be worked on. At the same time, there is always more work than time, so some form of prioritisation is necessary. This means we must ignore some defects for a while (even if only for the time it takes to fix them).

The problem is that the more defects we acknowledge are present, the more we tend to ignore them as irrelevant. The more we defer fixing bugs, the more we get into the habit of deferring bug fixes.

TODO and Change Comments

TODO comments within code are a way to identify an item that needs to be addressed: a sort of “come back to this later”.

There is a strong likelihood that we are ignoring the problem because we are busy with something else. Certainly, it is not possible to split ourselves in two to address both problems simultaneously, so making a note of the secondary problem while we address the primary one just makes sense.

The problem arises when we don’t come back to it.

TODO notes accumulating through the code can become excessive noise, to the point that we start to ignore the message. Further, as these are usually listed alongside the warnings and errors, they become noise that drowns out more important signals.

Do not become habituated to seeing useless comments.

A header in an individual file containing a list of every change ever made to that file is a common pattern that has become an anti-pattern. The purpose of these comments is to offer a log of the changes made to the code, in their contextual place.

Unfortunately, this (once good) habit started decades ago, under coding styles that were different. The pattern is based on the assumption that a single file is self-contained with respect to all its changes and does not interact with other entities (an assumption since recognised as faulty). There is also the problem that decades of messages accumulating at the start of the file create an impenetrable WALL-OF-TEXT that must be scrolled past before anything meaningful can begin. This immediate “scroll past” habituates us to perceive large blocks of comments as meaningless, when in fact we should consider a large explanation in the code to be something important and meaningful.

What started as a good idea on small files, over short time-frames, has evolved into a bad idea with better alternatives.

Mimosa Pudica (Wikimedia, Public Domain)

Relearning the Lesson (Twice)

A decade later, I found myself on contract with a major corporation that had terminated its previous contracting company over poor quality. My team had been hired not only to deliver, but to do so with an eye to quality.

On my first day, opening the regression test suite, I naturally glanced at the warning list to see how many warnings were reported against the code. I immediately found myself staring at a list of hundreds of warnings, and also thousands of TODO messages. Naturally, I tried to ignore them … they were things that needed to be done in the future, not immediately … but as I cleared the significant backlog of “warnings”, I started to come across the actual locations of the TODOs.

It was horrifying.

@Test
public void ReallyImportantThing() {
    //TODO: implement this
    Assert.assertTrue(true);
}

In many (most) of the cases, the note suggested a person should implement the test for real. (There is a similar story, told I think by Spolsky, of a notorious function implementation in MS Office … the same thing.)

In my case, I suspect the previous team, under pressure to perform and deliver, had been masking gaps for a long time. Many of the regression tests were simple stubs that reported success no matter what. This allowed them to claim the job was done, while promising themselves they would fix it … later … when they had time. That time never came.

It was a hard conversation with the client to explain that I was taking a week to re-evaluate how much testing was actually being performed. When I reduced their test count by more than half, I had to remind them that they had hired us specifically because they knew there had been a problem, and that identifying those problems and giving honest assessments was where our value came from.

TODOs got added to my list of things that were not permitted in code bases I was involved in.

No Warnings, No Errors

Another decade has passed, and recently (weeks ago) I had to catch myself again.

On a new system we are working on, I implemented a basic continuous-monitoring alert system. It periodically scans the system for invalid states and immediately notifies the team of the bad state (OK, it notifies me and one other person, and we notify the larger team … baby steps). The idea is that if an alert is issued, we must act to save the system.

I ignored a message.

In this case, the alert was to notify us that we had stopped receiving signals from a remote source, and I had ignored it. The source is a batch process, and it is not uncommon for it to take longer than anticipated; this usually isn’t a big deal, since it delivers shortly after we check and we just pick it up on the next pass.

Except it is a big deal because I ignored it.

My colleague, just back from vacation, called and asked if I had noticed that the system was erroring; she didn’t see a ticket and wondered if I was dealing with it. I told her it was “no big deal, that one fails regularly” … and as the words came out of my mouth, I heard what I had just said.

Sure enough, we looked closer, and the failure had occurred for three cycles; the source was not transmitting data and I, through habituation, had ignored the failure.

I hope I remember to come back and share my error habituation story in 2033.

See you in 2033 (Wikimedia, CC-0)

Preventing Habituation

There is really only one solution to error habituation: treat every error, warning, or notice immediately, and with the highest priority.

That said, there are some subtleties to how we achieve it:

  1. Always fix defects before implementing new features
  2. Never ignore a defect message

There are various tools, and various mentalities, at our disposal to help us address this.

(Wikimedia, Public Domain)

Errors that can be ignored

There is no such thing. When an error signal appears, either:

  1. the system is in an invalid state and needs to be fixed immediately, or
  2. the error notification system is flawed and needs to be fixed immediately

An example of an incorrect notification could be a log monitor that raises an alert when an invalid state is encountered; upon inspection, the state is determined to be merely “undesirable” (not “invalid”) … so we’ll ignore the error, it will correct itself later.

NO! Change the log monitor to take the new information into account. Maybe it needs to run less frequently; maybe it needs to count how long the error state persists (waiting before alerting). Whatever gave you reason to believe the alert could be ignored needs to be incorporated into the official rules for alerting.
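To illustrate what incorporating that knowledge might look like, here is a minimal sketch of a monitor rule that waits before alerting; the class name and threshold are hypothetical, not taken from the system described earlier:

// A sketch of “count how long the error state exists before alerting”.
public class FeedMonitor {
    // One late cycle is normal for this batch source; three is an outage.
    private static final int CYCLES_BEFORE_ALERT = 3;  // hypothetical threshold
    private int consecutiveMisses = 0;

    // Called once per scan cycle with the result of the feed check.
    public void recordScan(boolean feedArrived) {
        if (feedArrived) {
            consecutiveMisses = 0;  // healthy again, reset the counter
            return;
        }
        consecutiveMisses++;
        if (consecutiveMisses >= CYCLES_BEFORE_ALERT) {
            alert("Source silent for " + consecutiveMisses + " cycles");
        }
    }

    private void alert(String message) {
        // In a real system this would page the team and open a ticket.
        System.err.println("ALERT: " + message);
    }
}

The tolerated delay now lives in the rule itself, visible and reviewable, rather than in someone’s head.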

Failing Tests

As previously mentioned, you can’t be in two places working on two problems at the same time. One problem must be set aside while you focus on the other. Unfortunately, this leads to ignoring errors, which becomes habitual.

To avoid this, the first step is to create a task in your backlog immediately; this gives us a record that the issue exists. Second, we should write an automated test that gives us a way to reproduce the error. The problem is that this test will be in a failing state, constantly reporting an error to us. It is a failure signal we want to ignore until the defect is fixed, so we immediately mark it as a SKIP: an ignore status.

This is a problem.

We can resolve this by having a team rule that all SKIPped tests MUST have a ticket number associated with them, and by addressing every skip during every planning meeting. For me, this often takes the form of reporting skips without ticket numbers as failures (and failures must be addressed immediately), while skips with a ticket link directly to that ticket in the report.
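In TestNG, for example, the rule can be made concrete by carrying the ticket in the skip itself. A minimal sketch (the test and ticket number are placeholders): throwing SkipException marks the test as skipped in the report while keeping the reproduction code in place, and a custom reporter can treat any skip message lacking a ticket pattern as a failure:

import org.testng.SkipException;
import org.testng.annotations.Test;

public class InvoiceTests {
    @Test
    public void totalsIncludeRegionalTax() {
        // PROJ-1234 is a placeholder ticket reference; the skip stays
        // accountable because it names the backlog item tracking the fix.
        throw new SkipException("PROJ-1234: regional tax rounding defect");
    }
}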

Lastly, there MAY be a zero-defects policy within the team. This is an agreement with the business that defects will be fixed before new features are implemented.

🗒 NOTE

I have never been satisfied with the reports generated by testing systems, and have always written custom visualisations to track defects. This has had the side effect of my introducing concepts like adding extra states to TestNG’s default reporting (known, manual, feature), with active links to the repository and issue tracking software. I should really write an article showing my collection of testing reports … or show how to use various Project Management software reports (ADO, GitLab, GitHub).

A note to my boss … don’t worry, these always get worked on in my own time ;)

Comments Calling for Action

TODO comments are a classic way to mark something you need to come back to and finish off, and they still have their place, but fundamentally they are a call to ignore the problem (if only “for now”).

The problem here is the same as with all the others: we need a way to prevent “for now” from becoming “forever”.

One simple way of handling this is to put a version-control hook in your repository that prevents check-ins of TODO comments. Generally, you should put this only on protected branches: you can still write TODOs in your code while you work, but you are prevented from accidentally submitting them to the official branch, forcing you to finish the job you planned on doing. If you can’t get to a TODO for some reason, don’t leave it in the code; register it in the backlog as something that still needs doing. This keeps the alert list available for things like warnings and errors, so they don’t get hidden.
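As a sketch of the checking side, the hook only needs a scanner that finds the offending comments and exits non-zero. Real hooks are typically small shell scripts; the Java class below is an illustrative stand-in they could invoke against the files being committed:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TodoGate {
    // Exits 1 if any file named on the command line contains a TODO,
    // allowing the hook (or a CI job) to reject the check-in.
    public static void main(String[] args) throws IOException {
        boolean found = false;
        for (String name : args) {
            int lineNo = 0;
            for (String line : Files.readAllLines(Path.of(name))) {
                lineNo++;
                if (line.contains("TODO")) {
                    System.err.printf("%s:%d: unresolved TODO%n", name, lineNo);
                    found = true;
                }
            }
        }
        System.exit(found ? 1 : 0);
    }
}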

Those massive change headers at the beginning of the code only serve to mask issues. They get in the way of text searches and demand a lot of visual space to scroll past. And all of that is in service of an old paradigm: that changes are constrained to a single file.

Modern VCS tools assume that a change may need the context of several locations in the code to be meaningful, and they have logging built in. Keep your change history in the version-control database: it reduces visual noise by placing the changes in a contextual list that stays hidden until you need it, and when you do need it, the list is optimally indexed for what it is.

(Wikimedia, Public Domain)

Conclusion

As humans, we make mistakes. Each decision we make is a totally new decision, injecting an opportunity for error, and that opportunity can be compounded by biases introduced by our experience. Error habituation represents a biasing of our behaviour that we are biologically predisposed toward, and it can be dangerous to our work.

It is important, as professionals, that we work to overcome these dangerous biases through constant diligence and self-appraisal.

As software developers, our work captures decision-making in advance of the stimulus and the action, and defects can have catastrophic effects. Teaching ourselves to ignore benign errors can mask more catastrophic issues with significant effects on people’s lives.

Further Reading

I hope I’ve made the case that it is easy to teach ourselves to ignore errors because, at the end of the day, we are humans and humans are fallible. Addressing this is hard, but not new:

  • Facts and Fallacies of Software Engineering (Robert L. Glass) is a great read that opened my eyes to how common these issues are, and to how we all want things to be true even when they aren’t.
  • Downfall: The Case against Boeing (Netflix) discusses an important event in computing history. Remember that in 1969 software saved an aircraft with a bad attitude sensor, while in 2018 and 2019 software killed 346 people due to bad angle-of-attack sensors.
  • Any video on aircraft crash investigations (my wife is a fan). Observe how it takes multiple people ignoring warning signs over a long period of time for a problem to occur. Note how easily dangerous behaviour becomes habituated.

Consider reading the manuals of your favourite tool suite to get a better understanding of why the software was developed, how it is meant to help you, and how it can replace some practices you may have thought were a good idea:

  • Test Suites can help you identify errors methodically
  • Project Management tools can help prioritise and track outstanding issues
  • Version Control Systems can help you understand why historic changes were made, offering a significant amount of context when you need it. I actually recommend reading SVN’s manual, as it introduced a significant paradigm shift that needed explaining at the time (use Git, read about SVN).

… and always pay attention to your own emotions and biases… your own mistakes are always available for you to learn from.

(Wikimedia, CC-BY)


Jefferey Cave

I’m interested in the beauty of data and complex systems. I use storytelling to help others see that beauty. https://www.buymeacoffee.com/jeffereycave