How I Became a Test Coverage Believer

Back when I was in university in the early 2000s, I took a software engineering class. This class taught us all the latest essentials of building software: OOP design patterns, extreme programming, UML diagrams, and, of course, adding test coverage. The class culminated in a big group project that took half the semester, theoretically simulating what it’s like to work on a large project in a company. We were required to write a bunch of JUnit tests as part of the assignment, and I remember thinking that it was the stupidest thing I’d ever heard. I’m already writing code and I can see that it works, so why do I need to waste a bunch of time writing more code to test that code? Is this just some dumb thing corporations make you do because bureaucracy? I have better things to do, I thought, and proceeded to not write any tests until the day before the assignment was due, at which point I filled in a bunch of JUnit tests with nonsense just so that I wouldn’t get points taken off.

Flash forward a few years to my first job, where I discovered pretty quickly why having tests is nice. The codebase I was working on didn’t have many tests, and it broke constantly. At least once a week we’d push out some code, watch our inbox flood with error messages from users hitting some broken codepath we hadn’t noticed in code review, and then frantically roll back the deploy. Our users essentially became our automated tests. Releasing code like this was a bit stressful (but exciting!), and it was embarrassing when something broke, but usually we’d get the bugs fixed within a few hours and then push out the fixed build and things would be fine. It felt bad for users to see broken pages for a few minutes, but it wasn’t the end of the world, and nobody died, and, anyway, move fast and break things, right? No tests == moving even faster!

Then, one Friday afternoon at 5pm we pushed out a big refactor (as is best practice on Fridays at 5pm) and discovered that sometimes there’s broken code that you can’t just roll back, fix, and push out again, followed by everyone high-fiving in slow motion. Sometimes 30 minutes of broken code being live means sending out thousands of apology emails and canceling your Friday plans to restore corrupted rows in your database while angry users scream at you in the help center.

Basically, we had accidentally inverted the condition on an if-statement during the refactor, so all the code that was supposed to run on sign-up was instead running on every action except sign-up. That meant a bunch of fields in the DB were reset to their default values, and welcome emails were sent out. As this happened on every action, some users received hundreds of welcome emails within a few minutes despite not actually being new users. A bunch of users freaked out and thought their account had been hacked, because why else were they getting all these emails thanking them for signing up?

None of this threw any errors. It was all perfectly valid code - it just had some flipped logic. As a result, it took us a while before we even realized anything was wrong. By the time confused users writing in tipped us off, it was too late. We spent the rest of the night with the site down, frantically trying to figure out how to undo all the damage we’d done. The worst part was the realization that even an extremely basic test on the sign-up flow would have caught this bug and saved us and our users a lot of pain.
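To make that concrete, here is a rough sketch of what the bug and the test that would have caught it might look like. This is not the original code: `FakeApp`, `handle_action`, `handle_action_buggy`, and `sign_up_flow_is_correct` are all made-up names for illustration, and the real app obviously did a lot more than append to lists.

```python
from dataclasses import dataclass, field


@dataclass
class FakeApp:
    """Hypothetical stand-in for the real app; it only records side effects."""
    welcome_emails: list = field(default_factory=list)
    profile_resets: list = field(default_factory=list)


def handle_action(app, user, action):
    """Intended behavior: sign-up side effects run only on sign-up."""
    if action == "sign_up":
        app.profile_resets.append(user)   # set fields to their defaults
        app.welcome_emails.append(user)   # send the welcome email


def handle_action_buggy(app, user, action):
    """Same code with the condition flipped, as in the refactor."""
    if action != "sign_up":
        app.profile_resets.append(user)
        app.welcome_emails.append(user)


def sign_up_flow_is_correct(handler):
    """A very basic check: one sign-up, one ordinary action, then inspect."""
    app = FakeApp()
    handler(app, "alice", "sign_up")
    handler(app, "bob", "view_page")
    return app.welcome_emails == ["alice"] and app.profile_resets == ["alice"]
```

Both versions run without raising anything, which is exactly why nothing looked wrong at deploy time. The check passes for `handle_action` and fails for `handle_action_buggy`, and that failure is all a test suite would have needed to stop the push.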

At this point, I thought back to that software engineering class in university. Oh, yeah, maybe tests are kind of nice after all. Over the next 6 months or so I made it a priority to get decent test coverage on the codebase. It was painful at first, but after writing the first few tests it got easier and easier as we gradually built up a set of testing helpers and examples. Soon we started asking for tests before merging pull requests. Before long, an amazing thing happened - broken deploys that needed to be rolled back gradually disappeared altogether. If tests passed, we could be confident that the deploy didn’t break anything.

In the end, test coverage allowed us to actually move faster, because we didn’t need to live in constant terror that we’d broken something. It made me realize that I’d been coding in fear before, always tiptoeing through code changes to make sure there wasn’t some Indiana Jones booby trap lying behind every line of code. Adding test coverage made it so that I could sprint through code changes and know that the tests had my back. I didn’t need to spend lots of time manually testing parts of the app to make sure they still worked, then inevitably forgetting something, breaking the app in production, shamefully rolling back, fixing it, pushing it out again, and apologizing to everyone. Now, when I work on a codebase that doesn’t have decent test coverage I find it really scary. I don’t want to go back to those dark days where every code push feels like a game of Russian roulette.

I don’t believe that you necessarily need 100% test coverage of every line of code in your app, or that you need to practice TDD religiously, or that you should have unit tests that mock every possible thing and test every possible input. But you should have tests on all the main functionality of your app at a level that makes sense. If unit tests make the most sense to test a new change, go for it. If it’s easier to test as a functional test that sets up some data in a database, wonderful. If you’d rather write the test after the code, that’s fine too. But everything should be tested somehow. And if you fix a bug, add a test to make sure it doesn’t come back. Tests also help code reviewers feel confident that the code does what it’s supposed to. If you’re currently working without tests and feel like they’re a waste of time - trust me, they’re not. And once you get in the habit of writing tests with your code you’ll find that you don’t want to go back.
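As one example of the "functional test that sets up some data in a database" flavor, here is a minimal sketch using Python's built-in `sqlite3` with an in-memory database. The `users` table, its columns, and the `record_action` function are assumptions invented for this example, not anything from a real codebase. It doubles as the kind of regression test you might add after fixing a bug like the sign-up one: it pins down that ordinary actions never reset a user's settings.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "name TEXT PRIMARY KEY, theme TEXT NOT NULL, actions INTEGER NOT NULL)"
    )
    return conn


def record_action(conn, name, action):
    """Hypothetical code under test: defaults are written only on sign-up."""
    if action == "sign_up":
        conn.execute(
            "INSERT INTO users (name, theme, actions) VALUES (?, 'default', 0)",
            (name,),
        )
    else:
        conn.execute(
            "UPDATE users SET actions = actions + 1 WHERE name = ?", (name,)
        )


def test_later_actions_keep_user_settings():
    conn = make_db()
    record_action(conn, "alice", "sign_up")
    # Simulate the user customizing a setting after signing up.
    conn.execute("UPDATE users SET theme = 'dark' WHERE name = 'alice'")
    record_action(conn, "alice", "view_page")
    theme, actions = conn.execute(
        "SELECT theme, actions FROM users WHERE name = 'alice'"
    ).fetchone()
    assert theme == "dark"   # the later action must not reset the setting
    assert actions == 1      # but it should still be recorded
```

A test runner such as pytest would pick up the `test_` function automatically, but calling it directly works too. Whether this lives as a unit test or a functional test matters less than the fact that it exists.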

Happy testing!