Practicality Beats Purity - Intro and Test Pass Rates Topic

2018-02-17 engineering testing Cristian Medina

A few hours later, I find myself sitting in the “comforts” of my cubicle, the discussion replaying over and over in my head. “An interface with this behavior will integrate with most common language libraries, with no special client code,” I said. The response was: “But then it’s not a design, and the company already decided that’s the route we’re taking.”

I’ve spent many years of my career involved in buzzword dogma discussions. That dogma is present at all levels of software development, from basic principles to scheduling, implementation, interfaces, tests, execution, the infrastructure that runs it all, and release mechanisms. Most of the time, people lose track of why or what they are building in favor of claiming they are using some common buzzword, regardless of the effects on architecture, ease of use, customer experience or maintenance costs. In my experience, they often don’t even know why the buzzword technology does things a certain way or why someone chose it in the first place. A factual or data-based counterargument results in an almost “religious” discussion and even shaming.

Given today’s ease of communication and the ability to share our experiences, it’s great that we try to educate other folks on the problems we typically face throughout our lives and careers, and especially on the principles used in managing their solutions.

Unfortunately, some of those principles often become commercialized. This incentivizes sharing them as absolutes, even though they are not: “if you do X then Y will always / never happen”. In reality, every situation is different and we can’t blindly follow these patterns. You have to consider their assumptions, environment, end goals, restrictions, etc. This is the basis of engineering.

Engineering is the application of scientific principles and practical knowledge to invent, innovate, design, build, maintain, research, and improve structures, tools, processes and technologies, while still satisfying a set of time and resource constraints.

I find that the Zen of Python sums that up pretty well in one sentence: “Practicality beats purity”.

A great example is in Raymond Hettinger’s PyCon 2017 talk on the history of dictionary optimizations. During implementation, many argued that the changes broke with floating point number specifications. But Raymond makes the point that it can be ok to give up what affects a small number of cases, in favor of large improvements for the majority. This varies between codebases and goals, but it’s a real sticking point of mine. Put together the solution that best solves your problems, not the one that solved someone else’s problems.

I’m putting together a series of articles, each with a different focus, to discuss tradeoffs when making technology and design choices. This will span a few months of posts, so I’m open to suggestions on any specific topics you want covered.

Pass Rates and Test Execution

As I’ve bounced around many organizations in my career, some in completely different technology fields, this theme seems to keep popping up. There’s always an “official” goal written down, and referenced by management, that says our testing is complete once we reach a 90%+ pass rate. Having spent much time in some form of a test organization, I’ve never quite understood where this number comes from.

I don’t particularly have a problem with the number. At first glance, it’s actually not a bad idea: you’re saying that we don’t need to be perfect, but we want to strive for an excellent outcome. However, as any statistician (or politician for that matter) will tell you, percentages can be deceiving. Let’s look into this a little bit with some examples.

When I was in college I took a course on Engineering Statistics. In it, I learned quite a few interesting things about the application of statistics in real-life situations. It wasn’t just about calculating the net present value of investment dollars, but went further into the areas of mean time to failure and warranty costs. Pairing that knowledge with a decent class in engineering ethics, you realize that even if there’s only a 0.000000001% chance of a system failure, when that failure is in the seal of a fuel tank in a rocket engine, you will wind up with a horrible disaster on your hands.

The point is that test execution results need qualifiers to describe the types of failures, the systems in which they occurred, the sample size (how many times the tests were executed), plus some analysis that identifies all the possible subsystems that may be responsible.
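One way to picture those qualifiers is as fields on the result record itself, so a failure never travels without its context. The sketch below is only an illustration: the class name, field names and sample values are all made up for this example and are not taken from any particular test framework.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QualifiedResult:
    """A test result that carries its qualifiers (hypothetical structure)."""
    test_name: str
    system: str            # where the test ran, e.g. an OS or hardware model
    runs: int              # sample size: how many times the test was executed
    failures: int          # how many of those runs failed
    failure_type: str = ""  # e.g. "install", "timeout", "data loss"
    suspect_subsystems: List[str] = field(default_factory=list)

    @property
    def failure_rate(self) -> float:
        # Guard against an empty sample instead of dividing by zero.
        return self.failures / self.runs if self.runs else 0.0


def describe(result: QualifiedResult) -> str:
    """Render one result with its qualifiers instead of a bare pass/fail."""
    if result.failures == 0:
        return f"{result.test_name} on {result.system}: passed {result.runs}/{result.runs} runs"
    suspects = ", ".join(result.suspect_subsystems) or "unknown"
    return (
        f"{result.test_name} on {result.system}: "
        f"{result.failures}/{result.runs} runs failed ({result.failure_type}); "
        f"suspects: {suspects}"
    )


# Illustrative values only.
example = QualifiedResult(
    test_name="install",
    system="os_c",
    runs=3,
    failures=3,
    failure_type="install",
    suspect_subsystems=["installer", "package signing"],
)
print(describe(example))
```

A report built from records like this can still be rolled up into a single percentage, but the detail needed to question that percentage stays attached to it.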

It’s easy to write a test plan that executes 1000 individual test cases, all of which verify a particular software’s functionality, including its graphical user interface and its extended functions. Now, let’s say that only 1 of those cases tests the installation of the application, and it’s repeated 3 times, once on each of 3 different operating systems. If one of those installation runs fails, what do the metrics say? Ran 1000 tests, 1 failed, that’s a 99.9% pass rate. Woot! We’re done! Ship it! Oh wait, but it turns out that we cannot install it on 1 out of 3 operating systems, so installation will fail 33% of the time? Maybe that’s not too bad, I mean it works in the majority of cases, right? Let’s find out more: How many customers are using that OS? Oh, 80% of our customers use that OS! We just delivered a lemon that works great in 20% of our customer installs with 99.9% of tests passing! The other 80% of our customers can’t even install it. In this particular case, qualifying the results with the subsystems verified by your test plan would’ve generated enough doubt to warrant further investigation that may have changed the decision to ship.
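The arithmetic in that scenario can be spelled out in a few lines of Python. This is a rough sketch under stated assumptions: the run is modeled as 997 functional tests plus 3 installation runs to reach the 1000 total, the OS names are placeholders, and the 10% / 10% split between the two working operating systems is invented (the scenario only says that 80% of customers are on the failing one).

```python
from collections import defaultdict

# Each result is (subsystem, operating system, passed).
# Assumption: 997 functional tests + 3 install runs = 1000 tests.
results = [("functional", "any", True)] * 997
results += [
    ("install", "os_a", True),
    ("install", "os_b", True),
    ("install", "os_c", False),  # the single failing test
]

# Assumed customer share per OS; only the 80% figure comes from the scenario.
customer_share = {"os_a": 0.10, "os_b": 0.10, "os_c": 0.80}


def pass_rate(rows):
    """Fraction of rows whose 'passed' flag is True."""
    rows = list(rows)
    return sum(1 for _, _, ok in rows if ok) / len(rows)


# The headline number: everything lumped together.
overall = pass_rate(results)

# The same data, qualified by subsystem.
by_subsystem = defaultdict(list)
for row in results:
    by_subsystem[row[0]].append(row)
subsystem_rates = {name: pass_rate(rows) for name, rows in by_subsystem.items()}

# Share of customers on an OS where installation passed.
installable_share = sum(
    customer_share[os_name]
    for subsystem, os_name, ok in results
    if subsystem == "install" and ok
)

print(f"overall pass rate:       {overall:.1%}")                     # 99.9%
print(f"install pass rate:       {subsystem_rates['install']:.1%}")  # 66.7%
print(f"customers who can install: {installable_share:.0%}")          # 20%
```

The same 1000 results produce all three numbers; the only difference is which qualifier the results are grouped or weighted by.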

Another interesting example along the same lines is bad path testing, or error injection as I’ve come to call it. What could fail? What happens when something fails? How do you recover? What are the most common failures? Usually, people completely forget to do this type of analysis on large systems, and when they do remember, these cases make up such a small percentage of the total number of possible tests that they are lost in the crowd. It’s hard for folks to wrap their heads around it, until you’re on the phone with a customer who runs into a problem 100% of the time that there’s a failure. Now what?
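As a small illustration of what an error injection test can look like, the sketch below uses Python’s standard unittest.mock to force a dependency to fail and then checks the recovery path. The function fetch_config and its fallback behavior are hypothetical, invented only to have something to test; a real system would have its own failure modes and recovery rules.

```python
from unittest import mock


def fetch_config(read_remote, default):
    """Return the remote config, falling back to a default if the read fails.

    Hypothetical function used only to demonstrate error injection.
    """
    try:
        return read_remote()
    except ConnectionError:
        return default


def test_happy_path():
    # Good path: the dependency works and its value is returned.
    working_read = mock.Mock(return_value={"mode": "live"})
    assert fetch_config(working_read, default={"mode": "safe"}) == {"mode": "live"}


def test_fallback_on_connection_error():
    # Bad path: inject a failure into the dependency and verify recovery.
    failing_read = mock.Mock(side_effect=ConnectionError("injected failure"))
    result = fetch_config(failing_read, default={"mode": "safe"})
    assert result == {"mode": "safe"}
    failing_read.assert_called_once()


if __name__ == "__main__":
    test_happy_path()
    test_fallback_on_connection_error()
    print("both paths verified")
```

In a suite of a thousand tests, the second test is still just one line in the pass rate, which is why it helps to report bad path results as their own category.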

Summarizing

Generalizations and abstractions, while practical, cannot provide us with much information to make decisions unless we base them on sound engineering. Don’t blindly follow the herd because you read in some book or online blog somewhere (mine included) that this is best practice. Always be suspicious of philosophies with principles written in “absolutes”; they usually have a set of prerequisites that you will rarely meet.

As an engineer, you own the solutions. Treat these design patterns and techniques as systems that aid you in your analysis. It’s not acceptable to “blame” the methodology for your problems; it’s on you to adapt it to your particular situation.
