Bugs In The Tests

Tests can have bugs too

[from CodeComplete, chapter 25 (quoted here on CompleteCoverageIsExpensive)...]

That's because people are really lax when writing tests, and tests are boring, and tests aren't normally elegant. Some test harnesses are so complicated, they need UnitTests for themselves. So, you can't even be sure your tests are correct, let alone whether your implementation is correct.

This is somewhat academic. In XP we use unit TestsAsScaffolding. This is different from trying to find all bugs, so you end up with a different set of tests.

There are several perspectives on this:
If the test prints an error message, but you find the application code is correct, then you fix the bug in the test.

That happens to me pretty often, but it doesn't bother me; I just fix the test code and go on. (I fixed a bug; it just wasn't in the place I'd expect it to be, if I thought my tests were perfect.) -- JeffGrigg

This must be taken into consideration too: Tests have to be simple!

McConnell, Steve. Code Complete. Redmond, WA: Microsoft Press. p.612 (1993); McConnell in turn cites the studies in

Weiland, Richard J. The Programmer's Craft: Program Construction, Computer Architecture, and Data Management. Reston, VA: Reston Publishing. (1983)

Jones, Capers. Programming Productivity. New York: McGraw-Hill. (1986)

I'm confused. Explain how a test can be wrong, yet not fail, if the code isn't wrong also. Do you perhaps mean that the tests are inadequate?

If there is an error in how the test is written, it may falsely report "all OK" when it should give an error message. Say, a misunderstanding of the interface suggests that a function returns non-zero on success when it really returns the number of characters transmitted. Thus, if you are trying to verify that the function "fails" when the socket is closed, the function may correctly report that 0 characters were sent without flagging the error (which may come through another interface). That's a constructed example, but the point is that this is neither logically impossible nor outside my experience.
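The constructed example above can be sketched in C. This is only an illustration under stated assumptions: FakeSocket, send_text, send_text_buggy and both test functions are hypothetical names invented here, not part of any real socket API.

```c
#include <assert.h>   /* for assertions when exercising the sketch */
#include <string.h>

/* Hypothetical socket stand-in; none of these names come from a real API. */
typedef struct {
    int is_open;     /* 1 while the connection is up */
    int last_error;  /* the "other interface" through which errors arrive */
} FakeSocket;

typedef int (*SendFn)(FakeSocket *, const char *);

/* Correct implementation: returns the number of characters transmitted
   and records a failure in last_error. */
static int send_text(FakeSocket *s, const char *text) {
    if (!s->is_open) {
        s->last_error = 1;
        return 0;
    }
    s->last_error = 0;
    return (int)strlen(text);
}

/* Defective implementation: drops the data on a closed socket
   and never flags the error. */
static int send_text_buggy(FakeSocket *s, const char *text) {
    if (!s->is_open) {
        return 0;   /* bug: last_error is left untouched */
    }
    s->last_error = 0;
    return (int)strlen(text);
}

/* Faulty test: assumes "zero returned" means "failure was reported".
   Returns 1 for "all OK". */
static int faulty_test_closed_socket(SendFn send) {
    FakeSocket s = { 0, 0 };
    return send(&s, "hello") == 0;
}

/* Better test: also checks the interface that really carries the error. */
static int better_test_closed_socket(SendFn send) {
    FakeSocket s = { 0, 0 };
    int sent = send(&s, "hello");
    return sent == 0 && s.last_error != 0;
}
```

With these definitions, faulty_test_closed_socket reports "all OK" for both send_text and send_text_buggy, while better_test_closed_socket accepts only send_text.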

It is theoretically possible that both the test and the code will be defective, in such a manner that running the test will signal "all OK". The question is, how likely is such a situation? I don't think it's very likely.

Disagree. If the code is constructed to make a faulty test pass, then it will naturally have a built-in defect exactly compensating for the error in the test. This is a weak point in any write-tests-first methodology. There is help available: keep test code simple; test redundantly via different algorithms; check that tests map properly onto requirements via reviews of some sort; when you find a faulty test, treat it as a bug and write a new test for the code that will print an error if the bug ever comes back.
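A small C sketch of a compensating defect, and of testing redundantly via a different algorithm. The leap-year scenario and all function names are assumptions made up for illustration; only mktime and struct tm come from the standard C library.

```c
#include <assert.h>   /* for assertions when exercising the sketch */
#include <time.h>

/* Code written to satisfy the faulty test below: it ignores the
   divisible-by-400 rule, so it wrongly treats 2000 as a common year. */
static int is_leap_year(int year) {
    return year % 4 == 0 && year % 100 != 0;
}

/* Faulty test, based on the mistaken belief that no century year is a
   leap year. Returns 1 for "all OK". */
static int faulty_test(void) {
    return is_leap_year(1996) == 1 && is_leap_year(2000) == 0;
}

/* Redundant check via a different algorithm: ask the C library to
   normalise 29 February and see whether the date stays in February. */
static int is_leap_year_by_calendar(int year) {
    struct tm t = {0};
    t.tm_year = year - 1900;
    t.tm_mon = 1;      /* February (months count from 0) */
    t.tm_mday = 29;
    t.tm_hour = 12;    /* midday, to stay clear of DST edges */
    t.tm_isdst = -1;   /* let the library decide about DST */
    if (mktime(&t) == (time_t)-1) {
        return -1;     /* date not representable on this platform */
    }
    return t.tm_mon == 1 && t.tm_mday == 29;
}

/* Cross-check: returns 1 only when both algorithms agree. */
static int cross_check(int year) {
    return is_leap_year(year) == is_leap_year_by_calendar(year);
}
```

Here faulty_test passes even though is_leap_year is wrong about the year 2000. The cross-check disagrees for 2000, which is the kind of signal a second, independent algorithm can give.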

There is the more probable eventuality that the test doesn't test enough of the functionality of the code being tested; the test is correct, but incomplete. Therein, perhaps, lies the reason why XP advocates that one TestEverythingThatCouldPossiblyBreak. -- LaurentBossavit

It's not likely, but it's happened to me. It's not likely on the 'is-the-string-the-right-length?' tests, but tests that have large complicated setups are more likely to have bugs.

My experience is that bugs in tests that make them print an error on the first run, even though the production code is correct, are quite frequent. But it takes seconds to find and fix them, and doing so gives me more confidence in my production code. Out of more than 200 tests (currently, many more) in my previous project, exactly one succeeded while the production code was wrong. -- PP

Just a reminder. In XP UnitTests, one starts with a failing test. This would seem to eliminate the likelihood of a false positive. Before I have written any of the application code, I know that my test will report failure. I then write the application code and verify that the test now reports success. [As an aside, this assumes the test and the code are deterministic. It is really up to the programmer to avoid causes of non-deterministic code, such as uninitialized variables.]
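The failing-test-first sequence might look like this in C. It is a minimal sketch, not anyone's actual harness: count_vowels, its stub and the test function are hypothetical names chosen for the example.

```c
#include <assert.h>   /* for assertions when exercising the sketch */
#include <string.h>

typedef size_t (*CountFn)(const char *);

/* Step 1: a stub that exists only so the test compiles and fails. */
static size_t count_vowels_stub(const char *text) {
    (void)text;
    return 0;
}

/* Step 2: the application code, written after the test was seen to fail. */
static size_t count_vowels(const char *text) {
    size_t n = 0;
    for (; *text != '\0'; ++text) {
        if (strchr("aeiouAEIOU", *text) != NULL) {
            ++n;
        }
    }
    return n;
}

/* The test itself. Returns 1 for success, 0 for failure. */
static int test_count_vowels(CountFn count) {
    return count("scaffolding") == 3 && count("xyz") == 0;
}
```

Running test_count_vowels against the stub reports failure, and running it against count_vowels reports success. A test that checked only the "xyz" case would have passed against the stub as well, which is the warning sign that the test proves nothing.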

The scenario where a bug in the test masks a bug in the code is likely when there are bugs in the programmer's brain. I've written tests based on faulty understanding of some aspect of the domain or the technology, then written code that passed the tests. The test said the code worked, but the test and the code worked together to mask a misunderstanding. -- EricHodges

''That is why it is much better to DoBothUnitAndAcceptanceTests.''

In most domains, the quest for certainty will make you (and ultimately, your customer) miserable. Even in a murder trial we do not ask for certainty. So do not ask that the tests give you certainty of having no bugs. Ask instead that they give you reasonable doubt that there are bugs.

When you ask for reasonable doubt, it's not as worrisome that your tests are sometimes wrong or incomplete.

As mentioned above, you know your tests are bogus when you need tests for your tests. At some point, you have to stop writing tests and rely on human inspection. The earlier you do this, the better. So, keep your tests simple. Actually, this is a good practice for normal code. The value of writing tests for simple code like

 public Enumeration children() { return children.elements(); }
is far, far lower than the value of writing tests for the networking kernel.

I think I agree with you, especially if you permit me to twist your words into this: "Tests are a re-runnable form of human inspection. The earlier you do this, the better."

Agreed. That's exactly it. Why use eyeballs when you can use code? That allows you to slack off whilst the other rubes are manually running regression tests. "Why aren't you testing, Sunir?" "What are you talking about? I hit F5."

Yes, but do BugsInTheTests matter that much?

Types of BugsInTheTests:

BugsInTheTests don't matter that much. -- RonJeffries

Erm, can you back that up, RonJeffries? Plenty of comments on this page (and common sense) suggest such bugs matter a great deal.

Hmm. Until I started looking at test-first test suites, I'd always been convinced that any test suite with a certain degree of complexity had silent horrors, and that I would be able to find them. The problem was that programmers who are accustomed to having testers check their work fall into a nasty trap when they build test scaffolding: no-one checks their work! So I think BugsInTheTests are a big deal, in general. But they may not matter much with TestDrivenDevelopment - assuming that the test harness is also built test-first. -- BretPettichord

We have discovered several tests that have been falsely succeeding.

Naturally, lack of coverage in the tests is also an issue for false positives.

Maybe I'm just an idiot, but it's not unlikely that complex tests for complex subsystems return true instead of false. It's only off by one, after all. The best solution, I've found, is to write multiple tests covering the same production code from different angles. That way, they should either all fail or all succeed. In this way, the tests essentially test each other. -- SunirShah

But since UnitTests need to have neither false negatives nor false positives, you could say that the burden for correctness for UnitTests is possibly even higher than in regular code. As Ron writes, this isn't so much a problem if you practice TestFirstProgramming, or at least when you first write the tests. But if you ever want to refactor them, you can't check against false positives without crippling the underlying code, which seems really slow and cumbersome.

See RefactoringTestCode.

Well, with VersionControl, you've still got the version of the target that didn't pass the tests the first time around.

This page is a little confusing to me. I think this page assumes that the reason you write UnitTests is to reduce the number of bugs in your code. That is probably true for a lot of people. But that is not the primary reason I use UnitTests.

My code still goes to the user with about the same number of bugs in it as before I started using XP. The reason I use UnitTests is to reduce the time I spend fixing bugs, which means I can code new features faster. This applies especially to the time spent fixing bugs that occur when I try to add a new feature to brittle code that I or others wrote many months ago.

So my code can go to the user sooner than it did before I started using XP. If I started testing my UnitTests, I am guessing I would lose this advantage.

Software Engineering 101 says that out of the threesome of quality, schedule and budget, you can pick at most two to control. Budget is usually in the hands of someone higher up, so you get to pick between quality and schedule. In your case, you seem to prefer to control schedule, while others writing here may be working on quality. There should be no need for confusion on this point.

Here's an example for you. This is code from a real test suite. Can you find the silent horror or inadequate test? Actually, it contains two. One was from the original code, the second was introduced when the code was first extracted to make an example of it. Which only proves how easily these kinds of bugs can happen.

 int main()
 {
   AddressBookEntry * abEntry = 0;
   THANDLE hDataFile;

   hDataFile = sp_OpenExcelFile(AB_DATA_FILE);
   if (Check(hDataFile, NULL)) {
     if (Check(sp_GetABDataFileEntry(hDataFile, "2.9", &abEntry), 0)) {
       switch (abEntry->Test.iSubStep) {
         case 1:
           sp_StartTest(TextBuffer, &abEntry->Test);
           test1();
           sp_EndTest(&abEntry->Test);
           break;
         case 2:
           sp_StartTest(TextBuffer, &abEntry->Test);
           test2();
           sp_EndTest(&abEntry->Test);
           break;
         case 3:
           sp_StartTest(TextBuffer, &abEntry->Test);
           test3();
           sp_EndTest(&abEntry->Test);
           break;
       }
     }
   }
   sp_CloseExcelFile(hDataFile);
   return GetTestStatus();
 }
No default in the switch statement ("silent horror"?), closing an Excel file that may not be open?

Why do you have 'if's without 'else's in your tests?

CategoryBug CategoryTesting

Last edited July 23, 2006