Production Code Vs Unit Tests Ratio

The reciprocal of UnitTestToCodeRatio

Is the ratio of number of lines of production code to the number of lines of test code a useful metric?

Is a large value for "production / test" lines a bad sign?

Is a large value for "production / test" lines with 100% production code coverage a bad sign?

Is a low value for "production / test" lines a bad sign?

Assuming 100% production code coverage, would a small value for "production / test" lines indicate something good or bad about the tests? ...or the design of the production code?

Maybe some metrics would be helpful here. How large is your project? What's the "production lines of code /divided by/ test lines of code" ratio? Do you have 100% code coverage? Was the code written TestDrivenDevelopment?

One data point: I happen to have a small ant script that I happen to have written TestDrivenDevelopment style, using AntUnit: And I'm "cheating" on the tests too: So I'd guess that if I restricted each test to doing only one assert (which some think is a good practice) I think I'd end up with something around a 1:5 ratio of production to test code lines.

I kind of think that the range of "good ratios," if there is one, may depend on implementation language. For ApacheAnt, being a DomainSpecificLanguage gives it expressive power within it's domain. But DomainSpecificLanguages can be cumbersome and/or verbose outside their target domain, and being an XML ScriptingLanguage could tend to make that problem worse. -- JeffGrigg

A flavour of the code (design?) smell: as mentored by the TestDrivenDevelopment, optinal ratio of ProductionCode vs. UnitTests lines of code is about 1:1. If it deviates at more than 2 or 3 times it's probably a sign of some code/design problem.

Sadly this page has turned into a pedantic and pointless argument, made largely by someone who has chosen not sign his name. I think this discussion should be deleted, and replaced with something useful; the hypothesis about the optimal ratio of application to test code, and a discussion about that. Obviously this hypothesis cannot be applied to all software or methodologies; it clearly relates to the AgileProgramming, and only defines a CodeSmell (which is by definition *not* authoritative). It cannot define any absolute metric. -- IanBicking

Non-sense. The arguments to the contrary are not pointless, and not pedantic. If somebody's gonna define that by some essentially arbitrary criterion (indeed no argument whatsoever has been presented for the 1:1 ratio) large bodies of code can be seen to suffer from CodeSmell, the claim should either be defended by reasoned arguments, practical empirical evidents, etc, or deleted altogether.

It stands to reason that if there's any relevant criterion to judge/detect the adequacy of a set of tests, than that criterion is code coverage, see TddCodeCoverage for example. Depending on so many particulars test can cover 100% of the code base in ratios of 1:100, 1:10, 1:2, 1:1, 2:1, 10:1, 100:1. Any prescribed ratio, is absolutely useless, unless someone is able to show a preponderence of evidence that the correlation between the ratio and the 100% code coverage (which is the real target) follows some normal distribution centered around 1:1 value.

If not, people who follow the advice on this page will follow a useless advice while measuring code coverage is almost trivial these days and is a way better measure of the adequacy of any tests.

Indeed, watch this meaningless transform

 double Foo() { return 2.0 }

(1:1 code to test ration)
 void testFoo(){  assert(2.0 == Foo()); }  

(et voila an 8x change in the code to test ratio)
  void testFoo()
     double expected;
     double gotten;
     expected = 2.0;
     gotten == Foo();
     assert(expected == gotten);

Perhaps there is a hidden assumption in the original post that the unit tests and the production code are written in the same style. If you write the production code in the same style as the test code:

 double Foo()
     double return_value;
     return_value = 2.0;
     return return_value;

The ratio is now 8:6, which is not very far from 1:1 for such a small sample.

(Just for the fun of it:

 double Foo()
   double part_a;
   double part_b;
   part_a = 1.0;
   part_b = 1.0;
   return (part_a + part_b);

for 8:8 = 1:1

 double Foo()
   double part_a;
   double part_b;
   double part_c;
   part_a = 1.0;
   part_b = 1.0;
   part_c = part_a + part_b;
   return part_c;

double testFoo() { assert(2.0 == Foo()); }

For a 1:10 ratio :P

-- JeromeBaum? )

Unadulterated non-sense. Some mr. Ischenko issues by the way of handwaving a pearl of wizdom on software engineering and somebody repeats it on wiki justifying with a link to handwaving. Wow. Wiki is really making progress.

And your contentless pointless post is an example of progress?

Do you have any better idea? Do you know any reason why it's wrong? I want to know how this ratio was measured and how do they know that is the right one.

MaxIschenko: I agree that "should be close" or "most probably" may be a bit strong here. But still, what's your arguments? As for how this ratio was measured: just count LinesOfCode in production code and test modules. No? My statement about "ideal" or "proper" ratio is based on some TDD/XP figures. Non-authoritative, yes, but still reasonable. See, for example,

Not reasonable at all.Reasonable means based on reason, i.e. on logical arguments of some kind. Since you are making the strong claim here, it is you who should bring a supporting argument. 1 to 5 is not unreasonable. 1 to 10 is not unreasonable. If you refer strictly to UnitTest as in "TDD", as opposed to more gneral types of tests (like acceptance tests, etc) 0 to X is not unreasonable either. Just go about studying some SoftwareMasterpieces out there before you impart your invaluable wisdom on us.

See also: TddCodeCoverage

Much of the above is ContentFree? (other than a few flames) and is need of ReFactoring.... at any rate, there is a good question posed on this page. What is the appropriate ratio (and what tolerance)? Obviously, the boundary conditions of 1 to 0 (no tests at all) or 0 to 2 (no product) aren't good; but what should the optimal ratio be? I've no idea. There is a data-point in support of 1 to 1; anyone have other datapoints? Since this is a highly qualitative (and thus not falsifiable) argument, telling other parties to "prove it" is inappropriate--and the numerous AdHominem attacks seem unwarranted.

That was the point, that it is ContentFree? with no improvement in sight. Demanding a "supporting argument" (which is distinct from proof -- therefore be mindful of who is flaming whom) is entirely appropriate in the case that somebody makes a strong claim. Your "obviously" doesn't bode well either for the content of this page. What the hell is so obvious ? 0 -- no UnitTests at all obviously cannot be qualified as "no good" since we have plenty counterexample in SoftwareMasterpieces. What is your data point and what claim does it support ? And what is your methodology for cherry-picking data-points ?

Easy, Costin... I wasn't the original poster. At any rate, there is no need to be rude (which was the point I was trying to make). As far as various SoftwareMasterpieces go... you don't seriously propose that these have gone untested? While they may not come with a suite of UnitTests all bundled up in the source tarball, each and every one of them has likely undergone rather thorough testing of some sort.

Many have been in the public view for a long time and have undergone the extensive peer review afforded popular OpenSource programs. The closed-source masterpieces (such as VMS) most certainly underwent thousands of hours (if not more) of "traditional" formal QA efforts. Many of these programs were written before UnitTest-heavy methodologies such as XP became fashionable. And a couple of them were written by DonKnuth. :)

I think there's an interesting discussion to be had on this subject--one that is perfectly acceptable to conduct informally.

After all, this is wiki, not Communications of the ACM--rigorous standards of research and peer review need not apply to opinions held here. While your objection to the 1:1 ratio proposed above is duly noted--I have no particular reason to either believe or disbelieve that figure--I propose that additional suggestions from others be collected. Premature demands for "proof" are ConversationalChaff. Again, I think you are dreaming about premature demands for proof.

I am not sure how to determine what a reasonable ratio of test code to production code would be. On the one hand, when using a unit test framework, a single assert statement may be sufficient, but the amount of set up code can vary. Also, the amount of stub code required can vary from zero to quite a bit. Emotionally, I would start to be concerned if the quantity of test code were to be come equal to the production code, because it would appear that the test code may become a higher risk area than the code itself. Has anyone done any measurements on what is being experienced?

OK, I have removed this page from a list of CodeSmells. Would the following formulation be acceptable: if the project aimed to have 100% of code coverage by UnitTests and you want to quickly check this, just check the ratio. If it's far from 1 than you may have problems. If it's close to 1 than you may have not problems with code base.

Guess than it is really not much sense here. -- MaxIschenko

Somebody raised the very valid issue that if the ration is 1:1 then on needs to be at least as suspicious of unit tests. Who tests the unit tests?

In TestDrivenDevelopment...

I'd like a more nuanced view, using opensource projects as a metric. Does test ratio increase with project's maturity? Is there a difference between the ratios of buggy crap vs. secure stable software? Etc etc.

Only small fraction of projects are committed to serious unit tests. And among those, I believe open source ones are minority. I'd also want to make clear that "no unit tests" doesn't mean buggy likewise "100% unit tests" doesn't mean "buglessness". Unit tests would (may?) yield other important benefits as well, like cleaner design, reduced development time, reduced time for system testing and deployment.

The ratio make sense only for the projects that actually **have** tests and it allows to make a gross evaluation of it's quality. -- MaxIschenko.

Re: "I achieved 100% code coverage with 1:5 ratio between lines of tests and lines of code"

I, for one, would be very sincerely interested in learning how production code can be effectively tested with tests that are only 1/5 the size, in lines of code, than the production code. Is there lots of boilerplate production code here? -- JeffGrigg
So far, the above content seems to concentrate on lines of production code versus lines of unit test code. I think this is a false measure. What you ought to measure, instead, is control flow points in production code versus function points in unit test code. You should have no less than one unit test per production control-flow construct. In no uncertain terms, every if statement needs two unit tests (one each to exercise the consequent and alternate paths). A case construct will need no less than n tests (where n equals the number of cases in the construct). Etc.

A ratio close to 1 should be the minimum to strive for. You may not achieve exactly 1, because of Beck's rule, "Test only what can fail." Some production code might not need unit testing, if you can prove it cannot fail (and there damn well better be proof-annotations in the production code to back your claims, too). However, ratios significantly greater than 1 are clearly a red flag, clearly indicating that there exists significant quantities of branch points which have yet to be exercised. (Use a code coverage tool, at this point, to determine precisely what needs to be exercised.)
How does this affect SystemSizeMetrics ?
WRT to "Is the ratio of number of lines of production code to the number of lines of test code a useful metric?"

Here's the thing about metrics - be careful to measure the desired outcome, and not the mechanism that achieves that outcome. In this case, I'm assuming that your goal is not really to have a certain test-to-target ratio for lines of code. That's just a means to an end, which is likely to have a high-quality system, or a great user experience, something along those lines.

What happens when you measure the mechanism instead of the desired outcome is that people will "game" the system. They'll satisfy the metric, but often at the cost of the real goal.

But let's say that you want to achieve a great user experience. That's much more work to measure, isn't it? And it seems a lot less scientific, but that's what you really want. Organizations shy away from measuring the right thing because it's hard. But it can be done. A survey with numeric rating on various questions, for example, might measure what you want. What's important isn't the absolute total, but how it changes over time. Getting better? Getting worse?

You'll get what you measure. Be sure you measure the right thing. -- DonBranson

View edit of August 23, 2012 or FindPage with title or text search