When To Stop Refactoring

Under what conditions, in the absence of new requirements (stories, bugs, etc.), do you stop refactoring? Are these conditions sufficiently well-defined that a team would never go into a loop, with one "side" refactoring this way and the other "side" refactoring back again (or round some larger loop)?

This is probably more a theoretical than a practical concern, as people are very good at reacting to needless work. But it could happen, at a very low frequency. (That's why a piece of software can in theory exhibit an infinite number of bugs: you just need a steady stream of "fixes" that each cause a different bug elsewhere, plus a poor record of history. Of course, you only get that kind of situation when you avoid refactoring.)

Is there anything like a CanonicalForm of a refactored piece of software so that someone can say "This software is in CanonicalForm, so I don't need to refactor any more (until the requirements change)!"?

Presumably OnceAndOnlyOnce is a necessary condition for CanonicalForm, but what other conditions are necessary and sufficient?


Under what conditions, in the absence of new requirements (stories, bugs, etc.) do you stop refactoring?

I think this question misses the entire point of refactoring. A better question would be, "Under what conditions, in the absence of new requirements (stories, bugs, etc.) do you start refactoring?"

Refactoring should be viewed as a means to an end, not an end in itself. One needs to have a purpose for refactoring and not use the term as a euphemism for "I don't like Joe's code, I'm going to rewrite it."

One should not start refactoring without a clear purpose in mind. Once the purpose has been accomplished, one is done. There is probably no explicit ten-point checklist to tell you when you are done, but most people can tell whether they are being productive or just playing.


In the context of ExtremeProgramming, CanonicalForm doesn't exist and isn't even relevant.

XP is a continual process. There is always more work to do, so the program is never "complete". The project ends when the source code dies of entropy (XP doesn't claim to stop entropy, just to slow it down by a huge factor), or when all the major features are done and the customer can't write a UserStory that they consider important enough to spend time and money on.

In the same way, you don't refactor until you are in CanonicalForm. You refactor until the code you are refactoring doesn't smell or hurt anymore, and you then go to the next UserStory.

XP code doesn't have a CanonicalForm. It just either smells or doesn't (see CodeSmells). If it doesn't smell, there's no point in refactoring.


The program is never "complete". The project ends when the source code dies of entropy (ExtremeProgramming doesn't claim to stop entropy, just to slow it down by a huge factor), or when all the major features are done and the customer can't write a UserStory that they consider important enough to spend time and money on.

I'm not quite sure I agree that XP slows down entropy. Entropy, in the information-theoretic sense, is information per byte/bit/SLOC. XP embraces OnceAndOnlyOnce. Since the program specs don't change, the information stays constant, while the amount of text is reduced; so you could say XP raises entropy. Entropy shouldn't be seen as a bad thing here - after all, we're getting a much higher SignalToNoiseRatio. -- ChristophePoucet

That is a different sense of the word "entropy".


Fair comments. I'm interested in when to stop refactoring during a particular iteration, i.e. in the absence of fresh user stories.

Sounds like "doesn't smell" and "doesn't hurt" are two criteria for when to stop refactoring. Are these sufficient? Are they sufficiently well-defined that everyone will usually agree on when these conditions have been met? -- gn

I use "boredom" as a criterion. If I don't feel the effort of further refactoring and the associated testing is worth the benefit, I stop. Trust yourself to know when to stop, and don't try to find a rule carved in stone. -- WayneMack


I'm often working on nearly unmaintainable LegacySystems (often ones that some pack of idiots just finished a few months ago ;-) ...where one could literally refactor for several man years before the software was in really good shape.

So I use the rule to "spend about equal time refactoring and adding new functionality." Otherwise one's progress in adding new functionality comes to a halt. -- JeffGrigg


I like MartinFowler's idea (paraphrased here): If you want to add a new feature to a program, but the program is not structured conveniently to add the feature, refactor the program to make it easy to add the feature, then add the feature.
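A minimal sketch of that idea, with hypothetical names: before adding a second output format to a report printer, first refactor so the format-specific code sits behind its own method, then add the feature as a small, local change.

```java
// Hypothetical example of "refactor first, then add the feature".
class ReportPrinter {
    // Step 1 (refactoring): extract the format-specific rendering
    // out of print(), leaving a seam.
    String print(String title, String body) {
        return renderPlainText(title, body);
    }

    private String renderPlainText(String title, String body) {
        // Underline the title, then append the body.
        return title + "\n" + "-".repeat(title.length()) + "\n" + body;
    }

    // Step 2 (the feature): with the seam in place, an alternative
    // format is just one more private method, not surgery on print().
    String printHtml(String title, String body) {
        return "<h1>" + title + "</h1><p>" + body + "</p>";
    }
}
```

The refactoring step on its own changes no behaviour; it only restructures the code so that the feature lands cleanly.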


PrematureRefactoring? is against YouArentGoingToNeedIt.

YouArentGonnaNeedIt is for deciding when to add new features. Refactoring is a different hat. Fowler's advice is good.

If the code is extremely unlikely to change in the future (that is, before the system itself becomes obsolete), could it be that "you aren't going to need that refactoring"? Especially since a non-trivial refactoring could introduce bugs into a mission-critical and stable system.

The answer is completely different depending on whether the thing is well unit tested or not. If not, you have more basic problems to solve (i.e. writing a complete suite of unit tests) before you refactor. If that's out of the question, you have a typical untested legacy system that everyone's afraid to touch and that sits around and rots - a good example of what drove people to come up with XP.

I've done experiments with toy applications to see how long I could refactor them. Eventually they converge on an oscillating state where the only refactorings that make sense are bi-directional, and the benefit of refactoring in either direction (extract vs. inline, for instance) is balanced by the cost. -- EricHodges

Extract vs Inline is a perfect example of when to stop refactoring. There are often many stages in which one could introduce explanatory variables instead of fully inlining things. This can increase comprehension for complex portions of the inlined code. Too many variables can cause excess vertical scrolling and slow down comprehension. Too few can cause complexity and excess horizontal scrolling. So it is mostly subjective.
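To make the trade-off concrete, here are two equivalent versions of the same check, with hypothetical names. Extract and inline are inverse moves; neither version is "more refactored", and the stopping point between them is a judgment call.

```java
// Extract vs. inline: two equivalent forms of one condition.
class OrderRules {
    // Fully inlined: one line, but the reader must decode the
    // whole condition at once.
    static boolean qualifiesInlined(double total, int items) {
        return total > 100.0 && items >= 3 && total / items < 50.0;
    }

    // With explanatory variables: each clause is labelled, at the
    // cost of more vertical space.
    static boolean qualifiesExtracted(double total, int items) {
        boolean bigEnoughOrder = total > 100.0;
        boolean severalItems = items >= 3;
        boolean noSingleExpensiveItem = total / items < 50.0;
        return bigEnoughOrder && severalItems && noSingleExpensiveItem;
    }
}
```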


"Breaking the system down into a lot of very simplistic pieces (as RefactorMercilessly will do) adds far too many trees and makes the system harder for someone new to understand what's going on."

Refactoring is something you do in order to make your design better, it is not the act of breaking it into as many parts as possible. Too little and too much granularity can both suck. So there's a sweet spot of the right amount of abstraction, the right amount of meta-ness, and the right amount of coupling. This spot is something that is fairly subjective and left to the experienced programmer's sense of TheCraft.

Refactoring is not only about breaking methods down into smaller methods, but breaking classes down into smaller classes. The smaller classes have fewer methods keeping everything manageable.

Classes or methods are simply code organization tools. As the number of these code organization tools increases, the number of things I must keep in my brain to fully understand a program also increases. If the classes and methods are fewer, but larger, before I RefactorMercilessly, each one may be larger and harder to understand than its refactored counterparts, but I don't need to remember as much thus avoiding the SevenPlusOrMinusTwo problem.

Once I understand a method or class, regardless of how large or how small it is, that method or class becomes 'one thing' in my SevenPlusOrMinusTwo cache. When I see that method or class again, I reach into my SevenPlusOrMinusTwo cache and pull out my understanding. If there are a lot of methods / classes, my SevenPlusOrMinusTwo cache gets filled very quickly and the probability of a 'cache miss' the next time I see a particular method/class goes up. Arguably, if the actual methods or classes are small, refilling my SevenPlusOrMinusTwo cache is easier. Of course, something gets aged out of the cache at that point.

I think the larger context of this page is not understanding any particular piece of code (be it a large or small method or class), but it is understanding an application as a whole.

If so, then you would agree that you must juggle as many different concepts when reading big methods as when reading small ones - with the rather important difference that with larger methods, these concepts are not "labelled" for easier reference (or, more usually, labelled in a language - comments - different from the one the "bits of code" themselves are written in).

If we are not concerned with the quantity of a particular kind of thing (classes, methods, lines) but with the quantity of different "bits of code" - and it does seem to be proper not to be concerned with a particular type of entity over another, since it makes the argument more general - then surely it is better, other things equal, to have the same number of bits labelled rather than unlabelled.

Would you say that an article without section titles is easier to understand than an article with them?

The complementary question is: would you say that an article with a section title before every paragraph is easier to understand than an article with a section title every page or two? I think that many points around the middle are better than either extreme.


Breaking one thing into two is never a problem. Breaking four things into eight is usually a problem. This is because of the magic number SevenPlusOrMinusTwo. Once a collection gets large enough, it is hard to learn. We have to break up the collection into smaller collections. And when we get a tree of collections, it takes a long time to learn enough about all the parent collections to understand a particular item at the bottom.

Epiphany! It's a Recoding problem! And, of course, the answer is in Miller: When there is a story or an argument or an idea that we want to remember, we usually try to rephrase it "in our own words".

In other words, when you and I look at a method that is long overdue for refactoring, we cheat, discarding irrelevant bits here, recoding there, until we each understand the method - but we certainly haven't broken it down the same way. So now if I refactor the code, it's going to be harder for you to understand initially, because my refactoring will conflict with your previous recoding of the information.

The other thing that happens when we refactor is that smaller units get promoted in importance. This is where people really get screwed over. We start with one method, which does eight different pieces of work. The overall system is easy to understand, as it has but one entity, but the method is complicated. If we simply refactor, we end up with a system of nine entities - and these eight new entities, which were once subservient to the original method, are now promoted in importance. Now readers are cranky, because it is difficult to pick out from this collection which is the important one and which are subservient - the refactoring has destroyed the hierarchy which people have been depending on for their own recoding.

In short, the refactoring isn't over until you take your new entities and sink them back down out of sight where they belong. --DanilSuits
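One way to read "sink them back down" in code terms (a hypothetical sketch, not necessarily what DanilSuits had in mind): make the extracted pieces private, so the class still presents a single entity to its readers and the hierarchy survives the refactoring.

```java
// After extracting the steps out of one long method, the new
// pieces are sunk out of sight: only invoiceTotal() is promoted.
class Billing {
    public double invoiceTotal(double subtotal) {
        return addTax(applyDiscount(subtotal));
    }

    // Private helpers: subservient to invoiceTotal(), and visibly so.
    private double applyDiscount(double amount) {
        // 10% off orders of 1000 or more (illustrative rule).
        return amount >= 1000.0 ? amount * 0.9 : amount;
    }

    private double addTax(double amount) {
        // Flat 20% tax (illustrative rule).
        return amount * 1.2;
    }
}
```

A reader scanning the public surface sees one entity, not nine; the eight subservient pieces only come into view when they are needed.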


Perhaps it's an oversimplification to say that the mind can only cache SevenPlusOrMinusTwo classes (modules, whatever). Certainly the complexity and opaqueness of each individual unit is relevant, as well? If my classes are globbed up and difficult to understand, each one is going to bear more weight in my mind to think about. But if my classes are so well-factored that I get an immediate sense of what they do just by thinking of their titles, surely I can store more in my head at once, can't I?

I think not. From what I remember of cog-psych, the SevenPlusOrMinusTwo is relatively stable and universal. I think you've missed something ... see above, "Once I understand a method or class, regardless of how large or how small it is, that method or class becomes 'one thing'"; if a concept acts "large" by behaving as a bunch of entities, then it takes up more than one of the 7+/-2 ... and, of course, if it acts "small" it does so by acting as a single entity, taking up a single slot, and comes to mind intact.

DanilSuits wrote: In short, the refactoring isn't over until you take your new entities and sink them back down out of sight where they belong. and MarkAddleman asked: How do I take my new entities and sink them back down out of sight where they belong? My first thought is that the tests do that, because I believe WellFactoredProgramsCannotBeUnderstoodStatically, but I'd love to hear other ideas.

Well, I approach this problem with a loose set of conventions (coding in Java):
  1. Methods are ordered by visibility: public methods stacked at the beginning of the .java file, the rest afterwards.
  2. Public methods are kept comparatively small and delegate to the rest. The rest vary in length and complexity, with a preference for private methods.
  3. The rest are arranged into semantic sections, marked by section comments.
  4. Variables are kept in a section of their own at the very end of the .java file. They are always private, and are accessed via simple getter/setter methods. The getters/setters have only the necessary visibility (preferably private, though anything is allowed).

With this I hope to put off class-level refactoring as long as possible. It is meant to keep classes (a) intelligible: the "important" stuff is at the beginning and should be easy to understand, and the rest is ordered into sections and thus understandable in a section-wide context. A side effect is that (b) it seems to make splitting a class into several classes very easy - if that really becomes necessary.
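The four rules above might produce a skeleton like this (hypothetical content, just to show the layout):

```java
// Layout following the conventions above: public interface first,
// then sectioned private methods, then fields and accessors last.
class Counter {
    // --- public interface ----------------------------------------
    public int next() {
        return advance();  // small, delegating
    }

    // --- section: internal stepping ------------------------------
    private int advance() {
        setValue(getValue() + 1);
        return getValue();
    }

    // --- section: state ------------------------------------------
    private int value = 0;

    private int getValue() { return value; }
    private void setValue(int v) { value = v; }
}
```

Note how the "state" section at the bottom is exactly the part that would move out first if the class ever did have to be split.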

To answer the topic somehow, I'd say, sibyllically: refactoring is finished when the project is closed. And too much refactoring closes the project, too. ;-) -- MartinSchwartz


I believe refactoring is never "over". I also believe MartinFowler states this in his book. The code is never complete, it is always undergoing changes, therefore there is always a need to refactor. Likewise the developers are never complete, they are always learning new and better ways to do things and express themselves in code. Therefore, there are always opportunities to refactor. Don't worry about having a hard and fast rule when to stop refactoring. Trust yourself (and others) to be aware of when you are being productive and when you are not.


Does YouArentGonnaNeedIt ever apply to refactoring?

Yep.

If there's no pressing need to change it right now, go and work on the job you're supposed to be doing. You can always refactor an area when you come to work on it more directly. Refactoring is never finished. You refactor the bits that are getting in the way of the current issues facing you when working with the application.


I think that refactoring is never over if you EmbraceChange: this is a premise of all agile approaches. Following KentBeck's idea that you should let the code do the talking and listen to what it says, you will realize how much refactoring is needed on any given iteration. -- GastonNusimovich


See Also: CircularRefactoring, RefactorSlack.

---

I can't help but think that this page needs refactoring.

:-)

..there, I started, then stopped after one edit ...oops, the second one -> if you think more is required, YouSureCan?
CategoryRefactoring

Last edited November 12, 2014.