Long Function Examples

A random collection of LongFunctions, taken out from exemplary open-source programs.

The collection is not intended to show master-pieces or otherwise perfect code, but to show that writing LongFunctions is a common and normal occurrence in the life of most skilled and professional programmers. That's why they are really random, no careful study was done to select those examples who can serve as models or masterpieces. In comparison, I'd think anyone would be hard-pressed to select any model open-source projects of comparable complexity (JUnit may qualify but is nowhere near comparable in complexity) that will follow LotsOfShortMethods religiously.

I don't think any of the short method advocates suggested that long methods weren't common and normal. Lots of bad practices are common and normal. That doesn't make them better.

Says EricHodges that they are bad practices. Well, your mileage may vary, very reasonable people beg to differ from such opinionated nonsense.

Why are your opinions just opinions while my opinions are opinionated nonsense? -- EricHodges

Besides, Eric's right, they are bad practices, at least in many people opinion, just go read CodeSmell, this has always been on the list, and rightly so. LongMethodSmell, very badly.

Well, guys, your opinions are opinionated and nonsensical, because despite overwhelming evidence in many kinds of ways, despite repeatedly failing to challenges to argue your position correctly and coherently, which means a hell of a lot more than "I like it better this way therefore everybody should do the same". Well despite all that, you clutch at straws but maintain the same indiscriminate and exclusivist message (my way is the only right way - LotsOfShortMethods is the only way to write professional code), instead of simply accepting that different people have reasonable different points of views, and some people, not the least of which a whole series of GrandMasterProgrammers , prefer a more subtle and flexible attitude. Not every method that exceeds EricHodges' or AnonymousCowards power of comprehension can be qualified as a bad practice example.

In spite of having that minimum courtesy and respect for some people fellow programmers, who have shown more than cheap talk on wiki, and therefore allow for a different point of views, you insist on calling what other people consciously decide to do "bad practices", "bad software engineering", "piss poor quality", etc. Well, boys, you are way out of your league. Nobody wants to stick LongFunctions down your throat, but when you want to stick your illusions of what's really important in SoftwareEngineering down everybody else's throat, you are way out of your league. -- CostinCozianu

Perhaps function length isn't "really important" in your definition of software engineering, but imagine that you owned a company that developed and owned software as intellectual property. The sales of this software would provide income for shareholders and employees as long as you could keep it competitive. Would you (and your shareholders and employees) rather have methods that were easily comprehended by GrandMasterProgrammers or EricHodges? -- EricHodges

We are arguing past each other. I believe that your perception of those arguing against you is much more extreme than reality. The point being made is that short clear methods are better than long involved methods, and that when presented with a long involved method, it is almost always possible to improve the code by splitting it into shorter methods (do we agree to this point?).

First of all, I hope I'm not confusing distinct opinions, but there at least some proponents who expressed something that cannot be qualified as other than extreme and extreme nonsense at that. For example that all the code below is "piss poor", or an example of bad practices, etc.

I've added just one more example, that can be qualified for sure as a display of mastery in the art of programming, by somebody who is undeniably GrandMasterProgrammer, furthermore a piece of code who was designed on purpose to be a long function, a very long function at that (while the other examples , being taken at random maybe objected to on the grounds that the developers may have put the algorithm together in a hurry and not have enough time to refactor, may contain subtle bugs, etc, etc), but this later example cannot be objected on any such grounds.

What you appear to be arguing is that this improvement is marginal in comparison to many other improvements that could be made. Nobody has denied this. Short methods don't solve everything - they just solve some things. What we are advocating is that if you have a choice between writing one long method and writing two short methods, try to go the route of the two short methods. -- DanielSheppard

I can agree with whatever you think you are advocating providing that you can reach a consensus on what is a short method (is 20 lines of code too much to you? how about 50? how about 100?), on what exactly influences the choice ( I hope you're not arguing that in general a 4 liner is better refactored to a two line method invoking two others 2 liners, neither, for example that 12 is better than 6 + 6 + 2 or any other nonsense like that, and provided that you can think of reasonable exceptions to such a rule. And provided that you add yet another provision that the best judgement of a competent and responsible programmer in the particular context of a project always overrides whatever LotsOfShortMethods dogma promoted on wiki.

Then maybe we are in agreement after all.

It is not - as some people have interpreted - intended to show that LongFunctions is always the way to go. The intent is to show that LongFunctions are just a normal option like many other options, and making too much fuss that a method is too long (by some standards >10 lines or > 1 screen) is much ado about nothing.

I can provide many examples of wars in which civilian casualties are high, yet there are relatively few examples of wars with low civilian casualties. Does that mean that advocating low civilian casualties is "making much ado about nothing"? For each of these examples, it can be argued that breaking down the methods will make them easier to understand. The fact that such change has not occurred is beside the point. Everybody is in rampant agreement that the writing of long methods occurs.

In an industry where something like 90% of projects fail, you can't make too much fuss about losing bad practices like this. I'm very surprised that someone actually tried to defend the code. While it may be common looking code from your average open source project, it certainly could use a lot of improvement. There's plenty to be fixed in just about every sample shown that could help clarify the intent of the code for future reading of it. Just because code works doesn't mean it makes its intent clear, and clarity is far far more important if the code is expected to live for a while. Unmaintainable unreadable code just gets rewritten rather than reused. While any one of these could be understood with a little effort, a program full of these would be absolute hell to maintain and extend, let alone grow through a few versions. Code should require far more than simply working, to be kept around for the long haul, it should be clear, readable, well factored, dare I say "agile". I'm really beginning think these agile method guys are the only ones who really know what they're doing anymore. Page after page of people defending bad practices like this, what's going on?

Well, if it is true that >90% of the projects fail, you may consider it somehow related to the following factors:

I'd have to enumerate a ton more of other crappy things that happen in the industry before I'd even start considering that people writing unnecessary LongFunctions may be a significant factor, while I can serve you plenty of examples of LongFunctions where ItJustWorks.

And besides, it is easy to recognize for anybody remotely experienced with the level of programming out there in the wild, that if some team or half that team ( probably including myself in this sample, I know for a fact that one guy is a much better programmer than me ), is on the same level of programming capacity with some of the guys who produced the code examples on this page, that team would not have absolutely any worry for the next deadline.

See LongFunctionsInLisp
In Prolog from Flora:

                    +NamedHeadVars?, +NamedBodyVars?)
 Assert a rule with the given Head and Body and define the head for
 undefinedness checking.  If the head is a hilog predicate, wrap it
 or WRAP_TDYNZ_HILOG, and update flora_not_tabled_registry if the head
 is not tabled
 flora_insert_rule(InsOp?,Head,Body,BodySig?,BridgeRule?,HVars,BVars) :-
        ( var(Head) ->
            flora_abort('uninstantiated rule head')
	    ( Head=FLLIBMODLIT(_F,_A1,MName) ->
                ( var(Wrap) ->
                    flora_abort('uninstantiated rule head')
            (((Type==hilog);(Type==flora)) ->
                flora_abort('invalid rule head')
	    ( (Type == hilog) ->	
	        ( InsOp? == FLLIBINSERTRULE_A ->
	        NewHead? =.. [Wrapper,Pred|Args],

runtime_get_fingerprint(Pred,Args,Funct,Arity), ( flora_check_tabled_registry(Module,Funct,Arity) -> conget(flora_global_tabled_dynrule_num,RN), NewRN is RN+1, conset(flora_global_tabled_dynrule_num,NewRN), flora_tdyn_hilog_user_module_predicate_symbol(Module,TWrapper), TabledHead? =.. [TWrapper,RN,Pred|Args], TabledRule? =.. [FL_IMPLYOP,NewHead?,TabledHead?], assert(TabledRule?), NewRule? =.. [FL_IMPLYOP,TabledHead?,Body], assert(flora_rule_signature(NewHead?,BodySig?,[TabledRule?,NewRule?],BridgeRule?)) ; flora_enter_not_tabled_registry(Module,Funct,Arity), NewRule? =.. [FL_IMPLYOP,NewHead?,Body], assert(flora_rule_signature(NewHead?,BodySig?,[NewRule?],BridgeRule?)) ) ; ( is_invalid_flogic_head(Pred) -> flora_abort('invalid rule head') ; NewRule? =.. [FL_IMPLYOP,InstHead?,Body], assert(flora_rule_signature(InstHead?,BodySig?,[NewRule?],BridgeRule?)) ) ), assert(NewRule?), (Pred==WRAP_OBJEQL -> (flora_trailer_registry(Module),!; ( flloadtrailer(BASIC,Module) -> assert(flora_trailer_registry(Module)) ; flora_abort )) ; true ), flora_define_predicate(InstHead?) ).

In OCAML, taken from camlp4 :

 value check_use nl el =
  let ht = Hashtbl.create 301 in
  let modif = ref False in
  do {
      (fun e ->
         let u =
           match e.name.expr with
           [ <:expr< $lid:_$ >> -> Unused
           | _ -> UsedNotScanned? ]
         Hashtbl.add ht e.name.tvar (ref u, e))
      (fun n ->
           let rll = Hashtbl.find_all ht n.tvar in
           List.iter (fun (r, _) -> r.val := UsedNotScanned?) rll
         with _ ->
    modif.val := True;
    while modif.val do {
      modif.val := False;
        (fun s (r, e) ->
           if r.val = UsedNotScanned? then do {
             r.val := UsedScanned?;
               (fun level ->
                  let rules = level.rules in
                    (fun rule ->
                       List.iter (fun ps -> mark_symbol modif ht ps.symbol)
           else ())
      (fun s (r, e) ->
         if r.val = Unused then
           Pcaml.warning.val e.name.loc ("Unused local entry \"" ^ s ^ "\"")
         else ())

In C from Apache:

 * apr_gcvt  - Floating output conversion to
 * minimal length string

static char *apr_gcvt(double number, int ndigit, char *buf, boolean_e altform) { int sign, decpt; register char *p1, *p2; register int i; char buf1[NDIG];

p1 = apr_ecvt(number, ndigit, &decpt, &sign, buf1); p2 = buf; if (sign) *p2++ = '-'; for (i = ndigit - 1; i > 0 && p1[i] == '0'; i--) ndigit--; if ((decpt >= 0 && decpt - ndigit > 4) || (decpt < 0 && decpt < -3)) { /* use E-style */ decpt--; *p2++ = *p1++; *p2++ = '.'; for (i = 1; i < ndigit; i++) *p2++ = *p1++; *p2++ = 'e'; if (decpt < 0) { decpt = -decpt; *p2++ = '-'; } else *p2++ = '+'; if (decpt / 100 > 0) *p2++ = decpt / 100 + '0'; if (decpt / 10 > 0) *p2++ = (decpt % 100) / 10 + '0'; *p2++ = decpt % 10 + '0'; } else { if (decpt <= 0) { if (*p1 != '0') *p2++ = '.'; while (decpt < 0) { decpt++; *p2++ = '0'; } } for (i = 1; i <= ndigit; i++) { *p2++ = *p1++; if (i == decpt) *p2++ = '.'; } if (ndigit < decpt) { while (ndigit++ < decpt) *p2++ = '0'; *p2++ = '.'; } } if (p2[-1] == '.' && !altform) p2--; *p2 = '\0'; return (buf); }

In Java, from Kawa :

  public SyntaxRules? (String[] literal_identifiers, Object rules,
		      Translator tr)
    this.literal_identifiers = literal_identifiers;
    int rules_count = LList.listLength(rules, false);
    if (rules_count <= 0)
	rules_count = 0;
	tr.syntaxError ("missing or invalid syntax-rules");
    this.rules = new SyntaxRule? [rules_count];
    Pair rules_pair;
    Macro macro = tr.currentMacroDefinition;
    java.util.Vector capturedIdentifiers = macro.capturedIdentifiers;
    for (int i = 0;  i < rules_count;  i++, rules = rules_pair.cdr)
	rules_pair = (Pair) rules;

Object syntax_rule = rules_pair.car; if (! (syntax_rule instanceof Pair)) { tr.syntaxError ("missing pattern in " + i + "'th syntax rule"); return; } Pair syntax_rule_pair = (Pair) syntax_rule; Object pattern = syntax_rule_pair.car;

String save_filename = tr.getFile(); int save_line = tr.getLine(); int save_column = tr.getColumn();

try { tr.setLine(syntax_rule_pair); if (! (syntax_rule_pair.cdr instanceof Pair)) { tr.syntaxError ("missing template in " + i + "'th syntax rule"); return; } syntax_rule_pair = (Pair) syntax_rule_pair.cdr; if (syntax_rule_pair.cdr != LList.Empty) { tr.syntaxError ("junk after "+i+"'th syntax rule"); return; } Object template = syntax_rule_pair.car;

// For the i'th pattern variable, pattern_names.elementAt(i) // is the name of the variable, and // (int) pattern_nesting.charAt (i) is the nesting (in terms // of number of ellipsis that indicate the variable is repeated). StringBuffer pattern_nesting_buffer = new StringBuffer (); java.util.Vector pattern_names = new java.util.Vector (); if (! (pattern instanceof Pair) || ! (((Pair)pattern).car instanceof String)) { tr.syntaxError ("pattern does not start with name"); return; } literal_identifiers[0] = (String) ((Pair)pattern).car; pattern = ((Pair) pattern).cdr;

Pattern translated_pattern = translate_pattern (pattern, literal_identifiers, pattern_names, pattern_nesting_buffer, 0, tr); String pattern_nesting = pattern_nesting_buffer.toString ();

this.rules[i] = new SyntaxRule? (translated_pattern, pattern_nesting, pattern_names, template, capturedIdentifiers, tr); /* DEBUGGING: OutPort? err = OutPort?.errDefault(); err.println ("{translated template:"); this.rules[i].print_template_program (template_identifiers, err); err.println ('}'); */ } finally { tr.setLine(save_filename, save_line, save_column); } } int num_identifiers = capturedIdentifiers.size(); calculate_maxVars (num_identifiers); macro.templateIdentifiers = new String[num_identifiers]; capturedIdentifiers.copyInto(macro.templateIdentifiers); macro.capturedDeclarations = new Object[num_identifiers]; for (int j = num_identifiers; --j >= 0; ) { String name = macro.templateIdentifiers[j]; Object binding = tr.environ.get(name); if (binding instanceof Declaration) { Declaration decl = (Declaration) binding; if (! decl.getFlag(Declaration.IS_UNKNOWN)) { decl.setCanRead(true); decl.setFlag(Declaration.EXTERNAL_ACCESS); } } macro.capturedDeclarations[j] = binding; } }

In C, an example of the mother of LongFunctions, also a display of programming art, from OCAML source code, the bytecode interpreter:

 static void intern_rec(value *dest)
  unsigned int code;
  tag_t tag;
  mlsize_t size, len, ofs_ind;
  value v, clos;
  asize_t ofs;
  header_t header;
  char cksum[16];
  struct custom_operations * ops;

tailcall: code = read8u(); if (code >= PREFIX_SMALL_INT) { if (code >= PREFIX_SMALL_BLOCK) { /* Small block */ tag = code & 0xF; size = (code >> 4) & 0x7; read_block: if (size == 0) { v = Atom(tag); } else { v = Val_hp(intern_dest); *dest = v; if (intern_obj_table != NULL) intern_obj_table[obj_counter++] = v; dest = (value *) (intern_dest + 1); *intern_dest = Make_header(size, tag, intern_color); intern_dest += 1 + size; for(/*nothing*/; size > 1; size--, dest++) intern_rec(dest); goto tailcall; } } else { /* Small integer */ v = Val_int(code & 0x3F); } } else { if (code >= PREFIX_SMALL_STRING) { /* Small string */ len = (code & 0x1F); read_string: size = (len + sizeof(value)) / sizeof(value); v = Val_hp(intern_dest); if (intern_obj_table != NULL) intern_obj_table[obj_counter++] = v; *intern_dest = Make_header(size, String_tag, intern_color); intern_dest += 1 + size; Field(v, size - 1) = 0; ofs_ind = Bsize_wsize(size) - 1; Byte(v, ofs_ind) = ofs_ind - len; readblock(String_val(v), len); } else { switch(code) { case CODE_INT8: v = Val_long(read8s()); break; case CODE_INT16: v = Val_long(read16s()); break; case CODE_INT32: v = Val_long(read32s()); break; case CODE_INT64: #ifdef ARCH_SIXTYFOUR v = Val_long(read64s()); break; #else intern_cleanup(); failwith("input_value: integer too large"); break; #endif case CODE_SHARED8: ofs = read8u(); read_shared: Assert (ofs > 0); Assert (ofs <= obj_counter); Assert (intern_obj_table != NULL); v = intern_obj_table[obj_counter - ofs]; break; case CODE_SHARED16: ofs = read16u(); goto read_shared; case CODE_SHARED32: ofs = read32u(); goto read_shared; case CODE_BLOCK32: header = (header_t) read32u(); tag = Tag_hd(header); size = Wosize_hd(header); goto read_block; case CODE_STRING8: len = read8u(); goto read_string; case CODE_STRING32: len = read32u(); goto read_string; case CODE_DOUBLE_LITTLE: case CODE_DOUBLE_BIG: if (sizeof(double) != 8) { intern_cleanup(); invalid_argument("input_value: non-standard floats"); } v = Val_hp(intern_dest); if (intern_obj_table != NULL) intern_obj_table[obj_counter++] = v; *intern_dest = Make_header(Double_wosize, Double_tag, intern_color); intern_dest += 1 + Double_wosize; readblock((char *) v, 8); #if ARCH_FLOAT_ENDIANNESS == 0x76543210 if (code != CODE_DOUBLE_BIG) Reverse_64(v, v); #elif ARCH_FLOAT_ENDIANNESS == 0x01234567 if (code != CODE_DOUBLE_LITTLE) Reverse_64(v, v); #else if (code == CODE_DOUBLE_LITTLE) Permute_64(v, ARCH_FLOAT_ENDIANNESS, v, 0x01234567) else Permute_64(v, ARCH_FLOAT_ENDIANNESS, v, 0x76543210); #endif break; case CODE_DOUBLE_ARRAY8_LITTLE: case CODE_DOUBLE_ARRAY8_BIG: len = read8u(); read_double_array: if (sizeof(double) != 8) { intern_cleanup(); invalid_argument("input_value: non-standard floats"); } size = len * Double_wosize; v = Val_hp(intern_dest); if (intern_obj_table != NULL) intern_obj_table[obj_counter++] = v; *intern_dest = Make_header(size, Double_array_tag, intern_color); intern_dest += 1 + size; readblock((char *) v, len * 8); #if ARCH_FLOAT_ENDIANNESS == 0x76543210 if (code != CODE_DOUBLE_ARRAY8_BIG && code != CODE_DOUBLE_ARRAY32_BIG) { mlsize_t i; for (i = 0; i < len; i++) Reverse_64((value)((double *)v + i), (value)((double *)v + i)); } #elif ARCH_FLOAT_ENDIANNESS == 0x01234567 if (code != CODE_DOUBLE_ARRAY8_LITTLE && code != CODE_DOUBLE_ARRAY32_LITTLE) { mlsize_t i; for (i = 0; i < len; i++) Reverse_64((value)((double *)v + i), (value)((double *)v + i)); } #else if (code == CODE_DOUBLE_ARRAY8_LITTLE || code == CODE_DOUBLE_ARRAY32_LITTLE) { mlsize_t i; for (i = 0; i < len; i++) Permute_64((value)((double *)v + i), ARCH_FLOAT_ENDIANNESS, (value)((double *)v + i), 0x01234567); } else { mlsize_t i; for (i = 0; i < len; i++) Permute_64((value)((double *)v + i), ARCH_FLOAT_ENDIANNESS, (value)((double *)v + i), 0x76543210); } #endif break; case CODE_DOUBLE_ARRAY32_LITTLE: case CODE_DOUBLE_ARRAY32_BIG: len = read32u(); goto read_double_array; case CODE_CODEPOINTER: ofs = read32u(); readblock(cksum, 16); if (memcmp(cksum, code_checksum(), 16) != 0) { intern_cleanup(); failwith("input_value: code mismatch"); } v = (value) (code_area_start + ofs); break; case CODE_INFIXPOINTER: ofs = read32u(); intern_rec(&clos); v = clos + ofs; break; case CODE_CUSTOM: ops = find_custom_operations((char *) intern_src); if (ops == NULL) { intern_cleanup(); failwith("input_value: unknown custom block identifier"); } while (*intern_src++ != 0) /*nothing*/; /*skip identifier*/ size = ops->deserialize((void *) (intern_dest + 2)); size = 1 + (size + sizeof(value) - 1) / sizeof(value); v = Val_hp(intern_dest); if (intern_obj_table != NULL) intern_obj_table[obj_counter++] = v; *intern_dest = Make_header(size, Custom_tag, intern_color); Custom_ops_val(v) = ops; intern_dest += 1 + size; break; default: intern_cleanup(); failwith("input_value: ill-formed message"); } } } *dest = v; }

[Moved from LongFunctions]

I don't find these that "long". They don't fit in a single screen, but they are hardly what I call long.

The metrics on LongFunctions, the genesis of this page, range from "5 to 9 lines", to "one screen." To a procedural programmer, these functions certainly may not look over-long, but to a SmallTalk programmer, they would. See the discussion on LongFunctions (which needs more refactoring after it cools).

All perfect examples of why long methods are bad. Cryptic variable names, poorly named methods, complete lack of abstraction, you can't even tell what these methods do without reading and executing the code in your head. This code was clearly hacked up, regardless of the projects they came from, they are all examples of how not to write code. No one is claiming good projects can't have bad code, obviously they do.

I agree these aren't examples of "good/readable" code, but their flaws don't seem related to just being a long function. They seem more like unrelated problems. Cryptic variable names and poorly named methods have nothing to do with whether a function is long or not.

I agree, they have many problems, being overly long, at this point, is just one of many many bad things about these samples. If there were bugs in this code, it'd be very hard to find for someone who didn't write that code.

This page is becoming borderline ridiculous. Who is exactly qualifying these examples as piss-poor code? The more likely explanation is that you guys are pissing in your pants when you see a little longer algorithmics.

First of all, these examples are taken from successful and highly non-trivial projects, therefore a little bit of deference and presumption of professionalism and skillful programming should apply to the authors of these code pieces, some of which may have extra credentials through their significant contribution to ComputerScience. As opposed for the coward commentators that didn't sign I'd presume them as mediocre or average programmers just by the rush with which they jumped to throw their negative judgements. More experienced programmers, having confronted with situations in which they wrote similar code, and being aware of the effort and the many issues involved are much more reluctant to easily condemn other people code, especially code in successful projects.

But let's look over the real or imaginary issues, shall we?

There's this issue that you do not understand what these functions do "at a glance", among other things because of bad naming. Boys, and girls, these examples are extracted from their context in complex projects, involving complex algorithms (with the exception of C code which is rather simple). The authors could have named those functions anything that you wished for other than putting the description of the whole thing in the name. As a matter of fact there's no dubious naming at all in the Java code. You still wouldn't have much of a clue what's happening!

It is not like those "get this byte from here move it to there and store it in the database". That's the danger of being brainwashed by MartinFowler books who only operate with kindergarten examples. Boys and girls this is about real code with real complex algorithms. The way to understand this code is by having an introduction to the context, anm introduction to the algorithms involved, and as most open source projects are low in documentation, presumably by asking their authors. ut this is an independent problem of writing long functions. You can write some of these algorithms in Smalltalk if you will, you won't make the code self-documenting, because the complexity of the algorithms involved is just so much more than any naming convention may handle. Once you master the algorithms either from a formal documentation or from direct interaction with the author (as this is open source after all), you won't likely to have any major objection to the understandability of the code above (well, except for the C snippet), but that's a separate issue. But such understanding cannot come from LotsOfShortMethods in these cases, or any cases involving more than trivial algorithms.

There's the raised issue of bad naming. Frankly speaking, I can recognize bad naming only in the C code, and I'm not at all willing to jump in and be judgemental of that guy, because purely and simply this is is the prevailing naming scheme in C culture to begin with, and even here on Wiki we have had a little controversy with PanuKaliokolski? over his thesis that in some context very short names make for better readability. So whatever, that particular code is kind of simple anyways, the extra effort to accommodate to the naming scheme is quite minimal. The names used in the other examples are pretty normal I think, and I can't quite understand at all what the hell anyone can object to the names used in Java examples. Add to that the fact that we discuss about the length of these functions, we're not discussing naming, and let's move over.

There's the last "issue" raised, about the "complete lack of abstraction". That's extraordinarily laughable. Hey guys, welcomem to the real code of writing code and being productive and efficient. If you write a function that will output the minimal length string representation of a floating point number, you have to do just that. Having more abstraction that a simple function that iterates through the decimal places, is by any reasonable standard just unnecessary GoldPlating and is something not commendable at all. IOt looks to me that all these examples are just about at the right level of abstraction. For the guy who is probably not accustomed to OCAML: please pay attention that it has anonymous inner function. That doesn't make them less abstract than if you'd have taken them out at the top level and given them a bogus name. The accusation of "complete lack of abstraction" is completely bogus. All except the C example have plenty of abstractions used, while the C example, well, you cannot more concrete than this kind of concreteness and introducing unnecessary abstraction when you have to write some very simple and concrete thing is the most exemplary AntiPattern I can think of. -- CostinCozianu

Let's go one step further, though. No code is perfect so there always can be improvements. Let's say that the examples above could use some improvements (however it will not be about those ridiculous claims made above). However, these examples represent real working code from real working projects. Quite exemplary projects and code at that! The thing related to LongFunction? though, is that most programmers including the examples we have from GrandMasterProgrammers, will write complex algorithmics just like the code above, at least during the initial ram up of the project (the first write of such a code). Most of us will not write towards LotsOfShortMethods from the get go, and there's no real argument put forward so far that writing algorithmics towards LotsOfShortMethods is going to help in any way shape or form, other than the alleged advantages for later maintenance.

Then you'd probably agree that if it is reasonable for programmers to write code like this in open source, does it seem reasonable if we go back to them with the observation: hey refactor this code because we discussed it on wiki we think that it can "look better"? That would be just about ridiculous for open source software, not to mention absolutely ridiculous to tell it to somebody like DonaldKnuth. Hey, the code is working, the guy who wrote it understands it, with just a tiny bit of effort and good will you can understand it also, the only decent conclusion is that the code is more than GoodEnough for the purpose and the context. The effort involved in GoldPlating such pieces and bits to "look better" for whatever is your subjective evaluation of "better" is just ridiculous to be wasted compared to the many issues that are faced by a participant in a complex open-source project. There are much better things to do with his/her time.

Is there situation significantly different in commercial settings? Well, I don't know on what planet you work on, and what kind of projects you develop, but all I can tell you is my experience. As an "architect in charge", team lead, code reviewer, and code writer on significant scale, if anybody on my team would have code like the above, first of all I'd be proud, but if anybody would spend a couple of hours or more to try to refactor any of the code samples above into "better looking" code, I'd say his nuts. Simply put on most projects I worked where the results were needed yesterday, there's no way you can justify such GoldPlating activities. Yep, you can do it as a sport on wiki, but in most commercial settings you can. You can only justify refactoring activities in largely two cases:

Just GoldPlating the code so that we can reach the illusory heaven of LotsOfShortMethods of 5-9 lines of code is a terrible waste of the time of programmers, and it may be different for others but I have better things to do with that kind of time, including long posts on wiki during my lunchtime, and frankly my wife will kill me for having skipped another lunch. -- CostinCozianu

Personally I can't understand most of this but I don't know why I would be expected to, as somebody who has thought very little about the nuts-and-bolts implementation of, say, Apache. Certainly we can find examples of long functions in successful open source projects. But correspondence does not imply causation. Are these projects successful because of their long functions, or in spite of them? -- francis

Francis, why do you think you are qualified to raise the hypothesis that it's inspite of them? There are also plenty of LongFunctions in commercial projects, by plenty competent programmers. You can't get any more competent than DonaldKnuth, NiklausWirth or EwDijkstra from whom we have plenty of examples that are way beyond the Smalltalk gold standard (therefore considered long function). The logical conclusion of this discussion can only be that what is unacceptably long varies with the context of the code, with the personality and the preference of each programmer and the larger team, also the background culture that applies.

But in any case we should regard as totally unacceptable the fact that some hot heads look down on anybody or any example that doesn't conform to their illusory ideal for LotsOfShortMethods.

Maybe it's a manifestation of TheBestIsTheEnemyOfTheGood pattern, or maybe it just shows that what we think are rules aren't necessarily the rules that determine how good software is. Actually I predicted this threadmess would happen as soon as someone posted any real code. -- LayneThomas

Costin, I agreed with you when you were arguing against the dogma that long functions are always bad. But your more recent arguments have shifted to say that those who prefer reading short functions must be stupid and incapable of handling complexity. These examples are not algorithms that are especially complex, elegant, or subtle; they are simply harder to read than they need to be. Yes, many professional programmers write code like this (I have myself), but none of us should be satisfied with such results.

If the above examples work, I wouldn't change them just for the sake of changing them. However, if I am asked to fix a bug, or to change their behavior, then I would have to start by breaking them up into easier-to-grasp pieces. I would not consider this activity to be GoldPlating nor a waste of time; it is my way of analyzing code and presenting this analysis to future readers of the code.

Are you arguing that none of these examples could be improved by breaking them up a little? You are fond of AdVerecundiam arguments: can you find some quotes from your beloved heroes where they argued against breaking down problems into simpler pieces?

-- KrisJohnson

Kris, I'm not arguing against breaking problems into simpler pieces, I'm arguing against the prejudice that the gold standard for "simple piece" is 5-9 lines of code. A "simple piece" should be no more nor less than what's GoodEnough for that situation (including factors like the personality of the programmer, the overall culture related to the project and the language used, etc). For example I might do some renaming of variables in the C code, but I wouldn't feel any significant improvement by breaking it into smaller functions. It's bloody obvious as it is. I'm arguing that we are not in a position to judge that the above examples can be essentially improved by breaking them into smaller pieces.

Another thing, Kris, you say you'd refactor it if you were to fix a bug or change their behaviour, and that's obviously because you don't understand them as they are now. But have you weighed the other factor, the fact that you're not familiar with the algorithms and the context in which they are used? Aren't you trying to judge this pieces of code against an unrealistic standard: that somebody reading them oput of context in wiki should feel comfortable? For example, I don't have a more than superficial comprehension either for anything but the apache C example, but even the language scan level understanding of what's in there allows me to predict quite confidently that if I was to jump into these projects, and being familiar with the context and the algorithms, I wouldn't have any major objection to the shape that the code samples are in right now. Please note with regards to the OCAML example: it is very well factored in small inner functions that just do not have names, and make the top level function look long, but structurally it's made of lots of little pieces that are really functions.

I guess a more relevant question would be: assuming you'd have written such code yourself, and having a full grasp of it, are you sure you'll want to unconditionally refactor it the next time you touch it? I, for one, am inclined to respond: most likely no. -- CostinCozianu

I'm sure I'm more likely to refactor than you are, but I'll have to give a wishy-washy ItDepends answer: I'll refactor whenever I need to. Refactoring is something I do for myself, not for others. I don't try to guess what will make sense to other people; I figure I am not much smarter and not much stupider than anyone else, so if it makes sense to me, it will make sense to them.

If I immediately see that changing a couple of lines will achieve the desired effect, then I won't bother refactoring it. But if it won't all fit in my brain, I'll break it up until it does.

Things sometimes go the other way: when trying to figure out what LotsOfShortMethods do, sometimes I'll inline them together into one function to get the big picture. If I think the long assembled function makes more sense, I'll leave it that way. Again, this refactoring is not GoldPlating or wasted time, in my opinion, and I don't hesitate to do it if I feel the need, nor will I condemn others who do it.

We agree that functions should be as long as they need to be, and that is often longer than what the Smalltalkers-who-write-books have decreed to be good. What I disagree with is your apparent contempt for those who desire less cryptic code and who are willing to spend time de-cryptifying it. -- KrisJohnson

Well, I guess I din't make myself properly understood. I don't have a contempt for those who write LotsofShortMethods?: let them be happy, it's their style, not exactly my style. But I do have a contempt for those who are judgemental on these code examples without understanding the context of the project, whether they fit the bill or not, how the guy who wrote them originally understands them. The bottom line is that if these code examples are working and fit the bill than there's absolutely nothing to sneeze at, even related to their length. In my judgement they are perfectly OK, and even if I had some doubts, they'd have to break some significant threshold of unpleasantness, before I would make a judgement, in absence of context and understanding, that the initial developer has screwed up. It is simply a matter of collegial respect shown to the guys who wrote this code in projects that are really complex on one hand, and on the other hand projects that really work. And that should prevent us from using phrases like "these are piss poor code examples" or "no professional code should look like that". My disdain was directed towards those who uttered such nonsense, not towards all the developers who like LotsOfShortMethods.

I guess you could agree with me that the examples above whatever the improvements may be added to it, are not examples of "piss poor code", and those improvements (we can always improve our code) would have to be judged whether they make sense in a particular context related to the situation of the project.

From a quick glance at the Kawa java code, you could probably save a few lines and increase readability by using exceptions rather than using the tr.syntaxError method + return. I don't know the surrounding scope, but it's setting a lot of instance variables before throwing out the syntax error. If this extra context isn't needed, I'd move all the error checking to the start and break it out into a separate method(s). The comment in the middle describing the structure of a string seems to be crying out for the string to be replaced by an object. I'm too lazy to be bothered figuring out what's going on at the end of the method - it could probably do with some cleaning up.

Some of these decisions may have been made for performance reasons, and if that's the case they should probably be commented as such. The code is definitely not an example of something that is easier to understand by being crammed into a single method. -- DanielSheppard
There seems to be a lot of problems with the Kawa code - at least as far as my tastes are concerned. The argument rules comes in as an Object. Isn't there some common interface that all rules can implement? Things like #length or #isPair? This line seems odd:

 int rules_count = LList.listLength(rules, false);

What does it mean when I pass false to a method asking the length of a linked list? There's a lot of stuff like this. Things that should be factored into methods on higher-level objects (capturedDeclarations should be something other than an array that has an #addBinding method) or sections of code that can be factored into sub-routines (good candidates are all the sections of code delimited by nested try...catch blocks or in for loops). Might be an interesting exercise to refactor these code snippets. But for me, one of the problems with the code is that it is indeed too long. Rarely can I easily understand code this long. -- RobertDiFalco

And for what it is worth, I've written plenty of code like the examples above that I've left in projects because it worked or because I didn't have time to figure out the best way to refactor it. However, that does not mean I would hold it up as an example of great programming. They are the things we try to sweep and hide in corners of our code. I don't understand this concept of GrandMasterProgrammer as it is being used here. Who cares? Programmers at any level create code that they are not pleased with and (at least for the Java example), well I can't imagine the author would want to point to this particular snippet and say behold its beauty. That doesn't mean s/he isn't a great programmer, it just means sometimes you have to get things done and move on. So I'm finding the huge and rambling posts of Costin on this page to be a sort of StrawMan argument. I mean, are we talking about ideals to shoot for or the messes you sometimes have to create when being productive. If we are talking about the former, then these clearly are not good examples. If the latter, why consider them, if they work then fine. -- RobertDiFalco

Robert, at what point those ideals that you are talking about, become utopic? I mean I can give you much more worthy ideals to shoot for, but hardly realizable in the current context of our profession, than this pseudo-ideal of 5-9 lines of code.

Utopic ideals are very dangerous. Let's take the Java, but we can take the Apache C example as well, and let's say you refactor it into LotsOfShortMethods style. In the grand scheme of your typical programming project, is this a significant difference, not from an abstract beauty point of view but from a cold economic perspective? I'd rather doubt. However, I find in our profession we confront too many times with unwarranted dogma, utopic ideals that are not that worthy of our attention, but pushed by religious zeal and militantism they make up so much noise to the point where people have already begun tio smear these code examples as "piss poor quality". I wonder when it will come the time when people will acuse me of incompetence or unprofessionalism for writing something that is greater than 10 lines of code.

You recognize such "messes we sometimes have to create when being productive", but why call them "messes" then, why not let them have a proper role as part of the normal software engineering activity? Some of the messes above were created by what we can call GrandMasterProgrammers (meaning programming guru if you will, with accomplishments beyond the reach of ordinary programmers, creme de la creme). They would not draw your attention to say "behold it's beauty", but they most likely will not be ashamed either. I rather suspect that the OCAML C snippet can even be commended for its beauty. If you, me, some of these master programmers, and many many other competent programmers do not write such algorithmics in LotsOfShortMethods style, maybe it is because writing in such style is not feasible or has other drawbacks, it doesn't make sense in the context of a particular project. Then they are not "messes" after all, they are what it takes to get the project done.

Costin, most things in life have flaws. Just because they are part of a working, even great, piece of software does not mean they don't exist and should be called otherwise. It is interesting to note that you don't see your argument as being just as hard-core as those who say functions should only be 5-9 lines long. I tend to not take such black and white points of views. I attempt to make my methods as small, readable, and maintainable as possible. Sometimes I succeed, sometimes I don't. Every one who has every worked on a large scale project has code in there they are not happy with. One just has to sometimes learn to live with things they are unhappy with and always strive to improve when one can. I know many programmers that you would call GrandMasterProgrammers and I know very few if any that are not hard on themselves, much harder than you are being on them. -- RobertDiFalco

I think we are largely in agreement except for minor nuances. I wouldn't be as "unhappy" as you would be with functions like the examples on this page and doing a quick search on my current project I have long functions, but not quite as long as these, but I won't loose a minute sleep over it anyhow. Why do you think I'm hard on those GrandMasterProgrammers? It's the guys who call these examples "piss poor quality" without having much of a clue that are hard on them, it's most definitely not me.

In particular the OCAML C example is a very beautiful (in my opinion) exercise in optimization, and it's the reason why OCAML bytecode interpreter is the fastest one around, also the reason why I can run interpreted OCAML code in my current project without bothering to go to native code compilation. Not only I wouldn't be hard on XavierLeroy?, but I would commend him for being able to write such code where warranted. I am a beneficiary of his efforts. -- Costin

So we basically agree. I don't know enough about OCAML to comment, but from what I can tell, it doesn't look unnecessarily long. It has closures so it would be hard to factor out that could in any beneficial way. I can't tell if there are reusable chunks in there that could be called as routines from the closures. I would only be qualified to comment on C/C++, SmallTalk, Python, Eiffel, or Java examples. -- RobertDiFalco

I was referring to the C example which is part of the OCAML intepreter implementation. I'm sure you can give your 2c on that. It is obvious that it can be written in a more structured way, even without gotos, but it's also obvious that it is a decision taken with the purpose to optimize the interpreter. And I think the code is still to be admired as a good exercise in optimization. -- Costin

Well...let's just say that I would be happy to use the implementation, I'm sure it's solid and fast, but I wouldn't want to work on it. It's not really my style. The gotos don't bother me as much as the use of hex literals without descriptive constants, repetitive stuff that could be macros, etc. I don't know what happens in certain cases of intern_object_table is NULL. Some places it seems like it is okay (the assignment is just skipped), in other places it would cause an assert to fail. It is easy to get confused as to what's a global, what's local, and what's static. It would take me a half-hour or so to understand this code. Mainly makes me (only me, not speaking for anyone else) glad that I don't program in ANSI-C anymore. -- RobertDiFalco

In particular the OCAML C example is a very beautiful (in my opinion) exercise in optimization, and it's the reason why OCAML bytecode interpreter is the fastest one around, ... -- Costin

Optimized code it may be, beautiful it is not. In general, optimizing code means that you have to sacrifice beauty and readability in order to get the best out of your compiler. I assume that this is the reason for such a large, hideous piece of code as the sample given, though I am not convinced it is necessarily true --- modern compilers are very good at inling functions, and optimizing across function call boundaries, etc. In particular, I find the #ifdefs grating --- take the code that includes the #ifdef and put it in a meaningfully named function, such as "convertArrayToCorrectByteOrder". Not only does this make the original code better express intent, since the function name explains what is being done, but it moves the #ifdefs out of the way of the main function --- you only see them if you really need to know how the low level function is implemented. If this is an example of a "good" long function, I am glad I advocate small functions. Even the Apache sample (which is shorter than this one) strikes me as too long, with clear opportunities for splitting.

Yes, my code sometimes contains long functions, but I am not proud of it, and I split them into smaller parts whenever I get the opportunity. Though I can see upwards of 70 lines on screen at once in my editor, I prefer functions to be less than 30 lines; the shorter the better.

-- AnthonyWilliams

Yea, what he said! -- RamonLeon
The apr_gcvt C example is interesting. The code seems to fall into distinct steps. For example, there is a bit which turns a number like 345 into the string "+345". I think the code would be easier to read if there was at least a comment explaining that's what that part did. Then, if I wanted to change the code (eg to make the '+' optional) I could find the place to change more easily.

That part would also make a possible sub-function. It is self-contained with few arguments or results. Extracting it would reduce the burden of the main function. Repeatedly extracting chunks like that would lead to the main function being about 6 lines long, and written at a higher level of abstraction.

The downside is that the little sub-function would lose context. For example, it assumes its number has no more than 3 digits. Even though it looks general, it's rather specialised. We could give it a very special name, and add assertions to make its assumptions explicit, and hopefully give it local scope so that it couldn't be called from places where those assumptions were invalid. Alternative we could generalise it to any number of digits, turn it into atoi() and allow its reuse elsewhere. If the latter we'd have to solve the harder problem and we might not be able to use the same efficient solution. We might also need to parameterise it a bit more, eg for the caller to say whether the '+' is optional.

Either way the code would become more complex. We'd be going from one large function with a complexity of about 1, to, say, 6 small ones each with a complexity about 0.3, so the total complexity would double. However, that refactoring would arguably be easier to work with because we'd usually be dealing with just one sub-function at a time. The complexity in the other functions doesn't matter because we'd ignore it. --DaveHarris
CategoryPerpetualArgument <snicker>

View edit of December 18, 2005 or FindPage with title or text search