Binary Search Commentary

See CommentingChallengeTwo for context.

This page is for discussions about readability. The "why would I write such a function?" discussion belongs in NeedingBinarySearch. -- AlistairCockburn
I'd think I didn't need much in the way of comments to explain Alistair's first method. But I might be wrong, I frequently am. -- RonJeffries
Alistair's first function: I'd replace the loop invariant comment with the relevant assertion:
    ASSERT( Array[lower] < value && value < Array[upper] );
I wouldn't have the "exits" loop comments as they just repeat what the code says.

This code seems more complex than I remember similar things being. I'd try to rewrite it so that the extra tests weren't necessary - if that succeeded, it'd remove some comments, and would be an example of comments being a sign of improvable code. If it wasn't possible, I might try to explain why. Is it faster this way?

I'd be tempted to replace the first test with:
    ASSERT( top >= 0 );
-- DaveHarris

The first test (top >= 0) isn't actually necessary (the loop condition already tests for that), and the loop invariant assertion is kind of unnecessary too, except for the purpose of documenting the assumptions made, and possibly catching bugs in the function or unsorted lists being passed from other buggy code.

I wrote a simpler C implementation based on Alistair's, but removing everything that I didn't deem necessary. Actually, it came out quite the same as Alistair's second version, but using pointers and an exclusive interval instead of inclusive indexes, which saves one subtraction ;-)

After reading Don and Tom's MethodObject implementation, it seems that our two algorithms are the same, just factored differently (yours using an object state instead of a few local variables). And I find it slightly disturbing that your isTargetAtMiddleOfRange is also responsible for setting the middleOfRange instance variable - wouldn't it be more clean to just have middleOfRange as a function instead?

 middleOfRange
   ^(lowIndex + highIndex // 2)

index self doesNotIncludeTarget ifTrue: [^notFoundBlock value]. self isTargetAt: self middleOfRange ifTrue: [^self middleOfRange]. self adjustRange. ^self index

isTargetAt: index ^(array at: index) == target

(with reservations for illegal smalltalk code as I've never used it, only read it)

And, as already covered on BinarySearchCodeOnly, the index function could use a loop instead, but looking at that code, loops don't seem to be idiomatic constructs in Smalltalk.

--SimonBrenner


The MethodObject code from Don and Tom is Sweet! I'd consider implementing a method for middleOfRange, lazy-initing the instvar. As it stands, in principle the adjustRange could be looking at a nil. (In practice, of course, it can't.) Sweet! -- RonJeffries
Personally, I found the one-method recursive version the hardest to come to understand.

I find the MethodObject more complex than the single method. It replaces 11 lines of code in a single method with 36 lines containing a class, 6 instance variables, 6 methods, one of which is recursive. When I look at it, I don't know whether we are seeing different ideas of what well factored code looks like, different ideas of what is "simple", at the top or bottom of the bag, or an example of ComplexIsBetter. From my background, I think of iteration as simpler than recursion, making a class reaches deeper than making a method, that 6 methods are more complex than one.

I will say I found the method "index" quite easy to understand (after I found it), being just "doesNotIncludeTarget, isTargetAtMiddleOfRange, adjustRange, self index". And I found the termination condition clear. I also recognize that Smalltalk code looks different in a browser compared to in a text file. -- AlistairCockburn

Issues of complexity or not relevant here. The question we are asking is "does this code need comments". Besides, the MethodObject pattern did not add the complexity, you added the complexity when you reached to the bottom of your bag of tricks and pulled out BinarySearch. -- DonWells


Let me help you look again, Alistair.

Please note that I didn't write it, though I'd be proud to have. However, I can read it and see that it works. It does have more lines of code than the inlined version. Well-factored code often does. But isn't the fair comparison 29 lines vs 36, rather than 18?

Also note that many of the lines are the object definition, constructor and parameter methods, which aren't comparable to the C subroutine. I haven't worked with Don or read a line of his code for a couple of years, but those are so standard that my eye didn't even see them. The operational code that does the binary search is only 14 lines, counting the method names! So maybe the fair comparison is 29 lines vs 14?

So while the code is physically larger, the operational code is smaller, and, I assert, it is in fact simpler. It's very well factored and to my eye quite clear.

The additional methods such as isTargetInMiddleOfRange, adjustRange, and doesNotIncludeTarget encapsulate code whose equivalent in the original needed comments. Because the names are there, those chunks of code are much less in need of comments.

If as you suggest, the code were written as a single method under SortedCollection?, it would have a half-dozen temps and lots of references to them. What's the standard refactoring when you have a large method referencing temps and with no real relationship to the rest of the class? MethodObject. Create an object, make the temps be its instance variables, send a single operational message to it, refactor the big blob until it's small. Applying MethodObject to the big method in SortedCollection? would regenerate the BinarySearch object.

Finally, I'd admit that it's a ways down in the bag of tricks, but binary search by recursion is pretty much the standard approach. Of course I was weaned on recursion so maybe I'm not a fair test. Tell ya what, I'll have some of the C3 guys look at it tomorrow. -- RonJeffries
Re. MethodObject. Interesting point about the new object with 6 instance variables vs 3 locals. But look again - there are actually 6 locals when you include the parameters of the original. So the number of variables really didn't increase. And we did start out implementing as a single method within Array. The problem was, however, every time we tried to factor out the more complicated conditions within the method we ended up having to pass 3, 4, or 5 of those flippin' local variables around. Really ugly! So it left us with a couple choices; add instance variables to Array (not particularly a well received practice ;-) or create a MethodObject. Secondly, because of the choice of names of methods (which, by the way, took the most time to agree upon) most of the details of the actual instance variables used becomes almost invisible. Let's "think Smalltalk" for a moment. The way one would find how the search is implemented would be something to the effect of:
        look at presortedAscendingIndexOf:ifAbsent:
        look at implementors of indexOf:in:ifAbsent:
        look at the BinarySearch implementation of index
In virtually all cases, this would be all the farther one would need to go. So in practice, of all the code that is written above, one would only see 3 methods, 2 of which simply cascade to the final one which implements the algorithm in a very concise way which no longer needs comments (the goal of this whole exercise). So, yeah, I think this is "simplest".

Finally, the really simplest thing, as we said at the top of our example, would be to use the method already provided. So it then becomes 1 line of code vs 19. Can't get much simpler than that. -- TomKubit
The notFoundBlock is a bit of creeping cleverness, not called for in the original test cases (what, there were no original test cases?) That knocks off one i.v. and simplifies the constructor, once you just return -1. Not so, the original challenge was to "write or structure this so the comments wouldn't be necessary" the "creeping cleverness" is there so that you don't have to document what the hell the -1 is returned for. -- jdw

A minor bug - the target is an object, not an integer, but that's just the names of a couple of parameters.

The target must respond to < or a binary search is not possible -- jdw

The original did have the target as an int. We took it at face value. -- tk

In the final analysis, whether you use a commented method or a method object comes down to what you, the writer, imagine about the state of mind of the reader at some point in the future.
Example of friendly brick-wielding back-and-forth presumably typical of amicable PairProgramming sessions:

Tom we never agreed on couldStillIncludeTarget. I was thinking maybe stillSearchingForTarget. -- DonWells

However, my esteemed colleague, couldStillIncludeTarget better implies a question being asked, where stillSearchingForTarget sounds more like a wait condition for some other operation to finish...(Hand resting on brick) -- Tom

Tom, I feel you may be misguided, the fact is that there is an operation waiting to finish. It's the search itself. For crying out loud. couldStillIncludeTarget that has no meaning as the condition of a while. I've got your brick right here. -- Don

Misguided, my ass! Why should I have to I have to tell people I'm waiting on the operation I'm already INSIDE OF! Take that! (brick descending over head) -- Tom

Tom, you maggot ridden turkey carcus, [self stillSearchingForTarget] is OUTSIDE of the while loop. -- Don

Don, you ignorant slut! That's my point exactly. Once you are outside the loop you're no longer waiting for it, now, are you? When I said "inside of" I was referring to the method itself. Thanks for agreeing with me. (How many more lumps do you want?) -- Tom
I consider myself satisfied with the answers. Thanks. -- Alistair
I disagree that the Smalltalk code is "simplest", although I would call it "better". I would use "simplest" for picking algorithms. Once the algorithm is picked (here it was constrained), the three code quality gradients I tend to use are "works", "easy to change", and "understandable", in that order. The Smalltalk code "works", and is optimized for "ease of change". I think if you put the comments back in the C code, it would be locally more "understandable", because it is shorter and all in one place.

Once code is optimized for "ease of change", the methods are so small they aren't improved by comments. In my opinion it doesn't mean that the code is clearer or simpler than hard to change code with comments [just easier to change :)]. See RavioliCode. -- StanSilver

Clarify my understanding, Stan. Are you saying that in your opinion one big Smalltalk method would be easier to understand than lots of little methods? If so, please display the code. -- rj (was added, above)

Well, to be honest, yes. Even though I prefer Don and Tom's code, and wish everybody coded that way (because it is easier to change), I find code like the following a little clearer. To me, the trade off is (80% clear and 100% easy to change) vs (100% clear and 40% easy to change). -- ss
Simplest doesn't necessarily mean all in one place, or even shortest, for that matter. See WhatIsSimplest and DoTheSimplestThingThatCouldPossiblyWork. -- TomKubit
A wonderful simplification in Don and Tom's approach is not worrying whether lowIndex or highIndex points to anInteger. Still, I find their MethodObject to be clearer. -- KielHodges
After catching up on all the fun people were having on this page, I have one question. Do some people equate "Digging Deeply into the Bag of Tricks" with something complex? Just because you have to ponder something and think about patterns which may exist that have already solved similar problems, does not mean that you are creating something complex.

It is often the simplest answer which is the most difficult to find. Sometimes a simple elegant solution is so basic, that people think there has to be something mysterious and magical going on underneath. On the contrary, it looks too simple because that's just it - it is simple, and I guess people are not all that used to seeing simple solutions!

Alistair, in response to your question about which solution people prefer, I prefer MethodObject to the one big method. It follows the standards described in Kent's Smalltalk Best Practice Patterns. And why not use a "best practice"? -- JeanineDeGuzman
I'd like to add a vote for Ron's "minimum lines but not maximum clarity" as the easiest for me to understand. It's probably partly to do with my never having seriously programmed in Smalltalk, but I was appalled when I first saw all the crowing about how 'sweet' Don and Tom's Smalltalk version is. I still can't make head nor tail of it! It's RavioliCode to Me! Ron's "minimum lines" on the other hand has nothing to get in the way of understanding the algorithm. It's obvious it's recursive, and it's also obvious (to me at least) how similar the algorithm is to the classic method of traversing a binary tree. Let's see who can get some Java here then. -- FrankCarver

See ReadingRavioli. -- RonJeffries
I hate the name doesNotIncludeTarget. To me, it implies it's the negation of doesIncludeTarget, which it isn't. canNotIncludeTarget would be better, or maybe isRangeEmpty.

Apart from that, I like the MethodObject version too. My main problem is that it's not obvious it's correct. The unit tests help, and I hope those 10 lines are being included in the complexity counts. However, there need to be more tests for the #( 1 3 ) case - eg search for 0, 1, 3 and 4 as well as 2. I'd also want to see tests for the 3-element case before I'd have any confidence that bugs weren't lurking in weird boundary conditions.

Even with those you are only testing a finite number of values. Reading index I don't get enough sense of how the algorithm flows, to know that, for example, it always terminates. Thinking about it, the culprit is adjustRange. If it were renamed to reduceRange (or even halveRange) then termination would become obvious (to me). It would make the O(log) performance clearer.

If you are going to replace comments with method names, it's really important to get the method names right. I've appended my modified index method to BinarySearchCodeOnly. -- DaveHarris

On your last point on naming, you're absolutely right. Naming methods correctly is extremely important. Somewhere around the middle of this page is a bantering session between myself and Don on just what a particular method should be named. This is exactly the kind of discussions that need to take place (without the name-calling, of course) to keep the system clear and simple. I kinda like your "halveRange" name. It's a definite improvement. To link this all back to XP, you would be in the pair that comes along after our code was released and change the name. No permission or ownership was needed and the system is better off for it. -- TomKubit

Yes, I noticed the bantering session. I thought it interesting that you went to so much trouble to find a good name, yet didn't really get it right (in my view). That's not to say that you were stupid, but that naming is hard. Writing good comments is also hard. For both it is sometimes easier for a fresh pair of eyes to truly get the best, i.e. client's, view. -- DaveHarris

EditText of this page (last edited May 2, 2011) or FindPage with title or text search