When you point your finger 'cause your plan fell thru
You got three more fingers, pointing back at you --MarkKnopfler?, "Solid Rock"
was one of the patterns found during Ward and Kent's Squeak/PairProgramming
BOF at OOPSLA '97. For me it was one of the those moments where I knew exactly
what was being said because I knew the pattern so well myself.
When I was an undergraduate student, one of my assignments was to implement a version of a ModelViewController
(MVC) in C++. I'd never used Smalltalk before, didn't know anything about MVC and only had the MVC Cookbook to refer to. I worked hard with my group for a day and we got a concrete-level version of MVC running (basically, no abstract classes but a concrete object split into view, control and model parts with observer-like behaviour). From these we abstracted Model, View and Controller abstract classes but when we came to re-instantiate them we hit a real problem - for some reason polymorphism stopped working. We tried everything we could think of to find out what the problem was but eventually came to the conclusion that there was a problem with the virtual tables the compiler was constructing (we were using the GnuCpp
compiler and distrusted it because it was free).
Anyway, my supervisor suggested that I sit down with an uninvolved friend and explain to them what I was doing (a case of DebuggingInPairs?
) (actually an example of DebugByDescribing)
. 10 minutes of explaining later I discovered that I was passing an object by value rather than by reference between the Model and the View and that lovely C++ phenomenon - ObjectSlicing
- was occurring. Hence, no polymorphism.
Is there a point to this story? When you've discounted everything fantastical, whatever's left, no matter how mundane, must be true ... or something like that. Testing is often about looking for simple, silly, trivial little mistakes that have big impact on what the computer executes. The Computer is Never Wrong - Blame Yourself First.
I also have a counter-example of this. I was recently working with DistributedSmalltalk
to implement a change notification mechanism over CORBA (COS Events wasn't up to the task for what we wanted to do). I implemented a subject, an observer, and a notifier (to pass notification of changes in the subject to the observers) and got them running in a single image. But when I created the subject and notifier on one machine and the observers on another things started to go wrong. Of course, I blamed myself first and checked all my code and IDL - finding nothing which would cause the problem.
Eventually I tracked the 'bug' down, not to my code but to DST (or rather my use of DST). Objects on one machine reference objects on another machine via an instance of DSTObjectReferenceRemote. This re-implements doesNotUnderstand to forward any messages it doesn't understand to the remote object. Unfortunately, though, DSTObjectReferenceRemote does understand quite a few messages because it is subclassed from Object rather than UndefinedObject
. It understood my #update message (sent from the notifier to all of the observers) and so didn't forward it, hence the problems.
May I suggest a name without Blame. This sounds like our culture where someone (or something) has to take the blame. If we learn to look for causes of problems without looking for scapegoats (and start to understand than most of the time there is not "the" cause), we may learn a way to convince our users not to blame everything that goes wrong on the computer or the softwaredeveloper. I spend lots of time convincing my schoolheads that they can just as well as for a delay in sending information - nobody will blame them - ... than to try to convince management - an easy thing to do, but not a solution to their problem in the future - that the delay is caused by software or a programmer without a face. They might need the software or the programmer soon. Go for a solution, not for blame!
is a fine starting point. Mostly it means to say Don't Blame The Other Person For Until All Else Is Has Been Examined.
I like to set up a neutral hypothesis whenever possible. This is practical in the earlier stages of design, but perhaps not when the code is in front of you and not working. I sometimes put a pen cap on the table and say, "Here is a notion, let us see if it will stand or fall..." then describe it. That takes the ego out of the notion. If there are two, I'll write them in different colors or use two pen caps of different colors. Then we can start to forget who nominated which and even change sides to expose the issues.
I think this is one of the charactistics of a good programmer, that he or she assumes the machine is rational. I have also seen people who thought things "just happened", that bugs come and go (especially go) without rhyme or reason. One guy even talked of the "magic incantations" needed to make things work - no deep understanding of what the machine was doing. Suspecting compiler or tool bugs as a first resort, is part of the same attitude. Programming is not done on quicksand. -- DaveHarris
It seems to me that people who are really good at architecture and design are good at coding and have, at some time, done some of that coding in assembler language or else have fiddled with the machine in some low level way. BruceAnderson
once told me about the time when our University got its first computer - a box with a small manual on how to plug it in - and he and others had to work pretty much from scratch to get it to do anything useful. This leads to a DeepUnderstandingOfTheMachine
which I think not only supports the person as a programmer but also as designer and architect (although not every good coder is a good designer - there are other skills being brought to bear here).
I don't know that this is a good place to make this comment, but:
I recognize debugging as a process of assigning blame, although I sympathize with the concern of MartineDevos
that blame is a bad word when applied to people. When applied to code, though, it is a fantastically useful concept. I know that IsolatingTheProblem?
is another way of asking the question, "What piece of code is to blame for this behavior that I'm getting that I don't want?"
ANECDOTE OF WHEN IT WAS "MY FAULT":
I see the comment above about object slicing, and I can add my own experience about my own code "being at fault" because of a misunderstanding of the semantics of the language or API that I was using when the problem happened. In using ODBC to access a database to do some simple DBMS manipulation, I burned 20 hours looking for a bug in my code that caused ODBC to report an "Illegal table state" error instead of executing my SQL statement. The error turned out to be "my fault" in the sense that I failed to consider the possibility that the error message that ODBC reported might not actually have its documented meaning, and that instead it was trying to tell me that the DBMS's permissions were not set up correctly for the particular operation I was trying to do. Once I stopped trying to interpret the error message text and started to treat everything that ODBC had to tell me as a "SOME FAULT" message, I was quickly able to find and fix the problem.
MACHINES ARE RATIONAL, BUT NOT NECESSARILY UNSURPRISING:
suggests that machines are rational, and that is true at the granularity of machine instructions. But how many times have we all used tools, languages, and APIs that have subtle dirty-nesses in their semantics that surprise us, because we had foolishly assumed that they were going to do what we expect? When we inadvertently use "features" of our implementation domains that we didn't know exist, is it our fault because we didn't understand the full semantics of our tools, or is it the tool's fault for having surprising, couterintuitive, muddy, inconsistent, or otherwise cruddy semantics? As I said, this is not really a debugging comment, but more of a tools design comment. If anyone would like to direct me to a ToolsSemanticThread?
, that might be a more appropriate place for this note.
Aha, a violation of the PrincipleOfLeastSurprise
! I have this problem too -- ChrisGarrod
This is kind of related to another pattern I have: ShowMe?
. Someone comes up and says "I have this problem. I need to do this, but when I do this, this, and this, I don't get what I expect." Instead of debugging the commentary, watch them go through the process. Sometimes people aren't really doing what they think they are doing. In addition, being there is often easier than the encoding and decoding that is part of the process of explanation. -- MichaelFeathers
Sounds similar to one of mine: LetMeShowYou?
. When I'm stuck and I try to explain my difficulty to someone else, the solution often pops out. I think the key is that making things explicit for another's benefit forces one to question things that were taken for granted. -- KielHodges
DebugByDescribing discusses this phenomenon!
This is a lesson that sooner or later I forget and have to learn all over again. For me, there's a definite temporal aspect to it too. In my experience, it's not just that you should BlameYourselfFirst
, but also ItWasSomethingYouJustDid
. -- CurtisBartley
There's also another similar pattern to this. One that I have seen so many times that it's often the first out of my toolbox.
Programmers in general like to learn new things; new languages, new libraries, new systems, new coding practices etc. A lot of programmers also have a general distrust of documentation (it's always out of date, after all). So what happens is a quick flip through the first few pages of a book or manual, then hit the keyboard for a play. The first few examples probably work OK, the programmer is a clever and skilled person and they came out of the book anyway.
Then the rot sets in. The programmer tries a little more and gets stuck in to some solution which isn't in the book. And it all falls to pieces.
In my particular areas of skill (C++, Java), these people often come to me with the problem, their head full of confusion about classes, exceptions, packages, templates and whatever else. Almost always, the problem is a simple thing which they had known for years
. Like strings in C/C++ need a NUL at the end, or you shouldn't change the index in a 'for' loop.
In the context of this discussion, what they should be doing is BlameYourselfFirst
, but what they are doing is BlameTheComplicatedStuffFirst?
After finding the problem (well, I mostly
find it), I often tell them the ParableOfTheRepairMan
. It's a good time to learn it.
A corollary is that, when a test fails, its usually a bug in the test. Its amazing how often inexperienced people will spend hours trying to find a bug in some code without first checking the test they've just written. The exception to prove this rule is that if the test has worked in the past, and you've just changed the program, then the bug is in the program - ItWasSomethingYouJustDid
Can I add another anecdote? When I first started learning C (which wasn't too long ago), I was writing various toy projects with an off-brand C compiler. There was one I wrote that compiled fine, and yet died every time I ran it. I looked and looked at my source code, which was a scant 20 lines of code, and could not for the life of me figure out what the problem was. Being that I was learning all of this by myself, and didn't have the luxury of LetMeShowYou?
, I finally chalked it up to an error in the compiler, gave it up, and went on to something else. It wasn't until coming back to it a few days later that I realized my beginner's mistake: I was writing too many characters to a character array. Talk about a sheepish face...
It was a good lesson learned though, and has given me a greater appreciation for things like PairProgramming
. Since that time I've had other experiences that have taught me again: BlameYourselfFirst
Although we recognize that computers are rational, real software isn't written for computers - it's written to run on top of a software stack [ operating system, class libraries, etc ] that may contain bugs. Claiming that some other level of the software stack introduced the error is only a cop-out when you can't back up your claim. In the last few months I've had the opportunity to bash on a few hundred bugs in a medium-sized Java project. The vast majority of the bugs we've encountered were our own. However, there were a few bugs that didn't make sense [ how can an assignment to an array component cause your VM to crash in a bounds-checked language? ]. It is only legitimate to absolve yourself of blame if you find a bug report that describes your situation in the documentation for your software stack [ eg Javasoft's bug parade ], or you can run your program under one VM but not another, all other variables being equal. In the case of timing-related or memory-related bugs, this may not absolve you, but it may give you insight into how your bug is revealed, as you gain an understanding about the interactions with the superstructure upon which you have built your program. As always, before absolving yourself of the blame [ and starting work on a workaround! ] you should let someone else review your "proof".
Agreement here. Sometimes it is
the compiler. I took a PL/1 course, and as part of the project, I was creating essentially an array of structures. I had to pass this to a subroutine. It failed. I could pass single structures, and arrays of scalars, but an array of structures, or a structure of arrays, would fail. I expended almost $25 dollars in class account money playing with the variations, and finally went to the professor, who investigated, and found that it was due to some compiler patch that had been applied by one of the grad students. SO the saying should be "BlameYourselfFirst
, but don't be a fool about it" --PeteHardie
Hard lesson to learn in the beginning. But a useful one. I wouldn't say the Computer is always right. Only that, if something just now stopped working, or failed, then it's probably something you just did. However, there are always, always exceptions. Which exceptions fuel the doubt next time, and make it less-than-trivial to reflexively point the finger at NaughtySelf?
I have had this pattern for years. The Pogo Rule: "We have met the enemy and he is us."
Best case: Found that a lot of people were trying to crack my account on evening in class. They were all in the class. I thought I could trust them. In fact it was a small script they were using that defaulted to login to my account. I had written the script and encouraged them to use it. Duh!
, Hunt and Thomas describe a principle called '"select" isn't broken'. This amounts to "blame yourself first." (See principle no.26)
Why not just skip the blame part and fix the problem? There is no sense in being a martyr ("Everything is my fault"), and if you find out it was not your mistake is it now okay to blame someone else? Blame, no matter who it is directed at, just diverts energy from resolving the problem at hand. -- WayneMack
Working out who to blame is really working out where the problem is most likely to be. If I'm convinced it can't be my fault, it can't be caused by one of my actions, so I don't need to look at those. Working out where the problem might lie is an important part of problem solving - you can't look everywhere at the same time - so if you "skip the blame part", you may well be less effective at solving the problem. The lesson of this page is that if you don't BlameYourselfFirst, you might be reducing problem fixing effectiveness, not increasing it. -- PaulHudson
Right. There are two blamings that tend to get mixed. There's causality as blaming, and there's punishment as blaming. Certain "modern" notions of management throw away causality with punishment, which is no good (especially when the punishment subtly creeps back in). My preference is to work with people for whom mutual trust and respect are high enough that anyone can point to anyone else and say "it's your fault" and it's okay. It's okay because we understand that discovering causality doesn't mean "you always fuck up and now must be killed", which is the primitive side of the punishment thing, I think, and why we fear being at fault.
Oh, and just because you could
go around pointing and assigning blame doesn't mean you have to. -- WaldenMathews
Why not just skip the blame part and fix the problem?
Just there are two meanings of blaming, there are two meanings of problem
. One is causality (again), and the other is outcome or symptom. Fixing the symptoms is usually not the optimum course of action, although there are times when just cleaning up the mess is sufficient (compare error detection with error avoidance). -- EricScheid
if you "skip the blame part", you may well be less effective at solving the problem.
I see the opposite problem all the time. People assume that that some section of code is to blame and immediately start attacking that area. They will make dozens of "improvements" none of which address the problem. I usually end up having to sit down and force them to do problem isolation first. Bugs, by their nature, are non-intuitive problems. Don't waste time by immediately assuming you know where the problem is (blaming a section of code). Repeat the problem, isolate it, then fix it. -- WayneMack
Oh, I see this too, but assigning blame poorly doesn't necessarily mean attempting to apportion blame is a losing strategy.
It's "you may well be less effective", not "you will be less effective". Blaming the correct area does speed things up, and the argument of this page is that BlameYourselfFirst
is often the correct area, and is often better than not blaming anything - because it really is probably your fault. Even problem isolation often involves guesses on what to try next.
You have code that doesn't behave right (say, a function produces the wrong result, or a web site is crashing). You can start looking at the generated machine code, in the first case (because I can't assume that the problem is in my code - it might be in the compiler, if I follow your method literally), or run hardware diagnostics, rig up a network sniffer, etc in second case. Or you can look at changes you recently made. Which do you do first? What information do you collect to help you isolate the problem? It depends on which you think is the most likely cause (i.e which bit is mostly likely "to blame"). In my experience, it's nearly always some recent change (i.e. BlameYourselfFirst
), and looking at that will lead to a fix more quickly and with less effort than a bottom up isolation effort that tries to rule out nothing.
Part of the skill of good problem fixing (and good problem isolation) is have an intuitive idea of where the problem lies (and another part is knowing when you're not getting anywhere with that and it _is_ time to go back to basics).
In other words, I disagree quite strongly that "bugs ... are non-intuitive problems". Some are. Many are not. Many are caused by the first thing you think of. -- PaulHudson
, but if that doesn't work, then you did something wrong. -- WaldenMathews
I believe that everyone and every element involved with the code, system, or design etc. should not have any prejudice or basis attached to it's likelihood of being the cause of a broken system. I had a very bad experience when I first was teaching myself programming. After my stint in the Army, I decided that I really wanted be a programmer. I had pervious experience do some basic web design. My interest now was in learning how to do desktop application coding. After a little research I decided that C++ was the language that I should learn. So I bought 2 books about C++. One was actually a tutorial book that came with the student copy of Borland IDE.
So I'm reading the fist chapter a breeze thought the first couple hello world examples, then I get to an example that uses a couple functions. I typed in the example verbatim, and it wouldn't work. I thought I had a typo, a missing semi-colon somewhere. I messed around for hours trying to figure why on earth this small one page code example wouldn't run. Very frustrated that I couldn't even get through the first chapter of the book I gave up on C++. The other book I tried, also had issues, at first I tried running even their simple console program examples and they would say open. Turns out none of the examples in this particular book had the "getchar()" or whatever function that is that stops the console from running when you click on a .exe. So I gave up on the idea of being a programmer, I thought I am stupid, how am I kidding etc. etc. Besides I had very low self-esteem anyway from growing up in abusive household and doing very poorly in school.
Fast forward 7 years later. After getting info computer networking, and studying TCP/IP etc. on my own I became once again very interested in the idea of doing something with computers. Thus started my self-education. I went back to college, went back east and stayed with family as I attended college. While I was home, I happened to come across that book with the code example I couldn't do before. I gave it another try, and I starred at it for hours. Then WHAM! it hit me, I found a typo in the code example. It wasn't a straight forwarded typo it was a serious logical typo. Very pissed off I called the Borland, the publisher of the book etc. that I wanted a refund to say the least.
So if someone asks me where to start blaming first, I'll tell them everyone and everything must be considered equally suspect. We must not attach any pre-conceived notions about where the blame starts. -- SAmundsen