Too Much Documentation

Inspired by a note of Alistair in WritingItAndMaintainingIt, I'd just like to start a new thread on the amount of documentation necessary. Here's one of my war stories:

The first time I was technical project lead, I was on the CMM trip. I made the team gather requirements and write them down with reference numbers, then produce a specification several hundred pages long, and so on. We instrumented everything, and I think the project, extrapolated to a complete organization, would easily have reached CMM level 3. My main part was the architecture. I sat down with the team for nearly half a year, sketching object designs on the whiteboard, writing them down in a document, refining them, writing them down again, and so on. In the end we had a 120-page architecture document that was quite sound, but not a single line of code. Guess how the story goes on: we were late (of course) and expanded the team. The new team members picked up the ideas in a very short time because we had great documentation. Yet we were not able to meet reasonable time-to-market requirements, and half a year later the project was canceled.

From today's perspective it would have been a better idea to write less documentation and concentrate on early delivery. Of course it would have taken more time for new people to come into the project, but maybe we could have done the project without expanding at all. So documentation solved a problem we wouldn't have run into if we hadn't written it. -- JensColdewey

In my many years of experience, I have never really suffered because of a lack of project documentation. In addressing an inherited program, I'd prefer clear code and no docs to unclear code and good docs. Some short documentation on the key points of the program would be helpful, but one rarely gets that and it hasn't been deadly in my experience.

I have often suffered because existing documentation was bad, and I have often spent many dollars preparing documentation that was never used for anything important.

And don't get me wrong: I love writing, I love drawing pictures, I love telling people what my projects are all about. I'm not trying to get out of it: on a given day I'd just as soon write words as code.

I'm advocating "just enough" documentation, because although I've had problems like Alistair mentions below, they've not been at all deadly. The approach we use on C3 is called TechnicalMemo: Ward refers to it in Episodes: The notion is to find the key things to document and document them to high quality, rather than the broad brush multi-volume thing.

So, seriously, I've seen too much spent on documentation often ... and can't recall ever suffering severely because there was too little. Got any war stories in the other direction, gang? -- RonJeffries
Sure do. In August I walked into a conversation that had been going on for two months, about deposits and withdrawals and monthly rollups by geographic area and by deposit/withdrawal types, and the need for caching. I joined the discussion and we all felt very satisfied by the end of the week. The next month a new program manager and a new lead developer joined, and we had the discussion over again, trying to recall all the twists and turns that had led us to our particular solution. The month after that we repeated it again. Three months later we had to add some functionality and repeated it yet again. Each time, we thought we were done for good, so it wouldn't be necessary to write down the reasoning and the twists. Each time we were wrong. We were thoroughly pissed off every time we had to go through it again, but there was never any documentation.

That particular section just got hit the most, but the same happened in a couple of places on that project. The same is happening on my current project. I am now asking one of the arguers to write down the main points of the argument once we have hit it for about the third time, or when the arguments are very long and twisty.

I also recall being saved many hours of explanation by documentation. I am considered something of a minimalist by most people (but probably excluding RonJeffries and KentBeck, who consider me a paper-thirsty bureaucrat!), so you can imagine it took quite a bit of convincing before I agreed to allocate two weeks to writing down a full architectural sketch of a workstation we were developing. At the time, there were only three of us on the team, so it really seemed a waste, and it hurt to stop development for two weeks. However, over the course of the next year, there was a steady stream of microcoders and software developers and contractors and new staff who needed to know about the thing, and it saved me many precious hours to be able to say, "read this document first, then ask me about any parts you need to know more about." Write once, read many. The same pattern is repeating on my current project, except that there no one has made me write anything (yet), so I may get to suffer the consequences of not writing.

-- AlistairCockburn

One serious question here is whether any document would have avoided the conversations. In my experience, the people who ask these questions don't get the answers from documents anyway. They take the document, [perhaps] read it, then come back and talk about it anyway. It's rare for a document to satisfy such inquiries, in my experience. -- RonJeffries

The one case I personally know about where the documentation did save time, there were three of us on the project; two of us did most of the documentation. Whenever we'd get a question about the documentation, we treated it like a code bug. "Ok, this didn't quite work right. Where is the bug?" Whoever answered the call would look over the documentation, and try to figure out how to rewrite it so that the person who called could read it, based on the explanation that worked with him (or, rarely, when we really goofed, her). Then we'd get the other, and do a quick PairProgramming exercise. Of course, we lacked UnitTests, but we later tried implementing them via a CluelessCoworker, and met with some success, until we could no longer find a willing CluelessCoworker. -- EdGrimm
I wonder whether this would be a good job for a WikiClone? You can go narrative, yet not have to worry too much about imposing an order on content.

(Yes. I've used WikiWikiHyperCard for exactly that purpose. -- WardCunningham)

One thing that I wish I still had: there was once a DOS product called Lotus Agenda which was pretty neat. You'd enter notes into it and then build hierarchical categories of notes based upon key words. It would also assign notes to categories based on keywords that you enter. Multiple trees of classification could serve as views into the database, er, notebase. I only had the chance to use it once for organizing design thoughts, but it was good. You'd enter in all sorts of tidbits and rearrange order and views. I heard that writers used to use it. I wish I had it now for some of my writing, and I've thought several times about making something similar. -- MichaelFeathers
I'm not sure TooMuchDocumentation is really the heart of the problems described here; it's more like TooMuchProcess or WrongEmphasis or something.

The problem JensColdewey had wasn't so much that they had a 120-page document, it's that they spent six months on design without getting to implementation (which would no doubt have been a problem for them even in the case where the designers KeepItInTheirHeads). (By the way, I think there are many circumstances where spending half a year producing "nothing but" an excellent architecture document would be a brilliant choice; apparently this wasn't the case here if the project was scuttled within another half-year).

Projects can go awry for all sorts of reasons that come down to wasting time and resources on the wrong things, whether multi-hundred-page design documents, weeks on end debating CodingConventions, rapidly changing requirements, or any number of other things, some self-imposed, some imposed from outside. Most of those things (requirements specifications, conventions, design documents...) are not, in themselves, the problem; the misplaced emphasis is.

Of course it's possible to have TooMuchDocumentation, in the same way and for the same reasons that you can have TooMuchCode?. The thickness of your design documentation (or requirements spec, test plan, coding standards...) is about as reliable a guide to the state of your code as the number of lines of code: quantity does not equate (or even correlate) with quality. And the ability to write lean and elegant code doesn't necessarily correlate well with the ability to write lean and elegant documentation.

However, the decision that a certain level of documentation must exist is often made without an equivalent allocation of capable resources, the result being an over-engineered document that is over-precise, over-detailed, over-long, and soon obsolete. Worse, after an iteration or two of this, the wrong adjustment may be made: a decision to omit the documentation altogether. And yet surely, hyperbole aside, we've all been in situations where information that was at one point available has been lost because it was not captured in suitable documentation, and we have regretted (and suffered as a result of) that loss.

I am in agreement with RonJeffries (alert the media!) that the TechnicalMemo approach to design and project documentation is the best middle road [...]. -- JimPerry

(I've moved further discussion to the TechnicalMemo page. -- DaveHarris)
Another reason for writing is for the good of the author, not the reader. The process of writing things down forces you to make explicit things that have been implicit. You have to look at the words on paper (or on the screen) and you think to yourself, "Is that really true?", in a way that you don't when talking. That's why the dissertation is so important (and painful) in getting a PhD. Of course, once you've written it you can throw it away, but that's another issue. -- SteveFreeman

Absolutely true, but I don't need prose for that. If I keep refactoring until the CodeSmells are gone, especially if I am focused on my responsibilities as a communicator with other humans while I am refactoring, then I get the same feeling described above. I will say that it was a dramatic revelation to WardAndKent the first time we wrote a LiterateProgram (ScrollControllerExplained?). I would recommend that every university student write a literate program or two. -- KentBeck

I've used prose to good effect in the way Steve describes. I'm not as far down the "thinking in Smalltalk" road as WardAndKent. There are various modes of thinking that have been identified, and I'm prepared to believe that for some people sometimes, a diagram or a paragraph may really be what will best get them in touch with the idea they're reaching for. And, I am continually surprised at how much the team and I can get done with nothing but our code and our CRC cards. It seems that they are close enough to universal that they work quite well, in my opinion more efficiently than doing what would be more "natural" to me, i.e. diagramming or writing something up to clarify my thoughts. -- RonJeffries
If it's hard to answer a frequently asked question, then it's time to do something about it: On a project I did in early 1999, every technical person who looked at our project asked why we used a union of several hundred sequences of structures rather than using the CORBA "any" type. "Performance" is the answer, but the questioners always found that answer insufficient. So we made a one-pager with graphs from timing runs we had run. Then, when the question came up, it was resolved quickly, without argument, and without getting anyone all bent out of shape. (OK, I admit it -- someone on the project was taking things way too personally. ;-)

I read somewhere that DonaldKnuth had lots of trouble getting certain "parameters" (say, "magic numbers?") just "right" for driving the word break / space filling logic. (Understandable, as it's a highly subjective "quality of presentation" issue.) He mentions putting a big comment of the form "don't you dare change any of these numbers unless you REALLY REALLY SERIOUSLY know what you're doing." [...I'm doing this off the top of my head; those are not his actual words.] I [JeffGrigg] think that a good LiterateProgramming thing to do at that point would be to write a short document (and a number of UnitTests!) to document and illustrate the tradeoffs and the desired results. -- JeffGrigg
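
A minimal sketch of what such tests might look like, in C. Everything here is hypothetical: the constants, the cost formula, and the function names are invented for illustration and are not Knuth's values or TeX's algorithm. The idea is only that each assertion records a tradeoff the numbers are meant to encode, so anyone who changes a number finds out which intended result they broke.

```c
#include <assert.h>

/* Hypothetical tuning constants (NOT TeX's); change them only with the
   tests below in view. */
enum {
    LINE_WIDTH     = 40,  /* target line width in characters */
    HYPHEN_PENALTY = 50   /* extra cost of ending a line on a hyphen */
};

/* Toy cost of one line: squared unused space, plus a penalty if the
   line ends with a hyphenated word. Overfull lines are prohibitive. */
int line_cost(int used_chars, int ends_with_hyphen)
{
    int slack = LINE_WIDTH - used_chars;
    if (slack < 0) {
        return 1000000;
    }
    return slack * slack + (ends_with_hyphen ? HYPHEN_PENALTY : 0);
}

/* Each assertion documents one tradeoff the constants are meant to encode. */
void test_line_cost_tradeoffs(void)
{
    /* Hyphenating is worth it to avoid a very loose line (slack 8). */
    assert(line_cost(38, 1) < line_cost(32, 0));

    /* Hyphenating is not worth it to avoid a mildly loose line (slack 6). */
    assert(line_cost(38, 1) > line_cost(34, 0));

    /* An overfull line is worse than any line that fits. */
    assert(line_cost(41, 0) > line_cost(0, 1));
}
```

Raising `HYPHEN_PENALTY` to 70 in this sketch would make the first assertion fail, which is the kind of warning the big comment was trying to give.
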
In my experience, the "heavy documentation requirements" of some formal methodologies often obscure the most important documentation, and can even discourage people from documenting the things that most need to be documented!

First, in a three-inch pile of diagrams describing your system, how do you find the few diagrams that are most important to your work? Sure, it's easy to find the "overview" diagrams, but help me quickly find the 6 most complex / difficult / conceptually challenging diagrams -- the ones you really need to understand to successfully change the system. They're in there somewhere, but they can be hard to find!

Second, after spending an outlandish amount of time producing mounds of diagrams that won't be used, and having maintenance slowed to a painful crawl to keep such documents up-to-date, I find that no one is in any mood to add another few diagrams or documents that aren't strictly required by the formal methodology. But I've found on all systems I've worked on, that the concepts most critical to correctly understanding and maintaining the system can be expressed with a small number of creatively drawn diagrams. (Read "creatively drawn" as "not conforming to UML rules." ;-) Thus, producing mounds of UML-compliant documents discourages people from documenting the things that most need to be documented.

Ironic: By spending more on documentation, and getting more of it, you end up with less value from it. -- JeffGrigg

(I wrote the above shortly after being on a project that was intentionally killed by a customer who decided that they no longer wanted the project. To convince the external contractors to give up and break the contract, the customer did a number of things, including requiring that all classes be documented in Rose, including full textual documentation of all object attributes, methods, and parameters of methods. Ick; that was painful. -- JeffGrigg)
I fear that programming 101 in universities may be a source of some trouble here. If you force complete beginners to "always comment functions and variables" then that's just what they'll do. I've not seen anyone emphasise that the comments are supposed to be useful. ("What would you want to know about the code?") If they're not useful, why would anyone write them? (Perhaps students think it's an academic exercise?) It's depressing marking a simple program script that's a third code and two-thirds comments. It's like an essay that also compiles ;-)

Showing students examples that use "syntactic commenting" may be easier for the instructor, but some students will religiously insert these syntactic comments. I want to discourage students from doing this:

  if (x==y) { /* The beginning of the if statement*/
  } /* the end of the if statement*/
The above is useless - the comments don't tell me anything new and besides, they make the code harder to read.
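
By contrast, a comment earns its place when it says something the code cannot. A minimal sketch in C follows; the function and its rounding rule are invented for illustration and do not come from the discussion above.

```c
#include <assert.h>

/*
 * Hypothetical example: discounted_cents and its rounding rule are
 * invented for illustration.
 *
 * Returns the price in cents after taking percent_off (0..100) off.
 * Amounts are assumed to be non-negative.
 */
long discounted_cents(long cents, int percent_off)
{
    long scaled = cents * (100 - percent_off);

    /* Add 50 before dividing so that half a cent rounds up; plain
       integer division would silently truncate 497.5 to 497. */
    return (scaled + 50) / 100;
}
```

Here `discounted_cents(995, 50)` gives 498, where truncation would give 497. The comment records why the `+ 50` is there, which is what a reader would otherwise have to guess.
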

See MartinFowler's DistributedComputing article entitled "TheAlmightyThud" (November/December 1997). [Thanks to ThomasPatzke? for the reference.]

-- -- ThomasPatzke? excellent description of my first point. -- JeffGrigg
I usually describe my vision for good comments this way: Write your comments so that a non-programmer can 'read the green' and see what the code is supposed to do. (comments in VB show up green) ... I want the comments to tell me the story of how I'm proceeding through the program, not tell me stuff I could figure out from the code itself (like, End If). -- Lonna
"Read the green" sounds like a bad idea to me. Non-programmers can't realistically be expected to get much beyond a single level of abstraction, if that. If your code can be decently explained to non-programmers, "in the green," then it's likely to be badly structured. Either that or you have a particularly simple project. -- MarkSchumann
Isn't this all about having the right amount of documentation, neither too much, nor too little, that really helps the project move forward?
Document is mass. KeepMassLow.
Sometimes documentation is necessary. If you are providing a set of services to be used by more than a dozen other development teams, you simply have to provide comprehensive documentation of your interfaces. Otherwise you will be overwhelmed by minor queries, irrespective of how clear your function names are.

This is why library publishers print API manuals.

Well, yes, but typically it's better to generate the API manual and various help file formats from a single set of code comments, e.g. using DoxyGen, JavaDoc.
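
As a rough illustration of that approach, here is a C function carrying a Doxygen-style comment block. The function `sum_values` is hypothetical; `@brief`, `@param`, `@p` and `@return` are standard Doxygen commands, and a generator run over the source would turn this block into a manual entry.

```c
#include <assert.h>
#include <stddef.h>

/**
 * @brief Add up the first @p count elements of @p values.
 *
 * Hypothetical function, used only to show the comment format.
 *
 * @param values  Array of at least @p count elements. May be NULL
 *                only when @p count is 0.
 * @param count   Number of elements to add.
 * @return The sum of the elements, or 0 when @p count is 0.
 */
long sum_values(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

The comment sits next to the code it describes, so the two are at least edited in the same place; whether that is enough for a published library is the question the replies below take up.
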

Please don't let's be silly about this. Professionally written and published libraries have API documents that clearly describe every library function in great detail and often include useful examples. There is usually a cross reference for similar and contrasting functionality. Most library books have multiple lists, sorted by name, by category, and sometimes by other useful criteria. This is far superior to code comments. Who amongst us would be willing to pay Good Money for a "professional" library without such docs? Do you expect a library publisher to tell you to "read the code"? Come on.

In my experience, the most used section of API documentation is the sample code on how to use the interface. I have also found that it is extremely risky to deviate very far from the sample code. One can read all of the API documentation and string things together in seemingly valid combinations and the software simply does not work. So in this case, yes the library publishers are telling you to "read the code." Perhaps one day we can get the library developers to ship the sample code as a unified unit test rather than have it scattered across the bowels of a web site.
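
One way to picture sample code shipped as a unit test is the sketch below, in C. The `Session` type and the `session_*` functions are a made-up stand-in for a vendor library, and the required call order (init, open, query, close) is an assumption chosen for the example. The first test is the sample program; the second records what happens when a caller strays from it.

```c
#include <assert.h>

/* Hypothetical stand-in for a vendor library. The names and the
   required call order are invented for illustration. */
typedef struct {
    int initialized;
    int open;
} Session;

int session_init(Session *s)
{
    s->initialized = 1;
    s->open = 0;
    return 0;
}

int session_open(Session *s)
{
    if (!s->initialized) return -1;  /* must call session_init first */
    s->open = 1;
    return 0;
}

int session_query(Session *s, int *rows)
{
    if (!s->open) return -1;         /* must call session_open first */
    *rows = 3;                       /* fixed fake result */
    return 0;
}

int session_close(Session *s)
{
    if (!s->open) return -1;
    s->open = 0;
    return 0;
}

/* The sample program, written as a test: it shows the full set of
   calls and the order in which they are expected. */
void test_sample_usage(void)
{
    Session s = {0};
    int rows = 0;

    assert(session_init(&s) == 0);
    assert(session_open(&s) == 0);
    assert(session_query(&s, &rows) == 0);
    assert(rows == 3);
    assert(session_close(&s) == 0);
}

/* Plausible-looking combinations that do not work are recorded too. */
void test_wrong_order_is_rejected(void)
{
    Session s = {0};
    int rows = 0;

    assert(session_query(&s, &rows) == -1);  /* query before open */
    assert(session_open(&s) == -1);          /* open before init */
}
```

Because the sample runs with every build, it cannot quietly drift away from the library it demonstrates.
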

"...the software simply does not work." And you didn't ask the library vendor for help? They didn't stand behind their product? I have encountered this problem and have been all over the library vendor like stink on pig. If you aren't going to hold the vendor's feet to the fire on the performance of their library then somebody is getting ripped off -- the end victim being your client.

At any rate, this has nothing to do with the value of proper docs crafted to accurately reflect how a good piece of commercial software functions. If there is a problem with the documents or a problem with the software then obviously something needs to be fixed. If things are working the way they are supposed to -- as they are documented -- then the manual is a priceless document for saving time and eliminating needless experimentation. This translates directly into quicker, more accurate development and lower development costs for your client. To seek anything less is unprofessional.

The vendor (a well-known database supplier) was selected by the client many years ago, and our project is a small one and one of many that are required to use this product. We have no option to "hold the vendor's feet to the fire;" the client would end our project before it would consider changing database vendors. As for contacting the vendor's support staff, we did. The only support option purchased by our client was an e-mail contact and we had to forward all questions through the holder of the support contract registration. The "support" we received was a different sample program from the original one we had found and it initialized the interface in a different manner. Each of the individual calls works as documented; however, it is nearly impossible to determine either the full set of functions needed or the correct sequence from the documentation. The one-page sample program (the working one) was far more valuable than the entire API documentation library.

It sounds like you were in a loser situation right from the get-go. Too bad. Well, I guess you just have to decide whether it's worth sticking out a bad contract like that for the possibility that the client can be trained to use better products later on. Up to you.

Documentation has a value and a cost. Sometimes the only place behavior is written down in a sufficiently rigorous and meaningful way is in the working code. Any effort spent trying to keep non-code documents in sync is potentially wasted, or at least misdirected. That non-code document that doesn't express the meaning as accurately as the code is a potential source of conflict. In many cases, the only place that being correct really matters is the code. All other versions and descriptions of the function are really subservient to what the code says and how it behaves.

War story:

I was involved in a situation at a company where management chose to spend time and effort creating a document in an effort to reduce the time experienced developers were spending training other developers to use an in-house toolkit. We had produced javadoc for the toolkit. After paying the costs of creating and publishing the document (including a contract technical writer) the demands on the experienced developers shifted to answering questions about the document rather than the tools documented.

In fact I do expect users of an API to be able to read the code. CodeReading skills are generally far below what they should be. We wouldn't expect a good novel out of a writer who didn't have the skills to analyse Moby Dick (for example), so why do we expect quality applications from developers who can't be bothered to learn to understand a body of well-written code?

-- StevenNewton

A suggestion for all you weary writers. I often find myself wishing I had more time to write clear docs, but the time to do so is never easy to find. However when new people are poking around trying to figure out what the project has been up to, they often find the existing docs are out of date. So the simplest solution I have found is to state the problem up front to the new people, and ask them to help maintain the documentation. Their perspective as a new user is close enough to CluelessCoworker to be useful.

As to what to document, I have found over the years that the most important level to get down is the interface. The constraints, tricks, requirements and assumptions of my interfaces are key. I also want people to be aware of network traffic, round trips, and data load at all the bottlenecks of the system, so anything that passes through these corridors gets extra scrutiny at every level.
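
A small sketch of interface-level documentation of this kind, in C. The data-access functions are hypothetical, and a simple counter stands in for real network calls. Each comment states the constraints and the round-trip cost, and the counter makes that cost something a test can check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical data-access layer. The counter stands in for real
   network calls so the documented cost can be checked. */
static int round_trips = 0;

void reset_round_trips(void) { round_trips = 0; }
int  round_trip_count(void)  { return round_trips; }

/*
 * fetch_price: costs ONE round trip per call.
 * Constraint: id must be >= 0.
 * Do not call this in a loop over many ids; use fetch_prices.
 */
int fetch_price(int id)
{
    assert(id >= 0);
    round_trips++;
    return 100 + id;  /* fake price */
}

/*
 * fetch_prices: costs ONE round trip for the whole batch,
 * regardless of count.
 * Assumption: out has room for count results.
 */
void fetch_prices(const int *ids, size_t count, int *out)
{
    round_trips++;
    for (size_t i = 0; i < count; i++) {
        out[i] = 100 + ids[i];  /* fake prices */
    }
}
```

Fetching three prices one at a time costs three round trips in this sketch, while the batch call costs one. That difference is the thing the interface comment is there to make visible.
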

A small aside: Language features like LINQ are evil. We need to know explicitly what is going on in those areas that have the highest impact on performance. Hiding them beneath layers of uncontrollable helper code is not good.

-- MichaelRempel

I see this performance hiding in almost every type of ObjectRelationalMapping framework. In my case, HiberNate. -- IanOsgood

One solution: LiterateProgramming.

See: LatherRinseRepeat, ThreeRingBinder, TheAlmightyThud


Last edited April 10, 2012.