Code Formatting Patterns

Code formatting provides you with many opportunities to subtly communicate your intent to a reader. Far from being a backwater best left to draconian "style guides", code formatting is often your reader's first encounter with your system. It deserves attention and care.

The following patterns balance the forces affecting code formatting:

* Quick readability. The reader should be able to understand the gross structure of the code in a glance. The shapes made by blocks of text help you communicate overall structure.

* Detailed readability. The reader must be able to read lots of code in depth.

* Length. The number of lines must be minimized so that browsers can be as small as possible.

* Width. The width of code must be minimized. Code must not ordinarily spill across the right margin, requiring horizontal scrolling or destroying the shape of the text with line wrapping.

The patterns are:

1. Type Suggesting Parameter Name
2. Indented Control Flow
3. Rectangular Block
4. Guard Clause
5. Simple Enumeration Parameter
6. Interesting Return Value


1. Type Suggesting Parameter Name

You have settled on a method selector, either a Testing Method (?), a Converting Method (?), or ...

What should you call a method parameter?

There are two important pieces of information associated with every variable: what messages it receives (its type) and what role it plays in the computation. Understanding the type and role of variables is important for understanding a piece of code.

Keywords communicate their associated parameter's role. Since the keywords and parameters are together at the head of every method, the reader can easily understand a parameter's role without relying on the name.

Smalltalk doesn't have a strong notion of types. The set of messages sent to an object appears nowhere in the language or programming environment. Because of this lack, there is no direct way to communicate types.

Classes sometimes play the role of types. You would expect a Number to be able to respond to messages like +, -, *, and /, or a Collection to respond to do: and includes:.

Therefore, name parameters according to their most general expected class, preceded by "a" or "an". If there is more than one parameter with the same expected class, precede the class with a descriptive word.

An Array which requires Integer keys names the parameters to at:put: as:

at: anInteger put: anObject

A Dictionary, where the key can be any object, names the parameters:

at: keyObject put: valueObject

After you have named the parameters, you are ready to write the method. You may have to use Temporary Variables (?). You may need to format an Indented Control Flow (2). You may have to use a Guard Clause (4) to protect the execution of the body of the method.


2. Indented Control Flow

You are writing a method following Type Suggesting Parameter Name (1).

How do you indent messages?

The conflicting needs of formatting to produce both few lines and short lines are thrown into high relief by this pattern. The only saving grace is that Composed Method (?) creates methods with little enough functionality that you never need to deal with hundreds or thousands of words in a method.

One extreme would be to place all the keywords and arguments on the same line, no matter how long the method. This minimizes the length of the method, but makes it difficult to read.

If there are multiple keywords to a message, the fact that they all appear is important to communicate quickly to a scanning reader. By placing each keyword/argument pair on its own line, you can make it easy for the reader to recognize the presence of complex messages.

Unlike keywords, arguments do not need to be aligned, because readers seldom scan all the arguments. Arguments are only interesting in the context of their keyword. (This would be a good place for a diagram with one arrow going down the keywords, showing how the reader picks out at:put:, and another scanning left to right as the reader understands the message and its arguments.)

Therefore, put messages with zero or one argument on the same line as the receiver.

	foo isNil
	2 + 3
	a < b ifTrue: [...]

Put each keyword/argument pair of a message with two or more keywords on its own line, indented one tab.

	a < b
		ifTrue: [...]
		ifFalse: [...]
	array
		at: 5
		put: #abc

Rectangular Block (3) formats blocks. Guard Clause (4) prevents indenting from marching across the page.


3. Rectangular Block

Indented Control Flow (2) has shown you where a block begins.

How should you format blocks?

Smalltalk distinguishes between code that is executed immediately upon the activation of a method and code whose execution is deferred. To read code accurately, you must be able to quickly distinguish which code in a method falls into which category.

Code should occupy as few lines as possible, consistent with readability. Short methods are easier to assimilate quickly and they fit more easily into a browser. On the other hand, making it easy for the eye to pick out blocks is a reasonable use of extra lines.

One more resource we can bring to bear on this problem is the tendency of the eye to distinguish and interpolate vertical and horizontal lines. The square brackets used to signify blocks lead the eye to create the illusion of a whole rectangle even though one isn't there.

Therefore, make blocks rectangular. Use the square brackets as the upper left and bottom right corners of the rectangle. If the statement in the block is simple, the block can fit on one line:

	ifTrue: [self recomputeAngle]

If the block contains more than one statement, bring the block onto its own line and indent:

	ifTrue:
	    [self clearCaches.
	    self recomputeAngle]
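When both branches of a two-keyword conditional need blocks, Indented Control Flow (2) and Rectangular Block combine: each keyword sits on its own line, and each block forms its own rectangle. A minimal sketch, assuming the hypothetical selectors isVisible and markDirty alongside the ones above:

	"isVisible and markDirty are hypothetical selectors"
	self isVisible
		ifTrue:
			[self clearCaches.
			self recomputeAngle]
		ifFalse: [self markDirty]

The single-statement ifFalse: block stays on one line; only the ifTrue: block, with two statements, is opened into a rectangle.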

Guard Clause (4) will prevent the indenting from getting out of hand. Composed Method (?) keeps methods simple, also preventing excessive indentation. Simple Enumeration Parameter (5) keeps enumeration blocks readable.


4. Guard Clause

Indented Control Flow (2) and Rectangular Block (3) produce methods where indentation communicates the gross structure of the method.

How should you format code which shouldn't execute if a condition holds?

In the bad old days of Fortran programming, when it was possible to have multiple entries and exits to a single routine, tracing the flow of control was a nightmare. It was impossible to determine statically which statements in a routine got executed, and when. This led to the commandment "Every routine shall have one entry and one exit."

Smalltalk labors under few of the constraints of long-ago Fortran, but the prohibition against multiple exits persists. When routines are only a few lines long, understanding the flow of control within a routine is simple; it is the flow between routines that becomes the legitimate focus of attention. Multiple returns can simplify the formatting of code, particularly conditionals. What's more, the multiple-return version of a method is often a more direct expression of the programmer's intent.

Therefore, format conditionals which prevent the execution of the rest of a method with a return. Let's say you have a method which connects a communication device only if the device isn't already connected. The single exit version of the method might be:

connect
	self isConnected
		ifFalse: [self connectConnection]

You can read this as "If I am not already connected, connect my connection." The guard clause version of the same method is:

connect
	self isConnected ifTrue: [^self].
	self connectConnection

You can read this as "Don't do anything if I am connected. Connect my connection." The guard clause is more a statement of fact, or an invariant, than a path of control to be followed.
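Guard clauses pay off most when a method has several preconditions. The sketch below extends the connect example with a second check, hasDevice, which is a hypothetical selector. The single exit version gains a level of indentation for each condition, while the guard clause version stays flat:

	connect
		"Single exit version; hasDevice is hypothetical"
		self hasDevice
			ifTrue:
				[self isConnected
					ifFalse: [self connectConnection]]

	connect
		"Guard clause version; hasDevice is hypothetical"
		self hasDevice ifFalse: [^self].
		self isConnected ifTrue: [^self].
		self connectConnection

Both versions connect only when a device is present and not already connected.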

You may need to return a Nil Return Value (?) to signal an unusual condition.


5. Simple Enumeration Parameter

You have formatted your code with Rectangular Block (3).

What should you call the parameter to an enumeration block?

It is tempting to try to pack as much meaning as possible into every name. Certainly, classes, instance variables, and messages deserve careful attention. Each of these elements can communicate volumes about your intent as you program.

Some variables just don't deserve such attention. Variables that are always used the same way, where their meaning can be easily understood from context, call for consistency over creativity. The effort to carefully name such variables is wasted, because no non-obvious information is communicated to the reader. Elaborate names may even be counterproductive, if the reader tries to impute meaning to the variable that isn't there.

Therefore, call the parameter "each". If you have nested enumeration blocks, append a descriptive word to all parameter names.

For example, the meaning of "each" in

	self children do: [:each | self processChild: each]

is clear. If the block is more complicated, "each" may not be descriptive enough. In that case, you should invoke Composed Method (?) to turn the block into a single message. The Type Suggesting Parameter Name (1) in the new method will clarify the meaning of the object.

The typical example of nested blocks is iterating over the two dimensions of a bitmap:

	1 to: self width do:
		[:eachX |
		1 to: self height do:
			[:eachY | ...]]

Nested blocks that iterate over unlike collections should probably be factored with Composed Method (?).
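For instance, a report that prints each department's employees might start out as nested blocks over unlike collections. In this sketch, departments, employees, printEmployee:, and printDepartment: are hypothetical selectors:

	printReport
		"Nested version; all selectors except do: are hypothetical"
		self departments do:
			[:eachDepartment |
			eachDepartment employees do:
				[:eachEmployee | self printEmployee: eachEmployee]]

Factoring the inner block into its own method lets both blocks return to the plain parameter "each", while the Type Suggesting Parameter Name aDepartment records what the outer loop iterates over:

	printReport
		"Composed version: the outer loop delegates each department"
		self departments do: [:each | self printDepartment: each]

	printDepartment: aDepartment
		"The inner loop, moved into its own (hypothetical) method"
		aDepartment employees do: [:each | self printEmployee: each]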

You may need Composed Method (?) to simplify the enumeration block.


6. Interesting Return Value

???

When should you explicitly return a value at the end of a method?

All message sends return a value. If a method does not explicitly return a value, the receiver of the message is returned by default. This causes some confusion for new programmers, who may be used to Pascal's distinction between procedures and functions, or to C's lack of a defined return value for a procedure with no explicit return. To compensate, some programmers always explicitly return a value from every method.

The distinction between methods that do their work by side effect and those that are valuable for the result they return is important. An unfamiliar reader wanting to quickly understand the expected use of a method should be able to glance at the last line and instantly understand whether a useful object is generated or not.

Therefore, return a value only when you intend for the sender to use the value.

For example, consider the implementation of topComponent. Visual components form a tree, with a ScheduledWindow at the root. Any component in the tree can fetch the root by sending itself the message "topComponent". VisualPart implements this message by asking its container for its topComponent:

	VisualPart>>topComponent
		^container topComponent

ScheduledWindow implements the base case of the recursion by returning itself. The simplest implementation would be a method with no statements, which would return the receiver by default. However, because the result is intended to be used by the sender, Interesting Return Value calls for returning "self" explicitly:

	ScheduledWindow>>topComponent
		^self
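The same rule separates methods within a single class. In this sketch of a hypothetical Account class with a balance instance variable, deposit: works by side effect and ends without a return, so it returns the receiver by default; balance exists to answer a value and says so with ^:

	Account>>deposit: aNumber
		"Hypothetical class. Side effect only: no explicit return"
		balance := balance + aNumber

	Account>>balance
		"The sender uses the result, so return it explicitly"
		^balance

A reader glancing at the last line of each method can tell which one is meant to be used for its result.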


Author: Kent Beck
Copyright 1995, First Class Software, Inc. All rights reserved

This document served by the Portland Pattern Repository