TModeling.WritingSelf-DocumentingCode r1.1 - 19 Jun 2008 - 11:33 - Main.guest

Writing Self-Documenting Code

These sections are based on Chapter 32, "Self-Documenting Code", of the book "Code Complete 2".

Introduction to Self-Documenting Code

Good documentation is one sign of the professional pride that a programmer takes in producing high quality programs.

There are two types of documentation: external documentation, in the form of design documents that describe what is to be implemented, and internal documentation, where the code itself acts as its own documentation. This discussion focuses exclusively on the latter.

The main goal of self-documenting code isn't writing good comments but using good programming style, which makes the code easy to read and understand. Style includes good program structure, straightforward approaches, good variable names, good routine names, named constants instead of literals, clear layout and formatting, and minimal complexity (both in control flow and in data structures).

Bad Example

Here is an example of bad style, written in Java:

   for(i=2;i<=num;i++){
   meetsCriteria[i]=true;
   }
   for(i=2;i<=num/2;i++){
   j=i+i;
   while(j<=num){
   meetsCriteria[j]=false;
   j=j+i;
   }
   }
   for(i=1;i<=num;i++){
   if(meetsCriteria[i]){
   System.out.println(i+" meets criteria." );
   }
   }

Problems:

The lack of white space makes it hard to read
The lack of indentation makes it hard to infer logical structure
The lack of good variable names makes it hard to figure out what the code does
The array name meetsCriteria is too generic; the criteria could be anything
In short, it's hard to figure out the purpose of this snippet of code

A Good Example

Here is an example of the same code written with good style (and NO comments) in Java:

   for ( primeCandidate = 2; primeCandidate <= num; primeCandidate++ ) {
      isPrime[ primeCandidate ] = true;
   }

   for ( int factor = 2; factor <= ( num / 2 ); factor++ ) {
      int factorableNumber = factor + factor;
      while ( factorableNumber <= num ) {
         isPrime[ factorableNumber ] = false;
         factorableNumber = factorableNumber + factor;
      }
   }

   for ( primeCandidate = 1; primeCandidate <= num; primeCandidate++ ) {
      if ( isPrime[ primeCandidate ] ) {
         System.out.println( primeCandidate + " is prime." );
      }
   }

In this second fragment, after a little examination, we see that the code has something to do with prime numbers and factors. It becomes obvious that this is a fairly standard implementation of the Sieve of Eratosthenes algorithm for finding prime numbers. Notice that the difference between the two code fragments has nothing to do with comments, as there were none in either example. However, the second fragment is more readable.

Even better?

Actually, I (-- JackSnoeyink - 19 Jun 2008) think the commentary on the "Good example" immediately shows the limitation of self-documenting code: it only expresses things at the level of abstraction at which the code is written. It should not have to "become obvious that this is a fairly standard implementation of the Sieve of Eratosthenes". The comment should state this, so that the code can be chunked and understood at this higher level of abstraction, and there should be a test function.

In fact, the long variable names obscure the fact that the code above is inefficient -- you should apply the sieve only when factor is prime! E.g. you needn't cross off multiples of 4 when you've already done multiples of 2.

Here is the MATLAB version, just to be different.

%% find the primes from 1:n by the Sieve of Eratosthenes
sieve = true(1,n);             % initialize sieve to all trues
sieve(1) = false;              % except for 1.
for k = 2:floor(n/2)
   if sieve(k)                 % if k is prime
       sieve(2*k:k:n) = false; % cross off all larger multiples of k
   end
end
primes = find(sieve);          % list of primes to return

Always explain the inputs, outputs, and invariants for a piece of code. Comment at one or more levels of abstraction above what you are writing. (Ask yourself, "And the reason I'm doing this is because...")
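
For comparison with the Java fragments above, here is one possible sketch of the same advice in Java. The class name PrimeSieve and the routine name primesUpTo are illustrative choices, not taken from the book. The header comment names the algorithm and states the input and output, an inline comment states the loop invariant, and multiples are crossed off only when the candidate is itself prime.

```java
import java.util.ArrayList;
import java.util.List;

final class PrimeSieve {

    /**
     * Finds the primes in the range 2..max with the Sieve of Eratosthenes.
     *
     * Input:  max, the largest number to test; values below 2 yield an empty list.
     * Output: the primes found, in ascending order.
     */
    static List<Integer> primesUpTo(int max) {
        List<Integer> primes = new ArrayList<>();
        if (max < 2) {
            return primes;
        }

        // isComposite[n] is true once a smaller prime has crossed n off.
        boolean[] isComposite = new boolean[max + 1];

        // Invariant: when the loop reaches a candidate, every composite number
        // up to that candidate has already been crossed off, so a candidate
        // that is still unmarked must be prime.
        for (int candidate = 2; candidate <= max; candidate++) {
            if (isComposite[candidate]) {
                continue;
            }
            primes.add(candidate);

            // Only primes cross off their multiples; the multiples of a
            // composite were already removed by its prime factors.
            for (long multiple = 2L * candidate; multiple <= max; multiple += candidate) {
                isComposite[(int) multiple] = true;
            }
        }
        return primes;
    }
}
```

Wrapping the sieve in a routine also gives a natural place to hang the test function that the fragments above lack: a test only needs to compare the returned list against a few known primes.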

Self-Documenting Code Checklist:

Classes:

Does the class interface present a consistent abstraction?
Is the class well named? Does the name describe the central purpose of the class?
Does the class's interface make it obvious how you should use the class?
Is the class's interface abstract enough that you don't have to think about how its underlying methods are implemented? Can you treat the class as a black box conceptually and in practice?

Routines: (Functions/Procedures/Methods)

Does each routine's name describe exactly what the routine does?
Does each routine perform one well-defined task?
Have all parts of each routine that would benefit from being put into their own routines been put into their own routines?
Is each routine's interface obvious and clear?

Data Names:

Are type names descriptive enough to help document data declarations?
Are variables named well?
Are variables only used for the purpose for which they were originally named?
Are loop counters given more informative names than i, j, and k?
Are well-named enumerated types used instead of makeshift flags or boolean variables?
Are named constants used instead of magic number literals or magic string literals?
Do naming conventions distinguish among type names, enumerated types, named constants, local variables, class variables, and global variables?
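
As a rough illustration of several of the data-name questions above, the following Java sketch uses named constants in place of magic numbers and an enumerated type in place of a boolean flag. Everything in it (ReportLayout, PageOrientation, the line counts 66 and 45) is a made-up example, not something taken from the book.

```java
final class ReportLayout {

    // Named constants replace the magic numbers 66 and 45 at every point of use.
    static final int PORTRAIT_LINES_PER_PAGE = 66;
    static final int LANDSCAPE_LINES_PER_PAGE = 45;

    // An enumerated type reads better at the call site than a bare true/false.
    enum PageOrientation { PORTRAIT, LANDSCAPE }

    static int pagesNeeded(int lineCount, PageOrientation orientation) {
        int linesPerPage = (orientation == PageOrientation.LANDSCAPE)
                ? LANDSCAPE_LINES_PER_PAGE
                : PORTRAIT_LINES_PER_PAGE;

        // Round up so that a partly filled last page is still counted.
        return (lineCount + linesPerPage - 1) / linesPerPage;
    }
}
```

A call such as pagesNeeded(100, PageOrientation.LANDSCAPE) documents itself, whereas pagesNeeded(100, true) would force the reader to look up what the flag means.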

Data Organization:

Are extra variables used for clarity when needed?
Are references to variables close together?
Are data types simple so that they minimize complexity?
Is complicated data accessed through abstract access routines (abstract data types) instead of directly?

Control:

Is the primary path through the code clear?
Are related statements grouped together?
Have relatively independent groups of statements been packaged into their own routines?
Does the normal case (most frequently executed) follow the if rather than the else?
Are control structures simple so that they minimize complexity?
Does each loop perform one and only one function, as a well-defined routine should do if it replaced the loop?
Is nesting minimized?
Have boolean expressions been simplified by using additional boolean variables, boolean functions, and decision tables?
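
The last question can be made concrete with a small sketch. The leap-year rule is used here only because it is a familiar compound condition; the class and variable names are illustrative. Intermediate boolean variables name each part of the test, and a boolean function gives the whole test a name that callers can read.

```java
final class CalendarRules {

    static final int DAYS_IN_FEBRUARY = 28;
    static final int DAYS_IN_LEAP_FEBRUARY = 29;

    // Each intermediate boolean names one part of the rule, so the return
    // statement reads like the rule itself.
    static boolean isLeapYear(int year) {
        boolean divisibleBy4 = (year % 4 == 0);
        boolean divisibleBy100 = (year % 100 == 0);
        boolean divisibleBy400 = (year % 400 == 0);
        return divisibleBy4 && (!divisibleBy100 || divisibleBy400);
    }

    static int daysInFebruary(int year) {
        return isLeapYear(year) ? DAYS_IN_LEAP_FEBRUARY : DAYS_IN_FEBRUARY;
    }
}
```

Compare this with writing year % 4 == 0 && (year % 100 != 0 || year % 400 == 0) inline at every place the test is needed.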

Layout:

Does the program's layout show its underlying logical structure?

Design:

Is the code straightforward? Is it easy to understand? Does it avoid clever programming tricks which mask the underlying algorithm?
Does the code avoid relying on language-specific side effects?
Are implementation details hidden behind a well-defined class interface or routine call?
Is the program written in terms of the actual problem domain as much as possible rather than in terms of computer-science or programming language structures?

Good Commenting Technique:

General:

Can someone pick up the code and immediately start to understand it from just the comments?
Do comments explain the code's intent or summarize what the code does rather than just repeating the code?
Is the Pseudocode Programming Process used to reduce commenting time?
Has tricky code been rewritten rather than commented?
Are comments up to date?
Are comments clear and correct?
Does the commenting style allow comments to be easily modified?
Before the code ships, have all marker comments been removed and properly dealt with?

Individual Line Statements and Paragraphs:

Does the code avoid endline comments?
Do comments focus on why rather than how?
Do comments prepare the reader for the code to follow?
Does every comment count? Have redundant, extraneous, and self-indulgent comments been removed or improved?
Are surprises documented?
Have abbreviations in comments been avoided?
Is code that works around an error or an undocumented feature commented?
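
To illustrate the "why rather than how" question, here is a hypothetical retry-delay routine in Java. The names and the numbers (a 200 ms base delay, at most 5 attempts) are assumptions made up for this sketch. The comment explains the reason for doubling the delay, which the code alone cannot express, instead of restating the multiplication.

```java
final class RetryDelay {

    static final long BASE_DELAY_MS = 200;
    static final int MAX_ATTEMPTS = 5;

    // A comment such as "multiply the delay by two" would only repeat the code.
    // Why: doubling the wait after each failure gives an overloaded server
    // progressively more time to recover instead of retrying at a fixed rate.
    static long delayBeforeRetryMs(int failedAttempts) {
        if (failedAttempts < 1 || failedAttempts > MAX_ATTEMPTS) {
            throw new IllegalArgumentException(
                    "failedAttempts must be in 1.." + MAX_ATTEMPTS);
        }

        long delayMs = BASE_DELAY_MS;
        for (int attempt = 1; attempt < failedAttempts; attempt++) {
            delayMs *= 2;
        }
        return delayMs;
    }
}
```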

Data Declarations:

Are units on data declarations commented?
Are the ranges of values on numeric data commented?
Are coded meanings commented?
Are limitations on input data commented?
Are flags documented to the bit level?
Has each global variable been commented where it is declared?
Has each global variable been identified as such at each usage, by a naming convention, a comment, or both?
Are magic numbers replaced with named constants, enumerations, or variables rather than just documented?
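
The following Java sketch shows what commented data declarations might look like for a hypothetical sensor reading. The class, the temperature range, and the flag meanings are all invented for illustration. Units and valid ranges appear in the declarations, and each status flag is documented down to its bit position.

```java
final class SensorReading {

    // Valid temperature range of the (hypothetical) sensor, in degrees Celsius.
    static final double MIN_TEMPERATURE_CELSIUS = -40.0;
    static final double MAX_TEMPERATURE_CELSIUS = 125.0;

    // Status flags, one bit each; bit 0 is the least significant bit.
    // Bit 0: the sensor has been calibrated.
    static final int FLAG_CALIBRATED = 1 << 0;
    // Bit 1: the temperature limit was exceeded.
    static final int FLAG_OVERHEATED = 1 << 1;
    // Bit 2: the battery is below its warning level.
    static final int FLAG_LOW_BATTERY = 1 << 2;

    // Measured temperature in degrees Celsius, always within the range above.
    private final double temperatureCelsius;

    // Bitwise OR of zero or more FLAG_* constants.
    private final int statusFlags;

    SensorReading(double temperatureCelsius, int statusFlags) {
        boolean inRange = temperatureCelsius >= MIN_TEMPERATURE_CELSIUS
                && temperatureCelsius <= MAX_TEMPERATURE_CELSIUS;
        if (!inRange) {
            throw new IllegalArgumentException("temperature out of range");
        }
        this.temperatureCelsius = temperatureCelsius;
        this.statusFlags = statusFlags;
    }

    double temperatureCelsius() {
        return temperatureCelsius;
    }

    boolean isOverheated() {
        return (statusFlags & FLAG_OVERHEATED) != 0;
    }

    boolean isLowBattery() {
        return (statusFlags & FLAG_LOW_BATTERY) != 0;
    }
}
```

Putting the unit in the name (temperatureCelsius) as well as in the comment means the information survives even where the declaration is out of sight.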

Control Structures:

Are comments inserted at their natural spots (control structures or at the beginning of long blocks of code)?
Are the ends of long or complex control structures commented or, when possible, simplified so that they no longer need comments?

Routines:

Is the purpose of each routine commented?
Are other important facts about the routine given in comments, when relevant, including input and output data, interface assumptions, limitations, error corrections, global side-effects, and sources of algorithms?

Files, Classes, and Programs:

Does the program have a short usage document?
Does the program have a short design document?
Is the purpose of each file described in a block header at the beginning of each file?
Is the author's contact info included in the block header for each file?
Does each class have a consistent commenting layout used throughout the interface?
Do implementation details of the class remain hidden (i.e., not spelled out in the interface comments)?

-- ShawnDB - 15 Jun 2008