This is a continuation of the blog post 12 Principles of Good Software Design.
4. An idiom should make your code easier to read
When you write code, you actually have two audiences, two “readers” who will be reading your software.
The first audience is the environment of compiler tools, preprocessing tools, analysis tools and other components of your development system which convert your code into a working application. That first audience is impartial, and only cares about the results of your authorship: does the compiled program run, does it meet the requirements, can testers break your code, does QA validate your code’s correctness.
For many of us, the reason why we got into software development was to write code for that impartial first audience. That audience knows no political values, is entirely emotionless. It allows us to solve difficult puzzles and tells us if our answers are right or wrong without any judgement beyond correctness. And for many of us, the fact that software development is one of the last professions which gives you a nearly guaranteed ticket to the upper-middle class is just a side benefit; it allows us to live in some degree of comfort as we solve increasingly interesting puzzles.
But there is a second audience for our code, one which many of us neglect.
Our fellow programmers.
And in many ways, writing code is very similar to writing an essay. You have related blocks of thought which are designed to hammer home a single conclusion. You have varying ideas that are brought together into a single idea. You have unrelated topics that you may bounce around that eventually come together to tell a story.
All of this contributes to clarity and to communicating an idea.
Thus far, in the previous three essays covering making code simpler, making code concise and making code clear, we’ve incidentally also contributed to making code more readable. For this essay I’d like to concentrate on smaller idioms: on ways we can make code more readable that don’t necessarily involve design patterns or the choice of idioms as we refactor code.
One way we can make code clearer is in our choice of variable names.
It’s such a small thing, a variable name. But it can have a serious impact on our understanding of the code.
Now, compilers don’t care about variable names; only that you use those variables consistently. That’s how code obfuscation works with languages such as JavaScript; the code remains the same, but the variables are renamed to obscure meaning to a human reader. As far as a compiler is concerned the declaration below:
    #define __A long

    struct __a {
        __A __b;
        __A __c;
        __A __d;
        __A __e;
    };
is identical to the following declaration:
    #define Integer long

    struct Rect {
        Integer left;
        Integer top;
        Integer width;
        Integer height;
    };
And if we were to write code that manipulates rectangles, either declaration would be fine, as far as the compiler is concerned.
Other readers of our code, however, may become puzzled when they see us manipulate rectangles using the __a declaration.
There are many naming conventions out there for variables, and many theories about why you should name variables using those different conventions. I don’t wish to come down on any side in those arguments, except to note three things.
First, a number of older conventions arose because of compiler limitations. For example, early C compilers and linkers limited the number of characters of a variable name that they stored. Some only kept track of the first 6 characters or the first 32 characters of a variable name; thus, a tool chain storing only the first 6 characters could not tell the difference between aPointInTime and aPointOnTheLine, as it would store aPoint for both variables.
(This is why in some early programs you see variable names like a11PointInTime or a12PointOnTheLine.)
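To make the collision concrete, here is a minimal sketch in Java. The TruncatingSymbolTable class and its methods are hypothetical, invented for this illustration; they only model a tool that remembers the first few characters of each name, and are not taken from any real compiler or linker.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a tool chain that stores only the first `limit`
// characters of each identifier.
final class TruncatingSymbolTable {
    private final int limit;
    private final Map<String, String> stored = new LinkedHashMap<>();

    TruncatingSymbolTable(int limit) {
        this.limit = limit;
    }

    // The part of the name such a tool would actually record.
    String significantPart(String name) {
        return name.length() <= limit ? name : name.substring(0, limit);
    }

    // Records the name. Returns the earlier identifier that shares the same
    // significant part, or null if this is the first one.
    String declare(String name) {
        return stored.putIfAbsent(significantPart(name), name);
    }
}
```

With a limit of 6, declaring aPointInTime and then aPointOnTheLine reports a clash, because both are recorded as aPoint. The prefixed names a11PointInTime and a12PointOnTheLine are recorded as a11Poi and a12Poi, so they stay distinct.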
I don’t believe any of these early tool chains exist anymore, so naming conventions which attempt to make the first N characters of a variable name unique (for some number N) are no longer relevant.
Hungarian Notation also arose from compiler limitations; in this case, from limitations in the type system of earlier compilers which could not distinguish between different variable types. However, again, earlier compilers have been replaced by compilers with a stronger type system, and variable naming conventions such as these place an additional burden on the author of the code (and by extension, on the reader) that is no longer quite as important.
Second, there are limits on a reader’s ability to read long variable names. There is a reason why Einstein’s famous equation is not written:
    energy = mass × (speed of light)²
but rather, in the more compact form:
    E = mc²
This, despite the fact that the former representation is arguably more informative.
Book layout editors and newspaper authors both know that the human eye can only see so much, can only scan so much distance across a line of text and successfully return to the next line, can only see so much text in a single word. Words like “antidisestablishmentarianism” exist mostly as curiosities in an essay such as mine where the average word length is less than 6 characters.
This is not to suggest that we should go to the other extreme, that we should go to a FORTRAN-style naming convention or a naming convention from early forms of BASIC where all variable names were kept to two characters or less. But it is to suggest that a concise variable name may be easier for someone to follow, assuming the name is descriptive enough to describe what the variable holds.
In some places, I personally use short variable names. I grew up in an era where integer loop variables were i, j or k; sometimes I abbreviate a temporary variable holding the length of a thing as len, or a variable holding a single character as ch. But these are immediate temporary variables used only within local scope, and by using such short variable names I hope to distinguish between them and class-scope variables such as submitButton or cachedLength.
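As a small sketch of that split, consider the Java class below. The TextLine class and its lengthWithoutSpaces method are made up for this example; the point is only that the short names (len, i, ch) live and die inside one method, while the field that outlives the call gets a descriptive name.

```java
// Hypothetical class, for illustration only.
final class TextLine {
    private final String contents;
    private int cachedLength = -1;   // class scope; -1 means "not computed yet"

    TextLine(String contents) {
        this.contents = contents;
    }

    // Number of non-space characters, computed once and then cached.
    int lengthWithoutSpaces() {
        if (cachedLength < 0) {
            int len = contents.length();    // short-lived local
            int count = 0;
            for (int i = 0; i < len; i++) {
                char ch = contents.charAt(i);
                if (ch != ' ') {
                    count++;
                }
            }
            cachedLength = count;
        }
        return cachedLength;
    }
}
```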
But this is my style, just as my essay writing style incorporates run-on sentences, sometimes punctuated with short sentences to drive a point home. There is a place for personal style.
So long as we convey the idea we wish to convey in a clear and concise manner, we’ve accomplished our purpose.
Third, consistent naming conventions help with understanding. You can see this all throughout Apple’s development libraries. From the earliest days of Inside Macintosh, Apple used standard naming conventions for their toolbox functions: open balances with close, new balances with dispose, create balances with delete, begin balances with end.
In more recent versions of Mac OS X and iOS prior to ARC, methods whose names start with alloc or new, or contain copy, pass ownership to the caller, and the returned object must be released using either the release or autorelease method. This naming convention is enforced by ARC and is also enforced by Apple’s Xcode analysis tools.
Naming conventions are important, because they enforce consistency and help developers maintaining your code remember which methods to call when extending your code. It’s not necessary to use Apple’s naming conventions; what is important is to stick to a naming convention in a consistent fashion throughout your code.
If you use a naming convention such as the one used in Java, where routines that start with ‘get’ get the contents of a variable (without side effects), and ‘set’ sets the value of a variable (without meaningful side effects), then stick to that naming convention. Don’t suddenly create a new class like the following:
    public class MyThing {
        private int __value;

        public int value() {
            return __value;
        }

        public void updateValue(int v) {
            __value = v;
        }
    }
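For contrast, here is a sketch of the same class with its method names brought back in line with the get/set convention. Nothing else changes; the field name value is my own choice for the example.

```java
// The class from above, with accessors renamed to follow the
// get/set convention used in the rest of a Java code base.
class MyThing {
    private int value;

    public int getValue() {
        return value;
    }

    public void setValue(int v) {
        value = v;
    }
}
```

A reader who has seen getName and setName elsewhere in the project can now guess both method names here without opening the file.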
Also, it’s important to be aware of any naming conventions in the environment you are using. For example, while Java uses ‘get’ as the getter for a variable held by a class, Objective-C simply uses the name of the variable:
    @implementation MyThing

    - (void)setValue:(int)v {
        self.internalValue = v;
    }

    - (int)value {
        return self.internalValue;
    }

    @end
If you are creating a class which manages resources, you may wish to use a consistent naming convention throughout. So, for example, as in the Macintosh System Toolbox, creating a window is done using ‘newWindow’; releasing is done with ‘disposeWindow’. Creating a new resource is done using ‘newResource’; releasing with ‘disposeResource’. This consistency means that a new developer who sees a chunk of code creating a new scratch database with the method ‘newScratchDatabase’ can know with some degree of certainty that the release call will be ‘disposeScratchDatabase.’
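A minimal sketch of such a class in Java might look like the following. The ScratchStore class, its integer handles and the newScratchIndex pair are all hypothetical, made up to show the pairing; only the newScratchDatabase and disposeScratchDatabase names come from the paragraph above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical resource manager: every newX method has a matching disposeX.
final class ScratchStore {
    private final Set<Integer> databases = new HashSet<>();
    private final Set<Integer> indexes = new HashSet<>();
    private int nextHandle = 1;

    int newScratchDatabase() {
        int handle = nextHandle++;
        databases.add(handle);
        return handle;
    }

    void disposeScratchDatabase(int handle) {
        databases.remove(handle);
    }

    int newScratchIndex() {
        int handle = nextHandle++;
        indexes.add(handle);
        return handle;
    }

    void disposeScratchIndex(int handle) {
        indexes.remove(handle);
    }

    // Number of resources created but not yet disposed.
    int liveResourceCount() {
        return databases.size() + indexes.size();
    }
}
```

Someone adding a third kind of resource to this class has an obvious template to follow, and someone calling it can find the release call by name alone.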
Beyond just naming conventions there are other ways you can aid with readability of your code. Like this essay, code has “sections”, “paragraphs” (which group functionality), and “sentences” or statements which describe an idea. Your choice of how you organize your sentences within a paragraph can aid in readability, even if you do not choose to refactor your code into separate functions or by using a design pattern. White space can be a powerful tool in comprehension.
For example, take the following code from a previous essay:
    - (void)viewDidLoad {
        [super viewDidLoad];

        /*
         *  Set up the appearance of our interface
         */
        self.firstButton.layer.borderColor = [UIColor lightGrayColor].CGColor;
        self.secondButton.layer.borderColor = [UIColor lightGrayColor].CGColor;
        self.firstButton.layer.borderWidth = 1;
        self.secondButton.layer.borderWidth = 1;
        self.thirdButton.layer.borderWidth = 2;
        self.thirdButton.layer.borderColor = [UIColor darkGrayColor].CGColor;
        self.firstButton.layer.cornerRadius = 6;
        self.secondButton.layer.cornerRadius = 6;
        self.thirdButton.layer.cornerRadius = 6;
    }
The clump of code at the bottom initializes three buttons, one representing a “default” button. Our previous solution involved refactoring the code by removing common functionality into a separate method call.
But we can make our code above more readable simply by reordering our statements and adding some white space:
    - (void)viewDidLoad {
        [super viewDidLoad];

        /*
         *  Set up the appearance of our interface
         */
        self.firstButton.layer.borderColor = [UIColor lightGrayColor].CGColor;
        self.firstButton.layer.borderWidth = 1;
        self.firstButton.layer.cornerRadius = 6;

        self.secondButton.layer.borderColor = [UIColor lightGrayColor].CGColor;
        self.secondButton.layer.borderWidth = 1;
        self.secondButton.layer.cornerRadius = 6;

        self.thirdButton.layer.borderColor = [UIColor darkGrayColor].CGColor;
        self.thirdButton.layer.borderWidth = 2;
        self.thirdButton.layer.cornerRadius = 6;
    }
This is far more readable than the previous example, and all we did was clump the calls which affect each button into their own block of code, reorder the statements so that they match, and add white space to help visually separate the clumps.
In fact, this should be the first step in refactoring our code into a separate method call.
Run-on statements can similarly be broken down in order to aid with understanding. With earlier compilers, run-on statements helped the compiler optimize for performance, but today’s optimizing compilers do not require collapsing statements into more compact forms in order to optimize the code. In fact, certain statement representations may hinder code optimization, and rewriting run-on statements can help the compiler by reducing statement complexity.
Take, for example, the following method, which draws text within a rectangle:
    - (void)draw {
        [UIColor.blackColor setStroke];
        [[UIBezierPath bezierPathWithRect: CGRectMake(39, 31, 88, 30)] stroke];
        [@"This is a test." drawInRect: CGRectMake(39, 31, 88, 30) withAttributes: @{ NSFontAttributeName: [UIFont fontWithName: @"HelveticaNeue" size: 12], NSForegroundColorAttributeName: UIColor.blackColor }];
    }
While compact, this code is hard to follow, and has the problem of repeating code which initializes the rectangle we are drawing into. We can aid in the readability of this code by expanding the run-on statements:
    - (void)draw {
        CGRect rect = CGRectMake(39, 31, 88, 30);

        [UIColor.blackColor setStroke];
        UIBezierPath *path = [UIBezierPath bezierPathWithRect: rect];
        [path stroke];

        UIFont *font = [UIFont fontWithName: @"HelveticaNeue" size: 12];
        UIColor *color = UIColor.blackColor;
        NSDictionary *attrs = @{ NSFontAttributeName: font,
                                 NSForegroundColorAttributeName: color };
        [@"This is a test." drawInRect: rect withAttributes: attrs];
    }
While this takes more lines, we can now see where we create our rectangle path, our font and the color for our text, and we can now understand all the side effects of our code. We even collapsed the two rectangle creation statements into one, thereby reducing the potential for error.
From the latter representation we now have a better understanding, and we can contemplate further refactoring steps, such as passing in the text or rectangle as parameters, or manipulating the location of the text to be centered in the rectangle. We could also obtain the text attributes from a separate function call (so we consistently use the same text attributes in other places in our code).
Sometimes smaller isn’t better.
In the end it doesn’t matter which naming convention you use. What is important is that the convention is used with consistency; consistency aids with readability.
And as long as you remember that you are writing for an audience of fellow programmers, and write your code in such a way as to make your code easier to read, then you can help your audience in their understanding as they take over your source kit to maintain it in the future.