Clean Code Club
Clean Code: Formatting
We’ve now moved on to the fifth chapter of Robert C. Martin’s book “Clean Code”,1 which discusses code formatting. For developers, this is the war to end all wars. Tabs or Spaces? Do braces go on the same line or the next? Let’s dig down into why it even matters. As before, I’ll be forming my notes around the discussion points for the Clean Code Club meeting.
So first, before I launch into this, I want to make my position clear that in my world code readability is king. It should be as easy as it possibly can be to understand what some code is doing. There’s a common phrase that code is “written once, read many times”. This is obviously untrue, because code is both written and read many times.
There is a mental burden when reading code. You have to build a map of various classes and what they do. Hand-execute code to understand the algorithms. Anything that reduces this mental burden makes it easier to write and maintain code. You can work faster and make fewer mistakes. Ultimately, I believe the goal of Clean Code is to write code that minimises this mental burden. That’s why we’ve been talking about functions, comments, and naming in the previous chapters of this book.
Now we’re talking about formatting. In my mind, formatting is one of the most important parts of code readability. The code should look like you expect it to look. Sure, there’s a bunch of technical reasons for strict formatting rules; to avoid VCS conflicts, for example. Ultimately though, in my opinion formatting is here to serve readability, and Readability Is King.2
Are there any of Martin’s reasons for using formatting that you hadn’t considered or minimised?
In a previous article, I laid out a case for grouping your public and private variables in a given class. Specifically, what I said was:
I strongly prefer to have all my
public
functions at the top of the class, because they are providing a narrative to the broader world about what this class can do (or what you can do with this class). Theprivate
methods are not part of that narrative; they are part of an inner narrative. I consider the first narrative more important.
In this chapter, Martin talks about a “Newspaper Metaphor”.
We would like a source file to be like a newspaper article… The topmost parts of the source file should provide the high-level concepts and algorithms. Detail should increase as we move downwards, until at the end we find the lowest-level functions and details in the source file.
I think the newspaper analogy is particularly apt, because with these articles you get the headline and first paragraphs which summarise the story. In a class, that would be the public methods. In a newspaper, they’re telling you what the story is about. In a class, they’re telling you what the class can do. It’s something I was already doing, but I hadn’t really thought about the newspaper analogy.
However, later Martin talks about Dependent Functions and Conceptual Affinity. In the former, it is argued that Dependent Functions should be grouped together and appear underneath the things that use them. Conceptual Affinity refers to grouping things that have a similar concept together.
The problem I have here is that this might well lead you to mix your public API with your internal functions. This is actually something that I see quite often. For example, see the code snippet below from Apache commons-io.3
public static String removeExtension(final String fileName) {
if (fileName == null) {
return null;
}
requireNonNullChars(fileName);
final int index = indexOfExtension(fileName);
if (index == NOT_FOUND) {
return fileName;
}
return fileName.substring(0, index);
}
private static String requireNonNullChars(final String path) {
if (path.indexOf(0) >= 0) {
throw new IllegalArgumentException(
"Null byte present in file/path name. There are no known legitimate use cases for such data, but several injection attacks may use it");
}
return path;
}
public static String separatorsToSystem(final String path) {
return FileSystem.getCurrent().normalizeSeparators(path);
}
Going back to the newspaper analogy, this is a bit like the introductory text veering off into specific details, then cutting back to high-level overviews. Martin has this to say on that subject.
Have you ever chased your tail through a class, hopping from one function to the next,scrolling up and down the source file, trying to divine how the functions relate and operate, only to get lost in a rat’s nest of confusion?
For those concepts that are so closely related that they belong in the same source file, their vertical separation should be a measure of how important each is to the understandability of the other. We want to avoid forcing our readers to hop around through our source files and classes.
It is very hard to argue against that. Regardless, I am going to try.
Firstly, I want to tackle the concept of Conceptual Affinity. This is something I am pretty keen on. Getters and setters4 are grouped. If I implement an interface, I group the interface’s methods together. Outside of this “obvious” grouping I also attempt to group conceptually related methods together (e.g. I would group the methods shapeFromPoints(Point[] points)
and shapeFromLines(Line[] lines)
). However, this is an easier thing to maintain if internal (private) methods are not intermingled with the public ones. In fact, I’d consider that a violation of grouping by conceptual affinity. The public API is also a conceptual group.
Doesn’t that imply that there’s multiple conceptual groupings in a class, some of which extend beyond the functions and their relationships? What is the hierarchy? Which conceptual groupings take priority? How can you, as the person writing the code, predict which conceptual groupings will be the most obvious to the person maintaining it?
The second point I want to make is about how intermixing public and private methods can make a class more fragile when changes need to be made. If some new method is added that also uses the dependent function, do we then move the dependent function? Where to? Maybe it’s as simple as putting both functions together above the dependent one, but that might create some awkward code as well. There’s also a chance that a maintainer might mistakenly break up two methods that have been put together. Perhaps they need to add a simple function to an existing class. Should this developer be required to identify and understand the relationships between existing methods before they can add their new function?5
Ordering a class’s functions firstly by their access modifiers seems to me to be a good place to start with resolving some of these issues. Doing so makes the API of each class the top priority in terms of grouping the code, which I think is a reasonable approach to take. Inner workings of the class being pushed to the bottom gets the details out of the way.
All that said, I still think this is a judgement call for the person writing the code. This sort of thing is highly sensitive to context. Whatever makes the code easiest to read is what matters most.
Are there any reasons for using formatting you can think of that aren’t listed here?
Martin talks a lot in this chapter about formatting the code to make it presentable, and ordering the code to put important and related information together. At the start of the chapter, Martin states that:
Code formatting is about communication, and communication is the professional developer’s first order of business.
While I agree with this, I don’t think that digs deeply enough into what communication means in this context. It’s certainly alluded to at several points in this chapter.
Have you ever chased your tail through a class, hopping from one function to the next, scrolling up and down the source file, trying to divine how the functions relate and operate, only to get lost in a rat’s nest of confusion? Have you ever hunted up the chain of inheritance for the definition of a variable or function?
Martin is talking here about the process of building up a mental map of what the code does. This mental map is critical to understanding any code, and it takes effort to build. Hunting around looking for bits of code makes this map harder to build and maintain, and more likely to contain errors.
Later in the chapter, Martin talks about teams needing to agree on a style.
Remember, a good software system is composed of a set of documents that read nicely. They need to have a consistent and smooth style. The reader needs to be able to trust that the formatting gestures he or she has seen in one source file will mean the same thing in others. The last thing we want to do is add more complexity to the source code by writing it in a jumble of different individual styles.
I agree this is important, but is it possible to form a deeper understanding of why? Martin devotes a meager 4 paragraphs to his assertion that everyone on the team should use the same style guide. Surely there’s more at play here? What actually happens when we encounter code that doesn’t match our style guide, or doesn’t match our expectations?
It is my opinion that when this happens we have to make a context switch. We move from one state of flow, where we are reading code that we understand and building our mental map, to suddenly having to take a full-stop and mull over the oddity we’ve just encountered. Once deciphered, it is added to the mental map and we resume the previous process of working through the code.
Context switches are dangerous, and should be avoided as much as possible. I know this because it was a central theme of my PhD.6 Perhaps exploring this sort of thing would be overkill for a chapter about the benefits of formatting code. Yet this chapter also ignores considerable amounts of other, more directed academic research into the relationship between code formatting and readability in favour of personal experience and anecdotes. I think more could have been done here.
Returning to the importance of “Team Rules”, my personal rule is to go a step further. I don’t know of any programming language that doesn’t have either an officially-endorsed or a de facto standard coding convention. It is my opinion that all projects should aim to use the coding conventions of the language. Otherwise all the training material, StackOverflow posts, etc. will be written in an unfamiliar format. New team members will have to learn an alien coding convention for a language they feel they already know. Of course, sometimes the standard conventions can be very silly, so exceptions can be made.7
What kind of formatting rule here do you find most beneficial, and why?
The most beneficial rule I see here is the comments about where to put variables. Group class variables at the top of the class. Local variables go next to where they’re used as much as possible. The data owned by a class is a critical ‘conceptual’ group. Spreading this out over a class is a cardinal sin as far as I’m concerned, and unfortunately one I see far too often.
One notable problem with this is anonymous classes, especially ones that are class variables. This can lead to putting class logic above the constructors, which violates Martin’s rules about code being written in the order that its called. I generally try to avoid anonymous classes for this reason, although I’ve found that Java’s lambda expressions often allow me to abstract this away into code that calls a method in the class instead, which isn’t so bad (as shown below).8
private final HumanityManager humanityManager;
private final HumanityThreatListener threatListener = this::destroyAllHumans;
public SkynetDefenceManager(HumanityManager humanityManager){
this.humanityManager = humanityManager;
humanityManager.addThreatListener(threatListener);
}
private void destroyAllHumans(){
humanityManager.destroy();
}
Are there any standard formatting rules, consensus, or auto-implemented adjustments by your formatter that you find actually make it harder to read code?
This one’s tricky as the things I might want to criticise here generally fall under language design or usage, rather than code formatting. I’m not a fan of some conventions, such as opening braces being on their own line. Even so, I try to use the conventions of the language I’m working in. The languages that I currently use day-to-day I’ve been using for so long that I struggle to think of anything that bothers me.
I can instead offer a little story. I once had some code in a simple ternary block. I can’t remember the exact code, but it was probably something like this:
return condition ? a & (b | c) : 0;
I had compiled this code, tested it, and everything worked. I formatted the code and submitted it to the VCS. Later, my tech lead pulled me aside and said “Hey, the code you committed is broken, can you check this line?”. The code read as follows:
return condition ? (a & b | c) : 0;
I was pretty confused, because I knew that was wrong. I checked the code out, fixed the bug, reformatted, and - wait a minute, the bug is back! It turned out that the formatter had some rules about parentheses around long expressions in ternary operators that were getting confused by the presence of bitwise operators. The result of this confusion was the moving of the left parenthesis, changing the order of the operations. In other words, the formatter was adding a bug.
I think that’s all I have to say about this chapter. Unusually, I think I agree with everything Martin says in this chapter. I just think it’s a little bit “surface level” and could have benefitted from a bit more of a deep-dive, or at least more backup from academic sources about the costs of poor code formatting.
-
Martin, R. C.(2009). Clean Code: A Handbook of Agile Software Craftsmanship. Upper Saddle River, NJ, Prentice Hall. ↩
-
Readability Rex? ↩
-
From the class
FilenameUtils.java
from Apache commons-io, available on GitHub. ↩ -
Or Accessors and Mutators if you prefer, although I know no-one who does. ↩
-
In my opinion, yes, absolutely! It’s a question worth asking, but if you are maintaining a class you should be able to figure out what most or all of it does before you start hacking away. ↩
-
If terminally bored you can read my PhD Thesis, and all my published papers, over on my publications page. The Literature Review chapter of my PhD Thesis covers context-switching in great detail. ↩
-
Such as Python’s insistence on using 4 spaces as indentation and forbidding tabs. ↩
-
Obviously this is something that could always be done, but it’s very ugly without lambda expressions. ↩