Monday, June 29, 2009

Writing Code Is Much Like Writing Prose

There are many similarities when comparing the writing of code to the writing of prose. Because of this, we should be able to learn from doing each of these and apply things learned from one to the other.

The Absolutes

In writing code and in writing prose, there are a few things that are either absolute or approach very nearly the status of absolute. This is especially true for programming languages where syntactic and semantic rules must be followed for the code to be compiled and/or interpreted correctly. Even when a programming language supports certain generally frowned-upon features, some of these features are avoided to such a large degree that they almost appear absolute. For example, direct use of "goto" is generally frowned upon and is rarely seen in most code bases. However, less obvious versions of this (such as break and continue in Java) do seem to be less strictly avoided.

Although it is less strictly enforced in writing prose, there still is significant pressure to conform to certain absolutes even in writing prose. For example, it is generally assumed that most professional prose will include sentences that begin with capital letters and end with periods. Similarly, proper names are almost always capitalized and correct spelling and reasonable grammar are also expected. The degree of enforcement for such things in prose often depends on the media. Professional papers and articles typically are the most enforced with the author and professional editors investing significant effort into polishing the prose. On the opposite end of the spectrum are e-mail messages, blogs, and Twitter messages, which seem to have less enforced absolutes.


The Frowned-Upon

Although there are a small number of absolutes or near-absolutes as just discussed, code development and prose writing seem to have many more things that are not absolutely avoided but seem to be strongly discouraged. However, these things tend to creep in despite their negative reputations because they do offer some advantages. Usually the advantages these items offer are ease for the writer at the expense of later reader or maintainer having more difficult prose or code.

Many developers realize problems associated with using so-called "magic numbers." However, they still seem to crop up. Often they are put in place "temporarily" and then forgotten or limited schedules prevent replacing them with a constant. These "magic numbers" are quick and easy to use when doing initial development, but can lead to a maintenance nightmare. Global variables offer a similar trade-off of easy early development at the expense of maintainability, robustness, and scalability.

Prose authoring has similar frowned-upon, but still often used, features. For example, it is often said that sentences should not end with prepositions. Similarly, it is often said that strong, active voice should be used. The tense of the writing should also be consistent. These are all things that are recommended because they do offer recognized benefits, but they are also easy to ignore or cheat on a bit when it is not deemed worth the time or effort to satisfy all of them all of the time.


The Standards

Since nearly the beginning of software development, developers have seemed to want to create and adhere to coding standards and conventions. Of course, we also seem to have been resistant to other peoples’ ideas of standards and conventions for nearly as long. The reason most of us are willing to give up some "creative freedom" and adhere to standards and conventions is that we have learned that code is more readable and maintainable (especially by others) when we adhere to a minimum set of conventions.

Prose can benefit from the same benefits of standardization and convention. There are books such as Elements of Style and Chicago Manual of Style devoted to prose style. Most of the arguments in favor of these prose writing style conventions are the same arguments used in favor of coding conventions: easier to read and maintain and consistency to benefit different readers and authors/developers.

One style issue is very similar between prose writing and code development. The subject of spaces can be surprisingly controversial in both arenas. In software development, most developers seem to agree that the optimal number of spaces for indentation is between 2 and 4 spaces. However, trying to narrow down which of these is best (2 spaces or 3 spaces or 4 spaces) is significantly more difficult.

There has been a controversy in the prose writing world regarding how many spaces should follow a period that ends one sentence before the first letter of the next sentence. I grew up thinking that two spaces was the expected number of spaces between the end of one sentence and the beginning of the next sentence. However, I was recently informed by a prose reviewer and editor that a single space is now preferred. I wondered if this was a web browser-inspired shift, but there seems to be evidence that this shift started even before the widespread adoption of HTML.


Conciseness

There seem to be differences of opinion on whether prose should be concise or verbose. To some extent, this depends on the subject of the prose. I prefer technical prose to be as concise as possible while still remaining thorough. This is especially true of technical references. However, with novels, extra verbosity can sometimes be nice to explain the story and character development. This can even go too far, for my taste, as evidenced by Moby Dick.

I have found that even in software development there is a wide diversity of opinion about conciseness of code. The longer I work in the industry, the more I value conciseness. However, I know many Java developers who don’t like the same degree of conciseness that I like. An example of this is the Java ternary operator. This operator has really grown on me, but I still know many Java developers who do not care for it at all. Although many developers are migrating to programming languages that emphasize and value conciseness, there are limits to how concise we want to be. After all, none of us are probably too excited to write and maintain production code that can be sneakily squeezed into a single line.


Refactoring

"Refactoring" is a popular term in software development, but it does have its equivalent in prose writing. In my case, I find that when I revisit my own articles, I continually "refactor" the text to make it flow better, to reduce unnecessary repetition, and to make it generally more concise. The editors of formal articles often do this to an even greater degree. In fact, the editors’ reviews often remind me of how developers are eager to change others’ peoples’ code to match their own preferences. Some of the "refactorings" I see in both development and in article editing have marginal value. However, I think most of us can agree that some "refactoring" or editing is useful and recommended for writing code or for writing prose.

When the editors and reviewers at Oracle Technology Network recommended cutting my original draft of the Basic JPA Best Practices article to less than half its original draft size, it took some effort to "refactor" that article to that point. Although some minor details and some explanatory text were removed in the process, most of the substance was retained even though the final article had half as many words as my original draft. It was not trivial trimming that draft down without losing too much substantial content, but the effort the reviewers, editors and I invested is reflected in the improvements. That article still weighs in around 11 pages, but it is leaner, tighter, and more optimized than my original draft. That sounds awfully similar to the benefits of code refactoring. The process really was like refactoring because it was more than just removing words; it involved changing words and changing sentence structure and paragraph structure.

I don’t spend much time reviewing my blog posts at the time of their writing. I think this is common among blogs, though some are exceptions. Because of this, most of us expect blog posts to be rougher than formal articles. There are definitely different expectations for different forms of writing. Similarly in code, prototypes and demonstration code can often be a little "rougher around the edges" than highly reviewed and refactored production code.


More Knowledge Means More Expressiveness

When writing prose, one of the most useful techniques for writing concise but thorough prose is to know and carefully use the appropriate words and phrases. Words have different nuances and these nuances can be used to provide more expressiveness with the same number of words. In code, we see the same thing. There is often more than one way to get the job done, but thorough understanding of the language’s features and provided class libraries allows us to select the most appropriate language feature or class that provides the exact nuanced solution appropriate to the problem at hand.

When writing prose, it is common to use well-known idioms and phrases to imply much more than the few words would normally imply. For example, "a picture is worth a thousand words" consists of only seven words but implies much more than what we might say with only seven words. Some assumed knowledge is required (readers must be familiar with the idiom) to make this work. In code, we often use design patterns and other common phrases to succinctly describe much larger ideas that otherwise would require much more description.


The Value of Review

I have found that both code and prose that I write benefit when reviewed by someone else. When I write my own code or prose, I know what I am trying to say and it all makes sense. Reviewers of articles and reviewers of code can ask questions about what is intended and provide feedback that makes the code or article more generally appealing. Sometimes we’re too close to the product for our own good and the reviewer can help us to see things that we don’t see.


"Readable" is in the Eye of the Beholder

To some degree, what is "readable" depends on the person doing the reading. This applies to both prose and code. Readers (whether reading prose or someone’s code) have their own preferences. Just as we all like different prose authors’ writing, it is not surprising that we each find different styles of code easier or more difficult to read. For example, I have an easier time reading code written by people who have similar tastes and preferences to mine. For this reason, I don’t think we’ll ever see a single programming language or framework that everyone uses. There is just too wide of a spectrum of differences of opinion for any one language or framework to appeal to everyone. This is also an important observation to realize when writing code or prose. You can try to appeal to the widest set possible, but no matter what you do there will probably be at least a small group of people who don’t like it.


Conclusion

Writing prose and writing code have much in common. Many of the same techniques that make better prose also make better code. In both cases, knowing what one has to work with (vocabulary and common phrases for prose and language features and class libraries for code) can make it easier to write particular effective prose or code. Both types of writing also benefit tremendously from review. Many of the same controversies surround both types of writing.

No comments: