Thursday, September 9, 2021

Surprisingly High Cost of Java Variables with Capitalized Names

I've read hundreds of thousands, or perhaps even millions, of lines of Java code during my career as I've worked with my projects' baselines, read code from open source libraries I use, and read code examples in blogs, articles, and books. I've seen numerous conventions and styles represented in the wide variety of Java code that I've read. However, in the vast majority of cases, the Java developers have used capitalized identifiers for classes, enums, and other types, and have used camelCase identifiers beginning with a lowercase letter for local variables and other types of variables (fields used as constants and static fields have sometimes had differing naming conventions). Therefore, I was really surprised recently when I was reading some Java code (thankfully not in my current project's baseline) in which the author had capitalized both the types and the identifiers of the local variables used in that code. What surprised me most was how difficult this small change in approach made reading and mentally parsing that otherwise simple code.

The following is a representative example of the style of Java code that I was so surprised to run across:

Code Listing for DuplicateIdentifiersDemo.java

package dustin.examples.sharednames;

import java.util.Date;
import java.util.List;
import java.util.concurrent.TimeUnit;

import static java.lang.System.out;

/**
 * Demonstrates ability to name variable exactly the same as type,
 * despite this being a really, really, really bad idea.
 */
public class DuplicateIdentifiersDemo
{
    /** "Time now" at class initialization, measured in milliseconds. */
    private final static long timeNowMs = new Date().getTime();

    /** Five consecutive daily instances of {@link Date}. */
    private final static List<Date> Dates = List.of(
            new Date(timeNowMs - TimeUnit.DAYS.toMillis(1)),
            new Date(timeNowMs),
            new Date(timeNowMs + TimeUnit.DAYS.toMillis(1)),
            new Date(timeNowMs + TimeUnit.DAYS.toMillis(2)),
            new Date(timeNowMs + TimeUnit.DAYS.toMillis(3)));

    public static void main(final String[] arguments)
    {
        String String;
        final Date DateNow = new Date(timeNowMs);
        for (final Date Date : Dates)
        {
            if (Date.before(DateNow))
            {
                String = "past";
            }
            else if (Date.after(DateNow))
            {
                String = "future";
            }
            else
            {
                String = "present";
            }
            out.println("Date " + Date + " is the " + String + ".");
        }
    }
}

The code I encountered was only slightly more complicated than that shown above, but it was more painful for me to mentally parse than it should have been because several of the local variables had exactly the same names as their respective types. I realized that my years of reading and mentally parsing Java code have led me to intuitively think of identifiers beginning with a lowercase letter as variable names and identifiers beginning with an uppercase letter as type identifiers. Although this type of instinctive assumption generally allows me to read code more quickly and figure out what it does, the assumption in this case was hindering me because I had to put special effort into thinking of some occurrences of "String" and "Date" as variable names and other occurrences as class names.

Although the code shown above is relatively simple, the unusual naming convention for the variable names makes it more difficult to read than it should be, especially for experienced Java developers who have learned to quickly size up code by taking advantage of well-known and generally accepted coding conventions.
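For comparison, here is a rough sketch of the same logic written with conventionally named variables. The class name ConventionalNamesDemo and the helper method describe are my own hypothetical names for illustration and were not part of the code I ran across; the behavior is intended to match the listing above.

```java
import java.util.Date;
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * Same logic as DuplicateIdentifiersDemo, but with variable names that
 * begin with a lowercase letter. The names here are hypothetical.
 */
class ConventionalNamesDemo
{
    /** Describes where the given date falls relative to the reference date. */
    static String describe(final Date date, final Date dateNow)
    {
        if (date.before(dateNow))
        {
            return "past";
        }
        else if (date.after(dateNow))
        {
            return "future";
        }
        else
        {
            return "present";
        }
    }

    public static void main(final String[] arguments)
    {
        final long timeNowMs = new Date().getTime();
        final Date dateNow = new Date(timeNowMs);
        final List<Date> dates = List.of(
            new Date(timeNowMs - TimeUnit.DAYS.toMillis(1)),
            new Date(timeNowMs),
            new Date(timeNowMs + TimeUnit.DAYS.toMillis(1)));
        for (final Date date : dates)
        {
            // Lowercase "date" is the variable; capitalized "Date" is the type.
            System.out.println("Date " + date + " is the " + describe(date, dateNow) + ".");
        }
    }
}
```

With this naming, every capitalized identifier in the loop is a type and every lowercase one is a variable, so the assumption I described above works in the reader's favor again.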

The Java Tutorials section on "Java Language Keywords" provides the "list of keywords in the Java programming language" and points out that "you cannot use any of [the listed keywords] as identifiers in your programs." It also mentions that true, false, and null, which are literals rather than keywords, cannot be used as identifiers either. Note that this list of keywords includes the primitive types such as boolean and int, but does not include identifiers of reference types such as String, Boolean, and Integer.
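A small sketch may make the distinction more concrete. The class and method names below (LegalButConfusingIdentifiers, sum, greeting) are hypothetical and exist only for this illustration. The local variables named after reference types compile, while the commented-out declaration that uses a keyword as a name would not.

```java
/**
 * Shows that names of reference types are legal variable identifiers,
 * while keywords such as "int" are not. All names here are hypothetical.
 */
class LegalButConfusingIdentifiers
{
    static int sum()
    {
        int Integer = 40; // legal: "Integer" is a type name, not a keyword
        int Boolean = 2;  // legal for the same reason, and just as confusing
        // int int = 1;   // would not compile: "int" is a keyword
        return Integer + Boolean;
    }

    static String greeting()
    {
        String String = "hello"; // type "String" and variable "String" coexist
        return String;
    }

    public static void main(final String[] arguments)
    {
        System.out.println(greeting() + " " + sum());
    }
}
```

That the compiler accepts these declarations is exactly why a naming convention, rather than the language itself, has to carry the burden of keeping types and variables visually distinct.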

Because nearly all of the Java code that I had read previously used lowercase first letters for non-constant, non-static variable names, I wondered if that convention is mentioned in the Java Tutorials section on naming variables. It is. That "Variables" section states: "Every programming language has its own set of rules and conventions for the kinds of names that you're allowed to use, and the Java programming language is no different. ... If the name you choose consists of only one word, spell that word in all lowercase letters. If it consists of more than one word, capitalize the first letter of each subsequent word. The names gearRatio and currentGear are prime examples of this convention."
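As a minimal sketch of how that convention looks in practice, the following class uses the tutorial's gearRatio and currentGear names. The class GearNamingExample, its methods, the speed field, and the specific values are my own assumptions for illustration; the constant is written in all uppercase with underscores, which is the common convention for constants.

```java
/**
 * Illustrates the variable naming convention quoted above.
 * The class, its methods, and its values are hypothetical.
 */
class GearNamingExample
{
    /** Constant: all uppercase letters, with words separated by underscores. */
    static final int NUM_GEARS = 6;

    /** One-word name: spelled in all lowercase letters. */
    private int speed = 0;

    /** Multi-word names: first letter of each subsequent word is capitalized. */
    private int currentGear = 1;
    private double gearRatio = 2.5;

    /** Shifts up one gear unless already in the highest gear. */
    void shiftUp()
    {
        if (currentGear < NUM_GEARS)
        {
            currentGear++;
        }
    }

    int getCurrentGear()
    {
        return currentGear;
    }

    @Override
    public String toString()
    {
        return "speed=" + speed + ", currentGear=" + currentGear + ", gearRatio=" + gearRatio;
    }
}
```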

Conclusion

I've long been a believer in conventions that allow for more efficient reading and mental parsing of code. Running into this code with capitalized first letters for its camelCase variable names reminded me of this and has led me to believe that the greater the general acceptance of a convention for a particular language, the more damaging it is to readability to deviate from that convention.
