Wednesday, October 12, 2016

Unintentionally Obfuscated: Dealing with Challenging Legacy Code

I recently had to deal with some legacy code with significant performance issues. It was more challenging than I thought it should have been to fix the issues and I was reminded throughout the process of how relatively simple good practices could have made the code easier to fix and might have even helped avoid the troublesome code from being written in the first place. I collect some of those reminders and reaffirming observations in this post.

I was having difficulty following what the code was doing and realized that some of the difficulty was due to the naming of the methods and parameters. Several methods had names that implied the method had a single major responsibility when it actually had multiple responsibilities. There are advantages to a method having only a single responsibility, but even methods with multiple responsibilities are more readable and understandable when their name appropriately communicates the multiple responsibilities. As I changed the method names to reflect their multiple responsibilities, I was able to more clearly understand all the things they were doing and, in some cases, break the methods up into separate methods that each had a single responsibility and together implemented the multiple responsibilities of a single method.

Another issue that reduced readability of the code and added to confusion was out-of-date comments. These comments were likely correct at one point, but had become obsolete or incorrect and often described something quite different than what the code was actually doing. As I removed some of these comments and corrected and updated others, the code was easier to read because the code and comments were in harmony. I tried to only leave comments that are necessary because the code itself does not provide an easy way to express the behaviors.

Another common characteristic of this challenging piece of legacy code was the use of several overloaded methods. The types of the parameters in some cases were exactly the same, just in different orders to allow the methods to be overloaded. The methods were far clearer in terms of intent and in terms of differentiation between them when I broke the "clever" use of overloaded methods and named the methods more appropriately for what they were doing. Part of this was changing the method names to reflect the entire reason for the particular ordering of the parameters in the first place. For example (these are conceptual illustrations rather than the methods I worked on), if the methods were to covert an instance of ClassA to an instance of ClassB and vice versa, I changed the method names and signatures from convert(A, B) and convert(B, A) to convertAToB(A, B) and convertBToA(B, A). This made the calling code much more readable because one did not need to know that the local variables passed to these respective methods were of type A or of type B to know which method was being called.

The following summarizes the relatively small issues that together made this a difficult piece of code to understand and fix:

  • Methods doing too many things (too many responsibilities)
  • Poorly named methods, arguments, and local variables that insufficiently described the constructs or even incorrectly described them
    • Overloaded methods with same arguments but very different functionality added to confusion.
    • A colleague remarked that it would have been better if the methods and variables had been named something nonsensical like "fred" and "bob" than to have their incorrect names that seemed explanatory but ended up being misleading.
  • Obsolete and outdated comments that were now misleading or blatantly incorrect

Each of these issues by itself was a small thing. When combined in the same area of code, however, they compounded the negative effect of each other and together made the code difficult to understand and therefore difficult to change or fix. Even with a powerful IDE and unit tests, this code was difficult to work with. Making the code more readable was a more effective tool in diagnosing the issue and fixing it than even the powerful IDE and unit tests. More importantly, had this code been written in a more readable fashion in the first place, it would more likely have been implemented correctly and not required a fix later.

No comments: