3 Steps to Understanding Ungrokkable Legacy Code

On Friday I had to make some changes to old code that nobody really understood. Nobody liked to go into this code, so nobody was familiar with it. I too dreaded changing it, and had only ever dipped my toes in far enough to learn what I needed to know.

Well, now I needed to understand it better in order to fix a bug. It turned out to be a good opportunity to learn what this code was doing and to make it easier to understand.

Here are some tips for understanding legacy code that is hard to grok:

Step 1. Print out the code.

Sometimes the code you face is so gnarly that you just have to print out the sucker. Getting it onto paper lets you lay it out on a table and get a sense of what it is trying to do. So I printed it out. The code itself wasn’t very long – about 10 pages or so – but it was extremely confusing. When I laid the pages out on the table, though, I could start to get a handle on it. I could make connections across the 3-deep class hierarchy and see what was overriding what.

Step 2. Tidy up the code.

Tidying up whitespace and fixing the style of the code is a great, low-investment way to get familiar with the code. This is a tip I picked up from the interview with Douglas Crockford in the book Coders at Work (a fascinating book of interviews with famous coders, btw):

Seibel: “How do you read code you didn’t write?”

Crockford: “By cleaning it. I’ll throw it in a text editor and start fixing it. First thing I’ll do is make the punctuation conform; get the indentation right, do all that stuff.”

So fix up the superficial things, just so you can start getting your hands (a little) dirty working with the code.

Step 3. Make the code easier for yourself and others to understand.

What I mean here is adding documentation, and especially renaming variables, methods, and classes so they are easier to grok. For example, renaming $compiled to $configValues made a huge difference in the understandability of the code, and I made 13 other renames like that. There is power in names – I have a hunch that design is nothing more than the art of precise naming (if you know of a paper, perhaps in linguistics, that backs up that statement, I’d be interested – let me know).
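A minimal sketch of that kind of rename, written in Python rather than PHP. The function names and the parsing logic are hypothetical, made up for illustration; only the idea of renaming compiled to something like configValues comes from the story above.

```python
# Hypothetical example: the names and logic here are NOT from the real
# codebase. Only the "compiled" -> "config values" rename mirrors the post.

def load_before(raw):
    # Before: "compiled" says nothing about what the dict holds.
    compiled = {}
    for line in raw.splitlines():
        if "=" in line:
            k, v = line.split("=", 1)
            compiled[k.strip()] = v.strip()
    return compiled


def parse_config_values(raw_config_text):
    """Parse 'key = value' lines into a dict of config values."""
    # After: same behavior, but every name states what it holds.
    config_values = {}
    for line in raw_config_text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            config_values[key.strip()] = value.strip()
    return config_values
```

The two functions do exactly the same thing; the second just no longer makes the reader guess. Running both on the same input and comparing the results is a cheap way to check that a rename changed nothing but the names.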

Having printed out the code, tidied it up a little, and especially documented it and renamed things to be easier to understand, I now understood the dragon that we had long feared, and had made it grokkable for my teammates as well. I knew how to fix the bug and which part of the code I needed to change. The fix itself took about 20 minutes, including writing a unit test.

It feels good to slay dragons.

What tips do you have for dealing with obscure legacy code?

6 thoughts on “3 Steps to Understanding Ungrokkable Legacy Code”

  1. I have trouble remembering what I was doing with my own code in our unit test system. So, I’ve recently taken to writing my software in a literate style. See literateprogramming.com for resources on it. Unlike what Knuth says, you don’t need to write your code as an essay; it’s completely sufficient to just write your documentation as though you were giving a code review/walkthrough to someone else. Because, in fact, that’s precisely what you’re doing.

  2. Cool stuff, Sam. I’m a fan of literate programming, and have written one literate program. The only qualm I have with it is that it adds a compilation step (to convert the literate program into executable code – PHP in my case). But it is a pleasure to write and to read.

  3. Like anything else, it’s an engineering decision. I find the relative time saved by using literate programming more than compensates for the time lost in the extra compilation step. 🙂

  4. great post, jon. as i believe you may know i am a great fan of scm logs. those combined with the issue tracker can often give a sense of understanding. though they can also lead to confusion 🙂

  5. This is where I find visualization tools, such as UML diagrams, become extremely powerful. While I prefer tools that generate these for you when trying to grok a large system, the act of creating either a class diagram or sequence diagrams generally means that you not only look at enough of the code to really understand what’s going on, but also have a visual representation of such for future reference.

    In the .NET space, NDepend (http://ndepend.com) was also a great tool. (I don’t know if there’s a Java version; I doubt there’s a PHP version.) It generated a dependency graph at the library and class levels, as well as providing a wealth of data on method dependencies, code metrics, naming conventions, etc.
