GenAI and the Golden Age of Legacy Code
Introduction
Software developers are increasingly turning to generative AI to streamline the development process. GenAI technologies can be helpful for automating repetitive tasks, generating code snippets, and offering intelligent insights, freeing up time for developers to focus on more creative and complex problem-solving.
I’ve heard people claim 20%, 50%, or even 70% improvements in developer productivity when using GenAI tools. Of course, despite many decades of software engineering research, in reality nobody has any idea how to quantify developer productivity, and anybody who claims they do is trying to sell snake oil to your C-Suite.
That said, I think we can all agree that at the very least, GenAI allows developers to produce more code in less time. But is more-code-less-time automatically as good as it sounds? Is it possible that in a year or two we might regret some of the code that we’re generating today?
Indeed, based on what I’ve seen so far, I worry that when used naively and in an undisciplined way, GenAI can lead to massive piles of technical debt.
Let’s first discuss why AI-generated code often exhibits the characteristics of legacy code the minute it comes out of the chatbot, and then consider a few mitigations that your organization should implement today.
The Room-Building Robot
Imagine I live in a home with fantastic architecture and interior design. The rooms are functional and well-laid-out, the paint and carpet nicely complement each other, and the quality and craftsmanship of the construction is top-notch. When I first purchased this home, it met all of my needs. But my needs change over time, and it turns out that I need to add some new rooms.
Fortunately, I just purchased a room-building robot. All I need to do is tell the robot what kind of room I would like (optionally providing a few additional parameters and constraints) and the robot adds the room to my home.
One day, I decide I need a gym room to get into better shape. So I tell the robot to “add a gym room to my home”, and just like that, the robot builds me a gym. Wow! That certainly feels more “productive” than hiring and scheduling an architect and a bunch of contractors to build it. Of course, the gym is hexagonal, far larger than necessary, with lime-green walls, and the entrance is off the master bedroom closet.
If I were thinking ahead, I might have told the room-building robot exactly where to locate the gym room, how large I would like it to be, and to match the overall interior design of the rest of the home. That would have been a modest bit more planning ahead for me. But for now I don’t really care because I can immediately start working out. Like I said, “productive”.
The experience with the gym room went so well that I decide to do a bit more expansion. I add a pool room, a new bedroom suite, a wet bar, and a library. A year later, I even add another gym (having forgotten about the first one after growing fat and lazy). These rooms are all functional, but they all have different shapes, design schemes, and rather random locations around the home.
The first problem is that I have to clean all of these rooms. This isn’t “complex” per se since cleaning one room is more or less like cleaning any other. But as time goes on, I notice that there are…just a lot of rooms. It turns out that my ability to rapidly add rooms to my home has increased my cleaning burden to the point where I’m finding myself less productive in other areas.
The second problem is that, as anybody who has ever owned a home can tell you, eventually you have to perform home maintenance. Because my robot-generated rooms use all kinds of different designs and materials, I can’t reuse much from one room to the next when doing maintenance. I need to buy 20 different colors of paint, 17 different kinds of carpet, and multiple brands of replacement windows. I can’t get the best deal from any given vendor, I might need to deal with multiple installers and contractors, and I end up with a lot of material overage.
And what am I going to do with the extra gym?
Eventually I’m so frustrated by the effort and added expenses that I have to do a major remodel. It’s hard to deny that the room-building robot made me productive. It also built me a “legacy home”.
The Code-Building Robot
I’ve already seen some of the dynamics of the room-building robot when GenAI is used for software development. Probably the most typical GenAI prompt pattern for software developers is, “write me a function in <your programming language here> that…”. It’s not “write me a program…” or even “write me a package/module…”.
In isolation, having GenAI write you a function is fine. In the context of a large program, it’s a potential tech debt disaster.
Why is this the case? It comes down to context. And in most cases, the current generation of GenAI tools isn’t working with very much of it. Your code-generating robot is working with the trained model (from a lot of open-source code and forum posts), what you tell it in the prompt, maybe some information from earlier in the chat, and maybe some fairly limited context from the code repository you’re working in.
In other words, GenAI is making mostly local decisions. If you want maintainable code, you need to make global decisions based on the entirety of your code base and your organization’s coding standards.
Here are some specific examples of things I’ve seen in GenAI-generated code that I would prefer not to see creep into a long-lived enterprise code base:
Different libraries that accomplish the same tasks (multiple JSON parsers, string utils, etc.).
Different versions of the same library (e.g. AWS SDK v1 and v2).
Different API and library calling patterns.
Different error-handling idioms, naming conventions, comment styles, etc. (basic coding style stuff).
Different log message formats.
Large volumes of duplicate code.
Again, we’ll stipulate that the GenAI code works and the software developer certainly felt more productive when integrating it into the codebase. None of these things “break” the program. But they’re clearly technical debt the second they’re committed, and we will likely regret these issues when we have to maintain the code 6 months or 6 years down the road.
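Some of the items in that list can be spotted mechanically. As a rough illustration, here is a minimal Python sketch that flags a codebase importing more than one library for the same job. The EQUIVALENT_LIBRARIES table and the function names are my own assumptions for the example, not part of any existing tool, and a real audit would need a grouping that matches your own stack.

```python
import ast
from collections import defaultdict

# Hypothetical grouping of libraries that do the same job.
# The categories and members are illustrative; adjust them for your stack.
EQUIVALENT_LIBRARIES = {
    "json parsing": {"json", "simplejson", "ujson", "orjson"},
    "http client": {"requests", "httpx", "urllib3"},
}


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x"
            modules.add(node.module.split(".")[0])
    return modules


def find_overlaps(sources: dict[str, str]) -> dict[str, dict[str, list[str]]]:
    """Report categories where more than one equivalent library is in use.

    `sources` maps a file name to its source text. The result maps each
    overlapping category to {library: [files that import it]}.
    """
    usage = defaultdict(lambda: defaultdict(list))
    for filename, source in sources.items():
        for module in imported_modules(source):
            for category, libraries in EQUIVALENT_LIBRARIES.items():
                if module in libraries:
                    usage[category][module].append(filename)
    return {
        category: {lib: sorted(files) for lib, files in libs.items()}
        for category, libs in usage.items()
        if len(libs) > 1
    }


if __name__ == "__main__":
    example = {
        "a.py": "import json\n",
        "b.py": "import orjson\nimport requests\n",
        "c.py": "from simplejson import loads\n",
    }
    for category, libs in find_overlaps(example).items():
        print(category, "->", libs)
```

A check like this only catches the library-overlap item. Calling patterns, error-handling idioms, and log formats need human review or more specialized tooling.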
Use the Code-Building Robot Correctly
If we want to use GenAI (we should!) and we want a maintainable code base (ditto!), what should we do to manage this situation?
One approach would be to hope that GenAI gets good enough fast enough to start identifying and fixing codebase-wide issues like these automatically. It very well may. But hope is not a strategy, as they say, so here are a few concrete actions you should take today to improve the use of GenAI by software developers in your organization:
Do code reviews (and do them well). You may have observed that all of the issues I mentioned above could be called out in a code review. Code that comes from a GenAI chatbot deserves extra review scrutiny. Make developers aware of what to look for.
Use traditional program analysis tools. Many of the issues that I mentioned above can also be caught (and sometimes automatically fixed) by traditional code formatters and linters. Make sure that every developer in your organization is using a standard toolchain that has these tools enabled, consistently configured, and dialed up to 11.
Use a better GenAI tool. Don’t cheap out and make your developers use vanilla ChatGPT. Get a developer copilot that is tuned for code generation. Get one that leverages your own codebase. Get one that can use embeddings rather than simple textual matching. If you want to see what kind of difference this can make, check out Sourcegraph Cody, which is checking off these boxes.
Tune the model. If you’re a somewhat larger and more sophisticated organization, you can consider tuning your own model to your own codebase and coding standards. Then you’ll be a lot more likely (albeit still not guaranteed) to get generated code that fits neatly into your codebase with less out-of-the-box tech debt.
Prompt better. You want code that uses a specific version of a library or SDK? You can tell the GenAI copilot. Of course, if you start to add even two or three of these constraints to every prompt, it starts to get very verbose, and you can’t count on developers to always do this correctly. So customize the GenAI tooling to build in your standards.
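To make the “build in your standards” idea concrete, here is a minimal sketch of a wrapper that prepends team standards to every request, so that developers don’t have to retype them. Everything in it is an assumption for illustration: TEAM_STANDARDS, its entries, and build_prompt are hypothetical names, and the resulting string would still have to be handed to whichever GenAI tool you actually use.

```python
from collections.abc import Sequence

# Hypothetical team standards. The entries are placeholders; replace them
# with whatever your organization has actually agreed on.
TEAM_STANDARDS: tuple[str, ...] = (
    "Use only libraries that are already listed in the project's dependency file.",
    "Follow the existing error-handling and naming conventions in the repository.",
    "Use the project's standard log message format.",
)


def build_prompt(request: str, standards: Sequence[str] = TEAM_STANDARDS) -> str:
    """Prefix a developer's request with the team's coding standards."""
    rules = "\n".join(f"- {rule}" for rule in standards)
    return (
        "Follow these project standards when writing code:\n"
        f"{rules}\n\n"
        f"Task: {request.strip()}"
    )


if __name__ == "__main__":
    print(build_prompt("write me a function that parses a config file"))
```

Keeping the standards in one shared place means they get updated once rather than in every developer’s head.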
Finally, a parting thought: A GenAI coding assistant is a useful tool for software developers, but it’s not their only tool. And tools should be part of a larger ecosystem that includes requirements management, code reviews, build/test/release processes, and operations. If you want to make the best use of GenAI, think about how it fits into the overall picture of your current developer productivity story.