Does AI Make Open Source Like Training Your Replacement?
Training Your Replacement
Since the dawn of the outsourcing revolution, one of the most hated tasks in IT has been getting told by your manager to "train your replacement." It's an experience fraught with frustration and resentment, because it feels like you're being asked to undermine your own job security. This unsettling task forces professionals to pass their hard-earned knowledge and expertise to someone else who will do their job at a fraction of the cost, leaving them to face an uncertain future and a likely layoff.
AI Can Be The “Replacement” for Software Developers…
As worried as software developers profess to be about AI replacing them, why don’t they pay more attention and draw lessons or at least some inspiration from the “train your replacement” phenomenon?
Maybe when it comes down to it, they aren’t really that worried. In fact, there are strong arguments not to worry about AI wiping out massive numbers of software development jobs. I’ve dabbled in this in some previous posts and may come back to it later. (Spoiler alert: For the most part, I think the risk to developer jobs is overblown.)
One of the biggest criticisms of LLMs as they are practiced today is whether AI providers should need explicit permission to train on data from the Internet. I’m not a lawyer, but I do know that there are lawyers actively exploring whether or not LLMs are creating “derivative” works and committing all manner of trademark, copyright, and license violations in the process. Who knows how long it will take for courts to hash this out. But some content providers aren’t waiting to find out and instead are taking active legal and technical measures to restrict training AI models on their content.
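To make the “technical measures” part concrete, here is a minimal sketch of one common approach: a robots.txt file that opts a site out of known AI-training crawlers. The user-agent tokens below (GPTBot for OpenAI, CCBot for Common Crawl, Google-Extended for Google’s AI training) are real published tokens, but note that honoring robots.txt is voluntary on the crawler’s part; this restricts well-behaved bots only.

```text
# robots.txt: disallow known AI-training crawlers site-wide.
# Compliance is voluntary; these rules bind only well-behaved bots.

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Everything else (ordinary search indexing, etc.) remains allowed.
User-agent: *
Allow: /
```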
But who is NOT doing this and instead is doing the exact opposite? Software developers!
Trained By Software Developers Themselves…
Where do you think ChatGPT learned how to write Python code to integrate AWS SQS with Azure CosmosDB and build the result into a Docker container? That’s right, it derived it by cleverly recombining bits and pieces its model ingested by reading a massive amount of code that you, my software developer friends, posted to GitHub and StackOverflow. I shouldn’t have to point out that you most likely didn’t get compensated for posting that code.
But it’s open source, you say, so you never expected to get compensated for it. You did it for the love, or the reputation, or because you were bored on Saturday, or whatever. Fair enough. Five years ago, that was probably a fine answer anyone could accept without much question.
I shouldn’t have to point this out to technologists, but what’s different today is that
Your code being freely available on the Internet is probably helping advance systems that may actively threaten your livelihood.
But Only Because They Allow It
If you are really in the AI-will-take-our-development-jobs camp, this should change your cost-benefit analysis around open source and even forum participation. I realize how awful and harsh that will sound to many of my readers. After all, many developers enjoy community as well as career benefits from making OSS contributions.
I’m not going to suggest you quit posting code to GitHub. But again, if you’re truly concerned about AI taking your job and you’re publishing an interesting project or piece of code, you might want to at least consider explicitly reserving some rights beyond what copyright might or might not already provide you.
Indeed, some people have thought of this, as evidenced by the existence of a collection of “non-AI licenses” on GitHub, which sits at 168 stars as of this writing. It is exactly what it sounds like:
“This repository contains templates for software and digital work licenses that restrict software and other work from being used in AI training datasets or AI technologies.”
I suppose you could argue that such a license doesn’t technically meet the spirit of an “open-source” license, but that’s highly pedantic, so let’s not discuss it here. The point is that if you have a piece of code you want to publish for whatever reason but would prefer not to have ingested by AI models, it might be time to consider placing some additional restrictions on its use.
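In practice, applying such a license mostly means a LICENSE file plus a notice at the top of each source file. The sketch below is a placeholder, not the text of any actual template; the license name, copyright holder, and wording are all hypothetical, so substitute the real template you choose:

```text
/*
 * Copyright (c) 2024 <your name>
 *
 * Licensed under a "non-AI" license template (placeholder; use the
 * actual text from whichever license collection you choose).
 * In plain terms: free to use, copy, and modify, but NOT to be
 * included in datasets used to train AI or machine-learning models.
 */
```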
Your Code, Your Future
During the outsourcing revolution, when your manager asked you to “train your replacement,” you probably didn't have much of a choice. But when it comes to training AI, you do. As software developers, you hold the power to decide how your contributions are used. Before sharing your code, consider the potential implications and whether you want to take steps to protect your work from being used to train AI systems that could eventually replace your job. You have the ability to shape the future of your industry, so make informed choices about how you share your expertise.