The end of test-driven development

In my sophomore year at university, I was introduced to test-driven development (TDD). This approach was mentioned frequently throughout my studies and was encouraged in various classes. During my career at Microsoft, I was one of the few engineers in my organization who regularly used TDD. Interestingly, another was my officemate, Bill. Despite this, I wasn't particularly strict about it.

According to some recent estimates, about one in four engineers uses TDD, a figure that aligns with Google Trends data. Over the past few years, the adoption rate has ranged from 20% to 25%. Based on my recent experience, I believe TDD's popularity will decline.

The rise of LLMs in software engineering

I'm sure you're aware of the increasing use of Large Language Models (LLMs) in software engineering. As a software engineer, you've probably experimented with – if not regularly used – LLMs when writing code, whether through copilots or chat interfaces.

The bad news? Integrating LLMs into a coding workflow often renders TDD irrelevant, or even incompatible with that workflow.

The good news? LLMs excel at generating test cases and assisting with test creation, especially for unit tests of well-scoped code. Depending on the model and context, they can handle larger parts of a codebase as well.

My experience using LLMs for test generation

I enjoy thinking about test cases and emphasize comprehensive tests during code reviews. It's crucial to consider testability early in the development process to structure the code accordingly. However, writing tests can be tedious, involving repetitive tasks like tweaking mocks or fixing test race conditions.
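As a small illustration of what structuring code for testability can mean, here is a minimal sketch. The function `is_expired` and its `now` parameter are hypothetical names invented for this example, not code from any project mentioned here. Passing the clock in as an argument lets a test pin the time instead of patching globals, which is one way to cut down on mock tweaking.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


def _utc_now() -> datetime:
    """Default clock: the current time in UTC."""
    return datetime.now(timezone.utc)


def is_expired(
    created_at: datetime,
    ttl: timedelta,
    now: Callable[[], datetime] = _utc_now,  # injectable clock (assumption for the example)
) -> bool:
    """Return True once `ttl` has elapsed since `created_at`."""
    return now() - created_at >= ttl
```

In a test, `now` can be replaced with a function that returns a fixed timestamp, so the result no longer depends on when the test runs.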

In recent months, I've been experimenting with generating tests using LLMs. Here's my workflow:

1. Write the code, whether it's an API, utility function, library, or React component.
2. Copy and paste the code into an LLM, asking it to generate unit tests. I’ve tested various prompts and found little difference between simple and detailed ones.
3. Analyze, try out, and debug the generated tests.
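To make the workflow concrete, here is a hypothetical example. The `slugify` function is invented for illustration, and the tests are hand-written in the style of what an LLM tends to return; neither comes from a real session. The point is the shape of the result: a handful of core cases plus edge cases, with very little setup.

```python
import re


def slugify(title: str) -> str:
    """Lowercase `title` and join its alphanumeric runs with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


# Tests in the style an LLM might generate after being shown `slugify`.
def test_basic_title():
    assert slugify("Hello World") == "hello-world"


def test_punctuation_and_whitespace_are_dropped():
    assert slugify("  TDD: dead, or alive?  ") == "tdd-dead-or-alive"


def test_empty_string():
    assert slugify("") == ""


def test_only_symbols():
    assert slugify("!!!") == ""
```

Saved in a file whose name starts with `test_`, these run with pytest. Step 3 of the workflow still applies: each case needs to be read and checked against the intended behavior rather than accepted as-is.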

It's rare for the LLM-generated tests to be perfect on the first attempt. Most errors stem from misused dependencies or improper setup. However, the core test cases and repetitive boilerplate code are usually solid.

I use both ChatGPT (GPT-4) and Claude 3.5 Sonnet, combining their outputs for the final result. ChatGPT often generates complex, less error-prone tests, while Claude's tests are simpler but require more tweaking. Typically, my final test files are about 70% Claude and 30% ChatGPT.

Why TDD is dying

LLMs are becoming more integral to programming. If we assume all code will eventually be AI-generated, the need for human-written tests diminishes. AI can write both the code and the tests, potentially making tests unnecessary over time.

Even without fully automated coding, in a world where business logic is co-written by humans and AI, the tests can definitely be automated. The inverse is less likely: humans will not write the tests while AI writes all of the business logic.

Some people really enjoy breaking things, but many more prefer building. Historical trends in software engineering vs. QA roles, and even the TDD statistics mentioned earlier, show an appreciation for testing but less enthusiasm for creating tests. Many engineers view comprehensive test writing as a necessary evil.

Productivity gains from AI

As a former executive, I was often asked about the productivity impact of AI tools like Copilot. Based on my experience:

Code writing: Tools like Copilot (or Supermaven, which I prefer) are invaluable, akin to ReSharper for C# at Microsoft. These tools significantly boost productivity by auto-completing increasingly complex code with high accuracy. It’s essential for companies to invest in such tools for their engineers.
Test writing: LLMs save me O(hours) each week, especially in greenfield projects.
Resource: LLMs are excellent for learning new tools and languages, saving me O(hours) each week.
Debugging: LLMs are less helpful for complex debugging, but can catch simple bugs. I've probably wasted O(minutes) each week.

Overall, LLMs improve my productivity by about 4-6 hours each week, roughly a 12% increase [0].
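The percentage can be reproduced from the assumptions in footnote [0]. The sketch below is plain arithmetic on the figures stated in this post; the variable names are mine, and pairing the low estimates together and the high estimates together is an assumption about how the range was meant.

```python
# Figures taken from the post: 50-60 hours per week, 70% of it spent building,
# and 4-6 hours saved per week.
week_hours = (50, 60)
builder_share = 0.70
saved_hours = (4, 6)

# Hours per week actually spent building: about 35 to 42.
builder_hours = tuple(h * builder_share for h in week_hours)

# Relative gain at the low end (4 of 35 hours) and the high end (6 of 42 hours).
low_gain = saved_hours[0] / builder_hours[0]
high_gain = saved_hours[1] / builder_hours[1]

print(f"{low_gain:.1%} to {high_gain:.1%}")
```

Under these assumptions the gain falls between roughly 11% and 14%, which brackets the figure of about 12% quoted above.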

Summary

Use a copilot: It’s a red flag if a company doesn't invest in this for its engineers.
Leverage LLMs: Treat them as a helpful coworker always available for assistance.
Automate repetitive tasks: Use LLMs to generate mundane code based on precise instructions (e.g. tests based on code, or a client based on a thorough spec).
Think longer-term: Current models are the worst we’ll ever use, so future improvements are promising.

[0]: Assuming 50-60 hours/week × 70% builder time