
New Study Measures What AI Coding Studies Ignore


We finally have research that asks the right question about AI coding tools.

Dave Farley, co-author of Continuous Delivery and host of the Modern Software Engineering channel, recently published findings from a pre-registered controlled experiment that measured something most AI productivity studies completely ignore: what happens when the next developer has to maintain AI-generated code.

This matters because maintenance accounts for 50-80% of total software ownership costs over a system's lifetime, which at the upper end of that range is three to four times the cost of initial development. Yet most AI coding studies stop at "did the developer finish faster?" That's measuring typing speed, not engineering impact.

The Study That Actually Simulated Real Work

This wasn't undergraduate students completing toy assignments. The research involved 151 participants, 95% of whom were professional software developers—a rarity in academic studies that typically rely on student populations because they're easier to recruit.

The experiment used a two-phase design that mirrors actual software development reality:

Phase One: Developers added features to buggy, unpleasant Java web application code. Some used AI assistants (GitHub Copilot, Cursor, Claude Code, ChatGPT). Others worked without AI.

Phase Two: A different set of developers was randomly assigned the code produced in Phase One and asked to evolve it, without knowing whether it was originally written with AI assistance or not. Crucially, no AI assistance was allowed in Phase Two.

This design isolates the key variable: how easy is AI-generated code for someone else to change later? That's the actual test of code health and maintainability.
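
To make the design concrete, here is a minimal Python sketch of that blinded random assignment. The names (Artifact, assign_phase_two, the repo IDs) are illustrative assumptions, not the study's actual tooling:

```python
import random
from dataclasses import dataclass

@dataclass
class Artifact:
    """A Phase One codebase handed off for Phase Two evolution."""
    repo_id: str
    ai_assisted: bool  # recorded for the analysis, never shown to Phase Two developers

def assign_phase_two(artifacts, developers, seed=42):
    """Randomly pair each Phase Two developer with one Phase One artifact.

    The pairing is blinded: developers see only the repo, not the
    ai_assisted flag, and they work without AI tools.
    """
    rng = random.Random(seed)
    shuffled = list(artifacts)
    rng.shuffle(shuffled)
    return {dev: art.repo_id for dev, art in zip(developers, shuffled)}

# Three blinded assignments (made-up data)
pool = [Artifact("repo-a", True), Artifact("repo-b", False), Artifact("repo-c", True)]
print(assign_phase_two(pool, ["dev-1", "dev-2", "dev-3"]))
```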

What They Measured (And Why It Matters)

The researchers didn't guess—they measured multiple dimensions of maintainability:

  • Time: How long the next developer took to evolve the code
  • Objective code quality: Using CodeScene's code health metric
  • Test coverage: Actual percentage of code under test
  • Perceived productivity: Using the SPACE framework

This multi-dimensional approach acknowledges that maintainability isn't a single magic number. Anyone claiming otherwise should be treated with suspicion.
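
As a rough sketch of what a single observation in such a dataset might look like, here is a hypothetical schema; the field names and values below are assumptions for illustration, not the study's actual data format:

```python
from dataclasses import dataclass

@dataclass
class MaintainabilityObservation:
    """One Phase Two data point across the four measured dimensions."""
    evolution_time_minutes: float  # how long the evolution task took
    code_health_score: float       # CodeScene code health (roughly a 1-10 scale)
    test_coverage_pct: float       # percentage of code under test
    space_productivity: float      # self-reported rating from the SPACE framework

# A single made-up observation
obs = MaintainabilityObservation(
    evolution_time_minutes=95.0,
    code_health_score=8.2,
    test_coverage_pct=61.5,
    space_productivity=4.0,
)
```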

The Findings That Challenge Both Hype and Fear

The headline result: There was no significant difference in maintenance cost between AI-generated and human-generated code.

From a downstream perspective, code written with AI assistance was neither harder nor easier to change, and neither worse nor better in quality. AI didn't break anything. Given the fear-mongering around "AI slop," that's a significant finding, and one that appears to be new to this research.

The expected result: AI users in Phase One were approximately 30% faster to reach a solution. Habitual AI users were closer to 55% faster. Yes, AI speeds up initial development. That's no longer controversial.

The interesting result: When experienced developers who already knew what they were doing used AI habitually, their code showed a small but measurable improvement in maintainability later on.

One explanation: AI tends to produce boring, idiomatic, unsurprising code. And boring code is maintainable code. Surprise is usually the enemy of maintainability.
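
A contrived illustration of that point (not from the study): both functions below compute the same word counts, but the boring version is the one the next developer will thank you for.

```python
import itertools
from collections import defaultdict

def word_counts_clever(posts):
    # Dense comprehension: correct, but surprising to the next reader.
    return {author: sum(len(text.split()) for _, text in group)
            for author, group in itertools.groupby(
                sorted(posts, key=lambda p: p[0]), key=lambda p: p[0])}

def word_counts_boring(posts):
    # Plain loop: idiomatic, unsurprising, and easy to change later.
    counts = defaultdict(int)
    for author, text in posts:
        counts[author] += len(text.split())
    return dict(counts)

posts = [("ana", "hello world"), ("bo", "hi"), ("ana", "more words here")]
assert word_counts_clever(posts) == word_counts_boring(posts) == {"ana": 5, "bo": 1}
```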


The Critical Caveat: AI Amplifies, It Doesn't Replace

What's absolutely clear from the research: AI does not automatically improve code quality. Developer skill matters more than AI usage.

As Farley notes, "AI code assistance acts as a kind of amplifier. If you're already doing the right things, AI will amplify the impact of those things. If you're already doing the wrong things, AI will help you to dig a deeper hole faster."

This aligns with recent DORA research on AI impact: tools amplify capability; they don't replace it.

Jason Gorman's breakdown of "doing the right things" in AI-assisted coding includes:

  • Working in small batches, solving one problem at a time
  • Iterating rapidly with continuous testing, code review, refactoring, and integration
  • Architecting highly modular designs that localize the blast radius for changes
  • Organizing around end-to-end outcomes instead of role or technology specialisms
  • Working with high autonomy, making timely decisions instead of escalating everything

In other words: fundamental software engineering discipline still matters—perhaps more than ever.

The Long-Term Risks Nobody's Measuring

The study authors highlight two slippery slopes toward disaster:

Code bloat: When generating code becomes almost free, teams generate far too much of it. Volume alone drives complexity, and AI makes it easier than ever to drown in your own codebase.

Cognitive debt: If developers stop thinking deeply about the code they create, understanding erodes, skills atrophy, and innovation slows. This long-term risk doesn't show up in sprint metrics.

What Marketing and Growth Teams Should Learn

If you're building marketing technology systems, internal tools, or automation platforms, this research offers practical guidance:

AI coding tools improve short-term productivity without damaging maintainability—when used by people who already understand good engineering practices. They don't remove the need for good design, decomposition skills, or hard thinking about problem-solving.

The real technical skill isn't typing speed. It's decomposition—breaking problems into small pieces that AI assistants can handle well, then guiding them toward solutions you're actually happy with.
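
Here is a hypothetical sketch of that decomposition skill; the task and function names are invented for illustration. Rather than asking an assistant for one monolithic importer, break the job into small, independently testable pieces and compose them:

```python
# Hypothetical example: importing a CSV of leads into a marketing system.
# Three small pieces are easier to prompt for, review, and test than one
# monolithic "import_leads" routine.

import csv
from typing import Iterable

def parse_rows(path: str) -> list[dict]:
    """Read raw rows from a CSV export."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def normalize(row: dict) -> dict:
    """Trim whitespace and lowercase email addresses."""
    return {
        "name": row.get("name", "").strip(),
        "email": row.get("email", "").strip().lower(),
    }

def deduplicate(rows: Iterable[dict]) -> list[dict]:
    """Keep the first occurrence of each email."""
    seen, result = set(), []
    for row in rows:
        if row["email"] and row["email"] not in seen:
            seen.add(row["email"])
            result.append(row)
    return result

def import_leads(path: str) -> list[dict]:
    """Compose the small pieces into the full workflow."""
    return deduplicate(normalize(r) for r in parse_rows(path))
```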

Need help building AI-assisted development practices that prioritize long-term maintainability? Winsome's growth experts help teams implement AI tools strategically—not recklessly.


Study Methodology, Approach, and Key Findings

Source: Dave Farley, Modern Software Engineering channel (February 2025)

Methodology

  • Participants: 151 total, 95% professional software developers (not students)
  • Design: Pre-registered controlled experiment with two phases
  • Technology: Java web application with realistic complexity
  • Phase One: Developers add features to buggy code; some use AI assistants (GitHub Copilot, Cursor, Claude Code, ChatGPT), others don't
  • Phase Two: Different developers randomly assigned Phase One code to evolve; no knowledge of whether AI-assisted; no AI tools allowed
  • Control: Variables measured rather than assumed

Measurements

  • Time to complete evolution tasks
  • Objective code quality (CodeScene's code health metric)
  • Test coverage percentages
  • Perceived productivity (SPACE framework)
  • Multi-dimensional approach acknowledging maintainability isn't a single metric

Key Findings

  • No significant difference in maintenance cost between AI-generated and human-generated code
  • No quality difference downstream—code neither harder nor easier to change
  • 30% speed increase for AI users in initial development (Phase One)
  • 55% speed increase for habitual AI users in initial development
  • Small measurable improvement in maintainability when experienced developers used AI habitually
  • Developer skill matters more than AI usage in determining code quality
  • AI acts as an amplifier: it strengthens whichever practices are already present, good or bad
  • No evidence of hidden costs from AI-assisted development in maintenance phase

Identified Risks

  • Code bloat: Nearly-free code generation encourages over-production and complexity
  • Cognitive debt: Reduced thinking leads to eroded understanding and atrophied skills over time
  • Long-term risks don't appear in short-term sprint metrics

Critical Conclusion

AI assistants improve short-term productivity without damaging maintainability—but only when used by developers who already practice good engineering discipline, decomposition, and thoughtful problem-solving.
