20 Comments
Abhay Ghatpande

This is a remarkable post, @Steve Newman. Thank you for putting in the time and effort, and for keeping it balanced and objective, not resorting to hyperbole on either side. Loads to think about and ruminate on over the weekend. I’ll have to read it a couple of times again, I think, to digest all the implications.

Dario Calia

First, echoing Abhay Ghatpande: thank you, Steve, for this high-quality post, which provides a balanced and objective review of the study.

One of the striking elements is that developers consistently overestimated AI's impact on their productivity by nearly 40 percentage points (from an actual 19% slowdown to a perceived 20% speedup), highlighting that subjective productivity assessments in the AI era may be fundamentally unreliable without objective measurement. With all the possible biases at play, this is not surprising, and it reminded me of some of the insights from Thinking, Fast and Slow (https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow) by Daniel Kahneman (https://en.wikipedia.org/wiki/Daniel_Kahneman).

It also reinforces the importance of measuring ROI with both objective and subjective metrics to understand the benefit and impact of the AI that organizations adopt.

Sean Trott

Very interesting study and great post!

I was surprised by the headline result, but the explanation does make sense and tracks with my own experience. I find ChatGPT pretty useful for coding, but: 1) I’m not a professional software engineer; my coding is for scientific research; and 2) it’s most useful when I’m trying to learn something new. I’ve definitely wasted time trying to get ChatGPT to do something I could’ve done myself. (I’d say it’s analogous in those situations to just kind of brute-forcing various changes in the code and seeing what works.)

Tedd Hadley

Knew you'd flag this! Completely true. I've been using models (mainly Gemini 2.5 Pro) for development in an area where I don't know the API well (MLIR) but where I have decades of programming experience, and it's exactly as you describe: the LLM does not get the high-level design right, but it nails the API.

Sure, if I let the AI lead, it screws it up every time. It is awesome at the API -- where I'm weak -- but it doesn't get the big picture!

METR drives it home: don't be fooled by the AI's vast, amazing breadth of knowledge. It is NOT an experienced developer; it's an idiot savant.

I've never worked with a human programmer quite like this.

The answer so far is: trust my instincts, develop the big picture, and let the AI fill in the outlines and flesh out the API calls and details. But using it now is in no way comfortable. I'm wasting a lot of time cutting and pasting and being misled by LLM overconfidence. The paradox of extreme accuracy in detail but sheer lunacy at the high level. How do I make this work??
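
My current working split, as a generic sketch (Python rather than my actual MLIR code; every name here is illustrative only):

```python
# Generic sketch of the division of labor (illustrative names only):
# I write the skeleton and the high-level flow; the model is asked to
# flesh out only the narrow, API-heavy bodies, which is where it shines.

def load_source(path: str) -> str:
    # HUMAN: I decide what the inputs are and where they come from.
    with open(path) as f:
        return f.read()

def parse_to_ir(source: str):
    # AI: the library-specific parsing calls go here. I hand this to
    # the model one function at a time, with the signature fixed.
    raise NotImplementedError("ask the model to fill this in")

def optimize(ir):
    # HUMAN: the big-picture transformation logic stays mine.
    raise NotImplementedError("mine to design, not the model's")

def main(path: str):
    print(optimize(parse_to_ir(load_source(path))))
```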

Ethan Heppner

Very much appreciate the depth of this study!

This also tracks with some credible findings of software developer employment being on the rise again after a brief dip in 2024: https://x.com/econ_b/status/1929924924409536815

Steve Newman

I'm not convinced that AI is having enough of an impact to affect hiring patterns yet (though I wouldn't rule out an effect in some early-adopter niches, such as tech startups).

If there is a macro-scale effect of AI on dev hiring, I would expect it to be based more on anticipation of needing fewer engineers than on any actually realized reduction. I don't have any careful analysis to back this up, though; it's just my sense of where things stand.

Ethan Heppner

Indeed, I wonder if this anticipation already got ahead of reality at some point in 2024, and the uptick in SWEs in 2025 is a correction to that, similar to how Klarna had to walk back its claims of fully automating its customer service workforce.

Another possibility is that AI disrupted the hiring process so much that companies have had to rethink how they post and interview for jobs. I know this was a takeaway in Derek Thompson's recent piece on the hiring crisis for young people: https://www.derekthompson.org/p/young-people-face-a-hiring-crisis

It does seem like there is strong evidence for hiring of entry-level workers slowing down more broadly relative to hiring overall, and this might even fit with one of the more interesting findings from this paper: that AI assistance did not slow down shorter-time-horizon tasks the way it did longer ones. If entry-level workers are more likely to do these sorts of tasks (particularly on more greenfield problems), reduced hiring might be one of the early hard indicators of AI changing the job market. But if the human capacity to ingest more context remains a durable advantage, entry-level hiring could pick back up again. And that does look to be the case, at least for now, in recent TrueUp data: https://x.com/econ_b/status/1940944269633900812

Matt Bamberger

Thank you for this excellent analysis of an excellent paper. The methodology seems very strong and the results are even more interesting for being (at least to me) quite counter-intuitive.

Your observation about jagged capabilities is spot-on, and I'd love to see more work like this that tries to tease apart different factors including types of project, developer experience, and especially skill with using the tools. My instinct is that some developers get much more benefit from the current tools than others, but I'd love to see data on that.

jurita

👍

kew

In fact, I don't think so.

EarlyAdopter

@grok, is this real?

h5

Great write-up. Interesting to see an RCT on this with robust methodology. Anecdotally, I'm regularly surprised by how "vibe-coding" accomplishes nearly the same (or, at best, marginally more) quantity of work compared to "traditional" programming. I find that instead of writing code, I now allocate that time to writing exhaustive prompts and reviewing the AI-generated code. Regardless, a few practices are definitely a boon for productivity: (1) running parallel Claude Code sessions, (2) automating AWS CLI usage, and (3) using Plan Mode to prime it for my design goals. A minimal sketch of (1) is below.
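
For (1), the fan-out is simple enough to script. A minimal sketch in Python, assuming the `claude` CLI is on PATH and supports non-interactive print mode (`claude -p "<prompt>"`); the prompts are placeholders:

```python
# Minimal sketch: fan independent prompts out to parallel Claude Code
# sessions. Assumes the `claude` CLI is installed and supports
# non-interactive print mode (`claude -p "<prompt>"`).
import subprocess
from concurrent.futures import ThreadPoolExecutor

PROMPTS = [
    "Summarize the TODO comments under src/",
    "Draft unit tests for the date-parsing helpers",
    "Explain the retry logic in the HTTP client",
]

def run_session(prompt: str) -> str:
    # Each call is its own short-lived, non-interactive session.
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True,
        text=True,
        timeout=600,
    )
    return result.stdout.strip()

with ThreadPoolExecutor(max_workers=len(PROMPTS)) as pool:
    for prompt, answer in zip(PROMPTS, pool.map(run_session, PROMPTS)):
        print(f"### {prompt}\n{answer}\n")
```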

Related: https://en.wikipedia.org/wiki/Productivity_paradox

Steve Newman

What sort of work are you primarily using AI for? New project vs. large mature codebase, things you're very familiar with vs. learning new language / libraries / etc.?

h5

Yes, I use it to maintain and develop existing projects using tools I'm strongly familiar with. And if the AI tries to bleed outside my circle of competence, I cancel and send a new prompt specifying my preferred language/runtime/dependency, etc.

Ryan Jessell

Really thoughtful post that summarized the topic well enough for even a non-technical audience (i.e., me) to feel they are grasping it well. One thing that comes to mind, which I think you hint at near the end of your post, is whether there was a boost in deliverable quality for the AI Allowed tasks. Is it possible that there is any time saved on the back-end QA, etc. that could offset some of the productivity loss?

My other observation was in the distribution of how the subjects indicated they were using AI. Most appeared to be "experimenting", which I would expect to lead to a much less efficient deployment of the tools. Was there any differential in productivity gain/loss for those who were using tools proficiently vs. experimenting? In general, this tells me that engineering managers need to allocate a meaningful amount of time and budget toward AI training to optimize the effectiveness of their engineers in deploying AI tools.

Thanks again for a very enlightening post!

Steve Newman

To my understanding, the study was not able to gather any data on code quality, beyond the fact that (I believe) both the AI Allowed and AI Disallowed work quickly passed through code review.

The AI Allowed tasks resulted in more lines of code being submitted, but it's unclear whether this suggests more complete work or code duplication / bloat.

Jonathan Pohl

Great article. Thanks for posting. This confirms what I also see: low-level AI adoption (by which I mean going too deep into the details) creates more drag than tailwind.

This partially reignites an old, well-known IT issue: the best is the enemy of the good. Striving for perfection often stalls projects and renders them obsolete in a volatile environment.

Sam Atman

This tracks with my own experience. What I’ve found is that chatbots don’t save any time; they save effort, and only on specific sorts of tasks. These days I have a lot more data-munging and automation scripts in my repertoire, because cajoling an LLM into producing them is nearly effortless compared to doing it myself, and I don’t care about robustness or correctness in the face of unexpected input, two things chatbots are bad at.
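
A hypothetical example of the sort of script I mean, happy-path only, which is exactly the robustness bar I hold these to:

```python
# Hypothetical one-off munging script, the kind I'd now cajole out of
# a chatbot: collapse a CSV of (user, event, timestamp) rows into
# per-user event counts. Happy path only: no handling of malformed
# rows, odd encodings, or a missing "user" column -- fine, because it
# runs once, on data I've already eyeballed.
import csv
from collections import Counter

counts = Counter()
with open("events.csv", newline="") as f:
    for row in csv.DictReader(f):
        counts[row["user"]] += 1

for user, n in counts.most_common():
    print(f"{user}\t{n}")
```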

When I resort to them for ‘real programming’, it often turns out to be a mistake. I sometimes find myself in the absurd and embarrassing situation of arguing with a computer program, as though it were a misguided junior who might benefit from being shown his mistake; but of course LLMs don’t learn in the usual sense of the term, and this is wasted effort. As with Larry Ellison, anthropomorphizing an LLM is a mistake.

Mike Hernandez

It's the dopamine hit that makes developers think (feel like) they're going 20% faster when using these AI tools.

Whether it's tab completion or feeding instructions through a prompt, this type of experience, I'd bet, makes one feel like they're in a flow state faster, more so than sitting in a meeting with a product manager or reading a design document.

Would love to see the biological science behind what a person is going through when using AI tools.

> But perhaps the most important takeaway is that even as developers were completing tasks 19% more slowly when using AI, they thought they were going 20% faster

Luise

Pretty good article, as it clearly states the results, procedures, and potential limitations of the research. After reading it, my biggest takeaway is that 1) AI is still best suited to simple, independent, greenfield coding work rather than work involving a huge codebase and lots of context, because 2) current AI tools still cannot process and hold a large amount of workflow, code, and data, so if developers need to explain things to the AI and use it for higher-level, complicated coding work, they spend a huge amount of time reviewing, correcting, and waiting for the AI to answer their prompts, which in turn reduces their efficiency. 3) However, AI coding can still conserve people's energy by sparing them the intense thinking processes of the past.

From my standpoint, future LLM developers should focus on how to increase the AI's memory without slowing it down or crashing it, so that it can do more higher-level work. Or would it be possible to store those AI chats in separate databases, so that the AI could know what the codebase looks like and what changes have been made, and give more constructive advice on coding or design?

- I'm not an SDE, but a business consultant interested in AI, so don't take what I've said too seriously, and feel free to challenge my point of view with your own reasoning. Always feels good to learn!
