This is a remarkable post, @Steve Newman. Thank you for putting in the time and effort, and for keeping it balanced and objective, not resorting to hyperbole on either side. Loads to think about and ruminate over the weekend. I’ll have to read it a couple of times again, I think, to digest all the implications.
First, thank you, Steve, as Abhay Ghatpande suggested, for this high-quality post, which provides a balanced and objective review of the study.
One of the striking elements is that developers consistently overestimated AI's impact on their productivity by nearly 40 percentage points (a 19% actual slowdown versus a 20% perceived speedup), highlighting that subjective productivity assessments in the AI era may be fundamentally unreliable without objective measurements. With all the possible biases at play, this is not surprising, and it reminded me of some of the insights from https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow by https://en.wikipedia.org/wiki/Daniel_Kahneman
It also helps reinforce the importance of measuring ROI with both objective and subjective metrics to understand the benefit and impact of the AI that organizations leverage.
Very interesting study and great post!
I was surprised by the headline result, but the explanation does make sense and tracks with my own experience. I find ChatGPT pretty useful for coding, but: 1) I’m not a professional software engineer, my coding is for scientific research; and 2) it’s most useful when I’m trying to learn something new. I’ve definitely wasted time trying to get ChatGPT to do something that I could’ve done myself. (I’d say it’s analogous in those situations to just kind of brute forcing various changes in the code and seeing what works.)
Knew you'd flag this! Completely true. I've been using models (mainly Gemini 2.5 Pro) for development in an area where I don't know the API well (MLIR), but at the same time I have decades of programming experience. Exactly right: the LLM does not get the high-level design right, but it nails the API.
Sure, if I let the AI lead, it screws it up every time. It is awesome at the API -- where I'm weak -- but it doesn't get the big picture!
METR drives it home: don't be fooled by the AI's vast, amazing breadth of knowledge. It is NOT an experienced developer; it's an idiot savant.
I've never worked with a human programmer quite like this.
The answer so far is: trust my instincts, develop the big picture, and let the AI fill in the outlines and flesh out the API calls and details. But using it now is in no way comfortable. I'm wasting a lot of time cutting and pasting and being misled by LLM overconfidence. The paradox: extreme accuracy in the details, but sheer lunacy at the high level. How do I make this work?
Very much appreciate the depth of this study!
This also tracks with some credible findings of software developer employment being on the rise again after a brief dip in 2024: https://x.com/econ_b/status/1929924924409536815
I'm not convinced that AI is having enough of an impact to affect hiring patterns yet (though I wouldn't rule out an effect in some early-adopter niches, such as tech startups).
If there is a macro-scale effect of AI on dev hiring, I would expect it to be based more on anticipation of needing fewer engineers than on any actually realized reduction. I don't have any careful analysis to back this up, though; it's just my sense of where things are at.
Indeed, I wonder if this anticipation already got ahead of reality at some point in 2024, and the uptick in SWEs in 2025 is a correction to that, similar to how Klarna had to walk back its claims of fully automating its customer service workforce.
Another possibility is that AI disrupted the hiring process so much that companies have had to rethink how they post and interview for jobs. I know this was a takeaway in Derek Thompson's recent piece on the hiring crisis for young people: https://www.derekthompson.org/p/young-people-face-a-hiring-crisis
It does seem like there is strong evidence for hiring of entry-level workers slowing down more broadly relative to hiring overall, and this might even fit with one of the more interesting findings from this paper about AI assistance not slowing down shorter-time horizon tasks like it did longer ones. If entry-level workers are more likely to do these sorts of tasks (particularly for more greenfield problems), reduced hiring might be one of the early hard indicators of AI changing the job market. But if the human capacity to ingest more context remains a durable advantage, entry-level hiring could pick back up again. And that does look to be the case at least for now when looking at recent TrueUp data: https://x.com/econ_b/status/1940944269633900812
Thank you for this excellent analysis of an excellent paper. The methodology seems very strong and the results are even more interesting for being (at least to me) quite counter-intuitive.
Your observation about jagged capabilities is spot-on, and I'd love to see more work like this that tries to tease apart different factors including types of project, developer experience, and especially skill with using the tools. My instinct is that some developers get much more benefit from the current tools than others, but I'd love to see data on that.
@grok, is this real?
Great write-up. Interesting to see an RCT on this with robust methodology. Anecdotally, I'm regularly surprised by how "vibe-coding" accomplishes nearly the same (or, at best, marginally more) work compared to "traditional" programming. Instead of writing code, I now allocate that time to writing exhaustive prompts and reviewing the AI-generated code. Regardless, a few practices have definitely been a boon for productivity: (1) running parallel Claude Code sessions, (2) automating AWS CLI usage, and (3) using Plan Mode to prime it with my design goals.
Related: https://en.wikipedia.org/wiki/Productivity_paradox
What sort of work are you primarily using AI for? New project vs. large mature codebase, things you're very familiar with vs. learning new language / libraries / etc.?
Yes, I use it to maintain and develop existing projects using tools I'm strongly familiar with. And if the AI tries to bleed out of my circle of competence, I cancel and send a new prompt specifying my preferred language/runtime/dependencies, etc.
Really thoughtful post that summarized the topic well enough for even a non-technical audience (i.e., me) to feel they are grasping it well. One thing that comes to mind, which I think you hint at near the end of your post, is whether there was a boost in deliverable quality for the AI Allowed tasks. Is it possible that there is any time saved on the back-end QA, etc. that could offset some of the productivity loss?
My other observation was in the distribution of how the subjects indicated they were using AI. Most appeared to be "experimenting", which I would expect to lead to a much less efficient deployment of the tools. Was there any differential in productivity gain/loss for those who were using tools proficiently vs. experimenting? In general, this tells me that engineering managers need to allocate a meaningful amount of time and budget toward AI training to optimize the effectiveness of their engineers in deploying AI tools.
Thanks again for a very enlightening post!
To my understanding, the study was not able to gather any data on code quality, beyond the fact that (I believe) both the AI Allowed and AI Disallowed work quickly passed through code review.
The AI Allowed tasks resulted in more lines of code being submitted, but it's unclear whether this suggests more complete work or code duplication / bloat.
Great article, thanks for posting. This confirms what I also see: low-level AI adoption (meaning going too deep into the details) creates more drag than tailwind.
This partially reignites an old, well-known IT issue: the best is the enemy of the good. Striving for perfection often stalls projects and renders them obsolete in a volatile environment.
This tracks with my own experience. What I’ve found is that chatbots don’t save any time, they save effort, and only on specific sorts of task. These days I have a lot more data munging and automation scripts in my repertoire, because cajoling an LLM into producing them is nearly effortless compared to doing it myself, and I don’t care about robustness, or correctness in the face of unexpected input, two things chatbots are bad at.
When I resort to them for ‘real programming’, it often turns out to be a mistake. I sometimes find myself in the absurd and embarrassing situation of arguing with a computer program, as though it were a misguided junior who might benefit from being shown his mistake, but of course LLMs don’t learn in the usual sense of the term, and this is wasted effort. Much like Larry Ellison, anthropomorphizing an LLM is a mistake.
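To make the "data munging" point concrete, here is a purely illustrative, hypothetical sketch (data and names invented) of the kind of throwaway script an LLM can bang out in one prompt: normalize messy CSV headers and drop blank rows, with no attempt at robustness against unexpected input.

```python
import csv
import io

# Toy input with the sort of mess a one-off munging script tolerates:
# inconsistent whitespace in headers/cells and a stray blank line.
RAW = """Name , Email ,Score
Alice, alice@example.com ,91

Bob, bob@example.com ,84
"""

def munge(text):
    rows = list(csv.reader(io.StringIO(text)))
    # Lowercase and strip the header names.
    header = [h.strip().lower() for h in rows[0]]
    # Strip every cell and skip rows that are entirely blank.
    cleaned = [
        [cell.strip() for cell in row]
        for row in rows[1:]
        if any(cell.strip() for cell in row)
    ]
    return [dict(zip(header, row)) for row in cleaned]

for record in munge(RAW):
    print(record["name"], record["score"])
```

Exactly the sort of script where correctness on adversarial input doesn't matter, which is why delegating it to a chatbot is nearly effortless.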
It's the dopamine hit that makes developers think (feel like) they're going 20% faster when using these AI tools.
Whether it's tab completion or feeding instructions through a prompt, I'd bet this type of experience makes one feel like they're in a flow state faster -- more so than sitting in a meeting with a product manager or reading a design document.
Would love to see the biological science behind what a person is going through when using AI tools.
>But perhaps the most important takeaway is that even as developers were completing tasks 19% more slowly when using AI, they thought they were going 20% faster
Pretty good article: it clearly states the results, procedures, and potential limitations of the research. After reading it, my biggest takeaways are that 1) AI is still best suited to simple, independent, greenfield coding work, rather than work involving a huge codebase and lots of context, because 2) current AI tools still cannot hold large amounts of workflow, code, and data in context, so if developers need to explain things to the AI and use it for higher-level, complicated coding work, they spend a huge amount of time reviewing, correcting, and waiting for the AI to answer their prompts, which in turn reduces their efficiency. 3) However, AI coding can still conserve people's energy by sparing them the intense thinking the work used to require.
From my standpoint, future LLM developers should focus on how to increase AI's memory without slowing it down or crashing it, so that it can take on more higher-level work. Or would it be possible to encode those AI chats into a separate database, so the AI would know what the codebase looks like and what changes have been made, and could give more constructive advice on coding or design?
- I'm not an SDE, just a business consultant interested in AI, so don't take what I've said too seriously, and feel free to push back on my point of view with your own reasoning. It always feels good to learn!
I definitely would not have expected this gap between perception and measured reality, and that's an important data point!
If I had to take a wild guess, I would at least wonder if some of the perceived productivity increase might come from feeling better after completing a task, due to the lower mental overhead of completing it. In that case, it is possible (I don't know if this was tested or is being tested) that the AI users are able to work on more tasks per day or week without hitting their mental focus limits. Theoretically, this could even show up as having more mental horsepower available to apply to the AI-disallowed tasks than would otherwise have been the case. I wonder if there was, or could be, any analysis of the timing and order of different tasks on different days. In my own (non-coding) job, there is some mental cost to switching between my AI and non-AI workflow styles, and some of the benefit of using AI seems to be feeling less stressed and drained throughout the day, so that I'm able to better direct my focus when needed.
There's one other question/objection I had as soon as I read this that AFAICT is not addressed: how does this compare to the (oft-discussed) slowdown that comes from hiring someone new to work on a mature project, getting them up to speed, and delegating similar tasks to them? How much initial effort went into 'getting the AI up to speed' compared to that?
It's an interesting question whether subjects could have chosen to start on an AI Allowed or AI Disallowed task depending on their energy level. I don't know enough about the study design to know whether that would be possible.
I presume that the need to get the AI up to speed is a big issue. Power users of AI coding tools talk about ways to alleviate this, such as placing "here's how we do things in this repository" notes in a cursor.md or claude.md file that the AI will consult for every task.
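As a purely hypothetical illustration of those "here's how we do things" notes (the file contents, commands, and conventions below are all invented, not taken from any real repository), such a file might look like:

```markdown
# CLAUDE.md (hypothetical example)

## Conventions in this repository
- Python 3.11; dependencies are managed with Poetry, never by editing
  requirements files directly.
- Every new module gets type hints and a matching test under tests/.
- Run `make lint test` before considering a task done.

## Architecture notes
- Business logic lives in core/; api/ is a thin HTTP layer, so don't
  put logic there.
- Database migrations are generated, never hand-edited.
```

The idea is to pay the "getting the AI up to speed" cost once, in writing, rather than re-explaining conventions in every prompt.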
It turns out this is addressed in the paper! See https://x.com/joel_bkr/status/1944806218985562439. In brief: yes, it's possible that this played a role; the paper lists it as a factor "with unclear effect on slowdown". Search the paper for references to "issue completion order".
Thanks!
FWIW, over the course of this year I've been leading the AI Tool Selection and Adoption team for my own (small-company, non-coding) employer, and one of the medium-term questions has been "How and when do we get the AI from being like an army of untrained smart interns to being like trained new hires?" Now we're starting to get data on how our distribution of work hours spent, total output per person, output vs. AI usage variation, and quality of output (hardest to measure) are changing for different kinds of tasks. It's obviously a work in progress, and not in any way controlled like a study, but it's how I'm used to thinking about this kind of thing.
Also, I think that even if the participants weren't ordering tasks in such a biased way (even unconsciously), there could still be an effect if not having to spend as much energy on AI-assisted tasks left them with more energy on average for other tasks. I doubt such an effect would be anywhere near this large, but it might exist.
I don't think "having energy left over" could have skewed the results in the sense of reducing the time they needed for AI Disallowed tasks, because AI Allowed tasks should have benefited more or less as much (either category of task would be equally likely to be preceded by a restful AI Allowed task, barring odd scenarios such as a developer choosing to alternate task categories, which even then would be limited e.g. because chance would often result in a different number of tasks in each category).
But independent of that, it's possible that the measured time for AI Allowed tasks overestimates the fraction of the daily productive work capacity consumed by those tasks.
Thanks again, that makes sense.
Devil's advocacy here:
1) I'd love to see a greater breakdown on what the AI-permitted coders were spending time on aside from AI-specific tasks like prompting and waiting on the AI. Debugging, integration, refactoring...? Also in raw time and not a percentage.
2) Part of this reminds me of how, for the first decade or so (mid-'90s to mid-'00s) when PCs were on every office desk, productivity didn't move much, and then it increased significantly. Not sure if that was a training/comfort issue, finding more efficient ways to do tasks (typing and printing a memo in a word processor vs. sending an email), or finding effective ways to repurpose the time saved rather than sitting idle. Some of those learning-curve issues took that decade-plus to get through.
Breakdowns in raw time (not percentages) are given in Appendix E.4 of the paper (page 32), but only in the same categories used in the main paper, not as fine-grained as you suggest.
You discussed a lot of things I'd privately thought about AI, chiefly that outsourcing knowledge will inevitably slow down the process of getting things done. Coding isn't a mechanical task like stitching (which nonetheless still needs a human at the machine), and it isn't something humans do badly, like detecting cancer cells. It is largely on-the-fly problem solving, an art in its own right, and treating it like something easily replaceable by a machine is going to bring us to the same problems as AI image creation: AI might be able to generate an image, but it cannot edit one effectively on its own.
Many generative AI tasks right now require a human babysitter fixing the bot's mistakes, when a skilled human doing the same task would simply not make the mistakes and therefore automatically save time. It's like giving a job to a prodigy toddler with no work experience and expecting them to have restraint.
Enjoyed the essay! I'm unsurprised that this is what the study concluded, and I do think generative AI is largely a marketing gimmick, but I could very easily be proven wrong.