I keep seeing posts, and having conversations, about how organizations are "deploying AI agents in production." A LangChain survey of 1,300+ professionals found that 57% of their organizations are using AI agents in production [1]. My gut reaction was that this is just the next buzzword: the phrase board members can't wait to share with their peers to show they're on the AI bandwagon. If you're reading this and feel attacked by that reaction, good. It means I'm tapping into the uncertainty. Let me explain why "AI agents in production" is the trendy new fashion statement, and offer a more nuanced perspective.
First, let's look at the trend. There is no doubt that AI adoption is here to stay, and honestly, that is great. It is a tool, and we should adopt new tools, especially given this one's potential to improve business value. A KPMG report from late 2025 showed that the share of companies with AI agents in production had climbed to 42%, up from 11% two quarters earlier [2]. That's extremely fast adoption for a technology that few fully understand, especially one whose outputs are non-deterministic. Another data set showed 79% of respondents have adopted AI agents in production "to some extent" [3]. That qualifier, "to some extent," is exactly what I want to focus on here.
We need to tease out this idea: what is an AI agent? Let's start by setting aside the AI part. When you think of agency, what comes to mind? For me, it implies autonomy, consistency, and accountability. My assessment is that the 'agents' you have deployed lack at least one of these three things. So, by this definition, can you call them agents? No. But of course it sounds impressive, and technically you can call them whatever you'd like.
AI systems do not have autonomy. Otherwise, why would your business need you? Autonomy implies the ability to reason, make decisions, and act with free will. Do the AI systems you deploy have autonomy? Perhaps some do in a limited sense, but not likely in the way you are thinking about or claiming. If they were autonomous, why did a follow-up KPMG survey report that the share of AI systems requiring human validation rose from 22% to 63% [4]?
AI systems are not consistent. I recently wrote about an experiment I ran to test the consistency of Claude Sonnet 4.6 [5]. I created a simple prompt for a user-fetch task, the kind any developer might ask for. I prompted the model 5 times in a row and got 5 different answers, with differences significant enough to break the systems consuming the API responses. This one is easy to test yourself: ask your agent to perform the same task many times, capture the outputs, and see the variability firsthand.
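Here is a minimal sketch of that kind of test, assuming the Anthropic Python SDK and an API key in the ANTHROPIC_API_KEY environment variable; the model ID and prompt are placeholders you would swap for your own.

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder prompt: substitute the task your "agent" actually performs.
PROMPT = (
    "Write a Python function that fetches a user by ID from a REST API "
    "and returns the response as JSON."
)

responses = []
for _ in range(5):
    message = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model ID; use the one you deploy
        max_tokens=1024,
        messages=[{"role": "user", "content": PROMPT}],
    )
    responses.append(message.content[0].text)

# If the outputs were consistent, this would print 1.
print(f"{len(set(responses))} distinct outputs across {len(responses)} runs")
```

Exact string matching is a blunt measure, since outputs can differ trivially in wording. But if the API contract itself, the field names and nesting, shifts between runs, you have your answer.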
AI systems are not accountable. What are you going to do if your AI system sends an inappropriate response to a customer's inquiry? What if it overwrites customer data that your contract states must be kept auditable for 7 years? How are you going to hold the system accountable? The 'agent' doesn't care if you threaten to stop paying for it. Put your agent in front of the customer and see if it can convince them not to leave for one of your competitors.
So, if you're following, I don't think most organizations are being accurate when they say they have agents in production. You might have AI-integrated systems doing a job with minimal or no supervision, but I don't think that qualifies as agency. To add some nuance, here is a general taxonomy of AI systems:
Tier 1 - Bare LLM
This is essentially an advanced autocomplete. Give it some inputs, like an email chain, and it will generate a summary.
Tier 2 - LLM with scripted tool use
At this level, the LLM is equipped with tools that it invokes in a predefined sequence to augment the response or tailor it to the use case. The developer, not the model, decides the order.
Tier 3 - LLM with reactive tool use
Perhaps you are using a tool like Claude or ChatGPT, where the model can use internal tools to help generate an answer. This lets it perform research, read images, and produce various forms of documents. The model decides which tools to use based on your prompts. (The sketch after this taxonomy contrasts this reactive style with Tier 2's scripted pipelines.)
Tier 4 - LLM with persistent memory and multi-step executions
Tools like Claude Code can produce outputs, test them, analyze both the outputs and the test results, and modify the outputs, essentially working in feedback loops. Memory persists across execution steps, and data can be reshaped as needed for the different tools the model uses.
Tier 5 - Autonomous LLMs working towards self-directed goals
This is the version that most closely resembles agency. These models would be able to hold a goal, ingest and parse data themselves, respond to changes in their environment, and reason safely and effectively to accomplish the goal. These would be the steps before narrow or general superintelligence. Any versions that do exist are confined to top research labs or, more likely, are still theoretical.
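To make the Tier 2 versus Tier 3 distinction concrete, here is a toy sketch. Everything in it is illustrative: llm() stands in for a real model call, and the tools are stubs.

```python
def llm(prompt: str) -> str:
    """Stand-in for a real model call; returns canned text for the demo."""
    if "which tool" in prompt.lower():
        return "lookup_account"
    return f"[model output for: {prompt[:40]}]"

def lookup_account(ticket: str) -> str:
    return "account #1234"

def summarize(ticket: str) -> str:
    return llm(f"Summarize this ticket: {ticket}")

TOOLS = {"lookup_account": lookup_account, "summarize": summarize}

# Tier 2: the developer hard-codes the sequence; the model never chooses.
def handle_ticket_tier2(ticket: str) -> str:
    account = lookup_account(ticket)   # step 1, always
    summary = summarize(ticket)        # step 2, always
    return f"Draft reply using {summary} for {account}"

# Tier 3: the model picks a tool per request, but a human-written
# dispatch table still bounds what it is allowed to do.
def handle_ticket_tier3(ticket: str) -> str:
    choice = llm(f"Which tool should handle this ticket? {ticket}")
    tool = TOOLS.get(choice.strip(), summarize)  # fall back if the model picks badly
    return f"Draft reply using {tool(ticket)}"

if __name__ == "__main__":
    print(handle_ticket_tier2("Customer can't log in"))
    print(handle_ticket_tier3("Customer can't log in"))
```

Notice that in neither tier does the system set its own goals. The autonomy lives in the developer's code, not in the model.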
Of course, I am not everywhere, but my educated guess is that most companies are deploying systems at Tier 2, maybe Tier 3. If you've made it this far, I think you'd agree these systems don't have agency in the true sense of the word, even though they are called agents. So, I would encourage you to take a more nuanced view of your AI systems. Stop feeding the buzzword machine. Besides, investors will have more confidence if you can describe the nuance rather than regurgitate the buzzwords.
My hope is that all of this hits home, takes you a couple of steps deeper than you have gone before, and sets you up for even deeper conversations. Now, if you want to keep telling people you are "deploying agents in production" despite knowing better, no problem. The risk of a misnomer or continued ignorance may not be large for some. However, deploying AI systems and allowing them to act even semi-autonomously without proper safeguards could be catastrophic to your organization.
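What might a safeguard look like in practice? Here is one minimal, hypothetical pattern: a human-approval gate in front of irreversible actions. The action names and helpers are invented for illustration.

```python
# Hypothetical human-in-the-loop gate. Anything on the risky list is queued
# for a person to approve instead of executing immediately.
RISKY_ACTIONS = {"delete_record", "issue_refund", "send_customer_email"}

def queue_for_review(action: str, payload: dict) -> str:
    print(f"REVIEW NEEDED: {action} with {payload}")  # swap for your ticketing system
    return "queued for human approval"

def run_action(action: str, payload: dict) -> str:
    return f"executed {action}"  # swap for the real side effect

def execute_with_guardrail(action: str, payload: dict) -> str:
    """Run low-risk actions directly; hold risky ones for human review."""
    if action in RISKY_ACTIONS:
        return queue_for_review(action, payload)
    return run_action(action, payload)

print(execute_with_guardrail("summarize_thread", {"thread_id": 42}))
print(execute_with_guardrail("issue_refund", {"order_id": 7, "amount": 120.00}))
```

It is crude, but it is the difference between an embarrassing log entry and an unrecoverable one.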
You don't have to take it from me; let's look at some evidence. A RAND Corporation analysis found that over 80% of AI projects failed to deliver the intended business value in 2024 [6], and failing to deliver value is one of the gentler failure modes. Unfortunately, it appears to be getting worse: a 2025 MIT report found that 95% of custom enterprise AI tools never reached production [7]. Failing to deliver on promises stings, and it often leads to reputation damage; you can recover from that, but it takes an immense amount of effort. For systems that do reach production and deliver value, even worse risks can emerge, such as legal ones. In 2024, Air Canada was sued and lost after its chatbot misled a customer into buying full-price tickets instead of offering bereavement rates [8]. I imagine you don't want your company to be the next AI failure headline.
So before you walk into your meeting and tell them you're deploying AI agents in production, ask yourself which tier you're actually deploying. Then decide if that's still the story you want to tell.
References
- [1] https://www.langchain.com/state-of-agent-engineering
- [2] https://kpmg.com/us/en/media/news/q3-ai-pulse.html
- [3] https://www.multimodal.dev/post/agentic-ai-statistics
- [4] https://kpmg.com/us/en/media/news/q1-ai-pulse2026.html
- [5] https://www.linkedin.com/pulse/same-prompt-ai-model-result-5-different-api-contracts-tyler-gxgse
- [6] https://www.pertamapartners.com/insights/ai-project-failure-statistics-2026
- [7] https://virtualizationreview.com/articles/2025/08/19/mit-report-finds-most-ai-business-investments-fail-reveals-genai-divide.aspx
- [8] https://www.cbc.ca/news/canada/british-columbia/air-canada-chatbot-lawsuit-1.7116416