Why Chatbots Fail on Complex Tickets (And What to Do About It)

Most AI chatbots handle simple questions well but break down on multi-step cases. Here is what goes wrong and how persistent agents fix it.

There is a pattern we see every time a company deploys a chatbot for customer support. The first demo goes great. Simple questions get good answers. The team is excited.

Then real tickets start flowing in.

The 40% wall

A customer writes: "I was charged twice for my last order, and I also want to change the shipping address on my next order." Two problems in one message. The chatbot picks one, ignores the other, and the customer writes back frustrated.

We tracked this across several AGO deployments. On average, 35% of support tickets contain more than one intent. And when a traditional chatbot hits one of these multi-intent messages, the resolution rate drops from around 80% to below 40%.

That is a massive gap. And it explains why many companies end up with their chatbot handling only the easy questions while humans still do the heavy lifting.
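
To get a feel for how much this gap drags down the overall number, here is a rough back-of-the-envelope calculation. It assumes the 80% figure applies to single-intent tickets and uses 40% as an optimistic value for multi-intent ones. Both are simplifications of the averages above, not measured per-segment data.

```python
# Back-of-the-envelope blend of the averages quoted above.
# Assumption: 80% applies to single-intent tickets, 40% to multi-intent ones.
MULTI_INTENT_SHARE = 0.35
SINGLE_INTENT_RESOLUTION = 0.80
MULTI_INTENT_RESOLUTION = 0.40


def blended_resolution(multi_share: float, single_rate: float, multi_rate: float) -> float:
    """Weighted average of the two segment resolution rates."""
    return (1 - multi_share) * single_rate + multi_share * multi_rate


overall = blended_resolution(
    MULTI_INTENT_SHARE, SINGLE_INTENT_RESOLUTION, MULTI_INTENT_RESOLUTION
)
print(f"Overall resolution rate: {overall:.0%}")
```

Under these assumptions the blended rate lands at roughly 66%, which is why a bot that looks strong on simple questions can still leave a third of the queue to humans.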

What actually breaks

It is not that the AI cannot understand the second question. The problem is architectural. Most chatbot frameworks are built around a single request/response loop. The user sends a message, the bot picks an intent, calls one tool, and responds.

When the user packs two requests into one message, the bot has to pick. And it almost always picks whichever request appears first in the text.
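
A minimal sketch of that single-loop behavior might look like the following. The keyword table and function names are hypothetical stand-ins for a real intent classifier and tool layer; the point is only that the loop detects both intents and then acts on one.

```python
# Hypothetical keyword table; real frameworks use a trained intent classifier.
INTENT_KEYWORDS = {
    "refund_duplicate_charge": ["charged twice", "double charge"],
    "change_shipping_address": ["shipping address", "change the address"],
}


def detect_intents(message: str) -> list[str]:
    """Return every matching intent, ordered by where it appears in the text."""
    text = message.lower()
    hits = []
    for intent, phrases in INTENT_KEYWORDS.items():
        positions = [text.find(p) for p in phrases if p in text]
        if positions:
            hits.append((min(positions), intent))
    return [intent for _, intent in sorted(hits)]


def single_intent_bot(message: str) -> str:
    """One request, one intent, one tool call, one response."""
    intents = detect_intents(message)
    if not intents:
        return "Sorry, I did not understand that."
    chosen = intents[0]  # everything after the first intent is silently dropped
    return f"Handling: {chosen}"


message = (
    "I was charged twice for my last order, and I also want to "
    "change the shipping address on my next order."
)
print(detect_intents(message))     # both intents are detected
print(single_intent_bot(message))  # only the first one is acted on
```

Nothing in this loop is wrong in isolation. The second request is lost simply because the loop has room for one intent per turn.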

Even if you build intent splitting into the pipeline, you run into a second problem: the two tasks might depend on each other. Suppose the customer wants a refund on the duplicate charge and also wants to know whether the refund changes the total on their next order. You cannot answer the second question without completing the first.
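
One way to make that dependency explicit is to model the sub-tasks as a small graph and sort it before executing anything. The sketch below uses Python's standard-library graphlib; the task names are hypothetical and chosen only to mirror the refund example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical sub-task names; each key maps to the tasks it depends on.
dependencies = {
    "confirm_duplicate_charge": set(),
    "process_refund": {"confirm_duplicate_charge"},
    "recalculate_next_order_total": {"process_refund"},
}

# static_order() yields each task only after all of its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

A plain intent splitter would treat these as independent requests. Ordering them first is what lets the later task use the result of the earlier one.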

How persistent agents handle this differently

A DeepAgent does not work in a single request/response loop. It receives the full message, breaks it into sub-tasks, and executes each one sequentially while keeping the full context.

For the double charge and address change examples above, the agent would:

1. Look up the order and confirm the duplicate charge
2. Process the refund
3. Check the next order and recalculate the total with the refund applied
4. Update the shipping address on the next order
5. Reply to the customer with everything resolved in one message

The key difference: the agent maintains state across all these steps. The refund amount from step 2 is available in step 3. The shipping address change in step 4 can reference the updated order from step 3.
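
To illustrate what "maintaining state" means here, the sketch below runs the steps against one shared state object. It is not the DeepAgent implementation. The class, the function names, the sample data, and the assumption that the refund is applied as credit toward the next order are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TicketState:
    """Context shared by every step of one ticket (amounts in cents)."""
    last_order_charges: list
    next_order_total: int
    shipping_address: str
    requested_address: str
    refund_amount: int = 0
    notes: list = field(default_factory=list)


def confirm_duplicate_charge(state: TicketState) -> None:
    charges = state.last_order_charges
    if len(charges) != 2 or charges[0] != charges[1]:
        raise ValueError("no duplicate charge found")
    state.notes.append("duplicate charge confirmed")


def process_refund(state: TicketState) -> None:
    state.refund_amount = state.last_order_charges[1]
    state.notes.append(f"refunded {state.refund_amount} cents")


def recalculate_next_order(state: TicketState) -> None:
    # Assumption: the refund is applied as credit toward the next order.
    state.next_order_total -= state.refund_amount
    state.notes.append(f"next order total is now {state.next_order_total} cents")


def update_shipping_address(state: TicketState) -> None:
    state.shipping_address = state.requested_address
    state.notes.append(f"shipping address set to {state.shipping_address}")


def reply_to_customer(state: TicketState) -> str:
    return "Done: " + "; ".join(state.notes) + "."


PLAN = [
    confirm_duplicate_charge,
    process_refund,
    recalculate_next_order,
    update_shipping_address,
]

state = TicketState(
    last_order_charges=[4999, 4999],
    next_order_total=8000,
    shipping_address="1 Old Street",
    requested_address="22 New Avenue",
)
for step in PLAN:
    step(state)  # each step sees everything earlier steps wrote
print(reply_to_customer(state))
```

Because every step reads from and writes to the same object, the recalculation can use the refund amount without asking the customer or re-querying anything.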

What we learned in production

After switching from a traditional chatbot to DeepAgents on multi-intent tickets, we saw the resolution rate on those cases go from 38% to 67%. Not perfect, but a meaningful jump.

The cases that still fail are mostly ones where the customer references something ambiguous ("the thing I ordered last time") and the agent picks the wrong interpretation. We are working on better disambiguation, but that is a topic for another post.

The takeaway: if your chatbot metrics look great on simple questions but your overall resolution rate is stuck, check how it handles messages with more than one request. That is probably where the gap is.
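
If you want to run that check yourself, splitting the resolution rate by intent count is usually enough to see the gap. The sketch below assumes a hypothetical ticket export with an intent count and a resolved flag per ticket; the sample rows are made up for illustration and are not data from our deployments.

```python
from collections import defaultdict

# Hypothetical ticket export: number of detected intents and outcome.
tickets = [
    {"intents": 1, "resolved": True},
    {"intents": 1, "resolved": True},
    {"intents": 1, "resolved": False},
    {"intents": 2, "resolved": False},
    {"intents": 2, "resolved": True},
    {"intents": 3, "resolved": False},
]


def resolution_by_segment(tickets):
    """Resolution rate for single-intent vs multi-intent tickets."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [resolved, total]
    for t in tickets:
        segment = "multi" if t["intents"] > 1 else "single"
        counts[segment][0] += t["resolved"]
        counts[segment][1] += 1
    return {seg: resolved / total for seg, (resolved, total) in counts.items()}


rates = resolution_by_segment(tickets)
for segment, rate in rates.items():
    print(f"{segment}-intent: {rate:.0%}")
```

A large difference between the two segments is a sign that the overall number is being held back by multi-intent messages rather than by answer quality in general.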

Want to see how DeepAgents handle complex tickets in your environment? Get in touch.