Over the past year, I've heard the same explanation come up from community members trying to make AI useful beyond early experimentation. At some point, usually right when they try to depend on it, the conversation shifts in a predictable way: "AI is moving too fast to keep up." I often feel the same way using Rovo and other AI systems. Conversations I had at Team '26 in early May only amplified what I'd been hearing. It's not just a Rovo thing. The same pattern shows up across every AI tool people (including me) are trying to depend on. The sentiment is reasonable. But when you look more closely at where things actually start to break down, speed isn't doing most of the damage. In fact, the speed is a good thing. The issue is the mental model people are using to understand how these systems work, and that's a harder problem to solve because it doesn't announce itself directly.
Most software conditions you to expect a certain kind of stability. The same input produces the same output. You learn what the system does, where things live, and how inputs translate into outputs. Over time, you stop thinking about the tool because your mental model is good enough to rely on. It doesn't need to be perfect—it just needs to hold. AI doesn't give you that kind of footing. The same prompt can produce a different result tomorrow, and not because anything obvious changed. Capabilities expand without clearly defined boundaries, improvements show up unevenly, and sometimes they disappear. You notice that responses are slightly more coherent, or that the system aligns with your intent a little more cleanly than it did before. Then, in the next interaction, it doesn't. Nothing obvious changed, but the behavior isn't quite what you expected. You gain enough consistency to sense what's happening, but not enough to trust that understanding. This combination creates false confidence that lasts just long enough to fool you into thinking you're finished.
That's usually where people start filling in the gaps with assumptions. It's a natural reaction. If something behaves as if it has memory, you assume it remembers. If structure appears in the output, you assume that structure carries meaning. If prior interactions seem to influence current output, you assume there's continuity behind the scenes. And sometimes those assumptions are partially correct, which makes them even harder to challenge. The problem is that "partially correct" isn't something you can build on. It's enough to get through a few interactions, but not enough to support anything that needs to be repeatable.
That's the point where confidence starts to drop, and it often gets misdiagnosed as a capability issue. In reality, what's changed is the tolerance for inconsistency. When you're experimenting, something that is usually right feels impressive. When you're depending on it, that same behavior turns into friction very quickly. And if the mental model underneath is off—even slightly—getting better at using the tool doesn't compound in a meaningful way.
This is the part that doesn't show up in demos. You only see it when teams try to rely on the output. Content gets generated, but it doesn't get reused in any meaningful way. What worked once doesn't hold in the next context, so teams stop building on it and start checking each other's work instead. Over time, it becomes clear there isn't a consistently shared definition of what "good" looks like. Just a growing amount of second-guessing.
In some cases, this actually makes teams slower. Work gets revalidated before it can be trusted, outputs get rewritten instead of reused, and any work that matters ends up going through the same checks it would have without AI. In practice, teams often stop at individual productivity and fail to take the next step.
That's the gap. It's not about whether AI can produce something useful. It's about being able to rely on it on a daily basis. This isn't new. As James Bessen (Boston University) has pointed out, new technologies often require extensive organizational learning before their benefits show up.
Underneath that, the more persistent issue is that people are trying to map these systems to something familiar: a chatbot, a search engine, or a system with memory, for example.
In practice, AI exhibits characteristics of all of those, depending on the situation, the input, and how context is interpreted in that moment. That hybrid behavior is what makes it powerful. It's also what makes these systems difficult to reason about using patterns that worked when software did the same thing every time.
Once you're anchored to an incomplete or incorrect model, the system starts to feel unpredictable, even when it's behaving consistently on its own terms.
Most teams are trying to fix this in ways that used to work: training, standardization, better prompts. Those approaches assume the system is stable enough for improvements to carry forward, and that assumption doesn't quite hold. Feature-based training assumes stability. Sharing examples assumes those examples will generalize cleanly. Neither assumption is particularly safe. You can understand what a capability does and still not know when it will behave in a way you can rely on. If the mental model is even slightly off, getting better at using the tool doesn't remove the inconsistency. It just makes the inconsistency harder to ignore.
The teams that get past this don't have a fundamentally better understanding of the technology. They've made three practical moves.
They stop assuming shared context. Most of the time you spend with AI, you're filling in context you don't realize you're filling in. You know what "the Q3 release" means in your world. You know whether "blocker" means a P0 bug or a stuck approval. The system doesn't. The teams that get value spell it out—they tell Rovo what project they mean, what "important" looks like for this question, and where to look. They don't assume the Atlassian connection brought the right context with it.
They separate exploring from depending. Rovo Chat is good for exploring. Agents are where you build something you intend to rely on. Those are different jobs. Most teams blur the line—they get something useful in Chat, decide that's the workflow, and never pin it down. When the same prompt produces something different next month, they conclude the tool is inconsistent. What actually happened is that they treated an exploration as if it were a process. The move is to draw the line yourself. Use Chat to figure out what you want. When you know, move it into an Agent with named sources, explicit instructions, and a behavior you've tested more than once.
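One way to picture "a behavior you've tested more than once" is a small repeat check. This is a minimal sketch, not a Rovo feature: `passes_repeatedly` and `fake_agent` are hypothetical names, and `fake_agent` is a deterministic stand-in for whatever call reaches your real agent.

```python
from typing import Callable


def passes_repeatedly(ask: Callable[[str], str], prompt: str,
                      required: list[str], runs: int = 5) -> bool:
    """Return True only if every run's output mentions every required term."""
    for _ in range(runs):
        output = ask(prompt).lower()
        if any(term.lower() not in output for term in required):
            return False
    return True


# Hypothetical stand-in for a real agent call; deterministic so the sketch runs offline.
def fake_agent(prompt: str) -> str:
    return "Decisions: ship on Friday. Open questions: who owns rollback?"


ok = passes_repeatedly(
    fake_agent,
    "Summarize this page focusing on decisions and open questions.",
    ["decisions", "open questions"],
)
print(ok)
```

The check is deliberately crude. It only asks whether the things you said you care about show up every time, which is often enough to tell an exploration from something you can depend on.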
They tell the system what matters. "Summarize this page" is a hopeful prompt. The system might surface what you'd consider important. It might not. And what it surfaces today may not be what it surfaces tomorrow. The deliberate version is harder to write and quicker to fix: "Summarize this page focusing on decisions and open questions. Skip background context." Now the output has a target. Now the inconsistency has somewhere to land.
When I'm not sure what the deliberate version should be, I ask Rovo to ask me clarifying questions first, then rewrite the prompt itself. The version it gives back is almost always better than the one I would have written.
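The gap between the hopeful prompt and the deliberate one can be sketched as a small template that makes the unstated context explicit. This is an illustration only: `PromptBrief` and its fields are hypothetical, not part of Rovo or any library.

```python
from dataclasses import dataclass, field


@dataclass
class PromptBrief:
    """Hypothetical container for context a teammate would otherwise assume."""
    task: str
    project: str = ""
    focus: list[str] = field(default_factory=list)
    skip: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Only the parts that were actually filled in end up in the prompt.
        parts = [self.task.rstrip(".") + "."]
        if self.project:
            parts.append(f"Project: {self.project}.")
        if self.focus:
            parts.append("Focus on " + " and ".join(self.focus) + ".")
        if self.skip:
            parts.append("Skip " + " and ".join(self.skip) + ".")
        if self.sources:
            parts.append("Look only in: " + ", ".join(self.sources) + ".")
        return " ".join(parts)


hopeful = PromptBrief(task="Summarize this page")
deliberate = PromptBrief(
    task="Summarize this page",
    focus=["decisions", "open questions"],
    skip=["background context"],
)
print(hopeful.render())
print(deliberate.render())
```

The empty fields are the useful part. Each one left blank is context the system has to guess.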
None of that depends on keeping up with every change. If anything, it assumes you won't. That's the work.
Most of the change is small. Someone who used to retype the same question expecting the same answer starts writing prompts the way you'd brief a sharp new hire who walked in this morning—naming the project, the deadline, what "important" means in this context. The output gets steadier. Nothing about Rovo changed.
A team that had been pasting Chat responses into Confluence as a record of decisions stops doing that. They notice the same question coming back different a week later and realize they were treating an exploration as a source of truth. The Chat conversations don't go away. They just stop being the artifact. The work that needs to hold gets pinned into an Agent.
Raghavendran Narayanan, writing in the Community forum, ran into this directly. He'd been tightening prompts and restricting his agent's Skills and Knowledge—and the agent still wandered. So he stopped reaching for tighter prompts and built a structural fix instead: an empty default scenario that acts as a kill-switch when the agent drifts. As he put it: "This is a structural constraint, not a prompt-based one—and structural constraints always win." The shift wasn't a better prompt. It was a different layer to push on.
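The idea of a structural constraint can be illustrated with a toy router. This is not how Rovo implements scenarios and not Narayanan's actual configuration. The scenario names, triggers, and functions below are all hypothetical, chosen only to show why an empty default behaves differently from a tighter prompt.

```python
from typing import Optional

# Hypothetical scenarios: a trigger phrase mapped to the instructions it unlocks.
SCENARIOS = {
    "release notes": "Draft release notes from the linked issues.",
    "standup": "Summarize yesterday's updates for standup.",
}

# The default is deliberately empty, so an unmatched request has nothing to follow.
DEFAULT_SCENARIO: Optional[str] = None


def route(request: str) -> Optional[str]:
    """Return the instructions for the first matching scenario, else the default."""
    text = request.lower()
    for trigger, instructions in SCENARIOS.items():
        if trigger in text:
            return instructions
    return DEFAULT_SCENARIO


def respond(request: str) -> str:
    instructions = route(request)
    if instructions is None:
        # The structure stops the drift; no prompt wording is involved.
        return "Out of scope for this agent."
    return f"[following: {instructions}]"


print(respond("Write the release notes for this sprint"))
print(respond("Tell me a joke"))
```

No amount of rewording the in-scope instructions changes what happens to the second request. That is the sense in which the constraint sits on a different layer than the prompt.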
None of these are dramatic. That's the point.
There's a tendency to frame all of this as a gap in capability, or as a temporary phase while the technology matures. That framing misses the underlying issue. The friction isn't just about what the system can or can't do. It's about the gap between how it behaves and how people expect it to behave based on everything they've learned from other tools. That gap is where most of the confusion shows up. It's also where the shift happens—once people stop trying to force the system into a familiar shape and start adjusting how they interact with it instead.
This article is part of an ongoing AI/Rovo Article Series exploring responsible AI adoption.
Dave Rosenlund _Trundl_