As an ALM specialist working with Rovo and AI-driven workflows, I’ve been thinking about a growing challenge in the industry:
How are we supposed to properly verify and test AI agents at scale?
Traditional QA and ALM practices were built for deterministic systems.
But agents behave differently:
reasoning is probabilistic
outputs can vary
edge cases are almost infinite
tool usage and memory introduce new failure points
So I’m curious how the community is approaching this.
Some questions I’m exploring:
How can we systematically test AI agents while covering realistic scenarios and edge cases?
Can we build “QA agents” that evaluate and validate other agents?
Are there effective methods today for validating reasoning, workflow execution, and tool orchestration?
Can we estimate confidence or correctness beforehand?
Example: identifying whether a response is likely production-safe or only “50% reliable”
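One rough way to turn "50% reliable" into a number is to run the same scenario many times, count how often a check passes, and report a confidence interval instead of a single pass/fail. This is a minimal sketch using the Wilson score interval for a binomial pass rate. It is not a Rovo feature. `flaky_agent` is a hypothetical stand-in for a real agent call, the pass check is a toy, and the 0.95 threshold in `is_production_safe` is an arbitrary example policy.

```python
import math
import random
from typing import Callable


def wilson_interval(passes: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial pass rate (z = 1.96 is roughly 95%)."""
    if runs == 0:
        return (0.0, 1.0)  # no evidence at all
    p = passes / runs
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs))
    return (max(0.0, centre - half), min(1.0, centre + half))


def estimate_reliability(
    agent: Callable[[str], str],
    prompt: str,
    check: Callable[[str], bool],
    runs: int = 20,
) -> dict:
    """Run one scenario `runs` times and summarise how often `check` passes."""
    passes = sum(1 for _ in range(runs) if check(agent(prompt)))
    low, high = wilson_interval(passes, runs)
    rate = passes / runs if runs else 0.0
    return {"runs": runs, "passes": passes, "pass_rate": rate, "ci95": (low, high)}


def is_production_safe(report: dict, threshold: float = 0.95) -> bool:
    """Assumed policy: trust a scenario only if the pessimistic bound clears the bar."""
    return report["ci95"][0] >= threshold


# Hypothetical stand-in for a real agent call that succeeds about half the time.
_rng = random.Random(42)


def flaky_agent(prompt: str) -> str:
    return "DONE" if _rng.random() < 0.5 else "I think it is done"


report = estimate_reliability(flaky_agent, "Create the release page", lambda r: r == "DONE")
print(report, is_production_safe(report))
```

The interval is the useful part: even 20 passes out of 20 runs only supports a lower bound of about 0.84, so under this assumed 0.95 bar a scenario would need roughly 70 or more clean runs before counting as production-safe.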
Would love to hear what others in the Rovo community are thinking about this.
Hi @Nadia Volanovsky - welcome to the Community,
So Atlassian's current answer to this question is evals: https://community.atlassian.com/forums/Atlassian-AI-Rovo-articles/Introducing-Evaluations-for-Rovo-Agents/ba-p/3202093
I've not done much with it myself, and it's still at an early stage, but with it you can at least run repeatable tests against your "gold standard".
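To make the "gold standard" idea concrete outside of Rovo: a suite is a fixed set of prompts, each with criteria a correct answer must meet, rerun on every change. This is a generic sketch and not the API of Atlassian's Evaluations feature. The case format, the crude keyword-matching scorer, and `stub_agent` are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GoldCase:
    """One test case: a prompt plus phrases a correct answer must (not) contain."""
    prompt: str
    must_contain: list[str]
    must_not_contain: list[str] = field(default_factory=list)


def score_case(answer: str, case: GoldCase) -> bool:
    """Case-insensitive check: all required phrases present, no forbidden ones."""
    text = answer.lower()
    has_required = all(p.lower() in text for p in case.must_contain)
    has_forbidden = any(p.lower() in text for p in case.must_not_contain)
    return has_required and not has_forbidden


def run_suite(agent: Callable[[str], str], cases: list[GoldCase]) -> list[tuple[str, bool]]:
    """Run every case once and return (prompt, passed) pairs, in order."""
    return [(case.prompt, score_case(agent(case.prompt), case)) for case in cases]


# Hypothetical cases and a canned stand-in for a real agent.
CASES = [
    GoldCase("Which project owns ticket ABC-1?", ["ABC"]),
    GoldCase("Summarise the release notes", ["release"], ["lorem ipsum"]),
]

_CANNED = {
    "Which project owns ticket ABC-1?": "Ticket ABC-1 belongs to project ABC.",
    "Summarise the release notes": "Lorem ipsum placeholder text.",
}


def stub_agent(prompt: str) -> str:
    return _CANNED.get(prompt, "")


results = run_suite(stub_agent, CASES)
print(results)
```

Keyword matching is only a placeholder; the point is that the same cases run the same way every time, so regressions show up as a changed pass list.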
I see two main problems at the moment:
1) Agents are not using their skills as they're supposed to
A lot of the time, Agents more or less ignore their skills, so pages are not created, comments are not published, or work items are not updated, even though the Agent claimed it did the work. So you can't actually check the end result, only the Agent's text answer, which may be a blatant lie.
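One way to avoid trusting the text answer is to check the system of record directly after each run. A minimal sketch, assuming you can turn what the Agent claims it did into a structured form (`ClaimedAction` is hypothetical) and have it include a run-specific marker in anything it posts. A real `fetch_comments` might wrap the Jira Cloud REST endpoint `GET /rest/api/3/issue/{issueIdOrKey}/comment` (v3 returns comment bodies in Atlassian Document Format, so the text would need extracting first); here a dictionary stands in for Jira so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable

FetchComments = Callable[[str], list[str]]


@dataclass
class ClaimedAction:
    """What the Agent says it did (hypothetical structure parsed from its reply)."""
    kind: str       # e.g. "comment"; other kinds are not checked in this sketch
    issue_key: str  # e.g. "ABC-1"
    marker: str     # run-specific text the Agent was told to include


def verify_comment(claim: ClaimedAction, fetch_comments: FetchComments) -> bool:
    """True only if a comment containing the marker really exists on the work item."""
    return any(claim.marker in body for body in fetch_comments(claim.issue_key))


def audit(claims: list[ClaimedAction], fetch_comments: FetchComments) -> list[ClaimedAction]:
    """Return every claim we could not confirm; unsupported kinds count as unconfirmed."""
    return [
        c for c in claims
        if c.kind != "comment" or not verify_comment(c, fetch_comments)
    ]


# Offline stand-in for Jira: work item key -> plain-text comment bodies.
FAKE_JIRA = {"ABC-1": ["Release notes drafted [run-7]"], "ABC-2": []}

claims = [
    ClaimedAction("comment", "ABC-1", "[run-7]"),
    ClaimedAction("comment", "ABC-2", "[run-7]"),  # Agent claims a comment that isn't there
    ClaimedAction("page", "ABC-3", "[run-7]"),     # page creation isn't checked here
]
unconfirmed = audit(claims, lambda key: FAKE_JIRA.get(key, []))
print([c.issue_key for c in unconfirmed])
```

Treating anything unverifiable as unconfirmed is deliberate: if Agents can report work they never did, the safe default is to distrust a claim until the end result has been seen.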
2) There is no API for Studio
So you can't check Agents' instructions and setups at scale to validate whether the Agent is following guidelines like
We'd need a public API to pull all Agents. The open feature request is here: https://jira.atlassian.com/browse/ROVO-516
The alternative is to use UI Automation and click through everything, but that would be a nightmare to maintain because the UI keeps changing.