Checking Agents at scale

Nadia Volanovsky
May 11, 2026

As an ALM specialist working with Rovo and AI-driven workflows, I’ve been thinking about a growing challenge in the industry:

How are we supposed to properly verify and test AI agents at scale?

Traditional QA and ALM practices were built for deterministic systems.
But agents behave differently:

  • reasoning is probabilistic

  • outputs can vary

  • edge cases are almost infinite

  • tool usage and memory introduce new failure points

So I’m curious how the community is approaching this.

Some questions I’m exploring:

  • How can we systematically test AI agents while covering realistic scenarios and edge cases?

  • Can we build “QA agents” that evaluate and validate other agents?

  • Are there effective methods today for validating reasoning, workflow execution, and tool orchestration?

  • Can we estimate confidence or correctness beforehand?
    Example: identifying whether a response is likely production-safe or only “50% reliable”

Would love to hear what others in the Rovo community are thinking about this.

@Dikla Tavor-Haimpur 

Answer accepted
Rebekka Heilmann _viadee_
Community Champion
May 12, 2026

Hi @Nadia Volanovsky - welcome to the Community,

Atlassian's current answer to this question is evals: https://community.atlassian.com/forums/Atlassian-AI-Rovo-articles/Introducing-Evaluations-for-Rovo-Agents/ba-p/3202093

I haven't done much with it myself and it's still early days, but it at least gives you repeatable tests against your "gold standard".
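To make "repeatable tests against a gold standard" concrete, here is a minimal sketch of the general idea. It is not how Rovo's evaluation feature works internally; `EvalCase`, `passes`, and `pass_rates` are hypothetical names, and the agent is assumed to be any callable that takes a prompt and returns text. Because outputs can vary between runs, each case is run several times and a pass rate is reported instead of a single pass/fail.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    prompt: str
    must_contain: List[str]  # "gold standard" facts the answer has to mention


def passes(answer: str, case: EvalCase) -> bool:
    # Crude case-insensitive substring check; a real setup would likely
    # use a rubric or a judge instead.
    text = answer.lower()
    return all(item.lower() in text for item in case.must_contain)


def pass_rates(
    agent: Callable[[str], str], cases: List[EvalCase], runs: int = 5
) -> Dict[str, float]:
    # Run every case several times and record the fraction of runs
    # that met the gold standard.
    rates = {}
    for case in cases:
        ok = sum(passes(agent(case.prompt), case) for _ in range(runs))
        rates[case.prompt] = ok / runs
    return rates


if __name__ == "__main__":
    # Stub standing in for a real agent call (assumption for illustration).
    def stub_agent(prompt: str) -> str:
        return "Please open a request in the Service Desk portal."

    cases = [EvalCase("How do I request a license?", ["service desk"])]
    print(pass_rates(stub_agent, cases, runs=3))
```

A pass rate well below 1.0 on a case is a rough signal that the agent is not yet reliable for that scenario, which is about as close as this approach gets to a "50% reliable" label.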

 

I see two main problems at the moment:

1) Agents are not using their skills like they're supposed to

A lot of the time, Agents more or less ignore their skills, so pages are not created, comments are not published, or work items are not updated, even though the Agent claims it did the work. So you can't actually check the end result, only the Agent's text answer, which may be a blatant lie.
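One possible workaround, outside the eval itself, is to compare what the Agent says with what actually exists in the system afterwards. The sketch below assumes a hypothetical test convention: the prompt asks the Agent to include a unique marker in the comment it posts, and `fetch_comments` is a placeholder for whatever lookup you have available (for example, a call to the Jira REST API that returns the comment bodies of the work item). All names here are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClaimCheck:
    claimed: bool   # the Agent's text says it did the work
    verified: bool  # the artifact was actually found in the system

    @property
    def verdict(self) -> str:
        if self.verified:
            return "done" if self.claimed else "done but not reported"
        return "false claim" if self.claimed else "not done"


def check_comment_claim(
    agent_reply: str,
    expected_marker: str,
    fetch_comments: Callable[[], List[str]],
) -> ClaimCheck:
    # Very rough claim detection: does the reply mention a comment at all?
    claimed = "comment" in agent_reply.lower()
    # Ground truth: look for the unique marker in the real comment bodies.
    verified = any(expected_marker in body for body in fetch_comments())
    return ClaimCheck(claimed=claimed, verified=verified)


if __name__ == "__main__":
    # Stubbed lookup standing in for a real API call (assumption).
    result = check_comment_claim(
        "I have added the comment to the work item.",
        "eval-run-123",
        lambda: ["Some unrelated comment"],
    )
    print(result.verdict)
```

The "false claim" verdict is the case described above: the text answer says the work was done, but nothing shows up in the system.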

2) There is no API for Studio

So you can't check Agents' instructions and setup at scale to validate whether each Agent follows guidelines such as:

  • allowed use case
  • filled in descriptions with the right info
  • naming conventions
  • restrictions on Manager and user permissions
  • ...

We'd need a public API to pull all Agents. The open feature request is here: https://jira.atlassian.com/browse/ROVO-516

The alternative is to use UI automation and click through everything, but that would be a nightmare to maintain as the UI keeps changing.
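If Agent definitions could be pulled as data, whether through a future API or a scraped export, the guideline checks listed above would be fairly easy to automate. The following is only a sketch under that assumption: the dictionary keys (`name`, `description`, `use_case`, `managers`, `visibility`), the naming pattern, and the allowed use cases are all made up for illustration and do not reflect any real Studio schema.

```python
import re
from typing import Dict, List

# Hypothetical naming convention, e.g. "HR - Onboarding Helper".
NAME_PATTERN = re.compile(r"^[A-Z]{2,5} - .+")
# Hypothetical list of approved use cases.
ALLOWED_USE_CASES = {"knowledge-search", "ticket-triage"}


def lint_agent(agent: Dict) -> List[str]:
    """Return a list of guideline violations for one exported Agent."""
    problems = []
    if not NAME_PATTERN.match(agent.get("name") or ""):
        problems.append("name does not follow the naming convention")
    if len((agent.get("description") or "").strip()) < 20:
        problems.append("description is missing or too short")
    if agent.get("use_case") not in ALLOWED_USE_CASES:
        problems.append("use case is not on the allowed list")
    if not agent.get("managers"):
        problems.append("no Manager assigned")
    if agent.get("visibility") == "everyone":
        problems.append("user access is not restricted")
    return problems


if __name__ == "__main__":
    exported = [
        {"name": "helper bot", "description": "", "use_case": "other",
         "managers": [], "visibility": "everyone"},
    ]
    for agent in exported:
        for problem in lint_agent(agent):
            print(f"{agent['name']}: {problem}")
```

The checks themselves are trivial; the hard part remains getting the Agent definitions out of Studio in the first place.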
