Checking Agents at scale

Nadia Volanovsky
May 11, 2026

As an ALM specialist working with Rovo and AI-driven workflows, I’ve been thinking about a growing challenge in the industry:

How are we supposed to properly verify and test AI agents at scale?

Traditional QA and ALM practices were built for deterministic systems.
But agents behave differently:

  • reasoning is probabilistic

  • outputs can vary from run to run (see the sketch after this list)

  • edge cases are almost infinite

  • tool usage and memory introduce new failure points
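
To make "outputs can vary" concrete, here is a minimal repeatability probe in Python. It is only a sketch: `agent` stands in for whatever callable interface your real agent exposes, and `flaky_agent` is invented purely for illustration.

```python
from collections import Counter
from typing import Callable
import random

def repeatability(agent: Callable[[str], str], prompt: str, runs: int = 20) -> float:
    """Fraction of runs producing the single most common output.

    1.0 means fully repeatable for this prompt; lower values quantify
    how much the agent's output actually varies.
    """
    outputs = Counter(agent(prompt) for _ in range(runs))
    return outputs.most_common(1)[0][1] / runs

# Toy nondeterministic agent, invented for illustration only.
def flaky_agent(prompt: str) -> str:
    return random.choice([f"{prompt} -> A", f"{prompt} -> A", f"{prompt} -> B"])

print(f"agreement: {repeatability(flaky_agent, 'summarize ticket JIRA-123'):.0%}")
```

Even this crude agreement score is useful in CI: a prompt that scores 100% today and 70% after a model or prompt update is a regression signal that deterministic pass/fail assertions would never surface.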

So I’m curious how the community is approaching this.

Some questions I’m exploring:

  • How can we systematically test AI agents while covering realistic scenarios and edge cases?

  • Can we build “QA agents” that evaluate and validate other agents?

  • Are there effective methods today for validating reasoning, workflow execution, and tool orchestration?

  • Can we estimate confidence or correctness in advance?
    Example: identifying whether a response is likely production-safe or only “50% reliable” (see the sketch after this list)
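
On the "QA agents" and confidence questions, one pattern worth discussing is to sample the agent under test several times per scenario and let a separate judge grade each response; the observed pass rate then becomes an empirical reliability estimate. The sketch below is hypothetical: `agent`, `judge`, `evaluate`, and `production_safe` are placeholder names, not any Rovo or Atlassian API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ScenarioResult:
    scenario: str
    pass_rate: float  # fraction of sampled responses the judge accepted

def evaluate(agent: Callable[[str], str],
             judge: Callable[[str, str], bool],
             scenarios: list[str],
             samples: int = 10) -> list[ScenarioResult]:
    """Sample each scenario several times and grade every response."""
    results = []
    for scenario in scenarios:
        passed = sum(judge(scenario, agent(scenario)) for _ in range(samples))
        results.append(ScenarioResult(scenario, passed / samples))
    return results

def production_safe(results: list[ScenarioResult], threshold: float = 0.9) -> bool:
    """Release gate: every scenario must clear the reliability threshold."""
    return all(r.pass_rate >= threshold for r in results)

# Demo with stand-ins: an echo "agent" and a judge that checks a keyword.
demo = evaluate(agent=lambda s: f"answer for {s}",
                judge=lambda s, r: s in r,
                scenarios=["reset a user's password", "close a stale ticket"])
for r in demo:
    print(f"{r.scenario!r}: {r.pass_rate:.0%}")
print("production-safe:", production_safe(demo))
```

The judge behind that interface can be an LLM call, a rule-based checker, or a human rubric; the point is that reliability becomes a measured number per scenario rather than a single pass/fail, which is exactly what a “50% reliable” label needs.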

Would love to hear what others in the Rovo community are thinking about this.

@Dikla Tavor-Haimpur
