Introducing Evaluations for Rovo Agents

Ship better agents with confidence: Rovo Agent Evals are here 🎯

Building a great agent is one thing. Proving it's great, reliably, repeatedly, and at scale, is another. That's where Rovo Agent Evals come in. This release introduces a dedicated evaluation workspace so you can systematically test, measure, and improve the quality of your agent.


What you can do

Rovo Agent Evals provide three complementary ways to validate how your agent behaves:

1) Response Accuracy: test against the answers you expect

When you know what "good" looks like, this reference-based judge lets you lock it in.

  • Upload a set of questions and their ideal responses as a CSV.

  • Run your agent against that test set in one go; an LLM compares the agent's responses to your reference responses, informed by your agent's instructions.

  • See pass/fail judgments, plus qualitative feedback explaining where responses diverged.

Great for objective, repeatable checks on critical flows like HR policies, IT support FAQs, product knowledge, onboarding, and internal process guidance. Move from “I think it’s working” to “It passes a high percentage of our reference tests.”
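A test set for this judge is just a two-column CSV of questions and ideal answers. Here is a minimal sketch of building one with Python's standard library; the column names ("question", "expected_response") and the example rows are assumptions for illustration, so check the CSV template in Studio for the exact headers Rovo expects.

```python
import csv

# Hypothetical column names -- confirm against the CSV template in Studio.
rows = [
    {
        "question": "How many vacation days do new hires get?",
        "expected_response": "New hires accrue 20 vacation days per year, starting on day one.",
    },
    {
        "question": "How do I reset my SSO password?",
        "expected_response": "Use the self-service reset flow in the identity portal; IT does not reset passwords manually.",
    },
]

with open("response_accuracy_testset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "expected_response"])
    writer.writeheader()
    writer.writerows(rows)
```

Generating the file with `csv.DictWriter` rather than string concatenation keeps commas and quotes inside answers correctly escaped.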

2) Resolution Rate: score whether a request was resolved

For many service and Q&A agents, the key question is simple: did the agent actually resolve the user’s request?

  • Upload a CSV of questions only; your agent responds as usual.

  • An LLM scores each interaction as “Resolved” or “Unresolved,” based on how well the answer addresses the question.

  • Ideal when you don’t have curated “golden answers” but need fast resolution-quality signals.
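To make the "LLM as judge" idea concrete, here is an illustrative sketch of how a resolution check can be structured. This is not Atlassian's implementation; the prompt wording and the parsing helper are assumptions, modeled only on the "Resolved"/"Unresolved" outcomes described above.

```python
# Illustrative LLM-as-judge sketch: build a grading prompt, then parse the
# model's one-word verdict. The prompt text is a hypothetical example.
JUDGE_PROMPT = """You are grading a support agent.
Question: {question}
Agent response: {response}
Did the response fully resolve the question?
Answer with exactly one word: Resolved or Unresolved."""


def build_judge_prompt(question: str, response: str) -> str:
    """Fill the grading template for one question/response pair."""
    return JUDGE_PROMPT.format(question=question, response=response)


def parse_verdict(model_reply: str) -> str:
    """Map a free-form model reply onto the two allowed labels.

    Anything that does not clearly start with 'resolved' is treated as
    Unresolved, so ambiguous replies fail closed.
    """
    reply = model_reply.strip().lower()
    return "Resolved" if reply.startswith("resolved") else "Unresolved"
```

The fail-closed default in `parse_verdict` is a common design choice for judges: a noisy reply should never silently count as a pass.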

3) Manual Testing: review behavior across lots of scenarios

  • Upload a CSV of questions (no expected answers needed).

  • Run them all at once and skim generated responses in a single view.

  • Use it to sanity-check after instruction changes, explore edge cases, and spot strengths/weaknesses quickly.
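When skimming a large batch of generated responses, a small script can pre-sort the ones worth a closer look. The sketch below assumes a hypothetical export with "question" and "response" columns and a hand-picked list of red-flag phrases; both are illustrative, not part of the Rovo feature.

```python
# Hypothetical export shape: one dict per row with "question" and "response".
RED_FLAGS = ("i don't know", "i'm not sure", "unable to", "error")


def flag_responses(rows):
    """Return (question, reason) pairs that deserve manual review."""
    flagged = []
    for row in rows:
        text = row["response"].strip()
        if not text:
            flagged.append((row["question"], "empty response"))
        elif any(phrase in text.lower() for phrase in RED_FLAGS):
            flagged.append((row["question"], "possible non-answer"))
    return flagged
```

A triage pass like this doesn't replace reading the responses; it just orders the skim so likely failures surface first.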

Tips for great test sets

  • Mix critical, common, and edge-case prompts so scores reflect real usage.

  • Keep prompts short and unambiguous; put needed context in the prompt.

  • Iterate: fold real-world failures back into your test sets to prevent regressions.
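The iterate step above can be automated: when a real-world prompt fails, append it to the question CSV so the next run guards against the same regression. A minimal sketch, assuming a single "question" column (the filenames and column name are hypothetical):

```python
import csv


def merge_failures(testset_path: str, failures: list[str], out_path: str) -> None:
    """Append new failure prompts to an existing question CSV, skipping duplicates."""
    with open(testset_path) as f:
        existing = list(csv.DictReader(f))
    seen = {row["question"] for row in existing}
    for question in failures:
        if question not in seen:
            existing.append({"question": question})
            seen.add(question)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["question"])
        writer.writeheader()
        writer.writerows(existing)
```

Deduplicating on the exact question text keeps the set from bloating when the same failure is reported twice.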

Where to go next

  • Open your agent in Studio and look for the Evaluations tab on a published agent to upload a CSV and kick off a run.

 

Please let us know what you think! We're excited to hear your feedback.

10 comments

Rebekka Heilmann (viadee)
Community Champion
March 6, 2026

@Jensen Fleming well done, Team! Expect some feedback in the next few weeks :)

Lars Maehlmann
Community Champion
March 6, 2026

Hi team, thank you! This is a great feature. Are there any extra costs, like credits, for using Rovo?

Fazila Ashraf
Community Champion
March 6, 2026

Fantastic! Gonna give it a try now

Nicoleta Enache
I'm New Here
Those new to the Atlassian Community have posted less than three times. Give them a warm welcome!
March 6, 2026

This is awesome! Gonna spread the knowledge on this with our teams. 

Josh
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Champions.
March 9, 2026

So excited for the use cases this functionality will open up as well as the general quality improvements it will provide to our agents. Thank you, @Jensen Fleming and team!

Like • Jensen Fleming likes this
Bruno Legeard _Lynqa_
Atlassian Partner
March 11, 2026

That's excellent. Thanks, @Jensen Fleming 
In our experience, the combination of a "golden set," LLM-as-a-judge, and expert validation works best in practice. It's the approach we use to benchmark our agents. We'll try it in Rovo.
Is test dataset versioning implemented in this validation feature?

manuel_grande
I'm New Here
March 12, 2026

Good morning! How wonderful, @Jensen Fleming

Do we know whether we can also view the conversations customers have through the "customer portal," to see whether our virtual agent resolves the questions of the people who use it?

Thank you very much!

 

Jensen Fleming
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
March 12, 2026

@manuel_grande -- Live conversation review is a different stream we are working on. You can follow its progress here: https://jira.atlassian.com/browse/ROVO-356

 

Jensen Fleming
Atlassian Team
March 12, 2026

@Bruno Legeard _Lynqa_ -- We don't have dataset versioning in this release, but it's something we are exploring.

Like • Bruno Legeard _Lynqa_ likes this
manuel_grande
I'm New Here
March 12, 2026

@Jensen Fleming Great to hear that this is already being worked on!

Thank you very much.
