Introducing Evaluations for Rovo Agents

Ship better agents with confidence: Rovo Agent Evals are here 🎯

Building a great agent is one thing. Proving it's great (reliably, repeatedly, and at scale) is another. That's where Rovo Agent Evals come in. This release introduces a dedicated evaluation workspace so you can systematically test, measure, and improve the quality of your agent.


What you can do

Rovo Agent Evals provide three complementary ways to validate how your agent behaves:

1) Response Accuracy: test against the answers you expect

When you know what “good” looks like, this reference-based judge lets you lock it in.

  • Upload a set of questions and their ideal responses as a CSV.

  • Run your agent against that test set in one go; an LLM compares each agent response to your reference response, taking your agent's instructions into account.

  • See pass/fail judgments, plus qualitative feedback explaining where responses diverged.

Great for objective, repeatable checks on critical flows like HR policies, IT support FAQs, product knowledge, onboarding, and internal process guidance. Move from “I think it’s working” to “It passes a high percentage of our reference tests.”
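As an illustrative sketch, a reference test set for the Response Accuracy judge can be assembled with Python's `csv` module. The column names below are assumptions for illustration; check the Evaluations tab for the exact CSV format Rovo expects.

```python
import csv

# Hypothetical reference test set: each row pairs a question with the
# ideal ("golden") response you expect the agent to produce.
test_cases = [
    ("How many vacation days do new hires get?",
     "New hires accrue 20 vacation days per year, available after 90 days."),
    ("How do I reset my SSO password?",
     "Open the self-service portal and follow the password reset flow."),
]

with open("response_accuracy_tests.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["question", "expected_response"])  # header row (names assumed)
    writer.writerows(test_cases)
```

Keeping the test set in a file like this makes it easy to version-control alongside your agent's instructions and re-run after every change.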

2) Resolution Rate: score whether a request was resolved

For many service and Q&A agents, the key question is simple: did the agent actually resolve the user’s request?

  • Upload a CSV of questions only; your agent responds as usual.

  • An LLM scores each interaction as “Resolved” or “Unresolved,” based on how well the answer addresses the question.

  • Ideal when you don’t have curated “golden answers” but need fast resolution-quality signals.
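The aggregate signal from this judge reduces to a simple proportion. A minimal sketch, where the "Resolved"/"Unresolved" labels mirror the judge's output and the function name is our own:

```python
def resolution_rate(judgments: list[str]) -> float:
    """Fraction of interactions the LLM judge labeled 'Resolved'.

    `judgments` holds one per-question label per CSV row.
    """
    if not judgments:
        return 0.0
    resolved = sum(1 for j in judgments if j == "Resolved")
    return resolved / len(judgments)

# Example: 3 of 4 test questions resolved -> 0.75
rate = resolution_rate(["Resolved", "Resolved", "Unresolved", "Resolved"])
print(f"Resolution rate: {rate:.0%}")  # prints "Resolution rate: 75%"
```

Tracking this number across runs gives you a quick regression signal even without golden answers.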

3) Manual Testing: review behavior across lots of scenarios

  • Upload a CSV of questions (no expected answers needed).

  • Run them all at once and skim generated responses in a single view.

  • Use it to sanity-check after instruction changes, explore edge cases, and spot strengths/weaknesses quickly.
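Before a manual run, it can help to sanity-check the questions-only CSV itself. A small sketch (file name and single-column layout are assumptions) that drops blank rows and case-insensitive duplicates so the run doesn't spend responses on repeated prompts:

```python
import csv

def clean_questions(path: str) -> list[str]:
    """Load a questions-only CSV, dropping blanks and duplicates."""
    seen, questions = set(), []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            q = row[0].strip() if row else ""
            if q and q.lower() not in seen:
                seen.add(q.lower())
                questions.append(q)
    return questions

# Build a small questions-only file, then clean it before upload.
with open("manual_questions.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows([
        ["What is our VPN setup?"],
        ["what is our vpn setup?"],  # duplicate, differs only by case
        [""],                        # blank row
    ])
print(clean_questions("manual_questions.csv"))  # prints "['What is our VPN setup?']"
```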

Tips for great test sets

  • Mix critical, common, and edge-case prompts so scores reflect real usage.

  • Keep prompts short and unambiguous; put needed context in the prompt.

  • Iterate: fold real-world failures back into your test sets to prevent regressions.
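The first tip (mixing prompt types) can be checked mechanically if you tag each prompt. A small sketch, where the category names are our own convention, not a Rovo feature:

```python
from collections import Counter

def mix_report(tagged_prompts: list[tuple[str, str]]) -> Counter:
    """Count prompts per category so you can see whether critical,
    common, and edge-case scenarios are all represented."""
    return Counter(tag for tag, _ in tagged_prompts)

prompts = [
    ("critical", "How do I report a security incident?"),
    ("common",   "Where is the expense policy?"),
    ("common",   "How do I book a meeting room?"),
    ("edge",     "What happens if I submit an expense in a retired currency?"),
]
print(mix_report(prompts))
```

If one bucket dominates, your pass rate will overweight that scenario type relative to real usage.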

Where to go next

  • Open your agent in Studio and look for the Evaluations tab on a published agent to upload a CSV and kick off a run.

 

Please let us know what you think! Excited to hear your feedback.

4 comments

Rebekka Heilmann _viadee_
Community Champion
March 6, 2026

@Jensen Fleming well done, Team! Expect some feedback in the next few weeks :)

Lars Maehlmann
Community Champion
March 6, 2026

Hi team, thank you! This is a great feature. Are there any extra costs, like credits, for using Rovo?

Fazila Ashraf
Community Champion
March 6, 2026

Fantastic! Gonna give it a try now

Nicoleta Enache
I'm New Here
March 6, 2026

This is awesome! Gonna spread the knowledge on this with our teams. 

