ROVO agent hallucination

Ramathirtha Randhi
I'm New Here
January 21, 2026

I developed a Rovo Agent that uses semantic similarity to find similar resolved tickets in the history for a given TODO ticket.

It worked and suggested similar tickets for the first couple of days. After that, it was completely off (the suggestions were wrong), and since then it works sometimes and is off at other times.

Is there a way to correct this behaviour so that it gives accurate answers all the time?

And is this behavior down to model choice, i.e. are some models getting it right while others are off?

2 answers

1 accepted

2 votes
Answer accepted
Kris Klima _K15t_
Community Champion
January 21, 2026

Hi @Ramathirtha Randhi and welcome to the Community.

Here's another Community debate on the subject of Rovo hallucinations:

https://community.atlassian.com/forums/Rovo-questions/Rovo-OKR-Generator-Hallucinations/qaq-p/2917086

On a more general note, LLM prediction models do 'hallucinate' (get things wrong), unfortunately.

Dr Valeri Colon _Connect Centric_
Community Champion
February 10, 2026

Hi @Ramathirtha Randhi welcome to the community. Rovo’s semantic similarity isn’t deterministic, so suggestions can vary between runs. It doesn’t continuously “learn” from your Jira history, and it may rely on partial indexing or fall back to general reasoning when data isn’t available. You can improve accuracy by narrowing the agent’s scope, giving strict filtering rules, and including fields like summary, description, and resolution. Model choice affects tone, not the retrieval quality.
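To make that concrete, here is a minimal sketch (outside of Rovo, against the standard Jira REST search endpoint) of what narrowing scope, strict filtering and a limited field set can look like: only resolved tickets from one project are fetched, and only summary, description and resolution are handed to the similarity step. The site URL, project key and credentials are placeholders, and this is illustrative rather than how Rovo retrieves data internally.

```python
# Illustrative sketch (not Rovo internals): fetch only resolved tickets with a
# strict JQL filter and expose only the fields the similarity step should see.
import requests
from requests.auth import HTTPBasicAuth

JIRA_SITE = "https://your-site.atlassian.net"          # placeholder
AUTH = HTTPBasicAuth("you@example.com", "API_TOKEN")   # placeholder credentials

def fetch_resolved_candidates(project_key: str, max_results: int = 50) -> list[dict]:
    """Return resolved tickets, restricted to summary, description and resolution."""
    jql = (
        f'project = "{project_key}" AND statusCategory = Done '
        f"AND resolution IS NOT EMPTY ORDER BY resolved DESC"
    )
    resp = requests.get(
        f"{JIRA_SITE}/rest/api/2/search",
        params={
            "jql": jql,                                  # narrow the scope up front
            "fields": "summary,description,resolution",  # keep the context small
            "maxResults": max_results,
        },
        auth=AUTH,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["issues"]
```

Inside Rovo, the equivalent is to spell those constraints out in the agent instructions (e.g. only consider resolved work items from project X, and compare on summary, description and resolution), so the agent has less room to fall back to general reasoning.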

Ramathirtha Randhi
I'm New Here
February 10, 2026

@Dr Valeri Colon _Connect Centric_ I work extensively on building agents, and this is my observation. The reasons I think it's based on model selection are:

Embedding quality - Different models produce different vector embeddings, which directly affects which tickets rank as "similar" during retrieval. Claude vs GPT-4 embeddings will rank differently (see the sketch after this list).

Hallucination consistency - Weaker models are more prone to suggesting tickets that don't actually match or don't exist. Some models are fundamentally more reliable at this task.

Semantic reasoning accuracy - Better models can reason more precisely about what actually makes two tickets similar (vs just lexical overlap). This affects the quality of the ranking logic.

Stability between runs - Some models have better temperature consistency and produce more reproducible results across identical inputs.
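To illustrate the embedding-quality point, the sketch below ranks the same candidate tickets against the same query with two different toy, hash-based embedding functions standing in for "model A" and "model B". These are not the embeddings Rovo actually uses; the point is only that swapping the embedding function can change which ticket comes out on top.

```python
# Toy illustration: two different embedding functions can rank the same
# candidate tickets differently for the same query. Hash-based embeddings
# are stand-ins for real models, not what Rovo uses internally.
import numpy as np
from hashlib import md5, sha1

def embed(text: str, hasher, dim: int = 64) -> np.ndarray:
    """Deterministic toy embedding: hash each token into a bucket, then normalise."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[int(hasher(token.encode()).hexdigest(), 16) % dim] += 1.0
    return vec / (np.linalg.norm(vec) or 1.0)

def rank(query: str, candidates: list[str], hasher) -> list[str]:
    """Order candidates by cosine similarity to the query."""
    q = embed(query, hasher)
    scored = [(float(embed(c, hasher) @ q), c) for c in candidates]
    return [c for _, c in sorted(scored, reverse=True)]

tickets = [
    "Login page times out after SSO redirect",
    "Password reset email never arrives",
    "SSO redirect loop on mobile app login",
]
query = "User cannot log in after SSO change"

print(rank(query, tickets, md5))   # ordering under "model A"
print(rank(query, tickets, sha1))  # ordering under "model B" may differ
```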

That said, I agree with you that scope + filtering rules probably solve the problem. But I'd test 2-3 models side-by-side with identical prompts to confirm the theory. I don't think it's possible to test that way in Rovo right now.

0 votes
Rebekka Heilmann _viadee_
Community Champion
February 3, 2026

Hi @Ramathirtha Randhi - welcome to the Community,

I am a bit late to the party.

Have you tried out giving the Agent examples? Give it links to the things it should find and links to the things it shouldn't find for a given example.

Rovo isn't really "learning" yet, and changes in the data set (in this case, Jira work items) will affect results. You'll never get accurate results "all the time", as Rovo uses GenAI and that will always be random to some extent. Although the skills themselves (Read Jira work, Update Jira, etc.) are deterministic, the generated inputs coming from the Agent are not.
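As a generic illustration of that last point (not Rovo's actual decoding settings): sampling the same token distribution at a non-zero temperature can pick different tokens on identical inputs, while a greedy pick is repeatable.

```python
# Generic illustration: sampling at temperature > 0 varies across identical
# inputs; the greedy (argmax) choice does not. Not Rovo's actual settings.
import numpy as np

logits = np.array([2.0, 1.6, 0.5])   # scores for the tokens below
tokens = ["A", "B", "C"]

def sample(temperature: float, rng: np.random.Generator) -> str:
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return tokens[rng.choice(len(tokens), p=probs)]

rng = np.random.default_rng()
print([sample(0.9, rng) for _ in range(5)])   # varies from run to run
print(tokens[int(np.argmax(logits))])         # greedy pick, always "A"
```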
