ROVO agent hallucination

Ramathirtha Randhi
I'm New Here
January 21, 2026

I developed a Rovo Agent that uses semantic similarity to find similar resolved tickets in the history for a given TODO ticket.

It worked and suggested similar tickets for the first couple of days. After that, it was completely off (the suggestions were wrong), and since then it works sometimes and is off at other times.

Is there a way to correct this behaviour so that it gives accurate answers all the time?

And is this behavior down to model choice, i.e. are some models getting it right while others are off?

2 answers

1 accepted

2 votes
Answer accepted
Kris Klima _K15t_
Community Champion
January 21, 2026

Hi @Ramathirtha Randhi and welcome to the Community.

Here's another Community debate on the subject of Rovo hallucinations:

https://community.atlassian.com/forums/Rovo-questions/Rovo-OKR-Generator-Hallucinations/qaq-p/2917086

On a more general note, LLM prediction models do 'hallucinate' (get things wrong), unfortunately.

Dr Valeri Colon _Connect Centric_
Community Champion
February 10, 2026

Hi @Ramathirtha Randhi welcome to the community. Rovo’s semantic similarity isn’t deterministic, so suggestions can vary between runs. It doesn’t continuously “learn” from your Jira history, and it may rely on partial indexing or fall back to general reasoning when data isn’t available. You can improve accuracy by narrowing the agent’s scope, giving strict filtering rules, and including fields like summary, description, and resolution. Model choice affects tone, not the retrieval quality.
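To make that concrete, here is a minimal sketch (outside of Rovo, against the standard Jira REST search endpoint) of what narrowing scope, strict filtering and a limited field set can look like: only resolved tickets from one project are fetched, and only summary, description and resolution are handed to the similarity step. The site URL, project key and credentials are placeholders, and this is illustrative rather than how Rovo retrieves data internally.

```python
# Illustrative sketch (not Rovo internals): fetch only resolved tickets with a
# strict JQL filter and expose only the fields the similarity step should see.
import requests
from requests.auth import HTTPBasicAuth

JIRA_SITE = "https://your-site.atlassian.net"          # placeholder
AUTH = HTTPBasicAuth("you@example.com", "API_TOKEN")   # placeholder credentials

def fetch_resolved_candidates(project_key: str, max_results: int = 50) -> list[dict]:
    """Return resolved tickets, restricted to summary, description and resolution."""
    jql = (
        f'project = "{project_key}" AND statusCategory = Done '
        f"AND resolution IS NOT EMPTY ORDER BY resolved DESC"
    )
    resp = requests.get(
        f"{JIRA_SITE}/rest/api/2/search",
        params={
            "jql": jql,                                  # narrow the scope up front
            "fields": "summary,description,resolution",  # keep the context small
            "maxResults": max_results,
        },
        auth=AUTH,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["issues"]
```

Inside Rovo, the equivalent is to spell those constraints out in the agent instructions (e.g. only consider resolved work items from project X, and compare on summary, description and resolution), so the agent has less room to fall back to general reasoning.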

Ramathirtha Randhi
I'm New Here
February 10, 2026

@Dr Valeri Colon _Connect Centric_ I work extensively on building agents, and this is my observation. The reasons I think it's based on model selection are:

Embedding quality - Different models produce different vector embeddings, which directly affects which tickets rank as "similar" during retrieval. Claude vs GPT-4 embeddings will rank differently (see the sketch after this list).

Hallucination consistency - Weaker models are more prone to suggesting tickets that don't actually match or don't exist. Some models are fundamentally more reliable at this task.

Semantic reasoning accuracy - Better models can reason more precisely about what actually makes two tickets similar (vs just lexical overlap). This affects the quality of the ranking logic.

Stability between runs - Some models have better temperature consistency and produce more reproducible results across identical inputs.
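To illustrate the embedding-quality point, the sketch below ranks the same candidate tickets against the same query with two different toy, hash-based embedding functions standing in for "model A" and "model B". These are not the embeddings Rovo actually uses; the point is only that swapping the embedding function can change which ticket comes out on top.

```python
# Toy illustration: two different embedding functions can rank the same
# candidate tickets differently for the same query. Hash-based embeddings
# are stand-ins for real models, not what Rovo uses internally.
import numpy as np
from hashlib import md5, sha1

def embed(text: str, hasher, dim: int = 64) -> np.ndarray:
    """Deterministic toy embedding: hash each token into a bucket, then normalise."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[int(hasher(token.encode()).hexdigest(), 16) % dim] += 1.0
    return vec / (np.linalg.norm(vec) or 1.0)

def rank(query: str, candidates: list[str], hasher) -> list[str]:
    """Order candidates by cosine similarity to the query."""
    q = embed(query, hasher)
    scored = [(float(embed(c, hasher) @ q), c) for c in candidates]
    return [c for _, c in sorted(scored, reverse=True)]

tickets = [
    "Login page times out after SSO redirect",
    "Password reset email never arrives",
    "SSO redirect loop on mobile app login",
]
query = "User cannot log in after SSO change"

print(rank(query, tickets, md5))   # ordering under "model A"
print(rank(query, tickets, sha1))  # ordering under "model B" may differ
```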

That said, I agree with you that scope + filtering rules probably solve the problem. But I'd test 2-3 models side-by-side with identical prompts to confirm the theory. I don't think it's possible to test that way in Rovo right now.

0 votes
Rebekka Heilmann _viadee_
Community Champion
February 3, 2026

Hi @Ramathirtha Randhi - welcome to the Community,

I am a bit late to the party.

Have you tried out giving the Agent examples? Give it links to the things it should find and links to the things it shouldn't find for a given example.

Rovo isn't really "learning" yet, and changes in the data set (in this case, Jira work items) will affect results. You'll never get accurate results "all the time", as Rovo uses GenAI and that will always be random to some extent. Although the skills themselves (Read Jira work, Update Jira, etc.) are deterministic, the generated inputs coming from the Agent are not.
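As a generic illustration of that last point (not Rovo's actual decoding settings): sampling the same token distribution at a non-zero temperature can pick different tokens on identical inputs, while a greedy pick is repeatable.

```python
# Generic illustration: sampling at temperature > 0 varies across identical
# inputs; the greedy (argmax) choice does not. Not Rovo's actual settings.
import numpy as np

logits = np.array([2.0, 1.6, 0.5])   # scores for the tokens below
tokens = ["A", "B", "C"]

def sample(temperature: float, rng: np.random.Generator) -> str:
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return tokens[rng.choice(len(tokens), p=probs)]

rng = np.random.default_rng()
print([sample(0.9, rng) for _ in range(5)])   # varies from run to run
print(tokens[int(np.argmax(logits))])         # greedy pick, always "A"
```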
