The Making of Our First Rovo Agent - and the Results We Didn’t Expect

December 10, 2025

Introduction

My name is Yuri, and I am the co-founder of Release Management, where we build tools for managing complex cross-project releases and supporting advanced Agile and Kanban processes. Naturally, as AI adoption accelerated across the industry, we began exploring how to implement our own AI-powered features. For a long time, we hesitated: integrating Jira with third-party AI providers like Google or OpenAI seemed like the most straightforward path, yet it raised serious concerns around data privacy and security. Since the majority of our customers are enterprise-grade organizations, safeguarding their data is paramount - even if it means delaying innovation or limiting functionality.

Everything changed when Atlassian introduced Rovo, a true game-changer for the entire Atlassian ecosystem.

What Is Rovo?

So, what exactly is Rovo? Is it simply another AI model, like ChatGPT or Google Gemini? Or maybe just a white-labeled wrapper around one of the existing LLMs? Surprisingly, neither of these assumptions is accurate.

Rovo initially emerged as a wrapper around ChatGPT, capable of natively retrieving Jira data while obfuscating sensitive information and guaranteeing that none of that data would be used for model training. Over time, however, Rovo evolved far beyond that initial concept. Although Atlassian hasn’t publicly disclosed every technical detail, the way Rovo responds in real scenarios - and the insights shared by partners and the Atlassian team - strongly suggest that it has become a hybrid AI platform. It appears to combine multiple models from OpenAI, Google, Anthropic, and potentially others, allowing it to leverage the unique strengths of each model depending on the task. Whether it’s content generation, analytical reasoning, or even fully autonomous code creation directly from Jira user stories, Rovo selects the most suitable capabilities to deliver the best results.

Yes, you read that right: the future is already here. You can write a Jira story and hand it to RovoDev, a context-aware AI agent that can do planning, coding, and reviews, and automate repetitive work at scale. We tested it in real production scenarios, and the results were often surprisingly impressive.

If you have more accurate technical insights, feel free to share them in the comments - I’d love to hear from others experimenting in this space.

And of course, we shouldn’t overlook one of Atlassian’s biggest historical advantages: the deep, seamless integration between their products. Rovo inherits that same philosophy, aiming to reduce friction and unify the entire ecosystem.

Atlassian Teamwork Graph in the context of Rovo

One of the biggest challenges in AI adoption is data aggregation and normalization. For Atlassian-native products, this is relatively straightforward. But what about marketplace apps? What about the countless integrations used by customers today?

Atlassian’s success has always been deeply tied to its ecosystem - partners, developers, marketplace vendors - who solve everything from enterprise portfolio management to displaying Bitcoin exchange rates in Jira (yes, that app truly existed - and even had real installations).

So with AI, would this ecosystem advantage disappear?

Fortunately, the answer is no. Atlassian invested heavily in a unified data space, essentially a platform-wide data lake that aggregates information from all products, apps, and integrations. This gives Rovo and other tools controlled access to standardized, cross-system data.

This unlocks massive potential for AI adoption. Each vendor decides what data to expose and how to format it, and once shared, that data becomes part of a powerful synergy - where 1 + 1 = 4. It enables seamless app-to-app interaction, richer automation, and new types of complex cross-product workflows, all without code. This topic alone deserves a separate article - let me know in the comments if you’d like a deep dive.

Why It Matters

All these components together form a trusted, extensible, secure ecosystem - critical for enterprise use. With Marketplace apps migrating to the Atlassian Forge platform (which also provides the Runs on Atlassian capability) and evolving in the same direction, Atlassian is clearly building a unified environment where partners and users can leverage data safely and consistently across the entire platform.

The result? A future where AI can access the full breadth of organizational knowledge, eliminate routine work, integrate apps effortlessly, and empower people to focus on high-value, strategic activities.

Now, Back to the App We Built

At this point, you may be wondering: “Why am I still reading a general overview of AI in Atlassian? I came here for hands-on experience with building Rovo agents!”
And you’re absolutely right.

So, let’s get into the real story.

From Ideas to Reality: Our First Rovo-Powered Use Case

We had - and still have - a long list of features we want to supercharge with AI. These range from intelligent release portfolio analysis and early detection of problematic releases, to smart approvals and change management, automated dependency identification, and even human-like advanced release notes.

However, we quickly encountered a major limitation: as of mid-2025, marketplace apps could not call Rovo programmatically; every interaction had to go through the chat interface. In practical terms, this meant that our app couldn’t trigger Rovo directly within our Release Board. Instead, customers needed to switch to the Rovo chat and interact with it manually - typing questions and receiving answers there.

This constraint was unexpected. And to be fair, looking ahead, we already know that this limitation will be resolved in Rovo soon.

But it didn’t stop us!

Around the same time, we remembered a support ticket from a release manager at one of our customer companies who asked:

“Guys, every day I get tons of questions from internal teams:
When do we release X? When do we release Y? What’s the scope or current status of these releases?
It’s honestly annoying. Can I give them a Release Management agent that would answer all these questions instead of me?”

This was exactly the kind of real-world pain point where Rovo could shine. It sounded like a perfect MVP - a simple, high-value use case that would let us start experimenting with Rovo-driven automation.

And that’s where our implementation journey began.

The Next Challenge: Lack of Persistent Memory

The next issue we faced was Rovo’s limited session memory. Each new conversation started as a clean slate - Rovo completely forgot the context, preferences, or configuration established in previous interactions.

It felt a bit like that old joke:
“A fish is always happy to see you because it only remembers the last 10 seconds.”

For our use case, this was a serious problem. We needed the agent to remember certain settings and not ask the user the same questions every time they opened a new chat. Without persistent context, the experience would be clunky and frustrating.

Fortunately, the solution was straightforward.
We implemented our own storage layer inside the application and designed the prompts so that, during every operation, Rovo would fetch the relevant data from our internal storage and rebuild the context on the fly.
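
As a rough illustration of this pattern (a sketch in plain Python, not the app's actual code): settings are kept in a store the app controls, and a context preamble is rebuilt from them at the start of every request. The names `SettingsStore` and `build_context`, the setting keys, and the in-memory dict are all assumptions made for the example; a real app would use its platform's persistent storage.

```python
from dataclasses import dataclass, field


@dataclass
class SettingsStore:
    """Hypothetical per-user settings store; a dict stands in for real persistence."""
    _data: dict[str, dict[str, str]] = field(default_factory=dict)

    def save(self, user_id: str, key: str, value: str) -> None:
        self._data.setdefault(user_id, {})[key] = value

    def load(self, user_id: str) -> dict[str, str]:
        # Return a copy so callers cannot mutate stored state by accident.
        return dict(self._data.get(user_id, {}))


def build_context(store: SettingsStore, user_id: str) -> str:
    """Rebuild the context preamble from stored settings for a new chat."""
    settings = store.load(user_id)
    if not settings:
        return "No saved preferences for this user."
    lines = [f"- {key}: {value}" for key, value in sorted(settings.items())]
    return "Saved preferences (use these, do not ask again):\n" + "\n".join(lines)


store = SettingsStore()
store.save("user-1", "default_board", "Mobile Releases")
store.save("user-1", "date_format", "YYYY-MM-DD")
print(build_context(store, "user-1"))
```

Because the preamble is regenerated from storage each time, a brand-new chat starts with the same preferences as the previous one, regardless of what the model itself remembers.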

Great - we solved the problem.
Or at least, that’s what we thought.

Because, of course, things were about to get more interesting.

The Hallucination Problem or Invented Releases

Then we ran into another very common issue with generative AI: hallucinations.

Whenever we asked Rovo for information about releases, sprints, or epics within a certain time period - and our data didn’t contain anything relevant - it would confidently generate imaginary releases. And it did this extremely well.

Early in testing, we didn’t even realize the data was fake because the responses were so detailed and plausible. Full release names, timelines, scopes - it created everything out of thin air with absolute confidence.

So the question became:
How do we stop the AI from confidently inventing things?

Our engineers dove deep into research, experimented with dozens of prompt variants, and even ended up reading long Reddit threads. Eventually, we discovered a couple of techniques that actually worked.

The most surprising one?
Adding “temperature = 0” at the end of the prompt.
This drastically reduced hallucinations.

We also suspect that starting each prompt with a statement like
“Your life depends on the correctness of the provided results.”
helped as well - although we can’t scientifically confirm that part. Still, it certainly didn’t make things worse.
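
To make the two phrases concrete, here is a minimal sketch of how a prompt could be assembled around them. Note that "temperature = 0" is literal prompt text here, as described above, not a sampling parameter passed to an API. The function name `build_release_prompt`, the data layout, and the explicit "NO RELEASES FOUND" marker for empty results are assumptions added for the example; the marker is one plausible extra guard, not something confirmed to be part of the original solution.

```python
PREFIX = "Your life depends on the correctness of the provided results."
SUFFIX = "temperature = 0"


def build_release_prompt(question: str, releases: list[dict[str, str]]) -> str:
    """Wrap a user question with both phrases and the only data the agent may use."""
    if releases:
        data = "\n".join(f"- {r['name']} (release date: {r['date']})" for r in releases)
    else:
        # Assumption: stating emptiness explicitly leaves less room for invented releases.
        data = "NO RELEASES FOUND. Say that nothing matches; do not invent any."
    return "\n".join([PREFIX, f"Question: {question}", "Data:", data, SUFFIX])


print(build_release_prompt("What do we release this week?", []))
```

The empty-data branch targets exactly the situation where the invented releases appeared: a time period with nothing relevant in it.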

Context Limits: Struggles With Too Much Information

Another challenge we faced was Rovo’s tendency to lose context when working with a large volume of information. In practice, this showed up in two main ways:

Mistakes or failures when processing large datasets – for example, when we asked Rovo to summarize a release scope in a specific format, but the release contained a huge number of work items, it either produced incomplete results or got confused.
Decreased accuracy with long, single-shot prompts – the longer and more complex the prompt we wrote in a single block, the less accurate and consistent the final output became.

We approached these issues separately.

For the first problem, we switched to a chunking strategy:
we split the information into smaller pieces, asked Rovo to summarize each chunk, and then asked it to create a summary of summaries. This significantly improved reliability and reduced errors when dealing with large release scopes.
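
The chunk-then-combine flow can be sketched as follows. The `summarize` argument is a stand-in for whatever call actually asks the agent for a summary; it, the chunk size of 50, and the function names are assumptions for illustration only. The demo uses a fake summarizer that just counts lines, so the control flow can be followed without any AI involved.

```python
from typing import Callable, Sequence


def chunked(items: Sequence[str], size: int) -> list[list[str]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def summarize_in_chunks(
    items: Sequence[str], summarize: Callable[[str], str], size: int = 50
) -> str:
    """Summarize each chunk, then summarize the partial summaries."""
    partials = [summarize("\n".join(chunk)) for chunk in chunked(items, size)]
    if len(partials) <= 1:
        # Zero or one chunk: no second pass is needed.
        return partials[0] if partials else ""
    return summarize("\n".join(partials))


def fake_summarize(text: str) -> str:
    """Placeholder for the real agent call; reports how many lines it received."""
    return f"{len(text.splitlines())} lines"


work_items = [f"ITEM-{n}" for n in range(1, 121)]
print(summarize_in_chunks(work_items, fake_summarize))
```

With 120 work items and a chunk size of 50, the agent would see three small requests and one final request over three partial summaries, instead of a single request containing everything.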

For the second problem, we rethought how we wrote prompts. Instead of one long paragraph with multiple requirements, we started structuring them as a step-by-step guide with short, concrete instructions for each step. So instead of a single sentence trying to describe everything, we wrote something more like a 10-step algorithm with a clear sequence of actions.

Surprisingly, this alone boosted the quality and predictability of the results in a very noticeable way.
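
One way to keep prompts in that step-by-step shape is to build them from a list of short instructions rather than writing free-form paragraphs. The sketch below assumes a hypothetical helper `build_stepwise_prompt`; the goal and step texts are invented examples, not the prompts used in the app.

```python
def build_stepwise_prompt(goal: str, steps: list[str]) -> str:
    """Render a goal and an ordered list of short instructions as a numbered prompt."""
    numbered = [f"{n}. {step.strip()}" for n, step in enumerate(steps, start=1)]
    return "\n".join([f"Goal: {goal}", "Follow these steps in order:", *numbered])


prompt = build_stepwise_prompt(
    "Summarize the scope of a release",
    [
        "Read the list of work items provided below.",
        "Group the work items by type.",
        "Write one sentence per group.",
        "Do not mention work items that are not in the list.",
    ],
)
print(prompt)
```

Keeping each step to one short sentence also makes it easier to see which instruction the agent ignored when an answer goes wrong.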

Even Counting Is Hard

You might think that counting the number of work items in a release is a simple, almost trivial task.
Well, not for Rovo at the time we did our implementation.

No matter how we phrased the question, Rovo consistently miscounted items. Sometimes it skipped entries, sometimes it double-counted, and sometimes it confidently provided numbers that didn’t exist anywhere in the data. Even with small releases, accuracy was unreliable.

To address this, we moved all calculations to the backend.
Instead of asking Rovo to compute anything, we:

Pre-calculated all numeric values on our side,
Passed those values to Rovo, and
Explicitly instructed it to rely on the provided numbers, not its own interpretation.

This eliminated the counting errors entirely and made the responses both precise and predictable.
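
The three steps above can be sketched like this. The work-item layout (`key`, `status`), the function names, and the exact instruction wording are assumptions for the example; the point is only that the numbers are produced by ordinary code and the agent receives them as finished facts.

```python
from collections import Counter


def release_stats(work_items: list[dict[str, str]]) -> tuple[int, dict[str, int]]:
    """Count work items in code, where the result is exact."""
    by_status = Counter(item["status"] for item in work_items)
    return len(work_items), dict(sorted(by_status.items()))


def stats_prompt(release_name: str, total: int, by_status: dict[str, int]) -> str:
    """Hand the agent finished numbers plus an instruction not to recount."""
    lines = [
        f"Release: {release_name}",
        "Use ONLY the numbers below. Do not count work items yourself.",
        f"Total work items: {total}",
    ]
    lines += [f"{status}: {count}" for status, count in by_status.items()]
    return "\n".join(lines)


items = [
    {"key": "APP-1", "status": "Done"},
    {"key": "APP-2", "status": "In Progress"},
    {"key": "APP-3", "status": "Done"},
]
total, by_status = release_stats(items)
print(stats_prompt("Mobile 2.4", total, by_status))
```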

“Today” Is Uncertain: Fixing Date Awareness

A similar issue appeared when Rovo needed to work with dates.
Tasks like showing releases scheduled for this week, last month, or next week should have been straightforward - but Rovo often struggled with something as basic as knowing the current date.

Depending on the model and context size, it sometimes used outdated internal timestamps or guessed incorrectly, leading to inconsistent or completely wrong results.

To solve this, we took the same approach we used for counting:

We implemented our own backend method that always returns the current date and time.
We passed that value directly to Rovo with every request.
We explicitly instructed Rovo to rely on this provided value instead of using its internal assumptions.

With this workaround, date-based queries finally became stable and predictable, and Rovo stopped living “in its own timeline.”
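
A minimal sketch of that workaround: the backend computes the current date, and optionally the boundaries of "this week", and states them explicitly in the request. Using UTC and a Monday-to-Sunday week are assumptions made for the example, as are the function names; a real app would likely respect the user's time zone and locale.

```python
from datetime import date, datetime, timedelta, timezone


def current_date_utc() -> date:
    """Backend source of truth for "today" (UTC is an assumption of this sketch)."""
    return datetime.now(timezone.utc).date()


def week_bounds(today: date) -> tuple[date, date]:
    """Return the Monday and Sunday of the week containing `today`."""
    monday = today - timedelta(days=today.weekday())
    return monday, monday + timedelta(days=6)


def date_preamble(today: date) -> str:
    """Text to include in every request so the agent never guesses the date."""
    start, end = week_bounds(today)
    return (
        f"Today is {today.isoformat()}. "
        f"This week runs from {start.isoformat()} to {end.isoformat()}. "
        "Use these dates; do not assume any other current date."
    )


print(date_preamble(current_date_utc()))
```

Resolving relative ranges such as "this week" on the backend follows the same logic as the counting fix: date arithmetic is done in code, and the agent only reads the result.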

This issue has been reported and, based on our observations, has already been fixed: https://community.developer.atlassian.com/t/rovo-thinks-we-are-in-october-2023/86954/4

Performance Matters

Another challenge we encountered was performance. Even after solving accuracy and context issues, the overall solution sometimes became too slow. Rovo’s processing time - combined with multiple backend functions constantly pulling fresh data - introduced noticeable delays.

In practice, this meant that users had to wait for every answer.
And while this may be acceptable for occasional deep analysis, it seriously impacted the usability of simple, everyday queries.

In other words, even when everything worked, it didn’t always work fast enough.

This forced us to rethink how often we pulled data, optimize redundant calls, and restructure prompts to reduce unnecessary processing. Performance quickly proved to be just as important as correctness.
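
One common way to cut redundant data pulls is a small time-based cache in front of the backend functions. The sketch below is an assumption about what such an optimization could look like, not a description of the actual changes; `TTLCache`, the 60-second lifetime, and the key format are invented for the example.

```python
import time
from typing import Callable


class TTLCache:
    """Tiny time-based cache: reuse a fetched value until it is older than the TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # Injectable so tests can control time.
        self._entries: dict[str, tuple[float, object]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], object]) -> object:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # Still fresh: skip the expensive call.
        value = fetch()
        self._entries[key] = (now, value)
        return value


cache = TTLCache(ttl_seconds=60)
releases = cache.get_or_fetch("releases:this-week", lambda: ["Mobile 2.4", "Web 7.1"])
print(releases)
```

The trade-off is freshness: with a 60-second lifetime, an answer can be up to a minute behind the underlying data, which is usually acceptable for questions like "when do we release X?".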

Finally: The Testing Nightmare

Last - but definitely not least - came the testing phase.
With AI, ensuring consistent results is incredibly difficult. Outputs can vary depending on countless factors. As a result, predictable, repeatable testing becomes a major challenge.

Automated testing doesn’t make things easier either. Traditional test suites rely on deterministic behavior, but AI - by design - is not deterministic. Creating reliable automated tests for an AI-driven workflow quickly became far more complex than we expected.
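
One possible way to make such tests tractable, offered here as a suggestion rather than something the team reports doing, is to assert on properties of an answer instead of its exact wording. The sketch below checks that every release identifier mentioned in a response exists in the source data. The `REL-<number>` naming scheme and the function name are hypothetical.

```python
import re


def unknown_release_names(answer: str, known: set[str], pattern: str = r"REL-\d+") -> set[str]:
    """Return release identifiers mentioned in the answer that are not in the data."""
    return set(re.findall(pattern, answer)) - known


known_releases = {"REL-1", "REL-2"}
answer = "REL-1 ships on Friday, and REL-9 has been delayed."
print(unknown_release_names(answer, known_releases))
```

A check like this passes for any phrasing of a correct answer and fails only when the response refers to a release that does not exist, which is the failure that matters most here.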

And naturally, the moment the agent went live, our support team knew exactly what was coming. Long story short… their vacation plans were quietly reassigned to the backlog.

In the End, It All Pays Off

After reading everything above, you might feel that building something with Rovo is a true challenge - and yes, sometimes it truly is. But that’s only one side of the story.

On the other hand, we’re seeing continuous, meaningful progress as the platform evolves. In our conversations with the Atlassian AI team, it’s clear that they have a strong roadmap and a solid understanding of the challenges partners are facing. Many of the current limitations are already being addressed and should disappear in the near future.

While talking to the Rovo team during Atlassian Team '25 Europe, we realized that Rovo may soon become callable via API (UPDATE: Thank you, @Ulrich Kuhnhardt _IzymesCo_ for providing the link to EAP registration), which would be a massive leap forward. This change would unlock countless new use cases and greatly enhance the user experience, allowing the heavy, slow operations to run in the background rather than in the chat.

And we shouldn’t underestimate one of the biggest advantages: Rovo is free. For Marketplace partners, that makes it an incredibly attractive companion for building new apps. For end users, it means we’ll see more and more automation powered by Rovo across the entire Atlassian ecosystem. And honestly - that’s amazing.
