How Rovo Dev accelerated my Platform Engineering Role

In the beginning, there was Micros

Atlassian has a strong culture of “You Build It, You Run It”. We have a powerful, internal Platform as a Service (PaaS) called Micros that allows any engineer at Atlassian to go from empty git repository to a “Hello World” service in just a few minutes. This PaaS lets engineers iterate quickly without getting lost in the paperwork of running a service.

Micros continues to be a huge part of engineering at Atlassian. As we move into a cloud-first world, customers have mandatory requirements for data residency, stronger isolation controls and even multiple cloud providers. We need to expand our PaaS to meet customer needs without invalidating lessons learned or products built on the existing platform.

It takes a village, but supporting that village is tough

When an Atlassian engineer builds on the Micros Platform, they’ll predominantly interact with the Micros team. A single team is the point of contact for features, bugs, documentation and support. For our newer PaaS, there are multiple teams involved in building out the platform. My team is responsible for providing front-line support and some of the tooling. Other parts of the infrastructure and larger pieces of tooling are owned by other teams. And because the platform is rapidly evolving, it’s really hard to keep up with all the changes. We maintain documentation in a dedicated Confluence space, but even then it’s a struggle to keep that documentation up to date.

Kicking off with Rovo Dev CLI

We knew that we had a problem, but we didn’t really have a meaningful way of quantifying what the problem was. Our primary point of contact was our Slack Room, where requests would be turned into Jira work items. If we’ve got those work items, then we can analyse them. I asked Rovo Dev CLI to analyse the themes and opportunities from the most recent hundred work items in our Help project. The report was pretty helpful, and highlighted the documentation as a strong area for improvement.

As part of onboarding my team to this platform, we had a series of Loom recordings from the engineers who built it, designed to get us up to speed. Many meetings spread across many hours. I knew there was meaningful data in there, but I dreaded slogging through hours of meetings, some of which I had attended. Then I remembered that Loom keeps a transcript of each meeting: I could use Rovo Dev to parse those transcripts and improve the documentation.

What could have taken weeks of mind-numbing updates took just one day of processing and tweaking, saving us a huge amount of time. When I chatted with a colleague about it, he was impressed and said it would be awesome if Rovo Dev could help troubleshoot failures directly. “That would be pretty cool - let me look into it,” I told him. And so…

Enhancing Rovo Dev with MCP servers

MCP is the Model Context Protocol - it’s a standardised way for AI Agents (like Rovo Dev CLI) to interact with other tools and services, and for those tools to provide data in a structured way that the agent can work with. For example, let’s say you’ve got a Kubernetes cluster running in Docker Desktop (or minikube) on your local machine. You could ask Rovo Dev what the status of your application is in your local cluster - and Rovo Dev will do its best to craft a series of kubectl commands and try to interpret the output. And that might even work - but there will come a time when the output isn’t quite right, or the command breaks or doesn’t work as expected.

Wouldn’t it be better if we could just get the status directly from Kubernetes?

The cost of building MCP servers is really low

As I mentioned, Atlassian has a strong culture of teams building and running their own tools. For example, teams often provide their own CLI tooling for interacting with their services (we even have a unified internal CLI framework with plugins). An MCP server can be a simple wrapper around one of those CLIs, or a thin wrapper around an existing API or SDK.

In our previous Kubernetes example, several open source MCP servers exist, including the official Kubernetes MCP server. This server uses the official Golang Kubernetes client, the same client as the Kubernetes CLI. This lets an MCP provide structured information to an AI using the same libraries as the CLI - no piping, grepping, or guesswork needed for the Agent to use the data.

MCP servers provide tools, resources, and prompts for the AI to interact with. For the purpose of getting things done, we tend to stick to tools - they’re easy to write, and they let the AI take actions such as getting the status of Pods in a cluster or fetching logs for a service from Splunk.
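To make that concrete, here’s a minimal sketch of a tool-only MCP server in Go that wraps kubectl to report Pod status. It uses the open source mark3labs/mcp-go SDK purely for illustration (check its docs for the current API - our internal servers aren’t necessarily built this way), and the tool name get_pod_status is a placeholder:

package main

import (
    "context"
    "fmt"
    "os/exec"

    "github.com/mark3labs/mcp-go/mcp"
    "github.com/mark3labs/mcp-go/server"
)

func main() {
    // A tool-only MCP server: no resources or prompts, just one action.
    s := server.NewMCPServer("pod-status", "0.1.0")

    tool := mcp.NewTool("get_pod_status",
        mcp.WithDescription("List the Pods in a namespace with their current status"),
        mcp.WithString("namespace",
            mcp.Required(),
            mcp.Description("Kubernetes namespace to inspect"),
        ),
    )

    s.AddTool(tool, func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
        ns, err := req.RequireString("namespace")
        if err != nil {
            return mcp.NewToolResultError(err.Error()), nil
        }
        // Thin wrapper around the existing CLI; asking for JSON keeps the output structured.
        out, err := exec.CommandContext(ctx, "kubectl", "get", "pods", "-n", ns, "-o", "json").Output()
        if err != nil {
            return mcp.NewToolResultError(fmt.Sprintf("kubectl failed: %v", err)), nil
        }
        return mcp.NewToolResultText(string(out)), nil
    })

    // Serve over stdio so a local agent such as Rovo Dev CLI can launch it as a subprocess.
    if err := server.ServeStdio(s); err != nil {
        fmt.Println(err)
    }
}

Returning kubectl’s JSON output as the tool result hands the agent structured data, rather than a formatted table it would otherwise have to guess its way through.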

The real power is in orchestration

When I started looking at the availability of internal MCP servers across Atlassian, I was startled by just how many there were - many teams had created an MCP server to wrap a piece of work that their team already owned. In my particular work stream, we use Spinnaker to orchestrate rollouts. We have a Spinnaker MCP available, so Rovo Dev can now query the state of a given application’s deployment. Our Observability team have created their own MCP to efficiently fetch logs for a service from Splunk, or to check alerts from Jira.

By configuring a few MCPs in Rovo Dev, we now have access to:

  • The code base, where Rovo Dev operates.

  • Logs generated by that code base - Rovo Dev can see both the code that produces the logs and fetch those logs from Splunk.

  • Deployment state - Rovo Dev can see the configuration that powers the service’s deployment, and the resulting state in Spinnaker.

These facets are crucial for running a service on our platform. Since Rovo Dev knows the codebase, it links the code directly to its logs and rollout state.
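Wiring those servers into Rovo Dev is mostly configuration rather than code. The exact file location and schema come from the Rovo Dev CLI documentation, so treat this as a sketch: it assumes the common mcpServers layout used by most MCP clients, and the server names and commands below are placeholders for our internal binaries.

{
  "mcpServers": {
    "spinnaker": {
      "command": "spinnaker-mcp",
      "args": ["--read-only"]
    },
    "observability": {
      "command": "observability-mcp"
    },
    "troubleshooting": {
      "command": "platform-troubleshooting-mcp"
    }
  }
}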

We can wrap entire troubleshooting guides written in Markdown and expose them as tools that tell Rovo Dev which other tools to use during an investigation. For example:

# Deployment Troubleshooting Guide
This tool provides a systematic methodology for diagnosing deployment issues across multiple systems. It is read-only, idempotent, and template-driven.

Inputs (resolved from resource URI):
- Application: {{.AppName}}
- Namespace: {{.Namespace}}
- Environment: {{.Environment}}
{{- if .SuspectedIssue}}
- Suspected Issue: {{.SuspectedIssue}}
{{- end}}
{{- if .DeploymentPlatform}}
- Platform: {{.DeploymentPlatform}}
{{- end}}
{{- if .FailureStage}}
- Failure Stage: {{.FailureStage}}
{{- end}}

Prerequisites:
- Access to the service repository containing a descriptor.yml file
- Access to Spinnaker pipelines for the service/environment
- Access to Splunk logs for the service (if runtime analysis is required)

Related MCP tools:
- Spinnaker: get_spinnaker_pipelines
- Logging: splunk_search
- Alerts: get_jsm_alert

Tools such as get_spinnaker_pipelines and splunk_search are provided by separate MCPs that are maintained by other teams.

In our MCP, which is written in Go, we define a simple struct that provides these parameters:

type troubleshootInput struct {
    AppName            string `json:"app_name"`
    Namespace          string `json:"namespace"`
    Environment        string `json:"environment"`
    SuspectedIssue     string `json:"suspected_issue,omitempty"`
    DeploymentPlatform string `json:"deployment_platform,omitempty"`
    FailureStage       string `json:"failure_stage,omitempty"`
}

Some parameters, such as AppName, are always required. Others, such as suspected_issue, are up to the agent to populate. We can guide it a little with the way we describe the tool:

"suspected_issue": {
Type: "string",
Description: "Free-form suspected issue (e.g., security_groups, image_pull, rollout)",
Examples: []any{"security_groups", "image_pull", "rollout"},
},

Those parameters are passed to the tool - and more importantly, we can also take conditional actions in the markdown to bring specific tools or information to bear on specific problems.
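As a rough sketch of how those pieces fit together (the application values below are made up, and this is not our production code), the struct above can be fed straight into Go’s text/template to render the guide, with the {{if}} blocks switching extra instructions on and off for specific problems:

package main

import (
    "fmt"
    "os"
    "strings"
    "text/template"
)

// troubleshootInput mirrors the struct shown above.
type troubleshootInput struct {
    AppName            string `json:"app_name"`
    Namespace          string `json:"namespace"`
    Environment        string `json:"environment"`
    SuspectedIssue     string `json:"suspected_issue,omitempty"`
    DeploymentPlatform string `json:"deployment_platform,omitempty"`
    FailureStage       string `json:"failure_stage,omitempty"`
}

// guide is a trimmed-down version of the Markdown troubleshooting guide.
const guide = `# Deployment Troubleshooting Guide
- Application: {{.AppName}}
- Namespace: {{.Namespace}}
- Environment: {{.Environment}}
{{- if .SuspectedIssue}}
- Suspected Issue: {{.SuspectedIssue}}
Start with the tool that matches the suspected issue (e.g. get_spinnaker_pipelines for rollout problems).
{{- end}}`

func main() {
    in := troubleshootInput{
        AppName:        "my-service",
        Namespace:      "prod",
        Environment:    "us-east-1",
        SuspectedIssue: "rollout",
    }

    var b strings.Builder
    tmpl := template.Must(template.New("guide").Parse(guide))
    if err := tmpl.Execute(&b, in); err != nil {
        fmt.Fprintln(os.Stderr, err)
        return
    }
    // The rendered Markdown is what gets returned to the agent as the tool result.
    fmt.Println(b.String())
}

Because the rendered Markdown is what the agent receives back, an optional field like SuspectedIssue directly shapes which of the related MCP tools it reaches for next.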

With the first iterations of our troubleshooting MCP, we were able to take issues that used to consume hours across multiple engineers down to just minutes. As our tooling expands and improves, we’ll be able to push that further and solve a wider variety of problems in code bases - leaving humans to improve the platform and solve the really challenging engineering problems.

Orchestrating MCPs - Power and Responsibility for Every Engineer

Bringing multiple MCPs together unlocks tremendous potential for automation, troubleshooting, and operational insight. It’s essential to recognise that with this power comes a shared responsibility to uphold security and compliance standards, especially as AI agents gain broader capabilities.

When integrating MCPs, consider these key practices:

  • Principle of Least Privilege: Always ensure that your MCPs, whether running locally or remotely, operate with the minimum permissions necessary. For local MCPs, use read-only API keys wherever possible, and leverage built-in safeguards like read-only flags to prevent unintended changes (there’s a small sketch of this pattern after the list).

  • Remote MCPs Require Extra Vigilance: Remote MCPs often need broader access to systems. It’s crucial to implement robust security checks, enforce user permissions, and maintain comprehensive auditing. Ensure your controls are tailored to your application’s business rules, and consistently monitor access patterns for any anomalies.

  • Protect Customer Data: Never allow AI agents or MCPs unrestricted access to production systems, especially those containing sensitive customer information. Always validate that your controls and permissions are in place and effective.

  • Stay Informed and Collaborate: Security is a moving target. Stay up to date with your organisation’s latest security guidance, and don’t hesitate to seek feedback from peers or security experts.
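To make the least-privilege point concrete, here’s a small sketch of the read-only-flag pattern - mutating tools are simply never registered unless an operator opts in. It again uses mark3labs/mcp-go for illustration, and the flag, tool names and handlers are hypothetical:

package main

import (
    "context"
    "flag"
    "log"

    "github.com/mark3labs/mcp-go/mcp"
    "github.com/mark3labs/mcp-go/server"
)

func main() {
    // Default to read-only; exposing mutating tools requires an explicit opt-in.
    readOnly := flag.Bool("read-only", true, "only expose read-only tools")
    flag.Parse()

    s := server.NewMCPServer("deploy-tools", "0.1.0")

    // Read-only tools are always registered.
    s.AddTool(
        mcp.NewTool("get_deployment_status", mcp.WithDescription("Read the current deployment state")),
        func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
            return mcp.NewToolResultText("status: healthy (placeholder)"), nil
        },
    )

    // Mutating tools only exist when the safeguard is deliberately lifted,
    // so an agent pointed at the default server simply cannot change anything.
    if !*readOnly {
        s.AddTool(
            mcp.NewTool("restart_deployment", mcp.WithDescription("Restart the deployment")),
            func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
                // ...call the deployment API here with appropriately scoped credentials...
                return mcp.NewToolResultText("restart requested (placeholder)"), nil
            },
        )
    }

    if err := server.ServeStdio(s); err != nil {
        log.Fatal(err)
    }
}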

I hope this was helpful, and that these insights empower you to make the most of MCP orchestration, securely and confidently.
