There have been a number of outages / performance degradations recently for BitBucket. What is the root cause of these and what is Atlassian doing about them?

John Telford October 22, 2015

The month of October has been a rough one for Bitbucket (see http://status.bitbucket.org/history), and the outages are an impediment to our teams. My questions are:

  1. What are the causes of these issues?
  2. What is Atlassian doing about it?
  3. When should we expect to see improvements?

Being in IT, we all have empathy for what is going on internally, but we need information to decide whether Bitbucket is right for us or whether, given our uptime needs, we should pursue another option.


1 answer

1 accepted

6 votes
Answer accepted
@Dan
Atlassian Team
October 22, 2015

Hi John,

I'm the engineering manager for Bitbucket Cloud and, yes, it's been a rough month, as you note. Right now the largest obstacle is a piece of failing network hardware, which led to the problems experienced this week. Networking being what it is, we cannot rule out the possibility that this hardware has been playing a silent role as a contributor to, or the root cause of, many of this month's problems. Unfortunately, we can't be 100% certain of its role because we're not getting any error logs from the switch – only sporadic, externally measurable packet loss. We have raised support tickets with the vendor, but the larger plan is to move off that hardware efficiently and safely. We're doing this today, but it's a tricky business.
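The answer doesn't say how Atlassian tracks that "externally measurable packet loss", so purely as illustration: when a device emits no error logs, one common approach is to probe it from outside and alert on the loss rate over a sliding window. A minimal sketch (all names hypothetical):

```python
from collections import deque


class PacketLossMonitor:
    """Track packet loss over a sliding window of probe results.

    Each probe result is recorded as True (reply received) or
    False (probe lost). The loss rate is computed over the most
    recent `window_size` probes only, so a transient burst of
    loss ages out of the window over time.
    """

    def __init__(self, window_size: int = 100, alert_threshold: float = 0.01):
        # deque with maxlen silently discards the oldest entry
        # once the window is full
        self.results: deque[bool] = deque(maxlen=window_size)
        self.alert_threshold = alert_threshold

    def record(self, reply_received: bool) -> None:
        self.results.append(reply_received)

    @property
    def loss_rate(self) -> float:
        if not self.results:
            return 0.0
        lost = sum(1 for ok in self.results if not ok)
        return lost / len(self.results)

    def should_alert(self) -> bool:
        return self.loss_rate > self.alert_threshold
```

In practice the `record()` calls would be driven by periodic pings or TCP health checks toward the suspect hop; the sketch only shows the bookkeeping side.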

In terms of what we are doing about it, you should expect to see a reduction in service interruptions over the next couple of days. Today we will be making those networking changes. We are also making architectural changes to provide more resilience against network fluctuations (possibly at the expense of some performance) in case there is a deeper problem. We will continue to do a full shakedown of the network stack and to follow up with the vendor.
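The answer doesn't detail those architectural changes, but one common way to add resilience against transient network fluctuations (not necessarily what Atlassian implemented) is client-side retries with exponential backoff and jitter, trading a little latency for a much higher success rate. A hedged sketch:

```python
import random
import time


def with_retries(op, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Run `op`, retrying on ConnectionError with exponential backoff.

    The delay before attempt n is base_delay * 2**(n-1), scaled by a
    random jitter factor so that many clients retrying at once do not
    hammer the network in lockstep. The final failure is re-raised.
    `sleep` is injectable so tests can observe delays without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error
            delay = base_delay * (2 ** (attempt - 1)) * (1 + random.random())
            sleep(delay)
```

This is exactly the kind of change that costs some performance on a flaky path (extra waits) in exchange for fewer user-visible failures, matching the trade-off the answer mentions.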

In terms of future plans, this recent spate of interruptions has actually interfered somewhat with the larger capacity work planned for the next six months: specifically, we are in the midst of expanding our storage, adding Internet uplinks, and scaling up hardware (quadrupling compute to provide headroom for some future plans).

In addition to the long-term infrastructure work, we will continue to pursue smaller short-term performance wins and user-experience improvements in the code base itself. In fact, we just finished what we call "performance week" – a week dedicated to finding inefficiencies in the code and improving the user experience. Unfortunately, we haven't been able to deploy those improvements because of these incidents.

Does that adequately answer your question?

Thanks,

Dan Bennett

John Telford October 22, 2015

Dan – thank you for the detailed and thoughtful response. From our point of view, the biggest obstacle has been the availability of interactions that need to happen synchronously, such as pushing code or merging a pull request. Delays in webhooks and the like are less of an issue for us. The biggest emotional frustration for a developer is to have a feature coded and done and not be able to push it to the repo. We look forward to improvements in the uptime of the platform, and we offer any assistance we can provide. /jrt
