Diagnosing and Troubleshooting TCP Resets from Bitbucket Cloud due to Anycast Shift

Introduction

To improve performance, Bitbucket.org’s network 104.192.141.0/24 is anycast globally from more than 100 edge locations using AWS Global Accelerator, which we launched in October 2020 and announced on our blog.

Equal Cost Multi Path (ECMP) with Anycast

Sometimes your network, or your Internet Service Provider’s network, may have multiple equal-cost paths towards Bitbucket to choose from, a concept known in networking as Equal Cost Multi Path (ECMP).

Almost always, a network device such as a router will forward all the packets within a single TCP session across the same path, and therefore to the same Bitbucket edge location. This is achieved via an algorithm known as 'Hash-based Load Balancing' (RFC 2992): the choice of path is made from a hash function of the source and destination IP addresses and ports, which stay constant for the duration of a TCP session.
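
As a rough illustration of why hashing keeps a session on one path, the sketch below hashes a flow's source and destination addresses and ports and uses the result to pick one of several equal-cost paths. The hash function (SHA-256), the path names and the client address are assumptions chosen for illustration; real routers use their own vendor-specific hashes.

```python
import hashlib

def pick_path(src_ip, src_port, dst_ip, dst_port, paths):
    """Pick one equal-cost path from a hash of the flow's addresses and ports."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(paths)
    return paths[index]

# Hypothetical paths and client address, for illustration only.
PATHS = ["path-a", "path-b", "path-c"]
flow = ("198.51.100.7", 50123, "104.192.141.1", 443)

# Every packet of the same TCP session hashes to the same path,
# so this set only ever contains one path name.
choices = {pick_path(*flow, PATHS) for _ in range(1000)}
print(choices)
```

A new session from the same client normally uses a different source port, so it may hash to a different path. That is harmless, because the session starts and ends on whichever path it was assigned.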

Anycast Shift and TCP Resets

A problem emerges if any part of your network or your Internet Service Provider’s network is balancing the individual packets of a TCP session across equal-cost paths, rather than consistently sending each entire TCP session down one path via a hashing function.

In this scenario, different Bitbucket edge locations can inadvertently receive different parts of an existing TCP session. An edge location that did not see the start of the session has no knowledge of it, so it returns a TCP Reset, which is expected behavior and consistent with the relevant IETF RFCs.
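
The difference between the two behaviors can be sketched with a toy simulation. The `Edge` class, the `send_session` function and the strict packet-by-packet alternation are simplifying assumptions for illustration; they are not a model of Bitbucket's edge or of any particular router.

```python
from itertools import cycle

class Edge:
    """A toy edge location that only knows sessions whose SYN it received."""
    def __init__(self, name):
        self.name = name
        self.sessions = set()

    def receive(self, flow, flag):
        if flag == "SYN":
            self.sessions.add(flow)
            return "SYN-ACK"
        # Data for a session this edge has never seen is answered with a reset.
        return "ACK" if flow in self.sessions else "RST"

def send_session(flow, edges, per_packet):
    """Send a SYN plus three data packets; return the replies in order."""
    rotation = cycle(edges)                   # per-packet: alternate edges
    pinned = edges[hash(flow) % len(edges)]   # per-flow: one edge per session
    replies = []
    for flag in ["SYN", "DATA", "DATA", "DATA"]:
        edge = next(rotation) if per_packet else pinned
        replies.append(edge.receive(flow, flag))
    return replies

# Hypothetical client address and port.
flow = ("198.51.100.7", 50123, "104.192.141.1", 443)
per_flow = send_session(flow, [Edge("edge-1"), Edge("edge-2")], per_packet=False)
per_packet = send_session(flow, [Edge("edge-1"), Edge("edge-2")], per_packet=True)
print(per_flow)
print(per_packet)
```

With per-flow hashing every packet reaches the edge that accepted the SYN. With per-packet balancing, the packets that land on the other edge are reset.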

Not all TCP Resets are due to Anycast Shift

Anycast shift is an exceedingly rare problem. We have seen it in only three support cases among our many users.

The vast majority of issues involving TCP Resets are due to misconfigured Access Control Lists and Firewalls, which might be intentionally or inadvertently configured to reject traffic to Bitbucket's IP addresses. We urge you to ensure HTTP(S) and Git/SSH ports are allowed to the IP addresses we publish here.

The second most common cause is TCP session failure due to Path MTU Discovery messages being blocked by firewalls and routers. This occurs where a network tunnel of some form (e.g. a VPN) reduces the Maximum Transmission Unit (MTU) available to the TCP session below the internet-standard 1500 bytes without signalling this to one of the parties. Cisco Systems has published a good article on this here.
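
The arithmetic behind this failure can be sketched as follows. The 20-byte IPv4 and TCP header sizes assume no options are in use, and the 60-byte tunnel overhead is a hypothetical figure; the real overhead depends on the tunnel type and its configuration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options

def max_segment_size(path_mtu):
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return path_mtu - IPV4_HEADER - TCP_HEADER

def fits(payload, path_mtu):
    """True if a segment with this payload crosses the path without fragmentation."""
    return payload <= max_segment_size(path_mtu)

STANDARD_MTU = 1500
TUNNEL_OVERHEAD = 60  # hypothetical; varies by tunnel type
tunnel_mtu = STANDARD_MTU - TUNNEL_OVERHEAD

full_segment = max_segment_size(STANDARD_MTU)
print(full_segment)                    # 1460
print(fits(full_segment, tunnel_mtu))  # False
```

If the ICMP message that would report the smaller MTU is dropped along the way, the sender keeps retransmitting full-size segments that never arrive, and the session eventually fails.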

Example Errors

Acknowledging that Anycast Shift is rare, some errors we have observed from Git and HTTP clients due to receiving a TCP Reset mid-operation are shown below as a guide.

These errors are most likely to be caused by other networking issues. They are not definitive signs that anycast shift is occurring.

OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to bitbucket.org:443

git fetch returned status code 128:
stderr: fatal: unable to access 'https://bitbucket.org/': Network file descriptor is not connected

git fetch returned status code 128:
stdout:
stderr: fatal: unable to access 'https://bitbucket.org/': TCP connection reset by peer

git@bitbucket.org' 'git-upload-pack
ssh_exchange_identification: read: Connection reset by peer
fatal: Could not read from remote repository.

curl -v https://bitbucket.org
About to connect() to bitbucket.org port 443 (#0)
Trying 104.192.141.1...
Connected to bitbucket.org (104.192.141.1) port 443 (#0)
Initializing NSS with certpath: sql:/etc/pki/nssdb
CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
NSS error -5961 (PR_CONNECT_RESET_ERROR)
TCP connection reset by peer
Closing connection 0
curl: (35) TCP connection reset by peer

ssh git@bitbucket.org > /dev/null
packet_write_wait: Connection to 104.192.141.1 port 22: Broken pipe
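
When triaging client or build logs, it can help to flag lines that contain the messages listed above. The sketch below is a plain substring match over those messages; the function name and the sample log are hypothetical, and a match only shows that a connection was reset or dropped, not why.

```python
RESET_SIGNATURES = (
    "SSL_ERROR_SYSCALL",
    "Network file descriptor is not connected",
    "connection reset by peer",
    "PR_CONNECT_RESET_ERROR",
    "Broken pipe",
)

def find_reset_lines(log_text):
    """Return (line_number, line) pairs that contain a known reset message."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(sig.lower() in lowered for sig in RESET_SIGNATURES):
            hits.append((number, line.strip()))
    return hits

# Hypothetical log excerpt, for illustration only.
sample = """\
Cloning into 'repo'...
ssh_exchange_identification: read: Connection reset by peer
fatal: Could not read from remote repository.
"""
for number, line in find_reset_lines(sample):
    print(number, line)
```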

Resolving TCP Resets due to Anycast Shift

We advise you or your networking team (if you have one), potentially working with your Internet Service Provider, to first identify any router that has more than one equal-cost path to Bitbucket.org's network 104.192.141.0/24. Linux and Unix tools such as traceroute or mtr can be useful for identifying the device.
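
One way to spot a candidate device is to look for hops where successive probes were answered by different addresses. The sketch below parses text in the style of numeric traceroute output (for example from `traceroute -n`); the exact output layout varies between implementations, so both the parsing rules and the sample addresses are assumptions for illustration.

```python
import re

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

def multipath_hops(traceroute_output):
    """Map hop number -> sorted distinct responder IPs, for hops with more than one."""
    result = {}
    for line in traceroute_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header or unrelated line
        addresses = sorted(set(IPV4.findall(match.group(2))))
        if len(addresses) > 1:
            result[int(match.group(1))] = addresses
    return result

# Hypothetical output; the intermediate hops use documentation address ranges.
sample = """\
traceroute to bitbucket.org (104.192.141.1), 30 hops max, 60 byte packets
 1  192.0.2.1  0.512 ms  0.430 ms  0.401 ms
 2  203.0.113.1  2.101 ms 203.0.113.5  2.310 ms 203.0.113.1  2.054 ms
 3  104.192.141.1  9.812 ms  9.701 ms  9.954 ms
"""
print(multipath_hops(sample))
```

More than one responder at a hop suggests that the previous device is balancing traffic across several paths. It does not by itself show that the device splits packets within a single flow.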

Once the device is identified, a packet capture can show whether it is indeed splitting the packets of individual flows across paths. You, your networking team, or your service provider can then review the product documentation or work with the device vendor to resolve the issue.

While this level of troubleshooting of customer and provider networks is beyond the scope of our support team, we are actively gathering more data on this issue, and would appreciate any insight you can share on the exact make, model and configuration of the network device responsible, as this may help other customers in the future.

Please reach out to us via Bitbucket Cloud Support.

Temporary Workaround

You can modify your operating system's hosts file, directly or via automation systems (such as Docker, Salt, or Ansible), to direct traffic to Bitbucket’s legacy IP addresses, which are not anycast to the internet:

18.205.93.2
18.205.93.0
18.205.93.1

This can be achieved by adding an entry to /etc/hosts on Linux and Unix-based systems (including macOS) like so:

sudo vi /etc/hosts
# Add a line like so:
18.205.93.1 bitbucket.org
# :wq to exit vi
# or use any text editor of your choice

The final /etc/hosts will look similar to this:

$ cat /etc/hosts
##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
##
127.0.0.1	localhost
255.255.255.255	broadcasthost
::1             localhost 

18.205.93.1 bitbucket.org
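
If you apply the override through automation, it is worth making the change idempotent so that repeated runs do not append duplicate lines. The sketch below works on the text of a hosts file and leaves reading and writing the file to the caller; the function name `set_override` is hypothetical, and rewriting a line that also names other hosts discards any trailing comment on that line.

```python
def set_override(hosts_text, ip, hostname):
    """Return hosts-file text with exactly one active entry for hostname."""
    kept = []
    for line in hosts_text.splitlines():
        # Ignore anything after '#' when looking for an active entry.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            # Keep any other names that share the line; drop only this host.
            others = [name for name in fields[1:] if name != hostname]
            if others:
                kept.append(" ".join([fields[0]] + others))
            continue
        kept.append(line)
    kept.append(f"{ip} {hostname}")
    return "\n".join(kept) + "\n"

base = "127.0.0.1\tlocalhost\n::1 localhost\n"
once = set_override(base, "18.205.93.1", "bitbucket.org")
twice = set_override(once, "18.205.93.1", "bitbucket.org")
print(once, end="")
print(once == twice)
```

Because any existing entry for the host is removed before the new one is appended, the same function can also switch the override to another of the legacy addresses.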

Atlassian has no plans to retire these IP addresses, but please be aware that ultimately the public DNS answer for Bitbucket.org is the authoritative IP address which clients should be using. Prolonged usage of a hosts override risks becoming out of date with the state of Bitbucket’s infrastructure and will not be as performant as our anycast edge.
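
To check whether a machine is still pinned to a legacy address, you can classify the address that bitbucket.org resolves to locally. The sketch below uses only the network and addresses quoted in this article; the function names are hypothetical, and `resolved_addresses` reflects whatever the local resolver returns, which normally includes entries from /etc/hosts.

```python
import ipaddress
import socket

ANYCAST_NETWORK = ipaddress.ip_network("104.192.141.0/24")
LEGACY_ADDRESSES = {"18.205.93.0", "18.205.93.1", "18.205.93.2"}

def classify(address):
    """Label an address 'anycast', 'legacy' or 'other' using this article's values."""
    if ipaddress.ip_address(address) in ANYCAST_NETWORK:
        return "anycast"
    if address in LEGACY_ADDRESSES:
        return "legacy"
    return "other"

def resolved_addresses(hostname="bitbucket.org", port=443):
    """IPv4 addresses the local resolver returns for hostname."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

print(classify("104.192.141.1"))  # anycast
print(classify("18.205.93.1"))    # legacy
```

Calling `classify` on each result of `resolved_addresses()` shows which kind of address the machine is currently using. A result of 'other' may simply mean that the published addresses have changed since this article was written.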

