What to do with your Mercurial repos when Bitbucket sunsets support


Marek Parfianowicz
Atlassian Team
April 1, 2020

A cheat sheet for converting your Mercurial repositories to Git:

 

1) Get the 'fast-export' tool first:

git clone https://github.com/frej/fast-export.git 

cd fast-export

git checkout tags/v180317 # pin to this release because of https://github.com/frej/fast-export/issues/132

 

2) Prepare author mappings (optional, but it's a good idea to clean them up):

cd my-hg-repo

hg log | grep user: | sort | uniq | sed 's/user: *//' > my-hg-repo-authors.map

Edit the file manually; each mapping must use the

"old author" = "new author" 

format (with double quotes).
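
For example, a cleaned-up mapping file could look like the following (the names and addresses below are just hypothetical placeholders):

"jdoe <jdoe@devbox>" = "John Doe <john.doe@example.com>"
"John D" = "John Doe <john.doe@example.com>"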

 

3) Create a new Git repository

mkdir my-git-repo 

cd my-git-repo

git init

git config core.ignoreCase false # fast-export complains if this is not set

 

4) Run the 'fast-export' tool

cd my-git-repo

../fast-export/hg-fast-export.sh -r ../my-hg-repo -A ../my-hg-repo-authors.map
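
After the export finishes, the working tree of the new Git repository is still empty; a final checkout (as in the fast-export usage instructions) populates it:

git checkout HEAD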

 

5) Convert to Git LFS if your repository exceeds 2GB

Download the BFG Repo-Cleaner: https://rtyley.github.io/bfg-repo-cleaner/

Prepare a bare mirror clone:

git clone --mirror my-git-repo my-git-repo.git

Run the cleaner (specify the file extensions you want to convert to LFS):

java -jar bfg-1.13.0.jar --convert-to-git-lfs "*.{jar,zip,dll,pdf}" --no-blob-protection my-git-repo.git
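
BFG rewrites the history, but the old oversized blobs are still reachable through the reflog; the BFG documentation recommends expiring them before you push:

cd my-git-repo.git

git reflog expire --expire=now --all && git gc --prune=now --aggressive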

 

6) Push the Git repository to Bitbucket

Create a new Git repository on Bitbucket. Use the code snippet from the repository's web page to add the 'origin' remote and push your changes. Example:

git remote add origin git@bitbucket.org:my-organisation/my-git-repo.git 

git push origin --all

git push origin --tags
Paul Richards April 3, 2020

I've thrown together a few shell scripts that can help archive the metadata from BB Mercurial repositories. The tools offered so far didn't pick up all of the information I wanted, nor did they have the safeguard of letting someone examine the operations being performed before actually doing the deed. These scripts follow more of a shell "pipe" paradigm, so you can pick apart what is going to happen.

These are available at

git clone git@bitbucket.org:Rarified/bbmigrate.git

for anyone interested.

The JSON utility 'jq' is a prerequisite; otherwise a normal Linux/Unix environment should already have all the needed tools.
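
If you are unsure whether jq is present, a quick check and install (package name assumed to be 'jq' on Debian/Ubuntu-style systems):

command -v jq || sudo apt-get install jq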

I'll be working on some scripts to push the archived data back to BB in Git form in the future, but since archival is (for me) the time-critical operation I thought these should be made available now.

Franz Rodenacker April 6, 2020

We created a solution that completely automates the conversion of Bitbucket Mercurial repos to Git repos. This solution will read all Mercurial repos on a given Bitbucket account, convert them to Git and push them to new git repos on Bitbucket. You can use this automation to convert any number of Bitbucket hg repos to git with one click. 

For the actual conversion of each repo, we use the fast-export (https://github.com/frej/fast-export) project. We added some steps to automate the entire process:

  1. Creating a CSV file with a list of your hg repos
  2. Then, one by one (see the shell sketch after this list):
    1. Pulling each repo in the list
    2. Converting the repo to git
    3. Creating a new git repo on Bitbucket
    4. Pushing the local git repo to Bitbucket
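
For readers who prefer plain shell, here is a rough sketch of that per-repo loop. It is not the Linx implementation; the workspace name, credentials, repo list, and "-git" naming convention are all placeholders:

WORKSPACE=my-workspace
while read -r REPO; do
  hg clone "https://bitbucket.org/$WORKSPACE/$REPO" "hg/$REPO"        # 1. pull the hg repo
  mkdir -p "git/$REPO" && cd "git/$REPO" && git init                  # 2. convert it to Git
  ../../fast-export/hg-fast-export.sh -r "../../hg/$REPO"
  curl -s -u "$BB_USER" -X POST -H "Content-Type: application/json" \
       -d '{"scm": "git", "is_private": true}' \
       "https://api.bitbucket.org/2.0/repositories/$WORKSPACE/$REPO-git"   # 3. create the new Git repo
  git remote add origin "git@bitbucket.org:$WORKSPACE/$REPO-git.git"       # 4. push it to Bitbucket
  git push origin --all && git push origin --tags
  cd ../..
done < repos.csv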

In order to run the project, you need to:

  1. Install some dependencies
  2. Configure your Bitbucket account 
  3. Install our automation IDE, called the Linx Designer (https://linx.software)
  4. Download and configure the conversion solution (https://linx.software/docs/samples/bitbucket/gitmigration/gitmigration.lsoz)
  5. And finally, debug the solution to run it

The designer and the solution as well as all dependencies are completely free. A detailed description of the necessary steps can be found here:
https://linx.software/docs/samples/bitbucket/gitmigration/

Pieter Dumon April 6, 2020

Is there any way we can archive pull requests on Mercurial repos (i.e. the comments in them)?

Rarified April 6, 2020

@Pieter Dumon The scripts I mentioned two posts ago will do exactly that.  I’m hoping to come up with a dependable way to apply the archived metadata back to a migrated Git repository in the near future.

geotech April 6, 2020

Hi,


Any solutions to just automatically download and back up all my repositories to my PC?

Rarified April 6, 2020

@geotech The scripts I posted (now 4 posts ago) do this.  Just run the first two steps to download all repositories created by a particular user.  If you’re on a Windows PC, you can install “Cygwin” to get a unix environment that will work with the scripts.

Franz Rodenacker April 6, 2020

The Linx solution I posted earlier today can be amended to just clone all your hg repos with one click. All you need to do is delete a few processes and you are good to go. 

From the "MigrateRepo" process, just delete the following sub-processes:

  • Convert_hgToGit
  • CreateGitIgnoreFile
  • CreateGitRemote
  • Cleanup
clach04 April 6, 2020

@geotech https://github.com/clach04/bitbucket_tools does that and only needs python (any version) and hg (any platform).

 

@Rarified what additional metadata do your scripts handle, or is the trial-run option the main focus for you?

Paul Richards April 6, 2020

@clach04 The saving of the intermediate metadata, as well as the separation of analysis and execution, was an important consideration in the design. The generated commands do not only retrieve the owner's repositories; they also clone all of the repositories referred to as the source for pull requests. Those repositories will disappear after the purge too. By generating commands to be executed, I can alter those commands if particular repositories need special handling such as authentication, which I can't derive from the BB API.

Here is a schema-like list of items currently archived:

Data obtained from API/2.0/repositories/{workspace}

Retrieved as JSON streams, in addition to the top-level JSON stream at repositories/{workspace}:

[].{links}.{commits}.{href}: string
[].{links}.{forks}.{href}: string
[].{links}.{hooks}.{href}: string
[].{links}.{pullrequests}.{href}: string (one each for the DECLINED, MERGED, OPEN, SUPERSEDED categories)
[].{links}.{watchers}.{href}: string

Retrieved as Mercurial local clones:

[].{links}.{clone}.[].{href}: string
=========

Data obtained from JSON streams at [].{links}.{pullrequests}.{href} above:

Retrieved as JSON streams:

[].{links}.{activity}.{href}: string
[].{links}.{comments}.{href}: string
[].{links}.{commits}.{href}: string
[].{links}.{statuses}.{href}: string

Retrieved as Mercurial local clones:

[].{source}.{repository}.{links}.{html}.{href}: string
=========
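
For anyone who wants to poke at those streams by hand before running the scripts, a minimal sketch using curl and jq (the workspace name and user are placeholders, and the Bitbucket Cloud API paginates, so follow the 'next' links for anything beyond the first page):

curl -s -u "$BB_USER" "https://api.bitbucket.org/2.0/repositories/my-workspace" | jq -r '.values[] | .slug, .links.clone[].href'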

Eventually I hope the API will allow the metadata to be applied to new Git migrations of each repository, with appropriate mapping of commit ids. I will try to blend the local copies of the pull-request source repositories into a single private Git repository (with a branch for each source) that can serve as a shadow of the new Git repository.

I didn't archive any of the non-repository metadata such as users since they won't be purged like the repositories are.

philipstarkey April 6, 2020

@Paul Richards my tool, https://github.com/philipstarkey/bitbucket-hg-exporter, which is listed on @clach04's page, also does that. It additionally creates a rudimentary HTML archive that can be published (for example, on GitHub Pages).

Nominally used to migrate to GitHub, it can also be used to just archive everything locally.

@Pieter Dumon you might want to check it out. It's been used by quite a few different people now and only minor bugs (now all fixed) were found. 

Rarified April 6, 2020

I noticed that @clach04 has added a pointer to my shell scripts in his README.  Thanks!  As he or she points out, I only backed up the subset of metadata that our projects used, which does not cover all the metadata BitBucket stores for a repository (nor wiki content if it is present).  I hope the scripts are simple enough to make adding additional elements easy.   If someone has a complex repository that makes nearly full use of all BitBucket features, I’d be happy to look at how much work it would take to add missing items to back them up.

philipstarkey April 6, 2020

For those previously interested in my export/migration tool, I've now released v0.7.0 on PyPI. This version includes:

  • A bugfix for the case where the BitBucket API silently drops comment data if it contains certain characters (yep...unfortunately nothing you can do about it, but at least you'll have a record that the comment could not be exported)
  • A decrease in export time due to the exclusion of some API data that is mostly useless
  • The ability to use a local hg->git conversion tool, which generally works way better than the GitHub source importer (I recommend @cbillington's hg-export-tool, which is based on hg-fast-export, was recently updated to fix a bug with UTF-8 characters, and correctly handles branches with multiple heads).

Many thanks to scpeters and @cbillington for the testing and bugfixes.
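
(If you already have an earlier version installed, upgrading should just be the usual pip command, assuming the PyPI package name matches the GitHub repository:)

pip install --upgrade bitbucket-hg-exporter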

barseghyanartur April 14, 2020

GitHub played it extremely well. With BitBucket sunsetting Mercurial and GitHub allowing unlimited private repositories for teams, there might soon be another thread called "Sunsetting BitBucket". :)

jayvbestdraft April 16, 2020

No reason to stick with Bitbucket anymore; GitHub is also offering more free build minutes. Didn't see that coming, did ya, Bitbucket? Thanks for all these years of free hosting. At least you could have kept the Mercurial repos and survived; now you are going to lose both sets of users.

Mike Roberts April 17, 2020

Hg -> Git with no hassles:

https://dev.to/zer0pants/hg-to-git-with-no-hassles-37fk

 

tl;dr

1. Import your repo from Bitbucket to Github

2. Import your converted repo from Github to Bitbucket.

 

By the way @Marek Parfianowicz - your guide at the top here was great.
(Direct link for those looking for Marek's post.)

crobar April 19, 2020

Hg -> Git with no hassles:

https://dev.to/zer0pants/hg-to-git-with-no-hassles-37fk

 

tl;dr

1. Import your repo from Bitbucket to Github

2. Leave it there and enjoy GitHub; it now offers free unlimited private repositories with unlimited collaborators.

Laurent Doguin April 21, 2020

To those of you who do not want a forced Git migration like the one proposed above: there is a GitLab fork supporting Mercurial available at https://about.heptapod.host .

Jerry Gardner April 22, 2020

I heard on the grapevine that Atlassian plans to extend the deadline from May 31st to June 30th...

Can anyone confirm this? I haven't heard anything official from Atlassian yet.

Tom Kane
Atlassian Team
April 22, 2020

Hi @Jerry Gardner.

Yes, we recently made the decision to extend the deadline by 30 days (from May 31 to June 30).

A team is working to send out an email notification shortly with more details.

- Tom (Engineering manager on Bitbucket)

sacharin April 23, 2020

A couple of hours of googling showed that for my tasks I do not need separate Mercurial hosting. I use my own deployer, so I set up a Mercurial server following this: https://stackoverflow.com/questions/8411894/set-up-a-mercurial-server-on-ubuntu

CharlieC April 23, 2020

@Jerry Gardner Nice to see that Atlassian hasn't completely forgotten about us! Thanks for confirming this.

zao April 24, 2020

@Tom Kane Would it be possible to get a clarification on what will happen with all the extant stable software and libraries whose only hosting is in Bitbucket Mercurial repositories?

Will they be obliterated or are there any plans for preserving what tends to be load-bearing libraries in some scientific fields? It's not uncommon to have a software stack rely on some tool or library finished long ago with no active maintainers.

From the outside it's near impossible to enumerate and back those up in case any scientists need them in the future, and there is often important information contained in issues and commit history.

It'd be nice if you (BB) at least acknowledged that you understood that this is something the service has been used for; the announcement is very oriented toward teams with CI and active development, leaving great uncertainty about all these legacy codebases.

marmoute April 24, 2020

@zao, independently of Atlassian, we (Octobus + Software Heritage) have started a full backup of all public projects using Mercurial. That backup includes both the source code and the other metadata (issues, pull requests, wiki, …). The goal is to keep these archives available to the public after Bitbucket removes them. You can read more about this project here:

* https://octobus.net/blog/2020-04-23-heptapod-and-swh.html
* https://www.softwareheritage.org/2020/04/23/rescuing-250000-endangered-mercurial-repositories/
