What to do with your Mercurial repos when Bitbucket sunsets support


A cheat sheet on how to convert your Mercurial repositories to Git:


1) Get the 'fast-export' tool first:

git clone 

cd fast-export

git checkout tags/v180317 # because of the


2) Prepare author mappings (optional, but it's a good idea to clean up the author list):

cd my-hg-repo

hg log | grep user: | sort | uniq | sed 's/user: *//' >

Edit the file manually; each line must use the

"old author" = "new author"

format (with double quotes).
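As a minimal, self-contained sketch of step 2, here is the same pipeline run on simulated `hg log` output (the author names are invented):

```shell
# Simulated `hg log` output, run through the same pipeline as above
# to produce a de-duplicated author list
printf 'user:        alice <alice@old.example>\nuser:        bob\nuser:        alice <alice@old.example>\n' \
  | sort | uniq | sed 's/user: *//' > authors.map

cat authors.map
# Each line is then hand-edited into the quoted mapping format, e.g.:
#   "alice <alice@old.example>" = "Alice Liddell <alice@new.example>"
#   "bob" = "Bob Tables <bob@new.example>"
```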


3) Create new Git repository

mkdir my-git-repo 

cd my-git-repo

git init

git config core.ignoreCase false # fast-export complains if this is not set
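If you want to confirm the option actually took effect, git will echo the value back when queried without an argument (a throwaway repo is used here just for demonstration; in practice run this inside my-git-repo):

```shell
# Throwaway repo to demonstrate setting and reading back the option
cd "$(mktemp -d)"
git init -q
git config core.ignoreCase false
git config core.ignoreCase    # prints "false", confirming the setting
```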


4) Run the 'fast-export' tool

cd my-git-repo

../fast-export/ -r ../my-hg-repo -A ../


5) Convert to Git LFS if your repository exceeds 2GB

Download the

Prepare raw clone:

git clone --mirror my-git-repo my-git-repo.git

Run the cleaner (define file extensions you want to convert to LFS):

java -jar bfg-1.13.0.jar --convert-to-git-lfs "*.{jar,zip,dll,pdf}" --no-blob-protection my-git-repo.git
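Before running BFG, it can help to see which files would actually be affected. A self-contained sketch using standard tools (the directory tree and file names here are fabricated):

```shell
# Fabricated tree with one large binary so the example runs anywhere
cd "$(mktemp -d)"
mkdir libs
truncate -s 3M libs/big.jar   # stand-in for a large binary blob
truncate -s 10K notes.txt     # small file, not an LFS candidate
find . -type f -size +1M      # lists only files over 1MB, i.e. ./libs/big.jar
```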


6) Push Git repository to Bitbucket

Create a new Git repository on Bitbucket. Use the code snippet from the web page to add the 'origin' remote and push your changes. Example:

git remote add origin 

git push origin --all

git push origin --tags

I've thrown together a few shell scripts that can help archive the metadata from BB Mercurial repositories. The ones offered so far didn't pick up all of the information I wanted, nor did they have the safeguard of allowing someone to examine the operations being performed before actually doing the deed. These scripts are more in the paradigm of shell "pipe" type operations, so you can pick apart what is going to happen.

These are available at

git clone

for anyone interested.

The JSON utility 'jq' is a prerequisite; otherwise a normal Linux/Unix environment should already have all the needed tools.

I'll be working on some scripts to push the archived data back to BB in Git form in the future, but since archival is (for me) the time-critical operation, I thought these should be made available now.


We created a solution that completely automates the conversion of Bitbucket Mercurial repos to Git repos. It reads all Mercurial repos on a given Bitbucket account, converts them to Git, and pushes them to new Git repos on Bitbucket. You can use this automation to convert any number of Bitbucket hg repos to Git with one click.

For the actual conversion of each repo, we use the fast-export project. We added some steps to automate the entire process:

  1. Creating a CSV file with a list of your hg repos
  2. Then - one-by-one
    1. Pulling each repo in the list
    2. Converting the repo to git
    3. Creating a new git repo on Bitbucket
    4. Pushing the local git repo to Bitbucket
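The loop above can be sketched as a dry run in plain shell: read the repo list and print the commands a real automation would execute (the workspace name, URLs, and script paths here are all hypothetical):

```shell
# Dry run: emit the per-repo commands instead of executing them
cd "$(mktemp -d)"
printf 'repo-one\nrepo-two\n' > repos.csv
while read -r repo; do
  echo "hg clone https://bitbucket.org/myworkspace/$repo"
  echo "hg-fast-export.sh -r $repo"
  echo "git push git@bitbucket.org:myworkspace/${repo}.git --all"
done < repos.csv
```

Printing the commands first makes it easy to inspect (or selectively run) what the automation would do.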

In order to run the project you need to:

  1. Install some dependencies
  2. Configure your Bitbucket account 
  3. Install our automation IDE, called the Linx Designer
  4. Download and configure the conversion solution
  5. And finally, debug the solution to run it

The designer and the solution, as well as all dependencies, are completely free. A detailed description of the necessary steps can be found here:


Is there any way we can archive pull requests on Mercurial repos (i.e. the comments in them)?

@Pieter Dumon The scripts I mentioned two posts ago will do exactly that.  I’m hoping to come up with a dependable way to apply the archived metadata back to a migrated Git repository in the near future.



Any solutions to just automatically download and back up all my repositories to my PC?

@geotech The scripts I posted (now 4 posts ago) do this.  Just run the first two steps to download all repositories created by a particular user.  If you’re on a Windows PC, you can install “Cygwin” to get a unix environment that will work with the scripts.

The Linx solution I posted earlier today can be amended to just clone all your hg repos with one click. All you need to do is delete a few processes and you are good to go. 

From the "MigrateRepo" process, just delete the following sub-processes:

  • Convert_hgToGit
  • CreateGitIgnoreFile
  • CreateGitRemote
  • Cleanup

@geotech does that and only needs python (any version) and hg (any platform).


@Rarified what additional metadata do your scripts handle, or is the trial-run option the main focus for you?

@clach04 The saving of the intermediate metadata, as well as the separation of analysis and execution, was an important consideration of the design. And the generated commands do not only retrieve the owner's repositories; they also clone all of the repositories referred to as the source for pull requests. Those repositories too will disappear after the purge. By generating commands to be executed, I can alter those commands if particular repositories need special handling, such as authentication, which I can't derive from the BB API.

Here is a schema-like list of items currently archived:

Data obtained from API/2.0/repositories/{workspace}

Retrieved as JSON streams, in addition to the top-level JSON stream at repositories/{workspace}:

[].{links}.{commits}.{href}: string
[].{links}.{forks}.{href}: string
[].{links}.{hooks}.{href}: string
[].{links}.{pullrequests}.{href}: string (one each for the DECLINED, MERGED, OPEN, SUPERSEDED categories)
[].{links}.{watchers}.{href}: string

Retrieved as Mercurial local clones:

[].{links}.{clone}.[].{href}: string

Data obtained from JSON streams at [].{links}.{pullrequests}.{href} above:

Retrieved as JSON streams:

[].{links}.{activity}.{href}: string
[].{links}.{comments}.{href}: string
[].{links}.{commits}.{href}: string
[].{links}.{statuses}.{href}: string

Retrieved as Mercurial local clones:

[].{source}.{repository}.{links}.{html}.{href}: string

Eventually I hope the API will allow the metadata to be applied to new Git migrations of each repository, with appropriate mapping of commit ids. I will try to blend the local copies of the source repositories for pull requests into a single private Git repository (with a branch for each source) that can act as a shadow of the new Git repository.

I didn't archive any of the non-repository metadata such as users since they won't be purged like the repositories are.
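To make the schema notation above concrete, here is how one such href can be dug out with jq (the JSON is a fabricated minimal sample, not real API output):

```shell
# Fabricated JSON in the shape of [].{links}.{pullrequests}.{href};
# jq extracts the link the same way the scripts do against real data
echo '[{"links":{"pullrequests":{"href":"https://api.example.org/pullrequests?state=OPEN"}}}]' \
  | jq -r '.[].links.pullrequests.href'
```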

@Paul Richards my tool, which is listed on @clach04's page, also does that. It also creates a rudimentary HTML archive that can be published (for example, on GitHub Pages).

Nominally used to migrate to GitHub, it can also be used to just archive everything locally.

@Pieter Dumon you might want to check it out. It's been used by quite a few different people now and only minor bugs (now all fixed) were found. 


I noticed that @clach04 has added a pointer to my shell scripts in his README. Thanks! As he or she points out, I only backed up the subset of metadata that our projects used, which does not cover all the metadata BitBucket stores for a repository (nor wiki content, if present). I hope the scripts are simple enough to make adding additional elements easy. If someone has a complex repository that makes nearly full use of all BitBucket features, I'd be happy to look at how much work it would take to add backup of the missing items.

For those previously interested in my export/migration tool, I've now released v0.7.0 on PyPI. This version includes:

  • A bugfix for the case where the BitBucket API silently drops comment data if it contains certain characters (yep...unfortunately nothing you can do about it, but at least you'll have a record that the comment could not be exported)
  • A decrease in export time due to the exclusion of some API data that is mostly useless
  • The ability to use a local hg->git conversion tool which generally works way better than the GitHub source importer (I recommend @cbillington 's hg-export-tool that is based on hg-fast-export and was recently updated to fix a bug with UTF-8 characters and also correctly handles branches with multiple heads).

Many thanks to scpeters and @cbillington for the testing and bugfixes.

GitHub played it extremely well. With BitBucket sunsetting Mercurial and GitHub allowing unlimited private repositories for teams, there might be soon another thread called "Sunsetting BitBucket". :)


No reason to stick with Bitbucket anymore - GitHub is also offering more free build minutes. Didn't see that coming, did ya, Bitbucket? Thanks for all these years of free hosting. At least you could have survived with Mercurial repos; now you are going to lose both sets of users.

Hg -> Git with no hassles:



1. Import your repo from Bitbucket to GitHub

2. Import your converted repo from GitHub back to Bitbucket.


By the way @Marek Parfianowicz - your guide at the top here was great.
Direct link for those looking for Marek's post:


Hg -> Git with no hassles:



1. Import your repo from Bitbucket to GitHub

2. Leave it there and enjoy GitHub; it now offers free unlimited private repositories with unlimited collaborators

For those of you who do not want a forced Git migration like the one proposed above, there is a GitLab fork supporting Mercurial available on .


I heard on the grapevine that Atlassian plans to extend the deadline from May 31st to June 30th...

Can anyone confirm this? I haven't heard anything official from Atlassian yet.

Hi @Jerry Gardner.

Yes, we recently made the decision to extend the deadline by 30 days (from May 31 to June 30).

A team is working to send out an email notification shortly with more details.

- Tom (Engineering manager on Bitbucket)


A couple of hours of googling showed that for my tasks I do not need separate Mercurial hosting. I use my own deployer, so I did this

@Jerry Gardner Nice to see that Atlassian hasn't completely forgotten about us! Thanks for confirming this.

@Tom Kane Would it be possible to get a clarification on what will happen with all the extant stable software and libraries whose only hosting is in Bitbucket Mercurial repositories?

Will they be obliterated, or are there any plans for preserving what tend to be load-bearing libraries in some scientific fields? It's not uncommon for a software stack to rely on some tool or library finished long ago with no active maintainers.

From the outside it's near impossible to enumerate and back those up in case any scientists need them in the future, and there is often important information contained in issues and commit history.

It'd be nice if you (BB) at least acknowledged that you understand this is something the service has been used for; the announcement is very oriented toward teams with CI and active development, leaving great uncertainty about all these legacy codebases.


@zoa, independently of Atlassian, we (Octobus + Software Heritage) have started a full backup of all public projects using Mercurial. That backup includes both the source code and the other metadata: issues, pull requests, wiki, …. The goal is to keep these archives available to the public after Bitbucket removes them. You can read more about this project here:



