How do I import source code snapshots into a BitBucket repository?

Michał Jazłowiecki March 10, 2013

I have created a Git repository in BitBucket and uploaded my source code there. I use this repository for ongoing development.

I also have a fairly large source code history in the form of snapshots saved in multiple archives (about 1-2 archives per day, 5 days a week, over 3 years). Uncompressed, the sources from all archives take up about 10 GiB. How do I import that history into my BitBucket/Git repository?

Development was mostly linear, with several milestones; only one branch was created (last December), and it can be left unmaintained.

Feel free to ask further questions.

Kind regards,

1 answer

Answer accepted
aMarcus
Atlassian Team
March 11, 2013

Hello,

This is indeed possible with a Git repository. Git is pretty smart about detecting files that have actually changed vs. files that simply have an updated timestamp. It'll even see file moves in most cases. Knowing that, you can basically follow this pattern:

  • Create a new directory in your path (e.g. \Users\repo)
  • cd to \Users\repo and run
git init
  • Create a .gitignore file so that, from this point on, you don't commit legacy data that shouldn't be in a repo, such as SQL dumps, large images, compiled binaries, references to other libraries, etc. The online Git book documents how to structure this file.
  • Unpack the contents of your first (oldest) archive into this directory
  • Add all the files to Git, so it can track them
git add -A .

  • Once you have verified the changes, commit them, and push if a remote named origin is configured:
git commit -m "Added archive {filename} from {date}"
git push origin master
  • Remove ALL of the files in the directory except the .git directory and its descendants (this is where Git stores the repository itself). Keep your .gitignore as well; otherwise the next commit records it as deleted. The best way to do this differs between Unix, OS X and Windows, so look at the options available for your platform.
  • Take the next archive, unpack it into the directory
  • Repeat add, commit, push, remove and unpack for each remaining archive, in chronological order.
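The steps above can be scripted. Below is a minimal sketch for Unix or OS X, assuming (these details are not from the original answer) that each snapshot is a .tar.gz archive with the sources at its top level, and that the archive names sort chronologically, for example because they embed a YYYYMMDD date. The function name import_snapshots is hypothetical. It keeps .git and .gitignore when clearing the directory, and it does not push.

```shell
# Minimal sketch: import_snapshots ARCHIVE_DIR REPO_DIR (hypothetical helper).
# Assumes every snapshot is a .tar.gz holding the sources at its top level
# and that the file names sort chronologically.
import_snapshots() (
  set -eu
  local archive_dir="$1" repo_dir="$2" archive
  mkdir -p "$repo_dir"
  git -C "$repo_dir" init -q

  # Glob results are sorted by name, so date-stamped archives go oldest first.
  for archive in "$archive_dir"/*.tar.gz; do
    [ -e "$archive" ] || continue   # the pattern matched no archives

    # Remove the previous snapshot but keep .git and .gitignore.
    find "$repo_dir" -mindepth 1 -maxdepth 1 \
      ! -name .git ! -name .gitignore -exec rm -rf {} +

    # Unpack the next snapshot into the emptied working tree.
    tar -xzf "$archive" -C "$repo_dir"

    # -A stages new, modified and deleted files alike.
    git -C "$repo_dir" add -A .
    git -C "$repo_dir" commit -q --allow-empty \
      -m "Added archive $(basename "$archive")"
  done
)
```

For example, `import_snapshots ~/snapshots ~/repo`. The --allow-empty flag keeps one commit per archive even when two consecutive snapshots are identical; drop it if you would rather have those skipped (the commit would then fail on an empty change).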

Once you're done, you should have a repo that contains one commit per archive, each with a commit message that tells you where the data at that point in time came from. The commit dates will be the dates of the import, not of the snapshots (obviously). In the future, you can look for older point-in-time changes by running commands like

git log --grep='from 2012-02-01'

As with any process like this, make sure you are doing all this from backup copies and don't delete anything until you're sure you've got it all!

Michał Jazłowiecki March 17, 2013

My Windows batch file that worked for me:

@echo off
setlocal enableextensions enabledelayedexpansion
rem Add Git client and GNU sed (http://gnuwin32.sourceforge.net/)
set PATH=C:\Program Files\Git\bin;C:\Program Files\GNUWin32\bin;%PATH%
set tempFile=%temp%\GitUpload.txt
if not exist repo\nul md repo
rem Each source code snapshot is unpacked into a separate subfolder of the "working" folder; the subfolder names contain the date and time (24-hour) of the snapshot, e.g.:
rem "eWydruki-20120405-1234" folder contains a snapshot made on April 5, 2012 at 12:34
pushd repo
git init
git config --local push.default simple
rem Iterate working folder's subfolders
for /f %%D in ('dir /b /ad /ogne ..\working\*') do (
  rem Empty the repository folder, keeping only the .git folder
  del /f /q *.* > nul
  for /d %%E in (*.*) do if not {%%~nxE} == {.git} rd /s /q "%%~E" > nul
  rem Copy the next snapshot into the repository folder
  xcopy /y /q /e "..\working\%%~D" . > nul
  rem Parse folder name for date and time
  set fileDate=%%~nxD
  set timeStamp=!fileDate:~9,4!/!fileDate:~13,2!/!fileDate:~15,2! !fileDate:~18,2!:!fileDate:~20,2!
  rem Now !timeStamp! contains the entire timestamp of the source code snapshot
  rem Parse the AssemblyInfo files, which contain the version strings
  for %%F in (AssemblyInfo.cs AssemblyVersionInfo.cs Core\Properties\AssemblyInfo.cs) do (
    if exist "%%~F" (
      for /f "tokens=2,3 delims=:[]() " %%i in (%%~F) do (
        if {%%i} == {AssemblyVersion} (
          rem AssemblyVersion attribute has been located
          echo %%j > %tempFile%
          rem Remove double quotes from around the AssemblyVersion attribute's value
          rem RemoveDoubleQuotes.sed file contents:
          rem s/"//g
          sed -f ../RemoveDoubleQuotes.sed %tempFile% > %tempFile%~
          rem "%tempFile%~" now contains a single line with the version string
          for /f %%k in (%tempFile%~) do set version=%%k
        )
      )
    )
  )
  rem Now !version! contains the version string
  git add -A .
  git commit -m "Application name, version !version!, snapshot made on !timeStamp!" --author="A. U. Thor <a.u.thor@server.com>" --date="!timeStamp!"
)
popd
if exist %tempFile% del /f /q %tempFile% > nul
if exist %tempFile%~ del /f /q %tempFile%~ > nul
endlocal
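For anyone doing the same on Unix or OS X, here is a rough shell counterpart of the batch file's parsing and commit steps. It assumes, as in the example folder name above, that snapshot folder names end in -YYYYMMDD-HHMM and that the version appears as AssemblyVersion("...") in the C# source; the function names are hypothetical. Unlike the batch file, it also sets GIT_COMMITTER_DATE, because git commit --date only overrides the author date, so without it the committer date is the time of the import.

```shell
# Rough counterpart of the batch file's parsing steps (hypothetical names).
# Folder names are assumed to end in -YYYYMMDD-HHMM; prints nothing otherwise.
snapshot_timestamp() {
  # Strip any leading path, then rebuild the date as "YYYY-MM-DD HH:MM".
  # e.g. eWydruki-20120405-1234 -> 2012-04-05 12:34
  printf '%s\n' "${1##*/}" |
    sed -n 's/.*-\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)-\([0-9]\{2\}\)\([0-9]\{2\}\)$/\1-\2-\3 \4:\5/p'
}

# Print the first AssemblyVersion("...") value found in a C# file,
# without the surrounding double quotes.
assembly_version() {
  sed -n 's/.*AssemblyVersion("\([^"]*\)").*/\1/p' "$1" | head -n 1
}

# Commit the current working tree with both dates set to the snapshot time.
# --date alone sets only the author date; GIT_COMMITTER_DATE sets the other.
commit_snapshot() {
  local timestamp="$1" version="$2"
  git add -A . &&
    GIT_COMMITTER_DATE="$timestamp" git commit -q \
      --date="$timestamp" \
      -m "Application name, version $version, snapshot made on $timestamp"
}
```

Setting the committer date as well makes views that sort by commit date (such as plain git log) list the snapshots in the order they were taken rather than the order they were imported.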
Michał Jazłowiecki March 17, 2013

Please note that while it might be very tempting to modify the above script to do the following:

  1. Create a local Git repository
  2. Bind the local Git repository to the remote one
  3. Pull the remote changes into the local Git repository
  4. Import all the source code snapshots into the local Git repository (with the --date parameter to commit)
  5. Push all contents of the local Git repository to the remote one

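do not do this: the commits made during step 4 would be placed on top of the last commit from the remote Git repository (pulled in step 3), regardless of their --date values. Each of them would replace the current sources with an older snapshot, which would certainly mess up the current development version.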

At the moment, I have two separate repositories (current development and historical), which I intend to merge (but I still do not know how).
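One way to join the two repositories (not something discussed in this thread, and only a sketch) is to fetch the historical repository into the development one and graft the first development commit onto the last historical snapshot with git replace --graft. It assumes both histories are linear, the development history has a single root commit, and the historical branch is called master; join_histories and the remote name historical are hypothetical.

```shell
# Minimal sketch (hypothetical helper): run inside the development repository.
# Assumes linear histories, a single root commit in the development history,
# and a branch named master (by default) in the historical repository.
join_histories() (
  set -eu
  local historical_repo="$1" branch="${2:-master}"
  local dev_root hist_tip

  # Make the historical commits available locally as historical/<branch>.
  git remote add historical "$historical_repo"
  git fetch -q historical

  # First commit of the development history (the initial upload).
  dev_root=$(git rev-list --max-parents=0 HEAD)
  # Last historical snapshot.
  hist_tip=$(git rev-parse "historical/$branch")

  # Locally replace the root with a copy whose parent is the last snapshot.
  git replace --graft "$dev_root" "$hist_tip"
)
```

Replacement refs are local and are not pushed by default, so BitBucket would still show two separate histories. Making the join permanent means rewriting the development commits (git filter-branch honors replace refs when it rewrites), which changes every commit ID and requires a force push, so coordinate with anyone else using the repository first.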
