Much ado about scripting, Linux & Eclipse: card subject to change


HOWTO: partially clone an SVN repo to Git, and work with branches


I've blogged a few times now about Git (which I pronounce with a hard 'g' a la "get", as it's supposed to be named for Linus Torvalds, a self-described git, but which I've also heard pronounced with a soft 'g' like "jet"). Either way, I'm finding it way more efficient and less painful than either CVS or SVN.

So, to continue this series ([1], [2], [3]), here is how (and why) to pull an SVN repo down as a Git repo, but with the omission of old (irrelevant) revisions and branches.

Using SVN for SVN repos

In days of yore when working with the JBoss Tools and JBoss Developer Studio SVN repos, I would keep a copy of everything in trunk on disk, plus the current active branch (most recent milestone or stable branch maintenance). With all the SVN metadata, this would eat up substantial amounts of disk space but still require network access to pull any old history of files. The two repos were about 2G of space on disk, for each branch. Sure, there's tooling to be able to diff and merge between branches w/o having both branches physically checked out, but nothing beats the ability to place two folders side by side OFFLINE for deep comparisons. So, at times, I would burn as much as 6-8G of disk simply to have a few branches of source for comparison and merging. With my painfully slow IDE drive, this would grind my machine to a halt, especially when doing any SVN operation or counting files / disk usage.

Using Git for SVN repos naively

Recently, I started using git-svn to pull the whole JBDS repo into a local Git repo, but it was slow to create and still unwieldy. And the JBoss Tools repo was too large to even create as a Git repo - the operation would run out of memory while processing old revisions of code to play forward.

At this point, I was stuck having individual Git repos for each JBoss Tools component (major source folder) in SVN: archives, as, birt, bpel, build, etc. It worked, but replicating it when I needed to create a matching repo-collection for a branch was painful and time-consuming. As well, all the old revision information was eating even more disk than before:

  • jbosstools' trunk as multiple git-svn clones: 6.1G
  • devstudio's trunk as single git-svn clone: 1.3G

So, now, instead of a couple of GB per branch, I was at nearly 4x as much disk usage. But at least I could work offline and not deal w/ network-intense activity just to check history or commit a change. Still, far from ideal.

Cloning SVN with standard layout & partial history

This past week, I discovered two ways to make the git-svn experience at least an order of magnitude better:

  1. Standard layout (-s) - this allows your generated Git repo to contain the usual trunk, branches/* and tags/* layout that's present in the source SVN repo. This is a win because it means your repo will contain the branch information so you can easily switch between branches within the same repo on disk. No more remote network access needed!
  2. Revision filter (-r) - this allows your generated Git repo to start from a known revision number instead of starting at its birth. Now instead of taking hours to generate, you can get a repo in minutes by excluding irrelevant (ancient) revisions.
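To make the two flags concrete, here is a minimal sketch that only assembles and prints a clone command rather than running it. The URL, start revision variable, and target directory name are placeholders I made up for illustration; swap in your own values.

```shell
# Hypothetical values: substitute your own repository URL and start revision.
svn_url="https://svn.example.org/repos/project"   # placeholder URL, not a real repo
start_rev=28571                                   # first revision you care about
target_dir="project_GIT"                          # placeholder directory name

# -s (--stdlayout) is shorthand for --trunk=trunk --branches=branches --tags=tags
# -r N:HEAD limits the import to revisions N through the latest one
clone_cmd="git svn clone -s -r${start_rev}:HEAD ${svn_url} ${target_dir}"

# Print the command instead of running it, so it can be reviewed first
echo "$clone_cmd"
```

If your SVN repo doesn't follow the usual trunk/branches/tags layout, the long-form options let you point at each location individually instead of using -s.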

So, why is this cool? Because now, instead of having 2G of source+metadata to copy when I want to do a local comparison between branches, the size on disk is merely:

  • jbosstools' trunk as single git-svn clone w/ trunk and single branch: 1.3G
  • devstudio's trunk as single git-svn clone w/ trunk and single branch: 0.13G
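As a quick sanity check on those numbers, this sketch computes how much smaller each clone became, using only the sizes quoted in this post. The variable and function names are mine, purely for illustration.

```shell
# Sizes in GB, as reported in this post
before_tools=6.1   # jbosstools trunk, multiple full-history git-svn clones
before_jbds=1.3    # devstudio trunk, single full-history git-svn clone
after_tools=1.3    # jbosstools, trunk + one branch, partial history
after_jbds=0.13    # devstudio, trunk + one branch, partial history

# Shell arithmetic is integer-only, so hand the division to awk
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f", a / b }'; }

tools_ratio=$(ratio "$before_tools" "$after_tools")
jbds_ratio=$(ratio "$before_jbds" "$after_jbds")

echo "jbosstools: ${tools_ratio}x smaller"
echo "devstudio:  ${jbds_ratio}x smaller"
```

That works out to roughly 4.7x for jbosstools and 10x for devstudio, and the new clones carry an extra branch on top of trunk.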

So, not only is the footprint smaller, but the performance is better and I need never do a full clone (or svn checkout) again - instead, I can just copy the existing Git repo, and rebase it to a different branch. Instead of hours, this operation takes seconds (or minutes) and happens without the need for a network connection.

Okay, enough blather. Show me the code!

Check out the repo, including only the trunk & most recent branch

# Figure out the revision at which the branch was created, then build a
# revision range from it: e.g., r28571 becomes -r28571:HEAD
rev=$(svn log --stop-on-copy \ \
  | grep -E "r[0-9]+" | tail -n 1 | sed -e "s#\(r[0-9]\+\).\+#-\1:HEAD#")

# now, fetch repo starting from the branch's initial commit
git svn clone -s $rev jbosstools_GIT
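The pipeline above needs a live SVN server, so here is a minimal sketch of just the text-munging step, run against made-up svn log output. The function name rev_range_from_log and the sample log entries are invented for illustration. I've also anchored the pattern to the start of the line and used sed -E, which is a slight variation on the original filter rather than a copy of it.

```shell
# Hypothetical helper: read `svn log --stop-on-copy` output on stdin and
# print a revision range starting at the oldest revision listed.
rev_range_from_log() {
  # Entry headers look like: "r28571 | user | date | N lines".
  # svn log lists newest first, so the last header is the oldest revision.
  grep -E '^r[0-9]+ \|' | tail -n 1 | sed -E 's/^(r[0-9]+) .*/-\1:HEAD/'
}

# Made-up sample input, shaped like real svn log output
sample_log='------------------------------------------------------------------------
r28600 | alice | 2011-01-20 10:00:00 -0500 (Thu, 20 Jan 2011) | 1 line

Fix build
------------------------------------------------------------------------
r28571 | bob | 2011-01-18 09:00:00 -0500 (Tue, 18 Jan 2011) | 1 line

Create branch
------------------------------------------------------------------------'

rev=$(printf '%s\n' "$sample_log" | rev_range_from_log)
echo "$rev"
```

With --stop-on-copy, the oldest entry in the log is the copy that created the branch, which is why taking the last header line gives the branch's starting revision.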

Now you have a repo which contains trunk & a single branch

git branch -a # list local (Git) and remote (SVN) branches

  * master

Switch to the branch

git checkout -b local/jbosstools-3.2.x jbosstools-3.2.x # connect a new local branch to remote one

  Checking out files: 100% (609/609), done.
  Switched to a new branch 'local/jbosstools-3.2.x'

git svn info # verify now working in branch

  Repository Root:

Switch back to trunk

git checkout -b local/trunk trunk # connect a new local branch to remote trunk

  Switched to a new branch 'local/trunk'

git svn info # verify now working in branch

  Repository Root:

Rewind your changes, pull updates from SVN repo, apply your changes; won't work if you have local uncommitted changes

git svn rebase
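One way around the uncommitted-changes caveat is to stash first and pop afterwards. The sketch below doesn't touch a repo at all: it reads `git status --porcelain` output on stdin and prints the commands I'd run. The helper name plan_svn_rebase and the sample file names are hypothetical, and it assumes that untracked files alone don't block the rebase.

```shell
# Hypothetical helper: given `git status --porcelain` output on stdin, print
# the commands to run so that `git svn rebase` sees a clean working tree.
plan_svn_rebase() {
  # Lines starting with "??" are untracked files; any other line is a
  # staged or unstaged change to a tracked file.
  if grep -qv '^??'; then
    printf '%s\n' 'git stash' 'git svn rebase' 'git stash pop'
  else
    printf '%s\n' 'git svn rebase'
  fi
}

# Example: one modified tracked file plus one untracked file
dirty_plan=$(printf ' M pom.xml\n?? notes.txt\n' | plan_svn_rebase)

# Example: clean tree, where `git status --porcelain` prints nothing
clean_plan=$(printf '' | plan_svn_rebase)

echo "$dirty_plan"
echo "$clean_plan"
```

In a real repo you would pipe `git status --porcelain` into the helper, review the plan, and then run the printed commands by hand.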

Fetch updates from SVN repo (updates only the remote SVN branches; your local branches and working files are left untouched)

git svn fetch

Create a new branch (remotely with SVN)

svn copy \ \


Fred said...

Took me more than 20hrs to get the git svn clone complete (and that was after several unsuccessful attempts). Probably because there are many more branches/tags than when you wrote the article.

$ git branch -a
* local/trunk

$ du -hs
2,1G .