Much ado about scripting, Linux & Eclipse: card subject to change

2011-01-29

Simplifying The p2 Process, Part 1: p2 Composite Repos

With the release of JBoss Tools 3.2 and JBoss Developer Studio 4.0 just around the corner, you may be thinking to yourself, "Self, how many update sites and SDK zips and runtimes will I need to download THIS time?"

Or maybe you're thinking, "Self, why is this so damn complicated?"

Well, folks, we heard your kvetching and we did something about it.

Composite Repos

While this is not a new concept to many, we embraced the composite update site this past year and it's made life a lot easier for iterative, agile development cycles. Last year, JBoss Tools 3.1 was built as a single Hudson job, with a second one for JBoss Developer Studio. This meant that any change in any of the components would cause a build to be launched, and 4-6hrs later, we'd have fresh bits. Yeah, far from ideal.

This year, we split up the monolith (and added a few new components!) so that we now have 34 update sites, composed into a single site that builds can resolve against. This composite update site looks like this:

compositeArtifacts.xml

<?xml version='1.0' encoding='UTF-8'?>
<?compositeArtifactRepository version='1.0.0'?>
<repository name='JBoss Tools Staging Repository' 
  type='org.eclipse.equinox.internal.p2.artifact.repository.CompositeArtifactRepository' 
  version='1.0.0'>
<properties size='2'>
<property name='p2.compressed' value='true'/>
<!-- get new time w/ `date +%s000` -->
<property name='p2.timestamp' value='1294205433000'/>
</properties>
<children size='34'>
<child location='http://download.jboss.org/jbosstools/builds/staging/jbosstools-3.2_trunk.component--archives/all/repo/'/>
...
<child location='http://download.jboss.org/jbosstools/builds/staging/jbosstools-3.2_trunk.component--ws/all/repo/'/>
<child location='http://download.jboss.org/jbosstools/builds/staging/jbosstools-pi4soa-3.1_trunk/all/repo/'/>
<child location='http://download.jboss.org/jbosstools/builds/staging/jbosstools-teiid-designer-7.1_trunk/all/repo/'/>
<child location='http://download.jboss.org/jbosstools/builds/staging/jbosstools-drools-5.2_trunk/all/repo/'/>
<child location='http://download.jboss.org/jbosstools/builds/staging/jbosstools-savara-1.1_trunk/tools/'/>
<child location='http://download.jboss.org/jbosstools/builds/staging/xulrunner-1.9.1.2/all/repo/'/>
</children>
</repository>

compositeContent.xml

<?xml version='1.0' encoding='UTF-8'?>
<?compositeMetadataRepository version='1.0.0'?>
<repository name='JBoss Tools Staging Repository' 
  type='org.eclipse.equinox.internal.p2.metadata.repository.CompositeMetadataRepository' 
  version='1.0.0'>
<properties size='2'>
<property name='p2.compressed' value='true'/>
<!-- get new time w/ `date +%s000` -->
<property name='p2.timestamp' value='1294205433000'/>
</properties>
<children size='34'>
<child location='http://download.jboss.org/jbosstools/builds/staging/jbosstools-3.2_trunk.component--archives/all/repo/'/>
...
<child location='http://download.jboss.org/jbosstools/builds/staging/jbosstools-3.2_trunk.component--ws/all/repo/'/>
<child location='http://download.jboss.org/jbosstools/builds/staging/jbosstools-pi4soa-3.1_trunk/all/repo/'/>
<child location='http://download.jboss.org/jbosstools/builds/staging/jbosstools-teiid-designer-7.1_trunk/all/repo/'/>
<child location='http://download.jboss.org/jbosstools/builds/staging/jbosstools-drools-5.2_trunk/all/repo/'/>
<child location='http://download.jboss.org/jbosstools/builds/staging/jbosstools-savara-1.1_trunk/tools/'/>
<child location='http://download.jboss.org/jbosstools/builds/staging/xulrunner-1.9.1.2/all/repo/'/>
</children>
</repository>
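Since the two files differ only in the repository class name, they're easy to generate. Here's a minimal sketch of how that could be scripted; the make_composite function and its arguments are hypothetical (this is not the script we actually use), and the XML shape and the `date +%s000` timestamp trick are simply copied from the files above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a composite p2 index for a list of child repo URLs.
# usage: make_composite <Metadata|Artifact> <repo name> <child-url>...
#   Metadata -> contents of compositeContent.xml
#   Artifact -> contents of compositeArtifacts.xml
make_composite() {
  local kind=$1 name=$2 pkg url
  shift 2
  case "$kind" in
    Metadata) pkg=metadata ;;
    Artifact) pkg=artifact ;;
    *) echo "unknown kind: $kind" >&2; return 1 ;;
  esac
  printf "<?xml version='1.0' encoding='UTF-8'?>\n"
  printf "<?composite%sRepository version='1.0.0'?>\n" "$kind"
  printf "<repository name='%s'\n" "$name"
  printf "  type='org.eclipse.equinox.internal.p2.%s.repository.Composite%sRepository'\n" "$pkg" "$kind"
  printf "  version='1.0.0'>\n"
  printf "<properties size='2'>\n"
  printf "<property name='p2.compressed' value='true'/>\n"
  # same value as `date +%s000`: epoch seconds padded out to milliseconds
  printf "<property name='p2.timestamp' value='%s000'/>\n" "$(date +%s)"
  printf "</properties>\n"
  # the children count is just the number of URLs left on the command line
  printf "<children size='%d'>\n" "$#"
  for url in "$@"; do
    printf "<child location='%s'/>\n" "$url"
  done
  printf "</children>\n</repository>\n"
}

# Example (the URLs are placeholders):
#   make_composite Metadata 'JBoss Tools Staging Repository' \
#     http://example.invalid/a/all/repo/ http://example.invalid/b/all/repo/ > compositeContent.xml
```

Run it once with Metadata and once with Artifact, using the same list of URLs, so the two files stay in sync.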

So, now that JBoss Tools is built in 34 pieces, the bits that haven't changed aren't rebuilt over and over and builds are faster. If that sounds insanely obvious to you, well, we used to have a lot of inter-component cyclic dependencies. We eliminated those early in the development cycle for JBoss Tools 3.2, and have been able to build smarter and faster ever since.

Added benefits to this composite site are:

  • Newly built and published bits are instantly available from the composite site - sure, the same was true under last year's PDE "uberbuild" regime, but that's because everything was built fresh every time, which was slow and near-impossible to get people to run at home.

  • Developers can use this site to install latest updates to components they're interested in testing - again, this was true before; but now using the same site and searching for updates, developers and beta testers can get incremental updates to the components that have actually changed, rather than having to pull down 160M every day to get a few K of changes.

  • Tycho can be pointed at this site (see below) in order to resolve binary p2 dependencies, so building a component deep in the dependency chain can be done w/o having to first build its upstream dependencies - this wasn't a concern before because everything was built from source every time, so by definition everything was already on disk. But now, if a developer only cares about a single component, like ModeShape or GWT, they need only have that source (and some bootstrapping code) on disk. Smaller, faster, more agile. And way more likely to be built locally before checking in code than before, making the painful "who broke what and when?" process much less painful. Fewer moving pieces and local dev builds at home mean - in theory - fewer incomplete or breaking commits.

When we first moved to Tycho, we needed to build a series of components locally in order to just get to a deep component. For example, the Struts component needs VPE, which needs JST and XulRunner. JST also needs the Common component, which in turn needs the Tests component.

So, to build Struts locally, 5 other components would have to be built locally first. This worked, but was still a fairly large barrier to entry for most developers (much less contributors!)
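To make that chain concrete, here's a small sketch that feeds the dependencies described above to tsort, which prints one valid build order. The lowercase component names are just labels for this example, not real module or folder names:

```shell
#!/usr/bin/env bash
# Each input line is "dependency dependent"; tsort prints a valid build order,
# one component per line, with every dependency ahead of whatever needs it.
build_order() {
  tsort <<'EOF'
tests common
common jst
jst vpe
xulrunner vpe
vpe struts
EOF
}

build_order
```

Struts always comes out last, with the five upstream components ahead of it. Where XulRunner lands can vary, since nothing else in the chain depends on it except VPE.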

But with this new composite site, building Struts can be done without this lengthy bootstrapping; instead we just point Tycho at this composite site, and it pulls down the 5 upstream components' jars from this p2 repo - because the upstream deps are already built in Hudson.

Here's what we added to our parent pom.xml to have the builds find the binaries:

<repository>
        <id>jbosstools-nightly-staging-composite-trunk</id>
        <url>http://path.to.the.site/staging/_composite_/trunk/</url>
        <layout>p2</layout>
        <snapshots>
                <enabled>true</enabled>
        </snapshots>
        <releases>
                <enabled>true</enabled>
        </releases>
</repository>

So, using this composite update site, we can use Maven 3 with Tycho 0.10 to generate a single update site (staged here, then ultimately published here).


In part 2, I'll look at why we switched from using a collection of SDKs (Eclipse, EMF, DTP, GEF, M2E, RSE, TPTP, UML2, WTP, XSD and more) against which to build - using the now-deprecated brute-force "just unzip into eclipse root folder or dropins" approach - to using a single target platform update site. SPOILER ALERT: Easier to update and maintain.

In part 3, I'll look back at the success we've had using associate sites instead of asking people to manually add 3rd party URLs when installing JBoss Tools. SPOILER ALERT: one URL is easier for people to use than 6.

In part 4, I'll talk a little about how to prevent your product build from getting updates from unofficial sources, and preload your product with the official sites from which to get updates. Because it's important to balance ease of use with prevention of unsupported features. SPOILER ALERT: may contain p2.inf instructions.

By the way, JBoss Tools 3.2.0.CR1 and JBoss Developer Studio 4.0.0.CR1 are available. Get 'em while they're hot (and sourceforge is not).

2011-01-27

HOWTO: partially clone an SVN repo to Git, and work with branches

Skip to the code

I've blogged a few times now about Git (which I pronounce with a hard 'g' a la "get", as it's supposed to be named for Linus Torvalds, a self-described git, but which I've also heard pronounced with a soft 'g' like "jet"). Either way, I'm finding it way more efficient and less painful than either CVS or SVN.

So, to continue this series ([1], [2], [3]), here is how (and why) to pull an SVN repo down as a Git repo, but with the omission of old (irrelevant) revisions and branches.

Using SVN for SVN repos

In days of yore when working with the JBoss Tools and JBoss Developer Studio SVN repos, I would keep a copy of everything in trunk on disk, plus the current active branch (most recent milestone or stable branch maintenance). With all the SVN metadata, this would eat up substantial amounts of disk space but still require network access to pull any old history of files. The two repos were about 2G of space on disk, for each branch. Sure, there's tooling to be able to diff and merge between branches w/o having both branches physically checked out, but nothing beats the ability to place two folders side by side OFFLINE for deep comparisons. So, at times, I would burn as much as 6-8G of disk simply to have a few branches of source for comparison and merging. With my painfully slow IDE drive, this would grind my machine to a halt, especially when doing any SVN operation or counting files / disk usage.

Using Git for SVN repos naively

Recently, I started using git-svn to pull the whole JBDS repo into a local Git repo, but it was slow to create and still unwieldy. And the JBoss Tools repo was too large to even create as a Git repo - the operation would run out of memory while processing old revisions of code to play forward.

At this point, I was stuck having individual Git repos for each JBoss Tools component (major source folder) in SVN: archives, as, birt, bpel, build, etc. It worked, but replicating it when I needed to create a matching repo-collection for a branch was painful and time-consuming. As well, all the old revision information was eating even more disk than before:

  • jbosstools' trunk as multiple git-svn clones: 6.1G
  • devstudio's trunk as single git-svn clone: 1.3G

So, now, instead of a couple of GB per branch, I was at nearly 4x as much disk usage. But at least I could work offline and not deal w/ network-intense activity just to check history or commit a change. Still, far from ideal.

Cloning SVN with standard layout & partial history

This past week, I discovered two ways to make the git-svn experience at least an order of magnitude better:

  1. Standard layout (-s) - this allows your generated Git repo to contain the usual trunk, branches/* and tags/* layout that's present in the source SVN repo. This is a win because it means your repo will contain the branch information so you can easily switch between branches within the same repo on disk. No more remote network access needed!
  2. Revision filter (-r) - this allows your generated Git repo to start from a known revision number instead of starting at its birth. Now instead of taking hours to generate, you can get a repo in minutes by excluding irrelevant (ancient) revisions.

So, why is this cool? Because now, instead of having 2G of source+metadata to copy when I want to do a local comparison between branches, the size on disk is merely:

  • jbosstools' trunk as single git-svn clone w/ trunk and single branch: 1.3G
  • devstudio's trunk as single git-svn clone w/ trunk and single branch: 0.13G

So, not only is the footprint smaller, but the performance is better and I need never do a full clone (or svn checkout) again - instead, I can just copy the existing Git repo, and rebase it to a different branch. Instead of hours, this operation takes seconds (or minutes) and happens without the need for a network connection.


Okay, enough blather. Show me the code!

Check out the repo, including only the trunk & most recent branch

# Find the revision at which the branch was created; with --stop-on-copy,
# that is the last entry in the log. For example, r28571 becomes -r28571:HEAD
rev=$(svn log --stop-on-copy \
  http://svn.jboss.org/repos/jbosstools/branches/jbosstools-3.2.x \
  | grep -E "^r[0-9]+ \|" | tail -1 | sed -e "s#\(r[0-9]\+\).\+#-\1:HEAD#")

# now, fetch repo starting from the branch's initial commit
git svn clone -s $rev http://svn.jboss.org/repos/jbosstools jbosstools_GIT
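If you want to see what that pipeline does without hitting the SVN server, here's a sketch that runs the same idea against canned input. The first_rev_range and sample_log functions are hypothetical names, and the sample log entries are made up in the shape svn log prints; only r28571 comes from the example above:

```shell
#!/usr/bin/env bash
# Turn `svn log --stop-on-copy` output (read from stdin) into a git-svn -r argument.
# svn log lists newest first, so the last header line is the branch's first revision.
first_rev_range() {
  grep -E '^r[0-9]+ \|' | tail -n 1 | sed -E 's/^(r[0-9]+).*/-\1:HEAD/'
}

# Made-up log entries for illustration; the authors and messages are not real.
sample_log() {
  cat <<'EOF'
------------------------------------------------------------------------
r28600 | someone | 2011-01-20 10:00:00 -0500 (Thu, 20 Jan 2011) | 1 line

fix for r12345 regression
------------------------------------------------------------------------
r28571 | someone | 2011-01-19 09:00:00 -0500 (Wed, 19 Jan 2011) | 1 line

branch for 3.2.x
------------------------------------------------------------------------
EOF
}

sample_log | first_rev_range   # prints -r28571:HEAD
```

Anchoring the match to the start of the line keeps a commit message that happens to mention a revision (like the "r12345" above) from being picked up by mistake.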

Now you have a repo which contains trunk & a single branch

git branch -a # list local (Git) and remote (SVN) branches

  * master
    remotes/jbosstools-3.2.x
    remotes/trunk

Switch to the branch

git checkout -b local/jbosstools-3.2.x jbosstools-3.2.x # connect a new local branch to remote one

  Checking out files: 100% (609/609), done.
  Switched to a new branch 'local/jbosstools-3.2.x'

git svn info # verify now working in branch

  URL: http://svn.jboss.org/repos/jbosstools/branches/jbosstools-3.2.x
  Repository Root: http://svn.jboss.org/repos/jbosstools

Switch back to trunk

git checkout -b local/trunk trunk # connect a new local branch to remote trunk

  Switched to a new branch 'local/trunk'

git svn info # verify now working in trunk

  URL: http://svn.jboss.org/repos/jbosstools/trunk
  Repository Root: http://svn.jboss.org/repos/jbosstools

Rewind your changes, pull updates from SVN repo, apply your changes; won't work if you have local uncommitted changes

git svn rebase

Fetch updates from SVN repo without touching your local branches or working tree

git svn fetch

Create a new branch (remotely with SVN)

svn copy \
  http://svn.jboss.org/repos/jbosstools/branches/jbosstools-3.2.x \
  http://svn.jboss.org/repos/jbosstools/branches/some-new-branch