Writing a hg to subversion gate

Using a decentralized version control system (DVCS) like Mercurial (hg) or Git as a client for Subversion is very common. A DVCS gives a developer offline commits and local branching while still allowing pushes to a Subversion server. This approach is often used in environments where Subversion is the mandated version control system. While the bi-directional push and pull mechanism provided by git-svn or hgsubversion works perfectly for a single developer, it runs into limitations when a team exchanges changes with the usual DVCS push and pull concepts.

This article outlines the current limitations of bi-directional DVCS to Subversion bridges and shows a simple approach to solving one particular instance of the problem.

The bi-directional bridge
The idea of finding a way to interchange commits back and forth between a DVCS and Subversion was born early in the development of tools like Git and Mercurial. The basic idea is easy to grasp. To initialize a repository, the DVCS requests all changes between revision 0 and the current HEAD from the Subversion server and imports every change as a regular changeset into the local repository of the DVCS. In addition, it maintains a mapping between the local, DVCS-specific changeset IDs and the revision numbers in the Subversion repository. If a developer wants to push newly created changesets from their local repository back to Subversion, the bridge determines the latest pushed changeset and then iterates over the unpushed changes, committing them one by one to the Subversion server.

Depending on the DVCS you use, it will then delete the local changesets and reimport the committed changesets from the Subversion server to ensure that each changeset contains the right date and committer information as well as the right mapping to the Subversion revision number.

To make this concrete, assume that we are using git-svn. We import a Subversion repository with just one commit using the following command (I won’t go into detail on how to use it):

$ git svn clone svn://example.com/repo

This command results in a repository which in our example contains the following commit.

commit 7a730e9187becbe1979059cd9752fdea38e3cd9e
Author: david <david@cffdd316-8dd2-4046-8f43-d0df91842a18>
Date: Thu Jul 12 19:39:51 2007 +0000

Crescas in mille millia

Let’s assume that we create a new commit on top:

commit 129e0e4239ac4d375f2a2132dee042a27f2fd70c
Author: David S. P. <dsp at php.ent>
Date: Fri Jul 13 12:23:42 2007 +0000

First Draft

If we push it using git svn dcommit, it’ll be committed into subversion and
reimported as:

commit 8200f32f61432004b488d063564ac9dae7bf6827
Author: david <david@cffdd316-8dd2-4046-8f43-d0df91842a18>
Date: Fri Jul 13 12:23:42 2007 +0000

First Draft
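
git-svn keeps the mapping between a reimported commit and its Subversion revision in a git-svn-id line that it appends to the commit message, in the form "git-svn-id: <URL>@<revision> <repository UUID>". The sketch below pulls the revision number out of such a line. The function name svn_rev_from_trailer and the revision number 2 are assumptions made for illustration; they are not taken from the repository above.

```shell
#!/bin/sh
# Extract the Subversion revision number from a git-svn-id line.
# Expected input: git-svn-id: <URL>@<REV> <UUID>
# Prints nothing if the line does not have that shape.
svn_rev_from_trailer() {
    printf '%s\n' "$1" |
        sed -n 's/^git-svn-id: .*@\([0-9][0-9]*\) [0-9a-fA-F-]*$/\1/p'
}

# Hypothetical trailer; the revision number 2 is made up for this example.
svn_rev_from_trailer "git-svn-id: svn://example.com/repo@2 cffdd316-8dd2-4046-8f43-d0df91842a18"
```

The function expects the bare trailer line, so the indentation that git log puts in front of message lines has to be stripped first.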

Tools like git-svn or hgsubversion work perfectly fine as long as you use them just as a Subversion client. There is, however, a serious limitation in what you can push to and pull from a Subversion server. In particular, problems arise if you also use the usual DVCS push and pull method to exchange changesets. Why? If you push and pull from other DVCS repositories, you might have to create a merge. Modern DVCSs like Git, Mercurial or Bazaar represent history as a directed acyclic graph (DAG). A merge is therefore a commit which has two (or more) parents, which means that a merge connects two parallel strands of history. This is a very comfortable and powerful way to describe history in parallel development. Sadly, Subversion doesn’t handle history, and hence merges, the same way (at least not prior to svn 1.5). As a result, it is not possible to represent a merge from a DVCS in Subversion.

Different bi-directional bridges take different approaches to this problem. Git-svn will commit the merge but not the commits that are part of the second strand of history, while hgsubversion will abort if it has to push a merge. The fact that hgsubversion aborts in case of a merge is our actual problem. We want to use Mercurial and therefore need to find a way to push a merge to Subversion.

A use case
You shouldn’t care much about these limitations. Usually, people are using the bi-directional bridge locally to be able to do offline commits or bisect a bug. In that case, everything will work fine.

But why do I write a complete blog post about a hg to svn bridge if the problem is already solved? The answer is pretty simple.

  1. Imagine you are working in a small team. Everybody in the team knows Mercurial and everybody likes to use it. Moreover, you work offline from time to time, and the members of the team sometimes have to exchange unfinished features with each other. In that case you will probably use Mercurial as your version control system. The problem here is the customer, who dictates the VCS, and it has to be Subversion. Subversion in itself is not that bad, but in your particular case it’s a huge drawback. You have to find a way to mirror your Mercurial repository to the Subversion server.
  2. A second use case is that you are working on an open source project which uses GitHub or Bitbucket to host its repositories. You also use this open source framework at work, where you have to use Subversion, and you want to use svn:externals to integrate the framework into the existing Subversion repository.

As you see, there is a use case for a DVCS to Subversion mirror. As no such tool exists at the moment, we’ll try to implement a (frankly, very stupid) mirroring mechanism.

The idea and its limitations
We see that we need to mirror an existing Mercurial repository to Subversion.
We also know that we cannot use the existing tools. Our requirements are the following.

  1. Mirror a Mercurial repository into a Subversion repository
  2. Track latest synchronized changeset
  3. Handle merges

Note that we would have been able to do this with git-svn, while it’s not possible to do it with hgsubversion.

A few assumptions about the environment where we want to use our mirroring mechanism help us simplify the requirements.

  1. We do not need to preserve the author information
  2. We have a central Mercurial repository called Gate
  3. No commits are made to the repository other than by our mirroring (and believe me, things get ugly for you if you try to…)
  4. We do not need to preserve all commits

So what is the end result? We just need to find a way to push all changesets and just ignore all merged changes, but commit the merge itself. This should be sufficient.

In our particular environment everyone has his own repository, but one person integrates all changes into one repository called the Gate. Commits reaching the gate are committed to Subversion.
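
To see what "ignore all merged changes, but commit the merge itself" means in practice, here is a minimal sketch that needs neither Mercurial nor Subversion. It walks a toy history along the first parent only. The changeset ids, the HISTORY variable and the function name first_parent_chain are all made up for this illustration.

```shell
#!/bin/sh
# Toy history, one changeset per line: "<id> <first parent> [<second parent>]".
# "-" marks the root. Main line: a, b, c, m. Side branch: d, e (forked from b).
# m is the merge of e into c.
HISTORY='a -
b a
c b
d b
e d
m c e'

# Print the first-parent chain from the given changeset back to the root,
# oldest first. Changesets reachable only through a second parent are skipped.
first_parent_chain() {
    node=$1
    chain=
    while [ -n "$node" ] && [ "$node" != "-" ]; do
        chain="$node $chain"
        # Look up the first parent (second column) of the current node.
        node=$(printf '%s\n' "$HISTORY" | awk -v n="$node" '$1 == n { print $2 }')
    done
    echo $chain
}

first_parent_chain m
```

Starting at the merge m, this prints "a b c m": the side branch d and e never shows up on its own, but whatever it changed is still contained in the state of m.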

The implementation
To make a long story short (it’s getting late): we use Mercurial’s log command and one of its options to obtain a linear history that can be pushed. To get the history, we use

hg log --follow-first

This returns the history, omitting all merged changesets but including the merges themselves. We silently drop the changesets that were merged, but retain the result. As we only need the changeset hash (SHA-1), we use the --template option to get the node. We then iterate over the history, updating our working copy to each changeset, adding all newly created files, deleting all removed files, and then committing the current state of the working directory. So here is our final script:


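#!/bin/sh
# LAST_COMMIT holds the hash of the last changeset committed to Subversion.
if test -f "LAST_COMMIT"; then
	lc=`cat LAST_COMMIT`
	cont=1	# $lc is already in Subversion, so skip it in the loop below
else
	# Assumption: on the first run, start at revision 0 and skip nothing.
	lc=0
	cont=0
fi

for hash in `hg log --follow-first --template "{node}\n" -r $lc:tip`; do
	if test $cont -eq 1; then
		cont=0
		continue
	fi

	echo "update to $hash"
	if ! hg log --template "{desc}\n" -r $hash > COMMIT_MSG; then
		echo "Cannot get log" >&2
		exit 127
	fi

	hg up -C -r $hash
	# Note: file names containing whitespace are not handled here.
	for file in `hg log --template "{file_adds}\n" -r $hash`; do
		echo "add $file"
		svn add --parents $file
	done
	for file in `hg log --template "{file_dels}\n" -r $hash`; do
		echo "del $file"
		svn rm $file
	done
	svn commit -F COMMIT_MSG
	echo $hash > LAST_COMMIT
done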

It looks scary, and yes it is scary. But for the moment it works. Our simple hg to subversion bridge is finished.
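
One detail worth trying out in isolation is the LAST_COMMIT bookkeeping. The range $lc:tip starts at the changeset stored in LAST_COMMIT, which has already been committed to Subversion and must not be committed again. The following sketch shows one way to filter a list of ids down to the pending ones. The function name pending_after and the ids r1 to r4 are hypothetical, and this is an illustration rather than part of the script above.

```shell
#!/bin/sh
# Read changeset ids from stdin, oldest first, and print only those that
# come after the last synchronized id given as $1.
# With an empty marker (first run) every id is pending.
pending_after() {
    awk -v last="$1" '
        last == "" || seen { print; next }
        $0 == last         { seen = 1 }
    '
}

# Hypothetical ids: r2 was the last one committed to Subversion.
printf '%s\n' r1 r2 r3 r4 | pending_after r2
```

This prints r3 and r4. If the marker does not appear in the list at all, nothing is printed, which is the cautious choice for a mirror that must never commit a changeset twice.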

Posted December 1st, 2009 in Open Source, Version Control.