20181217

Splitting up a git repo -- Single directory

FreeBSD has a large, sprawling svn repo that was once a CVS repo. At times, things in that repo outlive their usefulness, and sometimes those items are best moved to a FreeBSD port. One easy way to manage a port is to toss it into a github repo and have the port point there. This article discusses how to do that. While getting a clean history should be 'easy', there are a few gotchas.

Clone the FreeBSD repo

If you are contemplating moving something out of the FreeBSD repo, one way to do that is to take the FreeBSD github mirror and trim it. When doing this, I always use a new repo, but in theory you could do it in an existing repo. Given how much is tossed away, it's best to use a fresh copy to avoid disaster. Disk space is cheap, right? Here's what I used to kick off pulling timed into a separate repo.

git clone https://github.com/freebsd/freebsd timed

First pass at trimming

When one googles the topic, git filter-branch comes up. The canonical answer is a good starting point:
git filter-branch --prune-empty --subdirectory-filter usr.sbin/timed master
will do the first pass. This leaves the contents of usr.sbin/timed as the top level of the repo. For the moment, we'll leave aside the stray timed files elsewhere in the tree. That gets 'complicated', and I'll explore it in the second part of this series. It would be good to drop a 'go back' marker here; a branch works fine:
git checkout -b timed-trimmed
Now, this gives a decently trimmed tree. However, there are some problems. --prune-empty is a lie, or to be more charitable, it is incompletely implemented. It doesn't prune everything it should, especially merge commits: those are retained, but should be omitted. The git-filter-branch documentation admits as much: --prune-empty only removes commits with exactly one or zero non-pruned parents, so merge commits remain intact. So the next step is to use the very flexible history-rewriting "feature" of git to remove them.
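To see what --prune-empty left behind, git log --merges lists the merge commits still reachable from a branch. A minimal sketch, assuming it runs inside the trimmed clone; the helper name list_merges and the file name merges.txt are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the full hash of every merge commit
# reachable from a ref (HEAD if none is given), one per line.
list_merges() {
    git log --merges --format=%H "${1:-HEAD}"
}

# Example, inside the trimmed clone:
#   list_merges master > merges.txt
#   wc -l < merges.txt    # how many merges survived filter-branch
```

Keeping the full hashes in a file makes it easy to check entries in the rebase todo list against them later.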

Next, use git rebase to rebase things. There may be smoother ways to do this, but I find the first commit in the tree with git log | tail and then do my rebase like so:
git rebase -i HASH_OF_FIRST_COMMIT  master
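Rather than eyeballing git log | tail, git rev-list --max-parents=0 prints the commits that have no parents, which is where the history starts. A sketch; first_commit is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the root commit (a commit with no parents)
# reachable from a ref (HEAD if none is given). If the history has more
# than one root, this keeps the last one rev-list prints.
first_commit() {
    git rev-list --max-parents=0 "${1:-HEAD}" | tail -n 1
}

# Example:
#   git rebase -i "$(first_commit master)" master
```

Note that rebasing onto the first commit leaves that commit itself out of the todo list; git rebase -i --root master would include it too.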
Now the fun part starts. For each commit you suspect is a merge commit, you have to check whether its hash appears in the output of git log --merges. Remove all of those commits from the todo list. However, this can be hard. If you have a *LOT* to sort through, it's easier to make one pass for the obvious cvs2svn and MFC commits; if you miss one, it's not the end of the world. It's also a good idea to save an unmodified copy of the todo list. It will come in handy later if your efforts leave only a couple of missing commits.
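The hash matching can be scripted. A rough sketch, assuming the hash list holds full hashes (for example from git log --merges --format=%H master) while the todo list uses abbreviated ones, as git rebase -i normally does. mark_drops is a hypothetical helper that turns matching pick lines into drop lines:

```shell
#!/bin/sh
# Hypothetical helper: mark_drops HASH_LIST_FILE TODO_FILE > NEW_TODO
# Any "pick" (or "p") line whose abbreviated hash is a prefix of a hash
# in HASH_LIST_FILE is rewritten as "drop". TODO_FILE is not modified.
mark_drops() {
    awk '
        # First file: collect the full hashes to drop.
        FILENAME == ARGV[1] { if ($1 != "") drop[++n] = $1; next }
        # Second file: the todo list. Assigning $1 rebuilds the line
        # with single spaces, which git does not care about.
        ($1 == "pick" || $1 == "p") && $2 != "" {
            for (i = 1; i <= n; i++) {
                if (index(drop[i], $2) == 1) {
                    $1 = "drop"
                    break
                }
            }
        }
        # Print every todo line, changed or not (comments included).
        { print }
    ' "$1" "$2"
}
```

Because the result goes to standard output, the original todo stays untouched and can serve as the unmodified copy. One way to use it: when git rebase -i opens the todo in your editor, save it as todo.orig, run mark_drops merges.txt todo.orig > todo.new, and load todo.new into the buffer before saving. It only catches hashes in the list; the cvs2svn and MFC commits still need a human eye.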

You can 'fix' the todo as you go, though this is tricky. Basically, when you hit an error, it's because the prior commit deleted everything as part of its 'merge'. So, to back up one commit, you need this bit of magic, which I'll show and then explain:
git rev-parse HEAD
Note the hash this prints.
git rebase --edit-todo
Add 'pick HASH' (with the hash printed above) as the first line of the todo list. This makes sure we keep the commit we're about to toss.
git reset --hard HEAD^
This resets the current mess (which is still in the todo list) back to one before HEAD. At this point, you're back one commit in the rebase and have effectively skipped the troublesome commit.
git rebase --continue
This reapplies what was HEAD and then proceeds. See Git Rebase Stepping Forward and Back for more info. Things are a bit tricky: you want to make sure you are dumping the bad commit while keeping the earlier ones, and the behavior is somewhat variable, especially when multiple branch merges happen.

In the timed example, I needed to do this about a dozen times. I suspect that will be fairly typical, as I had similar issues when I created the ctm repo.

Test: did I do it right?

Next, you need to make sure that you did things correctly. There's always a chance you'll drop commits that shouldn't be dropped. I ran a diff against my base FreeBSD tree and noticed that at least one commit was lost, so I went looking. It turned out I had missed 3 or 4 commits, so I had to go back and try again, using git reflog to find the result of the filter-branch and starting over.

After I redid things (not reproduced here; the second time was much easier), I ran the diff again and found one commit missing. I found it in my saved copy of the original todo file, so I was able to do the rebase again, add it to the end (to make sure it applies), and then move it to the right place chronologically.
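A content diff tells you something is missing, but not which commit. One rough way to get a candidate list is to compare author timestamps and subjects, which both filter-branch and rebase preserve, between the original history of the directory and the rewritten branch. This sketch assumes filter-branch left the old master under refs/original/refs/heads/master (its default behavior); missing_commits is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: missing_commits ORIG_REF PATH NEW_REF
# Lists non-merge commits that touched PATH in ORIG_REF but have no
# commit with the same author timestamp and subject in NEW_REF.
missing_commits() {
    _mc_orig=$(mktemp) && _mc_new=$(mktemp) || return 1
    # Key each commit by author timestamp and subject; the hashes change
    # during the rewrite, but these two fields are carried over.
    git log --no-merges --format='%at %s' "$1" -- "$2" | sort > "$_mc_orig"
    git log --no-merges --format='%at %s' "$3" | sort > "$_mc_new"
    # Print keys present only in the original history.
    comm -23 "$_mc_orig" "$_mc_new"
    rm -f "$_mc_orig" "$_mc_new"
}

# Example:
#   missing_commits refs/original/refs/heads/master usr.sbin/timed master
```

Anything it prints is a candidate, not a verdict: commits you dropped on purpose show up too, and two commits with the same timestamp and subject can mask each other.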

Push the result for testing

I created a new upstream repo in my personal space (not FreeBSD's, which I don't have permission to push to), then pushed the 'master' branch from this repo to it. I know I'm missing the docs (which, ironically, were copied away early on) and the rc files. Those will be covered in the second part of this series.
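The push itself is just a new remote plus a push. A sketch; the helper name, the remote name, and the URL in the example are placeholders:

```shell
#!/bin/sh
# Hypothetical helper: push_to_new_upstream REMOTE_NAME URL [BRANCH]
# Adds REMOTE_NAME pointing at URL, then pushes BRANCH (default: master)
# and records it as the branch's upstream (-u).
push_to_new_upstream() {
    git remote add "$1" "$2" &&
    git push -q -u "$1" "${3:-master}"
}

# Example (placeholder remote name and URL):
#   push_to_new_upstream mine git@github.com:YOURNAME/timed.git master
```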
