20190211

Strange Code

Now That's Weird

I was trying to compile some ancient code I pulled off the net. It is related to the Venix stuff I've been doing on and off of late.
put = bp->b_nleft;
if (put > cnt)
    put = cnt;
bp->b_nleft -= put;
to = bp->b_ptr;
asm("movc3 r8,(r11),(r7)");
bp->b_ptr += put;
p += put;
cnt -= put;
goto top;
So that's weird, right? What the heck is that movc3 doing in the middle of that code?

This code originally ran on 4.1BSD. The only system that version of Unix ran on was a VAX (later versions were more widely ported, but 4.1 was more of a limited distribution version). OK, looking up the movc3 instruction in VAX references online, we see it is the "Move Character" instruction. r8 is the length, srcaddr is (r11) and dstaddr is (r7). So in effect, someone has done an inline of bcopy() here. Now, that's half of the problem. The other half is puzzling out what is in r7, r8 and r11 at the time of this call. In a perfect world, I'd just crank up the compiler to tell me. We live in an imperfect world where spinning up a 4.1BSD system takes a substantial amount of time.

Fortunately, we can guess. cnt -= put gives us our first clue. We're decrementing by how much we copied, it seems. So r8 (the length) is put. OK. Now, we have this nice variable named 'to' that was most likely in the dstaddr (so r7), and we update it after (to appears only to be here for the side effect, so that's nice). But what's the from? The only thing it could logically be is 'p' since we += it by put as well.

So my best guess is that the asm can be replaced by memcpy(to, p, put); and life will be good. My spidey sense also tells me that we don't need memmove here because the ranges don't overlap.

20181217

Adding additional revisions

Adding other directories

Sometimes you need to add commits from other places / directories to a repo you've slimmed down. This post uses the timed example to offer some advice.

SMM.doc

Fortunately for us, the SMM.doc directory was moved late in the game. As such, it was easy to edit the commit stream to remove that commit, and then replay all the commits that came after it. Fortunately, there was only one such commit, the one that removed clause 3 (the advertising clause). That was done by hand and committed, with the original commit message pasted into the log. I then used git rebase to put this commit in the right place temporally.

etc/rc.d/timed

For this directory, I followed a different path. After looking at this file (or, as it is currently called, libexec/rc.d/timed), I determined there were only a few real commits. Since there were only 10 commits, I just created a dumb script to run in the FreeBSD root of a github mirror repo:
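#!/bin/sh
# Export each commit touching etc/rc.d/timed as a numbered patch, oldest first.

j=1
d=/tmp/timed-junk
mkdir -p $d
# /tmp/3 holds the filtered 'git log' output; tail -r (BSD) reverses it.
for i in $(grep ^commit /tmp/3 | awk '{print $2;}' | tail -r); do
        # Rewrite /etc/rc.d paths to /rc.d in the patch text.
        git show $i etc/rc.d/timed | sed -e s=/etc/rc.d=/rc.d=g > $d/$(printf %04d $j)
        j=$(($j + 1))
done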
Here /tmp/3 had the output of 'git log etc/rc.d/timed', filtered to remove all the bogus commits (e.g. the merge ones).

Once I had these in place, I was able to then import them into my repo by cd'ing to the root and running
git am --patch-format=stgit /tmp/timed-junk/*
I oopsed and let a merge commit sneak through, and if you do that too, you can just delete the file in /tmp/timed-junk. Also, I don't know why it didn't autodetect the format, but with an explicit format it just worked.

This produced 9 commits that resulted in the same timed file as was in svn. I cheated a little and omitted the movement commits, and since this is in git, $FreeBSD$ isn't expanded. This time, I didn't bother to sort them into the stream chronologically since I have no automation to do that and 9 commits by hand was more than I had time for.

Push the result

Since I rebased, I had to do a forced push. Should someone come along and want to make this a port, I'll do the sorting of commits then, do another forced push, and publish the final results under FreeBSD's github account rather than my own personal one.

Splitting up a git repo -- Single directory

FreeBSD has a large, sprawling svn repo that was once a CVS repo. There are times that things in that repo have outlived their usefulness. Sometimes those items are best moved to a FreeBSD port. One easy way to manage a port is to toss it into a github repo and have the port point there. This article discusses how to do that. While it should be 'easy', getting a clean history has a few gotchas.

Clone the FreeBSD repo

If you are contemplating moving something out of the FreeBSD repo, one way to do that is to take the FreeBSD github mirror and trim it. When doing this, I always use a new repo, but in theory you could do it in an existing repo. Given how much is tossed away, it's best to use a fresh copy to avoid disaster. Disk space is cheap, right? Here's what I used to kick off pulling timed into a separate repo.

git clone https://github.com/freebsd/freebsd timed

First pass at trimming

When one googles the topic, git filter-branch comes up. The canonical answer is a good starting point:
git filter-branch --prune-empty --subdirectory-filter usr.sbin/timed master
will do the first pass. This will leave just timed as the top level directory. For the moment, we'll leave aside the stray timed files elsewhere in the tree. That gets 'complicated', which we'll explore in the second part of this blog. It would be good to drop a 'go back' tag here:
git checkout -b timed-trimmed
Now, this gives a decently trimmed tree. However, there are some problems. --prune-empty is a lie, or to be more charitable, it is incompletely implemented. It doesn't prune every single thing, especially merge commits. Those are retained, but should be omitted. So the next step is to use the very flexible history rewriting "feature" of git to remove them.

Next, use git rebase to rebase things. There may be smoother ways to do this, but I find the first version in the tree with git log | tail, and then do my rebase like so:
git rebase -i HASH_OF_FIRST_COMMIT master
Now the fun part starts. For each commit you suspect of being a merge commit, you have to see if that hash is included in the output of git log --merges. Remove all those commits. However, this can be hard. If you have a *LOT* to sort through, it's easier to make one pass for the obvious cvs2svn and MFC commits, but if you miss one it's not the end of the world. It's also a good idea to save an unmodified version of this file. It will come in handy later if your efforts lead to only a couple of missing commits.

You can 'fix' the todo as you go, though this is tricky. Basically, when you hit an error, it's because the prior commit deleted everything as part of its 'merge'. So, to back up one, you need to just do this set of magic, I'll show, then explain:
git rev-parse HEAD
Get the hash this prints.
git rebase --edit-todo
Add a 'pick' line with that hash; this will make sure we keep the commit we're about to toss.
git reset --hard HEAD^
This resets the current mess (which is still in the todo list) back to one before HEAD. At this point, you're back one commit in the rebase and have effectively skipped the troublesome commit.
git rebase --continue
This reapplies what was HEAD and then proceeds. See Git Rebase Stepping Forward and Back for more info. It turns out things are a bit tricky in that you want to make sure you are dumping the bad commit, and keeping earlier ones, and the behavior, especially when multiple branch merges happen, is a bit variable.

In the timed example, I needed to do this about a dozen times. I suspect that will be fairly typical as I had similar issues when I created the ctm repo.

Test: did I do it right?

Next, you need to make sure that you did things correctly. There's always a chance you'll drop commits that shouldn't be dropped. I ran a diff against my base FreeBSD tree and noticed that at least one commit was lost, so I had to go find it. It turned out I had missed 3 or 4 commits, so I had to go back and try again, using git reflog to find the result of the filter-branch and start over.

After I redid things (which I've not reproduced here; the second time was much easier), I did the diff and found one commit missing. I found it in my original todo file, so I was able to do the rebase again, add it to the end (to make sure it applies) and then move it to the right place chronologically.

Push the result for testing

I created a new upstream (in my personal space, not FreeBSD's, which I don't have permission to push to). I created a new repo, then pushed the 'master' branch from this repo upstream. I know I'm missing the docs (which ironically were copied away early on) and the rc files. Those will be covered in the second part of this.

20181116

Backing up git repos on multiple machines to central repo w/o collision

Git Tree Management

Here's a quick column about git. It's not a complete how-to or tutorial, but more an interesting way to manage multiple trees.

The problem: I have a dozen trees on a half dozen machines. I'd like to at least back up all the branches in these trees to github. Trouble is, I don't want branch names to step on each other. This can happen for a number of reasons. Let's say I called something 'junk' by habit on N trees and don't want a push to screw that up...

Git's world view: To understand git, you have to understand that it is a graph of versioned trees with labels. Each node in the tree has the familiar hash, and some of the hashes have refs that the git branch command groks. It's all just a directed graph with labels under the covers.

Normally, when you clone a repo, all its branches magically change from foo to origin/foo (for some value of origin).

Enter refspecs

Turns out this has been thought of before. The answer is simple:
fred% git push origin foo:fred/foo
will push the foo branch to your origin and rewrite its name to fred/foo. Don't forget to push master too.

Now when you go to barney and fetch, you'll have a bunch of remote branches named origin/fred/foo, etc.

Costs

Since I'm doing this with a number of git svn trees, the cost is kinda high since git svn creates new, unique git hashes for all the upstream revisions when you git svn rebase. It also means that you'll need to learn how to use the --onto arg of git-rebase if you want to move a branch from one repo like this to another:
barney% git checkout -b foo fred/foo
barney% git rebase -i fred/master foo --onto master
since you're effectively creating a new name space on your local machine for the new branch. The rebase will now properly take just those commits from foo, and then play them back onto master on the current machine and leave you with a 'foo' branch for the results.

20181110

Most things in the VENIX emulator are working

SUCCESS

So I got tired of the terrible progress I was making chasing down issues. I thought if I could just create a simple program and get that working, I'd have much better luck.

So I wrote a simple K&R style C program:
int a=123;
int b;
extern char *etext, *edata, *end;
main() {
int c;
printf("CS: %x DS: %x ES: %x SS: %x\n", getcs(), getds(), getes(), getss());
printf("&data = %x &bss = %x &stack = %x\n", &a, &b, &c);
printf("etext = %x edata = %x end = %x\n", &etext, &edata, &end);
}
which printed the segment addresses and then the locations of the segments, and I ran it on my old Rainbow running Venix.

I made some interesting discoveries. First, that there are two kinds of stacks (low and high) in addition to there being two kinds of binary (OMAGIC and NMAGIC). So my loader was all wrong. Next, I discovered I needed to jump to a_entry, not 0, to make low stack binaries work (all the ones I'd been testing so far were low stack, but somehow mostly worked when the stack and text segments were swapped).

Armed with this knowledge, I built 4 binaries (no flags, -z, -i, -i -z) to test all 4 cases. The -z ones worked (yea!) while the non-z ones didn't. My loader was right in this case, but I was returning EFAULT for all the writes. Why? Because I had a check in there to make sure the address was between 0 and brk. High stack binaries also have a valid area from sp() to 0xffff. When I added that change, all 4 test programs worked.

Of course, getting them from the Rainbow to the server was a challenge. The key thing to remember here is that you need to use 'set line /dev/com1.m' on the Rainbow so that kermit will work on the login port. I also had to down-clock to 2400 baud to get it reliable.

So, I started testing a lot of programs that failed to work before. sort(1) is now working. ls isn't, but comes closer (it tries really hard to interpret a modern FreeBSD dirent as a v7 one and that's not so good, but that's fixable). nm is still giving me problems, for reasons unknown. I have enough things working, though, that I can start to try out as, ld and friends. Maybe even cc (though I'd need to get both fork and signals working for that driver program). /bin/sh fails because dup() is missing (and likely a bunch of others).

So excellent progress in the last few days.

20181104

Even more VENIX emulator progress

So in looking at the traces for why cal wasn't working, I noticed something odd:

0212:100D: jmp 0x109e
0212:109C: rcrw $0xff,0xdceb(%bx,%si)
Invalid opcode c1
What? I'm not super-duper strong on Intel assembler, but I sure know that 109e is not 109c. So what's going on here? After adding some more debugging, I discovered this was opcode 0xe9, which is a jump relative with word (so take IP and add the next two bytes to it). So, the code looked OK:
                doJump(ip + fetchWord()); 
But looking more closely, it's oddly off by 2. So what's inside fetchWord()? It effectively does ip++, twice. The code, which I obtained from tkchia's reenigne repo, had a latent flaw: C doesn't specify which operand of + is evaluated first. The compiler previously used for this code evidently called fetchWord() before reading ip, while clang reads ip first and then calls fetchWord(). So the fix was simple:
                t = fetchWord();
                doJump(ip + t);
which made the order of evaluation well defined. A quick audit of the code shows no other places where this is done.

I did that, and nothing else, and now cal(1) works. It produces correct calendars. As does od(1), uniq(1), pr(1) and others.

I've also implemented alarm(2), signal(2), lseek(2) and pause(2). With that, sleep(1) works (although it says 'Alarm clock' which suggests I need to actually establish a SIGALRM handler).

There's enough working I'm starting to need some kind of regression suite to make sure I don't regress and can publish the status of all the binaries... Maybe I could leverage an existing something...

20181103

VENIX/86 emulator taking shape...

Frequent readers will recall my obsession with Venix on the Rainbow.

For the past year or so in my off moments, I've been trying to put together a Venix binary emulator. This is part of a larger project to reconstruct the Venix sources from the ancient V7 sources plus clues left behind in various images found on the internet in time for the 50th anniversary of Unix next year.

Early in the summer I had the loader written. I could successfully load a VENIX image and start it executing. sync(1) was working, but it did little more than call the sync(2) and _exit(2) system calls. stdio programs were still basically not working, though odd things like ln(1) also worked.

My project is now one step closer to fruition. I have been able to get some of the basic programs in /bin working with my emulator. The last step before getting to this point was finding a bug in the 8086 CPU emulation where the pointers to things like AH and AL for the AX register were wrong so that "movb al, 1" would set ah to 1... To find that I ported ddb from FreeBSD over so I could print out registers and disassembled code that's about to execute and pore over the changes to the registers until I could spot a problem...

With that fixed, all the super simple programs run. echo(1), cat(1), tr(1), and basename(1) run. However, other simple ones like rm(1), touch(1), ls(1), and wc(1) all run into problems.

touch(1) seems to be related to a bad implementation of stat(2), for example, that I've not had the time to chase down. mv(1) and rm(1) seem that way as well. No clue what's up with wc(1) or ls(1).

I think I should come up with some kind of test script(s) to make sure the basics work.

Venix Github repo has the 86sim program in it.

I hope to soon be to the point where everything except maybe fork/exec works. I'll need those for not only /bin/sh, but also cc. cc(1) is the reason that I want to make this work so I can rebuild everything quickly for the VENIX restoration project....

20181027

Extracting part of the FreeBSD tree

Extracting the history of CTM into its own repo

In the early days of the FreeBSD project, CTM provided a competitive advantage to the project by giving those that weren't completely connected to the internet a convenient way to get the sources via mag tape or mail (sometimes over UUCP links, which were popular at the time).

Recently, the notion of removing ctm from FreeBSD came up. Maybe it's a good idea, maybe it isn't. I thought it would be nice to know how hard it would be to extract the history of ctm into its own repo, so I set out to see. Turns out, git makes it easy.

Initial setup

First, clone a tree.
git clone https://github.com/freebsd/freebsd
Next, we need to remove everything except ctm:
git filter-branch --prune-empty --subdirectory-filter usr.sbin/ctm
This will leave a repo with the complete history and a ctm directory at the top level. We're done, right?

Well, not so fast. We actually aren't done. There's a lot of crap still in the tree. Despite saying to get rid of empty commits, there are empty commits. Most of them are from merges to the tree that didn't actually touch ctm, but were merged and git has all of those in the tree.

Rebasing to clean away the sins

So, how do we clean this up? By rebasing of course. So, to make things easy, I tagged the first version in the new repo with a tag. I used 'base'. I also tagged the tip of master after the prune with 'orgmaster' in case I needed to start over (which I've not detailed here, but I'll just say it came in handy).
git rebase -i base master
Now, if you try this, it won't work. All the merges and cvs2svn 'fixups' are a problem. After a bit of trial and error, I figured out the list of things to remove from the todo list. Then I learned there was an easier way to find this list:
git log --merges
will show the merges that still remain in the tree. Remove them from the todo list and then let the rebase proceed. There's likely a clever shell-script that can do all this, but I've not written one.

Sanity check

The sanity check step is easy. Diff the checked-out ctm repo with the freebsd repo's usr.sbin/ctm tree:
diff -ur ctm freebsd/usr.sbin/ctm
and it should be identical. If not, you need to redo prior steps until it is.

Pushing the result

So I created a new repo on github (in bsdimp/ctm).
git remote add upstream https://github.com/bsdimp/ctm
And then push it upstream:
git push upstream master
and you should be done. https://github.com/bsdimp/ctm should now have a repo that you can look at and have it match your local tree.

Next Steps

Let the pull requests come in :)