The PDP-7 Where Unix Began

Serial Number of First Unix System

In preparation for a talk on Seventh Edition Unix this fall, I stumbled upon a service list from DEC for all known PDP-7 machines. From that list, and other sources, I believe that PDP-7 serial number 34 was the original Unix machine.
PDP-7 System from DEC sales literature

Building The Case

We know from simh sources, the restored PDP-7 Unix version 0 sources, and recollections from the time that the original machine used by Ken Thompson and Dennis Ritchie had, or likely had, the following hardware:
  1. 8k word of memory (option 149B)
  2. A tape reader (option 444B)
  3. A tape punch (option 75D)
  4. A 1MB disk drive (RB09 same as an RD10)
  5. A tty controller for a teletype (option 649)
  6. A standard video display (option 340)
  7. A custom video display (Unknown option number)
  8. A keyboard for input (also option 340)

We know from the service list that Bell Labs had three PDP-7s and one PDP-7A. Several of these machines had the standard options (the tape reader and teletype) and extra memory. Only one system, serial number 34, also had a disk drive, a custom unknown board that could be a Bell Custom display, and the standard display. In addition, that system shipped to Bell Labs in 1965 and appears to have been refurbished in 1969. This timeline matches the oral histories describing a discarded PDP-7 used to bring up the system in late 1969.

Here's the full table of all the systems shipped to Bell Labs with each system's options, taken from the 18 bit service list provided by Bob Supnik. You can check the list for other contenders.

Serial NumberOption #Option NameShip Date
PDP-7 #31100?
7PDP7 CPU unit?
75DPerforated paper tape punch and control
173Data interrupt multiplexer07-68
177BExtended arithmetic element1128?
444BPerforated tape reader and control
550ADECtape dual magnetic tape control12-67
649Teleprinter and control
CR01B100Cpm card reader and control
TU55Single DECtape transport12-67
TU55Single DECtape transport12-67
TU55Single DECtape transport03-69
PDP-7 #3401-69
75DPerforated paper tape punch and control07-65
149BCore memory module 8K, extends in 8K blocks07-65
177Extended arithmetic element07-65
340Precision incremental CRT display07-65
342Symbol generator for 340 display, first 64 characters07-65
370High speed light pen07-65
444BPerforated tape reader and control07-65
649Teleprinter and control07-65
CR01B100Cpm card reader and control12-66
PDP7CPU unit07-65
RC09RB09 disk?01-69
76 05477Custom Bell Labs Display?01-69
PDP-7 #4411-65
75DPerforated paper tape punch and control
149BCore memory module 8K, extends in 8K blocks11-65
177Extended arithmetic element
444BPerforated tape reader and control
649Teleprinter and control
PDP7CPU Unit11-65
PDP-7A #14903-69
149Core memory module 4K, extends subsequent 4K blocks
175Information collector expansion
175Information collector expansion03-69
177BExtended arithmetic element
340Precision incremental CRT display
347CCPU CRT subroutine interface
370High speed light pen
550DECtape dual magnetic tape control03-69
637Bit synchronous data communication system
CR01B100Cpm card reader and control
KA71AI/O device package
KA77AProcessor unit (PDP-7/A)
KB03Device selector expansion03-69
TU55Single DECtape transport03-69

Another surprise

V0 Unix could run on only one of the PDP-7s. Of the 99 PDP-7s produced, only two had disks. Serial number 14 had an RA01 listed, presumably a disk, though of a different type. In addition to the PDP-7 being obsolete in 1970, no other PDP-7 could run Unix, limiting its appeal outside of Bell Labs. By porting Unix to the PDP-11 in 1970, the group ensured Unix would live on into the future. The PDP-9 and PDP-15 were both upgrades of the PDP-7, so to be fair, PDP-7 Unix did have a natural upgrade path (the PDP-11 out sold the 18 bit systems though ~600,000 to ~1000). Ken Thompson reports in a private email that there were 2 PDP-9s and 1 PDP-15 at Bell Labs that could run a version of the PDP-7 Unix, though those machines were viewed as born obsolete.


Strange Code

Now That's Weird

I was trying to compile some ancient code I pulled off the net. It is related to the Venix stuff I've been doing on and off of late.
put = bp->b_nleft;
if (put > cnt)
    put = cnt;
bp->b_nleft -= put;
to = bp->b_ptr;
asm("movc3 r8,(r11),(r7)");
bp->b_ptr += put;
p += put;
cnt -= put;
goto top;
So that's weird, right. What the heck is that movc3 doing in the middle of that code.

This code originally ran on BSD 4.1. The only system that version of Unix ran on was a VAX (later versions were more widely ported, but 4.1 was more of a limited distribution version). OK, looking up the movc3 instruction on vax references online, we see it is the "Move Character" instruction. r8 is the length. srcaddr is (r11) and dstaddr is (r7). So in effect, someone has done an inline of bcopy() here. Now, that's half of the problem. The other half is puzzling out what is in r7, r8 and r11 at the time of this call. In a perfect world, I'd just crank up the compiler to tell me. We live in an imperfect world where spinning up a 4.1 BSD system takes a substantial amount of time.

Fortunately, we can guess. cnt -= put gives us our first clue. We're decrementing by how much we copied, it seems. So r8 (the length) is put. OK. Now, we have this nice variable named 'to' that was most likely in the dstaddr (so r7), and we update it after (to appears only to be here for the side effect, so that's nice). But what's the from? The only thing it could logically be is 'p' since we += it by put as well.

So my best guess is that can be replaced by memcpy(to, p, put); and life will be good. My spidy sense also tells me that we don't need memmove here because they aren't overlapping ranges.


Adding additional revisions

Adding other directories

Sometimes you need to add commits from other places / directories to a repo you've slimmed down. This post uses the timed example to offer some advice.


Fortunately for us, the SMM.doc directory was moved late in the game. As such, it was easy to edit the commit stream to remove that commit, and then replay all the commits that came after it. Fortunately, there was only one to remove the 3. clause (advertising clause). That was done by hand, committed and the original commit message pasted into the log. I then used git rebase to order this commit in the right place temporally.


For this directory, I followed a different path. After looking at this file (or should I say how it is currently called libexec/rc.d/timed), I determined there were only a few real commits. Since there were only 10 commits, I just created a dumb script to run in the FreeBSD root of a github mirror repo:

for i in $(grep ^commit /tmp/3 | awk '{print $2;}' | tail -r); do
        git show $i etc/rc.d/timed | sed -e s=/etc/rc.d=/rc.d=g > $d/$(printf %04d $j)
        j=$(($j + 1))
Where /tmp/3 had 'git log etc/rc.d/timed' filtered to remove all the bogus commits (eg the merge ones).

Once I had these in place, I was able to then import them into my repo by cd'ing to the root and running
git am --patch-format=stgit /tmp/timed-junk/*
I oopsed and let a merge commit sneak through, and if you do that too, you can just delete the file in /tmp/timed-junk. Also, don't know why it didn't autodetect the format, but with an explicit format it just worked.

This produced 9 commits that resulted in the same timed file as was in svn. I cheated a little and omitted the movement commits, and since this is in git, $FreeBSD$ isn't expanded. This time, I didn't bother to sort them into the stream chronologically since I have no automation to do that and 9 commits by hand was more than I had time for.

Push the result

Since I rebased, I had to do a forced push. Should someone come along and want to make this a port, I'll do the sorting of commits then and do another forced push then publish the final results under FreeBSD's github account rather than my own personal one.

Splitting up a git repo -- Single directory

Splitting up a git repo -- Single directory

FreeBSD has a large, sprawling svn repo that was once a CVS repo. There are times that things in that repo have outlived their usefulness. Sometimes those items are best moved to a FreeBSD port. One easy way to manage a port is to toss it into a github repo and have the port point there. This article discusses how to do that. While it should be 'easy' to get a clean history has a few gotchas.

Clone the FreeBSD repo

If you are contemplating moving something out of the FreeBSD repo, one way to do that is to take the FreeBSD github mirror and trim it. When doing this, I always use a new repo, but in theory you could do it in an existing repo. Given how much is tossed away, it's best to use a fresh copy to avoid disaster. Disk space is cheap, right? Here's what I used to kick off pulling timed into a separate repo.

git clone https://github.com/freebsd/freebsd timed

First pass at trimming

When one googles the topic, git filter-branch comes up. The canonical answer is a good starting point:
git filter-branch --prune-empty --subdirectory-filter usr.sbin/timed master
will do the first pass. This will leave just timed as the top level directory. For the moment, we'll leave aside the stray timed files elsewhere in the tree. That gets 'complicated' which will explore in the second part of this blog. It would be good to drop a 'go back' tag here:
git checkout -b timed-trimmed
Now, this gives a decently trimmed tree. However, there are some problem. --prune-empty is a lie, or to be more charitable, it is incompletely implemented. It doesn't prune every single thing. Especially merge commits. Those are retained, but should be omitted. So the next step is use the very flexible history rewriting "feature" of git to remove them.

Next, use git rebase to rebase things. There may be more smooth ways to do this, but I find the first version in the tree with git log | tail, and then do my rebase like so:
git rebase -i HASH_OF_FIRST_COMMIT  master
Now the fun part starts. For each commit you suspect of being a merge commit, you have have to see if that hash is included in the output of the git log --merges command above. Remove all those commits. However, this can be hard. If you have a *LOT* to sort through, it's easier to make one pass for the obvious cvs2svn and MFC commits, but if you miss one it's not the end of the world. It's also a good idea to save an unmodified version of this file. It will come in handy later if your efforts lead to only a couple of missing commits.

You can 'fix' the todo as you go, though this is tricky. Basically, when you hit an error, it's because the prior commit deleted everything as part of its 'merge'. So, to back up one, you need to just do this set of magic, I'll show, then explain:
git rev-parse HEAD
 Get the hash this prints
git rebase --edit-todo
Add 'pick ' this will make sure we keep the commit we're about to toss.
git reset --hard HEAD^
This resets the current mess (which is still in the todo list) back to one before HEAD.  At this point, you're back one commit in the rebase and have effectively skipped the troublesome commit.
git rebase --cont
This reapplies what was HEAD and then proceeds. See Git Rebase Stepping Forward and Back for more info. It turns out things are a bit tricky in that you want to make sure you are dumping the bad commit, and keeping earlier ones, and the behavior, especially when multiple branch merges happen, is a bit variable.

In the timed example, I needed to do this about a dozen times. I suspect that will be fairly typical as I had similar issues when I created the ctm repo.

Test: did I do it right?

Next, you need to make sure that you did things correctly. There's always a chance you'll drop commits that shouldn't be dropped. I ran a diff against my base FreeBSD tree and noticed that at least one commit was lost, so I had to go find it. So I missed 3 or 4 commits, and I had to go back and try again, so I had to use git reflog to find the result of the filter-branch and start over.

After I redid things (which I've not reproduced here, the second time was much easier) I did the diff and found one commit missing. I found it in my original todo file, so was able to do the rebase again, add it to the end (to make sure it applies) and then move it to the right place chronologically....

Push the result for testing

I created a new upstream (that was my personal space, not FreeBSD which I don't have permission to push to). I created a new repo, then pushed the 'master' branch from this repo  upstream. I know I'm missing the docs (which ironically were copied away early on) and the rc files. Those will be covered in the second part of this.


Backing up git repos on multiple machines to central repo w/o collision

Git Tree Management

Here's a quick column about git. It's not a complete how-to or tutorial, but more an interesting way to manage multiple trees.

The problem: I have a dozen trees on a half dozen machines. I'd like at least backup all the branches in these trees to github. Trouble is, I don't want branch names to step on each other. This can happen for a number of reasons, let's say I called something 'junk' by habit on N trees and don't want a push to screw that up...

Git's world view: To understand git, you have to understand that it is a graph of versioned trees with labels. Each node in the tree has the familiar hash, and some of the hashes have refs that the git branch command groks. It's all just a directed graph with labels under the covers.

Normally, you when clone a repo, all its tags magically change from foo to origin/foo (for some value of origin).

Enter refspecs

Turns out this has been thought of before. The answer is simple:
fred% git push origin foo:fred/foo
will push the foo branch to your origin and rewrite its name to fred/foo. Don't forget to push master too.

Now when you go to barney and fetch, you'll have a bunch of remote branches named origin/fred/foo, etc.


Since I'm doing this with a number of git svn trees, the cost is kinda high since git svn creates new, unique git hashes for all the upstream revisions to git svn rebase. It also means that you'll need to learn how to use the --onto arg of git-rebase, since if you want to move a branch from one repo like this to another.
barney% git checkout -b foo fred/foo
barney% git rebase -i fred/master foo --onto master
since you're effectively creating a new name space on your local machine for the new branch. The rebase will now properly take just those commits from foo, and then play them back onto master on the current machine and leave you with a 'foo' branch for the results.


Most things in the VENIX emulator are working


So I got tired of the terrible progress I was making chasing down issues. I thought if I could just create a simple program and get that working, I'd have much better luck.

So I wrote a simple K&R style C program:
int a=123;
int b;
extern char *etext, *edata, *end;
main() {
int c;
printf("CS: %x DS: %x ES: %x SS: %x\n", getcs(), getds(), getes(), getss());
printf("&data = %x &bss = %x &stack = %x\n", &a, &b, &c);
printf("etext = %x edata = %x end = %x\n", &etext, &edata, &end);
which printed the segment addresses and then locations of the segments.

and I ran it on my old Rainbow running Venix.

I made some interesting discoveries. First, that there are two kinds of stacks (low and high) in addition to there being two kinds of binary (OMAGIC and NMAGIC). So my loader was all wrong. Next, I discovered I needed to jump to a_entry, not 0, to make low stack binaries work (all the ones I'd been testing so far were low stack, but somehow mostly worked when the stack and text segments were swapped).

Armed with this knowledge, I built 4 binaries (no flags, -z, -i, -i -z) to test all 4 cases. The -z ones worked (yea!) while the non-z ones didn't. My loader was right in this case, but I was returning EFAULT for all the writes. Why? Because I had a check in there to make sure the address was between 0 and brk. High stack binaries also have a valid area from sp() to 0xffff. When I added that change, all 4 test programs worked.

Of course, getting them from the Rainbow to the server was a challenge. The key to remember here is that you needed to use 'set line /dev/com1.m' on the rainbow so that kermit would on login port. I also had to down-clock to 2400 baud to get it reliable.

So, I started testing a lot of programs that failed to work before. Sort(1) is now working. ls isn't, but comes closer (it tries really hard to interpret a modern FreeBSD dirent as a v7 one and that's not so good, but that's fixable). nm is still giving me problems, for reasons unknown. I have enough things working, though, that I can start to try out as, ld and friends. Maybe even cc (though I'd need to get both fork and signals working for that driver program). /bin/sh fails missing dup() (and likely a bunch of others).

So excellent progress in the last few days.


Even more VENIX emulator progress

So in looking at the traces for why cal wasn't working, I noticed something odd:

0212:100D: jmp 0x109e
0212:109C: rcrw $0xff,0xdceb(%bx,%si)
Invalid opcode c1
What? I'm not super-duper strong on Intel assembler, but I sure know that 109e is not 109c. So what's going on here. After adding some more debugging, I discovered this was opcode 0xe9, which is a jump relative with word (so take IP and add the next two bytes to it). So, the code looked OK:
                doJump(ip + fetchWord()); 
But looking more closely. It's oddly off by 2. Sow what's inside fetchWord()? Inside it effectively does ip++, twice. So, on the other compiler that was used for this code that I obtained from tkchia's reenigne repo had this flaw. it did fetchWord() + ip, rather than clang's ip + fetchWord(). So the fix was simple:
                t = fetchWord();
                doJump(ip + t);
which made the order of operations well defined. A quick audit of the code shows no other places where this is done.

I did that, and nothing else, and now cal(1) works. It produces correct calendars. As does od(1), uniq(1), pr(1) and others.

I've also implemented alarm(2), signal(2), lseek(2) and pause(2). With that, sleep(1) works (although it says 'Alarm clock' which suggests I need to actually establish a SIGALRM handler).

There's enough working I'm starting to need some kind of regression suite to make sure I don't regress and can publish the status of all the binaries... Maybe I could leverage an existing something...


VENIX/86 emulator taking shape...

Frequent readers will recall my obsession with Venix on the Rainbow.

For the past year or so in my off moments, I've been trying to put together a Venix binary emulator. This is part of a larger project to reconstruct the Venix sources from the ancient V7 sources plus clues left behind in various images found on the internet in time for the 50th anniversary of Unix next year.

Early in the summer I had the loader written. I could successfully load a VENIX image and start it executing. sync(1) was working, but it did little more than call the sync(2) and _exit(2) system calls. stdio programs were still basically not working, though odd things like ln(1) also worked.

My project is now one step closer to fruition. I have been able to get some of the basic programs in /bin working with my emulator. The last step before getting to this point was finding a bug in the 8086 CPU emulation where the pointers to things like AH and AL for the AX register were wrong so that "movb al, 1" would set ah to 1... To find that I ported ddb from FreeBSD over so I could print out registers and disassembled code that's about to execute and pore over the changes to the registers until I could spot a problem...

With that fixed, all the super simple programs run. echo(1), cat(1), tr(1), and basename(1) run. However, other simple ones like rm(1), touch(1), ls(1), and wc(1) all run into problems.

touch(1) seems to be related to a bad implementation of stat(2), for example, that I've not had the time to chase down. mv(1) and rm(1) seem that way as well. No clue what's up with wc(1) or ls(1).

I think I should come up with some kind of test script(s) to make sure the basics work.

Venix Github repo has the 86sim program in it.

I hope to soon be to the point where everything except maybe fork/exec works. I'll need those for not only /bin/sh, but also cc. cc(1) is the reason that I want to make this work so I can rebuild everything quickly for the VENIX restoration project....