20200627

Whither chroot?

Chroot Origins

This blog post will examine original artifacts to clear up some confusion about where chroot(2) and chroot(8) came from. The answer turns out to be simple, and the confusion was understandable. This shows the benefits of groups like TUHS in preserving Unix history, and how the kindness of Caldera and Lucent in releasing the historic Unix systems has helped in our understanding of the evolution of Unix. 

EDIT: After initial publication, this post was revised with more links to historic artifacts (inline and in the Appendix) and a screenshot of Wikipedia. The Wikipedia chroot entry has since been updated.

tl;dr: chroot(2) came from 7th Edition Unix

chroot is system call 61 in 7th Edition Unix from Bell Labs. There is no chroot system call in 6th Edition or earlier. All derivatives of 7th Edition carried chroot(2) for at least two decades after the 7th Edition release in 1979.

What confusion?

Wikipedia had this in its entry for chroot:
[screenshot of the Wikipedia chroot entry]
which suggests that Bill Joy had something to do with its creation in the BSD world. It turns out the entry was confused because earlier literature on the topic is also confused.

What Sparked the Confusion?

Poul-Henning Kamp created the jail system for FreeBSD. This system takes a chroot environment to the next level in terms of security. As a security device, chroot was terrible because it's fairly easy to break out of a chroot if you are root. The short version is to open '/' to get a reference to it. Then chroot to some directory further down the tree. Then fchdir to the fd you saved from '/'. Now call chdir("..") a bunch of times. This will walk you back to the real root. Now chroot(".") and you are out. There are lots of variations on this theme, dozens of papers in the literature, and an almost infinite number of ways to leak references to FDs outside the jail...
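The escape described above can be sketched in a few lines. This is only an illustration in Python, assuming a Unix-like system, a process that is already root inside the chroot, and a kernel that does not block the trick (modern kernels may refuse some of these steps). The function name escape_chroot, the scratch directory name and the depth limit are all made up for the example.

```python
import os


def escape_chroot(subdir="escape-tmp", max_depth=64):
    """Sketch of the classic root-only chroot escape (hypothetical helper)."""
    if os.geteuid() != 0:
        raise PermissionError("chroot(2) is only available to root")
    root_fd = os.open("/", os.O_RDONLY)    # keep a reference to the current root
    try:
        os.makedirs(subdir, exist_ok=True)
        os.chroot(subdir)                  # move the root further down the tree
        os.fchdir(root_fd)                 # cwd is now outside the new root
    finally:
        os.close(root_fd)
    for _ in range(max_depth):             # max_depth is an arbitrary assumption
        os.chdir("..")                     # nothing stops us at the old root any more
    os.chroot(".")                         # adopt wherever we ended up as the root
```

The key point is that the kernel only stops ".." at the process's current root directory, and the saved descriptor refers to a directory above that new root.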

One of the wonderful things he did was to create an extensive set of docs and write a paper about the jail(2) facilities. In this paper Mr Kamp wrote:
[CHROOT]
Dr. Marshall Kirk Mckusick, private communication: ``According to the SCCS logs, the chroot call was added by Bill Joy on March 18, 1982 approximately 1.5 years before 4.2BSD was released. That was well before we had ftp servers of any sort (ftp did not show up in the source tree until January 1983). My best guess as to its purpose was to allow Bill to chroot into the /4.2BSD build directory and build a system using only the files, include files, etc contained in that tree. That was the only use of chroot that I remember from the early days.''
This paper was presented at the 2nd International System Administration and Networking Conference "SANE 2000" May 22-25, 2000 in Maastricht, The Netherlands and is published in the proceedings.

In 2000, the BSD SCCS tree was not publicly available. Dr McKusick had access to it through his role with the Computer Systems Research Group (CSRG), which produced the 4BSD releases. This predated various litigation that suggested 32V had no copyright, and the Ancient Unix License that SCO granted for 32V, so it was necessarily private per agreements between AT&T and The University of California at Berkeley.

What Actually Happened in 1982?

What happened was a shuffling of the deck chairs. The commit log, made as root, from March 18, 1982 says:

rearrange for kirk

SCCS-vsn: 4.21
and introduces chroot to ufs_syscalls.c. If you read the diffs, it also introduced 'open', 'creat' and several others to this file. These system calls are known to be in the PDP-7 Unix implementation, so it's unlikely that they were really introduced in this commit. One problem that makes this harder to track is that SCCS didn't track renames, and ufs_syscalls.c was renamed to vfs_syscalls.c in 4.4BSD.  It's quite clearly in ufs_syscalls.c in 4.1cBSD:
/*
 * Change notion of root (``/'') directory.
 */
chroot()
{

        if (suser())
                chdirec(&u.u_rdir);
}
which is identical to the code that was added by Bill Joy to ufs_syscalls.c. It was moved there from sys4.c between 4.1BSD and 4.1cBSD as part of the UFS work, and differs from the sys4.c version only by the BSD stylistic convention of a blank line before the rest of the code when there are no local variables:
chroot()
{
        if (suser())
                chdirec(&u.u_rdir);
}
which, apart from the comment, is identical. Without beating a dead horse (too late?), this code is the same all the way back to 4BSD, 3BSD, 32V, 2.8BSD and finally to V7:
chroot()
{
        if (suser())
                chdirec(&u.u_rdir);
}
The code is identical from V7 all the way through 4.2BSD, which is when, according to the footnote in the jail paper, it was added. This is direct evidence that the footnote was in error.
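The "identical apart from the comment and a blank line" claim is easy to check mechanically. Here is a small sketch in Python that strips C comments and collapses whitespace before comparing the two listings quoted above; the helper name normalize is made up for the example.

```python
import re

V7 = """chroot()
{
        if (suser())
                chdirec(&u.u_rdir);
}
"""

BSD_41C = """/*
 * Change notion of root (``/'') directory.
 */
chroot()
{

        if (suser())
                chdirec(&u.u_rdir);
}
"""


def normalize(source):
    """Drop C comments and collapse every run of whitespace to one space."""
    without_comments = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    return " ".join(without_comments.split())


print(normalize(V7) == normalize(BSD_41C))   # True
```

The same normalization can be applied to the copies from the other releases to confirm that they only differ in layout.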

So what was the rearrangement for Kirk? It was to move things around in the kernel to make the system calls more generic. It was code motion, nothing more, that Dr. McKusick was reporting in the private email to Mr Kamp. Now that the SCCS tree is public, via a translation to svn by John Baldwin, we can see the above.

chroot(2) Conclusions

Given that the code was moved around a lot, Dr. McKusick's mistake is understandable. Given that the code is identical to the V7 code, and it was somewhere in all the extant versions between the two (2BSD, 32V, 3BSD, 4.0BSD, 4.1BSD, 4.1cBSD and 4.2BSD), modulo a trivial whitespace change, we can conclude that Bill Joy did not introduce chroot into 4.2BSD; instead, the original V7 code was moved around a lot.

The FreeBSD chroot(2) manual has been updated to correct this mistake.

But what about chroot(8)?

There's some confusion about chroot(8) as well. Until recently, the FreeBSD chroot(8) manual page said:
HISTORY
     The chroot utility first appeared in 4.4BSD.
However, that too is in error (or was at least not precise enough). The error comes from the 4.4BSD release itself, which has identical text. In a sense this is not wrong: 4.4BSD was the first full release in the Berkeley world in which chroot(8) appeared. Its first appearance on any BSD tape, though, was in the interim 4.3BSD-Reno release.

But what about the AT&T world? There, more system calls are wrapped in programs to make them easier to use in shell scripts. It turns out that System III had a usr/src/cmd/chroot.c, which I won't quote here, that's a different chroot than the one that appeared in BSD (the code looks completely different, apart from the elements that have to be the same...). So, the history has been corrected to read:
HISTORY
     The chroot utility first appeared in AT&T System III UNIX and
     4.3BSD-Reno.
to represent the first time in each of the two branches of Unix after the 7th Edition that it appeared.

And that concludes today's software archeology deep dive on chroot...

Appendix

Here's the evolution of the chroot(2) implementation, as seen from TUHS. You'll need to search for 'chroot()' in each of these source files since the current TUHS web site doesn't allow line number links.
AT&T Unix: V7, 32V, System III, System V

I'd also like to plug the Historic Unix Repo, which also helps navigate and allows line numbers. Here's a link to the 4.1c version, for example. I recalled this after I'd found all the TUHS references, or I'd have done all of them like that.

Adding a second disk with SIMH and 2.11BSD

Adding a Second Disk to a 2.11BSD system under SIMH

I recently followed some instructions to get 2.11BSD running under SIMH. That topic is covered elsewhere adequately. I may write something up in the future.

Before I started, my simh.init file looked like this (some items from install omitted)

SET CPU 11/93, 4M
SET CPU IDLE
SET RP  ENABLE
SET RP0 ENABLE, RP06, WRITEENABLED
ATTACH RP0 ./2.11BSD
SET XQ ENABLED
SET XQ TYPE=DEQNA
SET XQ MAC=08-00-2b-11-07-82
ATTACH XQ tap:tap0
; At the SimH prompt type: unix
BOOT RP0

As part of my 2.11BSD patch level 0 restoration project (more on that later), I needed to add another disk on which I could install chroot images to test building. I'm running 2.11BSD pl 457 at the moment (I've not walked forward from the last snapshot tape). Fortunately, this version has disklabels, so I'm able to do this the easy way (though the old hard-coded stuff isn't too hard either).

First, I needed to add the raw disk in simh. I opted to have a second RP06 for simplicity. There's adequate space. My 'root image' for 2.11BSD I want to test is about 100MB, and the RP06 is 165MB. That should be adequate. I just needed to duplicate the RP0 lines:
SET RP1 ENABLE, RP06, WRITEENABLED
ATTACH RP1 ./extra-data
and restart simh (be sure to halt any running system before stopping simh). The important part here is to configure RP1 as writeable and an RP06. Otherwise, it will default to the smaller RP04. For me, that's too small.

Next, I had to check to see if there were /dev nodes for this device. The xp driver handles RP06 (and many other) disks.
3% root-> ls /dev/xp1*
/dev/xp1a  /dev/xp1c  /dev/xp1e  /dev/xp1g
/dev/xp1b  /dev/xp1d  /dev/xp1f  /dev/xp1h
so I'm in luck. The devices are there. Otherwise I'd have to run /dev/MAKEDEV in /dev to add them (or worse, do it by hand).

Next, I needed to label the disk. It's fortunate I'm running a new version because this was easy and I didn't have to rely on the hard-coded partitioning in the driver. However, even if I did, I'm using the whole disk so it wouldn't change my life much...
5% root-> disklabel -r -w xp1 rp06
which puts the standard rp06 label (from /etc/disktab) onto the drive. Chances are good that this will work on older versions. This gives the following label:
6% root-> disklabel -r xp1
# /dev/rxp1a:
type: unknown
disk: rp06
label:
flags: removeable badsect
bytes/sector: 512
sectors/track: 22
tracks/cylinder: 19
sectors/cylinder: 418
cylinders: 815
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0

8 partitions:
#        size   offset    fstype   [fsize bsize]
  a:     9614        0   2.11BSD     1024  1024         # (Cyl.    0 - 22)
  b:     8778     9614      swap                        # (Cyl.   23 - 43)
  c:   153406    18392   2.11BSD     1024  1024         # (Cyl.   44 - 410)
  d:   168724   171798   2.11BSD     1024  1024         # (Cyl.  411 - 814*)
  e:   322130    18392   2.11BSD     1024  1024         # (Cyl.   44 - 814*)
  g:   171798        0   2.11BSD     1024  1024         # (Cyl.    0 - 410)
  h:   340522        0   2.11BSD     1024  1024         # (Cyl.    0 - 814*)
Note one difference from modern FreeBSD: rxp1a. 2.11BSD still has the character/block split. Also the 'standard' layout looks a bit odd to modern eyes. But there are four sets of partitions here: a,b,c,d for a system disk with / on a, swap on b and /usr on c. a,b,e (same but with a larger /usr on e). g and d to split the disk in half for data storage. And h for the whole disk. This mirrors the partitions from when things were hard coded in the device driver (yikes! glad we don't have that legacy anymore). In those days, you had to be as flexible as you could and leave it to the sysadmin to make wise choices with the limited flexibility they had. These days, I'd label a scratch disk with just one partition (and call it 'a'). Since I was being lazy, I thought I'd leave this label in place. It's a quaint curiosity, but also instructive of history.
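The cylinder ranges in the comments of the label follow directly from the sizes and offsets, given 418 sectors per cylinder (22 sectors/track times 19 tracks/cylinder). Here is a short sketch in Python that recomputes them from the numbers printed above; the function name cylinder_range is made up, and the '*' is taken to mean that a partition does not end on a cylinder boundary.

```python
SECTORS_PER_CYLINDER = 22 * 19   # sectors/track * tracks/cylinder = 418

# name: (size, offset) in 512-byte sectors, copied from the label above
PARTITIONS = {
    "a": (9614, 0),
    "b": (8778, 9614),
    "c": (153406, 18392),
    "d": (168724, 171798),
    "e": (322130, 18392),
    "g": (171798, 0),
    "h": (340522, 0),
}


def cylinder_range(size, offset, spc=SECTORS_PER_CYLINDER):
    """Return (first cylinder, last cylinder, ends mid-cylinder?)."""
    first = offset // spc
    last = (offset + size - 1) // spc
    partial = (offset + size) % spc != 0
    return first, last, partial


for name, (size, offset) in PARTITIONS.items():
    first, last, partial = cylinder_range(size, offset)
    print(f"{name}: Cyl. {first} - {last}{'*' if partial else ''}")
```

Running it reproduces the ranges in the label, including the '*' on d, e and h, which all stop 148 sectors short of the end of cylinder 814.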

So, next, I have to put a filesystem on it. That's done with newfs:
8% root--> newfs /dev/xp1h
newfs: /dev/xp1h: not a character device
9% root--> newfs /dev/rxp1h
newfs: /sbin/mkfs -m 2 -n 209 -i 4096 -s 170261 /dev/rxp1h
isize = 42560
m/n = 2 209
which gives me a new filesystem. This is quite a bit less chatty than I'm used to on FreeBSD. Also, even after noticing, I forgot you have to newfs and fsck the raw device, not the block device.
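The -s 170261 that newfs hands to mkfs lines up with the size of the h partition. A quick check in Python, assuming (my reading, not something the output states) that mkfs counts 1024-byte filesystem blocks while the label counts 512-byte sectors:

```python
SECTOR_BYTES = 512
FS_BLOCK_BYTES = 1024   # assumption: mkfs -s is in 1024-byte blocks


def sectors_to_fs_blocks(sectors):
    """Convert a disklabel size in sectors to filesystem blocks."""
    return sectors * SECTOR_BYTES // FS_BLOCK_BYTES


print(sectors_to_fs_blocks(340522))   # 170261, the value passed as -s
```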

Now time to mount it and add it to fstab. Old-school write ups say to fsck /dev/rxp1h here, but given simh doesn't simulate the unreliability often found in the hardware of the time, I've skipped that part.
10% root--> mkdir /scratch
11% root--> mount /dev/xp1h /scratch
12% root--> vi /etc/fstab
"/etc/fstab" 3 lines, 79 characters
/dev/xp0a       /       ufs     rw              1       1
/dev/xp0b       none    swap    sw              0       0
/dev/xp0c       /usr    ufs     rw              1       2
/dev/xp1h       /scratch ufs    rw              1       1
I've scrunched the vi session into the above: I just added the last line. And now I have a /scratch filesystem that will survive reboot.

And now I'm ready to create a tape with my putative 2.11BSD pl 0 system (really at the moment a 2.11BSD pl 195 system with pl0 sources). But that's for another day.

20200618

FreeBSD's METALOG: unprivileged installs

What is METALOG?

When you 'make installworld -DNO_ROOT DESTDIR=blah', the system will create a $DESTDIR/METALOG file. This file contains all the permissions and modes for the files. Normally, installworld requires root permission. -DNO_ROOT instructs the build system to install the files as the user and to record their permissions, etc. in the METALOG.
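To get a feel for what such a file holds, here is a rough sketch in Python that reads mtree-style lines into a dictionary. The sample lines are made up for illustration (they use ordinary mtree(5) keywords such as type, uname, gname and mode) and are not copied from a real METALOG; the function name parse_metalog is also hypothetical.

```python
SAMPLE = """#mtree 2.0
./bin type=dir uname=root gname=wheel mode=0755
./bin/cat type=file uname=root gname=wheel mode=0555 size=12345 tags=package=runtime
"""


def parse_metalog(text):
    """Map each path to a dict of its keyword=value settings."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        path, *keywords = line.split()
        attrs = {}
        for keyword in keywords:
            key, _, value = keyword.partition("=")
            attrs[key] = value            # bare keywords get an empty value
        entries[path] = attrs
    return entries


for path, attrs in parse_metalog(SAMPLE).items():
    print(path, attrs.get("uname"), attrs.get("mode"))
```

Tools such as makefs and tar do this parsing themselves; the sketch is only meant to show that ownership and mode live in the file rather than on disk.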

How to use METALOG

Creating a UFS partition with no privs

If you have your own tooling around image creation, you can use the METALOG to supply the permissions and other filesystem metadata to that process. makefs can be used by a non-privileged user to create a UFS partition image. Coupled with mkimg, you can create an entire bootable system image without needing root. Look at the -F flag to makefs(8) for how to use this functionality.

Package Base Use

METALOG is also used by the pkgbase initiative to slice up the system. Part of the metadata that's included is what package each of the installed files belongs to. This is all transparent when you do a 'make packages' to generate these packages.

Tarring up an installworld

If you are looking for a quick and dirty way to update a VM, you can often just create a tarball from the METALOG. Tar was enhanced a number of years ago to understand mtree files. The METALOG is one giant mtree file. To create a tarball that's a copy of the image with all the right permissions:

cd $DESTDIR
tar cfJ base.txz @METALOG
This will create an xz-compressed base.txz similar to what the release images create. This one tarball has everything (unlike the base.txz from the release build process), and is about 800MB.

20200520

Random Post: DMR posts dsw.s from PDP-7 Unix to net.unix-wizards

On December 8, 1984, Dennis Ritchie posted the following:
I happened to dredge up an old notebook and found a listing
of the PDP-7 version of dsw.  Because several people have approached
me recently about reviving a version of PDP-7 Unix as a sort of
paleontological exhibit, and because the subject has been discussed
here, I thought people might be interested in seeing the code.
I first considered net.sources, but decided not to carry whimsy too far.
                Dennis Ritchie
Notes:
1) The assembler has Knuth-style temporary labels but no literals.
2) The name of the current directory was evidently ".."
3) Formatting is faithfully reproduced.
4) "sys save" makes a core image.
------
" dsw
   lac djmp
   dac .-1
   oas cla
   cma
   tad d1
   dac t1
   sys open; dd; 0
1:
   lac d2
   sys read; dir; 8
   sna
   sys exit
   lac dir
   sna
   jmp 1b
   isz t1
   jmp 1b
wr:
   lac d1
   sys write; dir+1; 4
   lac d1
   sys write; o12; 1
   sys save
do:
   sys unlink; dir+1
   sys exit
d1: 1
d2: 2
o12: 012
t1: 0
djmp: jmp do
dd: 056056; 040040; 040040; 040040
dir: .=.+8

20200426

Thanks for the tips!

Quick Update

Thanks for the tips!

Warren Toomey pointed me at the Kermit in 4.3BSD: It was 4C(057). But it was a trimmed copy of C-Kermit: just the unix files were there. I don't know how I missed it. This version was copied far and wide, but I didn't have any 4.3BSD trees extracted.

Warren Toomey also pointed me to the net.sources group. I'd looked in the comp.sources.* groups, but found nothing. Turns out C-Kermit was posted to net.sources (twice). There was a warning not to post it, but it came too late. There was much pent up demand for C-Kermit 4, it seems. Both 4.0(025) and 4.2(030) were posted. I've not looked closely, but there are some minor differences between the two 4.0 copies, maybe due to more glitches in the 4.0 conversion on the DECUS tape I found. I'll have to see if any of the other differences matter. I've not looked at the 4.2 to see if it matches or not.

I got another copy of an early 4F version, but it may be the OS/2 version. I need to study it some more. It says it was 4F(088). I'll look into it more when I have time.

Another friend suggested a search that led me to another site that has DECUS tapes online. So far I've found 4C(053) and 4C(058). The latter is awesome, if complete, because the only 058 releases I've been able to find are for the Amiga, with all other files stripped. There are files for 'i', 'm', 'u' and 'v' there, which I think are Amiga, Macintosh, Unix and VMS respectively. I haven't tried to build it, so I don't know if there are issues with conversion or not. And 053 is awesome too because it lets me look at the DECUS version to see what DEC added, and what later made it into C-Kermit. I feel kinda bad about finding these, since I wrote a script to pull files down and it kinda ran amok before I noticed and could fix it. Still, a pure 053 gets me one step closer to recreating 4C(052) for the BSW Venix binary.

I also found a few more 5A versions, but I'm unsure what I'll do with those... There's a lot of them out there, and many are hacked for this version or that of Unix or whatever.

So if you have any version not listed here, please let me know.

20200422

Finding Kermit 4x

Unix C-Kermit 4x Versions


As part of my efforts to reconstruct my Venix system to sources, I recently went on a hunt for old versions of Unix Kermit. There's a fair number of them, but not many from the kermit project web site. Prior to version 4, unix kermit was a command line only program. These versions are adequately represented in the Unix archive, mostly because they had a funky name that C-Kermit didn't overwrite. I've also ignored gkermit, which appears to be a command line version of C-Kermit for latter-day Linux systems (though it retained support for the common early systems).

So I concentrated only on the 'Version 4' series of releases. In the end I've found 2 missing versions that I can say with good confidence are the final version of them, three preliminary or interim versions and one modified version of a preliminary version. I also found a great diversity of 5A betas, which I've not written up here. The early history of the 4x releases is missing, which is why I went on this hunt. I like a challenge.

UPDATE: I'll be posting an addendum blog in the coming days because people have sent me pointers to other versions. Stay tuned.

Known History of Kermit Versions

Here's a brief table of relevant 4x versions. I've omitted the earlier versions (they are available at the kermit archive) and the later versions (they basically are too).

Version    Date        Comments
4.0        Missing     No known copies
4.0(025)   5 Feb 85    First release of Unix Kermit version 4
4.2(030)   5 Mar 85    In kermit archive and on 1987 Usenix tape
4C(050)    30 May 85   First enumerated beta after 4.2 and version name change
4C(052)    18 Jun 85   Boston Software Works Venix/Kermit
4C(053)    21 Jun 85   DECUS VMSLT Venix/Pro sources labeled 4C(053)+1 (DEC changes)
4C(056)    12 Jul 85   Testing release, no copies in Kermit Archive
4C(058)    19 Mar 86   Official release, no copies in Kermit Archive
4D(060)    18 Apr 86   Testing release, no copies in Kermit Archive
4D(061)    8 Sep 86    Testing release, no copies in Kermit Archive
4E(067)    14 Sep 87   Testing release, no copies in Kermit Archive
4E(070)    29 Jan 88   10th Edition Unix and iubio archive copies
4E(072)    24 Jan 89   Official release in kermit archive
4F(095)    31 Aug 89   Official unreleased release in kermit archive

Kermit Version Naming Convention

C-Kermit basically started with version 4.0 for a variety of historical reasons. Version 4 was the first one that had a command line built into it. 4.0 went out to a limited group, and 4.2 was released as a wider-spread beta. BSD 4.2 was coming out at this time, so Kermit changed its naming convention to use a letter: 4C, 4D, etc. The number in () is a change count. Older systems didn't use a 'minor' number, but instead used a change count, which was either an actual count of changes, or builds, or some other counter incremented since the project started or was rewritten. So 4C(050) is the 50th release (or change) and comes before 4C(051). The Kermit project used the 'release number' convention, but 'release' is poorly defined during this time period, so not all the 'release numbers' documented in change logs may have been put on an FTP site.
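Because the number in parentheses keeps counting across the 4.x to 4C naming change, it is the natural sort key. Here is a small sketch in Python that pulls the series and the change count out of version banners like the ones quoted in this post; the regular expression and the name parse_version are my own and only cover the banner shapes seen here.

```python
import re

# series is either "4.2" style or "4C" style, followed by "(NNN)"
VERSION_RE = re.compile(r"(\d+(?:\.\d+|[A-Z]))\((\d+)\)")


def parse_version(banner):
    """Return (series, change count) from a C-Kermit version banner."""
    match = VERSION_RE.search(banner)
    if match is None:
        raise ValueError(f"no version found in {banner!r}")
    return match.group(1), int(match.group(2))


banners = [
    "C-Kermit, 4D(061) 8 Sep 86",
    "C-Kermit 4.2(030) PRERELEASE # 2, 5 March 85",
    "C-Kermit, 4C(053)+1 21 Jun 85",
    "C-Kermit 4.0(025) PRERELEASE TEST VERSION, 5 Feb 85",
]

# Sort on the change count alone; the series name is just a label.
for series, count in sorted(map(parse_version, banners), key=lambda v: v[1]):
    print(series, count)
```

Sorting on the count puts 4.0(025) and 4.2(030) ahead of the 4C and 4D banners without any special handling of the naming change.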

Venix from Boston Software Works

The Boston Software Works Rainbow Venix, the driver of my obsessions for the last few years, included a version of kermit. On startup it printed
C-Kermit, 4C(052) 12 Jun 85
which I thought would be easy enough to find. Little did I know it would be trouble. When I first was looking at this two or three years ago, I found the following on the Columbia Kermit Archive:

C-Kermit 4.0 (5 Feb 1985), the first interactive version, through 4D are missing.

which was discouraging. I let the matter sit there. I noticed a bit later this news item: C-Kermit 4.2 that talked about 4.2 being rediscovered. So I was able to get 4.2 and 4E, but not 4F running on my Rainbow which helped (4E is an improvement over 4C that BSW shipped, 4.2 is about the same), but it left me curious. 4E was right on the edge of size, and 4F appears to be just a bit too big to run, so I've put off trying to puzzle out if I can get that working. 5A won't even link.
We see from the above table that this is an unofficial release, so we may never get the actual bits for it. Let's see what the best we can find is, however.

Side Tracked: Hunting for a Xenix-11 Tape

Misspiggy, a PDP-11/70 that Microsoft donated to Living Computer Museum in Seattle, Washington, was recently demoed running V7 Unix and the adventure game. In an offhand comment, they said they wanted to run Xenix-11 on this, since that's what Microsoft ran on it. They were looking for a copy to run since they apparently didn't have one. There's a catalog entry at LCM for a XENIX tape, but that tape has not surfaced.

But Warren Toomey over at TUHS shared with me a tape he thought might contain XENIX (it had XENIX in the filename). Turns out that tape was just two files of a V6 system that was otherwise unremarkable, and a copy of Venix that never booted (most likely it was made / hacked together on said V6 system since the filesystem was V6, but Venix is a V7/System III port that uses the newer filesystem layout). It was disappointing that it didn't pan out, but Warren also shared with me some Venix-related files since he knew of my work...

A new archive of PRO Venix

One of the archives appeared to be Venix for the PRO that DECUS had distributed after Venturcom abandoned its support for the Professional line of DEC personal computers. It was a series of 22 diskettes. I've not looked through them all, but I did find that two of the diskettes had Kermit on them! One binary and one source! Extracting the files led to this discovery in ckcmai.c:
char *versio = "C-Kermit, 4C(053)+1 21 Jun 85";
which is quite close.

The changes that were included were a number of changes for VMS, support for US Robotics 212 modems, better support for some DEC modems, some technical corrections for auto dialing, some cleanup of help messages, many tweaks to cope with Venix's Code Mapping feature (kinda like overlays, but somehow different than traditional overlays), better handling of hangup and various debugging fixes, at least according to the file DECnotes that was included on the exe diskette.

So this is a hacked version of a beta version of 4C. So we're not there yet, but closer than before. Let's go looking on the internet for more.

Google to the Rescue

C-Kermit has rather unique filenames. In order to cope with the realities of a PDP-10 with one big giant directory of files (which also helped the master tapes it produced), it developed a naming scheme where each of the first few letters means something. K11 was the MACRO-11 version for the PDP-11, K10 was the BLISS and MACRO-10 version, CP4 was for CP/M, BBC was for the BBC Acorn, etc (there are 129 of them on one of the full KERMIT tapes I found). CK is the code for the Unix C version, later the general C version. The third letter further specified what system: c for all, u for unix, v for vms, 9 for OS9, i for Amiga, etc.

Next, there are a number of different versions stamped in different files. However, the one that's most interesting is in ckcmai.c. Since version 4C, it's been the name of the file where the version printed at startup lives. Now 'ckcmai.c' is a fairly unique string, and plugging it into Google gives a lot of results, but it's easy to churn through them all. I've omitted 5.x and newer. There's about a dozen different 5A betas that can be found this way. I've also omitted released versions we already have (4E and 4F, even variants of what's available at kermitproject.org).

char *versio = "C-Kermit, 4D(061) 8 Sep 86";
V10 Unix also has this
char *versio = "C-Kermit, 4E(070) 29 Jan 88";
char *versio = "C-Kermit, 4E(070) 29 Jan 88";

which is promising, but is only a couple of variants. We'll need to widen our search. First thing to note is before the 4C release, ckcmai.c was called ckmain.c. Widening that, we find a copy of 4.2 both in the kermit archives and on 1987 Usenix tapes:
char *versio = "C-Kermit 4.2(030) PRERELEASE # 2, 5 March 85";
which matches the version in the kermit archive. Different BSD distributions contain a number of the 5A betas mentioned above, but not listed here.
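Once a pile of candidate trees has been mirrored locally, the same hunt can be automated. Here is a rough sketch in Python that walks a directory, looks for ckcmai.c or the older ckmain.c, and pulls out the versio string; the function name find_versions, the case-insensitive file matching and the latin-1 decoding are choices made for this example.

```python
import os
import re

VERSIO_RE = re.compile(r'char\s*\*\s*versio\s*=\s*"([^"]*)"')
MAIN_FILES = {"ckcmai.c", "ckmain.c"}   # 4C and later, and the pre-4C name


def find_versions(root):
    """Return sorted (path, version banner) pairs found under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower() not in MAIN_FILES:
                continue
            path = os.path.join(dirpath, name)
            # latin-1 never fails to decode, which suits old tape images
            with open(path, "r", encoding="latin-1") as handle:
                match = VERSIO_RE.search(handle.read())
            if match:
                hits.append((path, match.group(1)))
    return sorted(hits)
```

Calling find_versions on the top of a mirror gives one line per copy, which makes duplicates and oddballs stand out quickly.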

Kermit Archive

The Kermit Software Archive has a number of interesting bits of history in it. However, it doesn't have C-Kermit before 4E in it. Some of the specialty ports have old versions, but they are so modified that reconstruction is limited. Acorn kermit "Panos-Kermit" and Archimedes kermit "Arthur-Kermit" both were forked from 4C(052). There's a small discrepancy between Acorn kermit and the website, though. The website says it's based on 4C(057), but the main file says derived from 4C(052). These might prove useful to get back to the putative 4C(052) that my Boston Software Works Rainbow Venix came with, but it's unclear how best to thread that speculative path currently, so I'll have to put that aside for another time. It would be nice to have these sources, but it would be a speculative trudge to try to reverse engineer them from the kermit binary I have and there's other, more important reverse engineering to do there. There is also a version, supposedly derived from 4D(061) (we'll see later), hacked for Minix v1.

What's FISH?

Fish is the last name of a rather prolific gentleman named Fred Fish. He pulled together a collection of freeware disks for the Amiga which were instrumental in distributing freeware for the Amiga through the mid 90s. A very early disk, #26, had a copy of C-Kermit on it, ported to the Amiga. However, I've had to discard this potentially useful line of inquiry. The Amiga version is missing all the Unix files and has also been somewhat modified in ways that aren't at all clear to me. The search engine at aminet also fails to find anything but the latest version of kermit, which limits its usefulness.

DECUS Tapes

You'll notice above that there's a hit on DEC tapes for RSX-11. I found it in another location as well when I did a variation of the search. ibiblio has a similar sort of thing, and it was easier to grab than the classiccmp site which had a lot of extra stuff that I wasn't sure about needing. So I mirrored that instead and hit the mother lode.
rsx84b: "C-Kermit 4.0(025) PRERELEASE TEST VERSION, 5 Feb 85"
rsx85a: "C-Kermit, 4C(056) 12 Jul 85"
rsx86a: "C-Kermit, 4D(060) 18 Apr 86"
rsx87b: "C-Kermit, 4D(061) 8 Sep 86"
rsts/sig87: "C-Kermit, 4D(061) 8 Sep 86"
rsx87b: "C-Kermit, 4E(067) 14 Sep 87"
Sadly, there's no central repository of DECUS tapes, and the central library that existed at one time has gone away in all the M&A activity after DEC was bought and the support DEC gave DECUS dried up as Compaq and HP valued it less and less.

Reader Contributed

I've received the following versions from a reader:
C-Kermit, 4F(088) 19 Jul 89
C-Kermit 5A(166) ALPHA, 17 Mar 91
I'm working to verify them. There are many 5A versions on the net (some modified, some not), but this one is earlier than them all. I may do a followup with 5A, or I may leave that to others. This section of the blog may update in the future.

Version matching

4.0(025)

We have one copy of this. It appeared on the RSX84b DECUS tape. It's mixed in with all the other files for the PDP-11, which is a bit strange. It only runs on BSD4.2. I believe this is the initial release that went out the door to the world. Google has found the Info Kermit archives, which I used to piece this together:
  1. The date on these files is 5 Feb 85.
  2. The ckermi.ann file contained a copy of the Info-Kermit digest sent out on Feb 5, 1985 announcing this.
  3. Work began in the summer of 1984 and was teased on the Info-Kermit mailing list
  4. uxkermit was released in September as an interim release that improved the earlier unix kermit releases (I've found 3.0(0) dated 8/1/84 and 3.0(1) dated 11/5/84 in various places).
  5. Frank da Cruz announced on Nov 28, 1984 that "Although far from ready for release, some progress has been made on the new (version 4) Unix Kermit." in an email to Info-Kermit. This was his last word on the topic until Feb 5, 1985.
  6. Within 2 days of the "Unix Kermit 4.0 Announcement," 16 different ports were announced. Within a month, it had exploded to too many platforms to mention and 4.2 was readied.
All these things lead me to believe this is the legitimate first public 4.0 release, and there are no others to be found. We're quite fortunate this made it onto whatever Kermit Tape the RSX SIG used for their RSX84b tape.

I've prepared a ckc025.tar file that captures the 4.0 state of the release.

4.2 versions

We have two copies of 4.2: one from the kermit archive, and a second from the Usenix 1987 tape. They show some differences. A quick diff shows ckusr2.c differences. However, ckusr2.c.orig matches exactly the version in the kermit archive. So this is confirmation that the copy of the code in the Kermit Archives is good.

4C diversity

So we have all or part of the 4C(052),  4C(053), 4C(056) and 4C(058). Since the DECUS tapes are otherwise most reliable, it would seem that 4C(056) is the best of the lot in terms of original sources. We know that the 052 and 058 copies aren't for Unix, so lack the cku*.c files and they've been heavily modified for their targets. 053 has all the unix files, but modified in a number of ways that are documented. So we don't have the actual, final 4C release, but do have the 4C(056) release.

I've prepared ckc056.tar to capture this. I've also put together a ckc053-decus.tar to capture the modified version from DEC. I've provided a link to the Amiga 4C(058) files above, so won't be creating anything special for that, since it seems to be of limited usefulness.

Looking at the changelog, 053 and 052 differ only in the declaration of dopar as CHAR, so it appears all I'd need to unwind from the 053+1 release is DEC's changes. A fun project for another day.

UPDATE: 4.3BSD has 4C(057) included as well, a new version. Thanks to Warren Toomey for bringing this to my attention. Will post followup.

4D(060)

We have 1 copy of this from the DECUS RSX86a tape. Spot checking of the diffs between this and 4D(061) more or less match the change log and suggest this is a true copy of this release.

I've created a ckc060.tar based on these files to capture this version.

4D(061)

We have three copies of what appears to be 4D(061). One is from the Usenix 87 tape. One is on the DECUS RSX87b tape. And one is from the DECUS RSTS/e 87 SIG tape. The RSTS/e tape is identical to the RSX87b tape, apart from weird line endings and NULs at the end of files. Which one do we believe? In an ideal world, we'd do a diff, they'd be the same and we'd go home. That didn't happen, so let's dive in. Apart from files that are just in one directory, there are 4 differences between these two sources: ckuker.bwr, ckukern.mak, ckwart.c and ckwart.doc. The two copies of ckuker.bwr are fairly different, but the differences start like this:
--- rsx87b/ckuker.bwr      1987-08-05 18:00:00.000000000 -0600
+++ usenix87/ckuker.bwr        1987-08-14 15:04:02.000000000 -0600
@@ -1,6 +1,6 @@
-C-Kermit Version 4D(061):
+C-Kermit Version 4D(060):
 Status, Bugs, and Problems
-As of: 12:07pm  Thursday, 19 March 1987
+As of: 7 July 1986
So it would appear the DECUS tape got the ckuker.bwr right, and something is wrong with the c-kermit on the Usenix 87 tape. If we look at ckukerm.mak, we see that it's also a regression on the Usenix tape (2.06 vs 2.05). ckwart.c and .doc have the same issue too (copyright 1985 vs 1984). So this strongly suggests that we can grab the sources from the Usenix 87 tape, but augment them with the DECUS tape for these 4 files. This will also let us eliminate the extra files from the DECUS tape that are not part of C-Kermit. Two copies from disparate locations give us good confirmation this is the right resolution. Also, the timing of when these tapes were created (both in August of 87) limits how late the copies could be.

However, there's one last wrinkle in all this, which suggests there were actually two different 4D(061) releases. The date listed in the ckcmai.c file is "8 Sep 86", and the affected files on the DECUS tape carry dates newer than that. This suggests that the Usenix tape is a truer copy of 4D(061) as released, and that the DECUS tape, a later correction to fix a couple of minor 'oopses' in that release, might be best seen as 4D(061) as intended. The change log is not helpful (it lumps everything after 4D(061) through 4E(066) together), other than saying one of the changes was for 2.9BSD on a Pro-380, which is one of the changes in ckukerm.mak. So the DECUS tape likely represents a 4D(062) snapshot, likely reflecting the KERMIT distribution tape / single directory practice of the time. To be honest, 062 is kinda arbitrary, though, since it could be any of the next couple of releases. I chose 062, though, because the changes were so limited, and it looks like a classic case of forgetting to bump a couple of numbers, which I imagine would only happen now and again.

Therefore, I've created a ckc061.tar based solely on the Usenix tape since I think the case is stronger for that. I've created a ckc062.tar based on the DECUS RSX87a tape.

There's also a 'minix1' version in the archives supposedly based on 4D(061). It's actually quite close to the now-found sources, with the following differences:

  1. #ifdef for the version strings
  2. Some newlines removed from some messages to fit them on the screen
  3. Compile nits: additional prototypes, some longs become ints, %D instead of %ld
  4. Mostly based on V7, but with tweaks for tty differences
  5. A logging function rewritten to be smaller
Based on the size, nature and extent of the diffs, we have another confirmation that the 4D(061) sources found have good fidelity to the likely release, and that minix1 in the archive is a direct descendant of 4D(061) and not a different version. Since it's missing ckwart.c, it's impossible to know whether it came from the slightly newer version on the DECUS tape (for the files in minix1.tar.gz, there's no way to know).

4E(067)

As with 4D(060), we have one copy of this version from the RSX87b DECUS tape. I've created a ckc067.tar to capture it. Not much more to say about this, except it arrived as xk* instead of ck* files. I've renamed all the xk to ck files in this process, since the makefiles still had the original ck file names in them. The xk thing was normal, from any number of announcements in Info-Kermit, including the 4E(066) announcement:

The files are in KER:XK*.* on CU20B.COLUMBIA.EDU (available via anonymous FTP) and XK* * on CUVMA (available via BITNET KERMSRV), and will be on Kermit Tape B, and should also show up at Oklahoma State U for UUCP access within a couple weeks. The new files don't replace the current C-Kermit files (CK*.*), and will not do so until all the systems demonstrably work. In order to use these files, you have to rename them to CK*.* (or ck*.*) so that the various Makefiles and other build procedures work, and the include (.h) files have the right names. There's a program to do this, XKTOCK.C, which should be fairly portable (if it doesn't work, the files can be renamed by hand).


So I've just done what was instructed. It appears that only 4E(066) and 4E(067) were distributed this way, as the files were renamed back for 4E(068). And 4E(068) lasted only for a couple of days because 4E(070) was released quickly after it to fix two fatal flaws, the summary of which is too good not to share:
 . getcwd() not defined in BSD UNIX, breaking BSD versions.
 . Unconditional reference to SIGSTOP, breaking non-BSD versions.
So in effect, we have the last two beta versions before the final 4E(072) release (071 was also a brief flash in the pan).

4E(070)

We have two copies of this. They match almost exactly. The only difference between the two is that the 10th Edition Unix version has 10th Edition Unix (V10) changes. Since those are the only changes, and in context they are exactly the changes you'd expect, we can say with a high degree of confidence that the iubioarchive copy is the original copy of 4E(070). It's unclear how important this release is, but I've made a ckc070.tar tarball based on this find after renaming the files to lower case and changing the line endings to Unix.

4E(072) and 4F(095)

This release is in the Kermit archive. It was the first 4x release to have been in the archive (apart from the later found 4.2), so we'll stop our journey here. Other than 4F(095), there are no more 4F versions available that I've been able to locate. There are references to 4F(077), 4F(080), 4F(085), 4F(090) and 4F(094) in the Info-Kermit archives as well, but they only have announcements for 4F(085) and 4F(094), suggesting the announcement traffic had gone elsewhere. 5A and 4F were developed in parallel after this, and 4F was never officially released... Ah, but sorting out that tangled history will wait for another day (and likely another person).

Conclusion

So a simple hunt turned up a number of new releases: a copy of the final 4.0 and 4D releases, as well as testing copies of 4C, 4D and 4E. Or: 4.0(025), 4C(053)+dec, 4C(056), 4D(060), 4D(061), 4D(062), 4E(067) and 4E(070). Plus I turned up another copy of 4.2 that matches the copy in the kermit project's archives. That's 7 apparently unmodified releases and one modified release. This turns out to be far more than I'd hoped for when I began this little snipe hunt. I've made the files I found available (see links above) for anybody that's interested.

If you have a clean copy of any of the versions in the 4x series of releases not listed here, please get in touch with the author. I'm looking for anything from 1990 or earlier.

After posting this, Frank da Cruz tried out the 4.0(025) edit.  He found a few transcription errors from the DECUS tape, patched them up and posted the result at The Kermit Project for all to see.

20200421

More Venix reconstruction work

With the simulator working well enough to run many / most of the Venix binaries (the C compiler being a notable exception), I thought I'd turn my hand to some reconstruction work. You know, the whole reason that I started this thing up.

System Calls

There's no easier code to write in Unix that does something useful than interfacing to system calls. These calls are usually 'load these registers (or this block) with those values and trap to the kernel'. Venix is no exception to this rule.

Venix has about 60 system calls it implements. They are so regular I thought I'd be able to write a generator for all the system calls, except maybe pipe. I thought this because FreeBSD generates the glue for all its system calls, though pipe has been an exception because it needs to return two values.

Little did I know there's really 74 .s files associated with the system calls. Only about 50 of the system calls are regular. The rest are irregular in a number of different ways.

Return Values and weird pointers

There are 5 system calls that require some special handling just because the return values are weird. These include time(2) which you pass a pointer to a long to put the value of time into (that's done in userland in Venix, rather than with a copyout call that other systems use). This mirrors what's done on the PDP-11, so it's no real surprise here. pipe(2) also falls into this category. You pass it an array, and the system call caller is responsible for stuffing the data back into this array. wait(2) is the same way.
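Here's a minimal sketch in C of that userland convention for time(2). It is not the Venix library code: raw_time_trap() is a made-up stand-in for the trap (the real thing hands the value back in registers), and it returns a fixed made-up value so the example is deterministic.

```c
#include <stddef.h>

/* Hypothetical stand-in for the trap: the kernel hands the time back in
 * registers, which is modelled here as a plain return value. */
static long raw_time_trap(void)
{
    return 546300000L; /* fixed made-up value so the sketch is deterministic */
}

/* Library-side wrapper: the store through the caller's pointer happens
 * here in userland, not in the kernel. */
long sketch_time(long *tloc)
{
    long now = raw_time_trap();

    if (tloc != NULL)
        *tloc = now;
    return now;
}
```

The same shape would apply to pipe(2) and wait(2): the trap returns values in registers and the wrapper stuffs them into the caller's array or pointer.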

stime(2) is similar, but in the opposite direction: It loads the values from a pointer into a register rather than having the kernel just copy that value into the kernel. That's weird because plenty of other things do it with pointers.

Variations on a theme

dup(2) can be generated automatically, but dup2(2) can't. dup2 is the variant where you set the new fd rather than allowing the kernel to pick one for you. Rather than having two system calls, you just add 64 to the fd and call dup. What's weird is that dup(2) is documented to take one argument, but the dup.o file, when disassembled, clearly passes two arguments. This means that there's stack garbage for one of them (a bug!). dup2(2) makes sense to pass two args, but dup(2)? Really? So that's the first bug I've found in the generated code.
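The add-64 trick can be modelled in a few lines of C. This is a sketch under assumptions, not the Venix source: the struct and function names are invented for illustration, and it assumes real descriptors always stay below 64 so the offset can double as a flag.

```c
#include <assert.h>

/* Offset from the text: adding 64 to the old fd turns dup into dup2. */
#define DUP2_OFFSET 64

/* The two words handed to the single dup system call. */
struct dup_call {
    int fd;     /* old descriptor, plus DUP2_OFFSET for the dup2 variant */
    int newfd;  /* requested descriptor; only meaningful for dup2 */
};

/* dup(fd): only fd matters. The sketch zeroes newfd; per the text, the
 * real dup.o passes whatever happens to be on the stack instead. */
struct dup_call encode_dup(int fd)
{
    struct dup_call c;

    assert(fd >= 0 && fd < DUP2_OFFSET); /* assumes fds stay below 64 */
    c.fd = fd;
    c.newfd = 0;
    return c;
}

/* dup2(fd, newfd): flag the request by adding the offset to fd. */
struct dup_call encode_dup2(int fd, int newfd)
{
    struct dup_call c;

    assert(fd >= 0 && fd < DUP2_OFFSET);
    c.fd = fd + DUP2_OFFSET;
    c.newfd = newfd;
    return c;
}

/* Kernel-side view: was a specific descriptor requested? */
int dup_wants_newfd(struct dup_call c)
{
    return c.fd >= DUP2_OFFSET;
}

/* Kernel-side view: the real old descriptor with the flag stripped. */
int dup_real_fd(struct dup_call c)
{
    return dup_wants_newfd(c) ? c.fd - DUP2_OFFSET : c.fd;
}
```

For example, encode_dup2(3, 7) carries fd 67 across the trap, and the receiving side recovers 3 plus the request for descriptor 7.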

brk(2) and sbrk(2) are similar, but they also have to keep track of where the actual break point in the address space is.  And it's a little weirder than that for some NMAGIC binaries that put the stack at the top of memory (right?) and have the heap grow between the top of bss (ebss) and the bottom of the stack. I suspect bugs in this area of my emulator since the C compiler is one of the few binaries with this sort of odd arrangement.

Then there's the exec(2) family of calls. They are all a bit different in terms of calling them, but in assembler you can morph them all into one system call. Sweet, eh? Turns out to be hard in 'C' to pull this off portably, but this predates those worries. Both PDP-11 and the 8086 port use this same trick.
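For contrast, here's roughly what the portable C route looks like: collect the variable arguments into an array and funnel everything through one entry point. This is only a sketch. sketch_execv() is a hypothetical stand-in that records its arguments instead of trapping, and the 32 argument cap is an arbitrary choice for the example. The assembler version doesn't need the copy, which is what makes it sweet.

```c
#include <stdarg.h>
#include <stddef.h>

#define SKETCH_MAX_ARGS 32 /* arbitrary cap for the sketch */

/* What the single entry point last saw, kept so the sketch can be checked. */
static const char *last_path;
static const char *last_argv[SKETCH_MAX_ARGS + 1];
static int last_argc;

/* Hypothetical stand-in for the one real exec trap: records its input. */
static int sketch_execv(const char *path, const char *const argv[])
{
    int i;

    last_path = path;
    for (i = 0; i < SKETCH_MAX_ARGS && argv[i] != NULL; i++)
        last_argv[i] = argv[i];
    last_argv[i] = NULL;
    last_argc = i;
    return 0;
}

/* execl-style front end: gather the NULL-terminated argument list into an
 * array, then hand it to the single entry point. */
int sketch_execl(const char *path, const char *arg0, ...)
{
    const char *argv[SKETCH_MAX_ARGS + 1];
    const char *arg = arg0;
    va_list ap;
    int n = 0;

    va_start(ap, arg0);
    while (arg != NULL && n < SKETCH_MAX_ARGS) {
        argv[n++] = arg;
        arg = va_arg(ap, const char *);
    }
    va_end(ap);
    argv[n] = NULL;
    return sketch_execv(path, argv);
}
```

A call such as sketch_execl("/bin/echo", "echo", "hi", (const char *)NULL) ends up at the same place an execv-style call with a two-entry array would.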

4 arguments are hard.

There's 3 system calls that have 4 arguments. 2 (lseek(2) and locking(2)) do it one way, the other (ptrace(2)) does it a second way. And the 2 that appear to do it right (in that they follow the same convention as the 3 arg calls) have the same bug.

Most of the 3 arg calls look something like the following:
_read:
        push    bp
        mov     bp,sp
        mov     bx,#3
        mov     ax,*4(bp)
        mov     dx,*6(bp)
        mov     cx,*8(bp)
        int     0xf1
which is simple and straightforward. The 2 arg variants don't load anything into cx, the one arg calls skip dx, etc.

But the 4 arg ones are weird, in that they are really 3 args with one of the args being too fat. Let's look at lseek, which has three args, but one of them is a long. lseek takes an int, a long and a second int. Let's see how it does its thing:

_lseek:
        push bp
        push si
        mov bp,sp
        mov bx,#19
        mov ax,*6(bp)
        mov dx,*8(bp)
        mov cx,*10(bp)
        mov si,*12(bp)
        int 0xf1
Notice how the offsets are all buggered up. This code works, and the offsets are correct for what it does, but only because they compensate for a botched preamble. The push si comes between push bp and mov bp,sp, so bp ends up 2 bytes lower than usual and every argument offset has to be bumped by 2. Another bug found through the powers of disassembly. So I mostly generate lseek, then hand tweak it to make sure it's the same file.
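The offset arithmetic is easy to check with a tiny C helper. This is just a sketch of the stack layout, assuming near calls (a 2-byte return address) and 16-bit argument words; arg_offset() is a name made up for the example.

```c
#include <assert.h>

/* Bytes per 8086 stack slot: each push and the near return address are
 * 16 bits wide. */
#define SLOT 2

/* Offset from bp of the n-th 16-bit argument word (n starts at 0), given
 * how many registers (bp included) were pushed before "mov bp,sp". */
int arg_offset(int pushes, int n)
{
    assert(pushes >= 1 && n >= 0);
    return pushes * SLOT  /* saved registers sitting at and above bp */
         + SLOT           /* return address pushed by the call */
         + n * SLOT;      /* argument words that come before this one */
}
```

With one push (the _read listing) the argument words land at 4, 6 and 8. With two pushes before bp is set (the _lseek listing) they land at 6, 8, 10 and 12, which matches the disassembly.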

Signals

Then there's signals. They are hard, and they do all this wonderful weird stuff with trampolines and the like. This one file is by far the longest one.

Getting the same .o

A few years ago, I found the Minix disassembler dis88 floating around. I've been steadily hacking on it to produce good quality disassembled code. It's tough, though, since there's so many different rules. As I'm doing this reconstruction, I'm learning more. I'll go into those on another post.

But to make things as testable as possible, I've created a gensys script. This generates all the system calls I can, and tries to test the ones I can't. It does this by using the emulator (86sim) to run the assembler. We then compare the disassembled output between the original and the new one and report diffs. No diffs, I'm done! Like I said, the emulator is coming along nicely.

I tried running this same process on the Rainbow, and it was so slow I could only do one or two items in the time it took me to iterate through 5 or 6 different problems and rebuild everything.  The emulator is starting to save time for the investment in writing it...  We'll see if it is all worth it in the end.

20200419

Venix emulation update

April 2020 Venix update

I've had a bit of spare time lately and have refocused back onto Venix.

I've found a number of bugs, implemented fork/exec (brokenly, but well enough to mine 'cc' for what commands it's trying to do) and I have a status update.

Toolchain

The biggest bug I fixed was in lseek. I'd forgotten to translate from the emulated FD to the host's FD. When I did that, a lot of things started working, including cpp, as and ld.
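For illustration, here's the kind of translation step that was missing, sketched in C. None of this is the emulator's real code: the table size, the emu_* names and the -1 "closed" marker are all assumptions made for the example. Only lseek(2) itself is the real POSIX call.

```c
#include <sys/types.h>
#include <unistd.h>

#define EMU_NFILE 20 /* hypothetical size of the emulated fd table */

/* Maps an emulated descriptor to the host descriptor backing it;
 * -1 marks a slot that is not open. */
static int host_fd[EMU_NFILE];

void emu_fd_init(void)
{
    int i;

    for (i = 0; i < EMU_NFILE; i++)
        host_fd[i] = -1;
}

int emu_fd_bind(int emu_fd, int hfd)
{
    if (emu_fd < 0 || emu_fd >= EMU_NFILE)
        return -1;
    host_fd[emu_fd] = hfd;
    return 0;
}

int emu_fd_to_host(int emu_fd)
{
    if (emu_fd < 0 || emu_fd >= EMU_NFILE)
        return -1;
    return host_fd[emu_fd];
}

/* Emulated lseek: translate the descriptor first, then call the host. */
long emu_lseek(int emu_fd, long offset, int whence)
{
    int hfd = emu_fd_to_host(emu_fd);

    if (hfd < 0)
        return -1; /* not open in the emulated process */
    return (long)lseek(hfd, (off_t)offset, whence);
}
```

Skipping the emu_fd_to_host() step means seeking on whatever host descriptor happens to share the number, which is exactly the sort of thing that breaks cpp, as and ld.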

This means I have almost all of the toolchain working. c0 and copt aren't working (though I don't know how to use copt, so maybe it is). That's unfortunate, but we're close.

I can compile hello world on Venix, and then snag the .s file. Once I have that, I can assemble it in emulation and ld to produce a working binary (which also works in emulation). cpp also produces the same output as running on Venix.

So this is huge since things are so much faster on my host environment.

crt0.s

OK. Now that I have a working as on my box, I thought to go about recreating the .s files likely to have been on Venix. I started with crt0.s. I was able to disassemble the old one and use that to recreate a good .s file. It assembles almost to the same binary. The only difference is in the unused parts of the relocation entries. I don't know why that is, but it doesn't seem to affect things since I can generate the same hello world executable from either the stock crt0.o or the one that I've recreated. I'm guessing the differences don't matter.

Next Steps

I need to do a reorg to get proper fork/exec behavior. This will involve factoring out the memory, register sets and fd tables to their own context while retaining a process table, etc in the main application. Once I do that, I can make fork/exec work properly and maybe get cc working finally.

I need to fix sbrk for the c0 binary. It loads without a stack size, so that puts the stack at the end of memory. The program itself takes up 18k of text and 59k of data/bss, which leaves just 5k for combined heap / stack. Emulated sbrk doesn't check for collisions, so c0 may fail due to not properly failing an sbrk request. There may be other details.
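A collision check could look something like the following C sketch. It assumes data, heap and stack share one 64k segment with the stack growing down from the top (which is what the 5k figure implies). The struct, the function name and the 256 byte safety margin are all invented for the example.

```c
#include <stdint.h>

/* Hypothetical slice of emulator state for one process's data segment. */
struct emu_mem {
    uint32_t brk; /* first byte past the heap */
    uint32_t sp;  /* emulated stack pointer; the stack grows down toward brk */
};

#define STACK_MARGIN 256L /* made-up safety gap between heap and stack */

/* Returns the old break on success, or -1 if the new break would go
 * negative or run into the stack (plus margin). */
long emu_sbrk(struct emu_mem *m, long incr)
{
    long old = (long)m->brk;
    long want = old + incr;

    if (want < 0 || want + STACK_MARGIN > (long)m->sp)
        return -1;
    m->brk = (uint32_t)want;
    return old;
}
```

Starting with the break at 59k and the stack pointer at 64k, a 1k request succeeds, and a following 4k request fails instead of silently growing the heap into the stack.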

I should pass `basename $0` as arg0, rather than the whole path. This will save ~40 bytes on the initial stack.

I need to create the notion of a prefix directory so we search there first for existing files. This will allow cc to work not in a chroot since I've created a virtual chroot of sorts. Linux emulation on FreeBSD does this as well.
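One possible shape for that lookup, sketched in C. resolve_path() and its return convention are hypothetical, and the sketch only handles the simple case of gluing the prefix onto the path; access(2) and snprintf(3) are the real library calls.

```c
#include <stdio.h>
#include <unistd.h>

/* Try prefix + path first and fall back to the plain path.
 * Returns 1 if the prefixed file exists, 0 if the plain path was used,
 * and -1 if the plain path does not fit in the output buffer. */
int resolve_path(const char *prefix, const char *path,
                 char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s%s", prefix, path);

    if (n > 0 && (size_t)n < outlen && access(out, F_OK) == 0)
        return 1; /* found under the prefix directory */

    n = snprintf(out, outlen, "%s", path);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

The emulated open/exec/stat paths would each call this before touching the host filesystem, so a file under the prefix shadows the host's copy.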

I should add a proper command line parser.

Finally, I'm going to start plowing through all the system calls to at least get those going in the restored libc sources.

Final Words

The project has made some nice progress and is coming along nicely. I've also done some investigations with using pcc to see if it generates better code or not. No assembler or loader appears to be around, unless I've missed something that Minix uses (which I may have). I do know it doesn't generate code that the Venix as(1) program can eat. Maybe I'll have to try creating a Venix back end... There's also an ia-16 project based on gcc6 that I have generating code, but it's ELF based so there'd be some work. I have it building on FreeBSD and it seems to be decent.

Finally, I went nuts looking for old versions of Kermit. I found a few. More on that in a different blog.