20200806

Recovering 2.11BSD, fighting the patches

Recovering 2.11BSD, fighting the patches

2.11BSD was released March 14, 1991 to celebrate the 30th anniversary of the PDP-11. It was released 15 months after 2.10.1BSD was released in January 1989.

2.11BSD was quite the ambitious release. The high points include:
  1. The kernel logger (/dev/klog)
  2. The  namei cache and argument encapsulation calling sequence
  3. readv(2)/writev(2) as system calls rather than emulation/compatibility routines
  4. Shadow password file implementation (the May 1989 4.3BSD update)
  5. A TMSCP (TK50/TU81) driver with standalone support (bootblock and standalone driver)
  6. Pronet and LH/DH IMP networking support
  7. The portable ascii archive file format (ar, ranlib)
  8. The Unibus Mapping Register (UMR) handling of the network was rewritten to avoid allocating excessive UMRs.
  9. The necessary mods to the IP portion of the networking were made to allow traceroute (which is present in 2.11BSD) to run.
  10. Long filenames in the file system
In addition to a completely new syscall setup. Item 10 required a full new install with restoring from a back up tape. Item #7 would be a thorn in my side.

However, following on the heels of 2.10.1BSD as quickly as it did, there's a number of changes that followed quickly after the release. So, by November of 1994 195 patches had been released for the tree. Well, there were more patches than that (patches 2 and 3 had several parts, and my work has uncovered a number of hidden patches along the way, but more about those later).

The Problem(s)

Well, if we have patch 195, and all 195 patches, what's the problem? Why can't you do a simple for loop and patch -R to get back to original? And for that matter, why were no copies of the original saved?

Turns out the root of both of these problems can be summarized as 'resource shortage'. Back in the day when this was released, 100MB disks were large. The release came on 2 mag tapes that held 40MB each. Saving a copy of these required a substantial amount of space. And it was more important to have the latest release, not the original release, for running the system. It was more efficient and better anyway.

In addition to small disk space, these small systems were connected via USENET or UUCP. These connections tended to be slow. Coupled with the small size of the storage on the PDP-11s running 2.11BSD, the patches weren't what we think of as modern patches. The patches started before the newer unified diff format was created. That format is much more efficient that the traditional context diffs. In addition, compress(1) was the only thing that could compress things, giving poor compression ratios. The UUCP transport of usenet messages also mean that the messages had to be relatively short. So, this mean that the 'patches' were really an upgrade process, that often included patches. But just as often, it included instructions like this from patch 4:
Fix:
        Apply the patch "/tmp/dif" (multiple files updated by it) and
        install the new scripts ":splfix.mfps" and "splfix.movb+mfps".
which included a shar file that needed to be run through /bin/sh (or unshar, if you were lucky). You then needed to follow the instructions, which included patching. Patch 14 illustrates the problem, it includes the instructions that say to run /tmp/c created by the shar. It looks like this:
#! /bin/sh -v
cd /usr/src/usr.bin
rm lint/*
zcat PORT/lint.tar.Z | tar xf -
cd lint
rm :yyfix llib-*
cp -p /tmp/libs libs
So this is a very efficient way to remove files. Often the diffs were larger than the original files, so it was more efficient to remove and replace. But this presents two problems: what files were there before? And even in this case, what files were removed? These operations were not uncommon, and destroyed information. It's a bit like running the sausage mill backwards to rebuild the pig.

Another problem for this project is phantom patches in the first 80 patches. A number of changes appear only in the patches that come with the catch-up kit. These aren't a surprise, as a number of patches going backwards had unexpected slight line number anomalies in them. After patch 80, patch generation appears to be better managed, but at the time of this writing, the author hasn't tried to reapply all the patches to get back to the p195 tape from the reconstructed release tape. The author also has not audited the comp.bugs.2bsd mailing list between the March 1991 release and the November 1994 p195 tape. Given the surprises so far (tftp and tftpd were updated in May 1991, patches were posted in comp.bugs.2bsd, but no formal patch was released and this change does appear in the catch-up patches), additional discoveries like this cannot be ruled out.

The Path Forward

Taken at face value it's hopeless. There's no way to definitively reconstruct the lost information. But there's some glimmers of home. First, there weren't a lot of files deleted like this. lint, pcc, ld, ranlib and a few others were removed. However, what's missing can be rather critical. Next, we have both the prior release (2.10.1BSD). A good first approximation of what is there in the release will be what's in the last release. Many files aren't changed often in the 2BSD. So we have some data present there, even if some was destroyed.

There were also a number of patches. These can provide data about the missing files. There's all the patches in the 2.11BSD series, as well as a number of patches posted to comp.bugs.2bsd after 2.10.1BSD was released. We know from the release notes that these were included in the 2.11BSD release.

Next, we have 4.2BSD, 4.3BSD and even 4.4BSD and the SCCS files that go along with them. This means that we can see changes that were going on in the 4BSD series, and we know much code came from there in 2.11BSD, both before and after the release.

Next, we have dates. We are fortunate that all the files are dated in the tree. This means we have a way of testing if we've missed anything in our reconstruction. If we reconstruct something that's dated after the release of 2.11BSD, we have a very good reason to believe that we've missed something.

There was a catch-up kit that was produced around patch 80. It contained a number of commands to re-removed what should have been removed, as well patch everything up. This gives us a second way of checking: can we apply this kit to get to patch 80? Does it work? And if we replay the individual patches we should get the same thing. And it should be the same as we got unapplying the patches back from 195. A word of caution: as with all other things around this project, we must always be open to the possibility of a mistake in the catch-up kit. There are at least a few files that it tries to remove that aren't necessarily there.

Finally, we know that the system was released as binaries. Which means that the system built. So we can test the reconstruction by compiling it. We should be able to build the final system. But we need to do that with the final system, so some way must be found to bootstrap it for this test. I'll talk more about that in another blog.

The Slog Backward

The slog backwards proceeded with fits and starts. For the first 20 or so patches (that is patch 177-195), it was fairly straight forward. I was able to use them to develop the framework to proceed. It wasn't until I hit the a.out binary format changes that I had problems (the first, looking back, of these is 176).

The Framework

I settled on a fairly simple framework going backwards. There are three types of patches in the archive. First, there are the pure patches. These have a simple patch set to apply, and maybe some verbal description of commands to run. Often, these commands were just the minimal amount to rebuild so you didn't have to suffer through a day's long compile. Sometimes, they included sneaky instructions. The patches 89-99 all contain the following text:
The /GENALLSYS script is obsolete and has been removed
so in addition to applying the patches, the script needed to be removed. To complicate things, some patches were generated from the root directory, while others were generated in the leaf node. To this end when the script detects a simple patch it tries to locate a hints/patch#.setup script. If one is found, then that script is sourced. Setup scripts are expected to be idempotent so we can reuse them for the forward trip. They all do a cd to where the patch was generated from. After the setup, if any, the mk211bsd script will apply the patch.

However, when there's other things to do, that won't work. So, there's a second way to assist. If there's a hints/patch#.unpatch, then that will be run and it is responsible for moving the patch backwards. About 20 patches need this assistance. For example, patch 118 replaces ucb/Mail with a port from 4.3. So, I need to reconstruct what was there. In this case, I copy the ucb/Mail files from 2.10.1 and then check for patches in comp.bugs.2bsd (there aren't any) and in the patch stream (patch 10 tweaks it). So, I have to reapply all those patches in the unpatch script.

The second type of file is the shar file. Here, one has to run the shar file through /bin/sh (or unshar) and use the extracted files to do something. Often times there's a short script to run and a patch to apply, but other times it's more complicated. In these cases, you must provide an hints/patch#.unshar file. This file takes care of unsharing things, and running whatever you need to do to unapply it. This varies a lot. In some cases, you extract a patch (or set of patches) and reverse apply them. In other cases, you need to remove files. In still others you need to snag them from 2.10.1 and touch them up (like we did for ucb/Mail).

The final type is a tar ball. These files are treated like patches, with the usual unpatch protocol. However, the unpatch program script has to know how the tar file encoded its patches.

Interesting Patches

There's 195 patches (and a few sub-patches). Describing all of them would be quite tedious. Only about 80 hints are needed (and a few additional files for reconstructed fixups). I'll describe some of the more interesting ones briefly here. Most of the interesting ones are in the last 50 or so. And by interesting, I mean ones that destroyed information such that it had to be reconstructed or that otherwise affected the reconstruction.

Patch 185 replaces m4, so we have to snag the one from 2.10.1BSD to replace it, as well as unapply a patch in the shar file. Patch 115 did a fixup on the Makefile, so we have to reapply it so that patch 115 will unapply correctly.

Patch 184 fixes the makefiles to use pipelines instead of temporary files now that the assembler can cope with pipelines. At the same time, it removed a few obsolete compatibility system calls that need to be added back from 2.10.1BSD. Adding them back messes up the directory times, which make depends on, so workarounds have to be deployed in build2.sh, the bootstrap script (make in 2.11BSD assumes times in the future mean the directory is up to date and doesn't need to be rebuilt, while newer makes will ignore that time and descend into the directory always).

Patch 180 updates symcompact.c and strcompact.c by replacing them entirely. These files weren't in 2.10.1BSD, but came in with patch 172, so needed to be recovered from there. It also adds symdump.c which needs to be removed. There's no data loss here, but it illustrates the need to look at all the patches to see what might need to be recovered (and what doesn't).

Patch 178 removes an unused header, and fixes up another one. The headers were resynchronized in patch 175, so a small fixup is needed here. Need to investigate this, and other 'fixup' patches. One unknown is whether 2.11BSD shipped with the unsynchronized /usr/include and /usr/src/include or not. The reconstruction tries to get the right ones in each place, but uses the /usr/include ones as the ones in /usr/src/include when we can't otherwise reconstruct.

Patches 158-176 remove the 8 character limit to symbol names in programs. These are quite disruptive, but for the most part just large patches. ranlib(1) was grabbed from 2.10.1 and patched to cope with the new archive format to undo these patches. ld(1) needed similar work (and once I started rebuilding things, I had to fix so it would compile). It also needed some fixes from comp.bugs.2bsd. Initially, I'd kept them separate, but I eventually needed to merge them together. ranlib(1) and ld(1) represent the largest effort in retro-programming to reconstruct (code was cut and pasted from other programs that survived and minimally changed to be functionally correct).

Patches 151 and 152 rework the assembler. A number of patches to as were posted to comp.bugs.2bsd that needed to be applied to 2.10.1's assembler to make things work out. A couple of minor tweaks were needed after all these changes were applied, though, to make the patches work out.

Patch 142 unapplied the patch, but then needs to unapply changes to all the kernel config files. This a recurring issue for the generated files, so some fancy scripting is needed to apply them to all the kernel config directories. Patches 124, 121, 93 (which has to remove some directories), 84, 83, 72, 42, 36, 4, and 2. There's a number of issues with patching the kernel, not least was that overlay structures were often hacked on the fly and preserving them is hard. The restoration has only tried to get GENERIC and possibly KAZOO kernel configs correct, and the others it has largely ignored. I'll do a full blog on all the issues to detail what is or isn't recovered.

Patch 132 brings in named, so it needed to be recovered from 2.10.1, plus patches from patch 106.

Patch 123 is actually missing a part. It updates the documentation for how to install the system. This means we need to apply some kind if fixup. But all we know for sure is the SCCS id changed, so that's all we fix up. If I find more data to support other changes, I'll update to include that, but so far nothing has surfaced.

Patch 118 replaces ucb/Mail with the one from 4.3BSD. Revert to 2.10.1BSD's version and apply a patch from patch 10. There's no publicly posted patches to 2.10.1BSD for ucb/Mail.

Patch 80 is the catch-up kit patch. A lot of files were added from USENET. Undoing it is straight forward, making it go forward will tough, but we have the files at least. This is both a blessing and a curse to the project. On the one hand, we have a second way of cross checking things. On the other, the cross checks reveal many problems, including updated kermit being missing entirely (though, to be fair, /usr/src/new and /usr/src/local weren't considered part of the system until patch 195, so prior to that patches to this part of the tree were irregular).

Patch 40 is an announcement of corrupt files in the original release in /usr/games. I've chosen to restore them to what was intended for the original tape.

Patch 17 massively reworked pcc. It's not quite right in the reconstruction yet, however, since the 'redo catch-up kit' test fails to delete some expected files, what shipped in 2.11BSD isn't quite right. It may never be right, and that may not matter since it didn't work anyway.

Patch 14 fixes lint. It was totally broken in 2.11 (but worked in 2.10.1). Undoing it may also be incorrect for a similar reason as patch 17. Since neither one of these worked, it may be quite hard to know what 2.11 shipped with. A case could be made, though, that it doesn't matter...

Patch 2 removed tmscpboot.s. The rest of the patch undoes easily, but tmscpboot.s was new in 2.11BSD. We know it is derived from tkboot.s which comes from Ultrix-11, but I've not yet been able to tweak it to work yet...

Current Status

At this point, I've reconstructed a possible release tape. A series of them, actually. All of them fail the tests set forth above (though in fewer and fewer ways each iteration). I'm able to use the different tests to suggest fixes. I've recovered 5 lost patches (which I write about elsewhere). There's at least 5 more I believe based on the attempts to apply the catch-up kit. I discovered the first 5 based entirely on bad dates. The current reconstruction is decent (maybe 30-40 files still aren't quiet right, and a similar number are reconstructions that work, but it's impossible to know if they are missing bug fixes). The tmscpboot stuff is still missing, and there's a number of minor fixups that need to be reconciled with the known missing patches. All very careful, detailed work, so it may be some time to work through them.

No comments: