20211002

Spelling Fixes -- Some Advice

 Some Thoughts on Spelling Fixes

I've been looking at a number of FreeBSD pull requests lately. Many of them are spelling fixes. I'd like to offer some suggestions for people submitting them. The same advice applies to typo fixes and grammar corrections.

  1. Keep the number of changes small
  2. Use separate commits per directory
  3. Use descriptive commit messages
  4. Set your email correctly
  5. Don't correct code

Keep the number of changes small

When submitting a pull request like this, limit the number of changes to 10-20. More than about 15 changes becomes hard to review: every single change has to be verified for correctness, and a pull request with too many changes is more likely to be overlooked.

Use separate commits per directory

When submitting a number of changes, do one commit per subdirectory. For example, if you have changes to bin/ls and bin/rm, do two commits. But when the changes are in nested subdirectories of the same component, one commit is fine: if you have changes to both usr.sbin/wpa/wpa_priv and usr.sbin/wpa/wpa_cli, it's OK to do those as one commit.

Use descriptive commit messages

The commit message "fix spelling" is too generic to be useful. If you are fixing just one word, a better commit message would be "Spell interrupt correctly". A specific message lets the reviewer make sure that you are changing the right thing, and it gives people skimming the logs enough detail that they don't need to look at the diffs to know what changed. If you are fixing multiple spelling errors in one commit, then a generic message is more appropriate.

Set your email correctly

When you push your branch to GitHub, make sure that you've set the author email you want on the commits. It saves everyone a lot of time. If you are using GitHub's web editor, make sure that your profile has this information set correctly.

Don't correct code

Don't make spelling corrections to code: variable names, #defines, etc. These will likely be ignored. The risk from correcting a comment or an error message is tiny, while code changes could be attempts to subtly change the system or introduce security-impacting issues through a supply-chain attack. It's better to work with someone in the community to get these corrected than to correct them via a pull request.

20210828

A new path: vm86-based venix emulator

 Venix Emulator Update

It's been a while since I've had time to work on the Venix emulator. When I set it aside over a year ago, I'd taken it as far as I could with the 8088 emulator I'd found online. It had no FP emulation, and there were a number of things misbehaving that I couldn't quite get right. Exec was also proving hard to implement: despite being written in C++, the original emulator resisted my efforts to create multiple instances.

So I set it aside last May, thinking I might get back to it once the bsd-user changes FreeBSD had made to qemu were upstreamed.

In early August I took some time off from work and got the bsd-user changes in shape to upstream. Well, the first 10% of the changes, which were the hardest since they replaced what was there with something that minimally worked. This helped me learn qemu's x86 CPU much better, and it got me thinking that qemu's user-mode stuff might be the way to go.

About this time I also found a vm86 test program in the FreeBSD tree. So I got to wondering, could I do a vm86 implementation of Venix?

So, I stole the bulk of my old 86sim-based Venix implementation, installed an i386 VM using bhyve on my FreeBSD/amd64 box, and wrote a quick little test program. The test program worked, so in a fit of "why not give this a try" I ported pcvenix.cc from 86sim to being driven from SIGSEGV in vm86 mode. Hello world quickly worked.

Enter VM86VENIX

So, I reworked fork and exec and the a.out loader a bit. I was able to get the C compiler going in this new setup. The 'cc' command is usually just a fancy script that strings together the preprocessor, compiler, optimizer, assembler and linker. Except on Venix it wasn't a shell script I could hack to run natively on FreeBSD: it was this weird binary that did all the forking, execing, redirecting, etc. inline. More on that in a minute.

So vm86 mode is a special mode in 32-bit x86 CPUs that lets you execute old 16-bit code in the context of a normal process. It's super easy to set up, but often of limited use.

Thankfully, the i386 ELF designers thought ahead. The starting address for binaries in ELF is this weird 0x00401430, which is just above 4MB. This means that one can map almost anything into low memory and it will work. FreeBSD blocks mapping anything at location 0 for security reasons, but the rest of the first 4MB is available. The old 8086 could only see the first 1MiB of that, and since Venix binaries are at worst 'small model', the largest address space for a process was 128KiB. Plenty of room to find a place to map it.

So, I wrote a loader that would load the old Venix a.out binaries into this space. Or rather I hacked the loader I already had to do the mapping. I was able to reuse all the loader code from the 86sim-based emulator I had before.

I then shamelessly stole the setup code from the FreeBSD vm86 testing binary, which was little more than establishing signal handlers, zeroing the context, and setting up a stack, IP and the segment registers. With that in hand, I was able to use sigreturn() to set the processor flags such that it would jump to where I wanted to go in the Venix binary. I'd been afraid of vm86 mode after reading through doscmd years ago, but there was no need for the fear: all the cruft in doscmd (I reread it after this) had accumulated over the years to cope with DOS, the BIOS and other weirdness that evolved around the IBM PC, XT, AT and the plethora of clones, and had nothing to do with vm86 mode per se.

Every INT xx instruction would trap to the kernel. The kernel would note down the registers and send the process a SIGSEGV for these accesses. I was then able to look at CS:IP in the mcontext and decode the instruction that faulted. INT xx is encoded as the bytes 0xCD 0xXX for almost all values of xx (INT3 has its own one-byte opcode, 0xCC). In the signal handler, I could decode this opcode. I knew from past work that INT 0xf1 was the system call, so I hooked up the old Venix system call handlers to this and I was back to where I was with 86sim. Further, in fact, because floating point now worked.
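
The decode step in the handler is simple enough to sketch. The function name and shape here are mine, not pcvenix.cc's:

```c
#include <stdint.h>

/* Decode the instruction at the faulting CS:IP.  INT imm8 encodes as
   0xCD followed by the vector; INT3 has its own one-byte opcode 0xCC.
   Returns the vector, or -1 if this isn't an INT at all.  On a hit,
   *advance says how far to bump the saved IP so the guest resumes just
   past the instruction when the handler returns. */
static int decode_int(const uint8_t *ip, int *advance) {
	if (ip[0] == 0xCC) {
		*advance = 1;
		return 3;		/* INT3, dedicated opcode */
	}
	if (ip[0] == 0xCD) {
		*advance = 2;
		return ip[1];		/* INT imm8 */
	}
	*advance = 0;
	return -1;
}
```

When this returns 0xf1, the handler dispatches to the Venix system call table, bumps IP in the saved mcontext, and lets the implicit sigreturn resume the guest.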

Signal handlers have an implicit sigreturn at the end with the context that was passed to the signal handler. I needed to skip over the faulting instruction after performing the system call, so the process would resume executing in 16-bit mode just after the INT 0xf1. This was straightforward to implement.

I decided to implement fork as a real fork. It would copy the address space, all the open FDs, etc. This proved to be an easy way to cheat so I didn't have to create a context object and use threads to simulate processes.
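
A minimal sketch of why that cheat works; the helper here is illustrative (the real emulator resumes guest code in the child rather than exiting), but the inheritance of the address space and FDs is exactly what a host fork gives you:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Emulate the guest's fork with a real host fork: the child inherits
   the low-memory guest mapping and every open FD for free.  Here the
   child's "guest run" is stood in for by an exit status; in the
   emulator, the child resumes 16-bit code with AX = 0 and the parent
   continues with AX = the child's pid. */
static int run_forked(int child_status) {
	pid_t pid = fork();
	if (pid == 0)
		_exit(child_status);	/* child's pretend guest run */
	int status;
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);	/* what the guest's wait() sees */
}
```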

Exec proved to be just a call to the loader that started all this off. The only thing I've not implemented is close-on-exec, but the rest was easy.

With these implemented, I could run the C compiler's cc command and generate trivial binaries. But if I needed to include anything, it would fail. So I created a VENIX_ROOT env variable. For any open where the name arg started with a '/', it would try VENIX_ROOT/name first, then fall back to just the plain name. This was enough for the preprocessor to include .h files and for the canonical hello world to build.
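
That lookup is a one-liner to sketch (the helper name is mine, not the emulator's):

```c
#include <stdio.h>
#include <string.h>

/* For absolute guest paths, build "$VENIX_ROOT/name" as the first
   candidate; the emulator falls back to the plain name if that open
   fails.  Relative paths pass through untouched. */
static const char *venix_open_path(const char *root, const char *name,
    char *buf, size_t buflen) {
	if (root != NULL && name[0] == '/') {
		snprintf(buf, buflen, "%s%s", root, name);
		return buf;
	}
	return name;
}
```

So an open of /usr/include/stdio.h first tries $VENIX_ROOT/usr/include/stdio.h, which is how the preprocessor finds its headers.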

There was just one vexing problem: cc -o hello hello.c worked. However cc -O -o hello hello.c didn't.

Tracking down a silly bug

Well, there was another annoying thing: /bin/sh didn't work. I traced that to the fact I've not implemented passing an environment to the processes, and /bin/sh was choking on that. OK. Fine. I'll implement that later. /bin/csh worked, however, so I was happy. My happiness was short lived, alas, because I'd run a command and get weird output:
% ls
ls: Sig 44
%

That's weird. So I added tracing. I tried the cc command, which in this version is a simple program that orchestrates all the different parts of the compiler using fork, exec, dup and the strategic close/open pair to set up stdin, etc. All the tracing looked good as well; we'd see something like:

123: fork() 124
123: wait()
124: ... lots of stuff
124: exit 0
123: wait pid 124 status 0

and then 123 would proceed to delete all the temp files and exit. It was as if it were getting an error, despite its children exiting without one. It was like this every time, but only when I ran the optimizer. When I'd re-run the ls test, I'd get different Sig values as well.

Available 8086 compilers in 1985

I'd assumed that the compiler was derived from the V7 compiler. However, a number of hints in names suggested it wasn't. And when the Venix manual described the compiler's options and operation, it had way more exceptions for the 8086 than for the PDP-11, which also suggested it wasn't V7 derived. So I started hunting around for C compilers.

MIT produced one at the time for the PC. It ran on the VAX and generated a.out binaries that a conversion program would convert to .COM or .EXE files. I thought this might be where the Venix compiler came from. But after playing around with it for a few hours it was clear it wasn't. First, its cc was a shell script, not a program. Second, it had a number of VAX-specific instructions sprinkled inline, and that wasn't going to run on the Rainbow :).

I took a look at the portable C compiler. This, I think, was the real genesis of what was shipped with Venix, but old versions that support the 8086 are hard to come by, even in the successor portable C compiler project that's been going for 20-odd years now. I got its cc program compiling against Venix. It was a bit easier to fuss around with than the V7 one (though, as I found out later, the V7 one would have been close enough to hit the bug). Looking at the old System III sources that one can find on the internet, there's a copy of the portable C compiler there, rather than the C compiler from DMR that you'll find in the 7th edition. I used the cc program from there to try to build things. I hit the jackpot: it failed faster!

So I instrumented the pcc program and discovered that the status printed after wait() in the program didn't match the status that I'd returned from the kernel. Progress!

The wait(2) system call...

I'd implemented the wait(2) system call as part of getting fork/exec working. I did it from the VENIX manual that's available online. I looked at the first part of the manual:
SYNOPSIS
        wait(statusp)
        int *statusp;

which shows wait taking a pointer. So I assumed the pointer was what was passed into the kernel in the first arg register (DX), i.e. that DS:DX pointed to an integer where I'd return the status. Most of the time, DX was something that looked like a pointer on the stack, so I just did a copyout. The problem is, that's not right.

So, I took another look at the manual. At the end of the man page I saw:

8086      BX=7
              int 0xf1
              AX = pid of process
              DX = status

and then it hit me. DX isn't a pointer to anything. After int 0xf1, AX is the pid of the process (the normal return value) and DX is the status. Disassembling wait.o confirmed this: if statusp is not 0, DX is copied back to *statusp. Doh! The classic pointer vs. value mistake. Fixing my implementation to take out the copyout and replace it with setting DX in the processor context made pcc work. And cc worked. And the silly test programs I wrote in the middle to debug things worked. Woo hoo!
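
In other words, the kernel side of wait only has to fill two registers. A sketch of the corrected return path, with an illustrative register struct and function name of my own:

```c
#include <stdint.h>

struct guest_regs {
	uint16_t ax, bx, cx, dx;	/* saved 8086 registers from the mcontext */
};

/* Venix wait(2) ABI: BX=7 selects the call; the kernel returns the
   pid in AX and the raw status in DX.  The libc stub, not the kernel,
   stores DX through *statusp when statusp is non-NULL, so there is no
   copyout here at all. */
static void venix_wait_done(struct guest_regs *r, uint16_t pid,
    uint16_t status) {
	r->ax = pid;	/* normal return value */
	r->dx = status;	/* status by value, not via a pointer */
}
```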

Once I fixed this, all weird combinations of compilations suddenly worked for me. I could optimize, strip, etc and there were no oddities.

Conclusion

It helps to read the manual carefully!

I need to try to build the system. There are shell scripts to do that which, if run natively, don't depend on environment variables working; I'll see if they work and how much of the system I can generate via this route. Stay tuned.

My TODO list still contains getting env working (I don't think it is hard, but I think I need to filter things because my default env is larger than the stack on these old x86 machines). I also need to look at rebasing my emulator as a *-user qemu emulator (even if they don't take it upstream). Maybe even add PC/IX and Xenix/86 support as well so that other researchers can play around with this.

20210501

On Updating QEMU's bsd-user fork

 QEMU bsd-user

bsd-user is a 'user mode' emulation tool. It emulates FreeBSD's system calls on FreeBSD well, and $OTHER-BSD system calls elsewhere with varying degrees of success. Its primary mission has been to build FreeBSD packages using user-mode emulation to speed the process up over using system mode. It speeds things up because the compilers and other huge CPU hogs can be run natively.

Of late, it has languished. A few years ago, I started to rebase it to the then-tip of qemu in the hopes of upstreaming. At the time, we'd forked off of qemu 1.0 or so, and the then-current qemu was 4.0. I got things rebased to around 3.1 before running out of steam. Rebasing patch trains of 1000 commits is hard, and trying to selectively squash commits wasn't much better. So that's where things stalled. All bug fixes to qemu's bsd-user had gone on in our own private branch.

Recently, I'd been asked about it again, so I started to dust things off. I got my name listed as the maintainer so I could push patches upstream a little more easily, and then started contributing by doing basic cleanup in the hopes of redoing 'logically' what had been done to split things up. Those efforts too have come to naught.

So, in one final act of desperation, I copied the 3.1-rebased bsd-user directory directly into qemu 6.0 and got it building. There were lots of little changes I needed, but nothing super huge. I've not done extensive testing, but the basics seem to work.

Trouble was, that diff was 35k lines. Too big to upstream in one go. So, I set out to see what could be done.

First, I labeled the 'yeet it up to current' branch as 'blitz'. It's a fast hack to get something we can move forward on. In the future, releases and such will be cut from there until I can get it into the upstream tree. Blitz is the German word for 'lightning' and has connotations of doing something quick and dirty well enough to move on.

Next, I created another branch from 6.0 called 'kaizen'. Kaizen is the Japanese business practice of continuous improvement: find the most painful or most expensive part of your business, fix that, and iterate. In this branch I'll be putting 'diff reduction' patches for upstream, as well as starting to move things over from blitz, beginning with the loader. I've disconnected everything except x86 from this branch. In upstream qemu, bsd-user core dumps right away, so I'm not turning off anything that's working.

So the plan is that I'll focus on keeping x86 buildable, and get it working as quickly as I can and then add all the system calls from the blitz branch. I'll add them one group at a time, and do the reorgs and new file creation as well. I'll get these reviewed and upstreamed. Once all the system calls are in place, I'll start adding additional architectures as well, getting those patches reviewed too. Finally, I'll get the NetBSD and OpenBSD hosting stuff updated, as well as take a stab at updating their system call tables and seeing how well it works. The work that Stacey Son and others did tried to preserve all this, but it's been a long time since any of it was tested.

I have an agreement in principle with the qemu upstream to do all this work. So approximately monthly, I'll be landing a new branch with the latest diff reductions. I'll rebase kaizen and blitz after each drop and before I upstream. For the moment, this work will go into my gitlab fork (since it has all the CI setup on it) and from time to time I'll publish back to the github qemu-bsd-user repo. Be advised: both the blitz and kaizen branches will rebase often, so you may need to do weird things to update. Though, if you are tracking them with changes, please be in touch so we can coordinate work.

With luck, by this time next year, the kaizen and blitz branches will be nothing but a distant memory and we'll be on to keeping things up to date in qemu head, maybe with doing some refactoring with linux-user where it makes sense.

20210416

Customizing Emacs for Git Commit Messages

 Customizing Emacs for Git Commit Messages

I do a lot of commits to the FreeBSD project and elsewhere. It would be nice if I could set up emacs in a custom way for each commit message that I'm editing.

Fortunately, GNU Emacs provides a nice way to do just that. While I likely could do some of these things with git commit hooks, I find this to be a little nicer.

First up, we need to do something when we pull up the commit message to edit. By convention, git uses the file COMMIT_EDITMSG, though the exact location of this file depends on where the git tree you have checked out is. Emacs has a hook called 'find-file-hook' that's run when emacs starts editing a file. So:
(add-hook 'find-file-hook 'imp-git-hook)

will do the job nicely. Next up, we need to define this function to do something useful when run (indeed, failure to define it will result in an error when visiting all files).

(defun imp-git-hook ()
  (when (string= (file-name-base buffer-file-name) "COMMIT_EDITMSG")
    (freebsd-git-setup))) 

buffer-file-name is a local variable that has the full path name to the buffer, if any. file-name-base is like basename(1) in that it returns the name of the file without the extension, rather than its whole path.

But what is 'freebsd-git-setup'? It's a little function I wrote to set the fill column to 72 (I usually have it at 80, but that produces commit messages that are just a bit wide when git adds leading spaces) and adds a sponsorship tag to my commits:

(defun freebsd-git-setup ()
  (save-excursion
    (setq fill-column 72)
    (if (re-search-forward "^Sponsored by:" nil t)
        t
      (if (re-search-forward "^\n#" nil t)
          (replace-match "\nSponsored by:\t\tNetflix\n\n#")))))

But it only adds the sponsorship tag if one isn't there yet.

This is a proof-of-concept function. No doubt it will have to evolve over time. The project adds 'Differential Revision' tags as the last tag in a commit message because differential requires (required?) that. And it wouldn't be good to add the sponsorship tag for non-FreeBSD commits, so I'll have to find a way to filter for that... But I thought this would be useful, if nothing else, to my future self as a roadmap for how to do things like this.


20210131

EPSON QX-10 20MB Hard disk

 EPSON QX-10 20MB Hard Disk

I've been looking for some DEC Rainbow 3rd-party hard drives of late. QCS (Quality Computer Services) made an external hard disk for the DEC Rainbow. There are advertisements in Digital Review and other trade magazines of the time. It uses a SASI interface, and likely had a DEC Rainbow-specific add-in card that they rewarmed from other designs...

One recently came up for sale on eBay. I thought I'd buy it to check it out. There was no interface card with it, alas. But it was a box with a WD1006 SASI-to-MFM controller in it that could handle two different drives. The drives were LUN0 and LUN1.

SASI, for those that don't know, pre-dates SCSI-1. It's kinda sorta SCSI-1 compatible, if you turn off parity, don't allow the drive to signal attention, and restrict yourself to a subset of commands. It also doesn't have INQUIRY, so you kinda have to know the size of the drive beforehand. Most SASI controller drivers of the day wrote a label to the drive with this information, since it was always possible to read LBA 0 w/o knowing anything else about the drive. Some controllers had ways to at least return a size, though that varied a lot...

Since SASI is kinda hard to interface to modern SCSI controllers, I used a MFM reader board I got from David Gesswein over at https://www.pdp8.net/mfm/mfm.shtml to read the drive. I had hoped to find that it was from an old Rainbow and I'd complete my collection of drivers for third party drives...

Much to my surprise, I was able to read it without any errors until it hit the manufacturing tracks (480-489). I pulled a full image, then downloaded it to my FreeBSD box for analysis.

hexdump -C told me it was a CP/M disk (I recognized the directory format). It was clear right away it wasn't a DEC Rainbow disk, however.

The first thing I noticed was the "Bi-Tech Multi Drive Support V4.02" string which indicated who made the driver for it. I also noticed strings like the following
PT.COM for EPSON QX-10 PeachText 5000 date changed - 02/03/84

and similar references to the QX-10 or EPSON CP/M.

So, this was from an Epson QX-10 CP/M system. It looks to have belonged to a soft-water service company from South Bend, Indiana. All their books and correspondence from the mid 1980s were on it, along with some interesting disk support software. There's even some bits of Z80 assembler, but they are too disjointed to know what they were for.

I've not been able to get cpmtools to read the disk in a structured way, however, so it's hard to share just the interesting bits. Still working on it.

If you have one of these machines, or are interested in preserving software from it, please let me know and we may be able to work something out.

20201006

How to Recover From a BIOS Upgrade

Recovering From Firmware Upgrade

Recently, I booted Windows on my laptop for the first time in a while to play Portal 2 with my son. It asked me to upgrade, and I said 'sure, upgrade the BIOS.'

And then I couldn't boot FreeBSD...  The BIOS upgrade deleted all the BootXXXX variables. So it only booted Windows. I'm stuck, right? I have to download a FreeBSD image and boot off the USB drive. Or did I?

Note: even the update program called updating the firmware 'updating the BIOS'. BIOS is the generic term for the bit of code that runs before the OS. Sadly, it's also the term people use to describe the pre-UEFI boot environment on PCs, so it can be a confusing term to use. 'Firmware' seems a bit better, but it's also ambiguous, because different bits of hardware (like wireless cards) also need firmware loaded.

How to Recover

I was in Windows. I needed to mount the EFI system partition (ESP). So, I opened the Administrative Console and got a command prompt from there on the 'Tools' tab. This led to the familiar C: prompt. I have no W: drive, so that's where I mounted the ESP. There I was able to copy FreeBSD's boot loader like so:

C:\WINDOWS\system32> mountvol w: /s
C:\WINDOWS\system32>w:
W:\> cd EFI\Microsoft\Boot
W:\EFI\Microsoft\Boot> ren bootmgfw.efi bootmgfw-back.efi
W:\EFI\Microsoft\Boot> copy W:\EFI\FreeBSD\loader.efi bootmgfw.efi
W:\EFI\Microsoft\Boot> 

I then rebooted from the menu.

I had remembered from my efibootmgr hacking that the boot loader was here. After I booted to FreeBSD, I was able to confirm:

% sudo efibootmgr -v
Boot to FW : false
BootCurrent: 0001
Timeout    : 0 seconds
BootOrder  : 0001, 2001, 2002, 2003
+Boot0001* Windows Boot Manager HD(1,GPT,f859c46d-19ee-4e40-8975-3ad1ab00ac09,0x800,0x82000)/File(\EFI\Microsoft\Boot\bootmgfw.efi)
                                   nvd0p1:/EFI/Microsoft/Boot/bootmgfw.efi /boot/efi//EFI/Microsoft/Boot/bootmgfw.efi
 Boot2001* EFI USB Device 
 Boot2002* EFI DVD/CDROM 
 Boot2003* EFI Network 


Unreferenced Variables:

I was then able to add FreeBSD back with efibootmgr. I mount the ESP on /boot/efi:

sudo efibootmgr --create --loader /boot/efi/EFI/freebsd/loader.efi --kernel /boot/kernel/kernel --activate --verbose --label FreeBSD
Boot to FW : false
BootCurrent: 0001
Timeout    : 0 seconds
BootOrder  : 0000, 0001, 2001, 2002, 2003
 Boot0000* FreeBSD HD(1,GPT,f859c46d-19ee-4e40-8975-3ad1ab00ac09,0x800,0x82000)/File(\EFI\freebsd\loader.efi)
               nvd0p1:/EFI/freebsd/loader.efi /boot/efi//EFI/freebsd/loader.efi
           HD(6,GPT,68f0614d-c322-11e9-857a-b1710dd81c0d,0x7bf1000,0x1577e000)/File(boot\kernel\kernel)
               nvd0p6:boot/kernel/kernel /boot/kernel/kernel
+Boot0001* Windows Boot Manager HD(1,GPT,f859c46d-19ee-4e40-8975-3ad1ab00ac09,0x800,0x82000)/File(\EFI\Microsoft\Boot\bootmgfw.efi)
                                   nvd0p1:/EFI/Microsoft/Boot/bootmgfw.efi /boot/efi//EFI/Microsoft/Boot/bootmgfw.efi
 Boot2001* EFI USB Device 
 Boot2002* EFI DVD/CDROM 
 Boot2003* EFI Network 


Unreferenced Variables:
%

Once this is in place, I needed to undo what I'd done to Windows:

% cd /boot/efi/EFI/Microsoft/Boot
% sudo mv bootmgfw-back.efi bootmgfw.efi

I was then able to reboot to FreeBSD. And fun fact: since the boot order is 0000, 0001, I can boot to Windows by just typing 'quit' at the loader prompt. This causes the boot loader to exit with an error, which causes the BIOS to try the next BootXXXX variable, in this case Windows.

And there it is: I was able to recover my system without downloading a USB image...


20201001

FreeBSD Subversion to Git Migration: Pt 2 Primer for Users

FreeBSD git Primer for Users

Today's blog is actually a preview of a git primer I'm writing for the FreeBSD project. It covers what a typical user will need, including those relatively rare users that may have some changes to the base. Please let me know what you think, and ways it can be improved. I'm especially keen on clear and useful pointers for the topics as well.

Also note: The cgit-beta mirror mentioned below is currently for testing purposes only.

If you have a lot of suggestions, you can make them directly on the original for this on hackmd.

Scope

If you want to download FreeBSD, compile it from sources, and generally keep up to date that way, this primer is for you. If you are looking to do more with the tree, contribute back, or commit changes, then you will need to wait for a later blog where I cover that. This primer covers getting the sources, updating the sources, and how to bisect, and touches briefly on how to cope with a few local changes. It covers the basics, and tries to give good pointers to more in-depth treatment for when the reader finds the basics insufficient.

Keeping Current With FreeBSD src tree

First step: cloning a tree. This downloads the entire tree. There are two ways to download it. Most people will want to do a deep clone of the repo. However, there are times when you may wish to do a shallow clone.

Branch names

The branch names in the new git repo are similar to the old names. For the stable branches, they are stable/X where X is the major release (like 11 or 12). The main branch in the new repo is ‘main’. The main branch in the old github mirror is ‘master’, both reflecting the defaults of git at the time they were created. The main/master branch is the default branch if you omit the ‘-b branch’ or ‘--branch branch’ options below.

Repositories

At the moment, there are two repositories, and the hashes differ between them. The old github repo is similar to the new cgit repo. However, there are a large number of mistakes in the github repo that required us to regenerate the export when we migrated to having a git repo be the source of truth for the project.

The github repo is at https://github.com/freebsd/freebsd.git
The new cgit beta repo is at https://cgit-beta.freebsd.org/src.git
These will be $URL in the commands below.

Note: The project doesn’t use submodules as they are a poor fit for our workflows and development model. How we track changes in third-party applications is discussed elsewhere and generally of little concern to the casual user.

Deep Clone

A deep clone pulls in the entire tree, as well as all the history and branches. It’s the easiest to do. It also allows you to use git’s worktree feature to have all your active branches checked out into separate directories but with only one copy of the repository.

git clone $URL -b branch [dir]

is how you make a deep clone. ‘branch’ should be one of the branches listed in the previous section; it is optional if it is the main/master branch. ‘dir’ is an optional directory to place the clone in (the default is the name of the repo you are cloning, freebsd or src).

You’ll want a deep clone if you are interested in the history, plan on making local changes, or plan on working on more than one branch. It’s the easiest to keep up to date as well. If you are interested in the history, but are working with only one branch and are short on space, you can also use --single-branch to only download the one branch (though some merge commits will not reference the merged-from branch which may be important for some users who are interested in detailed versions of history).

Shallow Clone

A shallow clone copies just the most current code, but none or little of the history. This can be useful when you need to build a specific revision of FreeBSD, or when you are just starting out and plan to track the tree more fully. You can also use it to limit history to only so many revisions.

git clone -b branch --depth 1 $URL [dir]

This clones the repository, but only has the most recent version in the repository. The rest of the history is not downloaded. Should you change your mind later, you can do ‘git fetch --unshallow’ to get the old history.

Building

Once you’ve downloaded, building is done as described in the handbook, eg:

% cd src
% make buildworld
% make buildkernel
% make installkernel
% make installworld

so that won’t be covered in depth here.

Updating

Updating both types of trees uses the same commands. This pulls in all the revisions since your last update.

git pull --ff-only

will update the tree. In git, a ‘fast forward’ merge is one that only needs to set a new branch pointer and doesn’t need to re-create the commits. By always doing a ‘fast forward’ merge/pull, you’ll ensure that you have an identical copy of the FreeBSD tree. This will be important if you want to maintain local patches.

See below for how to manage local changes. The simplest is to use --autostash on the ‘git pull’ command, but more sophisticated options are available.

Selecting a Specific Version

In git, the ‘git checkout’ command can not only checkout branches, but it can also checkout a specific version. Git’s versions are the long hashes rather than a sequential number. You saw them above in the conflict when it said it couldn’t apply “646e0f9cda11”.

When you checkout a specific version, just specify the hash you want on the command line (the git log command can help you decide which hash you might want):

git checkout 08b8197a74

and you have that checked out.

However, as with many things git, it’s not so simple. You’ll be greeted with a message similar to the following:

Note: checking out '08b8197a742a96964d2924391bf9fdfeb788865d'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 08b8197a742a hook gpiokeys.4 to the build

where the last line is generated from the hash you are checking out and the first line of the commit message from that revision. Also, a word about hashes: they can be abbreviated. That’s why you’ll see them have different lengths in different commands or their outputs. These super long hashes are often unique after 6 or 10 characters, so git lets you abbreviate and is somewhat inconsistent about how it presents them to users.

Bisecting

Sometimes, things go wrong. The last version worked, but the one you just updated to does not. A developer may ask to bisect the problem to track down which commit caused the regression.

If you’ve read the last section, you may be thinking to yourself “How the heck do I bisect with crazy version numbers like that?” then this section is for you. It’s also for you if you didn’t think that, but also want to bisect.

Fortunately, one uses the ‘git bisect’ command. Here’s a brief outline of how to use it. For more details, see https://www.metaltoad.com/blog/beginners-guide-git-bisect-process-elimination or https://git-scm.com/docs/git-bisect. The man page is good at describing what can go wrong, what to do when versions won’t build, when you want to use terms other than ‘good’ and ‘bad’, etc, none of which will be covered here.

‘git bisect start’ will start the bisection process. Next, you need to give it a range to go through. ‘git bisect good XXXXXX’ will tell it the working version and ‘git bisect bad XXXXX’ will tell it the bad version. The bad version will almost always be HEAD (a special tag for what you have checked out). The good version will be the last one you checked out.

A quick aside: if you want to know the last version you checked out, you should use ‘git reflog’:

5ef0bd68b515 (HEAD -> master, origin/master, origin/HEAD) HEAD@{0}: pull --ff-only: Fast-forward
a8163e165c5b (upstream/master) HEAD@{1}: checkout: moving from b6fb97efb682994f59b21fe4efb3fcfc0e5b9eeb to master
...

shows me moving the working tree to the master branch (a816…) and then updating from upstream (to 5ef0…). In this case, bad would be HEAD (or 5ef0bd68) and good would be a8163e165. As you can see from the output, HEAD@{1} also often works, but isn’t foolproof if you’ve done other things to your git tree after updating, but before you discover the need to bisect.

Back to git bisect. Set the ‘good’ version first, then set the bad (though the order doesn’t matter). When you set the bad version, it will give you some statistics on the process:

% git bisect start
% git bisect good a8163e165c5b
% git bisect bad HEAD
Bisecting: 1722 revisions left to test after this (roughly 11 steps)
[c427b3158fd8225f6afc09e7e6f62326f9e4de7e] Fixup r361997 by balancing parens.  Duh.

You’d then build/install that version. If it’s good you’d type ‘git bisect good’ otherwise ‘git bisect bad’. You’ll get a similar message to the above each step. When you are done, report the bad version to the developer (or fix the bug yourself and send a patch). ‘git bisect reset’ will end the process and return you back to where you started (usually tip of main). Again, the git-bisect manual (linked above) is a good resource for when things go wrong or for unusual cases.
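The whole cycle can be sketched end to end in a throwaway repo. Everything below (the repo, the BROKEN marker file, and the test command) is invented for the demo; ‘git bisect run’ automates the good/bad answers with a command that exits non-zero when the bug is present, standing in for your build/boot/test step:

```shell
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.email you@example.com
git config user.name you
for i in 1 2 3 4 5 6 7 8 9 10; do
    [ "$i" -ge 7 ] && touch BROKEN     # commits 7..10 carry the "bug"
    git add -A
    git commit -q --allow-empty -m "commit $i"
done
git bisect start
git bisect good HEAD~9                 # the oldest commit worked
git bisect bad HEAD                    # the tip does not
# The test command stands in for build/install/test; it fails when BROKEN exists.
git bisect run sh -c 'test ! -f BROKEN' | grep 'is the first bad commit'
first_bad=$(git rev-parse refs/bisect/bad)
git show -s --format=%s "$first_bad"   # prints: commit 7
git bisect reset                       # back to where you started
```

Roughly log2(revisions) steps later, bisect names the first bad commit, which is what you'd report to the developer.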

Ports Considerations

The ports tree operates the same way. The branch names are different and the repos are in different locations.

The github mirror is at https://github.com/freebsd/freebsd-ports.git
The cgit mirror is https://cgit-beta.freebsd.org/src.git

As with the src tree, the ‘current’ branches are ‘master’ and ‘main’ respectively. The quarterly branches are named the same as in FreeBSD’s svn repo.

Coping with Local Changes

Here’s a small collection of topics that are more advanced for the user tracking FreeBSD. If you have no local changes, you can stop reading now (it’s the last section and OK to skip).

One item that’s important for all of them: all changes are local until pushed. Unlike svn, git uses a distributed model. For users, for most things, there’s very little difference. However, if you have local changes, you can use the same tool to manage them as you use to pull in changes from FreeBSD. All changes that you’ve not pushed are local and can easily be modified (git rebase, discussed below, does this).

Keeping local changes

The simplest way to keep local changes (especially trivial ones) is to use ‘git stash’. In its simplest form, you use ‘git stash’ to record the changes (which pushes them onto the stash stack). Most people use this to save changes before updating the tree as described above. They then use ‘git stash apply’ to re-apply them to the tree. The stash is a stack of changes that can be examined with ‘git stash list’. The git-stash man page (https://git-scm.com/docs/git-stash) has all the details.

This method is suitable when you have tiny tweaks to the tree. When you have anything non-trivial, you’ll likely be better off keeping a local branch and rebasing. Stashing is also integrated with the ‘git pull’ command: just add ‘--autostash’ to the command line.
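A minimal round trip through the stash looks like this (the scratch repo and file names are invented for the demo; in real use, the step between ‘stash’ and ‘stash apply’ would be the ‘git pull’ described earlier):

```shell
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.email you@example.com
git config user.name you
echo 'original line' > notes.txt
git add notes.txt && git commit -q -m 'add notes'
echo 'my local tweak' >> notes.txt   # an uncommitted local change
git stash                            # push it onto the stash stack
git stash list                       # shows stash@{0}: WIP on ...
# ... update the tree here, e.g. 'git pull --ff-only' ...
git stash apply                      # bring the local change back
grep 'my local tweak' notes.txt
```

Note that ‘apply’ leaves the entry on the stash stack; ‘git stash pop’ applies and drops it in one step.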

Keeping a local branch

It’s much easier to keep a local branch with git than with subversion. In subversion, you need to merge the commits and resolve the conflicts. This is manageable, but can lead to a convoluted history that’s hard to upstream should that ever be necessary, or hard to replicate if you need to do so. Git also allows one to merge, with the same problems. That’s one way to manage the branch, but it’s the least flexible.

Git has a concept of ‘rebasing’ which you can use to avoid these issues. The ‘git rebase’ command will basically replay all the commits relative to the parent branch at a newer location on that parent branch. This section will briefly cover how to do this, but will not cover all scenarios.

Create a branch

Let’s say you want to make a hack to FreeBSD’s ls command to never, ever do color. There are many reasons to do this, but this example will use that as a baseline. The FreeBSD ls command changes from time to time, and you’ll need to cope with those changes. Fortunately, with git rebase it usually is automatic.

% cd src
% git checkout main
% git checkout -b no-color-ls
% cd bin/ls
% vi ls.c     # hack the changes in
% git diff    # check the changes
diff --git a/bin/ls/ls.c b/bin/ls/ls.c
index 7378268867ef..cfc3f4342531 100644
--- a/bin/ls/ls.c
+++ b/bin/ls/ls.c
@@ -66,6 +66,7 @@ __FBSDID("$FreeBSD$");
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#undef COLORLS
 #ifdef COLORLS
 #include <termcap.h>
 #include <signal.h>
% # these look good, make the commit...
% git commit ls.c

The commit will pop you into an editor to describe what you’ve done. Once you enter that, you have your own local branch in the git repo. Build and install it like you normally would, following the directions in the handbook. git differs from other version control systems in that you have to tell it explicitly which files to commit. I’ve opted to do it on the commit command line, but you can also do it with ‘git add’, which many of the more in-depth tutorials cover.

Time to update

When it’s time to bring in a new version, it’s almost the same as without the branches. You would update like you would above, but there’s one extra command before the update, and one after. The following assumes you are starting with an unmodified tree. It’s important to start rebasing operations with a clean tree (git usually requires this).

% git checkout main
% git pull --ff-only
% git rebase -i main no-color-ls

This will bring up an editor that lists all the commits on the branch. For this example, don’t change it at all. This is typically what you do while updating the baseline (though you can also use the git rebase command to curate the commits you have in the branch).

Once you’re done with the above, you’ve moved the commits to ls.c forward from the old version of FreeBSD to the newer one.

Sometimes there’s merge conflicts. That’s OK. Don’t panic. You’d handle them the same as you would any other merge conflicts. To keep it simple, I’ll just describe a common issue you might see. A pointer to a more complete treatment can be found at the end of this section.

Let’s say the includes changed upstream in a radical shift to terminfo, along with a name change for the option. When you updated, you might see something like this:

Auto-merging bin/ls/ls.c
CONFLICT (content): Merge conflict in bin/ls/ls.c
error: could not apply 646e0f9cda11... no color ls
Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply 646e0f9cda11... no color ls

which looks scary. If you bring up an editor, you’ll see it’s a typical 3-way merge conflict resolution that you may be familiar with from other source code systems (the rest of ls.c has been omitted):

<<<<<<< HEAD
#ifdef COLORLS_NEW
#include <terminfo.h>
=======
#undef COLORLS
#ifdef COLORLS
#include <termcap.h>
>>>>>>> 646e0f9cda11... no color ls

The new code is first, and your code is second. The right fix here is to just add a #undef COLORLS_NEW before #ifdef and then delete the old changes:

#undef COLORLS_NEW
#ifdef COLORLS_NEW
#include <terminfo.h>

Save the file. The rebase was interrupted, so you have to complete it:

% git add ls.c
% git rebase --continue

which tells git that ls.c has changed and to continue the rebase operation. Since there was a conflict, you’ll get kicked into the editor to maybe update the commit message.

If you get stuck during the rebase, don’t panic. git rebase --abort will take you back to a clean slate. It’s important, though, to start with an unmodified tree.

For more on this topic, https://www.freecodecamp.org/news/the-ultimate-guide-to-git-merge-and-git-rebase/ provides a rather extensive treatment. It goes into a lot of cases I didn’t cover here for simplicity that are useful to know since they come up from time to time.

Updating to a New FreeBSD Branch

Let’s say you want to make the jump from FreeBSD stable/12 to FreeBSD current. That’s easy to do as well, if you have a deep clone.

% git checkout main
% # build and install here...

and you are done. If you have a local branch, though, there’s one or two caveats. First, rebase will rewrite history, so you’ll likely want to do something to save it. Second, jumping branches tends to encounter more conflicts. If we pretend the example above was relative to stable/12, then to move to main, I’d suggest the following:

% git checkout no-color-ls
% git checkout -b no-color-ls-stable-12   # create another name for this branch
% git rebase -i stable/12 no-color-ls --onto main

What the above does is checkout no-color-ls. Then create a new name for it (no-color-ls-stable-12) in case you need to get back to it. Then you rebase onto the main branch. This will find all the commits to the current no-color-ls branch (back to where it meets up with the stable/12 branch) and then it will replay them onto the main branch, creating a new no-color-ls branch there (which is why I had you create a placeholder name).

20200919

FreeBSD Subversion to Git Migration: Pt 1 Why?

 FreeBSD moving to Git: Why

With luck, I'll be writing a few blogs on FreeBSD's move to git later this year. Today, we'll start with "why"?

Why?

There's a number of factors motivating the change. We'll explore the reasons, from the long term viability of Subversion to wider support for tools that will make the project better. Today I'll enumerate these points. There are some logistical points around how the decision was made. I'll not get into the politics about how we got here, though. While interesting for insiders who like to argue and quibble, they are no more relevant to the larger community than the color of the delivery truck that delivered groceries to your grocer this morning (even if it had the latest episode of a cool, scrappy cartoon cat that was involved in a multi-year arc wooing the love of his life by buying food at this store).

Apache has moved on, so has llvm

The Apache Foundation used to be the caretaker and main user of Subversion. They used Subversion for all their repos. While they are still technically the caretaker of Subversion, they've moved all their repositories to git. This is a worrying development because the foreseeable outcome will be less Subversion development. This means the FreeBSD project would need to take on supporting Subversion itself if we remain on it in the long term. FreeBSD is now the last large Open Source project using Subversion. LLVM made its transition to git recently. There are very real concerns about the health and viability of the Subversion ecosystem, especially when compared to the thriving, vibrant git ecosystem.

Better CI support

Git has better support for newer CI tools than subversion. This will allow us, once things are fully phased in, to increase the quality of the code going into the tree, as well as greatly reduce build breakages and accidental regressions. While one can use CI tools outside of git, integration into a git workflow requires less discipline on the part of developers, making it easy for them to fix issues found by CI as part of the commit/merge process before they affect others.

Better Merging

Git's merging facilities are much better than subversion's. You can also more easily curate patches since git supports a rebase workflow. This allows cleaner patches that are logical bits for larger submissions. Subversion can't do this.

Git also allows integration of multiple git repositories with subtree rewriting via 'git subtree merge'. This allows for easier tracking of upstream projects and will allow us to improve the vendor import work flow.
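The classic subtree-merge recipe can be sketched with core git commands (the ‘git subtree’ contrib command wraps the same idea). Both repos and all paths below are invented for the demo; the point is that an unrelated upstream repo is grafted under a prefix of the consuming tree:

```shell
set -e
work=$(mktemp -d)
# A stand-in for the upstream vendor project.
git init -q "$work/vendor" && cd "$work/vendor"
git config user.email you@example.com
git config user.name you
echo 'int answer(void) { return 42; }' > lib.c
git add lib.c && git commit -q -m 'vendor: initial import'
# The consuming project grafts the vendor tree under contrib/lib/.
git init -q "$work/project" && cd "$work/project"
git config user.email you@example.com
git config user.name you
git commit -q --allow-empty -m 'project: initial'
git fetch -q "$work/vendor"
git merge -s ours --no-commit --allow-unrelated-histories FETCH_HEAD
git read-tree --prefix=contrib/lib/ -u FETCH_HEAD
git commit -q -m 'merge vendor code under contrib/lib'
ls contrib/lib                         # prints: lib.c
```

Later vendor updates can be merged with the subtree strategy, keeping the upstream history connected to the imported files.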

Robust Mirroring

Git can easily and robustly be mirrored. Subversion can be mirrored, but that mirroring is far from robust. One of the snags in the git migration is that different svn mirrors have different data than the main repo or each other. Mirroring in git is built into the work flow. Since every repo is cloned, mirroring comes along for free. And there's a number of third party mirroring sites available, such as GitHub, GitLab and SourceForge. These sites offer collaboration and CI add ons as well.

Git can sign tags and commits. Subversion cannot. We can increase the integrity of the source of truth through these measures.

Features from 3rd Party Sites

Mirroring also opens up more 3rd party plug-ins. Gitlab can do some things, Github other things, in terms of automated testing and continuous integration. Tests can be run when branches are pushed. Both these platforms have significant collaboration tools as well, which will support groups going off and creating new features for FreeBSD. While one can use these things to a limited degree with subversion mirrored to GitHub, the full power of these tools isn't realized without a conversion to git.

The wide range of tools available on these sites, or in stand-alone configurations, will allow us to have both pre-commit checks, as well as long-term post-commit tests running on a number of different platforms. This will allow the project to leverage existing infrastructure where it makes financial sense to let others run the tests, while still allowing the project to retain control of the bits that are critical to our operations.

Improved User-Submitted Patch Tracking and Integration

One area we've struggled with as a project is patch integration. We have no clear way to submit patches that will be acted on in a timely fashion, or at least that's the criticism. We do have ways, but they are only partially effective at integrating patches into the tree. Pull requests, of various flavors, offer a centralized way to accept patches, and have tools to review them. This should lower the friction for people submitting patches, as well as make it easier for developers to review them. Other projects have reported increased patch flow when moving to git. This can also be coupled with automated testing and other validation of the patch before a developer looks at it, which addresses one of the big issues with past systems: a very low signal to noise ratio. While not a panacea, it will make things better and make better use of scarce developer time.

Collaboration

Some have said that git, strictly speaking, isn't a pure source code management system. It is really a collaboration tool that also supports versioning. This may sound like a knock on git, but really it's git's greatest strength. Git's distributed model allows for easier and more frequent collaboration. Whole web sites have been built around this concept, and they show the power of easy collaboration to amplify efforts and build community.

Skill Set

All of this is before the skill set arguments are made about kids today just knowing git, but needing to learn subversion. That has increasingly been a source of friction. This argument is supported by anecdotal evidence of people complaining about having to learn Subversion, of professors having to take time to teach it, etc. In addition, studies in the industry have shown a migration to git away from other alternatives. Git now has between 80% and 90% of the market, depending on which data you look at. It's so widely used that our non-use of it is a source of friction for new developers.

Developers are already moving

More and more of the active developers on the project use git for all or part of their development efforts. This has led to a minor amount of friction because these setups are not standardized and have fit and finish issues when pushing the changes into subversion. The friction is somewhat offset by the increase in productivity they offer these developers. The thinking is that having git as the source of truth will unlock the potential of git even more to increase productivity.

Downsides

First, git has no keyword expansion support for implementing $FreeBSD$. While one can quibble over the git add-ins that do this sort of thing, the information added isn't nearly as useful as Subversion's. This will represent a loss. However, tagged keyword expansion has been going out of style for a long time. It used to be that source code control systems would embed the commit history in files committed, only to discover weird things happened when they were imported into other projects, so the practice withered. We ran into a similar issue years ago with $Id$, and all the projects switched to $FooBSD$ instead. This was useful when source code tracking was weak and companies randomly imported versions: it told us what versions were there. Now, companies tend to do this with git, which has better abilities to track code back to its source. The value isn't zero, but it's much lower than when we adopted it in the 90s. Git also doesn't deal very well with the merge conflicts keyword expansion causes. Since the tool rewards people for importing in a more proper way, more companies appear to be using it that way, lessening the need for explicit source marking. [edit: some don't consider this a loss at all].

Second, git doesn't have a running count of commits. One can work around this in a number of ways, but there's nothing fundamental that can be used as a commit number as reliably as subversion's revision numbers. Even so, the workarounds suffice for most uses, and many projects are using git and commit counts successfully today.
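Two of the common workarounds can be sketched in a throwaway repo: ‘git rev-list --count’ gives a monotonic count along one branch, and ‘git describe’ gives a tag-relative name. The repo and tag below are invented for the demo:

```shell
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.email you@example.com
git config user.name you
git commit -q --allow-empty -m 'one'
git commit -q --allow-empty -m 'two'
git tag -a -m 'release tag' v1.0
git commit -q --allow-empty -m 'three'
git rev-list --count HEAD    # prints: 3
git describe --tags          # prints a tag-relative name like v1.0-1-g<hash>
```

Neither is a global revision number the way svn's is: the count depends on which branch you walk, and describe depends on which tags exist.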

Third, the BSDL git clients are much less mature than the GPL ones. Until recently, there was no C client that could be imported into the tree. While one might debate whether or not that's a good idea, there's a strong cultural heritage of having all you need in the base system that's hard to shrug off. OpenBSD recently wrote got, which has an agreeable license (but a completely different command line interface, for good or ill). It has its issues, which aren't relevant here, but is maturing nicely. Even with the current restrictions, it is usable. An active effort is porting got to FreeBSD, which is needed due to the large number of OpenBSDisms that are baked in (some necessary, some gratuitous). The OpenBSD people are open to having a portable version of got, so this is encouraging.

Finally, Change Is HARD. It's easy to keep using the same tools, with the same work flows, with the same people as you did yesterday. Learning a new system is difficult. Migrating one work flow to another is tricky. You have to balance the accumulated knowledge and tooling benefits against the cost it will take to move to something new. The git migration team views moving from subversion to git as the first of many steps in improving and refining FreeBSD's work flow. As such, we've tried to create a git workflow that will be familiar to old users and developers, while at the same time allowing for further innovation in the future.

Conclusion

Although it's not without risk and engineering trade-offs, the bulk of the evidence strongly suggests that moving to git will increase our productivity, improve project participation, grow the community, increase quality and produce a better system in the end. It will give us a platform on which we can re-engineer other aspects of the project as well. This will come at the cost of retooling our release process and our source distribution infrastructure, and of developers needing to learn new tools. The cost will be worth it, as it will pay dividends for years to come.

20200831

How to transport a branch from one git tree to another

git branch transport

Recently, I had need to move a branch from one repo to another.

Normally, one would do a 'git push' or 'git fetch' to do this. It's fundamental to git. If you are looking for how to do that, you should look elsewhere.

So what abnormal thing am I doing?

git svn.

So I have two git svn trees. They both point to FreeBSD's upstream subversion repo. git svn is cool and all, but has some issues.

First, it creates unique branches between the two trees. The git hashes are different. This makes it harder to move patches between trees. It's possible, but ugly (too ugly to document here).

Second, sometimes (though rarely) git svn loses its mind. If it does this and you have a dozen branches in flight when it happens, what do you do?

Third, git svn fetch, the first time, takes over a day. So recreating the tree isn't to be undertaken lightly.

How to export / import the branch

Git has a nice feature to send mail to maintainers. You can take all the patches on a branch and email-bomb a mailing list. This is a direct result of the Linux (and friends) work flow. Leaving aside the wisdom of that work flow, git's support for it is helpful, because git also implements a 'hey, I was mail bombed, please sift through it and apply it to my tree' functionality. Between the two we have the makings of a solution to my problem.

The 'export' side is 'git format-patch'. It normally exports the branch as a bunch of files. However, the '--stdout' flag allows you to export it as a single stream.

The 'import' side is 'git am'. One could use 'git apply', but that requires writing a loop that 'git am' already provides.

So, recently, when I had another instance of my every year or two 'git svn screwed the tree up' experience, I was able to transport my branches by putting these together and understanding a bit about what is going on with git.

The Hack

bad-tree% git format-patch --stdout main..foo | \
    ssh remote "(cd some/path && git checkout main && \
        git checkout -b foo && git am)"

so that's it.  We format the patches for everything from the mainline through the end of foo. It doesn't matter where 'foo' is off main. The main.. notation is such that it does the right thing (see gitrevisions(7) for all you could do here).

On the receiving side, we cd to the git svn repo (or really any repo; this technique generalizes if you're moving patches between projects, though you may need to apply odd pipeline filters to change paths), make sure we have a clean tree with 'git checkout main' (though I suppose I could make that more robust). We create a new branch foo and then we apply the patch. Easy, simple, no muss, no fuss.

BTW, I use '&&' above so that if I've fat-fingered anything, or there's already a foo branch, etc., it will fail as early as possible.

But what about when things go wrong...

Well, things go wrong from time to time. Sadly, this blog doesn't have all the answers, but you'll basically have to slog through it manually. The few times it has happened to me, I've copied the patch over, and consulted git-am to see how each of the weird things I hit had to be dealt with.

'git am --show-current-patch' is the most helpful command I've found to deal with the occasional oops.
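To make that failure mode concrete, here's an invented throwaway repo where a formatted patch no longer applies, along with the inspect-and-abort moves ('git am --continue' after a manual fix is the other way out):

```shell
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.email you@example.com
git config user.name you
echo 'version A' > f.txt
git add f.txt && git commit -q -m 'base'
echo 'version B' > f.txt
git commit -q -am 'the change'
git format-patch --stdout HEAD~1 > ../change.mbox
git checkout -q -b other HEAD~1        # a branch without the change...
echo 'version C' > f.txt               # ...that drifted, so the patch conflicts
git commit -q -am 'conflicting edit'
if ! git am ../change.mbox; then
    git am --show-current-patch | head -n 4   # inspect the patch that failed
    git am --abort                            # give up and return to a clean tree
fi
grep 'version C' f.txt                 # prints: version C
```

After the abort, the tree is back exactly where it was before the 'git am', which is why it's the safe first move when a mailbox half-applies.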

I've also found that the fewer patches on a branch, the more likely it is to apply on the remote side.

One Last Hack

So I was helping out with the OpenZFS merge and as part of that created a bunch of changes to the FreeBSD spl in OpenZFS. I did this in the FreeBSD tree, but needed to then create a pull request. I used a variation of this technique to automate sending the branch from one place to another.

cd freebsd
git format-patch --stdout main..zstd sys/contrib/openzfs | \
    sed -e 's=/sys/contrib/openzfs/=/=g' | \
    (cd ../zfs ; git checkout master && git checkout -b zstd && \
        git am)

This nicely moved the patches from one tree to the other to make it easier for me to create my pull request. Your mileage may vary, and I've had to play around with the filter to make sure I didn't catch unwanted things in it... I've not taken the time to create a regexp for the lines that I need to apply the sed to for maximum safety, but so far I've gotten lucky that the above path isn't in any of the files I want to transport this way...

Final Thought

One could likely also use subtree merging to accomplish this. I've had issues in the past, though, when there wasn't a common ancestor. Caveat Emptor. It's not the right tool for this problem, but it often is for related problems.

Postscript

Git rebase was originally implemented this way, but with a -3 on the 'git am' command line, which is quite handy.

20200816

A 35-year-old bug in patch found in efforts to restore 29-year-old 2.11BSD

 A 35 Year Old Bug in Patch.

Larry Wall posted patch 1.3 to mod.sources on May 8, 1985. A number of versions followed over the years. It's been a faithful ally for a long, long time. I've never had a problem with patch until I embarked on the 2.11BSD restoration project. In going over the logs very carefully, I've discovered a bug that bites this effort twice. It's quite interesting to use 27-year-old patches to find this bug while restoring a 29-year-old OS...

After some careful research, this turned out to be a fairly obscure bug in an odd edge case caused by "the state of email in the 1980s," which can be relegated to the dustbin of history...

What is the bug?

Why has no one else noticed this bug? Well, it only happens when we're processing the last patch hunk in a file, that hunk only deletes lines, and the 'new-style' context diff omits the replacement text (since it's implied). Oh, and you also have to be applying the patch with -R. That's pretty obscure, eh?

I found it with the following patch from the 2.11BSD series (patch 107). It ends like so:
*** /usr/src/bin/sh/mac.h.old   Fri Dec 24 18:44:31 1982
--- /usr/src/bin/sh/mac.h       Mon Jan 18 08:45:24 1993
***************
*** 59,63 ****
  #define RQ    '\''
  #define MINUS '-'
  #define COLON ':'
-
- #define MAX(a,b)      ((a)>(b)?(a):(b))
--- 59,61 ----

which seems fairly routine and pedestrian, no? However, this hunk runs afoul of a very old bug in the patch code when one tries to reverse apply (-R) it. I got the following output:
--------------------------
|*** /usr/src/bin/sh/mac.h.old  Fri Dec 24 18:44:31 1982
|--- /usr/src/bin/sh/mac.h      Mon Jan 18 08:45:24 1993
--------------------------
Patching file usr/src/bin/sh/mac.h using Plan A...
No such line 62 in input file, ignoring
Hunk #1 succeeded at 53 with fuzz 1 (offset -6 lines).
done

Which looks odd. Why is it complaining about a line that isn't there? Why did it misapply the patch 6 lines earlier? It thinks it succeeded, but really it added back the MAX macro line too early.

Where is the bug?

While debugging this, I quickly discovered that the inverse patch file looks weird (patch will generate it for you in the .rej file):

***************
*** 59,61 ****



--- 59,63 ----
  #define RQ    '\''
  #define MINUS '-'
  #define COLON ':'
+
+ #define MAX(a,b)      ((a)>(b)?(a):(b))

Notice the blank lines; they will become important later. They shouldn't be there. The start of the patch should look like:

***************
*** 59,61 ****
--- 59,63 ----

with things snugged together. That's our first clue as to what's going wrong. Since this applies only to reverse patches, we need to make sure that pch_swap is doing what it's supposed to be doing. It's the thing that touches the internal representation when the -R flag is given to 'rewrite' the normalized form of the patch.

Setting breakpoints shows that pch_swap is producing garbage out because it's getting garbage in. For some reason, the 3 extra blank lines come into this routine for swapping. So it's not a bug in reversing patches. Which is good: this bug doesn't bite if the hunk isn't the last one in the patch file.

So what is inserting those blank lines?  A little debugging later, lands us on the following code (in FreeBSD, other implementations are similar) in another_hunk() in pch.c:
    len = pgets(true);
    p_input_line++;
    if (len == 0) {
        if (p_max - p_end < 4) {
            /* assume blank lines got chopped */
            strlcpy(buf, "  \n", buf_size);
        } else {
            if (repl_beginning && repl_could_be_missing) {
                repl_missing = true;
                goto hunk_done;
            }
            fatal("unexpected end of file in patch\n");
        }
    }
This is a little hard to follow, but it basically says that if pgets() returns 0 (which it does at the end of the file), then we try to cope. If p_max - p_end < 4, it will insert a blank line. Otherwise, it will assume the replacement text is missing if we've started looking at the replacement and it could be missing. Fairly straightforward.

p_max gets set to the largest possible extent of the patch in other code in another_hunk() when the "--- 59,61 ----" line is parsed in the original patch. In this case, p_max is 9 and p_end is 6 (it's set to p_end + 61 - 59 + 1). For normal diffs, we'd expect there to be an additional 3 lines of context here. But we don't have that with this diff since they are omitted.

So why '4' in the second 'if' in the quoted code above? What's so magic about it? Indeed, if we hack the patch to have 6 lines of context instead of 3, it applies correctly. So what gives? If we remove that entire if, the patch applies correctly as well. So that's a possible fix, but what are we losing by doing this?

The Fix

As noted, if we just remove the second if entirely and replace it with the lines from the 'else' clause, the patch applies. Now I need to justify just removing the if. An alternate fix would be to say if p_end != repl_beginning apply the heuristic, but otherwise don't. However,  I think that fix is worse because the whole if isn't needed.

The oldest patch version I can find is patch 1.3 which Larry Wall posted May 8, 1985 to mod.sources in the old USENET hierarchy (well, I guess it's all old now, so maybe the pre-reorg hierarchy). The SCCS comments in the file suggest it was started around Christmas the prior year, but I can't find any of those versions extant. The code is clearly there:
            ret = pgets(buf,sizeof buf, pfp);
            if (ret == Nullch) {
                if (p_max - p_end < 4)
                    Strcpy(buf,"  \n"); /* assume blank lines got chopped */
                else
                    fatal("Unexpected end of file in patch.\n");
            }
though I don't think that the bug actually bit that version, since it didn't try to fill in the blanks. The 2.0 version, released on Oct 27, 1986, does have code very similar to the code we use today:
            ret = pgets(buf, sizeof buf, pfp);
            if (ret == Nullch) {
                if (p_max - p_end < 4)
                    Strcpy(buf, "  \n");  /* assume blank lines got chopped */
                else {
                    if (repl_beginning && repl_could_be_missing) {
                        repl_missing = TRUE;
                        goto hunk_done;
                    }
                    fatal1("Unexpected end of file in patch.\n");
                }
            }
which has this bug for the same reason modern code has this bug...

So 'assume blank lines got chopped' is really only relevant to other types of patches (old-style context diffs I believe). One could also perhaps fix this only for old-style context and normal diffs. However, I think that's the wrong fix too. It's one of many patches that deals with 'diff going from A to B gets distorted in some predictable ways' that we no longer have to deal with.

So why was the code added? I've sent an email to Larry Wall, but I've not heard back from him (he's gone on to Perl fame and hasn't usually messed with patch issues since maybe 1990, so I'm not hopeful of a reply). Absent that, though, I can relate my limited experiences of USENET in the late 1980s that are likely relevant. Email was viewed by many mailer authors as a way to get text from point A to point B over very expensive data links. As such, there was little compunction about making minor edits to the email in transit to further that goal. The 'shar' programs of the era recognized this problem and prepended 'X' to all the lines in files that were run through them. A common issue was leading white space being deleted, and this solved that. Other issues with mailers and mail software included white space being inserted at the start of every line for replies. patch(1) itself deals with this case by trying to adjust for indented patch files, removing just enough leading white space to dig the applicable part of the diff out of these distorting influences. The notion of fuzz and other heuristics in patch cope, in part, with these difficulties. It's small wonder that, in addition to all these issues, it coped with a few lines of trailing white space being deleted, corrupting the patch.
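The shar trick is simple to sketch. The file names below are invented for the demonstration; real shar files of the era wrapped the prefixed text in a here-document fed to sed, but the round-trip is the same idea:

```shell
# Sketch of the shar-era defense: prefix every line with 'X' before
# mailing so hostile mailers can't strip leading whitespace, then
# strip the prefix on extraction. File names here are made up.
cd "$(mktemp -d)"
printf '    indented line\nplain line\n' > original
sed 's/^/X/' original > wrapped      # what shar embedded in the message
sed 's/^X//' wrapped > recovered     # what the receiving /bin/sh recreated
cmp original recovered && echo intact
```

Because every line starts with a visible character, a mailer that trims leading blanks can no longer damage the payload.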

We no longer live in a world where patches are subjected to such hostile conditions. Rather than tweak this heuristic, designed to cope with BITNET, UUCP, SMTP, VMS, VM and any number of other mailers in the wild, to deal with my case, I would suggest that we delete the heuristic as no longer relevant. Patch files are no longer subject to this level of mischief. And if they are, adding a few blank lines to the end of a corrupt patch seems like a much smaller universe of issues than having basic functionality broken. This risks breaking no well-formed patches. New-style context diffs that are padded ignore the padding. Unified diffs and the other variants patch supports don't need this padding and will ignore it. 'ed' scripts don't take this code path. And 'old' style context diffs are an extremely rare bird these days.

Side note: Old Style Context Diffs

So what program produced the so-called old-style context diffs? The earliest diff I could find that produced context diffs was in 4.0BSD. The patch program looks for "*** XX,YY" for the old style, but "*** XX,YY ****" for the new. Looking at the 4BSD sources, we see that they produce the former style. Releases through 4.2BSD included this style. Starting with 4.3BSD, the new style was produced. So any system that was 4.2BSD based had the old style, and everything since 4.3BSD has had the new style (including GNU diff, which never produced the old style as far as I can tell). All diff programs since then have produced new-style context diffs (or the newer unified diffs, which are even shorter). 4.3BSD was released in 1986, after the first release of patch but before patch 2.0, which accounts for patch understanding both variants.
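The new-style hunk header is easy to see with any modern diff (the file names are throwaways for the demonstration):

```shell
# Any modern diff emits the new-style context hunk header,
# "*** XX,YY ****", with trailing stars; the pre-4.3BSD style
# had a bare "*** XX,YY". File names here are invented.
cd "$(mktemp -d)"
printf 'a\nb\nc\n' > f1
printf 'a\nx\nc\n' > f2
diff -c f1 f2 | grep '^\*\*\* 1,3 \*\*\*\*$'   # prints: *** 1,3 ****
```

The trailing stars are what patch keys on to tell the two variants apart.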

FreeBSD Fix

I've committed the fix for FreeBSD here. It should be trivial to adapt for the other versions of patch that I've reviewed.

Conclusion

So, a minor glitch I'd noticed in my reconstruction of 2.11BSD as released led me to find a bug in patch that's been in the code for 35 years (and been a bug for at least 34 of those years). The bug is an extreme edge case: it triggers a heuristic for deleted trailing blank lines that in turn causes a problem reversing the patch, but only when the hunk is the last one at the end of a patch file and only if it just deletes lines. Still, it's rare enough in my career that I've found and fixed a 35-year-old bug that I thought I'd write this up. It's also nuts that I found it using 27-year-old patches...

Addendum

On Hacker News, I see that modern GNU patch doesn't suffer from this issue. It would appear that GNU patch corrected this some time ago; I was looking at an old version when I thought that it hadn't...

20200806

Recovering 2.11BSD, fighting the patches

Recovering 2.11BSD, fighting the patches

2.11BSD was released March 14, 1991 to celebrate the 20th anniversary of the PDP-11. It followed 2.10.1BSD, which was released in January 1989.

2.11BSD was quite the ambitious release. The high points include:
  1. The kernel logger (/dev/klog)
  2. The namei cache and argument encapsulation calling sequence
  3. readv(2)/writev(2) as system calls rather than emulation/compatibility routines
  4. Shadow password file implementation (the May 1989 4.3BSD update)
  5. A TMSCP (TK50/TU81) driver with standalone support (bootblock and standalone driver)
  6. Pronet and LH/DH IMP networking support
  7. The portable ascii archive file format (ar, ranlib)
  8. The Unibus Mapping Register (UMR) handling of the network was rewritten to avoid allocating excessive UMRs.
  9. The necessary mods to the IP portion of the networking were made to allow traceroute (which is present in 2.11BSD) to run.
  10. Long filenames in the file system
In addition, there was a completely new syscall setup. Item 10 required a full new install, restoring from a backup tape. Item 7 would be a thorn in my side.

However, because it followed so quickly on the heels of 2.10.1BSD, a number of changes came soon after the release. By November of 1994, 195 patches had been released for the tree. Well, there were more patches than that (patches 2 and 3 had several parts, and my work has uncovered a number of hidden patches along the way), but more about those later.

The Problem(s)

Well, if we have patch 195, and all 195 patches, what's the problem? Why can't you do a simple for loop with patch -R to get back to the original? And for that matter, why were no copies of the original saved?

It turns out the root of both of these problems can be summarized as 'resource shortage'. Back in the day when this was released, 100MB disks were large. The release came on 2 mag tapes that held 40MB each, so saving a copy of the original required a substantial amount of space. And it was more important to have the latest release, not the original release, for running the system.
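The naive loop the question imagines would look something like the sketch below. The patch file names are invented, and the patch command is echoed rather than run, since the rest of this post explains why the literal version can't work:

```shell
# The naive reversal: walk the patch series from 195 back to 1,
# reverse-applying each one. Paths are hypothetical; the patch
# invocation is echoed as a dry run.
cd "$(mktemp -d)"
mkdir patches
for i in 193 194 195; do touch "patches/patch.$i"; done   # stand-ins
n=195
while [ "$n" -ge 193 ]; do
    echo "patch -p0 -R < patches/patch.$n"
    n=$((n - 1))
done
```

This only works if every step in the series really is a plain patch, which, as described below, many of them are not.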

In addition to having small disks, these small systems were connected via USENET or UUCP, and those connections tended to be slow. Coupled with the small storage on the PDP-11s running 2.11BSD, the patches weren't what we think of as modern patches. The patch series started before the newer unified diff format was created, a format much more efficient than traditional context diffs. In addition, compress(1) was the only compression tool available, and it gave poor compression ratios. The UUCP transport of USENET messages also meant that the messages had to be relatively short. So the 'patches' were really an upgrade process that often included patches. But just as often, it included instructions like this from patch 4:
Fix:
        Apply the patch "/tmp/dif" (multiple files updated by it) and
        install the new scripts ":splfix.mfps" and "splfix.movb+mfps".
which included a shar file that needed to be run through /bin/sh (or unshar, if you were lucky). You then needed to follow the instructions, which included patching. Patch 14 illustrates the problem: its instructions say to run the /tmp/c created by the shar. It looks like this:
#! /bin/sh -v
cd /usr/src/usr.bin
rm lint/*
zcat PORT/lint.tar.Z | tar xf -
cd lint
rm :yyfix llib-*
cp -p /tmp/libs libs
So this is a very efficient way to remove and replace files. Often the diffs were larger than the original files, so it was more efficient to remove and replace. But this presents two problems: what files were there before? And even in this case, what files were removed? These operations were not uncommon, and they destroyed information. It's a bit like running the sausage mill backwards to rebuild the pig.

Another problem for this project is phantom patches in the first 80 patches. A number of changes appear only in the patches that come with the catch-up kit. These aren't a surprise, as a number of patches going backwards had slight, unexpected line-number anomalies in them. After patch 80, patch generation appears to be better managed, but at the time of this writing, I haven't tried to reapply all the patches to get back to the p195 tape from the reconstructed release tape. I also have not audited the comp.bugs.2bsd newsgroup between the March 1991 release and the November 1994 p195 tape. Given the surprises so far (tftp and tftpd were updated in May 1991, patches were posted to comp.bugs.2bsd, no formal patch was released, and yet the change does appear in the catch-up patches), additional discoveries like this cannot be ruled out.

The Path Forward

Taken at face value, it's hopeless. There's no way to definitively reconstruct the lost information. But there are some glimmers of hope. First, there weren't a lot of files deleted like this: lint, pcc, ld, ranlib and a few others were removed. However, what's missing can be rather critical. Next, we have the prior release (2.10.1BSD). A good first approximation of what was in the release is what was in the prior release; many files don't change often in the 2BSD line. So we have some data present there, even if some was destroyed.

There were also a number of patches. These can provide data about the missing files. There's all the patches in the 2.11BSD series, as well as a number of patches posted to comp.bugs.2bsd after 2.10.1BSD was released. We know from the release notes that these were included in the 2.11BSD release.

Next, we have 4.2BSD, 4.3BSD and even 4.4BSD and the SCCS files that go along with them. This means that we can see changes that were going on in the 4BSD series, and we know much code came from there in 2.11BSD, both before and after the release.

Next, we have dates. We are fortunate that all the files are dated in the tree. This means we have a way of testing if we've missed anything in our reconstruction. If we reconstruct something that's dated after the release of 2.11BSD, we have a very good reason to believe that we've missed something.
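That date check is easy to mechanize. Here's a toy version, with invented file names and a marker file standing in for the release date:

```shell
# Toy version of the date test: anything stamped after the 2.11BSD
# release date (March 14, 1991) is suspect. File names are invented.
cd "$(mktemp -d)"
touch -t 199103140000 release-date      # marker for the release date
touch -t 199101010000 ok.c              # plausibly part of the release
touch -t 199406150000 suspect.c         # post-release: a missed patch?
find . -type f -newer release-date      # prints only ./suspect.c
```

Run over a candidate reconstruction, anything this prints is evidence of a patch that hasn't been fully unapplied.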

There was also a catch-up kit, produced around patch 80. It contained a number of commands to re-remove what should have been removed, as well as patch everything up. This gives us a second way of checking: can we apply this kit to get to patch 80? Does it work? If we replay the individual patches, we should get the same thing, and it should match what we got unapplying the patches back from 195. A word of caution: as with all other things around this project, we must always be open to the possibility of a mistake in the catch-up kit. There are at least a few files that it tries to remove that aren't necessarily there.

Finally, we know that the system was released as binaries. Which means that the system built. So we can test the reconstruction by compiling it. We should be able to build the final system. But we need to do that with the final system, so some way must be found to bootstrap it for this test. I'll talk more about that in another blog.

The Slog Backward

The slog backwards proceeded in fits and starts. For the first 20 or so patches (that is, patches 177-195), it was fairly straightforward, and I was able to use them to develop the framework to proceed. It wasn't until I hit the a.out binary format changes that I had problems (looking back, the first of these is patch 176).

The Framework

I settled on a fairly simple framework going backwards. There are three types of patches in the archive. First, there are the pure patches. These have a simple patch set to apply, and maybe some verbal description of commands to run. Often, these commands were just the minimal amount to rebuild, so you didn't have to suffer through a day-long compile. Sometimes, they included sneaky instructions. Patches 89-99 all contain the following text:
The /GENALLSYS script is obsolete and has been removed
so in addition to applying the patches, the script needed to be removed. To complicate things, some patches were generated from the root directory, while others were generated in a leaf node. To this end, when the script detects a simple patch, it tries to locate a hints/patch#.setup script. If one is found, that script is sourced. Setup scripts are expected to be idempotent so we can reuse them for the forward trip. They all cd to where the patch was generated. After the setup, if any, the mk211bsd script will apply the patch.
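The dispatch this framework uses might be sketched as below. The hint-file names follow the convention described here, but the logic is a simplified stand-in for what mk211bsd actually does, with the reverse patch echoed rather than run:

```shell
# Simplified sketch of the backward-step dispatch: an unpatch hint
# overrides everything; otherwise source the setup hint (if any) and
# reverse-apply. The patch command is echoed as a stub.
cd "$(mktemp -d)"
mkdir hints patches
unapply() {
    n=$1
    if [ -x "hints/patch$n.unpatch" ]; then
        "hints/patch$n.unpatch"              # patch needs special handling
    else
        if [ -f "hints/patch$n.setup" ]; then
            . "hints/patch$n.setup"          # cd to where patch was generated
        fi
        echo "patch -R < patches/patch$n"    # plain reverse apply (stubbed)
    fi
}
printf '#!/bin/sh\necho "custom unpatch for 118"\n' > hints/patch118.unpatch
chmod +x hints/patch118.unpatch
unapply 118     # runs the hint script
unapply 119     # falls back to a plain reverse patch
```

Keeping the hints idempotent is what lets the same scripts serve the forward replay later.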

However, when there's other things to do, that won't work. So, there's a second way to assist. If there's a hints/patch#.unpatch, then that will be run and it is responsible for moving the patch backwards. About 20 patches need this assistance. For example, patch 118 replaces ucb/Mail with a port from 4.3. So, I need to reconstruct what was there. In this case, I copy the ucb/Mail files from 2.10.1 and then check for patches in comp.bugs.2bsd (there aren't any) and in the patch stream (patch 10 tweaks it). So, I have to reapply all those patches in the unpatch script.

The second type of file is the shar file. Here, one has to run the shar file through /bin/sh (or unshar) and use the extracted files to do something. Often there's a short script to run and a patch to apply, but other times it's more complicated. In these cases, you must provide a hints/patch#.unshar file. This file takes care of unsharing things and running whatever you need to do to unapply the patch. This varies a lot. In some cases, you extract a patch (or set of patches) and reverse-apply them. In other cases, you need to remove files. In still others, you need to snag them from 2.10.1 and touch them up (as we did for ucb/Mail).

The final type is a tarball. These files are treated like patches, with the usual unpatch protocol. However, the unpatch script has to know how the tar file encoded its patches.

Interesting Patches

There's 195 patches (and a few sub-patches). Describing all of them would be quite tedious. Only about 80 hints are needed (and a few additional files for reconstructed fixups). I'll describe some of the more interesting ones briefly here. Most of the interesting ones are in the last 50 or so. And by interesting, I mean ones that destroyed information such that it had to be reconstructed or that otherwise affected the reconstruction.

Patch 185 replaces m4, so we have to snag the one from 2.10.1BSD to replace it, as well as unapply a patch in the shar file. Patch 115 did a fixup on the Makefile, so we have to reapply it so that patch 115 will unapply correctly.

Patch 184 fixes the makefiles to use pipelines instead of temporary files now that the assembler can cope with pipelines. At the same time, it removed a few obsolete compatibility system calls that need to be added back from 2.10.1BSD. Adding them back messes up the directory times, which make depends on, so workarounds have to be deployed in build2.sh, the bootstrap script (make in 2.11BSD assumes times in the future mean the directory is up to date and doesn't need to be rebuilt, while newer makes ignore that and always descend into the directory).

Patch 180 updates symcompact.c and strcompact.c by replacing them entirely. These files weren't in 2.10.1BSD, but came in with patch 172, so needed to be recovered from there. It also adds symdump.c which needs to be removed. There's no data loss here, but it illustrates the need to look at all the patches to see what might need to be recovered (and what doesn't).

Patch 178 removes an unused header and fixes up another one. The headers were resynchronized in patch 175, so a small fixup is needed here. I still need to investigate this, and other 'fixup' patches. One unknown is whether 2.11BSD shipped with /usr/include and /usr/src/include unsynchronized or not. The reconstruction tries to get the right ones in each place, but uses the /usr/include copies for /usr/src/include when we can't otherwise reconstruct them.

Patches 158-176 remove the 8-character limit on symbol names in programs. These are quite disruptive, but for the most part just large patches. ranlib(1) was grabbed from 2.10.1 and patched to cope with the new archive format to undo these patches. ld(1) needed similar work (and once I started rebuilding things, I had to fix it so it would compile). It also needed some fixes from comp.bugs.2bsd. Initially, I'd kept them separate, but I eventually needed to merge them together. ranlib(1) and ld(1) represent the largest effort in retro-programming to reconstruct (code was cut and pasted from other programs that survived and minimally changed to be functionally correct).

Patches 151 and 152 rework the assembler. A number of patches to as were posted to comp.bugs.2bsd that needed to be applied to 2.10.1's assembler to make things work out. A couple of minor tweaks were needed after all these changes were applied, though, to make the patches work out.

Patch 142 unapplies cleanly, but then we need to unapply changes to all the kernel config files. This is a recurring issue for the generated files, so some fancy scripting is needed to apply them to all the kernel config directories. The same goes for patches 124, 121, 93 (which has to remove some directories), 84, 83, 72, 42, 36, 4, and 2. There are a number of issues with patching the kernel, not least of which is that overlay structures were often hacked on the fly, and preserving them is hard. The restoration has only tried to get the GENERIC and possibly KAZOO kernel configs correct, and has largely ignored the others. I'll do a full blog post on all the issues to detail what is and isn't recovered.

Patch 132 brings in named, so it needed to be recovered from 2.10.1, plus patches from patch 106.

Patch 123 is actually missing a part. It updates the documentation for how to install the system, which means we need to apply some kind of fixup. But all we know for sure is that the SCCS id changed, so that's all we fix up. If I find more data to support other changes, I'll update to include that, but so far nothing has surfaced.

Patch 118 replaces ucb/Mail with the one from 4.3BSD. Revert to 2.10.1BSD's version and apply a patch from patch 10. There's no publicly posted patches to 2.10.1BSD for ucb/Mail.

Patch 80 is the catch-up kit patch. A lot of files were added from USENET. Undoing it is straightforward; making it go forward will be tough, but we have the files at least. This is both a blessing and a curse for the project. On the one hand, we have a second way of cross-checking things. On the other, the cross-checks reveal many problems, including an updated kermit being missing entirely (though, to be fair, /usr/src/new and /usr/src/local weren't considered part of the system until patch 195, so prior to that, patches to this part of the tree were irregular).

Patch 40 is an announcement of corrupt files in the original release in /usr/games. I've chosen to restore them to what was intended for the original tape.

Patch 17 massively reworked pcc. It's not quite right in the reconstruction yet. However, since the 'redo catch-up kit' test fails to delete some expected files, what shipped in 2.11BSD wasn't quite right either. It may never be right, and that may not matter since it didn't work anyway.

Patch 14 fixes lint. It was totally broken in 2.11 (but worked in 2.10.1). Undoing it may also be incorrect for a similar reason as patch 17. Since neither one of these worked, it may be quite hard to know what 2.11 shipped with. A case could be made, though, that it doesn't matter...

Patch 2 removed tmscpboot.s. The rest of the patch undoes easily, but tmscpboot.s was new in 2.11BSD. We know it is derived from tkboot.s, which comes from Ultrix-11, but I've not yet been able to tweak it to work...

Current Status

At this point, I've reconstructed a possible release tape. A series of them, actually. All of them fail the tests set forth above (though in fewer and fewer ways each iteration), and I'm able to use the different tests to suggest fixes. I've recovered 5 lost patches (which I write about elsewhere), and I believe there are at least 5 more, based on the attempts to apply the catch-up kit. I discovered the first 5 based entirely on bad dates. The current reconstruction is decent (maybe 30-40 files still aren't quite right, and a similar number are reconstructions that work, but it's impossible to know if they are missing bug fixes). The tmscpboot stuff is still missing, and there's a number of minor fixups that need to be reconciled with the known missing patches. It's all very careful, detailed work, so it may take some time to work through it.