20200816

A 35-year-old bug in patch found in efforts to restore 29 year old 2.11BSD

 A 35 Year Old Bug in Patch.

Larry Wall posted patch 1.3 to mod.sources on May 8, 1985. A number of versions followed over the years. It's been a faithful ally for a long, long time. I've never had a problem with patch until I embarked on the 2.11BSD restoration project. In going over the logs very carefully, I've discovered a bug that bites this effort twice. It's quite interesting to use 27 year old patches to find this bug while restoring a 29 year old OS...

After some careful research, this turned out to be a fairly obscure bug in an odd edge case caused by "the state of email in the 1980s," which can be relegated to the dustbin of history...

What is the bug?

Why has no-one else noticed this bug? Well, it only happens when we're processing the last patch hunk in a file, and that patch hunk only deletes lines and the 'new-style' context diff omits the replacement text (since it's implied). Oh, and you also have to be doing a -R patch as well. That's pretty obscure, eh?

I found it with the following patch from the 2.11BSD series (patch 107). It ends like so:
*** /usr/src/bin/sh/mac.h.old   Fri Dec 24 18:44:31 1982
--- /usr/src/bin/sh/mac.h       Mon Jan 18 08:45:24 1993
***************
*** 59,63 ****
  #define RQ    '\''
  #define MINUS '-'
  #define COLON ':'
-
- #define MAX(a,b)      ((a)>(b)?(a):(b))
--- 59,61 ----

which seems fairly routine and pedestrian, no? However, this hunk runs afoul of a very old bug in the patch code when one tries to reverse apply (-R) it. I got the following output:
--------------------------
|*** /usr/src/bin/sh/mac.h.old  Fri Dec 24 18:44:31 1982
|--- /usr/src/bin/sh/mac.h      Mon Jan 18 08:45:24 1993
--------------------------
Patching file usr/src/bin/sh/mac.h using Plan A...
No such line 62 in input file, ignoring
Hunk #1 succeeded at 53 with fuzz 1 (offset -6 lines).
done

Which looks odd. Why is it complaining about a line that isn't there? Why did it misapply the patch 6 lines earlier? It thinks it succeeded, but really added back the MAX macro line too early.

Where is the bug?

While debugging this, I quickly discovered that the inverse patch file looks weird (patch will generate it for you in the .rej file):

***************
*** 59,61 ****



--- 59,63 ----
  #define RQ    '\''
  #define MINUS '-'
  #define COLON ':'
+
+ #define MAX(a,b)      ((a)>(b)?(a):(b))

Notice the blank lines; they will become important later. They shouldn't be there. The start of the patch should look like:

***************
*** 59,61 ****
--- 59,63 ----

with things snugged together. That's our first clue as to what's going wrong. Since this applies only to reverse patches, we need to make sure that pch_swap is doing what it's supposed to be doing. It's the thing that touches the internal representation when the -R flag is given to 'rewrite' the normalized form of the patch.
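As a rough illustration of what reversing a hunk involves, here is a minimal sketch in C. It is not the real pch_swap: the struct `hunk_ranges` and the functions `swap_ranges` and `swap_marker` are hypothetical names invented for this example, and the real routine also has to move the stored lines themselves around.

```c
/* Hypothetical, simplified view of a context-diff hunk header. */
struct hunk_ranges {
    long old_first, old_last;   /* from the "*** 59,63 ****" line */
    long new_first, new_last;   /* from the "--- 59,61 ----" line */
};

/* Reversing a hunk exchanges the roles of the old and new ranges. */
void
swap_ranges(struct hunk_ranges *h)
{
    long first = h->old_first, last = h->old_last;

    h->old_first = h->new_first;
    h->old_last = h->new_last;
    h->new_first = first;
    h->new_last = last;
}

/* Deleted lines become added lines and vice versa. */
char
swap_marker(char marker)
{
    switch (marker) {
    case '-':
        return '+';
    case '+':
        return '-';
    default:
        return marker;  /* ' ' (context) and '!' (changed) stay as-is */
    }
}
```

Applied to the mac.h hunk, the ranges 59,63 and 59,61 trade places and the two '-' lines become '+' lines, which matches the headers and markers of the inverse patch shown above.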

Setting breakpoints shows that pch_swap is producing garbage out because it's getting garbage in. For some reason, the 3 extra blank lines come into this routine for swapping. So it's not a bug in reversing patches. Which is good: this bug doesn't bite if it isn't the last hunk in the patch file.

So what is inserting those blank lines?  A little debugging later, lands us on the following code (in FreeBSD, other implementations are similar) in another_hunk() in pch.c:
    len = pgets(true);
    p_input_line++;
    if (len == 0) {
        if (p_max - p_end < 4) {
            /* assume blank lines got chopped */
            strlcpy(buf, "  \n", buf_size);
        } else {
            if (repl_beginning && repl_could_be_missing) {
                repl_missing = true;
                goto hunk_done;
            }
            fatal("unexpected end of file in patch\n");
        }
    }
This is a little hard to follow, but it basically says that if pgets() returns 0 (which it does at the end of the file), then we try to bail out. If p_max - p_end < 4, it will insert a blank line. Otherwise, it will assume the replacement text is missing if we've started looking at the replacement and it could be missing. Fairly straightforward.

p_max gets set to the largest possible extent of the patch in other code in another_hunk() when the "--- 59,61 ----" line is parsed in the original patch. In this case, p_max is 9 and p_end is 6 (p_max is set to p_end + 61 - 59 + 1), so p_max - p_end is 3, which is less than 4, and the blank-line heuristic fires. For normal diffs, we'd expect there to be an additional 3 lines of context here. But we don't have that with this diff since they are omitted.
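To make the arithmetic concrete, here is a minimal sketch in C that models only the end-of-file decision. The enum and the function `decide_at_eof` are hypothetical names invented for illustration; the `p_max - p_end < 4` test and the two variable names are the only parts taken from the patch source.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not part of patch itself. */
enum eof_decision { PAD_BLANK_LINE, REPL_MISSING, FATAL_EOF };

/*
 * Models the choice another_hunk() makes when pgets() returns 0.
 * p_end is the last hunk line stored so far; p_max is the largest
 * extent the hunk could reach, computed from the "--- N,M ----" line.
 */
enum eof_decision
decide_at_eof(long p_end, long p_max, bool repl_beginning_seen,
    bool repl_could_be_missing)
{
    if (p_max - p_end < 4)
        return PAD_BLANK_LINE;  /* "assume blank lines got chopped" */
    if (repl_beginning_seen && repl_could_be_missing)
        return REPL_MISSING;    /* replacement text is implied */
    return FATAL_EOF;
}
```

With the values from this hunk, decide_at_eof(6, 9, true, true) takes the first branch because 9 - 6 is 3, so a blank line is invented even though the replacement text was legitimately omitted. Any gap of 4 or more would have reached the second test instead.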

So why '4' in the second 'if' in the quoted code above? What's so magic about it? Indeed, if we hack the patch to have 6 lines of context instead of 3, it applies correctly. So what gives? If we remove that entire if, the patch applies correctly as well. So that's a possible fix, but what are we losing by doing this?

The Fix

As noted, if we just remove the second if entirely and replace it with the lines from the 'else' clause, the patch applies. Now I need to justify just removing the if. An alternate fix would be to apply the heuristic only if p_end != repl_beginning, and otherwise not. However, I think that fix is worse because the whole if isn't needed.
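The two candidate fixes can be compared with a small model in C. This is a sketch under stated assumptions, not the committed change: the enum and both function names are hypothetical, and repl_beginning is modeled as a plain index that is 0 until the "--- N,M ----" line has been seen.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not part of patch itself. */
enum eof_result { PAD_BLANK_LINE, REPL_MISSING, FATAL_EOF };

/* Proposed fix: drop the heuristic and keep only the old 'else' body. */
enum eof_result
eof_without_heuristic(long repl_beginning, bool repl_could_be_missing)
{
    if (repl_beginning && repl_could_be_missing)
        return REPL_MISSING;
    return FATAL_EOF;
}

/*
 * Alternate fix: keep the heuristic, but skip it when end of file comes
 * immediately after the "--- N,M ----" line (p_end == repl_beginning).
 */
enum eof_result
eof_alternate(long p_end, long p_max, long repl_beginning,
    bool repl_could_be_missing)
{
    if (p_end != repl_beginning && p_max - p_end < 4)
        return PAD_BLANK_LINE;
    if (repl_beginning && repl_could_be_missing)
        return REPL_MISSING;
    return FATAL_EOF;
}
```

In this model both versions report the replacement text as missing when the file ends right after the "--- N,M ----" line. They differ only for a truncated hunk that stops partway through the replacement text: the alternate still pads it with blank lines, while the simpler version does not invent any.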

The oldest patch version I can find is patch 1.3 which Larry Wall posted May 8, 1985 to mod.sources in the old USENET hierarchy (well, I guess it's all old now, so maybe the pre-reorg hierarchy). The SCCS comments in the file suggest it was started around Christmas the prior year, but I can't find any of those versions extant. The code is clearly there:
            ret = pgets(buf,sizeof buf, pfp);
            if (ret == Nullch) {
                if (p_max - p_end < 4)
                    Strcpy(buf,"  \n"); /* assume blank lines got chopped */
                else
                    fatal("Unexpected end of file in patch.\n");
            }
though I don't think that the bug actually bit that version since it didn't try to fill in the blanks. The 2.0 version, released on Oct 27, 1986 does have code very similar to the code we use today:
           ret = pgets(buf, sizeof buf, pfp);
           if (ret == Nullch) {
               if (p_max - p_end < 4)
                   Strcpy(buf, "  \n");  /* assume blank lines got chopped */
               else {
                   if (repl_beginning && repl_could_be_missing) {
                       repl_missing = TRUE;
                       goto hunk_done;
                   }
                   fatal1("Unexpected end of file in patch.\n");
               }
           }
which has this bug for the same reason modern code has this bug...

So 'assume blank lines got chopped' is really only relevant to other types of patches (old-style context diffs I believe). One could also perhaps fix this only for old-style context and normal diffs. However, I think that's the wrong fix too. It's one of many patches that deals with 'diff going from A to B gets distorted in some predictable ways' that we no longer have to deal with.

So why was the code added? I've sent an email to Larry Wall, but I've not heard back from him (he's gone on to Perl fame, and hasn't usually messed with patch issues since maybe 1990, so I'm not hopeful of a reply from him). Absent that, though, I can relate my limited experiences of USENET in the late 1980s that are likely relevant. Email was viewed by many authors as a way to get text from point A to point B over very expensive data links, sometimes. As such, there was little compunction about making minor edits to the email that was sent to facilitate these goals. The 'shar' programs of the era recognized this problem and pre-pended X to all the lines in files that were run through them. A common issue was leading white space being deleted, and this solved that. Other issues with mailers, and mail software, included white space being inserted at the start of every line for replies. patch(1) itself deals with this case by trying to adjust for indented patch files by removing just enough leading white space to dig the applicable part of the diff out of these distorting influences. The notion of fuzz and other heuristics in patch cope, in part, with these difficulties. It's small wonder that in addition to all these issues, it coped with a few lines of trailing white space being deleted, corrupting the patch.

We no longer live in a world where patches are subjected to such hostile conditions. Rather than tweak this heuristic designed to cope with BITNET, UUCP, SMTP, VMS, VM and any number of other mailers in the wild to deal with my case, I would suggest that we should delete the heuristic as no longer relevant. Patch files no longer are subject to this level of mischief. And if they are, adding a few blank lines to the end of a patch that's corrupt seems like a much smaller universe of issues than having basic functionality broken. This runs no risk of breaking well-formed patches. The new-style context diffs that are padded ignore this padding. Unified diffs and the other variants patch supports don't need this padding and will ignore it. 'ed' scripts don't take this code path. 'Old' style context diffs are an extremely rare bird these days.

Side note: Old Style Context Diffs

So what program produced the "so called" old-style context diffs? The earliest diff I could find that produced context diffs was in 4.0BSD. The patch program looks for "*** XX,YY" for old style, but "*** XX,YY ***" for new. Looking at 4BSD sources, we see that they produce the former style. Releases through 4.2 included this style. Starting with release 4.3BSD, the new style was produced. So any system that was 4.2BSD based had the old style, and everything since 4.3BSD has had the new style (including gnu diff, which never produced the old style that I can tell). All diff programs since then have produced new-style context diffs (or the newer unified diffs that are even shorter). 4.3BSD was released in 1986, after the first release of patch, but before 2.0 which accounts for its understanding both variants.
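The difference between the two header styles is small enough to show in a few lines of C. This is a simplified sketch, not the detection logic patch actually uses: the enum and `classify_hunk_header` are hypothetical names, and the check only looks at whether the line range is followed by trailing asterisks.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical names for illustration; not part of patch itself. */
enum hunk_style { NOT_A_HUNK_HEADER, OLD_STYLE_CONTEXT, NEW_STYLE_CONTEXT };

enum hunk_style
classify_hunk_header(const char *line)
{
    const char *p;

    if (strncmp(line, "*** ", 4) != 0)
        return NOT_A_HUNK_HEADER;
    p = line + 4;
    if (!isdigit((unsigned char)*p))
        return NOT_A_HUNK_HEADER;   /* e.g. the "*** file date" line */

    /* Skip the "XX" or "XX,YY" line range. */
    while (isdigit((unsigned char)*p) || *p == ',')
        p++;

    /* New style closes the range with asterisks; old style just ends. */
    if (strncmp(p, " ***", 4) == 0)
        return NEW_STYLE_CONTEXT;
    if (*p == '\0' || *p == '\n')
        return OLD_STYLE_CONTEXT;
    return NOT_A_HUNK_HEADER;
}
```

Given the header from the mac.h hunk, "*** 59,63 ****", this returns NEW_STYLE_CONTEXT, while a bare "*** 59,63" is classified as OLD_STYLE_CONTEXT. The file header line and the row of asterisks that separates hunks are both rejected.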

FreeBSD Fix

I've committed the fix for FreeBSD here. It should be trivial to adopt for other versions of patch that I've reviewed.

Conclusion

So, a minor glitch I'd noticed in my reconstruction of 2.11BSD as released led me to find a bug in patch that's been in the code for 35 years (and been a bug at least 34 of those years). The bug is an extreme edge case that triggers a heuristic for deleted trailing blank lines that in turn causes a problem reversing the patch, but only when it's the last one at the end of a patch file and only if it just deletes lines. Still, it's so rare for me to find and fix a 35 year old bug that I thought I'd write this up. It's also nuts that I found this using 27 year old patches...

Addendum

On Hacker News, I see that modern GNU patch doesn't suffer from this issue. It would appear that GNU patch corrected this some time ago. I was looking at an old version when I thought that it hadn't fixed it...
