20200426

Thanks for the tips!

Quick Update

Warren Toomey pointed me at the Kermit in 4.3BSD: It was 4C(057). But it was a trimmed copy of C-Kermit: just the unix files were there. I don't know how I missed it. This version was copied far and wide, but I didn't have any 4.3BSD trees extracted.

Warren Toomey also pointed me to the net.sources group. I'd looked in the comp.sources.* groups, but found nothing. Turns out C-Kermit was posted to net.sources (twice). There was a warning not to post it, but it came too late. There was much pent-up demand for C-Kermit 4, it seems. Both 4.0(025) and 4.2(030) were posted. I've not looked closely, but there are some minor differences between this 4.0 and my copy, maybe due to more glitches in the 4.0 conversion on the DECUS tape I found. I'll have to see if any of the other differences matter. I've not looked at the 4.2 to see if it matches or not.

I got another copy of an early 4F version, but it may be the OS/2 version. I need to study it some more. It says it was 4F(088). I'll look into it more when I have time.

Another friend suggested a search that led me to another site that has DECUS tapes online. So far I've found 4C(053) and 4C(058). The latter is awesome, if complete, because the only 058 releases I've been able to find are for the Amiga, with all other files stripped. There are files for 'i', 'm', 'u' and 'v' there, which I think are Amiga, Macintosh, Unix and VMS respectively. I haven't tried to build it, so I don't know if there are issues with conversion or not. And 053 is awesome too because it lets me look at the DECUS version to see what DEC added, and what later made it into C-Kermit. I feel kinda bad about finding these, since I wrote a script to pull files down and it kinda ran amuck before I noticed and could fix it. Still, a pure 053 gets me one step closer to recreating 4C(052) for the BSW Venix binary.

I also found a few more 5A versions, but I'm unsure what I'll do with those... There's a lot of them out there, and many are hacked for this version or that of Unix or whatever.

So if you have any version not listed here, please let me know.

20200422

Finding Kermit 4x

Unix C-Kermit 4x Versions


As part of my efforts to reconstruct my Venix system to sources, I recently went on a hunt for old versions of Unix Kermit. There's a fair number of them, but not many from the kermit project web site. Prior to version 4, unix kermit was a command line only program. These versions are adequately represented in the Unix archive, mostly because they had a funky name that C-Kermit didn't overwrite. I've also ignored gkermit, which appears to be a command line version of C-Kermit for latter-day Linux systems (though it retained support for the common early systems).

So I concentrated only on the 'Version 4' series of releases. In the end I've found 2 missing versions that I can say with good confidence are the final version of them, three preliminary or interim versions and one modified version of a preliminary version. I also found a great diversity of 5A betas, which I've not written up here. The early history of the 4x releases is missing, which is why I went on this hunt. I like a challenge.

UPDATE: I'll be posting an addendum blog in the coming days because people have sent me pointers to other versions. Stay tuned.

Known History of Kermit Versions

Here's a brief table of relevant 4x versions. I've omitted the earlier versions (they are available at the kermit archive) and the later versions (they basically are too).

Version    Date       Comments
4.0        Missing    No known copies
4.0(025)   5 Feb 85   First release of Unix Kermit version 4
4.2(030)   5 Mar 85   In kermit archive and on 1987 Usenix tape
4C(050)    30 May 85  First enumerated beta after 4.2 and version name change
4C(052)    18 Jun 85  Boston Software Works Venix/Kermit
4C(053)    21 Jun 85  DECUS VMSLT Venix/Pro sources labeled 4C(053)+1 (DEC changes)
4C(056)    12 Jul 85  Testing release, no copies in Kermit Archive
4C(058)    19 Mar 86  Official release, no copies in Kermit Archive
4D(060)    18 Apr 86  Testing release, no copies in Kermit Archive
4D(061)    8 Sep 86   Testing release, no copies in Kermit Archive
4E(067)    14 Sep 87  Testing release, no copies in Kermit Archive
4E(070)    29 Jan 88  10th Edition Unix and iubioarchive copies
4E(072)    24 Jan 89  Official release in kermit archive
4F(095)    31 Aug 89  Official unreleased release in kermit archive

Kermit Version Naming Convention

C-Kermit basically started with version 4.0 for a variety of historical reasons. 4 was the first one that had a command line built into it. 4.0 went out to a limited group, and 4.2 was released as a wider-spread beta. BSD 4.2 was coming out at this time, so Kermit changed its naming convention to using a letter: 4C, 4D, etc. The number in () is a change count. Older systems didn't conform to a 'minor' number, but instead used a change count, which was either an actual count of changes, or of builds, or some other incrementing counter since the project started or was rewritten. So 4C(050) is the 50th release (or change) and comes before 4C(051). The Kermit project used the 'release number' convention where 'release' is poorly defined during this time period, so not all of the 'release numbers' documented in change logs may have been put on an FTP site.
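As a rough illustration, the naming convention can be pulled apart mechanically. This is only a sketch: the regular expression and the function names are made up for this example, and the pattern is inferred from the banners quoted in this post rather than from any Kermit documentation.

```python
import re

# Matches the "4C(052)" or "4.2(030)" part of a C-Kermit banner.
# Assumption: the pattern is inferred from the banners quoted in this post.
VERSION_RE = re.compile(r"(\d+(?:\.\d+)?[A-Z]?)\((\d+)\)")

def parse_version(banner):
    """Return (name, edit), e.g. ("4C", 52), or None if nothing matches."""
    m = VERSION_RE.search(banner)
    if m is None:
        return None
    return m.group(1), int(m.group(2))

def sort_by_edit(banners):
    """Order banners by the change count in parentheses."""
    return sorted(banners, key=lambda b: parse_version(b)[1])

banners = [
    "C-Kermit, 4D(061) 8 Sep 86",
    "C-Kermit 4.2(030) PRERELEASE # 2, 5 March 85",
    "C-Kermit, 4C(053)+1 21 Jun 85",
    "C-Kermit 4.0(025) PRERELEASE TEST VERSION, 5 Feb 85",
]
for b in sort_by_edit(banners):
    print(parse_version(b), b)
```

Sorting on the change count alone works here because the counter keeps incrementing across the 4.x, 4C, 4D and 4E names.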

Venix from Boston Software Works

The Boston Software Works Rainbow Venix, the driver of my obsessions for the last few years, included a version of kermit. On startup it printed
C-Kermit, 4C(052) 12 Jun 85
which I thought would be easy enough to find. Little did I know it would be trouble. When I first was looking at this two or three years ago, I found the following on the Columbia Kermit Archive:

C-Kermit 4.0 (5 Feb 1985), the first interactive version, through 4D are missing.

which was discouraging. I let the matter sit there. I noticed a bit later this news item: C-Kermit 4.2 that talked about 4.2 being rediscovered. So I was able to get 4.2 and 4E, but not 4F running on my Rainbow which helped (4E is an improvement over 4C that BSW shipped, 4.2 is about the same), but it left me curious. 4E was right on the edge of size, and 4F appears to be just a bit too big to run, so I've put off trying to puzzle out if I can get that working. 5A won't even link.
We see from the above table that this is an unofficial release, so we may never get the actual bits for it. Let's see what the best we can find is, however.

Side Tracked: Hunting for a Xenix-11 Tape

Misspiggy, a PDP-11/70 that Microsoft donated to Living Computer Museum in Seattle Washington was recently demoed running V7 Unix and the adventure game. In an offhand comment, they said they wanted to run Xenix-11 on this, since that's what Microsoft ran on it. They were looking for a copy to run since they apparently didn't have one. There's a catalog entry at LCM for a XENIX tape, but that's not surfaced.

But Warren Toomey over at TUHS shared with me a tape he thought might contain XENIX (it had XENIX in the filename). Turns out that tape was just two files of a V6 system that was otherwise unremarkable, and a copy of Venix that never booted (most likely it was made / hacked together on said V6 system since the filesystem was V6, but Venix is a V7/System III port that uses the newer filesystem layout). I was disappointed that it didn't pan out, but Warren shared with me some Venix related files since he knew of my work...

A new archive of PRO Venix

One of the archives appeared to be Venix for the PRO that DECUS had distributed after VenturCom abandoned its support for the Professional line of DEC personal computers. It was a series of 22 diskettes. I've not looked through them all, but I did find that two of the diskettes had Kermit on them! One binary and one source! Extracting the files led to this discovery in ckcmai.c:
char *versio = "C-Kermit, 4C(053)+1 21 Jun 85";
which is quite close.

The changes that were included were a number of changes for VMS, support for US Robotics 212 modems, better support for some DEC modems, some technical corrections for auto dialing, some cleanup of help messages, many tweaks to cope with Venix's Code Mapping feature (kinda like overlays, but somehow different than traditional overlays), better handling of hangup and various debugging fixes. At least, that's according to the file DECnotes that was included on the exe diskette.

So this is a hacked version of a beta version of 4C. So we're not there yet, but closer than before. Let's go looking on the internet for more.

Google to the Rescue

c-kermit has rather unique filenames. In order to cope with the realities of a PDP-10 with one big giant directory of files (which also helped the master tapes it produced), it developed a naming scheme where each of the first few letters means something. K11 was the MACRO-11 version for the PDP-11, K10 was the BLISS and MACRO-10 version, CP4 was for CP/M, BBC was for the BBC Acorn, etc (there's 129 of them on one of the full KERMIT tapes I found). CK is the code for the Unix C version, later the general C version. The third letter further specified what system: c for all, u for unix, v for vms, 9 for OS9, i for Amiga, etc.
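To make the scheme concrete, here is a small sketch that decodes a filename using only the prefixes and system letters named above. The tables are deliberately incomplete, the function name is made up for this example, and anything not listed in the text is reported as unknown rather than guessed.

```python
# Prefixes and system letters taken from the paragraph above. The tables are
# deliberately incomplete: the full Kermit tapes carry over a hundred prefixes.
PROGRAM_PREFIXES = {
    "k11": "MACRO-11 version for the PDP-11",
    "k10": "BLISS and MACRO-10 version",
    "cp4": "CP/M",
    "bbc": "BBC Acorn",
}
CK_SYSTEM_LETTERS = {
    "c": "all systems",
    "u": "Unix",
    "v": "VMS",
    "9": "OS9",
    "i": "Amiga",
}

def describe(filename):
    """Describe a Kermit distribution file from the first letters of its name."""
    name = filename.lower()
    if name.startswith("ck") and len(name) > 2:
        # Third letter narrows the C version down to a target system.
        system = CK_SYSTEM_LETTERS.get(name[2], "unknown system")
        return "C-Kermit, " + system
    return PROGRAM_PREFIXES.get(name[:3], "unknown prefix")

for fn in ["ckcmai.c", "ckuusr.c", "CKVTIO.C", "k11abc.mac"]:
    print(fn, "->", describe(fn))
```

The last filename in the loop is invented purely to exercise the K11 prefix; it is not a real file from the tapes.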

Next, there are a number of different versions stamped in different files. However, the one that's most interesting is in ckcmai.c. Since version 4C, it's been the name of the file where the version printed at startup lives. Now 'ckcmai.c' is a fairly unique string, and plugging it into Google gives a lot of results. There's a lot of them, but it's easy to churn through them all. I've omitted 5.x and newer. There's about a dozen different 5A betas that can be found this way. I've also omitted released versions we already have (4E and 4F, even variants that differ from what's available at kermitproject.org).

char *versio = "C-Kermit, 4D(061) 8 Sep 86";
V10 Unix also has this
char *versio = "C-Kermit, 4E(070) 29 Jan 88";
char *versio = "C-Kermit, 4E(070) 29 Jan 88";

which is promising, but is only a couple of variants. We'll need to widen our search. First thing to note is before the 4C release, ckcmai.c was called ckmain.c. Widening that, we find a copy of 4.2 both in the kermit archives and on 1987 Usenix tapes:
char *versio = "C-Kermit 4.2(030) PRERELEASE # 2, 5 March 85";
which matches the version in the kermit archive. Different BSD distributions contain a number of the 5A betas mentioned above, but not listed here.
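Once a pile of trees has been mirrored locally, the same search can be repeated offline. The following is a minimal sketch, assuming the banner is always declared as `char *versio = "..."` in a file named ckcmai.c or ckmain.c, as in the hits above; the function name is made up for this example.

```python
import os
import re

# Assumption: the banner is declared the way the hits above show it.
VERSIO_RE = re.compile(r'char\s*\*\s*versio\s*=\s*"([^"]*)"')
MAIN_FILES = {"ckcmai.c", "ckmain.c"}  # name since 4C, and the name before 4C

def find_banners(root):
    """Walk root and return sorted (path, banner) pairs for every main file."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fn in filenames:
            if fn.lower() not in MAIN_FILES:
                continue
            path = os.path.join(dirpath, fn)
            # latin-1 never fails to decode, which matters for old tape images.
            with open(path, "r", encoding="latin-1") as f:
                m = VERSIO_RE.search(f.read())
            if m:
                hits.append((path, m.group(1)))
    return sorted(hits)
```

Calling `find_banners` on the top of a mirror gives one line per copy found, which makes it easy to spot which directories hold which version.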

Kermit Archive

The Kermit Software Archive has a number of interesting bits of history in it. However, it doesn't have C-Kermit before 4E in it. Some of the specialty ports have old versions, but they are so modified that reconstruction is limited. Acorn kermit "Panos-Kermit" and Archimedes kermit "Arthur-Kermit" both were forked from 4C(052). There's a small discrepancy between acorn kermit and the website, though. The website says it's based on 4C(057), but the main file says derived from 4C(052). These might prove useful to get back to the putative 4C(052) that my Boston Software Works Rainbow Venix came with, but it's unclear how best to thread that speculative path currently, so I'll have to put that aside for another time. It would be nice to have these sources, but it would be a speculative trudge to try to reverse engineer them from the kermit binary I have and there's other, more important reverse engineering to do there. There is also a version hacked for Minix v1 that's derived from 4D(061) (supposedly, we'll see later).

What's FISH?

Fish is the last name of a rather prolific gentleman named Fred Fish. He pulled together a collection of freeware disks for the Amiga which were instrumental in distributing freeware for the Amiga through the mid 90s. A very early disk, #26, had a copy of C-Kermit on it, ported to the Amiga. However, I've had to discard this potentially useful line of inquiry. The Amiga version is missing all the Unix files and has also been somewhat modified in ways that aren't at all clear to me. The search engine at aminet also fails to find anything but the latest version of kermit, which limits its usefulness.

DECUS Tapes

You'll notice above that there's a hit on DEC tapes for RSX-11. I found it in another location as well when I did a variation of the search. ibiblio has a similar sort of thing, and it was easier to grab than the classiccmp site which had a lot of extra stuff that I wasn't sure about needing. So I mirrored that instead and hit the mother lode.
rsx84b: "C-Kermit 4.0(025) PRERELEASE TEST VERSION, 5 Feb 85"
rsx85a: "C-Kermit, 4C(056) 12 Jul 85"
rsx86a: "C-Kermit, 4D(060) 18 Apr 86"
rsx87b: "C-Kermit, 4D(061) 8 Sep 86"
rsts/sig87: "C-Kermit, 4D(061) 8 Sep 86"
rsx87b: "C-Kermit, 4E(067) 14 Sep 87"
Sadly, there's no central repository of DECUS tapes, and the central library that existed at one time has gone away in all the M&A activity after DEC was bought and the support DEC gave DECUS dried up as Compaq and HP valued it less and less.

Reader Contributed

I've received the following versions from a reader:
C-Kermit, 4F(088) 19 Jul 89
C-Kermit 5A(166) ALPHA, 17 Mar 91
I'm working to verify them. There are many 5A versions on the net (some modified, some not), but this one is earlier than them all. I may do a followup with 5A, or I may leave that to others. This section of the blog may update in the future.

Version matching

4.0(025)

We have one copy of this. It appeared on the RSX84b DECUS tape. It's mixed in with all the other files for the PDP-11, which is a bit strange. It only runs on BSD4.2. I believe this is the initial release that went out the door to the world. Google has found the Info Kermit archives, which I used to piece this together:
  1. The date on these files is 5 Feb 85.
  2. The ckermi.ann file contained a copy of the Info-Kermit digest he sent out on Feb 5, 1985 announcing this.
  3. Work began in the summer of 1984 and was teased on the Info-Kermit mailing list
  4. uxkermit was released in September as an interim release that improved the earlier unix kermit releases (I've found 3.0(0) dated 8/1/84 and 3.0(1) dated 11/5/84 in various places).
  5. Frank da Cruz announced on Nov 28, 1984 that "Although far from ready for release, some progress has been made on the new (version 4) Unix Kermit." in an email to Info-Kermit. This was his last word on the topic until Feb 5, 1985.
  6. Within 2 days of the "Unix Kermit 4.0 Announcement," 16 different ports were announced. Within a month, it had exploded to too many platforms to mention and 4.2 was readied.
All these things lead me to believe this is the legitimate first public 4.0 release, and there are no others to be found. We're quite fortunate this made it onto whatever Kermit Tape the RSX SIG used for their RSX84b tape.

I've prepared a ckc025.tar file that captures the 4.0 state of the release.

4.2 versions

We have two copies of 4.2. There's one from the kermit archive, and a second from the Usenix 1987 tape that shows some differences. A quick diff shows ckusr2.c differences. However, ckusr2.c.orig matches exactly the version in the kermit archive. So this is confirmation that the copy of the code in the Kermit Archives is good.
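This kind of cross-check between two copies can be sketched as a hash comparison of two extracted trees. It is an illustration only, not the diff actually run for this post; the function names are made up, and it assumes both copies have been extracted into ordinary directories.

```python
import hashlib
import os

def digest_tree(root):
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for fn in filenames:
            path = os.path.join(dirpath, fn)
            rel = os.path.relpath(path, root)
            with open(path, "rb") as f:
                digests[rel] = hashlib.sha256(f.read()).hexdigest()
    return digests

def compare_trees(a, b):
    """Return (only_in_a, only_in_b, differing) as sorted lists of names."""
    da, db = digest_tree(a), digest_tree(b)
    only_a = sorted(set(da) - set(db))
    only_b = sorted(set(db) - set(da))
    differ = sorted(n for n in set(da) & set(db) if da[n] != db[n])
    return only_a, only_b, differ
```

An empty `differ` list says every shared file is byte-identical; a stray name such as ckusr2.c.orig would show up in one of the "only in" lists and can then be compared by hand.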

4C diversity

So we have all or part of 4C(052), 4C(053), 4C(056) and 4C(058). Since the DECUS tapes are otherwise most reliable, it would seem that 4C(056) is the best of the lot in terms of original sources. We know that the 052 and 058 copies aren't for Unix, so they lack the cku*.c files and have been heavily modified for their targets. 053 has all the unix files, but modified in a number of ways that are documented. So we don't have the actual, final 4C release, but do have the 4C(056) release.

I've prepared ckc056.tar to capture this. I've also put together a ckc053-decus.tar to capture the modified version from DEC. I've provided a link to the Amiga 4C(058) files above, so won't be creating anything special for that, since it seems to be of limited usefulness.

Looking at the changelog, 053 and 052 differ only in the declaration of dopar as CHAR, so it appears all I'd need to unwind from the 053+1 release is DEC's changes. A fun project for another day.

UPDATE: 4.3BSD has 4C(057) included as well, a new version. Thanks to Warren Toomey for bringing this to my attention. Will post followup.

4D(060)

We have 1 copy of this from the DECUS RSX86a tape. Spot checks of the diffs between this and 4D(061) more or less match the change log and suggest this is a true copy of this release.

I've created a ckc060.tar based on these files to capture this version.

4D(061)

We have three copies of what appears to be 4D(061). One is from the Usenix 87 tape. One is on the DECUS RSX87b tape. And one is from the DECUS RSTS/e 87 SIG tape. The RSTS/e tape is identical to the RSX87b tape, apart from weird line endings and NULs at the end of files. Which one do we believe? In an ideal world, we'd do a diff, they'd be the same and we'd go home. That didn't happen, so let's dive in. Apart from files that are just in one directory, there are 4 differences between the Usenix and RSX87b sources: ckuker.bwr, ckukerm.mak, ckwart.c and ckwart.doc. The ckuker.bwr files are fairly different, but the differences start like this:
--- rsx87b/ckuker.bwr      1987-08-05 18:00:00.000000000 -0600
+++ usenix87/ckuker.bwr        1987-08-14 15:04:02.000000000 -0600
@@ -1,6 +1,6 @@
-C-Kermit Version 4D(061):
+C-Kermit Version 4D(060):
 Status, Bugs, and Problems
-As of: 12:07pm  Thursday, 19 March 1987
+As of: 7 July 1986
So it would appear the DECUS tape got the ckuker.bwr right, and something is wrong with the c-kermit on the Usenix 87 tape. If we look at ckukerm.mak, we see that it's also a regression on the Usenix tape (2.06 vs 2.05). ckwart.c and .doc have the same issue too (copyright 1985 vs 1984). So this suggests strongly that we can grab the sources from the Usenix 87 tape, but augment them with the DECUS tape for these 4 files. This will also let us eliminate the extra files from the DECUS tape that are not part of C-Kermit. Two copies from disparate locations give us good confirmation this is the right resolution. Also, the timing of when these tapes were created (both in August of 87) limits how late the copies were.

However, there's one last wrinkle in all this, which suggests there were actually two different 4D(061) releases. The date listed in the ckcmai.c file is "8 Sep 86", and the affected files carry dates newer than that. This suggests that the Usenix tape is a truer copy of 4D(061) as released, and that the DECUS tape carries a later correction to fix a couple of minor 'oopses' in that release, which might be best seen as 4D(061) as intended. The change log is not helpful (it lumps everything after 4D(061) through 4E(066) together), other than saying one of the changes was for 2.9BSD on a Pro-380, which is one of the changes in ckukerm.mak. So the DECUS tape likely represents a 4D(062) snapshot, likely reflecting the KERMIT distribution tape / single directory practice of the time. To be honest, 062 is kinda arbitrary, though, since it could be any of the next couple of releases. I chose 062 because the changes were so limited, and it looks like a classic case of forgetting to bump a couple of numbers, which I imagine would only happen now and again.

Therefore, I've created a ckc061.tar based solely on the Usenix tape since I think the case is stronger for that. I've created a ckc062.tar based on the DECUS RSX87b tape.

There's also a 'minix1' version in the archives supposedly based on 4D(061). It's actually quite close to the now-found sources, with the following differences:

  1. #ifdef for the version strings
  2. Some newlines removed from some messages to fit them on the screen
  3. Compile nits: additional prototypes, some longs become ints, %D instead of %ld
  4. Mostly based on V7, but with tweaks for tty differences
  5. A logging function rewritten to be smaller
Based on the size, nature and extent of the diffs, we have another confirmation that the 4D(061) sources found have good fidelity to the likely release, and that minix1 in the archive is a direct descendant of 4D(061) and not a different version. Since it's missing ckwart.c, it's impossible to know if it was from the slightly newer version on the DECUS tape or not (for the files in minix1.tar.gz, there's no way to know).

4E(067)

As with 4D(060), we have one copy of this version from the RSX87b DECUS tape. I've created a ckc067.tar to capture it. Not much more to say about this, except it arrived as xk* instead of ck* files. I've renamed all the xk to ck files in this process, since the makefiles still had the original ck file names in them. The xk thing was normal, from any number of announcements in Info-Kermit, including the 4E(066) announcement:

The files are in KER:XK*.* on CU20B.COLUMBIA.EDU (available via anonymous FTP) and XK* * on CUVMA (available via BITNET KERMSRV), and will be on Kermit Tape B, and should also show up at Oklahoma State U for UUCP access within a couple weeks. The new files don't replace the current C-Kermit files (CK*.*), and will not do so until all the systems demonstrably work. In order to use these files, you have to rename them to CK*.* (or ck*.*) so that the various Makefiles and other build procedures work, and the include (.h) files have the right names. There's a program to do this, XKTOCK.C, which should be fairly portable (if it doesn't work, the files can be renamed by hand).


so I've just done what was instructed. It appears that only 4E(066) and 4E(067) were distributed this way, as the files were renamed back for 4E(068). And 4E(068) lasted only for a couple of days because 4E(070) was released quickly after it to fix two fatal flaws, the summary of which is too good not to share:
 . getcwd() not defined in BSD UNIX, breaking BSD versions.
 . Unconditional reference to SIGSTOP, breaking non-BSD versions.
So in effect, we have the last two beta versions before the final 4E(072) release (071 was also a brief flash in the pan).
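The rename described in the announcement is simple enough to sketch. This is a hypothetical stand-in for XKTOCK.C, not a port of it (I have not reproduced that program's behavior), and the function name is made up for this example.

```python
import os

def xk_to_ck(directory):
    """Rename XK*/xk* files in directory to CK*/ck*, keeping the prefix case.

    Returns the list of (old, new) names. Refuses to overwrite existing files.
    """
    renamed = []
    for fn in sorted(os.listdir(directory)):
        if fn[:2].lower() != "xk":
            continue
        # XK -> CK, anything else (xk, Xk, xK) -> ck.
        new = ("CK" if fn[:2].isupper() else "ck") + fn[2:]
        src = os.path.join(directory, fn)
        dst = os.path.join(directory, new)
        if os.path.exists(dst):
            raise FileExistsError(dst)
        os.rename(src, dst)
        renamed.append((fn, new))
    return renamed
```

Refusing to overwrite matters here because the announcement says the XK files were kept alongside the current CK files until the new release was proven.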

4E(070)

We have two copies of this. They match almost exactly. The only difference between the two is that the 10th Edition Unix version has 10th Edition Unix (V10) changes. Since those are the only changes, and each change is in context exactly the change you'd expect, we can say with a high degree of confidence that the iubioarchive copy is the original copy of 4E(070). It's unclear how important this release is, but I've made a ckc070.tar tarball based on this find after renaming the files to lower case and changing the line endings to Unix.
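The cleanup steps mentioned in this post (lower-casing names, converting line endings, and dropping the trailing NULs seen on the RSTS/e tape) can be sketched as below. The assumption, labeled loudly, is that the foreign line endings are CRLF or bare CR; the post doesn't say exactly what they were, and the function names are made up for this example.

```python
def normalize_text(data: bytes) -> bytes:
    """Strip trailing NUL padding and convert line endings to Unix LF.

    Assumption: foreign line endings are CRLF or bare CR.
    """
    data = data.rstrip(b"\x00")          # tape block padding at end of file
    data = data.replace(b"\r\n", b"\n")  # CRLF first, so it becomes one LF
    data = data.replace(b"\r", b"\n")    # then any remaining bare CR
    return data

def normalize_name(name: str) -> str:
    """Lower-case a filename, e.g. CKCMAI.C -> ckcmai.c."""
    return name.lower()

print(normalize_name("CKCMAI.C"))
print(normalize_text(b"line one\r\nline two\r\n\x00\x00"))
```

Normalizing both copies this way before diffing keeps the comparison focused on real content changes instead of transfer damage.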

4E(072) and 4F(095)

Both of these releases are in the Kermit archive. 4E(072) was the first 4x release to have been in the archive (apart from the later found 4.2), so we'll stop our journey here. Other than 4F(095), there are no other 4F versions available that I've been able to locate. There are references to 4F(077), 4F(080), 4F(085), 4F(090) and 4F(094) in the Info-Kermit archives as well, but they only have announcements for 4F(085) and 4F(094), suggesting the announcing traffic had gone elsewhere. 5A and 4F were developed in parallel after this, and 4F was never officially released... Ah, but sorting out that tangled history will wait for another day (and likely another person).

Conclusion

So a simple hunt turned up a number of new releases. A copy of the final 4.0 and 4D releases, as well as testing copies of 4C, 4D and 4E. Or: 4.0(025), 4C(053)+dec, 4C(056), 4D(060), 4D(061), 4D(062), 4E(067) and 4E(070). Plus I turned up another copy of 4.2 that matches the copy in the kermit project's archives. 7 apparently unmodified releases and one modified release. This turns out to be far more than I'd hoped for when I began this little snipe hunt. I've made the files I found available (see links above) for anybody that's interested.

If you have a clean copy of any of the versions in the 4x series of releases not listed here, please get in touch with the author. I'm looking for anything from 1990 or earlier.

After posting this, Frank da Cruz tried out the 4.0(025) edit.  He found a few transcription errors from the DECUS tape, patched them up and posted the result at The Kermit Project for all to see.

20200421

More Venix reconstruction work

With the simulator working well enough to run many / most of the Venix binaries (the C compiler being a notable exception), I thought I'd turn my hand to some reconstruction work. You know, the whole reason that I started this thing up.

System Calls

There's no easier code to write in Unix that does something useful than interfacing to system calls. These calls are usually 'load these registers (or this block) with those values and trap to the kernel'. Venix is no exception to this rule.

Venix has about 60 system calls it implements. They are so regular I thought I'd be able to write a generator for all the system calls, except maybe pipe. I thought this because FreeBSD generates the glue for all its system calls, though pipe has been an exception because it needs to return two values.

Little did I know there's really 74 .s files associated with the system calls. Only about 50 of the system calls are regular. The rest are irregular in a number of different ways.

Return Values and weird pointers

There are 5 system calls that require some special handling just because the return values are weird. These include time(2) which you pass a pointer to a long to put the value of time into (that's done in userland in Venix, rather than with a copyout call that other systems use). This mirrors what's done on the PDP-11, so it's no real surprise here. pipe(2) also falls into this category. You pass it an array, and the system call caller is responsible for stuffing the data back into this array. wait(2) is the same way.

stime(2) is similar, but in the opposite direction: It loads the values from a pointer into a register rather than having the kernel just copy that value into the kernel. That's weird because plenty of other things do it with pointers.

Variations on a theme

dup(2) can be generated automatically, but dup2(2) can't. dup2 is the variant where you set the new fd rather than allowing the kernel to pick one for you. Rather than having two system calls, you just add 64 to the fd and call dup. What's weird is that dup(2) is documented to take one argument, but the dup.o file, when disassembled, clearly passes two arguments. This means that there's stack garbage for one of them (a bug!). dup2(2) makes sense to pass two args, but dup(2)? Really? So that's the first bug I've found in the generated code.

brk(2) and sbrk(2) are similar, but they also have to keep track of where the actual break point in the address space is.  And it's a little weirder than that for some NMAGIC binaries that put the stack at the top of memory (right?) and have the heap grow between the top of bss (ebss) and the bottom of the stack. I suspect bugs in this area of my emulator since the C compiler is one of the few binaries with this sort of odd arrangement.

Then there's the exec(2) family of calls. They are all a bit different in terms of calling them, but in assembler you can morph them all into one system call. Sweet, eh? Turns out to be hard in 'C' to pull this off portably, but this predates those worries. Both PDP-11 and the 8086 port use this same trick.

4 arguments are hard.

There's 3 system calls that have 4 arguments. 2 (lseek(2) and locking(2)) do it one way, the other does it a second way (ptrace(2)). And the 2 that appear to do it right (in that it follows the same convention as the 3 arg call) have the same bug.

Most of the 3 arg calls look something like the following:
_read:
        push    bp
        mov     bp,sp
        mov     bx,#3
        mov     ax,*4(bp)
        mov     dx,*6(bp)
        mov     cx,*8(bp)
        int     0xf1
which is simple and straightforward. The 2 arg variants don't load anything into cx, the one arg calls skip dx, etc.
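The regular cases are mechanical enough that a generator is only a few lines. Here's a minimal sketch in Python, not the actual gensys script: the function name is made up, and it emits only the prologue shown in the _read listing above, because the rest of the stub (return handling) isn't shown in this post.

```python
# Register order as seen in the _read listing above: 1st, 2nd, 3rd argument.
ARG_REGS = ["ax", "dx", "cx"]

def gen_stub(name, number, nargs):
    """Emit the prologue of a regular Venix system call stub.

    Only the part shown in the _read listing is generated; whatever follows
    the trap is not shown in this post, so it is left out here.
    """
    if not 0 <= nargs <= len(ARG_REGS):
        raise ValueError("only 0 to 3 word-sized arguments are handled")
    lines = [
        "_%s:" % name,
        "        push    bp",
        "        mov     bp,sp",
        "        mov     bx,#%d" % number,   # system call number goes in bx
    ]
    for i in range(nargs):
        # Arguments start at 4(bp) and are one word (2 bytes) apart.
        lines.append("        mov     %s,*%d(bp)" % (ARG_REGS[i], 4 + 2 * i))
    lines.append("        int     0xf1")       # trap to the kernel
    return "\n".join(lines)

print(gen_stub("read", 3, 3))
```

Running it for read (system call 3, three arguments) reproduces the listing above line for line; dropping `nargs` to 2 leaves out the cx load.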

But the 4 arg ones are weird, in that they are really 3 args with one of the args being too fat. Let's look at lseek, which has three args, but one of them is a long. lseek takes an int, a long and a second int. Let's see how it does its thing:

_lseek:
        push bp
        push si
        mov bp,sp
        mov bx,#19
        mov ax,*6(bp)
        mov dx,*8(bp)
        mov cx,*10(bp)
        mov si,*12(bp)
        int 0xf1
Notice how the offsets are all buggered up. This code will work, and the offsets are correct, but only because the code botched the preamble to setup bp so we can access the args. The push si interrupts that, so bp has the wrong value, so you have to offset everything by 2. Another bug found through the powers of disassembly. So I mostly generate lseek, then hand tweak it to make sure it's the same file.
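The offset arithmetic can be checked with a tiny sketch. It assumes a 2-byte return address (a near call), which is consistent with the first argument sitting at 4(bp) in the _read listing; the function name is made up for this example.

```python
WORD = 2  # bytes per 8086 word

def arg_offsets(nwords, pushes_before_frame=1):
    """Offsets from bp of each argument word.

    pushes_before_frame counts every push executed before "mov bp,sp",
    including the "push bp" itself.
    Assumption: the return address is one word (near call).
    """
    first = WORD * pushes_before_frame + WORD  # saved registers + return address
    return [first + WORD * i for i in range(nwords)]

print(arg_offsets(3))                         # _read: push bp only
print(arg_offsets(4, pushes_before_frame=2))  # _lseek: push bp, then push si
```

With one push the arguments land at 4, 6 and 8; with the extra push si ahead of mov bp,sp, every offset shifts up by one word to 6, 8, 10 and 12, matching the lseek listing.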

Signals

Then there's signals. They are hard, and they do all this wonderful weird stuff with trampolines and the like. This one file is by far the longest one.

Getting the same .o

A few years ago, I found the Minix disassembler dis88 floating around. I've been steadily hacking on it to produce good quality disassembled code. It's tough, though, since there's so many different rules. As I'm doing this reconstruction, I'm learning more. I'll go into those on another post.

But to make things as testable as possible, I've created a gensys script. This generates all the system calls I can, and tries to test the ones I can't. It does this by using the emulator (86sim) to run the assembler. We then compare the disassembled output between the original and the new one and report diffs. No diffs, I'm done! Like I said, the emulator is coming along nicely.

I tried running this same process on the Rainbow, and it was so slow I could only do one or two items in the time it took me to iterate through 5 or 6 different problems and rebuild everything.  The emulator is starting to save time for the investment in writing it...  We'll see if it is all worth it in the end.

20200419

Venix emulation update

April 2020 Venix update

I've had a bit of spare time lately and have refocused back onto Venix.

I've found a number of bugs, implemented fork/exec (brokenly, but well enough to mine 'cc' for what commands it's trying to do) and I have a status update.

Toolchain

The biggest bug I fixed was in lseek. I'd forgotten to translate from the emulated FD to the host's FD. When I did that, a lot of things started working, including cpp, as and ld.
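The shape of that fix is easy to show in isolation. This is a sketch in Python rather than the emulator's own code: the class and method names are made up, and the only point is that the emulated descriptor must be looked up before the host call is made.

```python
import errno
import os

class FdTable:
    """Maps descriptors seen by the emulated program to host descriptors."""

    def __init__(self):
        self._host = {}  # emulated fd -> host fd

    def install(self, emu_fd, host_fd):
        self._host[emu_fd] = host_fd

    def lseek(self, emu_fd, offset, whence):
        # The bug described above amounts to skipping this lookup and
        # handing emu_fd straight to the host.
        host_fd = self._host.get(emu_fd)
        if host_fd is None:
            raise OSError(errno.EBADF, "bad emulated file descriptor")
        return os.lseek(host_fd, offset, whence)
```

Without the lookup, the emulated program would be seeking on whatever host file happened to own that number, which would explain why so many tools failed at once.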

This means I have almost all of the toolchain working. c0 and copt aren't working (though I don't know how to use copt, so maybe it is). That's unfortunate, but we're close.

I can compile hello world on Venix, and then snag the .s file. Once I have that, I can assemble it in emulation and ld to produce a working binary (which also works in emulation). cpp also produces the same output as running on Venix.

So this is huge since things are so much faster on my host environment.

crt0.s

OK. Now that I have a working as on my box, I thought to go about recreating the .s files likely to have been on Venix. I started with crt0.s. I was able to disassemble the old one and use that to recreate a good .s file. It assembles almost to the same binary. The only difference is in the unused parts of the relocation entries. I don't know why that is, but it doesn't seem to affect things since I can generate the same hello world executable from either the stock crt0.o or the one that I've recreated. I'm guessing the differences don't matter.

Next Steps

I need to do a reorg to get proper fork/exec behavior. This will involve factoring out the memory, register sets and fd tables to their own context while retaining a process table, etc in the main application. Once I do that, I can make fork/exec work properly and maybe get cc working finally.

I need to fix sbrk for the c0 binary. It loads without a stack size, so that puts the stack at the end of memory. The program itself takes up 18k of text and 59k of data/bss, which leaves just 5k for combined heap / stack. Emulated sbrk doesn't check for collisions, so c0 may fail due to not properly failing an sbrk request. There may be other details.
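The missing collision check might look something like the sketch below. It is an illustration under stated assumptions, not the emulator's code: the class name is made up, the data segment is assumed to be 64k with the stack growing down from its top, and the break is assumed to start right after the 59k of data/bss.

```python
class DataSegment:
    """Tracks the break and refuses requests that would run into the stack."""

    def __init__(self, brk, stack_pointer):
        self.brk = brk                      # first address past the heap
        self.stack_pointer = stack_pointer  # lowest address the stack uses

    def sbrk(self, incr):
        """Move the break by incr bytes; return the old break, or -1 on failure."""
        old = self.brk
        new = old + incr
        if new < 0 or new > self.stack_pointer:
            return -1  # a real implementation would also set ENOMEM
        self.brk = new
        return old

# Assumption: 59k of data/bss in a 64k segment, stack at the very top.
seg = DataSegment(brk=59 * 1024, stack_pointer=64 * 1024)
print(seg.sbrk(4096))  # fits in the 5k gap
print(seg.sbrk(2048))  # would collide with the stack
```

In a real emulator the stack pointer would be read from the emulated registers at the time of the call, so the usable gap is smaller than the full 5k.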

I should pass `basename $0` as arg0, rather than the whole path. This will save ~40 bytes on the initial stack.

I need to create the notion of a prefix directory so we search there first for existing files. This will allow cc to work not in a chroot since I've created a virtual chroot of sorts. Linux emulation on FreeBSD does this as well.
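A sketch of the prefix lookup, assuming POSIX-style paths on the host. The function name is made up, and what to do when creating new files is left open here, just as it is in the plan above.

```python
import os

def resolve(path, prefix):
    """Return the host path to use for a path requested by the emulated program.

    Absolute paths are tried under prefix first; if nothing exists there,
    the path is returned unchanged. Relative paths are left alone.
    """
    if os.path.isabs(path):
        candidate = os.path.join(prefix, path.lstrip("/"))
        if os.path.exists(candidate):
            return candidate
    return path
```

With the Venix tree extracted under the prefix, a request for an absolute path such as /bin/cc resolves to the copy under the prefix if one exists, and falls through to the host otherwise.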

I should add a proper command line parser.

Finally, I'm going to start plowing through all the system calls to at least get those going in the restored libc sources.

Final Words

The project has made some nice progress and is coming along nicely. I've also done some investigations with using pcc to see if it generates better code or not. No assembler or loader appear to be around, unless I've missed something that Minix uses (which I may have). I do know it doesn't generate code that the Venix as(1) program can eat. Maybe I'll have to try creating a Venix back end... There's also an ia-16 project based on gcc6 that I have generating code, but it's ELF based so there'd be some work. I have it building on FreeBSD and it seems to be decent.

Finally, I went nuts looking for old versions of Kermit. I found a few. More on that in a different blog.