Fixing FreeBSD/arm in stable/9...

Just a quick note. I recently tried to create a FreeBSD-stable 9 image for an Atmel AT91SAM9G20 board I have. It didn't work. So, I tracked down the problems. It turns out that when the unmapped BIO changes went into stable/9 in revision r251874 a single revision (r246881) was missed. This left the ARM busdma API effectively non-functional. Unfortunately, this merge happened before the 9.2 release, so it is quite likely that all platforms of FreeBSD/arm didn't work in 9.2-RELEASE. I've not confirmed this, but the Atmel board I have wouldn't have working networking (and quite likely working USB) in 9.2-RELEASE. A few revisions later, the VM layout changed slightly. This also broke Atmel in a different way. I've corrected both of these issues and as of r259093, and I have my Atmel AT91SMA9G20 board working, at least mostly. There still appears to be an mbuf memory leak in the ate driver that takes the system out after a few hundred megabytes of traffic. This makes it less than completely useful in NFS root operations. My ultimate goal isn't an NFS root, but it is an network server, so I guess I better track the root cause of this down...


Tracking down the problem...

OK. After fighting several gdb build issues, I gave up on earlier versions. I thought I'd go old school to try to find the problem. First step was to figure out where we were dying. I had only one clue, the crash location form /var/log/kernel:
Nov 24 19:12:02 (none) kernel: emulate_load_store_insn: sending signal 10 to fred(8249)
Nov 24 19:12:02 (none) kernel: $0 : 00000000 9001fc00 ffffffff 0000001f 9fff7b3c 7fff7b40 00000000 00000000
Nov 24 19:12:02 (none) kernel: $8 : 0000fc00 7fff7b44 00000000 ffffffff 80151b78 000067b2 000033d9 80184020
Nov 24 19:12:02 (none) kernel: $16: 0040bb00 7fff7c44 004015c4 00000003 00407ccc 1002340c ffffffff 100267ac
Nov 24 19:12:02 (none) kernel: $24: 00000000 020cfe48 1000b5f0 7fff7aa0 7fff7aa0 0040899c
Nov 24 19:12:02 (none) kernel: Hi : 00000000
Nov 24 19:12:02 (none) kernel: Lo : 0000004e
Nov 24 19:12:02 (none) kernel: epc : 00408b70 Tainted: P
Nov 24 19:12:02 (none) kernel: Status: 8001fc13
Nov 24 19:12:02 (none) kernel: Cause : 00000010
Nov 24 19:12:02 (none) kernel: 8001e9fc 8001eac0 80022bb4 800215b4 8001d6b8 00408a7c
Nov 24 19:12:02 (none) kernel: 00408a7c 0040899c
We know that epc is the address of the faulting PC in this message. So we died at 0x00408b70. Normally, gdb would tell us where this location is. Something is busted with stabs in gcc 3.0 and 3.3, so I couldn't get a good traceback when we died. So, I had to go old school. First, I disassembled the code:
mips-TiVo-linux-objdump -S -d tivovbi > tivovbi.dis
So I needed find 0x408a7c. Looking for it, we find:
408b64: 8c420000 lw v0,0(v0)
408b68: 00000000 nop
408b6c: 3043001f andi v1,v0,0x1f
408b70: 8c820000 lw v0,0(a0) ----- here ----
408b74: 00000000 nop
408b78: 00621007 srav v0,v0,v1
408b7c: 30420001 andi v0,v0,0x1
408b80: 1040007f beqz v0,408d80 main+0x1054
408b84: 00000000 nop
408b88: 8f99822c lw t9,-32212(gp)
408b8c: 00000000 nop
408b90: 0320f809 jalr t9
408b94: 00000000 nop
Sadly, this isn't as helpful as I'd like. There's no STABs interleaved, nor any source listed. This is less useful than I'd hoped, and likely the reason that gdb can't cope either. So what to do? Have gcc tell me the raw assembler that it is listing...
mips-TiVo-linux-gcc -g -DTIVO_S2 -c tivocc.c -S > tivocc.s
So, now we have to look for the 'lw v0,0(a0)' instruction and hope for the best. Trouble is, gcc outputs raw register numbers, so we have to lookup that in OABI v0 is $2 and a0 is $4. So, we have to look for lw\t$2,0($4) in the output:
.stabn 68,0,930,$LM506
lw $2,76($fp)
srl $2,$2,5
sll $3,$2,2
addu $2,$fp,160
addu $4,$2,$3
lw $2,76($fp)
andi $3,$2,0x1f
lw $2,0($4) ----------- here
sra $2,$2,$3
andi $2,$2,0x1
beq $2,$0,$L348
There's other places where this instruction is used, but this the only place where andi comes in the immediate instruction before the lw. The rest matches tolerably well. So we've found where we started. It turns out .stabn structure 930 is the line number. This leads me back to the following line:
if(FD_ISSET(remote_fd, &rfds))
How can this be wrong? It turns out it isn't wrong. Elsewhere, remote_fd is overwritten. But this turns out to be due to a bug in gcc 3.0. When I updated to gcc 3.3, the crash goes away. It turns out that
if (flags & TEXT_MODE && FD_ISSET(0, &rfds) && (ret = read(fileno(stdin), &c, 1)) == 1 && c == 'q')
is compiled such that it corrupts remote_fd. When parens are put around the first bit, the problem also seems to be fixed. gcc 3.3 compiles both of them identically. This has ended unsatisfyingly, since I don't see the bug in the assembler (so maybe it just moves the actual bug such that it doesn't bite this variable). Despite the odd end, I thought I'd share how I traced down what line this was at, in case others could benefit from the techniques.


Cross building gdb for TiVo series 2. First attempt fails to work.

OK. Flush with pride after the success of building tools for my TiVo, I thought I'd get the program that I built the tools for working. It isn't obvious what's going on, so I thought I'd build the debugger to look at the core files that I'd been able to get.

So, first the build steps:

tar xvf gdb-7.6.1.tar.bz
cd gdb-7.6.1
rm -rf intl texinfo/ sim
mkdir build-gdb
cd build-gdb
../gdb-7.6.1/configure --target mips-TiVo-linux --prefix ~/tivo/tools/mips-tivo --with-lib-iconv-prefix=/usr/local --without-expat
gmake install

It took a little bit of dorking around to find all the details, but the FreeBSD gdb port gave some good hints.

But this versions must be too new :(. All the core files I was able to generate failed to get a good traceback.

So, I'll give it another shot with an older gdb.


Building old-school TiVo build tools on FreeBSD 9.2-stable


As long-term readers of my blog know, I've been nursing along a TiVo HR10-250 for the past few years. It works great for the low-quality material that my younger son loves to watch, plus I can harvest video off of it with ease.

Recently, we switched up how it was connected to our TV (and indeed, got a new TV too). During this process, we lost the ability to display subtitles. Since my wife and I like to watch The Daily Show and other similar shows upstairs without going downstairs to the big TV with the HD DirecTV player, and the environment upstairs can be a bit noisy, subtitles are quite useful.

So, after crawling around the DealDatabase forums for a bit, I found a good program called tivovbi. It works really well for displaying closed captioning. However, I used a binary I found on the forum. Nothing is more annoying than a program you download from a forum that randomly core dumps for reasons that are totally mysterious.

I can't even hack it to do anything since I have no TiVo build tools.  Looking for tools online, I can't really find anything that isn't just a Linux binary.  The Linux binaries have issues with the FreeBSD linux ABI implementation, owing in large part to their age (binaries from 10 years ago have some issues, I think with just the packages are needed having a subtle incompatibility).

So what should I do. I could create a virtualbox VM and run linux. But then I'd be running Linux, and copying back and forth to a VM is always a hassle in some way.

So, the other alternative is to build the tools from source. I thought it would be easy to do this, but there's a number of issues with it, so I thought I'd write up my experience. This is on a FreeBSD 9.2-stable system on amd64. That last bit will turn out to be important in a bit, because nothing was easy and simple on this project...

After reading through these instructions, I can't help but marvel at how this lack of integration is tolerated, but to be fair, this is building tools that are nearly a decade old at this point and I did have to kludge around lack of support for 64-bit x86 in gcc... This is nearly 60 separate commands to type. Yuck. Also, looking at the layout in chrome, many of the line breaks have liberties taken with them, so your best bet may be to cut and paste much of what I'm doing here...

Locating the tools

Tivo Utils has a bunch of useful links. Including the linux binaries that I've had issues with. Thankfully, there's a shell script called 'build_mips_x_compiler.sh' which builds all the tools and the basic libraries. I thought I could just run it and have everything built. As with everything else in this project, this wasn't so much the case. So I wound up doing a lot of things by hand.

The first bit of the script fetches all the source tar balls. The gnu stuff has had long-term stable paths, so they fetched. However the TiVo linux didn't transfer, since it was in a new location. In fact, it was also for 4.0, and my TiVo was running 6.4a. So, I had to grab that from TiVo linux downloads page. The Linux 6.4 download was exactly what I needed to grab. Once I had these in place, I was ready to start building. You also need binutils 2.13Gcc 3.0 (yes, 3.0!), glibc 2.2.3, and glibc 2.2.3 linuxthreads support.

Also, I'm installing all the tools in ~/tivo/tools tree, and building them from ~/tivo/toolchain. Fetch all of the above linked tarballs into ~/tivo/toolchain (or whatever you want). Now that we've found the tarballs, we can start with the builds. I'll assume the following environment variables are set. TARGET is set to "mips-TiVo-linux" and PREFIX is set to "$HOME/tivo/tools". I've also added ${PREFIX}/bin to my PATH (I didn't from the start, and got quite far before it mattered). I also had gnu make installed as gmake from ports.

Building Tools

Binutils 2.13

Binutils 2.13 is almost trivial to build:
cd ~imp/toolchain
tar xvfz binutils-2.13.tar.gz
mkdir build-buinutils
cd build-binutils
../binutils-2.13/configure  --target=$TARGET --prefix=$PREFIX
gmake -j 20 all
gmake install
cd ..

gcc 3.0 stage 1

gcc 3.0 stage 1 took a lot of trial and error to create. Turns out, there's no support for FreeBSD 9 in this tarball. That's trivial to add, and I've provided some patches. However, the next problem is that FreeBSD/amd64 isn't supported. This problem turns out to be more difficult to overcome. So you have to configure for FreeBSD/i386 (or backport amd64 support, which I thought would be too hard, so I didn't do it). This means you need to have a 32-bit compiler. After all the talk about making cc -m32 working, I thought I could just do gmake CC="cc -m32". This failed, however. The 32-bit applications dumped core, meaning that gcc couldn't complete its bootstrap process. I wound up having to use FreeBSD's xdev to do it:
pushd ~/FreeBSD/svn/stable/9
sudo make xdev XDEV=i386 XDEV_ARCH=i386 WITHOUT_CLANG=t
Once we have that in place, we can build gcc. Except that it doesn't build quite right. We need to add some additional patches. One to work around a weirdness in the obstack implementation where it relied on an illegal lvalue autoincrement, and one to work around crazy sh that's likely bogus in fixincl. All of these are included in the above patch.
mkdir build-gcc
cd build-gcc
../gcc-3.0/configure --target $TARGET --prefix $PREFIX --without-headers --with-newlib --disable-shared --enable-languages=c --host i386-foo-freebsd9
gmake CC=/usr/i386-freebsd/usr/bin/cc all-gcc
gmake CC=/usr/i386-freebsd/usr/bin/cc install-gcc
If you wanted to see if -m32 worked, you could skip the make xdev step above and substitute CC="cc -m32" in the two gmake commands. I've also tested values up to 20 for -j for at least the first command.

Building the Linux Kernel headers

The following comes pretty much verbatim from the build_mips_x_compiler.sh and have been verified to work. You'll need to grab this patch for these instructions to work. It works around an expr difference, as well a really cheap kludge to get autoconf.h generated. Also, I had to install the bash port, and create a symlink from /bin/bash to /usr/local/bin/bash, which I've not put inline...
tar xf TiVo-6.4-linux-2.4.tar.gz
cd linux-2.4
patch -p0 < ../tivo-linux-2.4.diff
yes "" | gmake ARCH=mips CROSS_COMPILE=mips-TiVo-linux config
gmake ARCH=mips CROSS_COMPILE=mips-TiVo-linux include/linux/version.h
mkdir $PREFIX/$TARGET/include
cp -rf include/linux $PREFIX/$TARGET/include
cp -rf include/asm-mips $PREFIX/$TARGET/include/asm
which will be enough to get us to the next step...  Woof, maybe I should just make a script for all this stuff... With so many fiddly bits, I'm not sure that's a good idea.

Building gmake 3.80!

Newer versions of gmake don't grok the Makefiles that glibc 2.2.3 generates. All kinds of weird errors and odd behavior. So, to make progress, you'll need to build gmake.
fetch ftp://ftp.gnu.org/gnu/make/make-3.80.tar.gz
tar xf make-3.80.tar.gz
cd make-3.80
cp make ~/bin/gmake380
cd ..
I didn't bother installing it, since I just need it for this diversion so I copied into my bin dir, that I have in my path.

Building glibc 2.2.3

Now we're on to glibc. Things have been smooth sailing up to this point. With glibc, we have to extract, configure, build, fix the build oops, build again, fix some info files, then install. Woof! Sure makes for a difficult to reproduce experience. And those are the build on linux instructions... Apparently there's a lot of host leakage that's making things somewhat difficult to reproduce... but so far that appears to be only with gmake 3.82. gmake 3.80 works much better. More patches needed.
tar xf glibc-2.2.3.tar.gz
tar xf glibc-linuxthreads-2.2.3.tar.gz -C glibc-2.2.3
env CC=mips-TiVo-linux-gcc ../glibc-2.2.3/configure --host=$TARGET --prefix=$PREFIX --disable-debug --disable-profile --enable-add-ons --with-headers=${PREFIX}/${TARGET}/include
sed -i.old -e 's/elf32-bigmips/elf32-tradbigmips/' elf/rtld-ldscript
gmake380 install
cd ..

Munging things around before the second assault  on gcc for its stage 2

Don't know why the original instructions take this time rearrange the deck chairs, but it seems to be necessary.
mv ${PREFIX}/${TARGET}/include/asm ${PREFIX}/include
mv ${PREFIX}/${TARGET}/include/linux ${PREFIX}/include
rm -r ${PREFIX}/${TARGET}/include
rm -r ${PREFIX}/${TARGET}/lib
ln -s ${PREFIX}/include ${PREFIX}/${TARGET}/include
ln -s ${PREFIX}/lib ${PREFIX}/${TARGET}/lib

Build gcc 3.0 stage 2

Now that we have everything else setup, it is time to build the second stage of gcc. Alas, gcc 4.2.1 doesn't allow constructs like (a ? b : c) = d, which gcc 3.0 uses to implement C++. So, no g++ for me. If you want g++, you'd have to build the gcc34 port... The rest is almost boring after all the other hoops we jumped through:
mkdir build-gcc2
cd build-gcc2
../gcc-3.0/configure --target=$TARGET --prefix=$PREFIX --enable-languages=c --host=i386-foo-freebsd9
gmake CC=/usr/i386-freebsd/usr/bin/cc all
gmake CC=/usr/i386-freebsd/usr/bin/cc install
cd ..

Building stuff

Now, it is time build things...  I'll write more blog entries about that...


How to fix a Washing machine

Let's say you have one of those fancy washing machines with sophisticated circuitry.

Well, then if your machine is like mine, it sucks to be you.

Last night, before a trip, the Kenmore Oasis HE machine that we've had for many years started doing something weird. You'd turn it on, and it would start cycling from Heavy, to medium, to light, etc on the soil setting. It wouldn't stop doing it. And if you somehow got it to pause for a moment, the cycle would do weird things: spin at the wrong times, drain at the wrong times. And if you left it on overnight, it would suddenly spring to live at 1am just as you were falling asleep.

So, new parts for this would never arrive in time for me to install them before the trip.

So, what to do. I tried all the obvious fixes: reseat all the connectors. Do it again just in case. Get some rubbing alcohol and clean off the accumulated gunk on the board. None of these fixed the problem. So looking at the board, I thought "geeze, why don't I just remove the switch from the circuit: maybe it's gone bad and shorted out." I mean, if you're falling to earth, you might as well try to fly, right? The switch was wired into the circuit through a diode. Since the diode was SMT and the switch was through hole, I decided to desolder the diode (D59 in the pic). I had no rational basis for thinking this would work. Other folks have reported exactly this same problems on different washing machine forums, but the solution was always "start replacing boards" which wouldn't fly for me today.

So far, it seems to be working. I hope this repair will hold until I get back from my trip...
This photo shows my test run, where I just removed one side....

Who said these new washing machines are irreparable... Oh wait, all the repair guys that have come out and all the online forums, and this isn't really a fix so much as a kludge until I can get back to fix it right...


What could be so hard about an external toolchain.... Step 1: How big a problem...

FreeBSD is moving from gcc to clang as its base compiler. This works really well for x86, and moderately well for arm. However, it doesn't work very well for any other architecture that FreeBSD supports.

Brooks Davis recently committed changes that he claims allow him to build FreeBSD/mips with an external clang and the cross-binutils port.

I thought "how hard can that be, I'll knock together a port for gcc that lets me use that instead of the native compiler.

So, with a little bit of svn foo, I was able to extract the differences in FreeBSD's gcc from the stock 4.2.1 gcc that it is based on. I thought the differences would be small and manageable. However, they were large. In excess of 10,000 lines in the diff file.

Undaunted by this large size, I started breaking down the diffs. Well, first, the diffs, excluding the docs, are 10,652 lines in size. I didn't save how big they were with the docs included.

So I went to separate out the wheat from the chaff. First up is 5172 lines of changes to x86. Why do we need that many changes to x86? Well, to support newer x86 hardware models. since I assume that will be in a newer version of gcc, I chose to totally ignore it unconditionally.

Next, I discovered 924 lines of changes for the FreeBSD configuration of gcc, plus various tweaks to gcc to make it the FreeBSD native compiler. Half of these seem relevant, the other half not so much (I'll discover how much later).

There's also about 2187 lines of bug fixes that are split between back-ports of gcc bug fixes and various warning cleanups to clang will be more happy. Add to that another 340 lines of g++ modernization...

So we're down to less than 2k lines of changes here. They break down as 220 for arm, 675 for mips, 220 for ia64, 245 for sparc and 350 for powerpc (total of 1910) and another 78 line for threading tweaks, 135 for code to check FreeBSD's kernel's printf extentions, and 130 for some tweak to assembler output that I don't quite get, so it goes into the 'look at it later to see if it is still needed' pile.

The first 4 patches slotted into the gcc 4.5 nicely (config-guess changes, the config + arm + mips changes). After tossing out all the extra goo we had been dragging around in our tree, they are about 850 lines of diffs. After adding in the other architectures, I suspect this will double. Plus the other relevant gcc changes, we may see as many as 2000 lines of patches for any port that could be used as a system compiler for any supported architecture).

I have the beginnings of that port, but it is early days. It builds, but still can't make it through buildworld let alone buildkernel. The external toolchain support needs a lot of tweaks to make it happy with gcc, and there's still much I don't understand about why some of them are necessary.  But I'll save those for another time.

Anyway, that's enough for now. I'll report also on moving these patches to 4.6, 4.7, 4.8 and 4.9 when  get around to it. I'll also see what it takes to move them into upstream sources. That should have been done years ago, but that's another tale of woe I don't have time for just now, and even if I did I'm not sure that dirty laundry would be appropriate for my hacking blog.

How C can bite you, an example

Consider the function foo:

int foo() { if (error) return -1; else return 12; }

and consider its use here:

int n;
if ((n = foo()) >= sizeof(bar)) { memcpy(dst, src, n - sizeof(bar)); }

Seems simple and safe right? Well, not so much.  The 'C' standard states that when you have types that aren't of the same rank, a conversion must happen. So, since n is an int, and sizeof(foo) is a size_t (unsigned long), a weird conversion happens when you hit an error in foo. It returns -1, but that -1 is converted to MAX_ULONG, so the conversion is true, and we copy way too much memory smashing everything in it's path.

When a coworker asked me about this, I replied "And verily, thou shalt not compare an int to a size_t when thine int can be that which represents a plurality of deficit as well as a plurality of surplus." because it is easily clearer than the C standard...


TivoWebPlus 2.1 changes

As you may know, I still have a HR10-250 in service.  I run TivoWebPlus 2.1 on the box. It mostly works the way I want it, although at times it is a bit slow.

I checked out the latest version from the sourceforge web site many months ago and found there's some issues with it that I'd like to fix.

I'e made the following fixes:

  1. Sadly, I didn't document very well where I got this fix from. It makes Movies show up as Movies and not as 'Not an Episode' in the ToDo and Now Playing fields. This appears to be a combination of my hacking, but mostly the work of "DJL". I think I downloaded it from a dealdatabase or other tivo forum posting, but I can't find it now.
  2. Fixed the long series number problem so episode numbers are properly displayed.
  3. Added a Thumb link so you can put those in your ToDo lists. I find this handy for quickly killing those things that Tivo thinks I might like, but I don't.
  4. Fixed a problem with descriptions being nothing but spaces (this may have also been a forum find, but I can't find docs for that either)
  5. Fixed a few problems with todofeed.tcl (not sure these are completely good, as I think I was trying to parse the output for something I did differently).
  6. Fixed a spelling error.
OK, these changes won't change the world, but since all work has stopped on TivoWebPlus, I thought I'd share them with the world.  You can find my cumulative patches here. If anybody recognizes #1 or #4 or can locate their origin, please let me know by commenting

I've also hacked together a todo database publishing api. It is crude, but it allows me to cancel shows I've already downloaded via ftp and such.  On the off chance people find it useful, you can find it here.

If you find these things useful, please comment. Not sure how many people may be interested, or if the TiVo hacking world has totally moved beyond this device...