After some tweaking, I was able to get the SD performance up to 750kB/s. I accomplished this by reducing the number of commands I send to the card and by raising the clock rate. I also turned off INVARIANTS and WITNESS. Here are the numbers at each step: baseline: 200kB/s. Double clock rate: 220kB/s. Remove INVARIANTS and WITNESS: 340kB/s. Remove STOP and useless select commands: 750kB/s. I think I could squeeze a little more out by tightening up the code, but not more than 1 or 2kB/s.
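To put those steps in perspective, here is a small sketch that works out the relative gain of each one. The throughput figures come straight from the measurements above; the `struct step` layout and the `gain_percent` helper are names made up for this illustration and are not part of the driver.

```c
#include <stdio.h>

/* Throughput measured after each tuning step, in kB/s (figures from the text). */
struct step {
    const char *name;
    unsigned kbps;
};

static const struct step steps[] = {
    { "baseline", 200 },
    { "double clock rate", 220 },
    { "remove INVARIANTS and WITNESS", 340 },
    { "remove STOP and useless select commands", 750 },
};

#define NSTEPS (sizeof(steps) / sizeof(steps[0]))

/*
 * Percent gain of `after` over `before`, rounded down.
 * Assumes after >= before and before != 0 (hypothetical helper).
 */
static unsigned gain_percent(unsigned before, unsigned after)
{
    return (after - before) * 100 / before;
}

/* Print the gain of each step over the previous one, then the overall gain. */
static void print_gains(void)
{
    for (size_t i = 1; i < NSTEPS; i++)
        printf("%-40s %3u kB/s  (+%u%%)\n", steps[i].name, steps[i].kbps,
            gain_percent(steps[i - 1].kbps, steps[i].kbps));
    printf("overall: +%u%%\n",
        gain_percent(steps[0].kbps, steps[NSTEPS - 1].kbps));
}
```

Calling `print_gains()` shows that doubling the clock bought only about 10%, dropping the debugging options about 54%, and cutting the extra commands about 120%, for roughly 3.75 times the baseline overall. The per-command overhead, not the bus clock, was the bigger cost.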
I think I can further improve performance by running the card in 4-bit mode. Doing multiblock reads would also help. Maybe switching from an MPSAFE interrupt to a FAST interrupt routine would improve things as well. Hard to say for sure.
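A rough way to see why multiblock reads should help is to count commands. In the SD protocol a single-block read is one CMD17 (READ_SINGLE_BLOCK) per block, while an open-ended multiblock read is one CMD18 (READ_MULTIPLE_BLOCK) followed by one CMD12 (STOP_TRANSMISSION), however many blocks are transferred. The two functions below are hypothetical helpers for this illustration only; they count commands and ignore everything else (response times, data-phase length, card busy time).

```c
/* Commands issued to read nblocks with single-block reads: one CMD17 each. */
static unsigned cmds_single_block(unsigned nblocks)
{
    return nblocks;
}

/*
 * Commands issued to read nblocks with one open-ended multiblock read:
 * one CMD18 to start and one CMD12 to stop, regardless of the count.
 */
static unsigned cmds_multi_block(unsigned nblocks)
{
    return nblocks == 0 ? 0 : 2;
}
```

For an 8-block transfer that is 8 commands versus 2. Given how much removing unneeded commands helped already, this looks like a promising direction, though the real gain would have to be measured.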
However, before doing more performance tweaking, I need to make all the error paths robust in the face of actual errors. I also need to handle timeouts properly.
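As a sketch of what proper timeout handling could look like, here is a bounded polling loop that gives up after a fixed number of attempts instead of spinning forever on a card that never answers. Everything in it is hypothetical: `poll_with_timeout`, the `ready_fn` callback type, and the `fake_card` stand-in are invented for illustration, and a real driver would bound the wait by elapsed time and return a proper errno value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Callback that reports whether the device is ready (hypothetical type). */
typedef bool (*ready_fn)(void *ctx);

/*
 * Poll `ready` up to max_polls times.
 * Returns 0 as soon as the device reports ready, -1 if the budget runs out.
 */
static int poll_with_timeout(ready_fn ready, void *ctx, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (ready(ctx))
            return 0;
    }
    return -1;	/* timed out: caller must take the error path */
}

/* Stand-in for a card that stays busy for a fixed number of polls. */
struct fake_card {
    uint32_t busy_polls;
};

/* Reports busy until busy_polls has counted down to zero, then ready. */
static bool fake_card_ready(void *ctx)
{
    struct fake_card *card = ctx;

    if (card->busy_polls == 0)
        return true;
    card->busy_polls--;
    return false;
}
```

With a `fake_card` that is busy for 3 polls and a budget of 10, the call returns 0; with a card busy for 100 polls and the same budget, it returns -1 and the caller gets a chance to clean up rather than hang.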