After tweaking, I was able to get the SD performance up to 750kB/s. I did this by reducing the number of commands I sent to the card and by raising the clock rate. I also turned off INVARIANTS and WITNESS. Here are the numbers at each step:
- baseline: 200kB/s
- double the clock rate: 220kB/s
- remove INVARIANTS and WITNESS: 340kB/s
- remove STOP and useless select commands: 750kB/s
I think I could squeeze a little more out by tightening up the code, but not more than 1 or 2kB/s.
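To see where the time goes, it helps to turn each throughput figure into time per 512-byte block. The sketch below reuses the numbers above and assumes kB means 1000 bytes (swap in 1024 if that is what the measurements used); the names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define BLOCK_BYTES 512.0  /* one SD block */

/* Microseconds per 512-byte block at a given rate (assumes 1 kB = 1000 B). */
static double usec_per_block(double kbytes_per_sec)
{
    assert(kbytes_per_sec > 0);
    return BLOCK_BYTES * 1000.0 / kbytes_per_sec;
}

struct step {
    const char *what;
    double kbps;
};

/* Measurements from the list above. */
static const struct step steps[] = {
    { "baseline", 200 },
    { "double clock rate", 220 },
    { "remove INVARIANTS and WITNESS", 340 },
    { "remove STOP and useless selects", 750 },
};

/* Print time per block for each step and how much each step saved. */
static void print_steps(void)
{
    double prev = 0;

    for (size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
        double t = usec_per_block(steps[i].kbps);

        printf("%-32s %4.0f kB/s %7.1f us/block", steps[i].what,
            steps[i].kbps, t);
        if (i > 0)
            printf("  (saves %.1f us)", prev - t);
        printf("\n");
        prev = t;
    }
}
```

With these numbers, removing STOP and the extra selects saves roughly 820 µs per block, about as much as turning off INVARIANTS and WITNESS did.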
I think I can further improve performance by running the card in 4-bit mode. Doing multiblock reads would also help. Switching from an MPSAFE interrupt handler to a FAST interrupt handler might improve things as well, but it's hard to say for sure.
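On the SD side, both changes come down to a few commands: CMD55 (APP_CMD) followed by ACMD6 (SET_BUS_WIDTH, argument 2 for a 4-bit bus) while the card is selected, and CMD18 (READ_MULTIPLE_BLOCK) terminated by one CMD12 instead of a CMD17 per block. This minimal sketch only builds the command/argument pairs; `struct sd_cmd` and the function names are hypothetical, it assumes standard-capacity cards (byte addressing), and it does not show switching the AT91 controller itself to a 4-bit bus, which is also required.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SD command indices from the SD Physical Layer specification. */
#define SD_CMD_STOP_TRANSMISSION   12
#define SD_CMD_READ_SINGLE_BLOCK   17
#define SD_CMD_READ_MULTIPLE_BLOCK 18
#define SD_CMD_APP_CMD             55
#define SD_ACMD_SET_BUS_WIDTH      6   /* only valid right after CMD55 */

#define SD_BUS_WIDTH_1 0
#define SD_BUS_WIDTH_4 2

struct sd_cmd {
    uint8_t  opcode;
    uint32_t arg;
};

/* CMD55 (addressed by RCA) + ACMD6 to put a selected card in 4-bit mode. */
static size_t sd_set_4bit(struct sd_cmd *out, uint16_t rca)
{
    out[0] = (struct sd_cmd){ SD_CMD_APP_CMD, (uint32_t)rca << 16 };
    out[1] = (struct sd_cmd){ SD_ACMD_SET_BUS_WIDTH, SD_BUS_WIDTH_4 };
    return 2;
}

/*
 * Read nblocks 512-byte blocks starting at byte_addr.  One block uses
 * CMD17, which ends on its own; more use CMD18 and a single CMD12.
 */
static size_t sd_read_cmds(struct sd_cmd *out, uint32_t byte_addr,
    unsigned nblocks)
{
    assert(nblocks > 0);
    if (nblocks == 1) {
        out[0] = (struct sd_cmd){ SD_CMD_READ_SINGLE_BLOCK, byte_addr };
        return 1;
    }
    out[0] = (struct sd_cmd){ SD_CMD_READ_MULTIPLE_BLOCK, byte_addr };
    out[1] = (struct sd_cmd){ SD_CMD_STOP_TRANSMISSION, 0 };
    return 2;
}
```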
However, before doing more performance tweaking, I need to make all the error paths robust in the face of actual errors. I also need to implement proper timeouts.
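One common shape for a timeout-aware error path is a bounded poll that checks error bits before completion bits and gives up with ETIMEDOUT. This is a userland sketch: `wait_status`, the status-reader callback, and `struct fake_dev` are hypothetical stand-ins for reading the controller's status register, and a real driver would probably sleep on its interrupt with a timeout rather than spin.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical status source; a driver would read the MCI status register. */
typedef uint32_t (*status_read_fn)(void *ctx);

/*
 * Poll until a bit in `done` is set, an error bit in `err` appears, or
 * `max_polls` reads elapse.  Errors are checked first so an error reported
 * together with completion is not lost.  Returns 0, EIO, or ETIMEDOUT.
 */
static int wait_status(status_read_fn rd, void *ctx, uint32_t done,
    uint32_t err, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        uint32_t sr = rd(ctx);

        if (sr & err)
            return EIO;
        if (sr & done)
            return 0;
        /* A kernel driver would DELAY() or sleep here. */
    }
    return ETIMEDOUT;
}

/* Userland stand-in: reports `bits` once `ready_after` reads have happened. */
struct fake_dev {
    unsigned reads;
    unsigned ready_after;
    uint32_t bits;
};

static uint32_t fake_read(void *ctx)
{
    struct fake_dev *d = ctx;

    return (d->reads++ >= d->ready_after) ? d->bits : 0;
}
```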
20061007
20061006
SPI Flash work
Just a quick note today.
I have most of a driver written for the AT91RM9200 to deal with AT45D* (and a few other) flash parts connected via an SPI bus. These parts are commonly used as boot parts, and other manufacturers may make compatible parts.
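Talking to these parts over SPI mostly means sending an opcode followed by a 24-bit page/byte address. A minimal sketch, assuming the legacy continuous array read opcode 0xE8 and an 11-bit byte-offset field for 1056-byte pages (10 bits for 528-byte pages); both come from my reading of the AT45DB datasheets and should be checked for the specific part, as should the number of dummy bytes that follow each opcode. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy continuous array read (assumption: check the part's datasheet). */
#define AT45_CONT_ARRAY_READ 0xE8

/*
 * Pack a page number and byte offset into the 24-bit address field.
 * offset_bits: 11 for 1056-byte pages, 10 for 528-byte pages (assumed).
 */
static uint32_t at45_addr(uint32_t page, uint32_t offset, unsigned offset_bits)
{
    assert(offset < (1u << offset_bits));
    return (page << offset_bits) | offset;
}

/* Fill cmd[0..3] with the opcode and a big-endian 24-bit address. */
static unsigned at45_cmd(uint8_t cmd[4], uint8_t opcode, uint32_t addr)
{
    cmd[0] = opcode;
    cmd[1] = (addr >> 16) & 0xff;
    cmd[2] = (addr >> 8) & 0xff;
    cmd[3] = addr & 0xff;
    return 4;   /* any dummy bytes required by the opcode follow these */
}
```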
In the project I'm working on, I'll need this to update the boot blocks as well as the kernel and RAM disk that will live on this part. Maybe I'll update the boot blocks to cope with formatting this part as a filesystem. Given the non-standard block size, however, I'm unsure how well this will work unless I hide it behind some kind of wear-leveling layer that takes the 1056-byte blocks and converts them to 1024-byte blocks. For the moment, I'm going to take the simple path: allow data to be read and written, and worry about these issues later.
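Two simple ways to present 1024-byte blocks on top of 1056-byte pages, neither of which is wear leveling: give each block its own page and leave 32 bytes per page unused, or pack blocks back to back so that some blocks straddle two pages. The sketch below (hypothetical names) computes where a logical block lands under each scheme.

```c
#include <assert.h>
#include <stdint.h>

#define AT45_PAGE_SIZE 1056u  /* native page size of the part */
#define FS_BLOCK_SIZE  1024u  /* block size the filesystem wants */

struct flash_loc {
    uint32_t page;
    uint32_t offset;
};

/* Option 1: one logical block per page; the last 32 bytes stay spare. */
static struct flash_loc map_one_per_page(uint32_t lblock)
{
    assert(FS_BLOCK_SIZE <= AT45_PAGE_SIZE);
    return (struct flash_loc){ lblock, 0 };
}

/* Option 2: pack blocks back to back across pages. */
static struct flash_loc map_packed(uint32_t lblock)
{
    uint64_t byte = (uint64_t)lblock * FS_BLOCK_SIZE;

    return (struct flash_loc){ (uint32_t)(byte / AT45_PAGE_SIZE),
        (uint32_t)(byte % AT45_PAGE_SIZE) };
}

/* Pages a packed block touches: 2 means a write rewrites two pages. */
static unsigned packed_pages_touched(uint32_t lblock)
{
    struct flash_loc l = map_packed(lblock);

    return (l.offset + FS_BLOCK_SIZE > AT45_PAGE_SIZE) ? 2 : 1;
}

/* Logical blocks available on a part with npages pages. */
static uint32_t blocks_one_per_page(uint32_t npages)
{
    return npages;
}

static uint32_t blocks_packed(uint32_t npages)
{
    return (uint32_t)((uint64_t)npages * AT45_PAGE_SIZE / FS_BLOCK_SIZE);
}
```

Packing fits 33 blocks into every 32 pages, but a write to a straddling block has to rewrite two pages. The one-per-page layout is simpler, and its 32 spare bytes per page could, in principle, hold metadata for a wear-leveling layer.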
20061003
FreeBSD MMC/SD driver update
Today was a good day.
I'm reading data from both the 256MB and 512MB SD cards. In fact, I've been able to mount the root file system and run programs from them. The performance is great for a first cut, but by no means there yet. The data clock can go up to 30MHz. The Linux driver limits things to 25MHz. However, 25MHz is really 15MHz: most people run the Atmel part with an MCK of 60MHz, and you get 60 / 2 = 30MHz or 60 / 4 = 15MHz as frequencies, so a 25MHz limit leaves you at 15MHz. The Linux driver also only uses a 1-bit data bus. This limits it to about 2MB/s max (15MHz / 8 bits per byte = 1.875MB/s), likely a little less. I've never tested their driver, so I don't know how fast it actually goes. My driver is getting about 200kB/s (at 15MHz) or 220kB/s (at 30MHz). This suggests that I'm doing a lot of extra work, since doubling the clock speed gained me only 10%.
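The clock arithmetic above can be checked with a few lines of C. This assumes the AT91RM9200 MCI divides MCK by 2 * (CLKDIV + 1), which is how I read the datasheet. It also includes a crude two-point fit that splits per-block time into a clock-independent overhead and a clock-proportional transfer term, using only the 200kB/s and 220kB/s measurements; it is a rough model, not a measurement, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MCI clock formula: MCK / (2 * (CLKDIV + 1)), CLKDIV = 0..255. */
static uint32_t mci_clock(uint32_t mck, unsigned clkdiv)
{
    assert(clkdiv <= 255);
    return mck / (2 * (clkdiv + 1));
}

/* Fastest MCI clock that does not exceed `limit` (0 if none does). */
static uint32_t mci_best_clock(uint32_t mck, uint32_t limit)
{
    for (unsigned d = 0; d <= 255; d++) {
        uint32_t f = mci_clock(mck, d);

        if (f <= limit)
            return f;
    }
    return 0;
}

/* Raw bus ceiling in bytes/s, ignoring CRC, start/stop bits and commands. */
static uint32_t raw_bytes_per_sec(uint32_t clock_hz, unsigned bus_width)
{
    return clock_hz / 8 * bus_width;
}

/*
 * Two-point fit of t(f) = overhead + xfer * (f0 / f), in microseconds per
 * 512-byte block, from rates measured at f0 and 2 * f0 (1 kB = 1000 B).
 * Since t(f0) - t(2 * f0) = xfer / 2, both terms fall out directly.
 */
struct fit {
    double overhead_us;
    double xfer_us;
};

static struct fit fit_doubling(double kbps_f0, double kbps_2f0)
{
    double t0 = 512.0 * 1000.0 / kbps_f0;
    double t1 = 512.0 * 1000.0 / kbps_2f0;
    struct fit r;

    r.xfer_us = 2.0 * (t0 - t1);
    r.overhead_us = t0 - r.xfer_us;
    return r;
}
```

With those two numbers, the fit attributes about 2.09 ms of each 2.56 ms block to clock-independent overhead, which is consistent with doubling the clock buying only 10%.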
Reading my driver, it looks like I'm doing a lot of extra work. Every read is broken down into 512-byte blocks. For each block, we select the card, read a single block, issue a stop, and then deselect the card. This is a lot of extra overhead. Well, it isn't QUITE that bad (we select, read the 512-byte blocks, and deselect once per request), but we do issue the stop for every block. I'll have to see if I can eliminate most of these. I could also go to a 4-bit bus, but if I do that right away I might only gain 30-40kB/s. After I eliminate the extra commands, I think I'll be able to get a megabyte/s (a 5x improvement). I think I can get even more by going to multiblock commands.
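Counting the commands sent to the card per request makes the overhead concrete. The strategy names below are hypothetical labels for the variants just described; the multiblock case uses CMD18 and a single CMD12, and a one-block request falls back to CMD17, which needs no stop.

```c
#include <assert.h>

enum read_strategy {
    PER_BLOCK_SELECT_STOP,   /* select, CMD17, CMD12, deselect per block */
    PER_REQUEST_SELECT_STOP, /* select once; CMD17 + CMD12 per block; deselect */
    PER_REQUEST_NO_STOP,     /* select once; CMD17 per block; deselect */
    MULTIBLOCK,              /* select, CMD18, CMD12, deselect */
};

/* Commands sent to the card for a request of nblocks 512-byte blocks. */
static unsigned cmds_per_request(enum read_strategy s, unsigned nblocks)
{
    assert(nblocks > 0);
    switch (s) {
    case PER_BLOCK_SELECT_STOP:
        return 4 * nblocks;
    case PER_REQUEST_SELECT_STOP:
        return 2 + 2 * nblocks;
    case PER_REQUEST_NO_STOP:
        return 2 + nblocks;
    case MULTIBLOCK:
        return nblocks == 1 ? 3 : 4;  /* one block: CMD17, no stop */
    }
    return 0;
}
```

For a 64kB request (128 blocks), that is 258 commands with a stop per block, 130 without the stops, and 4 with multiblock reads.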
I also need to implement writing to the card. And MMC support. And a bunch of other stuff.
Here are some things that I've learned.
- When enabling the card, setting the MCIEN bit (enable) is correct, while setting MCIDIS is wrong.
- Disabling the part with MCIDIS still allows the interrupts to happen and the device to mostly kinda work, except all replies are 0.
- When doing data transfers, a data clock rate faster than '0' should be used to allow for the transfer to complete in a finite amount of time.
- Putting panic("oink") in unimplemented functions, rather than "// XXX WRITE ME", facilitates discovery of critical, unimplemented routines.
- Locking, but never unlocking, a lock makes it hard for other threads to acquire it.
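The first lesson boils down to writing the right bit to the MCI control register. Here is a userland sketch in which a plain struct stands in for the memory-mapped registers; the bit positions (MCIEN = bit 0, MCIDIS = bit 1) are from my reading of the AT91RM9200 datasheet and worth double-checking, and `struct mci_regs`, `mci_enable`, and `mci_disable` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* MCI_CR bits (assumed from the AT91RM9200 datasheet; verify). */
#define MCI_CR_MCIEN  (1u << 0)  /* enable the MCI */
#define MCI_CR_MCIDIS (1u << 1)  /* disable the MCI */

/* Userland stand-in for the memory-mapped MCI register block. */
struct mci_regs {
    volatile uint32_t cr;
};

/* Enable: write MCIEN.  Writing MCIDIS here is the bug described above. */
static void mci_enable(struct mci_regs *r)
{
    r->cr = MCI_CR_MCIEN;
}

static void mci_disable(struct mci_regs *r)
{
    r->cr = MCI_CR_MCIDIS;
}
```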
So all in all, a good day. This should be ready to commit soon, assuming that the baby that's on the way doesn't come tomorrow... Otherwise it will be a little longer...