Does using the branch linked above help at all?
There are different scales of slow. What I meant is that a difference of tens of milliseconds in a particular SPI flash operation may be the key here, because most other functions have to be suspended while the flash is busy. However, it sounds like I probably wasn't using the right method to reproduce the problem!

It's not slow: when it does work, I can send and write the whole 1 MB partition in a few seconds.