ESPUSB32 Full-speed USB Approach
Posted: Thu Jun 15, 2017 6:22 am
I intend to TRY to get USB 1.1 full speed implemented on the ESP32 over the next stretch of time. There are some approaches I have considered, but I was thinking perhaps you guys could point me in the direction you think may be the most fruitful.
The requirements are:
* Read data in at 12 Msample/sec or faster.
* Very precise offset from first level change (GPIO), or the ability to sample fast enough to see transitions (I2S).
* VERY fast turn-around. With USB 1.1 you have only six bit-times between the end of one message and the start of the ACK you send back. I want to check CRC inline.
* Timing must be precise. Inaccuracies, such as instructions taking unpredictable rather than fixed amounts of time, are unacceptable. Can the DMA or other subsystems make executing code pause? Or does the processor have ultimate performance?
* Can I count on a dedicated core to answer an interrupt? Can I count on every instruction always taking the same number of cycles? I could on the ESP8266. Does the same hold for the ESP32?
* Need TWO pins. One is insufficient to detect a "stop" state.
* I am not worried about computation time. I find Xtensa much easier to play tricks with than ARM. I also learned several tricks when writing the first ESPUSB and ESPTHERNET. I think I can process about 2 or 3 USB timeslices per procedure call. And I can keep the procedure calls VERY small. <3 the Xtensa powerhouse.
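To make the "check CRC inline" requirement concrete, here is a minimal sketch in portable C of a bit-at-a-time CRC16 for USB data packets, the kind of update that could be folded into the receive loop one bit at a time. The function and macro names are made up for illustration and are not taken from ESPUSB. It assumes the bits have already been NRZI-decoded and unstuffed, and it does not cover the 5-bit CRC used on token packets.

```c
#include <stdint.h>
#include <stddef.h>

/* USB data CRC16: generator x^16 + x^15 + x^2 + 1, register seeded with ones.
   0xA001 is that polynomial in bit-reversed form, which suits LSB-first data. */
#define USB_CRC16_INIT            0xFFFFu
#define USB_CRC16_POLY_REFLECTED  0xA001u

/* Register value left behind after a good payload plus its two CRC bytes. */
#define USB_CRC16_RESIDUAL        0xB001u

/* Fold one received data bit into the running CRC. */
static inline uint16_t usb_crc16_update_bit(uint16_t crc, unsigned bit)
{
    unsigned feedback = (crc ^ bit) & 1u;
    crc >>= 1;
    if (feedback)
        crc ^= USB_CRC16_POLY_REFLECTED;
    return crc;
}

/* Convenience wrapper: feed a byte LSB first, the order bits appear on the wire. */
static inline uint16_t usb_crc16_update_byte(uint16_t crc, uint8_t byte)
{
    for (int i = 0; i < 8; i++)
        crc = usb_crc16_update_bit(crc, (byte >> i) & 1u);
    return crc;
}

/* Value to append when transmitting: the inverted register, low byte first. */
static inline uint16_t usb_crc16_finish(uint16_t crc)
{
    return (uint16_t)(crc ^ 0xFFFFu);
}
```

On the receive side the idea would be to start from USB_CRC16_INIT, call usb_crc16_update_bit() as each bit arrives, and compare the register against USB_CRC16_RESIDUAL at end of packet, so that no separate CRC pass eats into the turn-around window.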
Right now, I see a few approaches. I suppose I should get input from you guys before I trudge down any of these paths:
(0) Ultra-low overhead level-change interrupt?
Can I trigger an interrupt on IO change and have my code execute very quickly thereafter? Can I have it run on a dedicated core at highest priority? I have tried doing stuff like this on the ESP32 and found that the jitter is VERY low. So low, it should be fine. I am just worried about latency.
(1) Use the I2S engine.
Can the I2S engine have tiny block sizes in parallel mode? I.e. reading in 8 or 16 bits at a time, but only in chains of 32 bytes or so? The ESP8266 cannot do this reliably. DMA with small blocks can become very buggy on the ESP8266, randomly incurring extra wait cycles between chains and skipping DMA entries. Is the I2S engine in the ESP32 more robust with small block chains? The reason for the small block size is that one has to return an answer to the host VERY fast. I can resynchronize every couple of bits without much issue by using a software PLL.
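A rough sketch of what resynchronizing on oversampled data could look like, in portable C. This is not a filtered PLL: it simply restarts the bit-cell phase on every observed edge, which is the crudest form of the idea. The 4x oversampling ratio, the sample layout (bit 0 = D+, bit 1 = D-) and all names are assumptions for illustration. It NRZI-decodes as it goes and stops at SE0 (both lines low), which is why two pins are needed. Bit unstuffing and SE1 handling are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define OVERSAMPLE 4u            /* assumed: 48 Msample/s on a 12 Mbit/s bus */
#define PIN_DP     0x01u         /* assumed sample layout: bit 0 = D+ */
#define PIN_DM     0x02u         /*                        bit 1 = D- */

#define LINE_SE0   0x00u         /* both low: end of packet */
#define LINE_J     PIN_DP        /* full-speed idle state */
#define LINE_K     PIN_DM

/* Recover NRZI-decoded bits from oversampled D+/D- samples.
   Returns the number of bits written to bits_out. */
size_t usb_recover_bits(const uint8_t *samples, size_t n_samples,
                        uint8_t *bits_out, size_t out_cap)
{
    assert(samples != NULL && bits_out != NULL);

    size_t n_bits = 0;
    unsigned phase = 0;              /* sample index inside the current bit cell */
    uint8_t prev = LINE_J;           /* previous raw sample */
    uint8_t last_sampled = LINE_J;   /* line state at the previous bit centre */
    int started = 0;

    for (size_t i = 0; i < n_samples && n_bits < out_cap; i++) {
        uint8_t state = samples[i] & (PIN_DP | PIN_DM);

        if (!started) {
            if (state == LINE_J)
                continue;            /* still idle, wait for the first edge */
            started = 1;
        }
        if (state == LINE_SE0)
            break;                   /* end of packet */

        if (state != prev)
            phase = 0;               /* edge seen: realign to the bit cell */
        else
            phase = (phase + 1u) % OVERSAMPLE;
        prev = state;

        if (phase == OVERSAMPLE / 2u) {
            /* NRZI: no change since the last bit centre is 1, a change is 0. */
            bits_out[n_bits++] = (state == last_sampled) ? 1u : 0u;
            last_sampled = state;
        }
    }
    return n_bits;
}
```

Because the phase is reset on every edge, a bit cell that comes out three or five samples long instead of four is still sampled once, near its middle. How well this holds up depends on whether the DMA really delivers evenly spaced samples, which is the open question above.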
(2) Speed GPIO along.
After several tests, I found that GPIO on the ESP32 is marginally faster than on the ESP8266. Though the wait states for reading from GPIO are higher on the ESP32 than on the ESP8266, the overall clock rate is higher as well. This gives me more time to actually work with the incoming data. THEORETICALLY there is enough time to do everything in GPIO. The only potential problem would be if program execution is non-deterministic, or if I can't synchronize off of the initial transition.
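For a sense of scale, the cycle budget behind "THEORETICALLY there is enough time" can be worked out with a couple of helpers. This is only arithmetic, sketched in C with hypothetical function names. The core clock is a parameter because the post does not state one, and the figures ignore GPIO wait states, which would come out of the same budget.

```c
#include <assert.h>
#include <stdint.h>

#define USB_FS_BIT_RATE_HZ 12000000u   /* USB 1.1 full speed: 12 Mbit/s */

/* Whole CPU cycles available per USB bit-time at a given core clock. */
static inline uint32_t cycles_per_bit(uint32_t cpu_hz)
{
    assert(cpu_hz >= USB_FS_BIT_RATE_HZ);
    return cpu_hz / USB_FS_BIT_RATE_HZ;
}

/* Whole CPU cycles available inside a turn-around window of bit_times bits. */
static inline uint32_t turnaround_cycles(uint32_t cpu_hz, uint32_t bit_times)
{
    return (uint32_t)(((uint64_t)cpu_hz * bit_times) / USB_FS_BIT_RATE_HZ);
}
```

Assuming a 240 MHz core clock, cycles_per_bit(240000000u) gives 20 cycles per bit, and turnaround_cycles(240000000u, 6) gives 120 cycles for the six bit-time window. At 160 MHz the same calls give 13 and 80.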
(3) Another side-channel
Is there another mechanism by which the IO can be read at high speed? Some mechanism to side-step the wait-states on the reading of I/O. Prefetching? Watching non-enabled interrupt flags? etc. I just can't find that much information about how the external IO is wired internally.