Dual Core Implementations

Posted: Tue May 21, 2019 9:24 pm
by berlinetta
Hello,

I am new to the use of multi-core devices, and was hoping to get more information / examples on how to properly configure the ESP32 for use of both cores. I have read data sheets and other threads discussing the topic and have gleaned some information from them, but it is not nearly enough to move forward with.

We plan on creating a communications device which could be used to implement connectivity needs for all of our products. We were hoping to place the ESP32 at the heart of this solution because it provides a cost-effective hardware solution with open source code that we can leverage to control the future of our product lines. We have paid a consultant to research the ESP32 and perform some preliminary design work as a proof of concept. One of the issues that was encountered in that exercise dealt with latency in serial communications between the ESP32 and our local processor. According to the consultant, the implementation of the SPI driver code for the ESP32 had some fairly serious limitations in how the traffic had to be packetized and queued. Since the driver and the queue are managed by the OS, there were latency issues with passing commands to the ESP32 and retrieving responses in a timely fashion. Additionally, pin assignments via the GPIO ring caused delays which prompted the necessity of a "dummy" bit work-around.

Much of what I have seen and read about in regard to example code relies upon the use of FreeRTOS. From what I have read, I believe the default configuration would have FreeRTOS manage the tasks and assign them to an "available" core. In the interest of segregating time-sensitive communications (stack) code from our application code, I was hoping to leverage the multi-core silicon to support the BLE and WiFi stack operations on the first core with the use of the OS, while using the second "App" core for our application-specific "bare metal" code. In theory, this would allow us to isolate our application code and manage the serial peripherals without interruption from the stack and FreeRTOS operations (essentially operating as two separate entities). If this is feasible, I need a better understanding of some root functionality of the ESP32 design before I can progress.

The first topic is memory access. The documentation refers to a flexible addressing scheme which provides both shared and isolated access regions for each core. Unfortunately, any external flash memory would have to be accessed through a common MMU and cache interface which can pose some arbitration issues. The data sheets mention part options with embedded flash, but I cannot find details regarding where this memory fits into the map. Based on the lack of predefined memory regions to support it, I presume this may be treated as "external memory" and accessed through the MMU as well... is that correct? One topic of discussion suggested executing code for the application processor through RAM to avoid arbitration issues with the first core... Is that necessary, or just a manner of ensuring zero delays in operation of the second core?

The next question I have is in regard to interrupt handling. Does the silicon support a single interrupt controller, or does each core contain an interrupt controller that can be configured to assume responsibility for specific events? If the design supports a single controller, is it handled by the first core by default?

Finally, I have questions regarding peripheral support. Is it possible to have the first core - which I envision as handling communications stacks - manage the radio interface and any required timers, while having the second core - which I envision as handling application-specific tasks - manage the interface with the remaining peripherals?

If the peripherals and interrupt handling cannot be independently managed by each core, then the topology I was considering is probably not plausible. If it is plausible to segregate these operations, it would be ideal for us to implement our custom secure bootloader code on the core running our application code. We can then securely connect to the cloud to retrieve firmware updates for both the application and the communications stacks and perform in-circuit reprogramming of the flash memory.

Thanks in advance!
Mark

Re: Dual Core Implementations

Posted: Wed May 22, 2019 3:52 am
by ESP_Sprite
berlinetta wrote:
Tue May 21, 2019 9:24 pm
One of the issues that was encountered in that exercise dealt with latency in serial communications between the ESP32 and our local processor. According to the consultant, the implementation of the SPI driver code for the ESP32 had some fairly serious limitations in how the traffic had to be packetized and queued. Since the driver and the queue are managed by the OS, there were latency issues with passing commands to the ESP32 and retrieving responses in a timely fashion. Additionally, pin assignments via the GPIO ring caused delays which prompted the necessity of a "dummy" bit work-around.
Just out of curiosity: what communication scheme are you using (who is the master, who is the slave, how are you packetizing things, what's the latency requirement etc)?
I was hoping to leverage the multi-core silicon to support the BLE and WiFi stack operations on the first core with the use of the OS, while using the second "App" core for our application-specific "bare metal" code.
You can do that, up to a point; in the current silicon, there will always be instances that need interruption of both cores. (The most extreme example would be writing to flash, which can hold up both cores for a fair number of milliseconds.) Note that although it's theoretically possible to run FreeRTOS on only one core, in practice writing the 'bare-metal' code in a way that 'plays nice' is not an easy task and we don't recommend it. It is, however, possible to run all but one task on one core, and have a single task run on the other that will more-or-less get free rein of the CPU time.
The first topic is memory access. The documentation refers to a flexible addressing scheme which provides both shared and isolated access regions for each core. Unfortunately, any external flash memory would have to be accessed through a common MMU and cache interface which can pose some arbitration issues. The data sheets mention part options with embedded flash, but I cannot find details regarding where this memory fits into the map. Based on the lack of predefined memory regions to support it, I presume this may be treated as "external memory" and accessed through the MMU as well... is that correct? One topic of discussion suggested executing code for the application processor through RAM to avoid arbitration issues with the first core... Is that necessary, or just a manner of ensuring zero delays in operation of the second core?
Embedded flash is indeed accessed through exactly the same interface (but slightly different GPIOs) as external flash. Using only RAM will indeed alleviate arbitration issues in the cache (although you may still have some instances of arbitration on some memory buses).
The next question I have is in regard to interrupt handling. Does the silicon support a single interrupt controller, or does each core contain an interrupt controller that can be configured to assume responsibility for specific events? If the design supports a single controller, is it handled by the first core by default?
Both cores have their own IRQ handler, and external sources can be switched to an IRQ input on either or both of the cores. ESP-IDF handles this in a somewhat opaque way: the rule is that the core on which the IRQ is allocated will handle it.
Finally, I have questions regarding peripheral support. Is it possible to have the first core - which I envision as handling communications stacks - manage the radio interface and any required timers, while having the second core - which I envision as handling application-specific tasks - manage the interface with the remaining peripherals?
Yes - all peripherals are accessible by both CPUs directly, so whichever CPU you decide to run the driver code on will handle that peripheral.

However, on a more global level, you're pretty much going against the grain here by trying to micro-manage all of this. The ESP32 is meant as a fast processor, and that means some sacrifice in predictability - you can't have two 240MHz cores without cache or arbitration. We compensate for this by having a lot of very flexible peripherals that should suit almost any need where smaller processors would require bit-banging the interface. SMP is also very much something built into the fabric of ESP-IDF, so you'd again be going against the grain by isolating one CPU. Can I ask what issues exactly you have in the current implementation? It may be a better option to take a closer look at these, to see if they are solvable while going along with the SDK instead of perpendicular to it.

Re: Dual Core Implementations

Posted: Wed May 22, 2019 7:00 pm
by berlinetta
Thanks for the quick reply... this information definitely helps clarify the operation for me!

Perhaps it is best if I provide some background information on my project...

Our current connected product utilizes a device which provides both BLE and WiFi connectivity. The device has had a history of problems with memory management and crashes of its embedded firmware, none of which we have access to (black box) and we have to wait for the manufacturer to address issues as we find them. Our host controller communicates with this device via SPI and the manufacturer has provided the driver interface files for our host controller. These files are poorly designed and do not come with any documentation.

We recently contracted with a consultant to investigate the feasibility of utilizing the ESP32 as a replacement for this connectivity solution. As part of the project, they developed a proof of concept that was intended to "drop-in" to the original solution's socket. Therefore the original SPI communications interface was utilized.

During the course of implementing this design, the consultant discovered several issues with the SPI peripheral operation of the ESP32...
1) The default SPI signals are assigned to pins which also serve the JTAG port. Since the consultant was utilizing the JTAG port for their development, the SPI signals were relocated to other GPIO. They discovered the need to invoke a "dummy" bit work-around due to delays encountered when remapping the SPI signals through the GPIO matrix.
2) The SPI peripheral is managed within the ESP32 firmware where port requests are queued. The consultant discovered the queue mechanism required message sizes of at least 12 bytes in length, where the overall length must be a multiple of 4 bytes for proper operation. All packetized messages had to be padded to adhere to these requirements. The additional overhead is causing further delays in the communications.
3) The management of the SPI peripheral by a task in the OS takes too long to respond to queries. While the original communications chip will respond to queries within a few microseconds, the contractor determined a 35 µs delay on the host between reading the bytes was necessary for proper operation with the ESP32.
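
To make the padding rule in item 2 concrete, here is a minimal sketch in plain C. It assumes only what the consultant reported (a 12-byte minimum and a length that is a multiple of 4); the names `spi_padded_len` and `spi_pad_message` are hypothetical, and zero-filling the padding is my own assumption, not something taken from the ESP32 driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed constraints, as reported by the consultant (not verified here). */
#define SPI_MIN_LEN 12u /* minimum message length in bytes */
#define SPI_ALIGN    4u /* message length must be a multiple of this */

/* Return the length a payload of n bytes must be padded to. */
size_t spi_padded_len(size_t n)
{
    if (n < SPI_MIN_LEN) {
        n = SPI_MIN_LEN;
    }
    /* Round up to the next multiple of SPI_ALIGN. */
    return (n + SPI_ALIGN - 1u) / SPI_ALIGN * SPI_ALIGN;
}

/*
 * Copy the payload into out and zero-fill the padding (zero is an assumed
 * filler value). Returns the padded length, or 0 if out is too small.
 */
size_t spi_pad_message(const uint8_t *payload, size_t n,
                       uint8_t *out, size_t out_cap)
{
    assert(payload != NULL && out != NULL);

    size_t padded = spi_padded_len(n);
    if (padded > out_cap) {
        return 0;
    }
    memcpy(out, payload, n);
    memset(out + n, 0, padded - n);
    return padded;
}
```

Under these assumptions a 5-byte command becomes a 12-byte transfer and a 13-byte command becomes a 16-byte transfer, which gives a feel for the overhead on short messages.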

Our embedded firmware designs are traditionally bare metal and utilize a simple scheduler to execute tasks at specific time intervals. If any of those tasks require the use of the SPI port, they execute a single bus transaction within the time slice and await the result (blocking). If further bus cycles are required, they are broken up between time slices. Although this implementation is blocking, the restriction on the use of single transfers at high data rates minimizes the delay between tasks. This operation allows multiple tasks to access the SPI peripheral and function normally without the need for an arbiter.
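
For readers unfamiliar with this style of design, the time-sliced scheduler described above could look roughly like the following. This is only an illustrative sketch, not our production code: the type `sched_task_t`, the function `sched_tick`, and the two placeholder tasks are hypothetical names, and the tick source (normally a hardware timer) is replaced by a simple loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*task_fn_t)(void);

typedef struct {
    task_fn_t run;      /* task body; expected to return quickly */
    uint32_t  period;   /* interval between runs, in ticks */
    uint32_t  next_due; /* tick at which the task should next run */
} sched_task_t;

/* Call once per tick. Runs every task that is due; returns how many ran. */
size_t sched_tick(sched_task_t *tasks, size_t count, uint32_t now)
{
    assert(tasks != NULL || count == 0);

    size_t ran = 0;
    for (size_t i = 0; i < count; i++) {
        /* Signed difference keeps the test valid across tick wraparound. */
        if ((int32_t)(now - tasks[i].next_due) >= 0) {
            tasks[i].run();
            tasks[i].next_due += tasks[i].period;
            ran++;
        }
    }
    return ran;
}

/* Placeholder tasks: each stands in for one short, blocking bus transaction. */
static unsigned spi_runs, dbg_runs;
static void spi_poll_task(void)    { spi_runs++; }
static void debug_flush_task(void) { dbg_runs++; }

/* Simulate ten ticks with a 2-tick SPI task and a 5-tick debug task. */
void sched_demo(void)
{
    sched_task_t tasks[] = {
        { spi_poll_task,    2, 0 },
        { debug_flush_task, 5, 0 },
    };
    for (uint32_t tick = 0; tick < 10; tick++) {
        sched_tick(tasks, sizeof tasks / sizeof tasks[0], tick);
    }
}
```

Because each task performs at most one short transaction per slice and then returns, no arbiter is needed between tasks that share the SPI port.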

Note that we typically implement a UART task to manage a "debug" port which provides us real-time diagnostic traffic for the firmware activity. This port has been extremely helpful in diagnosing any issues during development of our products. The UART is run at a high baud rate and the communications drivers are interrupt driven and provide a large enough circular buffer to prevent any issues with back-logging. Unlike SPI port activity, modules of code are free to request a dump of traffic to the debug port and the data is buffered until it can be sent.
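
As a rough illustration of the buffering described above, here is a minimal single-producer, single-consumer circular buffer in plain C. The names (`dbg_ring_t`, `dbg_ring_put`, `dbg_ring_get`) and the 256-byte capacity are hypothetical. In a real driver the UART transmit interrupt would call the `get` side; on a dual-core part, memory barriers or atomics may also be needed, which this sketch does not address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DBG_BUF_SIZE 256u /* hypothetical capacity; must be a power of two */
static_assert((DBG_BUF_SIZE & (DBG_BUF_SIZE - 1u)) == 0,
              "DBG_BUF_SIZE must be a power of two");

typedef struct {
    uint8_t data[DBG_BUF_SIZE];
    volatile uint32_t head; /* advanced only by the producer (task code) */
    volatile uint32_t tail; /* advanced only by the consumer (TX interrupt) */
} dbg_ring_t;

/* Number of bytes currently queued. Unsigned wraparound keeps this correct. */
static inline size_t dbg_ring_count(const dbg_ring_t *rb)
{
    return (size_t)(rb->head - rb->tail);
}

/* Queue one byte. Returns false if the buffer is full (byte is dropped). */
bool dbg_ring_put(dbg_ring_t *rb, uint8_t byte)
{
    if (dbg_ring_count(rb) == DBG_BUF_SIZE) {
        return false;
    }
    rb->data[rb->head & (DBG_BUF_SIZE - 1u)] = byte;
    rb->head++;
    return true;
}

/* Dequeue one byte. Returns false if the buffer is empty. */
bool dbg_ring_get(dbg_ring_t *rb, uint8_t *byte)
{
    if (rb->head == rb->tail) {
        return false;
    }
    *byte = rb->data[rb->tail & (DBG_BUF_SIZE - 1u)];
    rb->tail++;
    return true;
}
```

Any module can call `dbg_ring_put` without blocking; if the buffer is full the byte is rejected, so the capacity has to be sized for the worst-case burst of diagnostic traffic.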

Please note that I am just beginning to dig into the source code provided as part of this project, so I may still find issues that could be addressed to resolve the communications problems. Based on the description of how the SPI driver is working on the ESP, however, I am skeptical that I can do much without modifying that operation.

After reviewing the consultant's report, my initial thought was to potentially use a separate core to manage the serial communications operations with the host controller. If the cores were truly independent and the communications core could be isolated from the OS activities, we could utilize our proven bare metal code to manage the SPI and UART interfaces. If there is another way to guarantee the proper operation of the communications, I would have no problem relinquishing control to the SMP.

While we could manage the SPI and UART traffic with high-priority tasks within the OS, I am concerned about adversely affecting the BLE and WiFi stack operations. If there is a better way to manage this interface, I am open to suggestions.

Again, I am very grateful for your quick reply and the valuable knowledge you provide as I undertake this learning curve.

Best Regards,
Mark

Re: Dual Core Implementations

Posted: Thu May 23, 2019 4:33 am
by ESP_Sprite
Gotcha. First of all, you can still do the core1=wifi/bt, core2=peripherals split even with FreeRTOS. The trick is just to make sure WiFi/BT runs on one core (you can select this in menuconfig, I think it defaults to core 0) while you start the tasks to initialize / handle the peripherals on the other core (using xTaskCreatePinnedToCore).

Secondly, perhaps it's also an idea to look at the SDIO slave peripheral the ESP32 has? I do agree that SPI slave can be a bit finicky, and I know we've had customers use the SDIO slave peripheral successfully for tasks that are like yours.

The UART bit should not be an issue: the ESP32 has pretty large FIFOs for these, the UARTs can be run entirely interrupt-driven and even DMA-based if so needed.

Re: Dual Core Implementations

Posted: Fri May 31, 2019 2:49 pm
by berlinetta
Thank you for the clarifications!

If I elect to utilize the SDIO slave interface, can that be connected to the host controller's SPI peripheral without an issue, or are there any issues with protocol?

Regards,
Mark