Profiling on Espressif MCUs woes
-
- Posts: 170
- Joined: Sun May 22, 2022 2:42 pm
Profiling on Espressif MCUs woes
Over the last couple of days, I've tried to get a working profiling setup, but it seems very involved.
Although there is quite a bit of documentation on setting this up in ESP-IDF, nothing I tried really worked. OpenOCD is set up correctly and I can debug via JTAG, so the basic setup is working fine. The GCOV example, however, just crashes on the MCU when I try to read out the data using OpenOCD.
The Segger SystemView approach at least led to some results, but they are far too detailed and still lack the information I'm really after. A quick GitHub search for CONFIG_APPTRACE_GCOV_ENABLE=y also turns up very few results.
Is really no one profiling the hottest paths in their code in order to optimize them? What tools / what setup are you using?
Re: Profiling on Espressif MCUs woes
While SystemView provides detailed insights into your system's behavior, it might not be the ideal tool for identifying and optimizing hot paths in your code. It is aimed at system-level tracing rather than at profiling specific code paths.
-
- Posts: 1818
- Joined: Mon Oct 17, 2022 7:38 pm
- Location: Europe, Germany
Re: Profiling on Espressif MCUs woes
My €0,02:
0. Don't optimize if there's no performance issue.
1. General profiling/optimization of regular application code likely doesn't make much sense.
1.1. Profiling code which uses blocking operations is likely futile.
2. Computationally heavy operations may be worth optimizing, e.g. functions containing loops that run (on average) hundreds or thousands of iterations without blocking.
3. Manually instrumenting the (few) functions in question is an option; esp_cpu_get_cycle_count() is very helpful.
4. FreeRTOS run time stats can also be helpful.
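Item 3 could look like the following minimal sketch. It assumes ESP-IDF v5.x, where esp_cpu_get_cycle_count() is declared in esp_cpu.h; the ESP_PLATFORM guard (defined by the ESP-IDF build) keeps the helper arithmetic buildable on a host, and my_hot_function() is a hypothetical placeholder for the code being measured.

```c
#include <assert.h>
#include <stdint.h>

#ifdef ESP_PLATFORM                 /* defined by the ESP-IDF build system */
#include "esp_cpu.h"                /* esp_cpu_get_cycle_count(), ESP-IDF v5.x */
#define READ_CYCLES() ((uint32_t)esp_cpu_get_cycle_count())
#endif

/* Cycles elapsed between two reads of the free-running 32-bit cycle counter.
 * Unsigned subtraction is modulo 2^32, so a single wraparound is handled. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert cycles to microseconds: at cpu_mhz MHz there are cpu_mhz cycles per us. */
static inline uint32_t cycles_to_us(uint32_t cycles, uint32_t cpu_mhz)
{
    assert(cpu_mhz > 0);
    return cycles / cpu_mhz;
}

#ifdef ESP_PLATFORM
void my_hot_function(void);         /* hypothetical: the function being measured */

/* Returns the number of CPU cycles one call to my_hot_function() took. */
uint32_t measure_my_hot_function(void)
{
    uint32_t start = READ_CYCLES();
    my_hot_function();
    return cycles_elapsed(start, READ_CYCLES());
}
#endif
```

The subtraction trick only holds for intervals shorter than 2^32 cycles (about 17.9 s at 240 MHz). The cycle counter is per-core, so on dual-core chips the measured task should stay on one core (e.g. pinned), and time spent in interrupts or other tasks during the measured span is included in the count.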
-
- Posts: 49
- Joined: Thu Feb 22, 2024 3:59 pm
Re: Profiling on Espressif MCUs woes
I also need a profiler for ESP32. In the past, on ESP8266, I was able to use OpenOCD to profile by polling the program counter, and I'm looking to do something similar on ESP32-S3. I don't disagree with the other comments, but sometimes you need to look at the system as a whole to see where time is being spent and where to tune.
I'll also note that the last time I had this working on ESP8266, OpenOCD required a debug ISR and injecting a small function to read out the program counter for every PC sample, which was horrifyingly slow.
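However the PC samples are collected, turning them into a "where is the time going" view is mostly an address lookup. Below is a minimal host-side sketch, assuming a hypothetical symbol table sorted by start address (for example built from `nm` output or the ELF symbol table); func_entry, find_func and attribute_samples are illustrative names, not part of any existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol-table entry. Entries must be sorted by ascending
 * start address and must not overlap. */
typedef struct {
    const char *name;
    uint32_t start;   /* first address of the function */
    uint32_t size;    /* size in bytes */
    size_t hits;      /* PC samples that landed in this function */
} func_entry;

/* Binary search: return the entry whose [start, start + size) contains pc,
 * or NULL if no function covers it. */
static func_entry *find_func(func_entry *funcs, size_t n, uint32_t pc)
{
    assert(n == 0 || funcs != NULL);
    size_t lo = 0, hi = n;              /* search in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (funcs[mid].start <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is the first entry with start > pc, so the candidate is lo - 1. */
    if (lo == 0)
        return NULL;
    func_entry *f = &funcs[lo - 1];
    return (pc - f->start < f->size) ? f : NULL;
}

/* Attribute each PC sample to a function; returns the unattributed count. */
static size_t attribute_samples(func_entry *funcs, size_t n,
                                const uint32_t *pcs, size_t n_pcs)
{
    size_t unattributed = 0;
    for (size_t i = 0; i < n_pcs; i++) {
        func_entry *f = find_func(funcs, n, pcs[i]);
        if (f)
            f->hits++;
        else
            unattributed++;
    }
    return unattributed;
}
```

Sorting the table by hits afterwards gives a flat profile; a large unattributed count usually means samples landed in code the table does not cover (e.g. ROM or another section).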
-
- Posts: 49
- Joined: Thu Feb 22, 2024 3:59 pm
Re: Profiling on Espressif MCUs woes
Tulip Creative Computer has a nice write-up on this. With the built-in USB-JTAG on the ESP32-S3, I was able to reach 8-9 samples per second; Tulip reached 12.7 samples per second, but neither rate is really useful. This closely matches what I got on ESP32 in 2017. OpenOCD profiling also causes my program to drop a lot of packets, since the debugger halts the program for every sample. An external JTAG adapter might perform better, since much of the time seems to go to USB 1.x latency rather than JTAG transfer time, but I'm using external memory, which conflicts with the external JTAG pins.
If anyone is more familiar with the debug or trace modules on Xtensa systems: is there anything similar to Arm's DWT_PCSR? On Arm cores that support it, the debug interface can read the DWT_PCSR register to sample the program counter without interrupting the target CPU. For Lauterbach debuggers, the corresponding option is SYStem.Option.SnoopAddressPC. I did find these XDM registers: `XDM_PERF_INTPC`, `XDM3_TRAX_DEBUGPC`, `XDM_TRAX_DEBUGPC`, but I don't have any documentation for them to tell whether they would be useful.
-
- Posts: 9835
- Joined: Thu Nov 26, 2015 4:08 am
Re: Profiling on Espressif MCUs woes
Bryght-Richard wrote (Tue Jul 30, 2024 6:04 pm): "I did find these XDM registers, `XDM_PERF_INTPC`, `XDM3_TRAX_DEBUGPC`, `XDM_TRAX_DEBUGPC`, but I don't have any documentation for them to tell if they would be useful."
PERF_INTPC holds the PC at the time of a performance monitor interrupt, which triggers when a performance monitor counter overflows, so it's not really useful for this. However, TRAX_DEBUGPC does seem to be the currently executing PC (it's part of the trace port). It's 'lightly documented' when it comes to reading it out directly via JTAG (normally it feeds a trace logger), but it might be worth trying. I'm not sure whether there is any difference between the XDM and XDM3 variants.
-
- Posts: 49
- Joined: Thu Feb 22, 2024 3:59 pm
Re: Profiling on Espressif MCUs woes
Thanks ESP_Sprite! I'll give that a try later.
I wired an FT2232H into OpenOCD and was able to reach ~14 samples per second, mostly thanks to the lower USB 2.0 latency (125 µs for the FT2232H vs. 1 ms for the ESP32-S3 USB-JTAG). That's still not fast enough to usefully profile our system.
-
- Posts: 49
- Joined: Thu Feb 22, 2024 3:59 pm
Re: Profiling on Espressif MCUs woes
Moving to polling XDM_TRAX_DEBUGPC non-intrusively:
- Built-in JTAG -> 2.9k samples/second
- USB 2.0 FT2232H -> 3.4k samples/second
- Built-in JTAG with request batching -> 18k samples/second
- USB 2.0 FT2232H with request batching -> 100k samples/second!
A very rough set of changes, ESP32-S3 only, is available at https://github.com/rsaxvc/openocd-esp32-zomgsofast and gives me nice, fine-grained profiling data.
This still requires renaming the .flash.text section to .text, and only a single section can be analyzed at a time. gprof could be improved to load ".*text", but for now, use:
xtensa-esp32s3-elf-objcopy -I elf32-xtensa-le --rename-section .flash.text=.text build\app.elf
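The single-section limitation is easier to see with the profile layout in mind: as I understand it, the histogram OpenOCD's `profile` command writes to gmon.out covers one contiguous [low_pc, high_pc) address range split into equal-width bins. A minimal sketch of that binning follows, assuming out-of-range samples are simply dropped (real tools may handle them differently); bin_pc_samples is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bin PC samples into n_bins equal-width buckets covering [low_pc, high_pc),
 * in the spirit of a gmon.out histogram. Returns the number of samples that
 * fell outside the range and were dropped. */
static size_t bin_pc_samples(const uint32_t *pcs, size_t n_pcs,
                             uint32_t low_pc, uint32_t high_pc,
                             uint32_t *bins, size_t n_bins)
{
    assert(high_pc > low_pc && n_bins > 0);
    const uint64_t span = (uint64_t)high_pc - low_pc;
    size_t dropped = 0;

    for (size_t i = 0; i < n_bins; i++)
        bins[i] = 0;

    for (size_t i = 0; i < n_pcs; i++) {
        uint32_t pc = pcs[i];
        if (pc < low_pc || pc >= high_pc) {
            dropped++;   /* e.g. an IRAM or ROM PC while binning flash text */
            continue;
        }
        /* 64-bit math so offset * n_bins cannot overflow; idx < n_bins
         * because offset < span. */
        size_t idx = (size_t)(((uint64_t)(pc - low_pc) * n_bins) / span);
        bins[idx]++;
    }
    return dropped;
}
```

With a single range, any PC in IRAM or ROM falls outside it, which is why the profile only makes sense for the one section the range was built from.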
ESP_Sprite, do you know whether all Espressif cores support TRAX? I think OpenOCD could support all Espressif cores the same way OpenOCD for Arm reads DWT_PCSR: try the non-intrusive read, and if it fails or returns 0x0, fall back to the halt-readPC-resume loop.
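The fallback described above could look roughly like this. It is a hypothetical sketch only: pc_sampler and its callbacks stand in for whatever the debugger actually does and are not OpenOCD APIs, and the fake_* functions exist only to simulate a target on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe interface (NOT an OpenOCD API). */
typedef struct {
    /* Non-intrusive read of a PC snapshot register such as TRAX_DEBUGPC. */
    bool (*read_pc_snapshot)(void *ctx, uint32_t *pc);
    /* Intrusive path: halt the core, read its PC, resume it. */
    bool (*halt_read_pc_resume)(void *ctx, uint32_t *pc);
    void *ctx;
    bool snapshot_usable;   /* cleared once the non-intrusive path fails */
} pc_sampler;

/* Take one PC sample, preferring the non-intrusive register and falling back
 * to halt/read/resume if the read fails or returns 0x0. */
static bool sample_pc(pc_sampler *s, uint32_t *pc)
{
    assert(s != NULL && pc != NULL && s->halt_read_pc_resume != NULL);
    if (s->snapshot_usable && s->read_pc_snapshot != NULL) {
        uint32_t v = 0;
        if (s->read_pc_snapshot(s->ctx, &v) && v != 0) {
            *pc = v;
            return true;
        }
        s->snapshot_usable = false;   /* sticky: use the slow path from now on */
    }
    return s->halt_read_pc_resume(s->ctx, pc);
}

/* --- Fake target for host-side illustration only --- */
typedef struct {
    uint32_t snapshot_value;   /* what the snapshot register reads back */
    uint32_t halted_pc;        /* PC reported after a halt */
    unsigned halts;            /* number of intrusive halts performed */
} fake_target;

static bool fake_read_snapshot(void *ctx, uint32_t *pc)
{
    *pc = ((fake_target *)ctx)->snapshot_value;
    return true;
}

static bool fake_halt_read_resume(void *ctx, uint32_t *pc)
{
    fake_target *t = ctx;
    t->halts++;
    *pc = t->halted_pc;
    return true;
}
```

Making the fallback sticky avoids paying for a useless non-intrusive read on every sample; a real implementation might instead probe once at startup or retry periodically.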
Last edited by Bryght-Richard on Mon Aug 05, 2024 12:51 pm, edited 1 time in total.
-
- Posts: 9835
- Joined: Thu Nov 26, 2015 4:08 am
Re: Profiling on Espressif MCUs woes
As far as I know, all Xtensa-based ESP32 chips should; the RISC-V-based ones probably have something similar somewhere. I'll point the tools team to your branch; it could be worth integrating into our official OpenOCD branch.
-
- Posts: 49
- Joined: Thu Feb 22, 2024 3:59 pm
Re: Profiling on Espressif MCUs woes
Thanks ESP_Sprite. I think it should be possible to support them all with this approach, I just need to figure out how to structure it in OpenOCD so they can all share the same implementation.
Also, does Espressif maintain its own port of gprof, or distribute mainline? It might make sense to patch it so that it loads ".flash.text" and ".iram.text", as well as the ROM ELF if possible.