I'm looking for documentation on using the FPU on the s3. I searched the Programming Guide and Technical Reference Manual but found nothing.
Thanks
FPU Documentation for s3
-
- Posts: 1702
- Joined: Mon Oct 17, 2022 7:38 pm
- Location: Europe, Germany
Re: FPU Documentation for s3
What documentation do you want?
Normally you don't explicitly 'use' the FPU, you just let the compiler deal with it.
On the assembly level, the FPU is used via the Xtensa ISA's floating point instructions.
Re: FPU Documentation for s3
Pretty much the "you just let the compiler deal with it" part. How can I verify that the FPU is being used by the compiler? I have a multiplication that is taking very long to process, longer than a 16-bit SPI transfer, which is unimaginably long. I need to verify that it is in fact working. I have also heard rumors that it doesn't do division, so I would like to see what other shortcomings it may have.
Thanks
Re: FPU Documentation for s3
You can play around with the compiler explorer to see what instruction sequences gcc puts out.
According to the Xtensa ISA, 4.3.11.5 "Divide and Square Root Sequences", a single FP division indeed takes around 25 FP instructions.
Multiplication should be pretty fast, but note that a) ESP-IDF does 'lazy-saving/restoring' of the FPU registers, which means that the first FP instruction executed in a task after a context switch can appear to effectively take a hundred or so clock cycles, and b) no FPU use in an ISR context.
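As a minimal sketch of the Compiler Explorer check suggested above, the functions below could be pasted in with an Xtensa ESP32-S3 GCC target selected (which compilers and flags are offered there is an assumption). The function names are hypothetical and only serve as labels in the disassembly. The comments describe what I would expect to see, not a guaranteed output.

```c
/* Single-precision multiply. If the hardware FPU is used, the output
 * should contain a mul.s instruction rather than a call to the libgcc
 * soft-float routine __mulsf3. */
float mul_f32(float a, float b)
{
    return a * b;
}

/* Single-precision divide. The Xtensa ISA describes division as a
 * multi-instruction sequence, so expect either that sequence inline
 * or a call to a helper such as __divsf3. */
float div_f32(float a, float b)
{
    return a / b;
}

/* Double-precision multiply. The S3's FPU is single precision only,
 * so this is expected to become a call to __muldf3 (software). */
double mul_f64(double a, double b)
{
    return a * b;
}
```

If a supposedly float-only expression shows up as a call to a `__*df3` routine, a `double` has probably crept in somewhere (for example through a `double` variable or a math function without the `f` suffix), and that path does not use the FPU.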
Re: FPU Documentation for s3
Thanks for the Xtensa document.
The last sentence in your response seems to be only partially correct. I have a single task running on core 0 which is nothing but a while loop. I have a single task running on core 1 which is doing multiplication. The FPU is never activated on core 1. There are no context switches on either core. I would appreciate any ideas you may have.
Thanks again for all your help.
Re: FPU Documentation for s3
"I have a multiplication that is taking very long to process, longer than a 16 bit SPI transfer"
Works for me. On my S3, one single-precision floating point multiplication takes 4 CPU clock cycles. (Or 7 when including the transfer of 2 operands and 1 result between address and FPU registers in the count.)
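For anyone wanting to reproduce a measurement like this, here is a rough sketch, not the code used for the numbers above. It assumes ESP-IDF v5.x, where `esp_cpu_get_cycle_count()` from `esp_cpu.h` reads the CPU cycle counter; off-target it falls back to `clock()` just so it compiles. `read_cycles` and `time_fmul` are hypothetical helper names. One FP operation is done before timing starts so that the lazy FPU register restore mentioned earlier is not counted.

```c
#include <stdint.h>

#ifdef ESP_PLATFORM
#include "esp_cpu.h"  /* assumed: ESP-IDF v5.x */
static inline uint32_t read_cycles(void)
{
    return (uint32_t)esp_cpu_get_cycle_count();
}
#else
#include <time.h>     /* host fallback: clock() ticks, not CPU cycles */
static inline uint32_t read_cycles(void)
{
    return (uint32_t)clock();
}
#endif

/* Hypothetical helper: times n dependent single-precision multiplies.
 * Returns the elapsed counter ticks and stores the product in *out so
 * the compiler cannot discard the loop. The figure includes loop
 * overhead, so treat it as an upper bound per multiply. */
uint32_t time_fmul(float seed, float factor, uint32_t n, float *out)
{
    /* Warm-up FP operation: absorbs the one-time cost of the lazy
     * FPU context restore before the timed region begins. */
    volatile float warm = seed * factor;
    (void)warm;

    float acc = seed;
    uint32_t start = read_cycles();
    for (uint32_t i = 0; i < n; i++) {
        acc *= factor;
    }
    uint32_t end = read_cycles();

    *out = acc;
    return end - start;  /* unsigned math tolerates one counter wrap */
}
```

Call it from a task, not from an ISR, and pass values that are not compile-time constants; otherwise the compiler may fold the arithmetic away and the timing says nothing about the FPU.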
Re: FPU Documentation for s3
Thanks for the feedback and for sticking with me on this issue. I set up more incremental measurements and found that the FPU was in fact working, and I saw the same number of cycles as you. I was leaning toward an FPU problem because the loop doing the multiplication ran slower when a task was running on core 1. I found out that actually everything on core 0 ran slower when a task was running on core 1, even though the core 1 task was nothing more than a while() loop. Some things take as much as 50% more cycles; the multiplication loop takes 32% more cycles.
While I had suggested that the FPU was not being used due to a task running on another core, the latest measurements don't support that.
Thanks again for your help.