ESP32-S3 PIE: Extracting high-precision results from QACC



Posted by pmjobin » Wed May 17, 2023 11:07 pm

Hello,

I'm writing code targeting the Processor Instruction Extensions (PIE) of the ESP32-S3 to perform SIMD multiply-add operations on signed 8-bit values. More precisely, the code consists of a sequence of six consecutive EE.VSMULAS.S8.QACC instructions followed by the extraction of the results from the accumulator register pair QACC_[H/L].
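
To reason about how many of the 20 accumulator bits the six accumulations actually use, here is a scalar C model of the loop. It is only a sketch. It assumes that EE.VSMULAS.S8.QACC multiplies each of the 16 signed 8-bit lanes of qx by one selected element of qy (chosen by sel) and adds each product into the matching accumulator lane. It also uses plain int32_t instead of real 20-bit lanes. The function names vsmulas_s8_qacc_model and fits_signed are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define QACC_LANES 16

/* Hypothetical scalar model of EE.VSMULAS.S8.QACC (assumed semantics):
 * each of the 16 lanes accumulates qx[i] * qy[sel].
 * The hardware keeps 20 bits per lane; int32_t is used here instead. */
static void vsmulas_s8_qacc_model(int32_t acc[QACC_LANES],
                                  const int8_t qx[QACC_LANES],
                                  const int8_t qy[QACC_LANES],
                                  unsigned sel)
{
    const int32_t scalar = qy[sel % QACC_LANES];
    for (unsigned i = 0; i < QACC_LANES; i++) {
        acc[i] += (int32_t)qx[i] * scalar;
    }
}

/* True if v is representable as a signed integer of the given width (bits <= 31). */
static bool fits_signed(int32_t v, unsigned bits)
{
    const int32_t max = (INT32_C(1) << (bits - 1)) - 1;
    const int32_t min = -max - 1;
    return v >= min && v <= max;
}
```

With six accumulations of the extreme product (-128 × -128 = 16384), a lane reaches 98304. That needs 18 signed bits: it fits in the 20-bit accumulator but not in an int16_t, so in this worst case the 16 useful bits start at bit 2.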

In the QACC_[H/L] register pair, the results are represented with 20 bits of precision, and I need to extract 16 of those 20 bits. Unfortunately, there doesn't appear to be a trivial way to do this. According to the ESP32-S3 Technical Reference Manual, there are three different ways to obtain the values held in QACC_[H/L] (the first one comes in two variants):

Option A.1) EE.SRCMB.S8.QACC : Extract 16x 8-bit data segments from the 16x 20-bit accumulators in QACC_[H/L], shift, saturate and pack as 16x signed 8-bit values into QR register.

Option A.2) EE.SRCMB.S16.QACC : Extract 8x 16-bit data segments from the 8x 40-bit accumulators in QACC_[H/L], shift, saturate and pack as 8x signed 16-bit values into QR register.

Option B) EE.ST.QACC_[H/L] : Write the QACC_[H/L] registers as-is to memory.

Option C) RUR.QACC_[H/L]_n : Copy a 32-bit segment of QACC_[H/L] into an AR register.

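
To make the difference between Options A.1 and A.2 concrete, here is a per-lane model of "shift, saturate and pack". It is a sketch under my own assumptions: the shift is arithmetic (sign-preserving, no rounding) and saturation clamps to the signed range of the output type. I have not checked whether the hardware rounds. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed per-lane behaviour of EE.SRCMB.S8.QACC: arithmetic right shift
 * of a 20-bit accumulator, then saturation to int8_t.
 * Note: >> on a negative signed value is implementation-defined in C;
 * GCC and Clang perform an arithmetic shift. */
static int8_t srcmb_s8_lane_model(int32_t acc20, unsigned shift)
{
    const int32_t v = acc20 >> shift;
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Assumed per-lane behaviour of EE.SRCMB.S16.QACC: same idea on a
 * 40-bit accumulator, saturating to int16_t. */
static int16_t srcmb_s16_lane_model(int64_t acc40, unsigned shift)
{
    const int64_t v = acc40 >> shift;
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}
```

For a lane holding 98304 with a shift of 2, the S16 model returns 24576 intact while the S8 model clips it to 127. The S8 model only stops clipping at a shift of 10 (giving 96), which discards most of the precision.
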
Option A.1 is obviously the most efficient of the lot. Unfortunately, it only lets me extract 8 bits per accumulator, and I need 16. Options B and C require moving the QACC_[H/L] contents out (to memory or to AR registers) and then extracting the 16-bit values manually with a combination of shift and mask operations, which would cause a large performance hit in my case. An alternative would be to process 8x 16-bit values at a time (instead of 16x 8-bit values) and extract the results with Option A.2, but that would make the algorithm run at least twice as slow.
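
For Options B and C, the manual unpacking looks roughly like the sketch below. It relies on assumptions I have not verified against the TRM. Each 160-bit half of QACC is assumed to end up in a 20-byte buffer, least significant byte first, with lane i of that half in bits 20*i to 20*i+19. QACC_L is also assumed to hold lanes 0-7 and QACC_H lanes 8-15. unpack_lane20, shift_sat_s16 and qacc_to_s16 are hypothetical helper names.

```c
#include <stdint.h>

/* Hypothetical layout (assumption): one 160-bit half of QACC stored as
 * 20 bytes, LSB first, holding eight 20-bit lanes. */
#define QACC_HALF_BYTES 20
#define LANES_PER_HALF 8

/* Extract one 20-bit lane and sign-extend it to int32_t. */
static int32_t unpack_lane20(const uint8_t buf[QACC_HALF_BYTES], unsigned lane)
{
    const unsigned bit = lane * 20u;
    const unsigned byte = bit / 8u;
    const unsigned off = bit % 8u;          /* always 0 or 4 */
    /* 20 + off <= 24, so three bytes always cover the lane. */
    uint32_t raw = (uint32_t)buf[byte]
                 | ((uint32_t)buf[byte + 1] << 8)
                 | ((uint32_t)buf[byte + 2] << 16);
    raw = (raw >> off) & 0xFFFFFu;
    /* Sign-extend from bit 19. */
    return (raw & 0x80000u) ? (int32_t)raw - 0x100000 : (int32_t)raw;
}

/* Shift right (arithmetic on GCC/Clang) and saturate to int16_t. */
static int16_t shift_sat_s16(int32_t v, unsigned shift)
{
    v >>= shift;
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Assumption: QACC_L holds lanes 0-7 and QACC_H holds lanes 8-15. */
static void qacc_to_s16(const uint8_t qacc_l[QACC_HALF_BYTES],
                        const uint8_t qacc_h[QACC_HALF_BYTES],
                        unsigned shift, int16_t out[2 * LANES_PER_HALF])
{
    for (unsigned i = 0; i < LANES_PER_HALF; i++) {
        out[i] = shift_sat_s16(unpack_lane20(qacc_l, i), shift);
        out[i + LANES_PER_HALF] = shift_sat_s16(unpack_lane20(qacc_h, i), shift);
    }
}
```

Per lane this costs three byte loads, several shifts and ORs, a mask, a sign extension and a clamp, on top of getting the data out of QACC. That is the overhead compared with a single EE.SRCMB.S8.QACC.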

Am I missing something here? Is there another way of extracting high-precision values from the QACC accumulator in an efficient manner that I am unaware of?

On the subject of the EE.SRCMB.S[8/16].QACC instructions, there appears to be an undocumented sel2 field in the encoding. It is shown in the assembler syntax given for these instructions in the Technical Reference Manual:
[Screenshot: assembler syntax of EE.SRCMB.S[8/16].QACC from the Technical Reference Manual, with the trailing sel2 operand (0) circled in green]
I verified that it is possible to substitute the 0 at the end (circled in green in the screenshot) with a 1 and that this changes the result stored in qu in some circumstances, but I've been unable to determine exactly how. Is there any explanation for the presence of this bit or its effect?

Thanks for your help,
P-M
