Exact same code runs twice as fast on the ESP32-C3 as on an ESP32-S2. Did I do something wrong or is this to be expected? #17161
So I have this NeoPixel matrix driver that I wrote in MicroPython. The show() method takes around 40-50ms when running on my ESP32-C3 Supermini board but takes around 90-100ms on my ESP32-S2 Mini board (240 LEDs on both). I upgraded to an ESP32-S2 for this project because I thought replacing the 160MHz ESP32-C3 with a 240MHz ESP32-S2 would allow for slightly faster refreshing, but it's the complete opposite? I guess there is an architecture difference between the two, but I did not expect it to make it 2x slower on the ESP32-S2.

Does Xtensa just have worse performance per clock? Is the Xtensa MicroPython port less optimized than the RISC-V port? Or did I make a mistake somewhere? (Yes, I have manually set the frequency to 240MHz at boot for the ESP32-S2.)

Here is the code btw, it draws from the buffer to the NeoPixel strip. (Optimization suggestions welcome too. I couldn't find any library or example code for a NeoPixel matrix using framebuf, so I had to make one myself.)

    @micropython.native
    def show(self):
        buffer_index = 0
        for i in range(0, len(self.buffer), 2):
            # Read two bytes (little-endian) and form a 16-bit RGB565 value
            rgb565 = (self.buffer[i + 1] << 8) | self.buffer[i]
            # Extract and scale RGB components
            r = ((rgb565 >> 11) & 0x1F) * FIVERATIO
            g = ((rgb565 >> 5) & 0x3F) * SIXRATIO
            b = (rgb565 & 0x1F) * FIVERATIO
            # 1-based row number of this pixel in the matrix
            line_number = (buffer_index // self.width) + 1
            # Apply gamma correction and brightness to each channel
            r, g, b = (int(self.gamma_lut[c] * self.brightness) for c in (r, g, b))
            if not (line_number & 1):
                # Even rows are wired in reverse (serpentine layout)
                line_index = buffer_index % self.width
                self.neopixel[line_number * self.width - line_index - 1] = (r, g, b)
            else:
                self.neopixel[buffer_index] = (r, g, b)
            buffer_index += 1
        self.neopixel.write()
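One possible optimization, as a rough sketch that has not been measured on either chip: most of the per-pixel work in show() (the divide/modulo for the serpentine row, the float multiply by brightness, the int() call and the generator expression) depends only on the pixel position or the channel value, so it could be precomputed into lookup tables. The helper names below are hypothetical, and the defaults `five_ratio=8` and `six_ratio=4` are assumptions standing in for FIVERATIO and SIXRATIO, whose real values are not shown above. `gamma_lut` is assumed to be a 256-entry sequence.

```python
def build_index_map(width, height):
    """Map framebuffer pixel index -> LED index for the serpentine layout.

    Mirrors the original loop: rows whose 1-based number is even are reversed.
    """
    index_map = []
    for i in range(width * height):
        line_number = i // width + 1
        if line_number & 1:
            index_map.append(i)
        else:
            index_map.append(line_number * width - (i % width) - 1)
    return index_map


def build_channel_luts(gamma_lut, brightness, five_ratio=8, six_ratio=4):
    """Fold scaling, gamma and brightness into one table per channel width.

    five_ratio / six_ratio are assumed integer scale factors (hypothetical).
    Rebuild these tables whenever brightness changes.
    """
    lut5 = [int(gamma_lut[v * five_ratio] * brightness) for v in range(32)]
    lut6 = [int(gamma_lut[v * six_ratio] * brightness) for v in range(64)]
    return lut5, lut6


def convert_frame(buffer, index_map, lut5, lut6, pixels):
    """Copy an RGB565 little-endian buffer into `pixels` using the tables."""
    for n in range(len(buffer) // 2):
        rgb565 = (buffer[2 * n + 1] << 8) | buffer[2 * n]
        pixels[index_map[n]] = (
            lut5[rgb565 >> 11],
            lut6[(rgb565 >> 5) & 0x3F],
            lut5[rgb565 & 0x1F],
        )
```

In the driver, `pixels` would be the NeoPixel object, the tables would be built once in the constructor, and show() would call `convert_frame(...)` followed by `self.neopixel.write()`. Whether this actually helps on the ESP32-S2 would need to be timed.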
Actually, another thing I forgot to consider is the 2MB PSRAM on the ESP32-S2FN4R2 chip used in the ESP32-S2 Mini board. Could this code be considered memory heavy? I mean, we are reading from the framebuffer itself, then from the gamma_lut, and then writing to the NeoPixel buffer. Could the use of PSRAM slow things down in this situation? And is there a way to disable PSRAM from MicroPython, or is a rebuild required? The generic S2 build detects the PSRAM and uses it automatically.
The discrepancy is probably due to the needs of FreeRTOS.
Okay, I got the exact same code running on an ESP32-S3 this time, which has the same Tensilica Xtensa 32-bit LX7 core as the ESP32-S2 (but two of them on the S3; I'm not sure if/how MicroPython utilizes the second core) and also a lot more RAM.
This time on the ESP32-S3 (no PSRAM) running at 160MHz I got 60.7ms per frame, which is still slower than the 47.5ms on the ESP32-C3 running also at 160MHz. But setting the ESP32-S3 frequency to 240MHz does speed it up to 36.2ms. (I ensured the code is 1:1 exactly the same as the one running on the ESP32-S2)
So I think in conclusion, at the same clock speed, Micropython is a bit slower on Xtensa compared to on RISC-V (Well for this one task at least).…
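To compare the chips independently of clock speed, the timings above can be converted into approximate CPU cycles per frame. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it treats all of show() as scaling with the CPU clock, which ignores any clock-independent portion such as the time spent sending data to the LEDs.

```python
def cycles_per_frame(frame_ms, clock_mhz):
    """Approximate CPU cycles spent per frame: time (s) x clock (Hz)."""
    return frame_ms * clock_mhz * 1000


# Measurements quoted in the posts above.
c3_160 = cycles_per_frame(47.5, 160)   # ESP32-C3 at 160MHz
s3_160 = cycles_per_frame(60.7, 160)   # ESP32-S3 at 160MHz
s3_240 = cycles_per_frame(36.2, 240)   # ESP32-S3 at 240MHz

print("C3 @160MHz: %.2f Mcycles" % (c3_160 / 1e6))
print("S3 @160MHz: %.2f Mcycles" % (s3_160 / 1e6))
print("S3 @240MHz: %.2f Mcycles" % (s3_240 / 1e6))
print("S3/C3 at the same clock: %.2fx" % (s3_160 / c3_160))
```

By this estimate the ESP32-S3 needs roughly 28% more cycles than the ESP32-C3 for the same frame at 160MHz, which matches the "a bit slower per clock" conclusion for this one task.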