Exact same code runs twice as fast on the ESP32-C3 as on an ESP32-S2. Did I do something wrong or is this to be expected? #17161

misaalanshori · 2025-04-21T04:46:35Z

misaalanshori
Apr 21, 2025

So I have this NeoPixel matrix driver that I wrote in MicroPython. The show() method takes around 40-50ms on my ESP32-C3 Supermini board but around 90-100ms on my ESP32-S2 Mini board (240 LEDs on both). I upgraded to an ESP32-S2 for this project because I thought replacing the ESP32-C3, which runs at 160MHz, with a 240MHz ESP32-S2 would allow for slightly faster refreshing, but it's the complete opposite. I guess there is an architecture difference between the two, but I did not expect it to make the ESP32-S2 2x slower. Does Xtensa just have worse performance per clock? Is the Xtensa MicroPython port less optimized than the RISC-V port? Or did I make a mistake somewhere? (Yes, I have manually set the frequency to 240MHz at boot for the ESP32-S2.)

Here is the code btw; it draws from the buffer to the NeoPixel strip. (Optimization suggestions are welcome too. I couldn't find any library or example code for a NeoPixel matrix using framebuf, so I had to write one myself.)

    @micropython.native
    def show(self):
        buffer_index = 0
        for i in range(0, len(self.buffer), 2):
            # Read two bytes and form a 16-bit RGB565 value
            rgb565 = (self.buffer[i + 1] << 8) | self.buffer[i]

            # Extract and scale RGB components
            r = ((rgb565 >> 11) & 0x1F) * FIVERATIO
            g = ((rgb565 >> 5) & 0x3F) * SIXRATIO
            b = (rgb565 & 0x1F) * FIVERATIO

            line_number = (buffer_index//self.width)+1
            
            r, g, b = (int(self.gamma_lut[c] * self.brightness) for c in (r, g, b))
            
            if (not (line_number & 1)):
                line_index = buffer_index%self.width
                self.neopixel[line_number*self.width - line_index - 1] = (r,g,b)
            else:
                self.neopixel[buffer_index] = (r,g,b)

            buffer_index += 1
        self.neopixel.write()
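
One possible optimization, offered as a sketch rather than a drop-in replacement: most of the per-pixel work in the loop above (scaling, gamma lookup, brightness multiply, and the row-direction test) depends only on values that do not change between frames, so it could be moved into tables that are built once. The names `build_channel_lut` and `build_index_map` are hypothetical, the integer ratios `255 // 31` and `255 // 63` are assumed values for `FIVERATIO` and `SIXRATIO` (the post does not define them), and the identity gamma table is only a placeholder.

```python
# Sketch only: hoist the per-pixel arithmetic out of show() into tables.
# FIVERATIO / SIXRATIO are assumed values; the post does not define them.
FIVERATIO = 255 // 31
SIXRATIO = 255 // 63


def build_channel_lut(levels, ratio, gamma_lut, brightness):
    """Map every raw channel value straight to its final 8-bit output."""
    return bytes(int(gamma_lut[v * ratio] * brightness) for v in range(levels))


def build_index_map(width, height):
    """Strip position for each framebuffer pixel, using the same rule as
    show(): rows 0, 2, 4, ... run forward, rows 1, 3, 5, ... run backward."""
    index_map = []
    for i in range(width * height):
        row, col = divmod(i, width)
        if row & 1:
            index_map.append((row + 1) * width - col - 1)
        else:
            index_map.append(i)
    return index_map


gamma_lut = list(range(256))  # placeholder: identity gamma
lut5 = build_channel_lut(32, FIVERATIO, gamma_lut, 0.5)  # red and blue
lut6 = build_channel_lut(64, SIXRATIO, gamma_lut, 0.5)   # green
index_map = build_index_map(4, 3)                        # tiny 4x3 matrix
```

With tables like these, each pixel needs three table reads and one `index_map` lookup, and the tables only need rebuilding when the brightness changes. Whether this is actually faster on a given board would need measuring.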

misaalanshori · 2025-04-21T05:56:54Z

misaalanshori
Apr 21, 2025
Author

Actually, another thing I forgot to consider is the 2MB PSRAM on the ESP32-S2FN4R2 chip used in the ESP32-S2 Mini board. Could this code be considered memory heavy? I mean, we are reading from the framebuffer itself, then the gamma_lut, and then writing to the NeoPixel buffer. Could the use of PSRAM slow things down in this situation? And is there a way to disable PSRAM from MicroPython, or is a rebuild required? The generic S2 build detects the PSRAM and uses it automatically.

5 replies

peterhinch Apr 21, 2025
Collaborator

My understanding of MP memory management is that PSRAM is only used if internal RAM becomes full or too fragmented to enable an allocation to succeed. I'd start by calculating the sizes of your arrays, also by doing measurements:

    import gc

    gc.collect()
    print(gc.mem_free())

You might also consider using Viper. This would involve some rewriting, but Viper is amazingly fast - in some situations nearly as fast as Assembler.
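
As a rough illustration of what such a rewrite might look like (a sketch that has not been measured on hardware, not the code from this thread): the hypothetical function `rgb565_to_grb` below converts little-endian RGB565 bytes into a 3-byte-per-pixel output buffer using `ptr8` pointers and a single lookup table. It assumes the output buffer is in G, R, B byte order, uses plain linear scaling instead of the gamma and brightness step, and leaves out the serpentine row remap. The `except ImportError` branch is only a stand-in so the same file also runs under CPython.

```python
try:
    import micropython
except ImportError:
    # Running under CPython: no Viper, so provide plain-Python stand-ins.
    class micropython:
        @staticmethod
        def viper(func):
            return func

    def ptr8(obj):
        return obj


@micropython.viper
def rgb565_to_grb(src, dst, lut, n: int):
    s = ptr8(src)  # little-endian RGB565 framebuffer bytes
    d = ptr8(dst)  # output, 3 bytes per pixel in G, R, B order (assumed)
    t = ptr8(lut)  # entries 0..31: 5-bit channels, 32..95: 6-bit green
    for i in range(n):
        px = s[2 * i] | (s[2 * i + 1] << 8)
        o = 3 * i
        d[o] = t[32 + ((px >> 5) & 0x3F)]
        d[o + 1] = t[(px >> 11) & 0x1F]
        d[o + 2] = t[px & 0x1F]


# Placeholder table: linear scaling only, no gamma or brightness.
lut = bytes([v * 255 // 31 for v in range(32)] +
            [v * 255 // 63 for v in range(64)])

src = bytearray([0x00, 0xF8, 0xE0, 0x07, 0x1F, 0x00])  # red, green, blue
dst = bytearray(9)
rgb565_to_grb(src, dst, lut, 3)
print(list(dst))
```

The two channel tables are packed into one `lut` argument to keep the argument count small, since Viper has historically limited how many arguments a function may take.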

Lastly, I don't know if anyone has benchmarked MP riscv vs ARM. The latter has many years of MP development behind it, so it is possible that there is a difference in performance.

misaalanshori Apr 21, 2025
Author

Hmm okay, the ESP32-S2 does have less memory. I am trying to make sense of the results from mem_free and mem_alloc. I tried printing them after drawing an image and got 34160 from mem_alloc and 2028688 from mem_free. How can I tell if I've started to use the PSRAM? Also the total doesn't make sense to me? If we add up the PSRAM and the internal RAM then I think we should have more than 2062848 bytes?

I have considered using Viper, but I don't think I fully understand it yet so I haven't figured out a good approach to this that will actually speed it up. One approach I tried actually slowed things down because of having to use the data type viper wants and also making another function call.

Also, the ESP32-C3 is RISC-V and the ESP32-S2 is Xtensa (not ARM).

misaalanshori Apr 21, 2025
Author

Oh, also my estimate for the total memory used by all the arrays and stuff is around 52KB, though I calculated this using sys.getsizeof in full Python. Does MicroPython have the same sizes?
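
For a rough cross-check of that estimate, here is a minimal sketch that counts only the raw payload bytes of the two pixel buffers. It assumes both are bytearrays (2 bytes per pixel for the RGB565 framebuffer, 3 bytes per LED for the NeoPixel buffer), which the post does not state. Per-object overhead differs between CPython and MicroPython, so `sys.getsizeof` figures from a PC should not be expected to match what the board allocates.

```python
import sys

NUM_LEDS = 240  # from the post

framebuffer = bytearray(NUM_LEDS * 2)   # RGB565: 2 bytes per pixel (assumed)
pixel_buffer = bytearray(NUM_LEDS * 3)  # 3 bytes per LED (assumed)

# Raw payload only, independent of the interpreter's object overhead.
payload = len(framebuffer) + len(pixel_buffer)
print("payload bytes:", payload)

# getsizeof is not available on every MicroPython build, so guard the call.
if hasattr(sys, "getsizeof"):
    print("getsizeof:", sys.getsizeof(framebuffer) + sys.getsizeof(pixel_buffer))
```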

misaalanshori Apr 21, 2025
Author

Okay, I got the exact same code running on an ESP32-S3 this time, which has the same Tensilica Xtensa 32-bit LX7 core as the ESP32-S2 (but two of them on the S3; I'm not sure if/how MicroPython utilizes the second core) and also a lot more RAM.

This time on the ESP32-S3 (no PSRAM) running at 160MHz I got 60.7ms per frame, which is still slower than the 47.5ms on the ESP32-C3 running also at 160MHz. But setting the ESP32-S3 frequency to 240MHz does speed it up to 36.2ms. (I ensured the code is 1:1 exactly the same as the one running on the ESP32-S2)

So I think in conclusion, at the same clock speed, MicroPython is a bit slower on Xtensa than on RISC-V (well, for this one task at least). But also that accessing PSRAM is super slow apparently... I guess that was kinda expected, I didn't really consider that...
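
To put numbers on that comparison, the short calculation below uses only the frame times reported above; the variable names are just for illustration.

```python
# Frame times reported in this thread, in milliseconds.
c3_160 = 47.5  # ESP32-C3 at 160 MHz
s3_160 = 60.7  # ESP32-S3 at 160 MHz
s3_240 = 36.2  # ESP32-S3 at 240 MHz

same_clock_ratio = s3_160 / c3_160  # S3 frame time relative to C3, same clock
s3_speedup = s3_160 / s3_240        # gain from 160 MHz to 240 MHz on the S3
clock_ratio = 240 / 160

print(round(same_clock_ratio, 2))  # 1.28
print(round(s3_speedup, 2))        # 1.68
print(clock_ratio)                 # 1.5
```

So at 160MHz the S3 frame takes about 1.28x as long as on the C3, and going to 240MHz on the S3 gives about a 1.68x speedup, which is more than the 1.5x clock ratio alone. The thread does not explain that extra gain.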

Answer selected by misaalanshori

misaalanshori May 2, 2025
Author

> You might also consider using Viper. This would involve some rewriting, but Viper is amazingly fast - in some situations nearly as fast as Assembler.

Thanks for pushing me to figure out how to use Viper. It is, in fact, amazingly fast.
On my ESP32-S3 (No PSRAM), my original show() method took 37.8ms, while my Viper show() takes 8.4ms. Converting the regular code to Viper code took quite a bit of trial and error, though 😅. But that was definitely worth it.

Though I think an RGB888 mode would speed this up more and allow the full use of all the colors.
I was really curious why there's no RGB888 mode and unfortunately, it looks like discussions about adding more modes like RGB888 fizzled out in 2022 due to code size concerns?

peterhinch · 2025-04-23T12:54:33Z

peterhinch
Apr 23, 2025
Collaborator

> Also the total doesn't make sense to me? if we add up the PSRAM and the internal RAM...

The discrepancy is probably due to the needs of FreeRTOS.

1 reply

misaalanshori Apr 23, 2025
Author

Hmm how much does FreeRTOS need? Assuming total memory is 2MB+320KB, if the total reported by the gc module is 2062848 bytes then we're missing like 257KB
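
The arithmetic behind that figure can be written out as below. The 257KB gap comes from treating 2MB and 320KB as decimal units. If both sizes are binary (2 MiB and 320 KiB, which is an assumption here), the gap is larger, about 353.5 KiB.

```python
mem_alloc = 34160   # reported by gc.mem_alloc()
mem_free = 2028688  # reported by gc.mem_free()
heap_total = mem_alloc + mem_free

# Nominal memory, read first as decimal units, then as binary units.
nominal_decimal = 2000000 + 320000
nominal_binary = 2 * 1024 * 1024 + 320 * 1024

gap_decimal = nominal_decimal - heap_total
gap_binary = nominal_binary - heap_total

print(heap_total)         # 2062848
print(gap_decimal)        # 257152, i.e. about 257 KB
print(gap_binary / 1024)  # 353.5 (KiB)
```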
