Codon slower than PyPy, can't find out why #553

Open
Tenchi2xh opened this issue Apr 26, 2024 · 2 comments
Tenchi2xh commented Apr 26, 2024

Hi, love the project!

I recently started implementing a ray tracer as an exercise for trying out Codon. After a while, I was curious whether the code would also run on vanilla Python and PyPy, and found that my renders are about twice as fast with PyPy as with Codon.

[image]

After trying a few optimizations to no avail, I decided to profile the execution of the Codon-built binary:

[image: flame graph of the Codon-built binary]

It appears that more than half the time is spent in an internal gc.alloc_atomic call, and some of it in starting threads, even though I have zero @par in the whole codebase.

I noticed that when running the binary under time, the user time is often about twice the real time, which suggests that a second thread is doing work in the background. The real time, in turn, is still about twice PyPy's.
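For anyone who wants to reproduce the user-versus-real comparison outside the shell, here is a minimal sketch in Python. The helper name `time_command` is made up for this example, and the per-child CPU figures from `os.times()` are only populated on Unix-like systems.

```python
import os
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd and return (real, user, system) seconds, similar to the shell's `time`."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    subprocess.run(cmd, check=True)  # waits for the child, so its CPU time is accounted
    cpu_end = os.times()
    real = time.perf_counter() - wall_start
    user = cpu_end.children_user - cpu_start.children_user
    system = cpu_end.children_system - cpu_start.children_system
    return real, user, system


if __name__ == "__main__":
    # Stand-in workload; replace the command with the path to the compiled binary.
    real, user, system = time_command([sys.executable, "-c", "sum(range(10**6))"])
    print(f"real={real:.3f}s user={user:.3f}s sys={system:.3f}s")
```

A user time noticeably larger than the real time means that, on average, more than one core was busy, which is what one would expect if a helper thread (for example one owned by the garbage collector) runs alongside the main thread.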

My suspicion is that constantly creating large numbers of Vec3 instances is somehow bogging down the GC. Maybe I have a basic misconception about how to use Codon?
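To make the suspicion concrete, here is a rough sketch of how quickly short-lived vector objects pile up. The `Vec3` below is a simplified stand-in, not the class from the linked repository, and the `created` counter exists only for this illustration.

```python
class Vec3:
    """Simplified stand-in for a ray tracer vector class (not the repo's code)."""

    created = 0  # counts every instance constructed, for illustration only

    def __init__(self, x: float, y: float, z: float):
        self.x = x
        self.y = y
        self.z = z
        Vec3.created += 1

    def __add__(self, other: "Vec3") -> "Vec3":
        return Vec3(self.x + other.x, self.y + other.y, self.z + other.z)

    def __mul__(self, k: float) -> "Vec3":
        return Vec3(self.x * k, self.y * k, self.z * k)


def point_at(origin: Vec3, direction: Vec3, t: float) -> Vec3:
    # One temporary Vec3 for the product, one more for the sum.
    return origin + direction * t


def count_allocations(n_rays: int) -> int:
    """Return how many Vec3 objects are created while evaluating n_rays points."""
    origin = Vec3(0.0, 0.0, 0.0)
    direction = Vec3(0.0, 0.0, -1.0)
    before = Vec3.created
    for i in range(n_rays):
        point_at(origin, direction, float(i))
    return Vec3.created - before
```

Every arithmetic operator returns a fresh object, so even a single `origin + direction * t` costs two allocations, and a full render repeats that per sample, per bounce, per pixel. Codon's documentation describes `@tuple` classes as immutable value types; whether switching Vec3 to one would remove the allocator pressure seen in this profile is an untested assumption.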

Here is an interactive version of the flame graph (unzip and open the SVG in a browser). The code is available here: https://github.com/Tenchi2xh/RTOW-Codon (check out commit 379d5d0; the master branch now has other types of optimizations). The main entry point is rtow/__main__.py, but it's easier to run it from the run.sh script, because a preprocessor has to remove Python-specific code first. To make it run faster, reduce samples_per_pixel and max_depth on lines 52-53 (it runs even slower under the profiler).

(Sorry to link to a whole repo. It's not a big codebase, but it's big enough to make it hard to produce a minimal reproducible example for a GitHub issue.)

I am using the latest dev build of Codon, downloaded from a CI build.

Tenchi2xh (Author) commented

Update: after implementing an algorithm that reduces the number of lookups, the scales tipped the other way:

  • Without the algorithm, Codon is 2.3x slower than PyPy
  • With it, PyPy is 8.6x slower than Codon

The optimization alone makes Codon about 35x faster: the same render goes from 11 minutes to a mere 19 seconds.
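As a quick sanity check on these figures, the arithmetic can be written out as below. The PyPy render times are not reported in the thread; they are derived here from the quoted ratios, under the assumption that all four numbers refer to the same render.

```python
# Reported Codon render times (from the comment above).
codon_before_s = 11 * 60  # 11 minutes, without the optimization
codon_after_s = 19        # with the optimization

# Reported ratios between the two runtimes.
codon_vs_pypy_before = 2.3  # Codon was 2.3x slower than PyPy
pypy_vs_codon_after = 8.6   # PyPy is 8.6x slower than Codon

codon_speedup = codon_before_s / codon_after_s

# Derived, not measured: implied PyPy render times.
pypy_before_s = codon_before_s / codon_vs_pypy_before
pypy_after_s = codon_after_s * pypy_vs_codon_after
pypy_speedup = pypy_before_s / pypy_after_s

print(f"Codon speedup: {codon_speedup:.1f}x")
print(f"Implied PyPy times: {pypy_before_s:.0f}s -> {pypy_after_s:.0f}s")
print(f"Implied PyPy speedup: {pypy_speedup:.2f}x")
```

The Codon ratio comes out just under 35x, matching the quoted figure. Under the stated assumption, the same algorithmic change would have helped PyPy by less than 2x, so it benefits Codon far more than PyPy.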

The flame graph looks the same with or without the optimization, so maybe something else is at play (or perhaps using dtrace interferes with Codon?).

inumanag added the bug label (Something isn't working) on Sep 23, 2024
inumanag (Contributor) commented

Hi @Tenchi2xh

This looks like a GC issue. Are you using multi-threading by any chance? @arshajii has some patches for such use cases.

If this program uses lots of RAM, try setting the GC_INITIAL_HEAP_SIZE environment variable to something large; that can help a lot.
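One way to try this suggestion from a script is sketched below. It assumes the variable is read from the environment when the process starts and that a plain byte count is accepted; the helper names and the `./rtow` binary path are hypothetical, and the 4 GiB value is arbitrary.

```python
import os
import subprocess


def env_with_initial_heap(heap_bytes, base_env=None):
    """Return a copy of the environment with GC_INITIAL_HEAP_SIZE set (in bytes)."""
    env = dict(os.environ if base_env is None else base_env)
    env["GC_INITIAL_HEAP_SIZE"] = str(heap_bytes)
    return env


def run_with_initial_heap(binary_path, heap_bytes):
    """Launch the compiled binary with a larger initial GC heap."""
    return subprocess.run(
        [binary_path], env=env_with_initial_heap(heap_bytes), check=True
    )


# Example call (hypothetical binary name, arbitrary 4 GiB heap):
# run_with_initial_heap("./rtow", 4 * 1024**3)
```

The variable has to be in place before the program starts, so setting it from inside the already-running program would likely be too late. Timing a few runs with different heap sizes should show whether it moves the needle for this workload.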
