This is an implementation of the basic ray tracer described in Peter Shirley's books Ray Tracing In One Weekend and Ray Tracing: The Next Week. While those books describe an implementation in C++, I don't believe in producing new C++ code, so this version is in Rust.
More information after the two pretty pictures:
Demo scene from the first book rendered at 1200x800 at 50x oversampling, in 52s.
Demo scene from the second book, showing sub-surface scattering, volumetric fog, motion blur, etc. 5000x oversampled, 100 minutes.
Install Rust, clone this repo, and type:
cargo run --release > out.ppm
Now view `out.ppm` with the image viewer of your choice.
If you make changes, run `cargo bench` before and after to check for performance regressions.
This section has two goals:
- To help you read the original C++ codebase and the Rust codebase for comparison purposes.
- To make my larger point about why I do not produce new C/C++ code.
The overall structure of the code is vaguely similar to the C++, but there are some differences, and those differences are growing with time. This list is not exhaustive.
- The algorithm implementations use idiomatic Rust. For example, "out-parameters" have been eliminated, functions that may or may not return a result now use `Option`, and so on.
- Images are rendered into memory before being printed to `stdout`. This makes concurrency (below) easier.
- The ray propagation routine is now iterative, not recursive, which lets me play with higher bounce limits without blowing the stack. (It also improves code generation.)
- Essentially all of the math code, including bounding box intersection tests, is phrased so that it gets auto-vectorized into Intel AVX instructions. (No actual vector intrinsics are used, so the code can be compiled for older CPUs or ARM.)
- The ray tracer will distribute rendering over available CPU cores. Because we're in Rust, this took about one line of code and is statically free of data races.
- `Material` is an enum, not a class hierarchy. C++ doesn't have Rust-style enums, but they're a useful way of modeling a closed set of options, and matching on an enum is significantly cheaper than dynamic dispatch.
- `Object` (`hitable` in the original) uses Rust's `trait` concept to do dynamic dispatch where required, but static dispatch where possible. In particular, transformation nodes like `Translate` and `FlipNormals` integrate with the object they're transforming, which not only eliminates an indirection and heap allocation, but allows the compiler to optimize combinations like `Translate+Sphere` together.
- The C++ codebase contains a lot of anti-pattern pointer usage. That's all gone. In particular, the data structures in this implementation can be safely deallocated; this was not true in the original.
- Random number generator state is explicitly passed around, so that the entire system can be made deterministic for benchmarking.
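To make three of the points above concrete (`Option` instead of out-parameters, an enum for materials, and an iterative bounce loop), here is a minimal sketch. It is not the code from this repository: every name in it is hypothetical, colors are reduced to a single grayscale value, and the scene intersection is replaced by a pre-computed list of surfaces so the loop structure stays visible.

```rust
/// Hypothetical closed set of materials (grayscale for brevity).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Material {
    Lambertian { albedo: f32 },
    Metal { albedo: f32 },
    DiffuseLight { emit: f32 },
}

impl Material {
    /// `Some(attenuation)` if the ray bounces on, `None` if the path ends here.
    /// The match is exhaustive, so adding a variant forces this to be updated.
    fn scatter(self) -> Option<f32> {
        match self {
            Material::Lambertian { albedo } | Material::Metal { albedo } => Some(albedo),
            Material::DiffuseLight { .. } => None,
        }
    }

    /// Light emitted by the surface itself.
    fn emitted(self) -> f32 {
        match self {
            Material::DiffuseLight { emit } => emit,
            Material::Lambertian { .. } | Material::Metal { .. } => 0.0,
        }
    }
}

fn dot(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Ray/sphere test that returns the hit distance as an `Option`, rather than
/// a `bool` plus an out-parameter. Only the nearer root is considered, so a
/// ray starting inside the sphere reports no hit (a simplification).
fn hit_sphere(origin: [f32; 3], dir: [f32; 3], center: [f32; 3], radius: f32) -> Option<f32> {
    let oc = [origin[0] - center[0], origin[1] - center[1], origin[2] - center[2]];
    let a = dot(dir, dir);
    let half_b = dot(oc, dir);
    let c = dot(oc, oc) - radius * radius;
    let discriminant = half_b * half_b - a * c;
    if discriminant < 0.0 {
        return None;
    }
    let t = (-half_b - discriminant.sqrt()) / a;
    if t > 0.0 { Some(t) } else { None }
}

/// Iterative light accumulation. `path` stands in for the surfaces a ray
/// meets in order; a real tracer would intersect the scene on each pass.
fn radiance(path: &[Material], max_bounces: usize) -> f32 {
    let mut throughput = 1.0; // product of attenuations so far
    let mut total = 0.0;
    for material in path.iter().take(max_bounces) {
        total += throughput * material.emitted();
        match material.scatter() {
            Some(attenuation) => throughput *= attenuation,
            None => break, // path absorbed, e.g. by a light source
        }
    }
    total
}
```

Because the loop carries only `throughput` and `total`, raising `max_bounces` costs no additional stack space, which is the property the recursive formulation lacks.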
To compare the performance of the Rust codebase to Shirley's C++ codebase (on GitHub), I've used the following settings:
- Scenes: Cornell box with rectangular prisms, and final scene from book 2, with the texture-mapped Earth sphere removed (because I couldn't be bothered to implement it in Rust yet).
- C++ and Rust using the same scene data structures (bounding volume hierarchies for certain dense areas, simple vectors everywhere else).
- Computer: Skylake Thinkpad (Intel i7-8550U, 4 cores / 8 threads).
- `rustc` 1.33.0.
- GCC 8.2.1.
- Rust built with `cargo build --release`, and restricted to a single thread by exporting `RAYON_NUM_THREADS=1` at runtime.
- C++ code built with `g++ -O3 -march=native main.cc -o main`. (Adding `-ffast-math` and/or `-fomit-frame-pointer` doesn't change things significantly.)
Note that the C++ code is the best possible case for GCC's optimizer (effectively a single source file with all definitions inlined), while the Rust code is split across many files, a library target, a binary target, and uses upstream libraries. To level the playing field, I switched on LTO in `Cargo.toml`.
Here are the results at the time of this writing (scenes rendered at 300x300 pixels, 100 samples per pixel):
Scene | C++ | Rust 1CPU | Ratio to C++ | Rust 4CPU | Ratio to C++ |
---|---|---|---|---|---|
Cornell box | 14.25s | 10.91s | 0.7656 | 2.94s | 0.2063 |
RT:TNW final | 32.48s | 18.07s | 0.5563 | 5.01s | 0.1542 |
Which is to say, the Rust code is substantially faster than the original:
- Limited to one CPU, the Rust implementation takes between 23% and 44% less time than the C++ implementation, depending on the scene.
- When not limited, it takes roughly 79% to 85% less time. (I think counting this is fair, because parallelizing Rust code is so easy compared to C++.)
This is despite the Rust code technically doing more work: all array/vector accesses are bounds-checked, certain corners of floating-point math are checked more rigorously than in C, every potentially null pointer is checked before use, and all memory operations are both memory-safe and thread-safe. Remember this next time a C programmer insists that they need to do unsafe tricks "for performance."
(Interestingly, the Rust programs also use about half the RAM.)
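The multi-core numbers above rest on two design choices described earlier: the image is rendered into memory, and random number generator state is passed explicitly. The sketch below illustrates how those fit together. It is an illustration under assumptions, not this repository's code: the project uses Rayon (hence `RAYON_NUM_THREADS`), whereas this version swaps in the standard library's `std::thread::scope` (Rust 1.63 or later) so that it has no dependencies. The `Rng`, `shade`, `render`, and `to_ppm` names, the xorshift generator, and the gradient "scene" are all hypothetical.

```rust
use std::thread;

/// Tiny xorshift32 generator, a stand-in for whatever RNG a real tracer uses.
struct Rng(u32);

impl Rng {
    fn new(seed: u32) -> Self {
        Rng(seed.max(1)) // xorshift state must be nonzero
    }

    /// Uniform value in [0, 1), built from the top 24 bits of the state.
    fn next_f32(&mut self) -> f32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        (x >> 8) as f32 / (1u32 << 24) as f32
    }
}

/// Hypothetical per-pixel work: average `samples` jittered samples of a gradient.
fn shade(x: usize, y: usize, width: usize, height: usize, samples: u32, rng: &mut Rng) -> u8 {
    let mut sum = 0.0;
    for _ in 0..samples {
        let u = (x as f32 + rng.next_f32()) / width as f32;
        let v = (y as f32 + rng.next_f32()) / height as f32;
        sum += 0.5 * (u + v);
    }
    (255.0 * sum / samples as f32) as u8
}

/// Render into an in-memory buffer, giving each worker a disjoint band of rows.
/// Each row seeds its own RNG from its index, so the output does not depend
/// on how many threads are used.
fn render(width: usize, height: usize, samples: u32, threads: usize) -> Vec<u8> {
    if width == 0 || height == 0 {
        return Vec::new();
    }
    let threads = threads.max(1);
    let rows_per_chunk = (height + threads - 1) / threads;
    let mut image = vec![0u8; width * height];
    thread::scope(|s| {
        // `chunks_mut` hands out non-overlapping `&mut` slices, so the borrow
        // checker can verify that no two threads write to the same pixel.
        for (chunk_idx, chunk) in image.chunks_mut(rows_per_chunk * width).enumerate() {
            s.spawn(move || {
                for (i, row) in chunk.chunks_mut(width).enumerate() {
                    let y = chunk_idx * rows_per_chunk + i;
                    let mut rng = Rng::new(y as u32 + 1);
                    for (x, px) in row.iter_mut().enumerate() {
                        *px = shade(x, y, width, height, samples, &mut rng);
                    }
                }
            });
        }
    });
    image
}

/// Format the finished buffer as a plain-text (P3) grayscale-as-RGB PPM.
fn to_ppm(image: &[u8], width: usize, height: usize) -> String {
    let mut out = format!("P3\n{} {}\n255\n", width, height);
    for &g in image {
        out.push_str(&format!("{0} {0} {0}\n", g));
    }
    out
}
```

Calling `render(300, 300, 100, 1)` and `render(300, 300, 100, 4)` should produce identical buffers, which is what makes single-thread and multi-thread timings comparable. With Rayon, the explicit chunking and spawning would typically collapse into a parallel iterator over rows.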
As measured by cloc, the Rust implementation is somewhat longer than the C++. To do a fair comparison, I excised the C++ code responsible for image format decoding, which I didn't implement. The results:
- C++: 1,219 LOC.
- Rust: 1,647 LOC.
While Rust is generally less boilerplatey than C++, the fact that it contained more lines of code here didn't shock me, for three reasons:
- I was deliberately more verbose in how objects are declared, using Rust's struct literal syntax with named fields. (I would have done the same in C++ -- names are nice -- but Shirley used constructor functions.)
- `cloc` is sensitive to formatting, and I've used `rustfmt` to enforce a somewhat sparse style. If I run, for example, `clang-format` on the C++ codebase, it grows to 1,330 lines.
- The C++ code's organization into a single `.cc` file with all definitions inlined made it less boilerplatey than a "real" C++ codebase using separate header and implementation files. (On the other hand, this means it actually takes longer to compile than the Rust code.)
As noted above, the C++ codebase is organized into a single source file with includes. As a result, while GCC is typically faster than `rustc`, the C++ ray tracer takes longer to compile than the Rust code (3 seconds vs 2).
Consider the amount of work required to review each codebase for possible memory-related errors, such as buffer overruns, dangling pointers or use-after-free, reads of uninitialized memory, null pointer dereference, and the like. The codebases are roughly the same size (see above).
In C++, every non-blank line of code could potentially contain such errors. (In this codebase in particular, there are a bunch of potential use-after-frees waiting to happen, and basically every type has a default constructor that leaves its member variables entirely uninitialized, virtually guaranteeing reads of uninitialized memory.)
In Rust, such errors can only occur in `unsafe` blocks, and an attribute (pragma) at the top of the ray tracer codebase bans them. You don't even have to read the code to know there is no `unsafe` in it; your review is complete.
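For readers who haven't seen such an attribute, here is a small sketch of how the ban works. I'm assuming the `forbid(unsafe_code)` lint level here (`deny` also exists, but can be overridden further down in the code), and the `framebuffer` module and `pixel` function are hypothetical. At the crate root the attribute is written in its inner form, as shown in the comment; the sketch applies it to a single module only so that it stays self-contained.

```rust
// At the top of a crate root (lib.rs or main.rs), the ban is spelled as an
// inner attribute covering the whole crate:
//
//     #![forbid(unsafe_code)]
//
// The same lint can be attached to one module, as done here.
#[forbid(unsafe_code)]
mod framebuffer {
    /// Bounds-checked pixel lookup: out-of-range coordinates yield `None`
    /// instead of reading past the end of the buffer.
    pub fn pixel(buf: &[u8], width: usize, x: usize, y: usize) -> Option<u8> {
        if x >= width {
            return None;
        }
        buf.get(y * width + x).copied()
    }

    // Uncommenting this function turns the build into a compile error,
    // because `unsafe` is forbidden inside this module:
    //
    // pub fn pixel_unchecked(buf: &[u8], i: usize) -> u8 {
    //     unsafe { *buf.get_unchecked(i) }
    // }
}
```

The point for a reviewer is that the check is done by the compiler on every build, so a single line near the top of the crate answers the question for the whole crate.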
As if to make my point, when I first checked out and built the C++ code, it segfaulted immediately. (The required `earthmap.jpg` file is not distributed with the code, and the program handles this error by dereferencing a null pointer.)