Black squares in output. #78
Comments
Hypothesis: It looks to me as if the core group runs into a division by zero because the normal is perpendicular to the ray. (Say, "what is the tan of the angle at which the ray hits the pixel?" is answered by "infinity" or a division by zero.) In the example above, you can see that although mathematically a precisely perpendicular ray would exist on each edge pixel, only a few get the division by zero. This is because the actual ray is likely to miss the precise perpendicular point. |
The math is done in floating point. Division by zero on a GPU doesn't cause an exception, it produces either Infinity or NaN values, and the computation keeps running. There's nothing that the Nouveau driver can do to change this behaviour. The Nvidia proprietary driver doesn't produce these glitches, and nobody has reported this problem before. So we are looking for a mechanism that is specific to Nouveau. Those glitches appear in places where the sphere-tracing algorithm that I use for ray-marching will normally use an increased number of iterations. We can test my hypothesis once I make those constants configurable. |
I just added two new command line options:
You can try reducing ray_max_iter. ray_max_depth indirectly limits the number of iterations: if the ray has travelled more than 400 units (or whatever value you supply), then the loop terminates early. I don't think this one will fix the glitches in |
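For readers unfamiliar with sphere tracing, here is a minimal sketch of how two limits like ray_max_iter and ray_max_depth can bound the ray-marching loop. It is not Curv's actual code (Curv renders on the GPU). The function names, the `eps` tolerance, the unit-sphere test shape, and the default values are assumptions chosen for illustration. The second call shows a ray that grazes the edge of the sphere needing many more iterations than a head-on ray, and the third shows a low iteration cap making that edge go missing.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centred at the origin (illustrative shape)."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius

def sphere_trace(origin, direction, sdf, ray_max_iter=100, ray_max_depth=400.0, eps=1e-4):
    """March a ray; return (hit, depth, iterations). Defaults are illustrative assumptions."""
    depth = 0.0
    for i in range(1, ray_max_iter + 1):
        p = tuple(o + depth * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return True, depth, i       # close enough to the surface: hit
        depth += dist                   # safe step: nothing is nearer than dist
        if depth > ray_max_depth:
            return False, depth, i      # travelled too far: give up early
    return False, depth, ray_max_iter   # iteration budget exhausted

# Head-on ray: reaches the surface almost immediately.
print(sphere_trace((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), sphere_sdf))   # (True, 4.0, 2)

# Grazing ray near the silhouette: same shape, many more iterations.
print(sphere_trace((0.99, 0.0, -5.0), (0.0, 0.0, 1.0), sphere_sdf))

# Same grazing ray with a low cap: the loop stops before reaching the surface.
print(sphere_trace((0.99, 0.0, -5.0), (0.0, 0.0, 1.0), sphere_sdf, ray_max_iter=20))
```

In this sketch, lowering the iteration cap trades correctness at grazing edges for a bounded worst-case cost per pixel, while the depth limit only cuts off rays that never come near a surface.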
Thanks! Pulled, compiled and tested. Starting from the default you show above, neither of the two parameters seems to do anything. I tested 2x more and 2x less. Edit: Ah! The "change by a factor of two" was not aggressive enough. 80: no change. 40: only two black blobs. 20: no blobs. (5 is the "normal" number of blobs for the rainbow cylinder that I use to test.) |
You tried -Oray_max_iter=100.
Now try -Oray_max_iter=50, -Oray_max_iter=25, and smaller...
|
Yeah. With 40, "some" are fixed, but at 20 they are all fixed and render correctly. So 40 is "on the edge" and 20 is enough. |
Testing with: rainbow.curv, counting only the black blocks on one side of the cylinder: 0-19 does not render correctly (part of the cylinder is missing). |
All GPU drivers render by partitioning the viewport into tiles, and rendering the pixels within each tile in parallel. Multiple tiles are also rendered in parallel, depending on how many cores you have. In Curv, the time required to compute a tile can vary greatly. Background tiles are usually very fast. Certain tiles, like the rounded edges of the rainbow cylinder, can be slow. There could easily be a 50 to 1 or 100 to 1 difference in rendering times between tiles, depending on the shape, but if the slow tiles are rare, then you still get fast average tile rendering times, and the user can't tell the difference. The Nouveau driver appears to impose a hard limit on the rendering time of each tile. It is the slow tiles that are turning black, and we can eliminate the problem by speeding up the ray marcher. I would guess that Nouveau attempts to guarantee 30 frames per second, based on the assumption that all tiles take the same time to render, and imposes a hard time limit based on that assumption. If the slowest tile in a Curv program is required to meet this deadline, then the net effect is as if the GPU is 10 or 50 times slower than it actually is. There is at least one more simple trick for getting a bit more performance out of the ray marcher, but no easy way to get 10x or 50x more performance. I think that Nouveau is not suitable for use with Curv, and I recommend installing the Nvidia proprietary GPU driver. |
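The arithmetic behind that argument can be sketched with a toy model. This only illustrates the theory stated above and is not a description of what Nouveau actually does. The tile costs, the deadline value, and the function names are all made up for the example.

```python
def frame_stats(tile_costs):
    """Return (mean cost, worst cost, worst/mean ratio) for one frame's tiles."""
    mean = sum(tile_costs) / len(tile_costs)
    worst = max(tile_costs)
    return mean, worst, worst / mean

def tiles_over_deadline(tile_costs, deadline):
    """Indices of tiles that a hard per-tile time limit would cut off (hypothetical model)."""
    return [i for i, cost in enumerate(tile_costs) if cost > deadline]

# 99 cheap background tiles and one expensive edge tile (arbitrary time units).
costs = [1.0] * 99 + [50.0]

mean, worst, ratio = frame_stats(costs)
print(mean, worst)                       # 1.49 50.0
print(tiles_over_deadline(costs, 10.0))  # [99]
```

With these made-up numbers the average tile is cheap, so the frame as a whole is fast, yet a per-tile deadline sized for the average would still cut off the one slow tile. To stay under such a deadline, the slowest tile would have to be about as cheap as the average one, which is the sense in which the GPU behaves as if it were many times slower.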
I couldn't find a clear explanation of why Nouveau works this way. No other Mesa-based GPU driver has this "black rectangle" bug. But we do know that Nouveau suffers performance problems because Nvidia is blocking the Nouveau project from doing thermal management. (Those APIs are blocked, due to a requirement for digitally signed firmware on some hardware models, and due to implied legal threats if they reverse engineer the proprietary driver.) This means that Nouveau must be careful to avoid doing anything that would cause your GPU to overheat and become damaged. This is consistent with my theory that Nouveau aborts a SIMD group if it runs too long. I looked to see if there is a way of disabling the "black rectangle" behaviour, but I couldn't find anything. |
So, now we have a "workaround in curv" and possibly a demonstration case, I think it is time to report this as a bug in Nouveau. For my understanding: you have a "ray_max_depth" that says how far from the viewpoint the rays can be broken off. This explains why some things that look infinite seem to have an end, but when you move the viewpoint the actual end stays just as far as the ray depth is measured from the camera position. Right? |
A suggestion from the Nouveau bug tracker is to use this environment variable:
This will disable the Nouveau GPU driver and use software rendering of OpenGL calls instead (meaning the work is done on the CPU). The results may be unacceptably slow, but there should be no rendering artifacts. This is not a serious or practical suggestion, due to the loss of rendering performance, but I'm including it for completeness of the historical record. |
The Nouveau driver is not supported until this issue is resolved upstream. I don't think this is a simple bug fix; instead, Nvidia will need to change their corporate policy and support the Nouveau project before the issue can be resolved. |
Might I make a suggestion? I think the difference is important: I almost gave up on "giving curv a test-run" because of your "not supported" status, while in fact it is quite usable if you know that the black rectangles are a rendering artifact. Getting people to test-drive curv and subsequently get interested in curv works both ways: with a bit of luck someone might fix the nouveau bugs that cause this issue, or maybe someone fixes it by modifying curv in such a way that the nouveau issues no longer occur. |
Here's sort of good news, a way to work around the Nouveau driver bug. But in the end, it's still easier and safer to just install the Nvidia proprietary driver. More information about the Nouveau bug:
And here is the official Nouveau web site. It looks like the "black squares" performance problem can be mitigated by "manual reclocking", at least on the older pre-GTX-900 GPUs that support this. This is a risky procedure that involves setting
Phoronix provides more helpful instructions: https://www.phoronix.com/scan.php?page=news_item&px=linux-4.5-nouveu-pstate-howto I don't recommend following this procedure. It's far less difficult, and far less risky, to install the Nvidia proprietary driver. And you'll get better results than with the Nouveau driver + reclocking. |
I updated the GPU requirement section of the README with better wording and more information. Thanks for the suggestion. |
On the Nouveau graphics driver with a GF119 [GeForce GT 610] as the hardware, I get black boxes in the output. It happens when the normal of the surface is perpendicular to the viewing angle.
The boxes are 4x8 pixels (4 wide, 8 high).
I don't do drag-n-drop. So I can't attach a picture here. I've uploaded it. http://prive.bitwizard.nl/curv_black_boxes.png