-
-
Notifications
You must be signed in to change notification settings - Fork 347
Multipass render to texture with options - implemented #1460
Multipass render to texture with options - implemented #1460
Conversation
0f6f94c
to
4d39c10
Compare
This pull request introduces 2 alerts when merging 4d39c10 into 19ca0ed - view on LGTM.com new alerts:
Comment posted by LGTM.com |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mar753 Thank you for this proposal and for your work.
I'm afraid this PR cannot be merged as is. Let me explain why.
Render-to-texture is currently disabled in the master branch, presumably because of performance and/or incomplete implementation.
If enabled, render-to-texture will render the current scene to an Open GL texture, only stored on the GPU, while keeping track of the destination address of said texture in VRAM (ft_rtt.TexAddr). When the next scene references a texture at this address, the renderer will simply return the previously generated RTT texture.
This fixes 99% of the games using RTT and is efficient since the texture data buffer doesn't have to be transferred back to VRAM from the GPU memory. So my first complaint about this PR is that it doesn't seem to handle this very general case. I'd rather have RTT working for most games first, and only then work on the exceptions.
Now, this system works for most games, but it doesn't work with THPS and THPS2 (and may be a few others). Why? Because THPS and THPS2 use render-to-texture is an "special" way: they first render the skater's shadow into the red channel of an RGB 565 texture. Then, when using this texture in a subsequent scene, they load it as an ARGB 4444 texture, thereby using the red channel of the rendered texture as the alpha channel of the new texture (and loosing one bit of precision along the way).
There's no easy way to do this in Open GL and detecting this behaviour is quite difficult. The solution is, in this particular case, to copy back the rendered texture buffer into the VRAM. And let the normal texture cache load it when needed. There is no need to manually hack the texture data as long as it's properly copied into the dest VRAM location with the requested bitpack format. This is how the actual dreamcast hardware is working so doing it this way should take care of any "exotic" use of textures such as the THPS[2] case.
If I'm not mistaken, this is not what this PR is doing. Instead the channel switching between red and alpha is hardcoded and thus can only work for this particular game.
You might want to have a look at my branch (fh/mymaster) where proper render-to-texture is implemented including reading back the texture buffer data into VRAM when a particular option is enabled (RenderToTextureBuffer, which is enabled by default for THPS and THPS2).
Thank you for your feedback @flyinghead! Regarding my implementation, I have done some debugging before and yes, I know that byte swap from red channel to alpha without LSB (RGB565 -> 'ARGB4444') needed by THPS2 can be done automatically when we read pixels directly to vram[fb_rtt.TexAddr<<3] location without any swapping, but it must be done within the same RTT frame rendering cycle (like in your code). I have even determined that fb_rtt.TexAddr<<3 for the shadow in THPS2 equals 0x4D0180, at least on my Android device, this can be used for debugging purposes. In my implementation I have moved glReadPixels() to the next frame rendering cycle on purpose (after RTT frame rendering cycle and next, scene frame, preparation), doing like this I was able to significantly increase the performance, thus framerate (tested on Qualcomm Snapdragon 625) with fully enabled single pass RTT (first: RTT frame, next: scene frame, then: RTT frame etc.). Performance is better because during making some stuff on CPU to prepare the next frame, RTT rendering/drawing from the previous frame is executing in parallel. In contrast, calling glReadPixels() within an RTT frame rendering cycle, causes a stall on glReadPixels() until rendering/drawing is finished. Unfortunately this approach forces me to make this color channel swap manually (red->alpha), because when a next frame is being processed (scene drawing), vram[fb_rtt.TexAddr<<3] location is not valid anymore - now it is valid for a current frame thus I cannot write to it. But... I am doing:
it is exactly what will be done in
little bit higher in this function there is a code responsible for red->alpha channel swap:
What is more
and will always return here in my solution (when using RTT), because What is more I have limited glDetele* commands in But you are right in terms of hardcoding red->alpha swap/move in my code, because when a texture to which we are rendering is RGB565 I am always doing the swap and converting the texture to RGBA4444, what can cause artifacts. I have tested several games and the problem was not visible, probably because they were using some other formats for RTT rendering like RGBA5551 or RGBA4444 from the beginning. Anyway, the artifacts can be visible only when RTT is in use, thus changing 'Render to texture' select to 'Zeros' in my implementation will prevent any artifacts, because in case of RGB565, rendered texture will be transparent, so it will not be visible anyways, if I am thinking correctly. I can make this option a default one. Do you know maybe games that uses RGB565 for RTT rendering in the 'normal' way that I can test (maybe some games with mirrors)? |
Crazy Taxi uses RTT for the pause menu in game. |
This pull request introduces 3 alerts when merging 302bfcd into bbc54e4 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
Render to texture (RTT) with multipass rendering support has been implemented. OpenGL ES 2.0 fully compatible solution was implemented, when reading pixels from a framebuffer object (FBO). There is an issue on Adreno 506 with RGBA5551 format/type, to be specific I was able to observe a huge performance drop when using this image format/type, even though this GPU declares that it is supported (GL_IMPLEMENTATION_COLOR_READ_TYPE query). This format/type is supported, but probably there is some time expensive internal format conversion, which makes it unusable, at least here. We can see this issue very clearly in case of Rez game. Using framebuffer object with GL_UNSIGNED_BYTE texture, and then reading it using glReadPixels() with the same image format/type combination seems to be a workaround for this issue. It is strange that 32 bit color rendering works faster than RGBA5551 16 bit rendering, but it is true in this case. I have not altered TextureCacheData struct (gltex.cpp) and TexCache.h/cpp files to adapt them to support 32 bit colors, because I decided it will be a lot of changes just for this fix and because we do not need 'texconv' in this case, 32 bit image is directly written to the texture after read (bypassing TextureCacheData's 'Update()' method). To achieve this blur effect in Rez: blending feature is used in gldraw.cpp: I have added RTT swizzle texture support too. It is used in Virtua Tennis during logo and demo video transitions to achieve 640 pixels in width of a texture. As I saw in the source code, there is no standard code convention and a lot of the C++ code is mixed with the C code, but I have tried to keep the similar convention and limit the refactoring. Default option for 'Render to Texture' is set to 'Zeros'. 'Full' option must be selected to enable multipass RTT. Solution was tested with several games that use RTT (including: Rez, Virtua Tennis, THPS2 and Crazy Taxi) and is working fine. Testing platform:
Feel free to test. |
This pull request introduces 3 alerts when merging 2aadb3c into 6936bc2 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
I have added the stencil support as well (if "OES_packed_depth_stencil" extension is available). This fixes the dimmed screen issue in Crazy Taxi. If you would like to test this solution please fetch my branch: https://github.com/mar753/reicast-emulator/tree/multisample_rtt_implementation @flyinghead Could you take a look at this PR |
@flyinghead were all your points worked on or is there still something left from your pov? |
This pull request introduces 3 alerts when merging 10d6f25 into 3c57177 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
@flyinghead |
As I have checked FBOs are not supported in OpenGL 2.x without the "GL_EXT_framebuffer_object" extension, so this last commit is not needed because OpenGL 3.0 at least is required anyway. |
This reverts commit 86339c3.
Just to clarify: I meant desktop OpenGL 2.x (NOT OpenGL ES 2.x). |
@flyinghead are all your remarks addressed? Since the default for RTT is 0 the behaviour is the same as before I have no problem merging this one. |
Sorry I haven't had time to dedicate to reviewing this. I assume the changes answered my concerns. Interestingly you mention that fb_alpha_threshold and kval_bit could change between the moment the texture is rendered and when it is used. (I must say I overlooked this case) Have you noticed this happening in some games? Also I don't think Circle/All Ones and And Zeros options are needed. We need three options:
|
Actually I would prefer to stick with my current solution, I will explain why. |
cb3b0e7
to
88b9deb
Compare
Reicast does not handle odd resolutions, we can see that effect in case of Xiaomi POCOPHONE F1 phone where higher dimension resolution is 2159 pixels. There is a narrow line visible (Rayman 2): I have fixed that in "Handle odd screen resolution (POCOPHONE fix)" commit. I have added dynamic resolution recalculation when screen resolution changes during emulation. This fix was needed when lowering rendering resolution is active. It's ready to be reviewed |
…into mar753/render-to-texture-with-options
…kground positioning
Moreover as I have compared on Android, native reicast (this repository) with my RTT solution from this PR (even that resolution is higher) is faster than RetroArch reicast core emulation (tested with 'Rez' - violet level). @baka0815 @skmp @dmiller423 This pull request is complete and ready to be merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mar753 did a really quick scroll though the diff and added some comments.
In future PRs, would be great to
- avoid nonfunctional whitespace changes to make reviewing easier. Trying to understand what was actually changed in
gl_tex.cpp
was a bit of a challenge. - Split changes that are not directly related to the main PR purpose into separate PRs (eg odd-sized-rendering workaround)
Here's a couple more questions
- Are textures always read back to the host ram and then re-uploaded?
- How does this compare with @flyinghead RTT changes /
fh/mymaster
? There's quite some divergence frommaster
.
I think it's up to @flyinghead to have final say on this. If it doesn't upset the code too much otherwise, I'd merge it.
@skmp Find my answers below
You are right, I prefer spaces instead of tabs (which looks strange on GitHub page) in source files, but just to be consistent with the rest of files I have used tabs to indent source files.
I agree to this in future PRs, just though that changes like odd screen resolution are so small that I do not need to create separate PRs.
I will briefly write how it works.
|
@skmp |
@flyinghead Can you approve this PR? |
@baka0815 @dmiller423 @skmp |
Since @skmp is ok with this and @flyinghead approved it, I'm merging this. Thanks @mar753 for your work! |
I was a huge fan of THPS2, so many memories ;), but when I saw those artifacts on my Android phone:
I decided that it must be fixed!
Current version of reicast supports GLES 2.0. Plain glReadPixels() operation is time consuming, sometimes preventing smooth 60fps gameplay. I have even upgraded GLES to 3.0 in reicast application and tried using single PBO (on GLES 2.0 we can use eglCreateImageKHR() as well, but unfortunately GraphicBuffer is not available in NDK, without compiling with Android native code, fortunately solution is to dynamically load libui.so library and get GraphicBuffer from there, check this out: https://github.com/fuyufjh/GraphicBuffer), but performance increase was not noticeable, at least on my device with Qualcomm Snapdragon 625.
I know that 'render to texture' support was disabled because of performance issues, so I have prepared several options, what I hope will be a good solution. There is a new menu select (Experimental section) called 'Render to texture', where available options are:
In terms of performance:
when 'render to texture' is 'Disabled - skip frames' performance is exactly the same as it is now in the current version (it is using exactly the same code as currently), there is also no noticeable performance differencies in every other modes/options, except 'Full' (Drawing is disabled in every mode except 'Full'), which is sometimes much slower. Maybe, limiting the draw distance may help here.
When using 'Full', only two pass rendering is available at the moment. It will be nice to implement multipass RTT as well.
Feel free to fetch this branch and test!
Cheers,
Marcel