dmap / testmap broken under linux (Ubuntu 19.04) #436
Comments
How are you coming along on that custom test map? It doesn't need to be fancy, just something to demonstrate the issue at hand. Also not to be discouraging, but is there a reason you're not using an engine like UE4, Unity, or Godot? They would be more adept at giving you what you would need and want in an engine for a custom game. |
I have the same errors, dmap skips some random triangles, and I've encountered some portaling errors too, I've made a video that shows it |
there's a similar bugreport for dhewm3: dhewm/dhewm3#147 As a lucky guess, try removing all instances of It would probably help if you shared your testmap and not only the video ;) |
Unfortunately none of those changes to CMakeLists.txt solved the issue. what do |
Yes, Could you share your testmap? Otherwise no one can debug this.. (and I'd like to test if dhewm3 is affected) |
https://github.com/BielBdeLuna/OTE_testmaps/tree/vilcabamba |
I can't really reproduce your problems there, but I noticed
|
It seems that dmap fails at detecting the visportals: they seem not to be registered as such, so the material is invisible but movement through them seems to be impeded. edit: the visportals work; those errors seem to be caused by visportal faces that don't create a visportal.
wait, didn't I include the needed TGAs? edit: I checked, all the necessary TGAs are there. It seems that everything is failing; if you run "reloadDecls", what sort of errors do you get from the materials? Did you use the folder "vilcabamba" as a mod folder, or "OTE_testmaps/vilcabamba"? edit: I downloaded the map on another computer and with RBDoom3BFG it loads correctly: the level compiles (with those errors) and loads with all the textures working, but unfortunately still with the errors displayed in the video. |
As everything was black (in dhewm3) I tried adding all the other OTE assets (which used PNGs which I converted) but that didn't help either But in latest RBDoom3BFG git (master branch) I didn't get those glitches seen in your video. |
ah, ok, but the map wasn't meant to be used with all the assets from OTE, just as a single test map. I pushed a map without visportals; when compiled it displays the same triangle errors in the same places on my end, minus the visportal-related compile warnings. Are you using Intel? Both of my computers are Ryzen based; I wonder if there is something wrong with the AMD optimizations? |
I'm using a Ryzen 2700X (with a Geforce GTX 970), so AMD vs Intel can be ruled out. I got your map to work with dhewm3 now, by adjusting the filenames (png => tga) in the materials as well. |
ah I found the culprit for the black textures, the gray texture ( that the level used still makes a reference to a PNG texture ) I don't know why it worked in RBDoom3BFG as it doesn't use PNG for textures edit that's not true PNG IS supported. edit , in dhewm3 compiles without the errors in those newly created triangles in the brushes. I pushed the changes now it should work |
So the same map works (meaning: looks correct) if compiled with dhewm3 but not if compiled with RBDoom3BFG?
RBDoom3BFG does support loading PNGs. They probably get converted to .bimage before being actually used, but PNG is supported as far as I can tell. |
yeah, something lays rotten within the source of dmap in RBDoom3BFG unlike Dhewm3.
ah, it's true, now I see the readme... sorry :) |
oh, also OTE compiles that map fine, it's RBDoom3BFG that has some kind of problem. I'm using the Master branch (last commit), and OpenGL of RBDoom3BFG. ...compiling Vulkan to see if there is any luck, but I don't think it might make any change to the issue. |
Damn, just realized that I can reproduce the bug after all, in both RBDoom3BFG and dhewm3 - if I enable ONATIVE in CMake. So it seems like it's related to optimization after all |
certainly ONATIVE off compiles the map correctly! :) so following cmake:
the culprit is:
isn't it? |
I wonder, can we command the compiler not to optimize some part of the code? some source file in specific? |
first I'll try to see if ONATIVE ON but
to the beginning of every file in dmap and:
In their end. *** edit *** Effectively |
it's something in dmap/optimize.cpp, more specifically in idlib/math/Vector.h included there haven't narrowed it down further yet |
in order to clarify and to narrow out my test benches:
and using GCC 10.3.0 in Ubuntu 21.04 |
I'm debugging this in dhewm3. I found out the following: It's somewhere in
I moved that function into a separate source file (I called it optimize2.cpp) which looks like:
#pragma GCC push_options
#pragma GCC optimize("O0") // commenting this line out breaks the build
// moving this include above #pragma GCC optimize("O0") breaks the build
// (usually it's included implicitly by dmap.h, but including it first allows to use different optimization settings on it)
#include "idlib/math/Vector.h"
// restore normal optimization for the rest of the file => still works as long as Vector.h isn't optimized
// (while only optimizing Vector.h and not PointInTri() breaks it)
#pragma GCC pop_options
#include "tools/compilers/dmap/dmap.h"
bool PointInTri( const idVec3 &p, const mapTri_t *tri, optIsland_t *island ) {
    idVec3 d1, d2, normal;
    // the normal[2] == 0 case is not uncommon when a square is triangulated in
    // the opposite manner to the original
    d1 = tri->optVert[0]->pv - p;
    d2 = tri->optVert[1]->pv - p;
    normal = d1.Cross( d2 );
    if ( normal[2] < 0 ) {
        return false;
    }
    d1 = tri->optVert[1]->pv - p;
    d2 = tri->optVert[2]->pv - p;
    normal = d1.Cross( d2 );
    if ( normal[2] < 0 ) {
        return false;
    }
    d1 = tri->optVert[2]->pv - p;
    d2 = tri->optVert[0]->pv - p;
    normal = d1.Cross( d2 );
    if ( normal[2] < 0 ) {
        return false;
    }
    return true;
}
In the original optimize.cpp I just commented out that function and added a prototype for it, like
So it appears like something in Vector.h, quite likely idVec3's cross product (just a hunch, haven't checked yet!), gets optimized in a way that breaks the |
Ok, turns out it's indeed the cross product, can be easily verified without even a second source file:
#pragma GCC push_options
#pragma GCC optimize("O0") // commenting this out triggers the bug
// copy of idVec3::Cross() as standalone function
static idVec3 myCross( const idVec3& v1, const idVec3 &v2 ) {
    return idVec3( v1.y * v2.z - v1.z * v2.y, v1.z * v2.x - v1.x * v2.z, v1.x * v2.y - v1.y * v2.x );
}
#pragma GCC pop_options // restore default optimizations
then in
After recompiling and running dmap on the testmap, it should look fine. I guess I'll have to look at disassembly next to figure out what's really going wrong.. not that I'm any good at reading assembler
UPDATE:
#pragma GCC push_options
#pragma GCC optimize("O1") // O0 and O1 work, O2 causes the glitches
static bool norm2lt0( const idVec3& v1, const idVec3 &v2 ) __attribute__((noinline));
static bool norm2lt0( const idVec3& v1, const idVec3 &v2 ) {
    float z = v1.x * v2.y - v1.y * v2.x;
    bool ret = z < 0.0f;
    return ret;
}
#pragma GCC pop_options // restore default optimizations
// followed by PointInTri() implementation
and in PointInTri() I replaced
I also found out that with
Here's the disassembly in case anyone can read that (I haven't looked very hard at it yet, I need to look up every single instruction..):
# broken (-O2)
0000000000000810 <_ZL8norm2lt0RK6idVec3S1_.isra.3>:
float z = v1.x * v2.y - v1.y * v2.x;
810: c5 f2 59 d2 vmulss xmm2,xmm1,xmm2
bool ret = z < 0;
814: c5 f0 57 c9 vxorps xmm1,xmm1,xmm1
static bool norm2lt0( const idVec3& v1, const idVec3 &v2 ) {
818: 55 push rbp
819: 48 89 e5 mov rbp,rsp
}
81c: 5d pop rbp
float z = v1.x * v2.y - v1.y * v2.x;
81d: c4 e2 69 9b c3 vfmsub132ss xmm0,xmm2,xmm3
bool ret = z < 0;
822: c5 f8 2e c8 vucomiss xmm1,xmm0
826: 0f 97 c0 seta al
}
829: c3 ret
82a: 66 0f 1f 44 00 00 nop WORD PTR [rax+rax*1+0x0]
# working (-O1)
static bool norm2lt0( const idVec3& v1, const idVec3 &v2 ) __attribute__((noinline));
static bool norm2lt0( const idVec3& v1, const idVec3 &v2 ) {
2b0: 55 push rbp
2b1: 48 89 e5 mov rbp,rsp
float z = v1.x * v2.y - v1.y * v2.x;
2b4: c5 fa 10 07 vmovss xmm0,DWORD PTR [rdi]
2b8: c5 fa 59 46 04 vmulss xmm0,xmm0,DWORD PTR [rsi+0x4]
2bd: c5 fa 10 4f 04 vmovss xmm1,DWORD PTR [rdi+0x4]
2c2: c5 f2 59 0e vmulss xmm1,xmm1,DWORD PTR [rsi]
2c6: c5 fa 5c c1 vsubss xmm0,xmm0,xmm1
bool ret = z < 0;
2ca: c5 f0 57 c9 vxorps xmm1,xmm1,xmm1
2ce: c5 f8 2e c8 vucomiss xmm1,xmm0
2d2: 0f 97 c0 seta al
return ret;
}
2d5: 5d pop rbp
2d6: c3 ret
2d7: 66 0f 1f 84 00 00 00 nop WORD PTR [rax+rax*1+0x0]
2de: 00 00 |
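The visible difference between the two listings is that the -O2 version rounds only one product (vmulss) and feeds the other one into a fused multiply-subtract (vfmsub132ss), while the -O1 version rounds both products (two vmulss) before the vsubss. Here is a minimal sketch of that effect, using std::fma to imitate the fused instruction. The function names and the input values are illustrative assumptions, not the code or the values from this thread.

```cpp
#include <cfloat>
#include <cmath>

// Illustrative inputs (an assumption, not the values from the thread):
// v1 = (kA, kA), v2 = (kC, kC), so the exact z component kA*kC - kA*kC is 0.
const float kA = 1.0f + FLT_EPSILON;  // 1 + 2^-23
const float kC = 1.0f - FLT_EPSILON;  // 1 - 2^-23

// Both products are rounded to float before subtracting, as in the -O1
// listing (vmulss, vmulss, vsubss). volatile keeps the compiler from
// fusing a multiply into the subtraction.
float crossZSeparate(float v1x, float v1y, float v2x, float v2y) {
    volatile float p1 = v1x * v2y;
    volatile float p2 = v1y * v2x;
    return p1 - p2;
}

// Only the second product is rounded; the first one stays exact inside the
// fused operation, as in the -O2 listing (vmulss, vfmsub132ss).
float crossZFused(float v1x, float v1y, float v2x, float v2y) {
    volatile float p2 = v1y * v2x;
    return std::fma(v1x, v2y, -p2);
}
```

With these inputs the separate version returns exactly 0, while the fused version returns a tiny negative value (the rounding error of one product), which is the kind of result that makes a `normal[2] < 0` check fire for a point that lies exactly on an edge.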
Does this happen for certain input values or generally? Do you have some test vectors and expected results? FWIW: I've been playing with the function at https://godbolt.org/z/4GnzrYsfW; the code generated there is different, however, more like the working example, for all optimization levels O>=1. I've also tried different compiler versions… I also tried clang in the hope that the clang diagnostics would give some hints… |
I can confirm the I've found that page that explains very basically some methods on how to stop the optimization: https://programfan.github.io/blog/2015/04/27/prevent-gcc-optimize-away-code/ maybe we could use this method from the page in order to do the cross operations:
and then maybe:
like explained in that page? *** edit *** I see that my proposed solution resembles somewhat to the assembly operations in -O1 for every dimension of the result. only making them "volatile" |
so is the error everywhere where the cross product is used? or is it only in the dmap optimization usage of it? |
My experience says that whether optimization is on or not is likely a red herring… IMHO it is a subtle bug in the code or (unlikely) a compiler problem. So I'm skeptical about trying to trick the compiler… One idea: it could be the order of execution. With optimization the compiler is (among other things) allowed to reorder instructions. I've seen situations where this can be problematic, especially if the individual floats have significantly different exponents, so that for example a-b is basically a, if b is so small that the difference cannot be represented… (I lack the words to express myself, sorry if that confuses more than it helps) One observation: the (broken) assembly snippet from Daniel seems to have omitted the load instructions: it works solely on registers. That could mean that the function has been inlined and gcc is ignoring the attribute. As noinline is only a hint for the compiler, which it is not bound to obey, this is possible. See also https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html PS: https://godbolt.org/z/csfqW56Kh -- observe that test() is not calling norm2lt0() in the assembly code AFAICS… (On O3) |
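The "a-b is basically a" effect mentioned above can be shown in a few lines. This is only a sketch of float absorption in general; the helper name isAbsorbed and the sample values are made up for illustration and are not taken from dmap.

```cpp
// Returns true when subtracting b from a leaves a unchanged in float
// arithmetic, i.e. b is too small relative to a for the difference
// to be representable.
bool isAbsorbed(float a, float b) {
    volatile float diff = a - b;  // volatile forces the result to be rounded to float
    return diff == a;
}
```

For example, floats near 1.0e8 are spaced 8 apart, so isAbsorbed(1.0e8f, 1.0f) is true while isAbsorbed(1.0e8f, 16.0f) is false.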
I've got it solved! Optimization stays at level 3 as normal, but I added Daniel's function before PointInTri() with a slight change (and made the corresponding replacements in PointInTri()):
no need for the preprocessor directives if it's just that function *** edit *** made a pull request with the changes: #573 Maybe we should put norm2lt0 in the general vectors file? |
I added a "CrossZisNegative" function to Vector.h inside the
afterwards I could inline it after the inlined cross function:
and eventually I have reupdated the three cases of PointInTri to use the CrossZisNegative with:
also now in PointInTri it works fine. I haven't made this change in the pull request yet. |
I think that the x86_64 calling convention (on Unices) pass suitable arguments in those xmmN registers, so it should be fine to operate directly on them instead of doing loads.
That's likely - note that the -O2 version uses a multiplication ( One set of values that triggers the bug is: The cross product's I'll go on investigating later today, I just got up.. It's possible that a better solution would be to replace |
I guess what if we keep this function out of the optimization and we search a way to stop the optimization for MSVC or a workaround for MSVC, like a function specific for Windows? also:
maybe we can test those results separated and via code make z 0.0? |
It should be disabled, it's garbage. But even without -ffast-math GCC miscompiles that function. I don't think MSVC has that problem, if we used some GCC-specific pragma or function annotation as a workaround that must be guarded by |
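A compiler-specific workaround like the pragmas used earlier in the thread could be fenced off roughly as follows. This is a hedged sketch: the guard `defined(__GNUC__) && !defined(__clang__)` is an assumption (clang also defines `__GNUC__`), and the names DEMO_GCC_WORKAROUND, Vec3Demo and crossZIsNegativeDemo are hypothetical, not identifiers from the engine.

```cpp
// Assumed guard: apply the pragmas only for real GCC, not clang or MSVC.
#if defined(__GNUC__) && !defined(__clang__)
#define DEMO_GCC_WORKAROUND 1
#pragma GCC push_options
#pragma GCC optimize("O1")  // the thread reports O0 and O1 as working
#else
#define DEMO_GCC_WORKAROUND 0
#endif

// Hypothetical stand-in for idVec3, just enough for the example.
struct Vec3Demo {
    float x, y, z;
};

// True if the z component of (a x b) is negative.
bool crossZIsNegativeDemo(const Vec3Demo& a, const Vec3Demo& b) {
    float z = a.x * b.y - a.y * b.x;
    return z < 0.0f;
}

#if DEMO_GCC_WORKAROUND
#pragma GCC pop_options  // restore the normal optimization settings
#endif
```

On other compilers the pragmas simply disappear and the function is compiled with the normal settings.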
but at |
Did you read my posts? |
yes but it's not an easy issue to follow for me, that's why I'm asking. by |
I updated the post to make things a bit clearer without having to keep all the former posts (that investigated which compiler settings trigger it) in mind.
yes
No, it's enabled with I'm still not sure how to properly fix this issue. |
ah, ok znver1 are the extensions of Zen cpus (version 1, whatever that is) So if you're attribute to maybe we could use *** edit *** *** edit 2 ***
and compiled with shouldn't this be it? |
Only happened if `ONATIVE` was enabled (or some other flag was set that enables the FMA extension), the root cause was that the cross product didn't return 0 when it should, but a small value < 0. RobertBeckebans/RBDOOM-3-BFG#436 (comment) has lots of explanation. I think this is a compiler bug, this commit works around it. fixes #147
See dhewm/dhewm3@320c15f for a fix |
Only happened if `ONATIVE` was enabled (or some other flag was set that enables the FMA extension), the root cause was that the cross product didn't return 0 when it should, but a small value < 0. Caused some faces to be missing in maps compiled with dmap. RobertBeckebans/RBDOOM-3-BFG#436 (comment) has lots of explanation. I think this is a compiler bug, this commit works around it. fixes #147
GCC bugreport for this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100839 |
so it's *** edit *** how do you check the ASM instructions that the compiler generates? |
I've used Daniel's Compiler Explorer link to try something: static float crossZ_O2( const idVec3& v1, const idVec3 &v2 ) { (the Compiler Explorer version has some extra, commented-out output lines that I used to see the intermediate values; I've omitted them here) Just wanted to share this, I don't know if this is suitable. |
Nice idea (and I think I even read somewhere that this kind of optimization shouldn't be done then, at least in C?), but I'm annoyed that the GCC developers don't even seem to see any problem with this behavior. They must really hate their users.. |
thanks for seeing my damn copy+paste f*up, Daniel.
|
Another idea, as in the gcc bugreport, they've said: "C is just different here from C++ :)." … |
but if there might be other functions corrupted by contract optimization wouldn't then make sense to just use I think the idea is: I think it's more a problem of the documentation of the compiler that fails to inform of such behaviour and how to combat it (with that single command) rather than gcc devs hating the gcc users. so this also might mean that those same optimizations we're calling might break more stuff, and we might have to scan the documentation in order to find more commands to stop the code optimization troubles somewhere else. |
oh my god. nice that it works but.. I still don't get that the GCC people think that it's completely normal and fine that you can't implement a simple cross product with their "compiler" without using obscure compilerflags, pragmas or hacks. I'm thinking about ditching GCC and explicitly removing support for it from dhewm3 (with build errors or something) and tell people to use clang instead. It's time GCC loses the last relevance it still has. |
in our case we are not just asking the compiler to do a simple cross product, we are also asking it to optimize it. What do you think? Should I make a pull request with the simple changes in CMakeLists.txt? |
updated fix: dhewm/dhewm3@2521c3d |
made the following pull request #575 |
solved! :) |
When compiling a map under Linux, some brush triangles are missing, causing the map to leak. Compiling the same map under Windows works flawlessly.
I'll try to provide a Doom 3 compatible testmap showcasing the problem later, as I am using a custom game with RBDOOM-3-BFG as its engine.