-
-
Notifications
You must be signed in to change notification settings - Fork 21.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[3.x] Add option to use handles to RID #54907
Conversation
570cb8a
to
1163297
Compare
9852498
to
4544fab
Compare
PR meeting: reduz said this is good, needs review by @RandomShaper |
Since it's an optional build (which I don't think it should be), the CI/CD checks are meaningless. I tried to build this with
Building with I'll start using this build for my project. |
Ah good spot I'll fix that. I was mostly checking the tracked build and it takes a while to recompile. EDIT: Fixed, just needed some #ifdefs to check it was the tracking build for printing the leaks etc. You are correct about CI builds, but in practice it's not hugely important as it is just for special tracking builds at this stage, and once up and running it is unlikely many things would break it (except work on the RIDs themselves). We really want to keep the CI builds as streamlined as possible and limiting the permutations is already done. If you want to use the handles build in production that is probably fine if you desire (It's preferable to the tracked build as that will bloat the executable, and may be a little slower). It could be optimized a bit for release use but it's not that bad. We can definitely consider swapping the standard builds to handles in the future, but I'm not sure it's a good idea to jump the gun on this, mainly because this approach and any implementation needs thorough testing, as RIDs are central to the engine. There's also a trade off - handles are safer, but pointer RIDs may be a bit faster in some situations, as they avoid an indirection (although whether that is actually a problem in practice remains to be seen, we can do plenty of profiling / performance testing etc). And don't forget the political aspect. It's much easier to get this in as a debugging feature, then have people say "hey why aren't we using this in production", than to get it merged in the first place as production code. 😉 On the fixing the existing leaks, they are actually false flags, because they are deleted in a non-standard way. They can be fixed but better in a separate PR. We try and split PRs when they do more than one divisible thing. This makes it easier to track the changes, and e.g. revert one without the other. I actually did fix some order of destruction bugs in here (I think the owners were deleted before the RIDs) but that was more pressing. rasterizer_gles3.cpp line 505:
|
4544fab
to
1946a6d
Compare
As a compromise I've added a second commit which fixes the cleanup in GLES3 and GLES2. Rather amusingly, there was not even an attempt to cleanup RasterizerSceneGLES2 (there was no destructor!). 😀 I still have a RID leak in Environment in GLES2 but don't want to get things out of hand in this PR. And there might be more that should be ideally deleted in the GLES2 scene destructor but at least the RID leak list is a bit more manageable now. In retrospect it would probably have been better to put the GLES3 ordering fix I quoted above in the 2nd commit, but I don't know easy this is to do, whether you can edit a previous commit, not being a git expert. I could alternatively back these leak fixes up and put in a separate PR, depends what the mergers think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have missed something, since a lot happens here, but as far as my review can reach, this is fine.
I understand about testing. But the whole purpose of fixing the RIDs ability to determine if they are valid or not was to fix leakage and spatialeditor destroying memory that doesn't belong to it. If this isn't set as the default, then it doesn't meet the primary motivation for writing it, and I need to fix spatialeditor and visual server, and maybe physics server by other means, and assuming the RIDs are still unreliable. Not a deal killer, but worth discussing. |
Sorry this may be a misunderstanding. The PR isn't intended directly to fix the RID bugs, it is intended as a debug tool to make it much easier to find, track down and fix RID bugs in client code. To explain the reasoning a little more: The existing behaviour of RIDs follows the Now the handles are written in such a way as it can detect such use and recover from it, but this pattern of double deletes etc is imo not expected behaviour by the client and should not be relied on, or used as a replacement for a proper bug fix. This behaviour is the same in the handles in 4.x afaik. So currently using double deletes with handles might work, but it will flag errors, and would be considered a hack, rather than a proper fix. The proper fix is to keep proper track of lifetimes in the client code. There is no reason why you can't use e.g. reference counting for this or other methods if they are warranted in a situation. Altering the "contract" in the way you suggest is quite a major change and would need to go through a proposal etc. |
7835057
to
4cb4915
Compare
Yes on balance I think I will split the fixes into another PR and shrink this one to just the tracking. Hopefully won't take too long. 👍 😀 Ok is now separated hopefully. EDIT: Turns out the rasterizer order of destruction bugs do have to be fixed at the same time, because the |
22fe8b4
to
8725ca7
Compare
memdelete(storage->shader_owner.getptr(default_worldcoord_shader_twosided)); | ||
|
||
memdelete(storage->material_owner.getptr(default_overdraw_material)); | ||
memdelete(storage->shader_owner.getptr(default_overdraw_shader)); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that these are changed. They are essential fixes for this PR, because the old form of calling get_data()
directly on the RID is effectively no longer supported (i.e. won't compile with tracked_handles
).
It was only used in a couple of places anyway, constituted a bug here, and also wasn't compatible with handles / allocation tracking, so better to change these (the other place where it was used in this way was getting lights, and was unnecessary).
It may actually be worth removing the get_data()
in RID just to prevent people accidentally using it in future instead of the long form, by making e.g. private and only accessible by making the RID_Owner
a friend class. Doesn't have to be in this PR though.
@@ -1500,7 +1500,7 @@ void RasterizerCanvasGLES3::render_joined_item(const BItemJoined &p_bij, RenderI | |||
} | |||
} | |||
|
|||
glBindBufferBase(GL_UNIFORM_BUFFER, 1, static_cast<LightInternal *>(light->light_internal.get_data())->ubo); | |||
glBindBufferBase(GL_UNIFORM_BUFFER, 1, static_cast<LightInternal *>(light_internal_owner.get(light->light_internal))->ubo); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the other occasion when get_data()
had to be changed as it is no longer supported directly on the RID as it won't compile with tracked_handles
. This form compiles down to the same form as previous anyway when rids=pointers
(i.e. default) is set.
a07fcde
to
859d915
Compare
Adds an option to compile an alternative implementation for RIDs, which allows checks for erroneous usage patterns as well as providing leak tests.
859d915
to
3d981b8
Compare
Thanks! |
Just to note here, that at the moment, you won't see the leak list at the end when running a game from the Godot IDE. Instead you need to run a game (or the Godot Editor) directly from a console, or c++ debugger for the leak list to show up. The reason for this is that the machinery for This is a bit annoying as it would be nicer to be able to see the leaks directly in the Godot IDE. I'm just seeing if we can fix this by simply moving the call site around for the rid_database shutdown (which lists the leaks), but no joy so far. I'm not sure exactly which parts of the engine are responsible for debug output. So at the moment we either get no output in the Godot IDE, or we get 'fake leaks'. An alternative could be to move the RID cleanup code for the rasterizers to their We could also potentially try moving the cleanup of the debug output to later in the cleanup sequence, but I'm loathe to touch this if possible, as this order has obviously been tested on all the platforms and works. Having said that the errors for misuse at runtime should show up fine in the Godot IDE, it's just a gotcha to watch out for. |
@lawnjelly How do we free the handles? When opening my project in the editor, not running it, then closing it, the console reports 21 leaks in CanvasItem and Texture:
Looking at texture.cpp:828, it creates an RID for a StreamTexture. However that is freed in the destructor. Since the RID is freed in the destructor, is this falsely reporting a leak?
Both go to RasterizerStorageGLES3, where the CanvasItem shows the same thing:
Both go to |
That looks correct in theory, but you need to debug it (or post a min reproduction project and I can take a look). The things of interest are the RID _id, and the _revision. False leaks are in theory possible, but unlikely (most likely a valid bug), because the RID_Database should be shutdown last. There could also be a bug in the tracking (as it is quite new), or most obvious this could be a genuine error that needs fixing. Incidentally best to post these in individual new issues then we can investigate and fix. RID bugs are to be expected because none of this has been monitored before, and it would be surprising it a few hadn't built up over the years. Incidentally in the GLES2 environments was a very similar situation - on first glance it looked like it was freeing them correctly and I suspected a bug in the allocation tracking. But on further investigation the leak was genuine and there was a bug in the free routine. |
print_verbose("RID revision incorrect : " + _rid_to_string(p_rid, pe)); | ||
ERR_FAIL_COND_V_MSG(pe.revision != p_rid._revision, nullptr, "RID_Database get_or_null, revision is incorrect, object possibly freed before use."); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to myself: In retrospect, I probably should add a lock here if we do decide to roll out further.
I suspect there is a potential race condition from getting PoolElement
and reading at the exact moment a pool resize is happening from another thread (this should be incredibly rare, and may not even happen in practice). This is kind of a pain, as it would be nice to have this performance sensitive function operate without a lock, especially for such an unlikely situation. Maybe there is a way around this. Anyway not a significant problem unless we do decide to make regular builds with handles.
EDIT: Fixed in #60427 .
How about making this the default build option in the next beta release for editor builds @akien-mga? Suggested here: #60406 (comment) The bottom lines are:
@lawnjelly said this:
|
Next build should RC1, so it's not the time for experiments anymore for 3.5. But we can consider it for 3.6 beta 1. |
I'll remember to use |
Adds an option to compile an alternative implementation for RIDs, which allows checks for erroneous usage patterns as well as providing leak tests.
Note : This is compiled out during a regular build so there should be no differences to operation. Handles are only compiled in (replacing the regular RIDs) when passing an argument to scons.
Introduction
We've had a number of very difficult issues over the years concerned with the lifetimes of objects. There are two lifetimes of major interest - scene side Nodes (and objects derived from Reference, Object), and server objects, which are referenced using RIDs. It is this latter objects that are addressed here.
An example of the difficult to investigate problems caused by RID lifetimes is #53374. This and similar bugs can be debugged and fixed using this PR.
The implementation of RID in 3.x is basically a glorified pointer to the object (stored in
RID_Data
, the actual server objects are derived from this). This means that if you copy a RID, all the RIDs point to the same object.This is great and quite efficient, in that you can basically just look up the pointer to get the object, the problem comes when you come to delete objects, you often get dangling RIDs (pointers) that are pointing to an object that has been deleted, or worse, another object that has replaced the old object.
One widely used solution to this problem is handles containing an indirect lookup for the object rather than a direct link. That way when you delete an object, a further erroneous lookup of the handle can detect this situation, flag an error and return a NULL. There are a number of other advantages of handles - the object is relocatable, and you can detect use of an invalid handle ID using revisions, etc etc.
https://en.wikipedia.org/wiki/Handle_(computing)
The downside in this case is that the lookup requires an extra indirection, which may potentially cause a cache miss. There is a little extra housekeeping versus a straight pointer, but not a lot, and certainly nothing compared to the cache implications.
Godot 4.x has moved to using handles for the safety aspects.
This PR
What I've done here is add a simple alternative implementation for RIDs that uses handles, and is complete with error checking, and also can pinpoint the original files / lines of leaks etc (to a limited extent). This ability to do tracking hasn't been added to 4.x, it's somewhat difficult logistically but I've managed to get it working via a small macro when creating RIDs (you wrap them in a RID_PRIME()), which compiles out when tracking is switched off.
This is a small extra hassle when creating RIDs, but it's no big deal if you forget to add one, it just means you won't get tracking for that object.
Handle system
The handle system itself is pretty simple, it doesn't attempt some of the optimizations in 4.x, and it reuses the
pooled_list
templates which have already been created for handles in the BVH and portals, which makes the new code quite easy to understand.Building with handles
Nothing has changed in the regular builds of Godot, in order to build with handles you need to add an
scons
argument:The tracking version does a little more than the straight handles implementation because it has to record files and line numbers. But even the tracking version uses less memory than the standard build, around 20 megs less in the small tests I used so far, so the old system is quite memory hungry presumably with red black trees etc.
Implementation Details
The
RID_Database
is just a global currently. It needs to be readily accessible for theRID_PRIME
systems to work efficiently. It could be a singleton, but there's no real need for this and it will probably make it slower. It could also potentially be tagged onto another existing object, but the lifetime issues are quite important here, it needs to be available until other systems have been destroyed, so having it some form of global is a safe bet in this regard.Performance
I haven't tested yet to any extent, I get the impression the handles version may be very slightly slower in some situations (and possibly faster in others), and it is not highly optimized, indeed many of the calls are in cpp files rather than headers to avoid the unfortunate side effect of recompiling the engine when making changes, as
RID
is quite deep in the dependency tree.The main aim to start with however is to provide a debugging build that can be used to investigate and test for RID lifetime issues. We may decide at a later date to swap completely for handles, but there's no need to make this decision at this time.
Example Leak Output
This is actually the standard situation running the engine, these are some default materials / shaders that are not cleaned up in the normal way. They are not problematic, but could be fixed.
In use there may be other leaks in particular projects, which indicate problems that need to be investigated.