Clear external tileset skeletons from tile tree to save memory usage #1107

Open
wants to merge 24 commits into main

azrogers
Contributor

Closes #739. Currently, although tile content is unloaded when it is no longer used, the "skeletons" (the Tile objects) created by loading external tilesets are never unloaded, so memory usage can climb steadily. This change adds a _doNotUnloadCount counter to each tile that tracks how many uses of the tile's pointer are still outstanding. Whenever a tile's pointer is in use (while the tile is being loaded, for example), the count is incremented on that tile and on each of its ancestors; when the pointer is no longer needed, the count is decremented up the tree in the same way. We can therefore clear the children of an external tileset once its _doNotUnloadCount is 0. This implementation also includes a TileDoNotUnloadCountTracker class, enabled with the CESIUM_DEBUG_TILE_UNLOADING switch, that records the source of every modification to a tile's _doNotUnloadCount.
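The counting scheme described above can be pictured with a minimal sketch. This is not the code in this PR: `SketchTile`, `addChild`, and `tryClearChildren` are hypothetical names, and children are held through `std::unique_ptr` purely so that pointers stay stable in the example.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-in for Tile; not the cesium-native class.
struct SketchTile {
  SketchTile* pParent = nullptr;
  std::vector<std::unique_ptr<SketchTile>> children;
  std::int32_t doNotUnloadCount = 0;

  // Creates a child and returns a pointer that stays valid until the
  // children of this tile are cleared.
  SketchTile* addChild() {
    children.push_back(std::make_unique<SketchTile>());
    children.back()->pParent = this;
    return children.back().get();
  }

  // A pointer to this tile is now in use: bump the count on this tile and
  // on every ancestor, so none of them can drop the subtree holding it.
  void incrementDoNotUnloadCount() {
    for (SketchTile* p = this; p != nullptr; p = p->pParent) {
      ++p->doNotUnloadCount;
    }
  }

  // The pointer is no longer needed: undo the increment along the same path.
  void decrementDoNotUnloadCount() {
    for (SketchTile* p = this; p != nullptr; p = p->pParent) {
      assert(p->doNotUnloadCount > 0);
      --p->doNotUnloadCount;
    }
  }

  // Children may only be destroyed when nothing in the subtree is in use.
  bool tryClearChildren() {
    if (doNotUnloadCount != 0) {
      return false;
    }
    children.clear();
    return true;
  }
};
```

In this sketch, incrementing on a leaf raises the count on the external tileset's root tile as well, so `tryClearChildren` on that tile refuses until the matching decrement has run.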

Member

@kring left a comment

Thanks @azrogers, this is so good! I tried flying around with Google Photorealistic 3D Tiles in Cesium for Unreal, and memory usage stays extremely steady. Previously I believe it would have gone up quite quickly. I didn't see any crashes or other dodgy behavior either. This will be a major improvement for our users!

Mostly small comments here, but I did notice a couple of cases where I think there's potential for (rare) bad behavior.

I think it's also worth taking a bit of time to think through whether there could be any other gotchas like this, and generally do everything we can to thoroughly test everything. Perhaps bring back the soak test from #1415?

Cesium3DTilesSelection/src/TilesetContentManager.cpp (outdated, resolved)
Cesium3DTilesSelection/src/Tileset.cpp (outdated, resolved)
Cesium3DTilesSelection/src/Tile.cpp (outdated, resolved)
Cesium3DTilesSelection/src/Tileset.cpp (outdated, resolved)
@azrogers
Contributor Author

@kring Looks like reversing the direction of iteration, as well as unloading empty tiles, worked great! I'll take a look at that soak test and think of some other ways we could verify correct behavior here.

Member

@kring left a comment

A few more small things here. Also please update CHANGES.md.

Cesium3DTilesSelection/src/Tile.cpp (outdated, resolved)
@azrogers
Contributor Author

Bringing the soak test up-to-date (now in CesiumGS/cesium-unreal#1615 for reasons detailed in that PR) did identify some crashes, including issues with sampleHeightMostDetailed. I'll try to fix those now.

@azrogers
Contributor Author

I realize now that sampleHeightMostDetailed never showed up on your list of Tile pointer references in the previous PR because the functionality hadn't been implemented at that point 😅

@azrogers
Contributor Author

It's now tracking most of the tile pointer usages in TilesetHeightQuery, though it still crashes when running the test in Unreal. However, it also crashes when running the soak test from CesiumGS/cesium-unreal#1615, so I might in fact be correctly tracking all the usages in TilesetHeightQuery and it's crashing from an unrelated pointer use that I haven't found yet. Still looking into it, but might need to finish up solving this on Monday.

@azrogers
Contributor Author

Almost fixed all the issues. First up, the fix for the issue where external tilesets would sometimes be unloaded while some of their children were ContentLoading. This confused me, since it's something we're explicitly flagging to avoid. After a lot of log statements and trial and error, I found out that the issue is here:

setTileContent(tile, std::move(pair.result), pair.pRenderResources);
tile.decrementDoNotUnloadCount("TilesetContentManager::loadTileContent done loading");

The issue with these two lines is that, even though the decrementDoNotUnloadCount call comes after the call to setTileContent, which moves the tile out of the ContentLoading state, decrementDoNotUnloadCount would reliably run before the tile's state was updated. How is this happening? setTileContent isn't a future; it blocks until completion. setState isn't doing anything funky either. It gets even more confusing: the fix turned out to be just adding decrementDoNotUnloadCount after every setState call in setTileContent instead of calling it in the loadTileContent future. How does this change anything? I have no idea. But it works: the _doNotUnloadCount is now decremented after the tile leaves the ContentLoading state.

The other fix at least makes sense. Turns out, while _loadedTiles is reliably in order from top to bottom of the tree after the root tile, this isn't a guarantee before the root tile. I'm fairly sure this is because some of a tile's children can be visited for rendering without visiting the others, meaning those tiles and their ancestors will be dragged to the end of _loadedTiles, but the ones that are no longer rendering will stay in their original position. Then, when cleaning up those more recently rendered tiles, we will end up cleaning up the ancestors before the children, resulting in those pointers still in the _loadedTiles list turning into junk data.

The easiest solution is just to make sure we're unloading the entirety of _loadedTiles at once, ensuring that every parent and child is removed from _loadedTiles before we clear any children. But the time limit is there for a reason! So the best solution I came up with is a _clearChildrenRecursively method that allows us to bail out at any time:

bool Tileset::_clearChildrenRecursively(Tile* pTile) noexcept {
  // Iterate through all children, calling this method recursively unless the
  // child is still in _loadedTiles. If the child is still in _loadedTiles, or
  // any of its children (or children's children and so on) are in _loadedTiles,
  // we return false so we don't clear the children prematurely.
  for (Tile& child : pTile->getChildren()) {
    if (this->_loadedTiles.contains(child) ||
        !_clearChildrenRecursively(&child)) {
      return false;
    }
  }

  pTile->clearChildren();

  return true;
}

That way, if there are children that still haven't been unloaded, we will hold off on clearing the children. And if this method returns false, we also hold off on removing the external tileset Tile from _loadedTiles so that we can handle it next frame, or the frame after that, or however long it takes for all of its children to be unloaded. It's not ideal that we have to walk the tree every time when we're potentially still in a state where we can't unload the children, but it's the best I've come up with so far.
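To make the bail-out behavior concrete, here is a minimal, self-contained sketch of the same idea on a toy tree. `ToyTile`, `LoadedSet`, and the free function `clearChildrenRecursively` are hypothetical stand-ins: the real code uses Tile and the _loadedTiles list, while this sketch assumes a plain set of pointers for the "still loaded" check.

```cpp
#include <cassert>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical toy tile; children are owned through unique_ptr so that
// the pointers stored in the "loaded" set stay valid until cleared.
struct ToyTile {
  std::vector<std::unique_ptr<ToyTile>> children;
};

// Assumption: a set of pointers stands in for the _loadedTiles list.
using LoadedSet = std::unordered_set<const ToyTile*>;

// Returns true and destroys the children only if no descendant of `tile`
// is still tracked in `loaded`. Returns false as soon as one is found,
// leaving that branch and the direct children of `tile` untouched.
bool clearChildrenRecursively(ToyTile& tile, const LoadedSet& loaded) {
  for (const std::unique_ptr<ToyTile>& pChild : tile.children) {
    if (loaded.count(pChild.get()) != 0 ||
        !clearChildrenRecursively(*pChild, loaded)) {
      return false;
    }
  }
  tile.children.clear();
  return true;
}
```

A caller in this sketch would keep the external tileset's tile in its own "loaded" bookkeeping whenever the function returns false, and simply try again on a later frame.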

The remaining issue is a result of this fix - because the external tileset Tile can remain Unloaded for an indefinite amount of time without clearing its children, this opens us up to "Children already created" errors. We can't just set the Tile to Unloading either while it's pending, because this will mean that if we need the tile and its children to render, we will have to wait for all the tile's previous children to unload and be cleared before we can get it to an Unloaded state to be able to start the loading process again. So, I either need to figure out a way to make sure the external tileset has all of its children cleared on the same frame as it is unloaded, or I need to figure out how to re-use those children that haven't been cleared yet when the tile is picked back up for rendering.

@azrogers
Contributor Author

On the first ordering issue: you can test this out by checking out 140a7d9 and adding log statements to both Tile::decrementDoNotUnloadCount and Tile::setState. It was occurring for me in Unreal building with MSVC - results may vary in other engines and compilers.

@kring
Member

kring commented Feb 19, 2025

> The issue with these two lines is that, despite the decrementDoNotUnloadCount call coming after the call to setTileContent, which updates the tile out of the ContentLoading state, decrementDoNotUnloadCount would reliably run before the tile's state was updated. How is this happening?

Do you have any insight into why this matters? Only the main thread should ever look at the doNotUnload count, and only the main thread should look at the tile state. And finally, only the main thread should modify either of these. Therefore I can't see how it would matter in the slightest whether the count was updated before or after the load state was updated, because nothing should ever be able to observe one updated without the other, regardless of their order. In fact, the compiler is free to reorder these if it feels like it.

If any of those assumptions are violated (i.e., if any Tile property is read or written off the main thread), then that's the real problem.

We do, in some cases, create Tile instances in a worker thread, and then transfer ownership of them to the main thread. This should be fine because the transfer itself will create the necessary memory barrier to avoid partial updates of the Tile being visible. Are there any other cases where we might have thread-unsafe access to a Tile?
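As an illustration of the hand-off pattern described here, the following is a minimal sketch that swaps in `std::async` and `std::future` for the library's own async machinery. `WorkerBuiltTile`, `buildSubtreeInWorker`, and `loadOnWorkerThenAdopt` are hypothetical names. The point is only that the worker finishes all of its writes before ownership is transferred, and that retrieving the result through the future provides the synchronization, so the receiving thread never sees a partially constructed tile.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <vector>

// Hypothetical tile type for this sketch only.
struct WorkerBuiltTile {
  int state = 0;
  int doNotUnloadCount = 0;
  std::vector<std::unique_ptr<WorkerBuiltTile>> children;
};

// Runs on a worker thread. Nothing else can reach these tiles yet, so the
// worker may write to them freely without locking.
std::unique_ptr<WorkerBuiltTile> buildSubtreeInWorker(int childCount) {
  auto pRoot = std::make_unique<WorkerBuiltTile>();
  for (int i = 0; i < childCount; ++i) {
    auto pChild = std::make_unique<WorkerBuiltTile>();
    pChild->state = i + 1;
    pRoot->children.push_back(std::move(pChild));
  }
  return pRoot;
}

// Runs on the calling ("main") thread. future::get() synchronizes with the
// completion of the worker task, so every write made in the worker is
// visible here. After this point only the calling thread touches the tiles.
std::unique_ptr<WorkerBuiltTile> loadOnWorkerThenAdopt(int childCount) {
  std::future<std::unique_ptr<WorkerBuiltTile>> pending =
      std::async(std::launch::async, buildSubtreeInWorker, childCount);
  std::unique_ptr<WorkerBuiltTile> pRoot = pending.get();
  // State and count are modified on the calling thread only from here on.
  pRoot->state = 1;
  return pRoot;
}
```

If some other code path reads or writes a tile's state or count from a worker after the hand-off, that access would sit outside this pattern and would be the place to look for a data race.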

Successfully merging this pull request may close these issues.

Memory leak when using external tilesets.
2 participants