Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added "ActiveGameObjectDatabase" to replace FindObjectsOfType #2605

Merged

Conversation

KABoissonneault
Copy link
Collaborator

Intro

My solution for #2355 crashes.

I have personally observed some users struggling with DFU crashes multiple times a day. The callstacks are almost always somewhere in the middle of a specific Unity function: FindObjectsOfType. DFU uses these in many places, often on every frame, for each object in the scene (see EnemySenses).

From the Unity docs: https://docs.unity3d.com/ScriptReference/Object.FindObjectsOfType.html

Note: This function is very slow. It is not recommended to use this function every frame. In most cases you can use the singleton pattern instead.

And this is precisely what this changelist does.

I have personally never reproduced these crashes. They seem to require extensive gameplay sessions on a very specific mod setup. Avoiding calls to FindObjectsOfType should for sure avoid crashes in that function - we'll see if it doesn't end up crashing somewhere else.

One thing is for sure, the performance should be better no matter what. Doing a lookup on a limited list of objects should always be faster than doing a lookup on the entire scene.

The Database

ActiveGameObjectDatabase is the namespace where we store the GameObject caches for DFU objects. I have created a new utility class, GameObjectCache, which is designed with specific performance characteristics and safety concerns. I have made this class public, since mods might want to reuse it for similar lookups on their own objects (see the Mods section below).

GameObjectCache

I don't actually suspect that the crashes we've been seeing are caused by multiple threads touching the scene at the same time. DFU does use parallel threads for jobs in the terrain system, for example, and while Unity does not actually support creating GameObjects from such parallel jobs, I have seen mods like Interesting Eroded Terrains try to do GameObject lookups from terrain jobs. Therefore, I decided to be safe and find the minimal overhead solution to make such lookups safe for any code.

GameObjectCache has two data members: a List<WeakReference<GameObject>>, and a ReaderWriterLockSlim.

For the List part, the WeakReference is there to ensure that 1) the cache does not keep objects alive for longer than necessary 2) we can safely detect destroyed objects long after Unity has cleaned them. I suspect this is not actually necessary for Unity GameObjects, maybe I went overkill on this aspect, but I don't think the cost has been an issue so far.

For the Lock part, here's the C# documentation: https://learn.microsoft.com/en-us/dotnet/api/system.threading.readerwriterlockslim.

The Cache is used in two ways: multiple systems do frequent reads to the cache, and a few system do infrequent additions to the cache. A "Reader Writer Lock" allows multiple readers to read at the same time without blocking each other, while a single writer can still claim exclusive ownership for actual modifications. This means that while we're not spawning objects, the cache has no interference.
The RWLockSlim also has an "upgradable" lock feature, which I somewhat used for reading whether an object was already there before committing to the write. I still kept this check only in Editor, we shouldn't have this in real code.

For Registration, I first experimented with having factory functions like GameObjectHelper handle registration. This is because registering multiple objects at once would be faster than doing it once per object. However, I could not find a way to do it in a way that would guarantee that mods that spawn objects are also covered. So I went with the per-object registration in one of their component's Awake.

One detail of note here is that I don't "unregister" objects. I could easily add an OnDestroy event on all components where I do the register on Awake and remove them from the cache there. However, I find that this would cause many unnecessary write locks, so I decided not to do that. Instead, whenever we write to the cache, we take the opportunity to remove any destroyed objects.

Whenever a gameplay system needs to lookup all the GameObjects of a certain type, the cache will not only avoid returning destroyed objects, but it will also not return inactive objects. Similarly, when trying to get a specific component type on the objects, objects where the component is disabled will also be skipped. Both of these behaviors reproduce how FindObjectsOfType works.

DFU's caches

In the end, ActiveGameObjectCache currently has 6 caches: Enemy, Civilian Mobile NPC, Loot, Foe Spawner, Action Door, and Static NPC. These correspond to six GameObject types where FindObjectsOfType lookups are frequently made, and happen to match six different prefabs in Assets/Prefabs/Scene.
For each of these GameObject types, we identify a primary component (where the registration is made) and some secondary components which are frequently used in lookups.

Enemy:

  • Primary: DaggerfallEnemy
  • Secondaries: DaggerfallEntityBehaviour, QuestResourceBehaviour
    Civilian Mobile NPC:
  • Primary: MobilePersonNPC
  • Secondaries: DaggerfallEntityBehaviour
    Loot:
  • Primary: DaggerfallLoot
    Foe Spawner
  • Primary: FoeSpawner
    Action Door:
  • Primary: DaggerfallActionDoor
    Static NPC:
  • Primary: StaticNPC
  • Secondaries; QuestResourceBehaviour

Two cases stand out here. Some DaggerfallEntityBehaviour lookups can be intended to grab more than one GameObject type. EnemySenses wants the behaviour from both other enemies and the player. PlayerGPS wants the behaviours of both enemies and static NPCs. Fortunately, it's pretty easy to concat two IEnumerable<DaggerfallEntityBehaviour> together, the changes were minimal. QuestResourceBehaviour can also be either for enemies or for static NPCs.

Ultimately, the flow looks like one of two things.
Registration:

// ObjectComponent.cs
+ void Awake()
+ {
+   ActiveGameObjectDatabase.RegisterThing(gameObject);
+ }

Lookups:

// PlayerObjectComponentHandler.cs
- foreach(ObjectComponent component in FindObjectsOfType<ObjectComponent>())
+ foreach(ObjectComponent component in ActivateGameObjectDatabase.GetActiveThingObjectComponents())
{
  if(component.Type == ComponentType.Thing)
    ...
}

This is done all over for our 6 object types.

Performance

I wanna update this PR with more specific benchmarks for key DFU cases of lookups and object registration. In practice, I'm pretty confident all lookups with be faster, though object creation has extra overhead. I estimate the impact to be minimal.

Anecdotally, I made a quest that spawns hundreds of harpies, and I found the performance better with my changes (due to the many EnemySenses lookups).

Key tests:

  • Walking around town (PopulationManager spawning Civilian Mobile NPCs)
  • Transition in and out of Interiors
  • Transition in and out of Scourg Barrow
  • Transition in and out of Daggerfall Palace
  • Tedious Traveling with Travel Options on a World of Daggerfall save (lots of enemy and loot spawning from nearby bandit camps)

Debugging

Such a change requires some form of debugging for when we suspect issues. Two primary tools I've implemented are:

  • Editor-only check against double registration. If mods all follow the same "Awake" pattern on their objects, it should never be an issue, so I don't think we should add such a check in normal gameplay.
  • New visualizers in the tdbg console command (see FPSDisplay.cs). Whenever a user has an issue we suspect could be related, we can ask them to run tdbg and post a screenshot

image

Some more reference numbers from my tests running around:
Entering Scourg Barrow
image

Leaving Scourg Barrow
image

Running around Daggerfall City during the day
image

Inside a house
image

Daggerfall City at night
image

Daggerfall Palace
image

Mod

Mod compatibility is always a concern in post 1.0. I had to reject my initial approach to object registration because mods can easily spawn DFU GameObjects in any way (Location Loader from WoD sure does). I believe the current solution should be safe from any sort of weird spawning patterns mods might have.

The crash and performance benefits on lookups will need be applied manually by mods. For example, I had users report crashes with Horrible Hordes when entering main quest dungeons, and this was due to a StaticNPC lookup. In 1.1, we will be able to use ActiveGameObjectDatabase.GetActiveStaticNPCs() instead.

One thing I did not take into account yet are DFU objects that mods might do lookups on but not DFU itself. RDBBlocks, Interiors, RMBBlocks, Lights... I did not want to overreach into object types that I couldn't test properly. Mods will have to manifest their interest if they need DFU to track more of its object types.

Improvements

Garbage collection

While I mentioned that I deliberately don't clear objects as they are destroyed, I do think we could use some points where we clear objects before a new one is added. For example, leaving a large dungeon will leave lots of dead enemies, loot, action doors, and potentially static NPCs. Lookups in PlayerGPS will still traverse these big lists to check if the object is alive, which could be avoided if we cleared the list.

@numidium Do you have suggestions for clear places where we can "collect garbage" on this system? My current idea is to do it on dungeon transitions, but I'm not sure when GameObjects are all properly destroyed after a dungeon exit.

Remaining FindObjectsOfType

I left a few FindObjectsOfType lookups.

I ignored all the FindObjectOfType (singular) lookups for DaggerfallUnity scene objects. These are often just done once on scene load for one object and cached. We don't need a separate cache.

I did not cover DaggerfallStaticDoors, which is a component on Interior and RDB objects. I was mostly concerned with having to introduce two new caches for one lookup, but maybe the caches would be useful for certain mods.
I also did not cover DaggerfallMarker. It's an obscure object, I'm not sure how to test it.
There is one lookup for QuestResourceBehaviour that uses Resources.FindObjectsOfTypeAll, which covers inactive objects. It's a weird unique case, I did not cover it for now.
There is one lookup for SongPlayer in DefaultCommands. Commands run while the game is paused, not really an issue for crashes.

Conclusion

This is a big change to DFU, but given that the crashes really ruin some users, I believe it is necessary to integrate as soon as possible. I will block version 1.1 on this. In addition, the earlier we deploy this, the more mods can benefit from the improvements.

I will probably deploy a test build with this change (and some others) before it gets approved. We can check for regressions, and see if users get other forms of weird crashes.

…Enemies, Civilian Mobiles, Loot, Foe Spawners, Action Doors, and Static NPCs. This is intended to replace FindObjectsOfType lookups, which are slow, and have been known to crash on some user setups. Objects are registered from the Awake event of their primary component.
@KABoissonneault KABoissonneault added this to the DFU 1.0.1 milestone Mar 1, 2024
@KABoissonneault
Copy link
Collaborator Author

I've added the "RDB" cache. Might as well cover the DaggerfallInventoryWindow check, even if I don't think it was problematic. For static doors on Interiors, well, there's only one Interior object, you can get it on the PlayerEnterExit.

I finally ran the profiling on a dev build. I added a new mark for dungeon layout. I did this one specifically because main quest dungeons do the most spawning, and it's they add objects of many types (enemies, loot, action doors, rdbs, static NPCs).

The results seem clear that this change has little impact on the creation of GameObjects.

Daggerfall.LayoutDungeon
	Master
		Scourg Barrow
			4714.68ms
			3938.70ms
			4258.01ms
		Daggerfall Palace
			3581.92ms
			3050.95ms
			3303.34ms

	New
		Scourg Barrow
			4659.25ms
			4045.06ms
			4175.31ms
		Daggerfall Palace
			3211.11ms
			3277.56ms
			3405.08ms

The numbers vary a lot, but do not diverge significantly from the average on the current master branch.

On another note, doing this, I noticed that while the automap creates a second set of RDBs for its view, those RDBs do not get destroyed when you leave the dungeon. They'll stay inactive until the player enters a new dungeon. Maybe that's something for #2528 . But yeah, another perk of having this new GameObject tracker in tdbg.

@numidium
Copy link
Collaborator

numidium commented Mar 4, 2024

Do you have suggestions for clear places where we can "collect garbage" on this system? My current idea is to do it on dungeon transitions, but I'm not sure when GameObjects are all properly destroyed after a dungeon exit.

In #2528 I run a full sweep when exiting dungeons so that could be one place to do it (admittedly this is because it was the only way I could figure out how to get rid of every single mesh left dangling by the dungeon).

I noticed that MaterialReader uses temporal caching i.e. it clears its cached resource if it hasn't been used after a specified period of time. Here's the pruning method for reference:

internal void PruneCache(float time, float threshold)

If you go that route then beware of expiring references. I don't remember exactly which mod it was but I think Carademono was showing me resource references going null after a period of time and causing things to disappear.

@KABoissonneault
Copy link
Collaborator Author

This PR has been tested by quite a few people with no known regressions. It's pretty easy to test on the Test 2 build I released on Lysandus' Tomb, with the tdbg command to track the state of the data in real time.

I've reached out to a user who was experiencing frequent DFU crashes in 1.0, and they have not had any since I sent them the test build. This change is pretty essential for these users, I would not deploy a new DFU version without this.

To maintainers concerned with the size of this change, note that most changes are just changing FindObjectsOfType<Thing>() to ActiveGameObjectDatabase.GetActiveThing(). I also change some index-based for loops to foreach-loops.
Once you see the pattern, it's pretty easy to get through.
I know the main file is more difficult though, but I'm confident in its stability now.

I need to insist that DFU cannot move forward without two reviewers on this. All the other PRs are blocked by primarily this.

Copy link
Contributor

@magicono43 magicono43 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to put my anecdotal testing experience out there involving this PR.

Been playing with the experimental v1.1.0 build that has this PR included in it, using a moderately modded set-up for a fair few hours the past 2-3 weeks and I have personally not experienced any crashes or notable performance issues thus far, usually in sessions lasting 3-5 hours. On a Windows 10 system.

@KABoissonneault KABoissonneault merged commit 8efd0e8 into Interkarma:master Apr 6, 2024
@KABoissonneault KABoissonneault deleted the fix/find-objects-crash branch April 6, 2024 17:41
@KABoissonneault KABoissonneault restored the fix/find-objects-crash branch April 6, 2024 17:41
@KABoissonneault KABoissonneault deleted the fix/find-objects-crash branch April 6, 2024 17:41
@KABoissonneault KABoissonneault mentioned this pull request Apr 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants