Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for memory leakages in SteamDB #292

Merged
merged 1 commit into from
Jun 26, 2016
Merged

Fix for memory leakages in SteamDB #292

merged 1 commit into from
Jun 26, 2016

Conversation

Netshroud
Copy link

  1. serverMap was never being cleared, so re-using the same SteamClient object when reconnecting to Steam caused memory to balloon.
  2. There was no way to run all pending callbacks, which includes a ServerListCallback. If only running callbacks periodically, e.g. every tick of an application, it will take N ticks to clear a queue of N callbacks. I've added an API to run all pending callbacks, which should fix this issue.

@JustArchi
Copy link
Contributor

JustArchi commented Jun 23, 2016

That's why I prefer using HashSet over List 😄

There will still be leak if Steam decides to push us new list of servers before actually disconnecting, as our current Lists will be populated with duplicate entries. Wouldn't it be better to just use HashSet and forget about the issue? Clearing dictionary also doesn't make much sense, as the memory is never reclaimed (no TrimExcess()), it only adds extra work generating new List objects on the reconnect (for same types), and if user doesn't re-use SteamClient then GC will pick it up sooner or later anyway.

I'd be up for using HashSet instead of List in dictionary and clearing + trim excessing them on disconnect, instead of clearing the Dictionary. That makes more sense IMHO. We'll not do the same work twice (allocating new HashSet/List objects), and if user abandons the object then GC will finish the job.

@Netshroud
Copy link
Author

Netshroud commented Jun 24, 2016

There will still be leak if Steam decides to push us new list of servers before actually disconnecting, as our current Lists will be populated with duplicate entries.

True. Odd edge case, but definitely worth considering.

Wouldn't it be better to just use HashSet and forget about the issue?

Probably. It does mean that we could still end up with stale data, but at least memory won't balloon indefinitely.

Clearing dictionary also doesn't make much sense, as the memory is never reclaimed (no TrimExcess())

I'm not expecting subsequent ServerList messages to contain less server types. Even if it does, the memory should be relatively negligible.

I'll switch it over to HashSet.

@JustArchi
Copy link
Contributor

It does mean that we could still end up with stale data

That's true, but considering steam maintenances and everything what is happening, our data can still be cleared on disconnect and recent enough to satisfy us. I doubt that any bot can handle being truly online without a disconnect for more than a few days anyway, so that serverMap.Clear(); could still be there to remove servers that got inactive over time between our last log in and current one. Up to you to decide.

HashSet definitely solves other issues though, so I still see it useful. I even got into habit myself of using HashSet as general-purpose collecton for everything, unless I specifically want duplicates or access through index. Makes lots of things easier 😄

Good job on this PR, I was wondering why my SteamClient was getting heavier over time 👍

@Netshroud Netshroud added this to the 1.8.0 milestone Jun 25, 2016
@Netshroud
Copy link
Author

I might have to think about this a little more... ideally you should get no servers back when disconnected, but I don't want to thrash memory on object creation/GC.

@JustArchi
Copy link
Contributor

JustArchi commented Jun 26, 2016

HashSet supports advanced collection-like functions, including ExceptWith, UnionWith or SymetricExceptWith. For comparing current list of servers with what we have already we could simply do following trick, considering we have old servers and new ones ready:

HashSet<IPEndPoint> oldServers, newServers;
oldServers.SymmetricExceptWith(newServers);
newServers.IntersectWith(oldServers);
oldServers.ExceptWith(newServers);

Old servers are direct copy of what we have in serverMap already, while new servers are in callback.

After such thing, oldServers will include only servers that we had, that didn't exist, while newServers will include only servers that should be added. We can now operate on serverMap and simply ExceptWith(oldServers) and UnionWith(newServers). This is one of the best methods to update serverMap without having to recreate entire set, although some memory will be used for our temporary helper sets that will be modified.

It solves problem of stale data, yet avoids recreating set from nothing. I'm using similar trick for updating my MySQL database to avoid fragmentation - In database I have set A, currently I have set B, and the objective is to remove from A what doesn't exist in B, and add to A what exists in B only, but not in A already. Recreating entire thing is easier, but I actually store gameIDs, so fragmentation would kill me for users with 2k+ games. It's more efficient to do calculation and push only "changes" instead of entire data. Not to mention that write operation is expensive in my case considering write lock is acquired, so the less data to write - the better.

@Netshroud
Copy link
Author

Netshroud commented Jun 26, 2016

I don't think the issue is the IPEndPoint objects, since:

  • N(newservers) objects will be created
  • N(oldservers - newservers) objects will be GC'ed
  • N(newservers) objects will either be GC'ed, or retained by the HashSet, depending on how large the overlap is.

Looking at the extremes, if the HashSet is empty, then:

  • N(newservers) objects will be allocated
  • N(newservers) objects will be assigned to the HashSet
  • 0 objects will be deallocated

On the flip side, if the hashset contains all new servers exactly, then:

  • N(newservers) will be allocated
  • 0 objects will be assigned to the hashset
  • N(newservers) objects will be GC'ed

So it would appear that no matter what we do, N(newservers) objects will be allocated, and typically most or all will be GC'ed. So I don't think we can do anything about the amount of IPEndPoints in memory.

What we could do is flip back to the original method, but retain the List/HashSet. Clear it out, don't TrimExcess because we will probably need to grow back to roughly the original size. That way only IPEndPoints thrash memory, but the collection object itself hangs around.

Am I making sense?

@JustArchi
Copy link
Contributor

JustArchi commented Jun 26, 2016

Am I making sense?

Definitely it does make sense to me, it's all based on what goals you want to set for SK2 as a maintainer.

I'd still suggest to keep HashSet mainly for it being free from duplicates issue - if we don't expect duplicated endpoints, and we don't, then this is not only good from implementation viewpoint, but logical as well. It solves the possibility of Steam talking crap and sending us two or even more same requests, and HashSet is superior when it comes to Contains() and Add(), checking if endpoint is in set already is O(1) operation.

TrimExcess() makes sense if goal of SK2 is to keep memory low, but on the other hand it should be avoided if we focus on performance. We don't know if the object will be reused or not - If it does, then indeed TrimExcess() doesn't make sense from performance and logical view, and if it won't, then GC will finish the job sooner or later anyway, so there is more arguments against TrimExcess() than after. I just have a habit of keeping static and global objects tidy - it doesn't make that much sense here, especially because we expect it to be refilled with same or nearly the same number of objects in a moment.

So indeed, I think that original method with HashSet instead of List is the best what can be done. We could in theory keep List without duplicates by clearing it, like you did initially, but HashSet doesn't solve only that problem, it also solves the problem of eventual dupe requests and other things that can happen to the same connected SteamClient. And I think nobody wants to come to this issue later, if Valve decides to change it and push updates to already connected SteamClient 👍

@Netshroud Netshroud merged commit f4c5d34 into master Jun 26, 2016
@Netshroud Netshroud deleted the memory-fix branch June 26, 2016 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants