-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optizmize base migrations #285
Comments
All right here's a high level summary of time spent. For more precise understanding of the code corresponding to each log line check out the timing branch here: master...spike/timing-migration
Timing with cache
|
It's clear the main issue is the baseline migration of code cids. It's unclear why this is slower than expected. Also memory usage seems about half of what I'd expect (8 bytes for actor cid * ~3_000_000 actors ~24 GiB). Some hypotheses are -- 1) caching isn't as good as it should/could be 2) we're hitting some internal inflection point in go map performance vs size since there are more actors in the state tree. At this point I'm suspicious we're not going to be able to make this significantly faster by tomorrow. But I'll keep looking and run a pprof trace. |
nv23-migration-mainnet.zip |
From what I can see it looks like go GC is the expensive thing |
Watching htop I see a max memory of 15 GB and it looks like the cache is taking 11GB this seems off by 2 so I'll look into measuring cache size. --edit-- --edit 2-- |
AnalysisHere is a reference to the testing time of the nv21 migration: https://filecoinproject.slack.com/archives/CP50PPW2X/p1693544595593359 This shows migration time with cache ~20s. @rjan90 shared a lower hardware spec at ~37s with cache above but on a good hardware spec machine we see ~24s with cache for the nv23 migration. Note that between the time of that dry run and now the network has grown by ~640,000 actors or about 25% from its ~2500000 actor size at the time. Under the assumptions that the ~20s nv21 migration with cache was 1) bottlenecked by cache reads and not redoing actual migration work 2) the nv21 dry run was on good hardware (I remember it being run last year and Im confident about this) then our 24s migration with cache time lines up very well with the model of a linear increase in time for each actor. This is a model we should expect since the top level migration traverses all actors. Therefore the slowdown is expected. SolutionsSmarter use of GCAs shown in the pprof profile above the CPU bottleneck for running this migration is all in GC and memory management. This seems suboptimal because the migration did not take up much memory (~11 GB for cache ~15 high water mark). Node operators have a lot of memory so if we could figure out how to trade off more memory for less time this would be ideal. I.e. turn off GC completely for data alloced from migration go routines. My initial experiments setting GOGC = 400 and GOGC = off did not improve migration time and I have reached the limits of my understanding of go GC to help solve this. But in principle it appears we could engineer a more memory intense trade off. Top level HAMT diffTo get sublinear we have to stop traversing every actor. Fortunately this is possible by caching nodes in the top level HAMT and skipping re-migration of the subtrees with no changes. I expect this to speed things up significantly in practice. However it is a big departure from the migration model that we've used for around 4 years so we'll want to proceed carefully. IMO we should not try to do this without significantly slipping the nv23 deadline What should we doThe migration times are not ideal but probably safe to run on mainnet. Our two options are
|
FYI @Stebalien I'm sure you have thoughts about both enabling more aggressive memory use and HAMT diffing solutions and possibly other ideas not yet considered. |
@ZenGround0 can you explain what the migration is doing by touching pretty much all the actors? I was running it with some debug printing and seeing it touch ~6M of them and I'm not sure why. I understand that the actors code CIDs have changed, but why are we going through apparently all of the on-chain actors here: https://github.com/filecoin-project/go-state-types/blob/master/migration/util.go#L212 |
Ok nice this is just code that should not be used for code cid migrations. The run down is this code was written long pre-FVM phase I when all actors we migrated needed state migrations.
I think you are seeing 6M cache hits? If so this seems to make sense given the unnecessary migration of all actor heads. 3M actors + 3M state heads. This is a great find that I missed. It should give us 2x performance which puts us at parity with Forest which makes the world make more sense. I'm going to work on a PR for this today in case its not too late to get it in. |
Ok furthermore here's an interesting corallary. The cachedMigrator around the code cid migrator is purely bad. It costs us an unnecessary cache hit on the state head AND this it does no caching of the migrated actor. Furthermore I think the only way to correctly cache top level actors is at the level of HAMT nodes since actors are not stored on disk as single objects so we don't save on disk writes by caching them. The correct way to do this is with HAMT diffing mentioned above. Anything else is more work for less payoff. The only speedup that premigrations give us for a base migration are in writing HAMT nodes to disk already so that the second time around badger doesn't need to actually write it can just noop when we give it a top level actor HAMT node that didnt change. |
ok -- pretty modest gains, idk if its worth rereleasing: Before change:
After change
15% improvement on the part that matters This makes sense -- we halved 10s of time accessing go maps and theres still 15s of time reading HAMT from disk that we have to pay. This also makes me wonder how much better the actual migration is in a lotus node with a hot actor HAMT cache from syncing the chain -- probably not a lot since few actors actually change at the head. |
@ZenGround0 : will you put up a PR for this work so it's there for the future, even if it's not leveraged for nv23 (which is TBD, but I assume we'd include it if we're going to do an additional release anyways) |
I've been spending a little bit of time looking at this, measuring, tweaking and trying different things. If you strip away a lot of the multi-worker complexity so you don't have the synchronisation overhead and can see where the rest of the time is spent, it ends up looking something like this: 35% iterating through existing actors tree (around 45% of this is just badger loading blocks), 15% performing mutation and building up a new actors tree, 10% flushing that actors tree to disk (badger again), and the remaining 40% is spent on dealing with gc concerns because we're doing so many small allocations in the process of serialisation and deserialisation. |
@ZenGround0 can you expand on how you see this working?
From what I understand, we need to migrate the entire actors HAMT of >3M entries because all entries need to be updated because they all get a new The only way I see diffing helping with this is with a pre-migration. Where we perform the full rewrite at one tipset some time earlier than the real migration. Then during the real migration we diff the actors tree between the premigration and current epoch and only mutate the entries new HAMT we made at premigration that have changed since then. Is this what you had in mind, or something else? |
Yes this allows us to make the migration cheap during the part that matters. As you are saying there is no avoiding the pre migration taking a longer time without resorting to more drastic measures |
When benchmarking the nv23 migration on my mainnet node, I get these bench-times:
Migration without cache is taking 41seconds:
Migration with cache is taking 36 seconds which is higher then what our threshold for migration times - which should be less then 1 epoch (30s)
The text was updated successfully, but these errors were encountered: