[OPTIMIZATION] Explicitly make xidmap shards as nil #4738
Reviewed 1 of 1 files at r1.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @animesh2049 and @manishrjain)
Reviewable status: complete! all files reviewed, all discussions resolved
Fix memory held by b+ tree even in the reduce phase of bulk loader by bringing back the change #4738.
In the bulk and live loaders, we use `xidmap` to keep the mapping of external IDs to UIDs. In the bulk loader, this mapping is required only in the map phase. Hence, at the completion of the map phase, we set the `xidmap` instance used by the bulk loader to `nil`. Ideally this should be enough for `xidmap` to be garbage collected and its underlying memory released. But this is not happening, and `xidmap` still shows up in memory profiles even during the reduce phase.

This is fixed by setting `xidmap.shards = nil` in the `xidmap.Flush()` method.

Bulk loader memory profile on the 21M dataset during the reduce phase:
The profiles were taken when the reduce phase was 50% complete.
Master: [memory profile image]
This PR: [memory profile image]
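The fix described above can be illustrated with a minimal, self-contained sketch. This is not the actual Dgraph code: the `XidMap` and `shard` types, the `NewXidMap` constructor, and the `Released` helper are hypothetical stand-ins, and a plain Go map stands in for the real per-shard data structure. The point is only that once `Flush()` drops the `shards` reference, the shard memory becomes eligible for garbage collection even if some other reference still keeps the `XidMap` struct itself alive (one plausible reason why setting the instance to `nil` alone was not enough).

```go
package main

import (
	"fmt"
	"sync"
)

// shard is a hypothetical stand-in for one xidmap shard.
// A plain map replaces the real underlying data structure.
type shard struct {
	sync.Mutex
	uids map[string]uint64
}

// XidMap is a simplified, hypothetical model of the xidmap type.
type XidMap struct {
	shards []*shard
}

// NewXidMap allocates numShards empty shards.
func NewXidMap(numShards int) *XidMap {
	xm := &XidMap{shards: make([]*shard, numShards)}
	for i := range xm.shards {
		xm.shards[i] = &shard{uids: make(map[string]uint64)}
	}
	return xm
}

// Flush models the end of the map phase. A real implementation would
// first persist pending state; that part is omitted in this sketch.
func (m *XidMap) Flush() error {
	// Drop the only reference to the shards so the garbage collector
	// can reclaim them, even if m itself is still reachable elsewhere.
	m.shards = nil
	return nil
}

// Released reports whether the shards have been dropped.
func (m *XidMap) Released() bool {
	return m.shards == nil
}

func main() {
	xm := NewXidMap(4)
	xm.shards[0].uids["alice"] = 1

	if err := xm.Flush(); err != nil {
		panic(err)
	}
	fmt.Println("shards released:", xm.Released())
}
```

After `Flush()` the map must not be used for lookups again, which matches the bulk loader's usage, where the mapping is needed only during the map phase.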
I also ran the bulk loader on the Freebase dataset.
System configuration: Ubuntu 18, 16-core CPU, 64 GB RAM
Data size: 31 GB compressed, ~3.13B RDFs
Master:
Time for completion: did not complete; crashed about 2 minutes into the reduce phase.
Map phase time: ~1h53m
Reduce phase time:
Peak Memory usage(RES):
This PR:
Time for completion: ~03h40m (successful completion)
Map phase time: ~1h55m
Reduce phase time: ~1h45m
Peak Memory usage(RES): ~56 GB