[BREAKING] Opt(Restore): Optimize Restore's new map-reduce based design #7666
Merged
github-actions bot added the area/enterprise (Related to proprietary features) and area/testing (Testing related issues) labels on Mar 30, 2021
Reviewable status: 0 of 12 files reviewed, all discussions resolved (waiting on @vvbalaji-dgraph)
manishrjain changed the title from "Opt(Restore): Optimize Restore's new map-reduce based design" to "[BREAKING] Opt(Restore): Optimize Restore's new map-reduce based design" on Mar 30, 2021
ahsanbarkati pushed a commit that referenced this pull request on Apr 23, 2021

…gn (#7666)

This PR along with the previous restore PR is a BREAKING change. Marking this PR as breaking, because we forgot to mark the previous one.

- Make restore map run concurrently for faster execution.
- Add progress updates every second for both map and reduce phase.
- Refactor code to break out the map and reduce code into separate files.
- Make reduce cheap by avoiding marshal-unmarshal step.

With these changes, I see map phase is faster than reduce and both finish in about 2 mins each. Map runs at 200 MBps, while Reduce runs at 130 MBps, processing 20GB of uncompressed data in under 5 mins.

Changes:
* Work on optimizing restore
* Some file moves
* Moved map output to a temp directory.
* Add restoreTs in export-backup

Co-authored-by: Ahsan Barkati <ahsanbarkati@gmail.com>
(cherry picked from commit 1c7d449)
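The first two items in the commit message above (running the map phase concurrently and reporting progress every second) can be illustrated with a minimal Go sketch. This is not the code from this PR: `mapChunk`, `runMap`, the worker count, and the chunk sizes are all hypothetical names and values chosen for illustration, and the "map" work here only counts bytes.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// mapChunk is a hypothetical stand-in for the per-chunk map work.
// Here it only reports how many bytes it was given.
func mapChunk(chunk []byte) int { return len(chunk) }

// runMap fans chunks out to a fixed pool of workers (workers must be > 0)
// and calls report with the running byte count once per interval.
// It returns the total number of bytes processed.
func runMap(chunks [][]byte, workers int, interval time.Duration, report func(done uint64)) uint64 {
	var processed uint64
	work := make(chan []byte)

	// Worker pool: each goroutine drains the work channel.
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range work {
				atomic.AddUint64(&processed, uint64(mapChunk(c)))
			}
		}()
	}

	// Progress reporter: wakes up on every tick until stop is closed.
	stop := make(chan struct{})
	var reporter sync.WaitGroup
	reporter.Add(1)
	go func() {
		defer reporter.Done()
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				report(atomic.LoadUint64(&processed))
			case <-stop:
				return
			}
		}
	}()

	for _, c := range chunks {
		work <- c
	}
	close(work)
	wg.Wait()

	close(stop)
	reporter.Wait()
	return atomic.LoadUint64(&processed)
}

func main() {
	// 64 chunks of 1 MiB each, purely illustrative input.
	chunks := make([][]byte, 64)
	for i := range chunks {
		chunks[i] = make([]byte, 1<<20)
	}
	total := runMap(chunks, 4, time.Second, func(done uint64) {
		fmt.Printf("map: %d MiB processed\n", done>>20)
	})
	fmt.Printf("map done: %d MiB\n", total>>20)
}
```

The shared counter is updated atomically so the reporter can read it without a lock; a reduce phase could reuse the same reporter pattern with its own counter.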
mangalaman93 pushed a commit that referenced this pull request on Dec 13, 2022
fix(lsbackup): Fix profiler in lsBackup (#7729) Bring back "perf(Backup): Improve backup performance (#7601)" Opt(Backup): Make backups faster (#7680) Fix s3 backup copy (#7669) [BREAKING] Opt(Restore): Optimize Restore's new map-reduce based design (#7666) Perf(restore): Implement map-reduce based restore (#7664) feat(backup): Merge backup refactoring Revert "perf(Backup): Improve backup performance (#7601)"
mangalaman93 pushed a commit that referenced this pull request on Dec 14, 2022
fix(lsbackup): Fix profiler in lsBackup (#7729) Bring back "perf(Backup): Improve backup performance (#7601)" Opt(Backup): Make backups faster (#7680) Fix s3 backup copy (#7669) [BREAKING] Opt(Restore): Optimize Restore's new map-reduce based design (#7666) Perf(restore): Implement map-reduce based restore (#7664) feat(backup): Merge backup refactoring Revert "perf(Backup): Improve backup performance (#7601)"
mangalaman93 pushed a commit that referenced this pull request on Dec 28, 2022
This commit is a major rewrite of online restore code. It used to use KVLoader in badger. Now it instead uses StreamWriter that is much faster for writes in the case of restore. following commits are cherry-picked (in reverse order): * fix(backup): Free the UidPack after use (#7786) * fix(export-backup): Fix double free in export backup (#7780) (#7783) * fix(lsbackup): Fix profiler in lsBackup (#7729) * Bring back "perf(Backup): Improve backup performance (#7601)" * Opt(Backup): Make backups faster (#7680) * Fix s3 backup copy (#7669) * [BREAKING] Opt(Restore): Optimize Restore's new map-reduce based design (#7666) * Perf(restore): Implement map-reduce based restore (#7664) * feat(backup): Merge backup refactoring * Revert "perf(Backup): Improve backup performance (#7601)"
mangalaman93 pushed a commit that referenced this pull request on Dec 29, 2022
This commit is a major rewrite of online restore code. It used to use KVLoader in badger. Now it instead uses StreamWriter that is much faster for writes in the case of restore. following commits are cherry-picked (in reverse order): * fix(backup): Free the UidPack after use (#7786) * fix(export-backup): Fix double free in export backup (#7780) (#7783) * fix(lsbackup): Fix profiler in lsBackup (#7729) * Bring back "perf(Backup): Improve backup performance (#7601)" * Opt(Backup): Make backups faster (#7680) * Fix s3 backup copy (#7669) * [BREAKING] Opt(Restore): Optimize Restore's new map-reduce based design (#7666) * Perf(restore): Implement map-reduce based restore (#7664) * feat(backup): Merge backup refactoring * Revert "perf(Backup): Improve backup performance (#7601)"
mangalaman93 pushed a commit that referenced this pull request on Jan 2, 2023
This commit is a major rewrite of online restore code. It used to use KVLoader in badger. Now it instead uses StreamWriter that is much faster for writes in the case of restore. following commits are cherry-picked (in reverse order): * fix(backup): Free the UidPack after use (#7786) * fix(export-backup): Fix double free in export backup (#7780) (#7783) * fix(lsbackup): Fix profiler in lsBackup (#7729) * Bring back "perf(Backup): Improve backup performance (#7601)" * Opt(Backup): Make backups faster (#7680) * Fix s3 backup copy (#7669) * [BREAKING] Opt(Restore): Optimize Restore's new map-reduce based design (#7666) * Perf(restore): Implement map-reduce based restore (#7664) * feat(backup): Merge backup refactoring * Revert "perf(Backup): Improve backup performance (#7601)" * fix(ee): GetKeys should return an error (#7713) (#7797) * fix(restore): Bump uid and namespace after restore (#7790) (#7800) * fix: fixing graphql schema update when the data is restored + skipping /probe/graphql from audit (#7925) * Don't ban namespace in export_backup
mangalaman93 pushed a commit that referenced this pull request on Jan 3, 2023
This commit is a major rewrite of backup and online restore code. It used to use KVLoader in badger. Now it instead uses StreamWriter that is much faster for writes. cherry-pick PR #7753 following commits are cherry-picked (in reverse order): * Don't ban namespace in export_backup * fix: fixing graphql schema update when the data is restored + skipping /probe/graphql from audit (#7925) * fix(restore): Bump uid and namespace after restore (#7790) (#7800) * fix(ee): GetKeys should return an error (#7713) (#7797) * fix(backup): Free the UidPack after use (#7786) * fix(export-backup): Fix double free in export backup (#7780) (#7783) * fix(lsbackup): Fix profiler in lsBackup (#7729) * Bring back "perf(Backup): Improve backup performance (#7601)" * Opt(Backup): Make backups faster (#7680) * Fix s3 backup copy (#7669) * [BREAKING] Opt(Restore): Optimize Restore's new map-reduce based design (#7666) * Perf(restore): Implement map-reduce based restore (#7664) * feat(backup): Merge backup refactoring * Revert "perf(Backup): Improve backup performance (#7601)"
mangalaman93 pushed a commit that referenced this pull request on Jan 3, 2023
This commit is a major rewrite of backup and online restore code. It used to use KVLoader in badger. Now it instead uses StreamWriter that is much faster for writes. cherry-pick PR #7753 following commits are cherry-picked (in reverse order): * reset the kv.StreamId before sending to stream writer (#7833) (#7837) * Don't ban namespace in export_backup * fix: fixing graphql schema update when the data is restored + skipping /probe/graphql from audit (#7925) * fix(restore): Bump uid and namespace after restore (#7790) (#7800) * fix(ee): GetKeys should return an error (#7713) (#7797) * fix(backup): Free the UidPack after use (#7786) * fix(export-backup): Fix double free in export backup (#7780) (#7783) * fix(lsbackup): Fix profiler in lsBackup (#7729) * Bring back "perf(Backup): Improve backup performance (#7601)" * Opt(Backup): Make backups faster (#7680) * Fix s3 backup copy (#7669) * [BREAKING] Opt(Restore): Optimize Restore's new map-reduce based design (#7666) * Perf(restore): Implement map-reduce based restore (#7664) * feat(backup): Merge backup refactoring * Revert "perf(Backup): Improve backup performance (#7601)"
mangalaman93 pushed a commit that referenced this pull request on Jan 4, 2023
This commit is a major rewrite of backup and online restore code. It used to use KVLoader in badger. Now it instead uses StreamWriter that is much faster for writes. cherry-pick PR #7753 following commits are cherry-picked (in reverse order): * fix(backup): Fix full backup request (#7932) (#7933) * fix: fixing graphql schema update when the data is restored + skipping /probe/graphql from audit (#7925) * fix(restore): return nil if there is error (#7899) * Don't ban namespace in export_backup * reset the kv.StreamId before sending to stream writer (#7833) (#7837) * fix(restore): Bump uid and namespace after restore (#7790) (#7800) * fix(ee): GetKeys should return an error (#7713) (#7797) * fix(backup): Free the UidPack after use (#7786) * fix(export-backup): Fix double free in export backup (#7780) (#7783) * fix(lsbackup): Fix profiler in lsBackup (#7729) * Bring back "perf(Backup): Improve backup performance (#7601)" * Opt(Backup): Make backups faster (#7680) * Fix s3 backup copy (#7669) * [BREAKING] Opt(Restore): Optimize Restore's new map-reduce based design (#7666) * Perf(restore): Implement map-reduce based restore (#7664) * feat(backup): Merge backup refactoring * Revert "perf(Backup): Improve backup performance (#7601)"
mangalaman93 pushed a commit that referenced this pull request on Jan 4, 2023
This commit is a major rewrite of backup and online restore code. It used to use KVLoader in badger. Now it instead uses StreamWriter that is much faster for writes. cherry-pick PR #7753 following commits are cherry-picked (in reverse order): * opt(restore): Sort the buffer before spinning the writeToDisk goroutine (#7984) (#7996) * fix(backup): Fix full backup request (#7932) (#7933) * fix: fixing graphql schema update when the data is restored + skipping /probe/graphql from audit (#7925) * fix(restore): return nil if there is error (#7899) * Don't ban namespace in export_backup * reset the kv.StreamId before sending to stream writer (#7833) (#7837) * fix(restore): Bump uid and namespace after restore (#7790) (#7800) * fix(ee): GetKeys should return an error (#7713) (#7797) * fix(backup): Free the UidPack after use (#7786) * fix(export-backup): Fix double free in export backup (#7780) (#7783) * fix(lsbackup): Fix profiler in lsBackup (#7729) * Bring back "perf(Backup): Improve backup performance (#7601)" * Opt(Backup): Make backups faster (#7680) * Fix s3 backup copy (#7669) * [BREAKING] Opt(Restore): Optimize Restore's new map-reduce based design (#7666) * Perf(restore): Implement map-reduce based restore (#7664) * feat(backup): Merge backup refactoring * Revert "perf(Backup): Improve backup performance (#7601)"
mangalaman93 pushed a commit that referenced this pull request on Jan 6, 2023
This commit is a major rewrite of backup and online restore code. It used to use KVLoader in badger. Now it instead uses StreamWriter that is much faster for writes. cherry-pick PR #7753 following commits are cherry-picked (in reverse order): * opt(restore): Sort the buffer before spinning the writeToDisk goroutine (#7984) (#7996) * fix(backup): Fix full backup request (#7932) (#7933) * fix: fixing graphql schema update when the data is restored + skipping /probe/graphql from audit (#7925) * fix(restore): return nil if there is error (#7899) * Don't ban namespace in export_backup * reset the kv.StreamId before sending to stream writer (#7833) (#7837) * fix(restore): Bump uid and namespace after restore (#7790) (#7800) * fix(ee): GetKeys should return an error (#7713) (#7797) * fix(backup): Free the UidPack after use (#7786) * fix(export-backup): Fix double free in export backup (#7780) (#7783) * fix(lsbackup): Fix profiler in lsBackup (#7729) * Bring back "perf(Backup): Improve backup performance (#7601)" * Opt(Backup): Make backups faster (#7680) * Fix s3 backup copy (#7669) * [BREAKING] Opt(Restore): Optimize Restore's new map-reduce based design (#7666) * Perf(restore): Implement map-reduce based restore (#7664) * feat(backup): Merge backup refactoring * Revert "perf(Backup): Improve backup performance (#7601)"
mangalaman93 pushed a commit that referenced this pull request on Jan 17, 2023
This commit is a major rewrite of backup and online restore code. It used to use KVLoader in badger. Now it instead uses StreamWriter that is much faster for writes. cherry-pick PR #7753 following commits are cherry-picked (in reverse order): * opt(restore): Sort the buffer before spinning the writeToDisk goroutine (#7984) (#7996) * fix(backup): Fix full backup request (#7932) (#7933) * fix: fixing graphql schema update when the data is restored + skipping /probe/graphql from audit (#7925) * fix(restore): return nil if there is error (#7899) * Don't ban namespace in export_backup * reset the kv.StreamId before sending to stream writer (#7833) (#7837) * fix(restore): Bump uid and namespace after restore (#7790) (#7800) * fix(ee): GetKeys should return an error (#7713) (#7797) * fix(backup): Free the UidPack after use (#7786) * fix(export-backup): Fix double free in export backup (#7780) (#7783) * fix(lsbackup): Fix profiler in lsBackup (#7729) * Bring back "perf(Backup): Improve backup performance (#7601)" * Opt(Backup): Make backups faster (#7680) * Fix s3 backup copy (#7669) * [BREAKING] Opt(Restore): Optimize Restore's new map-reduce based design (#7666) * Perf(restore): Implement map-reduce based restore (#7664) * feat(backup): Merge backup refactoring * Revert "perf(Backup): Improve backup performance (#7601)"
mangalaman93 pushed a commit that referenced this pull request on Jan 18, 2023
This commit is a major rewrite of backup and online restore code. It used to use KVLoader in badger. Now it instead uses StreamWriter that is much faster for writes. cherry-pick PR #7753 following commits are cherry-picked (in reverse order): * opt(restore): Sort the buffer before spinning the writeToDisk goroutine (#7984) (#7996) * fix(backup): Fix full backup request (#7932) (#7933) * fix: fixing graphql schema update when the data is restored + skipping /probe/graphql from audit (#7925) * fix(restore): return nil if there is error (#7899) * Don't ban namespace in export_backup * reset the kv.StreamId before sending to stream writer (#7833) (#7837) * fix(restore): Bump uid and namespace after restore (#7790) (#7800) * fix(ee): GetKeys should return an error (#7713) (#7797) * fix(backup): Free the UidPack after use (#7786) * fix(export-backup): Fix double free in export backup (#7780) (#7783) * fix(lsbackup): Fix profiler in lsBackup (#7729) * Bring back "perf(Backup): Improve backup performance (#7601)" * Opt(Backup): Make backups faster (#7680) * Fix s3 backup copy (#7669) * [BREAKING] Opt(Restore): Optimize Restore's new map-reduce based design (#7666) * Perf(restore): Implement map-reduce based restore (#7664) * feat(backup): Merge backup refactoring * Revert "perf(Backup): Improve backup performance (#7601)"
all-seeing-code pushed a commit that referenced this pull request on Jan 23, 2023
This commit is a major rewrite of backup and online restore code. It used to use KVLoader in badger. Now it instead uses StreamWriter that is much faster for writes. cherry-pick PR #7753 following commits are cherry-picked (in reverse order): * opt(restore): Sort the buffer before spinning the writeToDisk goroutine (#7984) (#7996) * fix(backup): Fix full backup request (#7932) (#7933) * fix: fixing graphql schema update when the data is restored + skipping /probe/graphql from audit (#7925) * fix(restore): return nil if there is error (#7899) * Don't ban namespace in export_backup * reset the kv.StreamId before sending to stream writer (#7833) (#7837) * fix(restore): Bump uid and namespace after restore (#7790) (#7800) * fix(ee): GetKeys should return an error (#7713) (#7797) * fix(backup): Free the UidPack after use (#7786) * fix(export-backup): Fix double free in export backup (#7780) (#7783) * fix(lsbackup): Fix profiler in lsBackup (#7729) * Bring back "perf(Backup): Improve backup performance (#7601)" * Opt(Backup): Make backups faster (#7680) * Fix s3 backup copy (#7669) * [BREAKING] Opt(Restore): Optimize Restore's new map-reduce based design (#7666) * Perf(restore): Implement map-reduce based restore (#7664) * feat(backup): Merge backup refactoring * Revert "perf(Backup): Improve backup performance (#7601)"
all-seeing-code pushed a commit that referenced this pull request on Jan 23, 2023
This commit is a major rewrite of backup and online restore code. It used to use KVLoader in badger. Now it instead uses StreamWriter that is much faster for writes. cherry-pick PR #7753 following commits are cherry-picked (in reverse order): * opt(restore): Sort the buffer before spinning the writeToDisk goroutine (#7984) (#7996) * fix(backup): Fix full backup request (#7932) (#7933) * fix: fixing graphql schema update when the data is restored + skipping /probe/graphql from audit (#7925) * fix(restore): return nil if there is error (#7899) * Don't ban namespace in export_backup * reset the kv.StreamId before sending to stream writer (#7833) (#7837) * fix(restore): Bump uid and namespace after restore (#7790) (#7800) * fix(ee): GetKeys should return an error (#7713) (#7797) * fix(backup): Free the UidPack after use (#7786) * fix(export-backup): Fix double free in export backup (#7780) (#7783) * fix(lsbackup): Fix profiler in lsBackup (#7729) * Bring back "perf(Backup): Improve backup performance (#7601)" * Opt(Backup): Make backups faster (#7680) * Fix s3 backup copy (#7669) * [BREAKING] Opt(Restore): Optimize Restore's new map-reduce based design (#7666) * Perf(restore): Implement map-reduce based restore (#7664) * feat(backup): Merge backup refactoring * Revert "perf(Backup): Improve backup performance (#7601)"
This PR, along with the previous restore PR, is a BREAKING change. We are marking this PR as breaking because we forgot to mark the previous one.
With these changes, I see that the map phase is faster than the reduce phase, and each finishes in about 2 minutes. Map runs at 200 MBps and reduce at 130 MBps, processing 20 GB of uncompressed data in under 5 minutes.
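As a rough sanity check on these figures, the short Go sketch below works out the per-phase durations from the stated throughputs. It rests on assumptions that the description does not spell out: that MBps means megabytes per second, that 20 GB is 20,000 MB, that each phase processes the full 20 GB, and that the two phases run one after the other. The helper `phaseSeconds` is a hypothetical name used only here.

```go
package main

import "fmt"

// phaseSeconds returns how long one phase needs to process sizeMB
// megabytes at a steady rate of rateMBps megabytes per second.
func phaseSeconds(sizeMB, rateMBps float64) float64 {
	return sizeMB / rateMBps
}

func main() {
	const sizeMB = 20 * 1000 // 20 GB, assuming decimal units

	mapSec := phaseSeconds(sizeMB, 200)    // map at 200 MBps
	reduceSec := phaseSeconds(sizeMB, 130) // reduce at 130 MBps

	fmt.Printf("map: %.0fs, reduce: %.0fs, total: %.0fs\n",
		mapSec, reduceSec, mapSec+reduceSec)
}
```

Under these assumptions the map phase takes 100 seconds and the reduce phase roughly 154 seconds, about 254 seconds in total, which is consistent with the "under 5 minutes" figure.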