limits tablets and offers bulk import as option for ingest #287
Merged
Conversation
Two new continuous ingest features are introduced in this change. First, options were added to limit the number of tablets written. Second, an option was added to use bulk ingest instead of a batch writer. These features support running a test like the following.

* create a continuous ingest table with 1000 tablets
* start 100 continuous ingest clients
* have each client continually bulk import data to 10 random tablets

This test situation will create a lot of bulk import and subsequent compaction activity for Accumulo to handle.

These changes add bulk import to the `cingest ingest` command. There is an existing `cingest bulk` command that runs a map reduce job to create bulk files. These changes do not remove the need for the existing map reduce job; they fill a different purpose. The map reduce job can generate a really large amount of data to bulk import, while these changes allow generating lots of bulk imports w/ small amounts of data. These changes could never generate the amount of data for a single bulk import that the map reduce job could. The following is an example of a test scenario that could use both.

* create a continuous ingest table with 1000 tablets
* use the map reduce bulk job to create an initial 10 billion entries in the table
* start 100 continuous ingest clients
* have each client continually bulk import data to 10 random tablets
* stop the clients after 12 hours and verify the data
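A minimal sketch of the bulk-import-into-random-tablets idea, using only the public Accumulo client API rather than the actual accumulo-testing code in this PR; the instance name, credentials, work directory, table name, and key layout are placeholder assumptions:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.rfile.RFile;
import org.apache.accumulo.core.client.rfile.RFileWriter;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class BulkToRandomTablets {
  public static void main(String[] args) throws Exception {
    // Placeholder table name and work directory; adjust for your cluster.
    String table = "ci";
    String workDir = "hdfs://namenode:8020/tmp/ci-bulk-0001";

    try (AccumuloClient client = Accumulo.newClient()
        .to("myInstance", "zookeeper:2181").as("root", "secret").build()) {

      // The table's split points define its tablets. Pick a few at random
      // to limit how many tablets a single bulk import touches.
      List<Text> splits = new ArrayList<>(client.tableOperations().listSplits(table));
      Collections.shuffle(splits);
      List<Text> chosen = splits.subList(0, Math.min(3, splits.size()));

      // Write one RFile per chosen tablet. Keys in an RFile must be appended
      // in sorted order; here each file gets a single key whose row equals
      // the tablet's split point, so the file lands in that tablet.
      int fileNum = 0;
      for (Text split : chosen) {
        try (RFileWriter writer =
            RFile.newWriter().to(workDir + "/f" + fileNum++ + ".rf").build()) {
          writer.append(new Key(split, new Text("cf"), new Text("cq")),
              new Value("value".getBytes()));
        }
      }

      // Bulk import everything in the work directory in one operation.
      client.tableOperations().importDirectory(workDir).to(table).load();
    }
  }
}
```

The real ingest client would loop over this pattern with a fresh work directory per pass, since a bulk import consumes the files it is given.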
The following is example output from running these changes, continually bulk importing into 3 random tablets on a table with 20 tablets. Around every 6 seconds it bulk imports 3 files to 3 tablets w/ one million total key values.
I have run the bulk ingest and the live ingest into the same table and then successfully ran the verify map reduce job.
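For completeness, the first step in the scenarios above is a pre-split table (20 tablets in the example run, 1000 in the larger scenario). A minimal sketch of pre-splitting such a table, again through the public Accumulo API with placeholder connection settings and assuming 16-character hex-encoded rows:

```java
import java.util.TreeSet;

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.hadoop.io.Text;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    String table = "ci";   // placeholder table name
    int numTablets = 20;   // 20 tablets -> 19 split points

    try (AccumuloClient client = Accumulo.newClient()
        .to("myInstance", "zookeeper:2181").as("root", "secret").build()) {

      if (!client.tableOperations().exists(table)) {
        client.tableOperations().create(table);
      }

      // Spread split points evenly across the row space. This assumes rows
      // are 16-character lower-case hex strings; adjust to your key design.
      TreeSet<Text> splits = new TreeSet<>();
      for (int i = 1; i < numTablets; i++) {
        long split = (Long.MAX_VALUE / numTablets) * i;
        splits.add(new Text(String.format("%016x", split)));
      }

      client.tableOperations().addSplits(table, splits);
    }
  }
}
```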
ddanielr approved these changes Nov 18, 2024