Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document bug in bulk batch writer and rename the class to indicate its buggy #288

Merged
merged 3 commits into from
Dec 15, 2024

Conversation

keith-turner
Copy link
Contributor

BulkBatchWriter was using Arrays.compare to compare Accumulo rows which is incorrect because it compares bytes as signed integers. Need to compare the bytes as unsigned integers. Updated the code to correctly compare. This bug would not cause data to go to the wrong tablet it could only cause needless creation of many files for a single tablet.

BulkBatchWriter was using Arrays.compare to compare Accumulo rows which
is incorrect because it compares bytes as signed integers.  Need to
compare the bytes as unsigned integers.  Updated the code to correctly
compare.  This bug would not cause data to go to the wrong tablet it
could only cause needless creation of many files for a single tablet.
@keith-turner
Copy link
Contributor Author

This bug has actually been really useful for testing as it creates bulk imports w/ lots of files for a few tablets.

@ctubbsii ctubbsii deleted the branch apache:main December 10, 2024 20:23
@ctubbsii ctubbsii closed this Dec 10, 2024
@ctubbsii ctubbsii reopened this Dec 10, 2024
@ctubbsii ctubbsii changed the base branch from elasticity to main December 10, 2024 20:27
@ctubbsii
Copy link
Member

This bug has actually been really useful for testing as it creates bulk imports w/ lots of files for a few tablets.

But you're still planning on merging it?

(Note: there's a conflict in this PR now, because it touched some of the same files as #289, which was merged)

@keith-turner
Copy link
Contributor Author

But you're still planning on merging it?

Thought about making the current behavior configurable OR just leaving the current behavior and adding a comment about how its incorrect and was left as is because its useful for testing. Would not want someone to copy this broken code and use it elsewhere.

@ctubbsii
Copy link
Member

I think the rename is sufficient to avoid the risk of somebody using it as an example. If you rename that and push it to the relevant branch(es), is there still something to keep from this PR or should it be closed?

@keith-turner
Copy link
Contributor Author

I think the rename is sufficient to avoid the risk of somebody using it as an example. If you rename that and push it to the relevant branch(es), is there still something to keep from this PR or should it be closed?

The changes in this PR no longer fix the bug they only rename the class to FlakyBulkBatchWriter and document the bug in the class. I will update the title of the PR

@keith-turner keith-turner changed the title fixes bug in bulk batch writer Document bug in bulk batch writer and rename the class to indicate its buggy Dec 15, 2024
@keith-turner keith-turner merged commit 807b62c into apache:main Dec 15, 2024
1 check passed
@keith-turner keith-turner deleted the bulk-batchwriter-fix branch December 15, 2024 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants