Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support BITOP command #471

Merged
merged 3 commits into from
Mar 18, 2022
Merged

Support BITOP command #471

merged 3 commits into from
Mar 18, 2022

Conversation

ColinChamber
Copy link
Contributor

No description provided.

@git-hulk git-hulk added enhancement type enhancement feature type new feature release notes labels Feb 6, 2022
git-hulk
git-hulk previously approved these changes Feb 6, 2022
Copy link
Member

@git-hulk git-hulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, thanks for your hard work! @ColinChamber

This implementation may cause performance issue when the bitmap was sparsely, but it's more clear and simple than using the seek operator. So it looks good to me. HDYT @ShooterIT

@git-hulk
Copy link
Member

git-hulk commented Feb 6, 2022

cool, thanks for your hard work! @ColinChamber

This implementation may cause performance issue when the bitmap was too sparsely, but it's more clear and simple than using the seek operator. So it looks good to me. HDYT @ShooterIT

@ColinChamber
Copy link
Contributor Author

This implementation may cause performance

I think so.
Currently, the bitop scene can't simply use the seek operator. Kvrocks use the default ordering function for key, which orders bytes lexicographically. For example, the order of the bitmap subkey after iteration is key|version|1024, key|version|10240, key|version|2048, not key|version|1024, key|version|2048, key|version|10240, which will make the operation of the same offset fragment between different keys unattainable. I think this is also the reason why bitcount and bitpos can't use the seek operator. In the future, we can do some refactoring, modify the encode and decode of the bitmap, and fill in the same number of digits. @git-hulk

@git-hulk
Copy link
Member

git-hulk commented Feb 7, 2022

@ColinChamber Yeah, the key order was easy to change, what i think was to seek out all fragments and apply the operator on those fragments without depending on the key order.

@ColinChamber
Copy link
Contributor Author

Do you mean GetString and then do the operation? That seems not a suitable method that will use too much memory and is inefficient when operating on multiple keys.

@git-hulk
Copy link
Member

git-hulk commented Feb 7, 2022

@ColinChamber RocksDB write batch would also cache those fragments on memory before writing indeed, so it should be not different.

@ColinChamber
Copy link
Contributor Author

RocksDB write batch would also cache those fragments on memory before writing indeed.

That is the memory consumed by destkey.
Seeking out all fragments will increase the memory consumed by srckey.

  • BITOP AND destkey srckey1 srckey2 srckey3 ... srckeyN
  • BITOP OR destkey srckey1 srckey2 srckey3 ... srckeyN
  • BITOP XOR destkey srckey1 srckey2 srckey3 ... srckeyN
  • BITOP NOT destkey srckey

@git-hulk
Copy link
Member

git-hulk commented Feb 7, 2022

Yeah, we can apply those fragments one by one, so the number of memory was same as current solution.

@ColinChamber
Copy link
Contributor Author

It will depend on the subkey order. If we want to seek, then iterate one by one.

@git-hulk
Copy link
Member

git-hulk commented Feb 7, 2022

Sorry that I didn't explain clearly. What I mean was below:

  1. alloc the dest bitmap with the max size
  2. seek key fragments then apply the operator and fragment into the dest bitmap fragment with the subkey(it's ok that subkey was lexical order)
  3. repeat step 2 if we have many keys

@ColinChamber
Copy link
Contributor Author

Sorry for my late understanding. Your method will improve performance, but the number of calculations will increase. I think the best way is to iterate and calculate all srckeys simultaneously, which relies on getting subkeys in index order.

Getting subkeys in index order can also improve the performance of bitcount. What is the reason we don't do that? Is it to maintain compatibility with the previous version?

@git-hulk
Copy link
Member

git-hulk commented Feb 7, 2022

For why we didn't use seek operator at bitcount command, it's a design mistake because we only focus on hot commands like getbit and setbit, and didn't carefully design the key format before implementing the bitcount.

A compatible way is to use the extra bits to flag the new key format, but it's too tricky which I don't think we should do it now. So current implementation was good to me and can refactor it if those commands performance issue was the key point for us.

@ColinChamber ColinChamber requested a review from ChrisZMF March 12, 2022 15:15
@git-hulk
Copy link
Member

I have no advice on this PR. @ShooterIT @ChrisZMF

@ChrisZMF
Copy link
Contributor

LGTM @ShooterIT

@git-hulk git-hulk removed their request for review March 18, 2022 02:04
@git-hulk git-hulk merged commit 74ebdc9 into apache:unstable Mar 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement type enhancement feature type new feature release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants