fix: remove duplicated cell when interleave filter is applied #103
@@ -159,6 +162,9 @@ public void cellValue(ByteString newValue) {
* </ul>
*
* A flattened version of the {@link RowCell} map will be sorted correctly.
*
* <p>In case user applies {@link InterleaveFilter} than a {@link Cell} can appear more than
* once, but the duplicate cells will appear one after another.
*/
than -> then
It's not clear to me what you mean by 'cells will appear one after another'.
I would rephrase:
Applying {@link InterleaveFilter} may result in a row containing duplicated cells.
'cells will appear one after another'
=> I mean the duplicate cells will always be grouped together, one after another (in that sequence).
No 'but' here.
Maybe:
, but the duplicate cells will appear in a group i.e. one after another.
, where duplicates are grouped in sequences.
Thanks for catching it 🙏
LGTM
It seems the Bigtable classic client and the veneer client behave differently for duplicated `cells`. A Row can contain duplicated cells when the user applies `Interleave` filters. The behavior of the two clients is:

- Classic client: Here, we check whether a `cell` contains any `labels`. If it does, we include that cell in the end result; if it doesn't, we compare its `timestamp` and `qualifier` with the previous no-label cell (because a cell with a label could have been produced by applying filters).
- Veneer client: Here, we do not perform these checks. (Most other clients allow duplicate cells; confirmed on the `Go/NodeJs/C#` Bigtable clients. Not sure why, but `python-bigtable` does not allow duplicate cells.)

**Assumption:** The labels are applied to determine which filters produced those cells, so we are including all cells where labels are present.

chore: added javadoc to inform user about this bug

- Added class JavaDoc and code comment for future reference.
- Reset the `previousNoLabelCell` in reset().
- Added unit test to verify dedupe logic.
9fa0895 to 84840ec
It seems the Bigtable classic client and the veneer client behave differently for duplicated `cells`. A Row can contain duplicated `cells` when the user applies `Interleave` filters. The behavior of the two clients is:

- Classic client: Here, we check whether a `cell` contains any `labels`. If it does, we include that cell in the end result; if it doesn't, we compare its `timestamp` and `qualifier` with the previous no-label cell (because a cell with a `label` could have been produced by applying filters).
- Veneer client: Here, we do not perform these checks. (Most other clients allow duplicate cells; confirmed on the `Go/NodeJs/C#` Bigtable clients. Not sure why, but `python-bigtable` does not allow duplicate cells.)

**Assumption:** The labels are applied to determine which filters produced those cells, so we are including all cells where labels are present.
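For reference, here is a minimal sketch of the classic-client check as described above. It is not the actual client code. `CellDedupeSketch`, its simplified `Cell` class, and `dedupe` are hypothetical names. Only `previousNoLabelCell` comes from the commit message. The sketch assumes all cells belong to a single column family and arrive in sorted order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; names and types are not taken from the client.
final class CellDedupeSketch {

  /** Simplified stand-in for a cell within a single column family (assumption). */
  static final class Cell {
    final String qualifier;
    final long timestamp;
    final List<String> labels;

    Cell(String qualifier, long timestamp, String... labels) {
      this.qualifier = qualifier;
      this.timestamp = timestamp;
      this.labels = Arrays.asList(labels);
    }
  }

  /** Drops an unlabeled cell that repeats the previous unlabeled cell. */
  static List<Cell> dedupe(List<Cell> cells) {
    List<Cell> result = new ArrayList<>();
    Cell previousNoLabelCell = null; // starts empty for every row
    for (Cell cell : cells) {
      if (!cell.labels.isEmpty()) {
        // Labeled cells are always kept: the label identifies the producing filter.
        result.add(cell);
        continue;
      }
      boolean duplicate =
          previousNoLabelCell != null
              && previousNoLabelCell.timestamp == cell.timestamp
              && previousNoLabelCell.qualifier.equals(cell.qualifier);
      if (!duplicate) {
        result.add(cell);
      }
      previousNoLabelCell = cell;
    }
    return result;
  }

  public static void main(String[] args) {
    List<Cell> row = Arrays.asList(
        new Cell("q1", 1000L),
        new Cell("q1", 1000L),           // duplicate, e.g. from an interleave filter
        new Cell("q1", 1000L, "branch"), // labeled, so it is kept
        new Cell("q2", 1000L));
    System.out.println(dedupe(row).size());
  }
}
```

The sketch only compares each unlabeled cell with the previous unlabeled one, so it relies on duplicates appearing one after another, which is the property the Javadoc wording in this review is trying to state.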