rahulKQL
Owner

It seems the Bigtable classic client and the veneer client behave differently for duplicated cells. A row can contain duplicated cells when the user applies Interleave filters. The behavior of each client is:

  • Classic client: Here, we check whether a cell has any labels. If it does, we include that cell in the end result. If it doesn't, we compare its timestamp and qualifier with the previous no-label cell and drop it when they match. (Labeled cells are always kept because a cell with a label could have been produced by applying filters.)
  • Veneer client: Here, we do not perform these checks. (Most other clients allow duplicate cells; this was confirmed on the Go, Node.js, and C# Bigtable clients. It is not clear why, but python-bigtable does not allow duplicate cells.)

Assumption: The labels are applied to determine which filters produced those cells, so we include all cells where labels are present.
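
The classic-client check described above could be sketched roughly as follows. This is a minimal illustration, not the actual classic client code: the class `LabelAwareDedupe`, its nested `Cell` type, and the method names are hypothetical, and the sketch assumes all cells passed in belong to a single row and column family. Only the `previousNoLabelCell` name is taken from this PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of label-aware deduplication; not the real client classes. */
final class LabelAwareDedupe {

  /** Minimal stand-in for a cell within one row and one column family (assumption). */
  static final class Cell {
    final String qualifier;
    final long timestampMicros;
    final List<String> labels;
    final String value;

    Cell(String qualifier, long timestampMicros, List<String> labels, String value) {
      this.qualifier = qualifier;
      this.timestampMicros = timestampMicros;
      this.labels = labels;
      this.value = value;
    }
  }

  // Last cell without labels that was kept; used to detect adjacent duplicates.
  private Cell previousNoLabelCell;
  private final List<Cell> result = new ArrayList<>();

  /** Adds a cell, dropping unlabeled cells that repeat the previous unlabeled cell. */
  void add(Cell cell) {
    if (!cell.labels.isEmpty()) {
      // Labeled cells are always kept: the label says which filter produced them.
      result.add(cell);
      return;
    }
    if (previousNoLabelCell != null
        && previousNoLabelCell.timestampMicros == cell.timestampMicros
        && previousNoLabelCell.qualifier.equals(cell.qualifier)) {
      // Same qualifier and timestamp as the previous unlabeled cell: a duplicate.
      return;
    }
    result.add(cell);
    previousNoLabelCell = cell;
  }

  /** Returns a copy of the cells kept so far. */
  List<Cell> cells() {
    return new ArrayList<>(result);
  }

  /** Clears all state, including the remembered unlabeled cell. */
  void reset() {
    result.clear();
    previousNoLabelCell = null;
  }
}
```

Clearing `previousNoLabelCell` in `reset()` matters in this sketch because otherwise the first unlabeled cell of the next row could be compared against a cell from the previous row and dropped by mistake.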

pmakani
pmakani previously approved these changes Apr 14, 2020
athakor
athakor previously approved these changes Apr 14, 2020
@@ -159,6 +162,9 @@ public void cellValue(ByteString newValue) {
* </ul>
*
* A flattened version of the {@link RowCell} map will be sorted correctly.
*
* <p>In case user applies {@link InterleaveFilter} than a {@link Cell} can appear more than
* once, but the duplicate cells will appear one after another.
*/

than -> then
It's not clear to me what you mean by 'cells will appear one after another'.

I would rephrase:

Applying {@link InterleaveFilter} may result in a row containing duplicated cells.

Owner Author

'cells will appear one after another' => I mean the duplicate cells will always be grouped together, one after another (in that sequence).

No 'but' here.
Maybe:

, but the duplicate cells will appear in a group i.e. one after another.

, where duplicates are grouped in sequences.

Owner Author

Thanks for catching it 🙏

@rahulKQL rahulKQL dismissed stale reviews from athakor and pmakani via 791a209 April 14, 2020 12:21
dmitry-fa
dmitry-fa previously approved these changes Apr 14, 2020

@dmitry-fa dmitry-fa left a comment

LGTM

It seems the Bigtable classic client and the veneer client behave differently for duplicated `cells`. A row can contain duplicated cells when the user applies `Interleave` filters. The behavior of each client is:
   - Classic client: Here, we check whether a `cell` has any `labels`. If it does, we include that cell in the end result. If it doesn't, we compare its `timestamp` and `qualifier` with the previous no-label cell and drop it when they match. (Labeled cells are always kept because a cell with a label could have been produced by applying filters.)
   - Veneer client: Here, we do not perform these checks. (Most other clients allow duplicate cells; this was confirmed on the `Go/NodeJs/C#` Bigtable clients. It is not clear why, but `python-bigtable` does not allow duplicate cells.)

**Assumption:** The labels are applied to determine which filters produced those cells, so we include all cells where labels are present.

chore: added javadoc to inform user about this bug

 - Added class JavaDoc and code comment for future reference.
 - reset the `previousNoLabelCell` in reset().
 - added unit test to verify dedupe logic.
4 participants