Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add changelog operator #578

Merged
merged 5 commits into from
Oct 15, 2024
Merged

Add changelog operator #578

merged 5 commits into from
Oct 15, 2024

Conversation

satrana42
Copy link
Contributor

@satrana42 satrana42 commented Oct 7, 2024

In this change we add changelog operator which converts a deltaframe into an append only
changelog, with keys and kinds stored in value columns.


Important

Add changelog operator to convert deltaframe into append-only changelog with operation types stored in kind_column.

  • Changelog Operator:
    • Added changelog operator in datasets.py to convert deltaframe into append-only changelog.
    • Uses kind_column to store operation type (insert/delete).
  • Testing:
    • Added test_changelog_operator in test_dataset.py to validate changelog functionality.
  • Code Changes:
    • Updated visitChangelog in serializer.py and executor.py to handle kind_column.
    • Modified Changelog class in dataset_pb2.py and dataset_pb2.pyi to use kind_column.
  • Miscellaneous:
    • Updated version to 1.5.40 in pyproject.toml.

This description was created by Ellipsis for b2015e6. It will automatically update as commits are pushed.

@satrana42 satrana42 self-assigned this Oct 7, 2024
Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to 7907254 in 19 seconds

More details
  • Looked at 383 lines of code in 10 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. fennel/client_tests/test_dataset.py:4629
  • Draft comment:
    The test case for the changelog operator should include more scenarios to ensure comprehensive testing. Consider adding cases for different kinds of operations (e.g., multiple inserts, deletes, updates) and edge cases.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The changelog operator is being added, and the test case is updated to reflect this change. However, the test case is not comprehensive enough to cover all scenarios.

Workflow ID: wflow_bLY5u3OlVDRIV0c5


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

openings = pd.DataFrame(
{
"creater_id": [1, 2, 1, 1, 1, 1],
"job_id": [1, 2, 3, 4, 5, 6],
"creation_ts": creation_ts,
}
)
log(JobOpening, openings)
client.log("fennel_webhook", "JobOpening", openings)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making kind a key is a wierd thing tbh, I was expecting creater_id and maybe join_id to be the key.
In one test you do

.latest().changelog() and assert that the chanelog produce the correct inserts and delete ( assert against client.get_dataset_df )

and another test

where we take dataset A

then derive B from A which does .groupby.latest().changelog.filter(filter deletes)groupby.latest() and we should get A back

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made kind a key because -
a) get_datatset_df is not defined for integration client, so couldn't test on the whole output of changelog
b) latest only picks one row for a timestamp randomly (atleast in server code), and we have inserts and deletes at the same timestamp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latest only picks one row for a timestamp randomly (atleast in server code), and we have inserts and deletes at the same timestamp.
that is expected behavior, when you look up you should see the insert.

Anycase, i guess this test gets the job done, so marking it resolved

Comment on lines 957 to 958
df[obj.kind_col] = "insert"
delete_df[obj.kind_col] = "delete"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets convert these to bools

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 71b88be in 24 seconds

More details
  • Looked at 319 lines of code in 6 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. fennel/testing/executor.py:957
  • Draft comment:
    The logic for setting df[obj.kind_column] and delete_df[obj.kind_column] can be simplified by directly using obj.deletes without the conditional check. This will make the code cleaner and more readable.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The changelog operator implementation in executor.py and serializer.py uses kind_column and deletes attributes, which are correctly aligned with the protobuf changes. However, the executor.py file has a redundant check for obj.deletes when setting the kind_column values. This can be simplified.

Workflow ID: wflow_5wNhszGtlRrOp5kU


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 373f171 in 1 minute and 3 seconds

More details
  • Looked at 2913 lines of code in 23 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. fennel/gen/connector_pb2.py:16
  • Draft comment:
    Ensure that the import statement for expression_pb2 is consistent across all files. It was changed from expr_pb2 to expression_pb2.
  • Reason this comment was not posted:
    Comment did not seem useful.

Workflow ID: wflow_wxI9y8eEjqQDiKJR


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 12dee17 in 1 minute and 1 seconds

More details
  • Looked at 220 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. fennel/datasets/datasets.py:374
  • Draft comment:
    Consider updating the docstring or comments for the changelog method to reflect the new usage with keyword arguments for delete and insert columns.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The changelog method in datasets.py has been updated to accept keyword arguments for delete and insert columns. However, the docstring or comments in the method do not reflect this change. It's important to update the documentation to reflect the new usage pattern.
2. fennel/datasets/datasets.py:581
  • Draft comment:
    Consider updating the docstring or comments for the Changelog class to reflect the new usage with keyword arguments for delete and insert columns.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The changelog method in datasets.py has been updated to accept keyword arguments for delete and insert columns. However, the docstring or comments in the method do not reflect this change. It's important to update the documentation to reflect the new usage pattern.
3. fennel/internal_lib/to_proto/serializer.py:386
  • Draft comment:
    Consider updating the docstring or comments for the visitChangelog method to reflect the new usage with keyword arguments for delete and insert columns.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The changelog method in datasets.py has been updated to accept keyword arguments for delete and insert columns. However, the docstring or comments in the method do not reflect this change. It's important to update the documentation to reflect the new usage pattern.
4. fennel/testing/executor.py:954
  • Draft comment:
    Consider updating the docstring or comments for the visitChangelog method to reflect the new usage with keyword arguments for delete and insert columns.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The changelog method in datasets.py has been updated to accept keyword arguments for delete and insert columns. However, the docstring or comments in the method do not reflect this change. It's important to update the documentation to reflect the new usage pattern.

Workflow ID: wflow_4LGiaYYtPhnBvUZd


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 5b6aa8e in 37 seconds

More details
  • Looked at 19 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. fennel/client_tests/test_dataset.py:4692
  • Draft comment:
    The removal of the workaround code related to creation_ts_expected and is_deletes is appropriate if the client behavior has been updated to handle this case correctly. Ensure that the integration client no longer requires this adjustment.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The removed code seems to be a workaround for a specific client condition. The removal is justified if the condition no longer applies or if the client behavior has been updated.

Workflow ID: wflow_HBY7aNqogcto1MGf


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 812f944 in 28 seconds

More details
  • Looked at 27 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. fennel/datasets/datasets.py:598
  • Draft comment:
    Ensure self.kind_column is not already in val_fields to avoid overwriting existing data.
  • Reason this comment was not posted:
    Comment did not seem useful.

Workflow ID: wflow_0Cqa5caWsPwfXq6t


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on b2015e6 in 32 seconds

More details
  • Looked at 34 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. fennel/internal_lib/to_proto/serializer.py:389
  • Draft comment:
    Ensure kind_column is serialized as a string, not directly as an object attribute.
  • Reason this comment was not posted:
    Comment did not seem useful.

Workflow ID: wflow_t6qeRALY6QoOe1vF


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@satrana42 satrana42 enabled auto-merge (squash) October 15, 2024 17:18
@satrana42 satrana42 merged commit 44456e0 into main Oct 15, 2024
8 checks passed
@satrana42 satrana42 deleted the sat/changelog branch October 15, 2024 17:22
satrana42 added a commit that referenced this pull request Oct 18, 2024
In this change, we update or add documentation for operator
changes done in #553, #556, #535, #560 and #578.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants