-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add changelog operator #578
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to 7907254 in 19 seconds
More details
- Looked at
383
lines of code in10
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. fennel/client_tests/test_dataset.py:4629
- Draft comment:
The test case for the changelog operator should include more scenarios to ensure comprehensive testing. Consider adding cases for different kinds of operations (e.g., multiple inserts, deletes, updates) and edge cases. - Reason this comment was not posted:
Confidence changes required:50%
The changelog operator is being added, and the test case is updated to reflect this change. However, the test case is not comprehensive enough to cover all scenarios.
Workflow ID: wflow_bLY5u3OlVDRIV0c5
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
openings = pd.DataFrame( | ||
{ | ||
"creater_id": [1, 2, 1, 1, 1, 1], | ||
"job_id": [1, 2, 3, 4, 5, 6], | ||
"creation_ts": creation_ts, | ||
} | ||
) | ||
log(JobOpening, openings) | ||
client.log("fennel_webhook", "JobOpening", openings) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making kind a key is a wierd thing tbh, I was expecting creater_id and maybe join_id to be the key.
In one test you do
.latest().changelog() and assert that the chanelog produce the correct inserts and delete ( assert against client.get_dataset_df )
and another test
where we take dataset A
then derive B from A which does .groupby.latest().changelog.filter(filter deletes)groupby.latest() and we should get A back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made kind
a key because -
a) get_datatset_df
is not defined for integration client, so couldn't test on the whole output of changelog
b) latest
only picks one row for a timestamp randomly (atleast in server code), and we have inserts and deletes at the same timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
latest only picks one row for a timestamp randomly (atleast in server code), and we have inserts and deletes at the same timestamp.
that is expected behavior, when you look up you should see the insert.
Anycase, i guess this test gets the job done, so marking it resolved
fennel/testing/executor.py
Outdated
df[obj.kind_col] = "insert" | ||
delete_df[obj.kind_col] = "delete" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets convert these to bools
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 71b88be in 24 seconds
More details
- Looked at
319
lines of code in6
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. fennel/testing/executor.py:957
- Draft comment:
The logic for settingdf[obj.kind_column]
anddelete_df[obj.kind_column]
can be simplified by directly usingobj.deletes
without the conditional check. This will make the code cleaner and more readable. - Reason this comment was not posted:
Confidence changes required:50%
The changelog operator implementation inexecutor.py
andserializer.py
useskind_column
anddeletes
attributes, which are correctly aligned with the protobuf changes. However, theexecutor.py
file has a redundant check forobj.deletes
when setting thekind_column
values. This can be simplified.
Workflow ID: wflow_5wNhszGtlRrOp5kU
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 373f171 in 1 minute and 3 seconds
More details
- Looked at
2913
lines of code in23
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. fennel/gen/connector_pb2.py:16
- Draft comment:
Ensure that the import statement forexpression_pb2
is consistent across all files. It was changed fromexpr_pb2
toexpression_pb2
. - Reason this comment was not posted:
Comment did not seem useful.
Workflow ID: wflow_wxI9y8eEjqQDiKJR
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
373f171
to
12dee17
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 12dee17 in 1 minute and 1 seconds
More details
- Looked at
220
lines of code in4
files - Skipped
0
files when reviewing. - Skipped posting
4
drafted comments based on config settings.
1. fennel/datasets/datasets.py:374
- Draft comment:
Consider updating the docstring or comments for thechangelog
method to reflect the new usage with keyword arguments fordelete
andinsert
columns. - Reason this comment was not posted:
Confidence changes required:50%
The changelog method in datasets.py has been updated to accept keyword arguments for delete and insert columns. However, the docstring or comments in the method do not reflect this change. It's important to update the documentation to reflect the new usage pattern.
2. fennel/datasets/datasets.py:581
- Draft comment:
Consider updating the docstring or comments for theChangelog
class to reflect the new usage with keyword arguments fordelete
andinsert
columns. - Reason this comment was not posted:
Confidence changes required:50%
The changelog method in datasets.py has been updated to accept keyword arguments for delete and insert columns. However, the docstring or comments in the method do not reflect this change. It's important to update the documentation to reflect the new usage pattern.
3. fennel/internal_lib/to_proto/serializer.py:386
- Draft comment:
Consider updating the docstring or comments for thevisitChangelog
method to reflect the new usage with keyword arguments fordelete
andinsert
columns. - Reason this comment was not posted:
Confidence changes required:50%
The changelog method in datasets.py has been updated to accept keyword arguments for delete and insert columns. However, the docstring or comments in the method do not reflect this change. It's important to update the documentation to reflect the new usage pattern.
4. fennel/testing/executor.py:954
- Draft comment:
Consider updating the docstring or comments for thevisitChangelog
method to reflect the new usage with keyword arguments fordelete
andinsert
columns. - Reason this comment was not posted:
Confidence changes required:50%
The changelog method in datasets.py has been updated to accept keyword arguments for delete and insert columns. However, the docstring or comments in the method do not reflect this change. It's important to update the documentation to reflect the new usage pattern.
Workflow ID: wflow_4LGiaYYtPhnBvUZd
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 5b6aa8e in 37 seconds
More details
- Looked at
19
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. fennel/client_tests/test_dataset.py:4692
- Draft comment:
The removal of the workaround code related tocreation_ts_expected
andis_deletes
is appropriate if the client behavior has been updated to handle this case correctly. Ensure that the integration client no longer requires this adjustment. - Reason this comment was not posted:
Confidence changes required:50%
The removed code seems to be a workaround for a specific client condition. The removal is justified if the condition no longer applies or if the client behavior has been updated.
Workflow ID: wflow_HBY7aNqogcto1MGf
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 812f944 in 28 seconds
More details
- Looked at
27
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. fennel/datasets/datasets.py:598
- Draft comment:
Ensureself.kind_column
is not already inval_fields
to avoid overwriting existing data. - Reason this comment was not posted:
Comment did not seem useful.
Workflow ID: wflow_0Cqa5caWsPwfXq6t
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on b2015e6 in 32 seconds
More details
- Looked at
34
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. fennel/internal_lib/to_proto/serializer.py:389
- Draft comment:
Ensurekind_column
is serialized as a string, not directly as an object attribute. - Reason this comment was not posted:
Comment did not seem useful.
Workflow ID: wflow_t6qeRALY6QoOe1vF
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
In this change we add changelog operator which converts a deltaframe into an append only
changelog, with keys and kinds stored in value columns.
Important
Add changelog operator to convert deltaframe into append-only changelog with operation types stored in
kind_column
.changelog
operator indatasets.py
to convert deltaframe into append-only changelog.kind_column
to store operation type (insert/delete).test_changelog_operator
intest_dataset.py
to validate changelog functionality.visitChangelog
inserializer.py
andexecutor.py
to handlekind_column
.Changelog
class indataset_pb2.py
anddataset_pb2.pyi
to usekind_column
.pyproject.toml
.This description was created by for b2015e6. It will automatically update as commits are pushed.