-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First commit for event index #67
Conversation
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughRecent updates across multiple files primarily focus on integrating an Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant AggregateFunction as aggregate.py
participant ExtractSubtreeFunction as extract_subtree.py
participant PredicateFunction as predicates.py
User->>AggregateFunction: Call aggregate_temporal_window(data)
AggregateFunction->>AggregateFunction: Process data, use and drop _EVENT_INDEX
User->>ExtractSubtreeFunction: Call extract_subtree(data)
ExtractSubtreeFunction->>ExtractSubtreeFunction: Process data with _EVENT_INDEX
User->>PredicateFunction: Call get_predicates_df()
PredicateFunction->>PredicateFunction: Create and return dataframe with _EVENT_INDEX
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- src/aces/aggregate.py (27 hunks)
- src/aces/extract_subtree.py (5 hunks)
- src/aces/predicates.py (4 hunks)
- src/aces/types.py (1 hunks)
- tests/test_e2e.py (8 hunks)
Additional context used
Ruff
tests/test_e2e.py
7-7: Module level import not at top of file (E402)
8-8: Module level import not at top of file (E402)
9-9: Module level import not at top of file (E402)
11-11: Module level import not at top of file (E402)
12-12: Module level import not at top of file (E402)
13-13: Module level import not at top of file (E402)
src/aces/types.py
305-308: Use ternary operator
end_event = self.end_event[1:] if mode == "bound_to_row" else self.end_event
instead ofif
-else
-block (SIM108)Replace
if
-else
-block withend_event = self.end_event[1:] if mode == "bound_to_row" else self.end_event
src/aces/aggregate.py
874-877: Use ternary operator
right_inclusive = False if closed in ("left", "both") else True
instead ofif
-else
-block (SIM108)Replace
if
-else
-block withright_inclusive = False if closed in ("left", "both") else True
880-883: Use ternary operator
right_inclusive = True if closed in ("right", "both") else False
instead ofif
-else
-block (SIM108)Replace
if
-else
-block withright_inclusive = True if closed in ("right", "both") else False
888-891: Use ternary operator
left_inclusive = True if closed in ("left", "both") else False
instead ofif
-else
-block (SIM108)Replace
if
-else
-block withleft_inclusive = True if closed in ("left", "both") else False
894-897: Use ternary operator
left_inclusive = False if closed in ("right", "both") else True
instead ofif
-else
-block (SIM108)Replace
if
-else
-block withleft_inclusive = False if closed in ("right", "both") else True
917-924: Combine
if
branches using logicalor
operator (SIM114)Combine
if
branches
1013-1021: Combine
if
branches using logicalor
operator (SIM114)Combine
if
branches
Additional comments not posted (9)
tests/test_e2e.py (1)
132-132
: Ensure consistency of_LAST_EVENT_INDEX
in test data.The modifications to the test data to include
_LAST_EVENT_INDEX
fields are consistent with the changes described in the AI-generated summary. This ensures that the new event indexing logic is covered by the tests.Also applies to: 143-143, 156-156, 167-167, 180-180, 191-191, 204-204, 215-215
src/aces/types.py (1)
16-16
: Review of new types and constants for event indexing.The introduction of
EVENT_INDEX_TYPE
,EVENT_INDEX_COLUMN
, andLAST_EVENT_INDEX_COLUMN
is consistent with the changes described in the AI-generated summary. These additions are crucial for supporting the new event indexing features across the application.
[APROVED]Also applies to: 26-27
src/aces/extract_subtree.py (1)
11-11
: Review of modifications toextract_subtree
function.The updates to the
extract_subtree
function to include handling of the_EVENT_INDEX
column are consistent with the changes described in the AI-generated summary. These modifications are crucial for ensuring that the function correctly processes and includes this column in the input data, which is essential for the new event indexing functionality.Also applies to: 137-137, 149-160, 248-333
src/aces/predicates.py (2)
13-14
: Updated import statements to include event indexing types.The inclusion of
EVENT_INDEX_COLUMN
andEVENT_INDEX_TYPE
aligns with the changes outlined in the summary, ensuring the necessary types are available for handling the new_EVENT_INDEX
column. This is crucial for maintaining consistency across modules that will interact with event indexes.
Line range hint
490-520
: Introduction of_EVENT_INDEX
in DataFrame output.The addition of
_EVENT_INDEX
in the DataFrame output ofget_predicates_df
function is executed correctly. The column is computed and added in lines 642-645, ensuring every event has a unique index within its subject group. This is crucial for tracking events in sequence analysis or temporal comparisons.However, ensure that all downstream processes that consume this DataFrame are aware of and correctly handle the new
_EVENT_INDEX
column.src/aces/aggregate.py (4)
7-12
: Review of imports and constants:The import of
EVENT_INDEX_COLUMN
aligns with the PR's objective to handle an event index column. This addition appears necessary and correct for the functions modified in this file.
87-87
: Ensure correct handling of_EVENT_INDEX
column:The repeated use of
.drop("_EVENT_INDEX")
in different configurations of theaggregate_temporal_window
function is consistent with the need to remove this column post-aggregation. This change is in line with the PR's objectives and seems correctly implemented across different scenarios.Also applies to: 102-102, 117-117, 132-132, 147-147, 162-162, 181-181, 196-196
314-314
: Review of_EVENT_INDEX
handling inaggregate_event_bound_window
:The introduction and subsequent dropping of
_EVENT_index
withinaggregate_event_bound_window
function is consistent with the handling in other functions. This ensures that the event index does not propagate beyond its necessary scope within the data pipeline.Also applies to: 319-319, 336-336, 337-337, 353-353, 354-354, 370-370, 371-371
Line range hint
214-237
: Complex handling of conditions and_EVENT_INDEX
inboolean_expr_bound_sum
:The function
boolean_expr_bound_sum
has been modified to incorporate the_EVENT_index
in its operations, reflecting the broader PR objectives. The handling of this column and the complex conditional logic used for boundary calculations appear correctly implemented, maintaining the integrity of the data transformation logic.Also applies to: 512-512, 693-693, 715-715, 737-737, 759-759, 781-781, 803-803, 825-825, 847-847
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Outside diff range, codebase verification and nitpick comments (1)
src/aces/extract_subtree.py (1)
147-149
: Update the example output shape.The shape of the example output DataFrame has changed from
(5, 2)
to(5, 3)
due to the inclusion of the_EVENT_INDEX
column. Ensure that all example outputs are updated to reflect this change.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- src/aces/main.py (1 hunks)
- src/aces/extract_subtree.py (5 hunks)
- src/aces/predicates.py (5 hunks)
Additional comments not posted (7)
src/aces/__main__.py (1)
49-49
: Verify the DataFrame renaming method.The change in the
rename
method from using thecolumns
keyword to directly passing a dictionary could potentially cause issues if the DataFrame does not handle this method signature as expected. Ensure that the DataFrame correctly interprets this method signature.src/aces/extract_subtree.py (4)
11-11
: Ensure consistency in column handling.The addition of the
EVENT_INDEX_COLUMN
import indicates that the_EVENT_INDEX
column is now being handled. Ensure that this column is consistently handled throughout the function.
137-137
: Verify the inclusion of_EVENT_INDEX
in the DataFrame.The addition of the
_EVENT_INDEX
column in the example DataFrame indicates that this column is now part of the data processing. Ensure that this column is correctly handled in all relevant DataFrame operations.
248-250
: Exclude_EVENT_INDEX
frompredicate_cols
.The exclusion of the
_EVENT_INDEX
column frompredicate_cols
is necessary to ensure that this column is not treated as a predicate. This change is correct and should be maintained.
333-333
: Include_EVENT_INDEX
in the summary.The inclusion of the
_EVENT_INDEX
column in the summary DataFrame ensures that this column is part of the output. This change is correct and should be maintained.src/aces/predicates.py (2)
13-14
: Ensure consistency in column handling.The addition of the
EVENT_INDEX_COLUMN
andEVENT_INDEX_TYPE
imports indicates that the_EVENT_INDEX
column is now being handled. Ensure that these constants are consistently used throughout the function.
490-500
: Verify the inclusion of_EVENT_INDEX
in the DataFrame.The addition of the
_EVENT_INDEX
column in the example DataFrame indicates that this column is now part of the data processing. Ensure that this column is correctly handled in all relevant DataFrame operations.
... }) | ||
>>> aggregate_event_bound_window(df, ToEventWindowBounds(True, "is_C", True, None)) | ||
>>> aggregate_event_bound_window(df, ToEventWindowBounds(True, "is_C", True, None)).drop("timestamp") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep the "timestamp" column in these tests, and just set the pl.Config
params to print wider dataframes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I did this more to satistfy the pre-commit rules of lines not extending past 120 characters - should doctests still follow these rules? What's the best way to go around this?
I think more generally here, each aggregate function should return:
Note that the distinction between "the window existing" and "events existing" is as follows:
|
@justin13601, as we'll discuss more shortly, I'd actually suggest de-prioritizing this in favor of other things, and closing the PR for now as a result (with plans to re-visit later). Does that sound reasonable to you? If not, feel free to re-open. |
For #37
Summary by CodeRabbit
New Features
_EVENT_INDEX
column in various aggregation and predicate functions.Improvements
Tests
_LAST_EVENT_INDEX
fields.