Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Operator tp.where() #265

Merged
merged 15 commits into from
Sep 28, 2023
Merged

Operator tp.where() #265

merged 15 commits into from
Sep 28, 2023

Conversation

DonBraulio
Copy link
Collaborator

@DonBraulio DonBraulio commented Sep 20, 2023

Also, changes in normalize_features:

  • Fixed a bug in which array features of type np.str_ (unicode strings) were not casted to np.bytes_
  • Added a message if conversion fails due to invalid unicode characters (also for np.object_ ).
  • np.datetime gets converted to float64 instead of int64, to keep it consistent with timestamps handling.

@ianspektor
Copy link
Collaborator

Changelog 🔔

Copy link
Collaborator

@achoum achoum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome

### Improvements

- Print `EventSet` timestamps as datetimes instead of float.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of those improvements are in fact "features".
A feature add a new functionality. An improvement does not: For example, increasing the speed of something is an improvement.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that this isn't adding a new functionality, rather improving an existing one (user was already able to print an eventset, this just makes it better). The sampling support on the other hand I would consider a feature.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are minor changes that don't enable new use cases (the arg. for cumsum was already available through moving_sum, since it's an alias).
I'd keep the "Features" section for "new things you can do with the library". Does it make sense?

temporian/core/data/dtype.py Outdated Show resolved Hide resolved
temporian/core/event_set_ops.py Outdated Show resolved Hide resolved
pb.OperatorDef.Attribute(
key="on_true",
type=pb.OperatorDef.Attribute.Type.ANY,
is_optional=True,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are both on_true and on_false required?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I don't think it's obvious what value to use by default if some of them is not provided.

temporian/core/operators/where.py Outdated Show resolved Hide resolved
temporian/implementation/numpy/data/dtype_normalization.py Outdated Show resolved Hide resolved
temporian/implementation/numpy/data/dtype_normalization.py Outdated Show resolved Hide resolved
### Improvements

- Print `EventSet` timestamps as datetimes instead of float.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that this isn't adding a new functionality, rather improving an existing one (user was already able to print an eventset, this just makes it better). The sampling support on the other hand I would consider a feature.

temporian/core/event_set_ops.py Outdated Show resolved Hide resolved

def where(
self: EventSetOrNode,
on_true: Union[EventSetOrNode, Any],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we create an alias for Union[int, float, str, bool]? we have T_SCALAR defined in event_set_ops which is only (int, float), but int that same file in __ne__'s def we join it with (bool, str) too. Maybe this definition could improve the signature's legibility

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this. T_SCALAR is actually a tuple, we're not using it in typehints.

This would make sense if we define types in typing.py like PythonValue and PythonScalarValue and use it consistently in the event_set_ops types.
But I think we should discuss it previously, I would not add it in this PR.

temporian/core/operators/where.py Outdated Show resolved Hide resolved
@@ -68,6 +68,7 @@ def __init__(
"`window_length` must have exactly one float64 feature."
)
self.add_input("window_length", window_length)
self._window_length = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch :) and great example of the problems associated with having tons of tests on the implementation and not so tons of them on the core op

Copy link
Collaborator

@achoum achoum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome

@DonBraulio DonBraulio merged commit cb44592 into main Sep 28, 2023
16 checks passed
@DonBraulio DonBraulio deleted the op-where branch September 28, 2023 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants