-
Notifications
You must be signed in to change notification settings - Fork 631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add runtime validation in setAttribute #348
Add runtime validation in setAttribute #348
Conversation
Add lists as an accepted data type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution and apologies for the late reply. Most of the team is out in an end of year break, but I'll look at this tomorrow 👍
No problem at all. I know it's a slow time of year. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good in general, some changes requested 👍
Codecov Report
@@ Coverage Diff @@
## master #348 +/- ##
==========================================
+ Coverage 84.99% 85.13% +0.14%
==========================================
Files 38 38
Lines 1859 1911 +52
Branches 224 225 +1
==========================================
+ Hits 1580 1627 +47
- Misses 214 219 +5
Partials 65 65
Continue to review full report at Codecov.
|
Implemented your suggested changes. I am now just waiting on your feedback with regards to where the validation logic should live. I just made it a private method for now. However, considering that there aren't any other custom private methods in the file, I am assuming it belongs elsewhere. Thanks for your feedback so far. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good, just a minor request to use is not
.
Add lists as an accepted data type
Looking good! Please request your Linux Foundation CLA permissions so that I can approve. 👍 |
I signed it |
@ocelotl All checks are now passing. Thanks for the guidance on all this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few code conciseness and naming comments that I think we should resolve before merging this in.
Thanks for working on this! The tests and code overall looks good.
@@ -208,8 +209,38 @@ def set_attribute(self, key: str, value: types.AttributeValue) -> None: | |||
if has_ended: | |||
logger.warning("Setting attribute on ended span.") | |||
return | |||
|
|||
if not isinstance(value, (int, float, bool, str, list, tuple)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this int, float be consolidated into Number here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I use Number
rather than int, float
here, I will need to change it to:
if not isinstance(value, (bool, str, list, tuple)) and not issubclass(type(value), Number):
I think just using int, float
here is more concise, but I can understand the appeal of being consistent with what is done in _check_attribute_value_sequence
. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isinstance seems to work fine:
$ python -c "from numbers import Number; print(isinstance(1, (bool, Number)))"
True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, please consider using collections.abc.Sequence
instead of list, tuple
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if using Number
here, also use it in the AttributeValue
type for consistency (and note that this allows Decimal
and Fraction
numbers too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toumorokoshi I'm not sure why my tests were failing after originally making this change. Just tried this again and it works.
|
||
for element in sequence: | ||
|
||
if not isinstance(element, (bool, str, Number)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this tuple be moved into a constant in the module? This set should be shared with the tuple on line 213 (could modify line 213 to be that list + list, tuple:
VALID_ATTRIBUTE_TYPES + (list, tuple)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this should be a constant. But do you think it might be misleading to have the constant VALID_ATTRIBUTE_TYPES
without list, tuple
included, since they are actually valid?
Maybe we could define VALID_ATTRIBUTE_SEQUENCE_TYPES
and VALID_ATTRIBUTE_NON_SEQUENCE_TYPES
, then define VALID_ATTRIBUTE_TYPES
as the union of the two. What do you think?
if return_code is not None: | ||
logger.warning("%s in attribute value sequence", return_code) | ||
return | ||
|
||
self.attributes[key] = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a potential edge case where AttributeValues that are lists can be mutated afterward, resulting in invalid types that exporters will run into.
I've added a followup ticket on that here: #352
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great catch. Would adding a copy of the list rather than the original list resolve this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could store the copies of the sequence values in tuples. Now that you mention this, instead of accepting lists or tuples we should accept sequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think that's a good solution. Would be good to make a separate PR for that.
…m:jakemalachowski/opentelemetry-python into ISSUE-347/attribue-value-type-enforcement
Co-Authored-By: Yusuke Tsutsumi <tsutsumi.yusuke@gmail.com>
…try-python into ISSUE-347/attribue-value-type-enforcement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good! One last reply that I think makes my suggestion possible, but let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, please split the change of the AttributeValue definition into a separate PR (I would "request changes" for this, but seeing as I have far too little time for OpenTelemetry-Python work recently, I don't want to hold up things).
Second, I am worried that the trade offs for this feature could be wrong. The benefit of this PR is that if the API is used wrongly, the user now gets a log message right at the point of the mistake instead of a (caught) exception later in the exporter, and the span can still be processed correctly (except for the wrong attribute).
However, for users which use the API correctly (maybe they even use mypy to check for these kinds of mistakes), we slow down setting attributes (a very common operation, probably the most-common even in the whole OpenTelemetry-Python API).
Change typing to prevent heterogeneous types in lists
@Oberon00 in response to your performance concerns: You bring up a valid point. Is it specifically the list value validation that you see as a performance concern or do you think the I'm okay with implementing this either way but it would be great to get some additional input on what's best here. |
hey hey @Oberon00 I'm also thinking about this. I'm slightly in favor of not doing this validation, and letting exporters handle this. Of course, there's no perfect solution here, but a matter of trade offs. In any case, this is a choice that maintainers will have to take ;) (And of course, the code can be later updated if/as needed). |
@carlosalberto @Oberon00 to address performance concerns: I do agree that adding this check will lead to slightly more expensive span creation, at the gain of giving helpful feedback to the user if span attributes are not valid, and removing the concern of type correctness from the exporters. It's very well possible that, in the future, we may need to remove of pare down this code. But I think this is a very easy check to remove after the fact, but a hard one to add. And real life use cases will probably better inform whether this will be a performance bottleneck. from a rudimentary timeit benchmark, checking isinstance is roughly a 188ns operation:
Let me know if you're strongly against it, and we can discuss further. I'll leave this PR up for another day to allow more discussion, but I'm currently inclined to merge (also since @jakemalachowski put the effort in to author it). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm only very slightly against the checks. 😃
And the argument that we can remove it later is actually a good one. For example if we decide that we want to allow exporters to support arbitrary value types, we can still do that later by removing the checks if they are in place. If we added the checks later, on the other hand, that would be a breaking change.
@@ -208,8 +209,36 @@ def set_attribute(self, key: str, value: types.AttributeValue) -> None: | |||
if has_ended: | |||
logger.warning("Setting attribute on ended span.") | |||
return | |||
|
|||
if not isinstance(value, (bool, str, Number, Sequence)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this check is more lenient than the AttributeValue
definition (Number vs (int, float)). Also note that it seems checking for isinstance(x, ABC-type) is more expensive:
In [5]: %timeit isinstance(3, (float, Number))
319 ns ± 6.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [6]: %timeit isinstance(3, (float, int))
110 ns ± 2.79 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really interesting. Good optimization to keep in mind if this becomes a bottleneck in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a technical perspective, I think this PR is fine now (apart from nit https://github.com/open-telemetry/opentelemetry-python/pull/348/files#r367960226). 👌
Great, thanks! Congrats on your first merged PR! |
4fca8c9 ("Add runtime validation in setAttribute (open-telemetry#348)") added a robust attribute validation using numbers.Number to validate numeric types. Although the approach is correct, it presents some complications because Complex, Fraction and Decimal are accepted because they are Numbers. This presents a problem to the exporters because they will have to consider all these different cases when converting attributes to the underlying exporter representation. This commit simplifies the logic by accepting only int and float as numeric values.
4fca8c9 ("Add runtime validation in setAttribute (open-telemetry#348)") added a robust attribute validation using numbers.Number to validate numeric types. Although the approach is correct, it presents some complications because Complex, Fraction and Decimal are accepted because they are Numbers. This presents a problem to the exporters because they will have to consider all these different cases when converting attributes to the underlying exporter representation. This commit simplifies the logic by accepting only int and float as numeric values.
4fca8c9 ("Add runtime validation in setAttribute (#348)") added a robust attribute validation using numbers.Number to validate numeric types. Although the approach is correct, it presents some complications because Complex, Fraction and Decimal are accepted because they are Numbers. This presents a problem to the exporters because they will have to consider all these different cases when converting attributes to the underlying exporter representation. This commit simplifies the logic by accepting only int and float as numeric values.
* chore: add plugin developer guide * chore: add a link to example plugin
Fixes Issue #347
int
,float
,str
,bool
orlist
I wasn't sure what the best way to enforce valid attribute values was. I decided to just drop invalid values since that's what was being done in the PR linked in the issue. But I was wondering if it made sense to throw an exception or maybe attempt to coerce the value into a valid type instead.