-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CP-49876: Create spans for observer.py itself #5870
CP-49876: Create spans for observer.py itself #5870
Conversation
python3/packages/observer.py
Outdated
@@ -20,6 +20,12 @@ | |||
passed script without any instrumentation. | |||
""" | |||
|
|||
import time | |||
def to_otel_timestamp(ts): | |||
int(ts * 1000000000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to return...
here I think, see comment from reviewdog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep thanks, I've obviously gotten used to Ocaml 😅
34a7956
to
fc38882
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed, looks good.
I've one review comment:
time.time()
returns the time since the start of the epoch, which may be slewed by chronyd
and it may jump, and even go backwards when larger or stronger time adjustments are made.
The slewing by chronyd
should be far below the deviation from noise and time jumps should be very infrequent (should only occur by administrative action).
As I expect the captures of durations here to be short, I guess that this such a time jump will hit the observer only very rarely.
You may look at
open-telemetry/opentelemetry-ruby#709
and the corresponding merge of
https://github.com/open-telemetry/opentelemetry-ruby/pull/782/files
An excerpt from the PR's description:
WARNING: This is a bunch of text from a rubber-ducking exercise. It may provide context or may just be confusing.
Two scenarios: we have a local parent or we don't have a local parent ... (either because we're the root or because our parent is remote)... if we don't have a local parent, we need to record both the current "realtime" clock and the current monotonic clock. The "realtime" clock becomes the start_timestamp subsequent timestamps on this Span (including for Events) are computed as start_timestamp + (current_monotonic - start_monotonic).
This means, as far as I can understand: In addition to CLOCK_REALTIME
, they also get CLOCK_MONOTONIC
(monotonically incrementing time, not influenced by time adjustments) at the start and the end, and they adjust the duration gathered using CLOCK_REALTIME
using the elapsed monotonic time difference from the difference of the two CLOCK_MONOTONIC
values.
Because this is quite complicated, and I think it should rarely affect the observer, I think the observer should not need to use CLOCK_MONOTONIC like the https://github.com/open-telemetry/opentelemetry-ruby implemented it.
I am assuming that the Python SDK for OpenTelemetry
may also use CLOCK_MONOTONIC
, and if there are time adjustments, the spans gathered could diverge, but again it should be very rare (and I don't have the time to check that now).
PS: Because the C library calls should normally go into clock_gettime() calls that are implemented using the fast user context implementation in the Linux VDSO
(Virtual Dynamic Shared Object) of the Linux kernel, all these clock_gettime()
calls should be very fast and do not actually cause context switches into the kernel to get those time stamps:
https://elixir.bootlin.com/linux/v6.10/source/arch/x86/entry/vdso/vclock_gettime.c
@bernhardkaindl That's very interesting thank you. Python3 does have time.monotonic() so it would be possible to record the initial timestamp then work out monotonic times for the other timestamps and add them to the initial timestamp (see below); but as you say it should be rare so perhaps it's not worth it, what do you think? -observer_ts_start = to_otel_timestamp(time.time())
+observer_ts_start = time.time()
+observer_mono_start = time.monotonic()
import configparser
import functools
@@ -124,7 +125,8 @@ def _init_tracing(configs: List[str], config_dir: str):
# On 3.10-3.12, the import of wrapt might trigger warnings, filter them:
simplefilter(action="ignore", category=DeprecationWarning)
- import_ts_start = to_otel_timestamp(time.time())
+ import_mono_start = time.monotonic()
+ import_ts_start = to_otel_timestamp(observer_ts_start - observer_mono_start + import_mono_start)
import wrapt # type: ignore[import-untyped]
from opentelemetry import context, trace
@@ -137,7 +139,8 @@ def _init_tracing(configs: List[str], config_dir: str):
TraceContextTextMapPropagator,
)
- import_ts_end = to_otel_timestamp(time.time())
+ import_mono_end = time.time()
+ import_ts_end = to_otel_timestamp(observer_ts_start - observer_mono_start + import_mono_end)
except ImportError as err:
syslog.error("missing opentelemetry dependencies: %s", err)
return _span_noop, _patch_module_noop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snwoods: At least how it's written in your comment, it would be too hard to read or too complicated to understand for me. So, I'm not asking to follow my idea further. But, maybe just to clear up any misunderstandings ;-), here is what I understand and don't understand:
Context: The quote from the ruby otel repo:
The "realtime" clock becomes the start_timestamp subsequent timestamps on this Span (including for Events) are computed as
start_timestamp + (current_monotonic - start_monotonic).
At the start of the script:
> -observer_ts_start = to_otel_timestamp(time.time())
> +observer_ts_start = time.time()
> +observer_mono_start = time.monotonic()
Fine, but since we need observer_ts_start
to be int()
later, maybe keep it as int here,
and see next review after this review.
- import_ts_start = to_otel_timestamp(time.time())
+ import_mono_start = time.monotonic()
+ import_ts_start = to_otel_timestamp(observer_ts_start - observer_mono_start + import_mono_start)
Likewise, but just as a matter of logical sense, I like the expression (even if it's mathematically the same) that the otel-ruby authors used in their description:
+ import_ts_start = observer_ts_start + (time.monotonic() - observer_mono_start).
So, the import start is set to the startup time + the monotonic elapsed time.
Maybe put that in a function for clarity:
def monotonic_elapsed_time()
"""Return the elapsed time using CLOCK_MONOTONIC(unaffected by time adjustments)"""
global observer_mono_start
return time.monotonic() - observer_mono_start
And then it's obvious to me:
+ import_ts_start = observer_ts_start + monotonic_elapsed_time()
- import_ts_end = to_otel_timestamp(time.time())
+ import_mono_end = time.time()
time.time()
isn't monotonic, so this confuses me.
+ import_ts_end = to_otel_timestamp(observer_ts_start - observer_mono_start + import_mono_end)
Likewise:
+ import_ts_end = observer_ts_start + monotonic_elapsed_time()
My primary concern would be to avoid mistakes by keeping things simple to read and understand.
Secondly, I think the cognitive load of reading the code should be kept to the minimum by not repeating yourself using best practises like DRY (https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) and instead (where it helps to read the code) "label" details by using a function call that makes the formula dead simple to read like prose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this structure:
- Keep
observer_ts_start = to_otel_timestamp(time.time())
, so you can pass it to otel directly - Use a function that returns the otel time, incremented by the monotonic leapsed time.
Diff:
import time
def to_otel_timestamp(ts):
return int(ts * 1000000000)
observer_ts_start = to_otel_timestamp(time.time())
+ observer_mono_start = time.monotonic()
+
+ def monotonic_elapsed_time()
+ """Return the elapsed time using CLOCK_MONOTONIC(unaffected by time adjustments)"""
+ global observer_mono_start
+ return time.monotonic() - observer_mono_start
+
+ def current_time()
+ """Return the current time as an otel timestamp, relative to the start time"""
+ global observer_ts_start
+ return observer_ts_start + to_otel_timestamp(monotonic_elapsed_time())
import configparser
import functools
@@ -124,7 +125,8 @@ def _init_tracing(configs: List[str], config_dir: str):
# On 3.10-3.12, the import of wrapt might trigger warnings, filter them:
simplefilter(action="ignore", category=DeprecationWarning)
- import_ts_start = to_otel_timestamp(time.time())
+ import_ts_start = current_time()
import wrapt # type: ignore[import-untyped]
from opentelemetry import context, trace
@@ -137,7 +139,8 @@ def _init_tracing(configs: List[str], config_dir: str):
TraceContextTextMapPropagator,
)
- import_ts_end = to_otel_timestamp(time.time())
+ import_ts_end = current_time()
except ImportError as err:
syslog.error("missing opentelemetry dependencies: %s", err)
return _span_noop, _patch_module_noop
Sorry yes that was a mistake. I think your changes are good, I'll take them on board! |
Signed-off-by: Steven Woods <steven.woods@citrix.com>
fc38882
to
08b9f14
Compare
^ Updated to take @bernhardkaindl 's comments on board |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks easy and good to me! (I hope it works like the initial PR submit)
08b9f14
to
120258a
Compare
Creates two spans for observer.py, one for the entire script and one for the opentelemtry import statements. This fixes the current discrepancy between the sm script span and the outer xapi span. Signed-off-by: Steven Woods <steven.woods@citrix.com>
120258a
to
29344a7
Compare
|
Creates two spans for observer.py, one for the entire script and one for the opentelemetry import statements. This fixes the current discrepancy between the sm script span and the outer xapi span.
In an example, there's a time lag of 70ms between the Forkhelpers.safe_close_and_exec and the start of observer.py and another 90ms between the end of the sm script and the end of Forkhelpers.safe_close_and_exec . Taking this with the 0.5s of observer.py and the 1.11s of the sm script adds up perfectly to the duration of the Forkhelpers span: 1.77s. Most of the observer.py time is taken up by imports, I'm doing some further tests to see if this is improved by e.g. python 3.11