Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CP-49876: Create spans for observer.py itself #5870

Merged
merged 2 commits into from
Aug 6, 2024

Conversation

snwoods
Copy link
Contributor

@snwoods snwoods commented Jul 22, 2024

Creates two spans for observer.py, one for the entire script and one for the opentelemetry import statements. This fixes the current discrepancy between the sm script span and the outer xapi span.

In an example, there's a time lag of 70ms between the Forkhelpers.safe_close_and_exec and the start of observer.py and another 90ms between the end of the sm script and the end of Forkhelpers.safe_close_and_exec . Taking this with the 0.5s of observer.py and the 1.11s of the sm script adds up perfectly to the duration of the Forkhelpers span: 1.77s. Most of the observer.py time is taken up by imports, I'm doing some further tests to see if this is improved by e.g. python 3.11

@@ -20,6 +20,12 @@
passed script without any instrumentation.
"""

import time
def to_otel_timestamp(ts):
int(ts * 1000000000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to return... here I think, see comment from reviewdog.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep thanks, I've obviously gotten used to Ocaml 😅

Copy link
Collaborator

@bernhardkaindl bernhardkaindl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed, looks good.

I've one review comment:

time.time() returns the time since the start of the epoch, which may be slewed by chronyd and it may jump, and even go backwards when larger or stronger time adjustments are made.

The slewing by chronyd should be far below the deviation from noise and time jumps should be very infrequent (should only occur by administrative action).

As I expect the captures of durations here to be short, I guess that this such a time jump will hit the observer only very rarely.

You may look at
open-telemetry/opentelemetry-ruby#709
and the corresponding merge of
https://github.com/open-telemetry/opentelemetry-ruby/pull/782/files

An excerpt from the PR's description:

WARNING: This is a bunch of text from a rubber-ducking exercise. It may provide context or may just be confusing.

Two scenarios: we have a local parent or we don't have a local parent ... (either because we're the root or because our parent is remote)... if we don't have a local parent, we need to record both the current "realtime" clock and the current monotonic clock. The "realtime" clock becomes the start_timestamp subsequent timestamps on this Span (including for Events) are computed as start_timestamp + (current_monotonic - start_monotonic).

This means, as far as I can understand: In addition to CLOCK_REALTIME, they also get CLOCK_MONOTONIC (monotonically incrementing time, not influenced by time adjustments) at the start and the end, and they adjust the duration gathered using CLOCK_REALTIME using the elapsed monotonic time difference from the difference of the two CLOCK_MONOTONIC values.

Because this is quite complicated, and I think it should rarely affect the observer, I think the observer should not need to use CLOCK_MONOTONIC like the https://github.com/open-telemetry/opentelemetry-ruby implemented it.

I am assuming that the Python SDK for OpenTelemetry may also use CLOCK_MONOTONIC, and if there are time adjustments, the spans gathered could diverge, but again it should be very rare (and I don't have the time to check that now).

PS: Because the C library calls should normally go into clock_gettime() calls that are implemented using the fast user context implementation in the Linux VDSO (Virtual Dynamic Shared Object) of the Linux kernel, all these clock_gettime() calls should be very fast and do not actually cause context switches into the kernel to get those time stamps:
https://elixir.bootlin.com/linux/v6.10/source/arch/x86/entry/vdso/vclock_gettime.c

@snwoods
Copy link
Contributor Author

snwoods commented Jul 23, 2024

Because this is quite complicated, and I think it should rarely affect the observer, I think the observer should not need to use CLOCK_MONOTONIC like the https://github.com/open-telemetry/opentelemetry-ruby implemented it.

@bernhardkaindl That's very interesting thank you. Python3 does have time.monotonic() so it would be possible to record the initial timestamp then work out monotonic times for the other timestamps and add them to the initial timestamp (see below); but as you say it should be rare so perhaps it's not worth it, what do you think?

-observer_ts_start = to_otel_timestamp(time.time())
+observer_ts_start = time.time()
+observer_mono_start = time.monotonic()

 import configparser
 import functools
@@ -124,7 +125,8 @@ def _init_tracing(configs: List[str], config_dir: str):
         # On 3.10-3.12, the import of wrapt might trigger warnings, filter them:
         simplefilter(action="ignore", category=DeprecationWarning)

-        import_ts_start = to_otel_timestamp(time.time())
+        import_mono_start = time.monotonic()
+        import_ts_start = to_otel_timestamp(observer_ts_start - observer_mono_start + import_mono_start)

         import wrapt # type: ignore[import-untyped]
         from opentelemetry import context, trace
@@ -137,7 +139,8 @@ def _init_tracing(configs: List[str], config_dir: str):
             TraceContextTextMapPropagator,
         )

-        import_ts_end = to_otel_timestamp(time.time())
+        import_mono_end = time.time()
+        import_ts_end = to_otel_timestamp(observer_ts_start - observer_mono_start + import_mono_end)
     except ImportError as err:
         syslog.error("missing opentelemetry dependencies: %s", err)
         return _span_noop, _patch_module_noop

Copy link
Collaborator

@bernhardkaindl bernhardkaindl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snwoods: At least how it's written in your comment, it would be too hard to read or too complicated to understand for me. So, I'm not asking to follow my idea further. But, maybe just to clear up any misunderstandings ;-), here is what I understand and don't understand:

Context: The quote from the ruby otel repo:

The "realtime" clock becomes the start_timestamp subsequent timestamps on this Span (including for Events) are computed as
start_timestamp + (current_monotonic - start_monotonic).

At the start of the script:

> -observer_ts_start = to_otel_timestamp(time.time())
> +observer_ts_start = time.time()
> +observer_mono_start = time.monotonic()

Fine, but since we need observer_ts_start to be int() later, maybe keep it as int here,
and see next review after this review.

-        import_ts_start = to_otel_timestamp(time.time())
+        import_mono_start = time.monotonic()
+        import_ts_start = to_otel_timestamp(observer_ts_start - observer_mono_start + import_mono_start)

Likewise, but just as a matter of logical sense, I like the expression (even if it's mathematically the same) that the otel-ruby authors used in their description:

+        import_ts_start = observer_ts_start + (time.monotonic() - observer_mono_start).

So, the import start is set to the startup time + the monotonic elapsed time.
Maybe put that in a function for clarity:

def monotonic_elapsed_time()
   """Return the elapsed time using CLOCK_MONOTONIC(unaffected by time adjustments)"""
   global observer_mono_start
   return time.monotonic() - observer_mono_start

And then it's obvious to me:

+        import_ts_start = observer_ts_start + monotonic_elapsed_time()
-        import_ts_end = to_otel_timestamp(time.time())
+        import_mono_end = time.time()

time.time() isn't monotonic, so this confuses me.

+        import_ts_end = to_otel_timestamp(observer_ts_start - observer_mono_start + import_mono_end)

Likewise:

+        import_ts_end = observer_ts_start + monotonic_elapsed_time()

My primary concern would be to avoid mistakes by keeping things simple to read and understand.

Secondly, I think the cognitive load of reading the code should be kept to the minimum by not repeating yourself using best practises like DRY (https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) and instead (where it helps to read the code) "label" details by using a function call that makes the formula dead simple to read like prose.

Copy link
Collaborator

@bernhardkaindl bernhardkaindl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this structure:

  • Keep observer_ts_start = to_otel_timestamp(time.time()), so you can pass it to otel directly
  • Use a function that returns the otel time, incremented by the monotonic leapsed time.

Diff:

  import time
  def to_otel_timestamp(ts):
      return int(ts * 1000000000)
    
  observer_ts_start = to_otel_timestamp(time.time())
+ observer_mono_start = time.monotonic()
+
+ def monotonic_elapsed_time()
+     """Return the elapsed time using CLOCK_MONOTONIC(unaffected by time adjustments)"""
+     global observer_mono_start
+     return time.monotonic() - observer_mono_start
+
+ def current_time()
+     """Return the current time as an otel timestamp, relative to the start time"""
+     global observer_ts_start 
+     return observer_ts_start + to_otel_timestamp(monotonic_elapsed_time())

 import configparser
 import functools
@@ -124,7 +125,8 @@ def _init_tracing(configs: List[str], config_dir: str):
         # On 3.10-3.12, the import of wrapt might trigger warnings, filter them:
         simplefilter(action="ignore", category=DeprecationWarning)

-        import_ts_start = to_otel_timestamp(time.time())
+        import_ts_start = current_time()

         import wrapt # type: ignore[import-untyped]
         from opentelemetry import context, trace
@@ -137,7 +139,8 @@ def _init_tracing(configs: List[str], config_dir: str):
             TraceContextTextMapPropagator,
         )

-        import_ts_end = to_otel_timestamp(time.time())
+        import_ts_end = current_time()
     except ImportError as err:
         syslog.error("missing opentelemetry dependencies: %s", err)
         return _span_noop, _patch_module_noop

@snwoods
Copy link
Contributor Author

snwoods commented Jul 25, 2024

time.time() isn't monotonic, so this confuses me.

Sorry yes that was a mistake. I think your changes are good, I'll take them on board!

Signed-off-by: Steven Woods <steven.woods@citrix.com>
@snwoods
Copy link
Contributor Author

snwoods commented Jul 29, 2024

^ Updated to take @bernhardkaindl 's comments on board

Copy link
Collaborator

@bernhardkaindl bernhardkaindl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks easy and good to me! (I hope it works like the initial PR submit)

Creates two spans for observer.py, one for the entire script and one for
the opentelemtry import statements. This fixes the current discrepancy
between the sm script span and the outer xapi span.

Signed-off-by: Steven Woods <steven.woods@citrix.com>
@snwoods
Copy link
Contributor Author

snwoods commented Jul 29, 2024

image
@bernhardkaindl yep, all working still!

@snwoods snwoods merged commit bc76f7c into xapi-project:master Aug 6, 2024
15 checks passed
@snwoods snwoods deleted the private/stevenwo/CP-49876 branch August 6, 2024 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants