CP-49876: Create spans for observer.py itself #5870

snwoods · 2024-07-22T13:29:17Z

Creates two spans for observer.py, one for the entire script and one for the opentelemetry import statements. This fixes the current discrepancy between the sm script span and the outer xapi span.

In one example, there is a 70ms lag between the start of Forkhelpers.safe_close_and_exec and the start of observer.py, and another 90ms between the end of the sm script and the end of Forkhelpers.safe_close_and_exec. Adding these to the 0.5s spent in observer.py and the 1.11s spent in the sm script gives exactly the duration of the Forkhelpers span: 1.77s. Most of the observer.py time is taken up by imports; I'm doing some further tests to see whether this improves with e.g. Python 3.11.
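As a quick check of the arithmetic in the description, the four intervals can be summed in integer milliseconds, which avoids floating-point rounding. The variable names are made up for illustration; the values are the ones quoted in the description.

```python
# Durations from the example trace, in milliseconds.
lag_before_observer_ms = 70   # Forkhelpers.safe_close_and_exec start -> observer.py start
observer_ms = 500             # observer.py itself (mostly imports)
sm_script_ms = 1110           # the sm script
lag_after_script_ms = 90      # sm script end -> Forkhelpers.safe_close_and_exec end

total_ms = lag_before_observer_ms + observer_ms + sm_script_ms + lag_after_script_ms
print(f"{total_ms / 1000:.2f}s")  # prints "1.77s"
```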

edwintorok · 2024-07-22T13:46:13Z

python3/packages/observer.py

@@ -20,6 +20,12 @@
 passed script without any instrumentation.
 """

+import time
+def to_otel_timestamp(ts):
+    int(ts * 1000000000)


you need to return... here I think, see comment from reviewdog.

Yep thanks, I've obviously gotten used to Ocaml 😅
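For readers less familiar with the difference from OCaml: a Python function whose last expression is not returned yields None. A minimal sketch of the two variants follows; the `_buggy` name is only for illustration and is not part of the PR.

```python
import time

def to_otel_timestamp_buggy(ts):
    int(ts * 1000000000)  # the result is discarded, so the function returns None

def to_otel_timestamp(ts):
    """Convert seconds since the epoch (float) to integer nanoseconds."""
    return int(ts * 1000000000)

buggy_result = to_otel_timestamp_buggy(1.5)
fixed_result = to_otel_timestamp(1.5)
now_ns = to_otel_timestamp(time.time())

print(buggy_result)  # prints None
print(fixed_result)  # prints 1500000000
```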

bernhardkaindl

Reviewed, looks good.

I've one review comment:

time.time() returns the time since the start of the epoch, which may be slewed by chronyd and it may jump, and even go backwards when larger or stronger time adjustments are made.

The slewing by chronyd should be far below the deviation from noise and time jumps should be very infrequent (should only occur by administrative action).

As I expect the durations captured here to be short, I guess that such a time jump will hit the observer only very rarely.
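To illustrate the difference between the two clocks, here is a small sketch that measures the same interval with time.time() and time.monotonic(). The sleep is only a stand-in for the work being measured; on a system whose clock is not being adjusted, both numbers should be nearly identical.

```python
import time

wall_start = time.time()
mono_start = time.monotonic()

time.sleep(0.05)  # stand-in for the work being measured

# The wall-clock difference can be wrong (even negative) if the system
# clock is stepped in between; the monotonic difference cannot go backwards.
wall_elapsed = time.time() - wall_start
mono_elapsed = time.monotonic() - mono_start

print(f"wall: {wall_elapsed:.3f}s  monotonic: {mono_elapsed:.3f}s")
```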

You may look at
open-telemetry/opentelemetry-ruby#709
and the corresponding merge of
https://github.com/open-telemetry/opentelemetry-ruby/pull/782/files

An excerpt from the PR's description:

WARNING: This is a bunch of text from a rubber-ducking exercise. It may provide context or may just be confusing.

Two scenarios: we have a local parent or we don't have a local parent ... (either because we're the root or because our parent is remote)... if we don't have a local parent, we need to record both the current "realtime" clock and the current monotonic clock. The "realtime" clock becomes the start_timestamp subsequent timestamps on this Span (including for Events) are computed as start_timestamp + (current_monotonic - start_monotonic).

This means, as far as I can understand: In addition to CLOCK_REALTIME, they also read CLOCK_MONOTONIC (monotonically increasing time, not influenced by time adjustments) at the start and at each later point, and they compute each later timestamp by adding the elapsed time between the two CLOCK_MONOTONIC values to the initial CLOCK_REALTIME reading, instead of reading CLOCK_REALTIME again.
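A minimal sketch of that scheme, as I read the quoted description, might look like the following. `SpanClock` is a hypothetical name, not taken from the Ruby or Python SDK, and the clocks are injectable only so the behaviour can be shown with simulated values.

```python
import time

class SpanClock:
    """Hypothetical helper: anchor to the wall clock once, then advance monotonically."""

    def __init__(self, realtime=time.time, monotonic=time.monotonic):
        self._monotonic = monotonic
        self.start_timestamp = realtime()     # realtime clock, read exactly once
        self._start_monotonic = monotonic()   # monotonic clock at the same moment

    def now(self):
        # start_timestamp + (current_monotonic - start_monotonic)
        return self.start_timestamp + (self._monotonic() - self._start_monotonic)

# Simulated clocks: the wall clock is consulted only at construction, so any
# later jump in it cannot influence the timestamps derived afterwards.
wall = iter([1000.0])
mono = iter([50.0, 52.5])
clock = SpanClock(realtime=lambda: next(wall), monotonic=lambda: next(mono))

elapsed = clock.now() - clock.start_timestamp
print(elapsed)  # prints 2.5
```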

Because this is quite complicated, and I think it should rarely affect the observer, I think the observer should not need to use CLOCK_MONOTONIC like the https://github.com/open-telemetry/opentelemetry-ruby implemented it.

I am assuming that the Python SDK for OpenTelemetry may also use CLOCK_MONOTONIC, and if there are time adjustments, the spans gathered could diverge, but again it should be very rare (and I don't have the time to check that now).

PS: Because the C library calls should normally go into clock_gettime() calls that are implemented using the fast user context implementation in the Linux VDSO (Virtual Dynamic Shared Object) of the Linux kernel, all these clock_gettime() calls should be very fast and do not actually cause context switches into the kernel to get those time stamps:
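One way to check which underlying call Python uses for each clock, and to get a rough feel for its cost, is sketched below. The per-call figure includes interpreter and loop overhead, so it is only an upper bound on the cost of the clock read itself, and the reported implementation string depends on the platform.

```python
import time

# Show which OS facility backs each clock and whether it can be adjusted.
for name in ("time", "monotonic"):
    info = time.get_clock_info(name)
    print(name, info.implementation,
          "adjustable:", info.adjustable, "monotonic:", info.monotonic)

# Rough per-call cost of reading the monotonic clock.
calls = 100_000
t0 = time.perf_counter()
for _ in range(calls):
    time.monotonic()
per_call_ns = (time.perf_counter() - t0) / calls * 1e9
print(f"~{per_call_ns:.0f} ns per time.monotonic() call (includes loop overhead)")
```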
https://elixir.bootlin.com/linux/v6.10/source/arch/x86/entry/vdso/vclock_gettime.c

snwoods · 2024-07-23T10:36:29Z

> Because this is quite complicated, and I think it should rarely affect the observer, I think the observer should not need to use CLOCK_MONOTONIC like the https://github.com/open-telemetry/opentelemetry-ruby implemented it.

@bernhardkaindl That's very interesting, thank you. Python 3 does have time.monotonic(), so it would be possible to record the initial timestamp, then compute the later timestamps by adding the elapsed monotonic time to that initial timestamp (see below). But as you say, it should be rare, so perhaps it's not worth it. What do you think?

-observer_ts_start = to_otel_timestamp(time.time())
+observer_ts_start = time.time()
+observer_mono_start = time.monotonic()

 import configparser
 import functools
@@ -124,7 +125,8 @@ def _init_tracing(configs: List[str], config_dir: str):
         # On 3.10-3.12, the import of wrapt might trigger warnings, filter them:
         simplefilter(action="ignore", category=DeprecationWarning)

-        import_ts_start = to_otel_timestamp(time.time())
+        import_mono_start = time.monotonic()
+        import_ts_start = to_otel_timestamp(observer_ts_start - observer_mono_start + import_mono_start)

         import wrapt # type: ignore[import-untyped]
         from opentelemetry import context, trace
@@ -137,7 +139,8 @@ def _init_tracing(configs: List[str], config_dir: str):
             TraceContextTextMapPropagator,
         )

-        import_ts_end = to_otel_timestamp(time.time())
+        import_mono_end = time.time()
+        import_ts_end = to_otel_timestamp(observer_ts_start - observer_mono_start + import_mono_end)
     except ImportError as err:
         syslog.error("missing opentelemetry dependencies: %s", err)
         return _span_noop, _patch_module_noop

bernhardkaindl

@snwoods: At least how it's written in your comment, it would be too hard to read or too complicated to understand for me. So, I'm not asking to follow my idea further. But, maybe just to clear up any misunderstandings ;-), here is what I understand and don't understand:

Context: The quote from the ruby otel repo:

The "realtime" clock becomes the start_timestamp subsequent timestamps on this Span (including for Events) are computed as
start_timestamp + (current_monotonic - start_monotonic).

At the start of the script:

> -observer_ts_start = to_otel_timestamp(time.time())
> +observer_ts_start = time.time()
> +observer_mono_start = time.monotonic()

Fine, but since we need observer_ts_start to be int() later, maybe keep it as int here,
and see next review after this review.

-        import_ts_start = to_otel_timestamp(time.time())
+        import_mono_start = time.monotonic()
+        import_ts_start = to_otel_timestamp(observer_ts_start - observer_mono_start + import_mono_start)

Likewise, but just as a matter of logical sense, I like the expression (even if it's mathematically the same) that the otel-ruby authors used in their description:

+        import_ts_start = observer_ts_start + (time.monotonic() - observer_mono_start)

So, the import start is set to the startup time + the monotonic elapsed time.
Maybe put that in a function for clarity:

def monotonic_elapsed_time():
    """Return the elapsed time using CLOCK_MONOTONIC (unaffected by time adjustments)"""
    global observer_mono_start
    return time.monotonic() - observer_mono_start

And then it's obvious to me:

+        import_ts_start = observer_ts_start + monotonic_elapsed_time()

-        import_ts_end = to_otel_timestamp(time.time())
+        import_mono_end = time.time()

time.time() isn't monotonic, so this confuses me.

+        import_ts_end = to_otel_timestamp(observer_ts_start - observer_mono_start + import_mono_end)

Likewise:

+        import_ts_end = observer_ts_start + monotonic_elapsed_time()

My primary concern would be to avoid mistakes by keeping things simple to read and understand.

Secondly, I think the cognitive load of reading the code should be kept to the minimum by not repeating yourself using best practises like DRY (https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) and instead (where it helps to read the code) "label" details by using a function call that makes the formula dead simple to read like prose.

bernhardkaindl

What about this structure:

Keep observer_ts_start = to_otel_timestamp(time.time()), so you can pass it to otel directly
Use a function that returns the otel time, incremented by the monotonic elapsed time.

Diff:

  import time
  def to_otel_timestamp(ts):
      return int(ts * 1000000000)

  observer_ts_start = to_otel_timestamp(time.time())
+ observer_mono_start = time.monotonic()
+
+ def monotonic_elapsed_time():
+     """Return the elapsed time using CLOCK_MONOTONIC (unaffected by time adjustments)"""
+     global observer_mono_start
+     return time.monotonic() - observer_mono_start
+
+ def current_time():
+     """Return the current time as an otel timestamp, relative to the start time"""
+     global observer_ts_start
+     return observer_ts_start + to_otel_timestamp(monotonic_elapsed_time())

 import configparser
 import functools
@@ -124,7 +125,8 @@ def _init_tracing(configs: List[str], config_dir: str):
         # On 3.10-3.12, the import of wrapt might trigger warnings, filter them:
         simplefilter(action="ignore", category=DeprecationWarning)

-        import_ts_start = to_otel_timestamp(time.time())
+        import_ts_start = current_time()

         import wrapt # type: ignore[import-untyped]
         from opentelemetry import context, trace
@@ -137,7 +139,8 @@ def _init_tracing(configs: List[str], config_dir: str):
             TraceContextTextMapPropagator,
         )

-        import_ts_end = to_otel_timestamp(time.time())
+        import_ts_end = current_time()
     except ImportError as err:
         syslog.error("missing opentelemetry dependencies: %s", err)
         return _span_noop, _patch_module_noop
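Pulled out of the diff into a standalone script, the proposed helpers might look like the sketch below. This is not the merged code; `import json` is only a stand-in for the opentelemetry imports so that the example runs without extra dependencies.

```python
import time

def to_otel_timestamp(ts):
    """Convert seconds (float) to integer nanoseconds."""
    return int(ts * 1000000000)

# Wall-clock anchor and the matching monotonic reading, taken once at startup.
observer_ts_start = to_otel_timestamp(time.time())
observer_mono_start = time.monotonic()

def monotonic_elapsed_time():
    """Return the seconds elapsed since startup, using the monotonic clock."""
    return time.monotonic() - observer_mono_start

def current_time():
    """Return the current time in nanoseconds: start anchor plus monotonic elapsed time."""
    return observer_ts_start + to_otel_timestamp(monotonic_elapsed_time())

import_ts_start = current_time()
import json  # stand-in for the opentelemetry imports being timed
import_ts_end = current_time()

print(f"import took {(import_ts_end - import_ts_start) / 1e6:.3f} ms")
```

Because every timestamp after startup is derived from the monotonic clock, the computed span boundaries cannot go backwards even if the system clock is stepped while the script runs.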

snwoods · 2024-07-25T09:12:41Z

> time.time() isn't monotonic, so this confuses me.

Sorry yes that was a mistake. I think your changes are good, I'll take them on board!

Signed-off-by: Steven Woods <steven.woods@citrix.com>

snwoods · 2024-07-29T10:55:14Z

^ Updated to take @bernhardkaindl's comments on board

bernhardkaindl

Looks easy and good to me! (I hope it works like the initial PR submit)

Creates two spans for observer.py, one for the entire script and one for the opentelemetry import statements. This fixes the current discrepancy between the sm script span and the outer xapi span. Signed-off-by: Steven Woods <steven.woods@citrix.com>

snwoods · 2024-07-29T17:10:05Z

@bernhardkaindl yep, all working still!

edwintorok reviewed Jul 22, 2024

snwoods force-pushed the private/stevenwo/CP-49876 branch from 34a7956 to fc38882 Compare July 22, 2024 13:48

bernhardkaindl approved these changes Jul 23, 2024

bernhardkaindl approved these changes Jul 24, 2024

bernhardkaindl reviewed Jul 24, 2024

Catch system exit in observer.py to close gracefully

0623d8d

Signed-off-by: Steven Woods <steven.woods@citrix.com>

snwoods force-pushed the private/stevenwo/CP-49876 branch from fc38882 to 08b9f14 Compare July 29, 2024 10:54

bernhardkaindl approved these changes Jul 29, 2024

bernhardkaindl requested a review from edwintorok July 29, 2024 11:50

snwoods force-pushed the private/stevenwo/CP-49876 branch from 08b9f14 to 120258a Compare July 29, 2024 17:04

snwoods force-pushed the private/stevenwo/CP-49876 branch from 120258a to 29344a7 Compare July 29, 2024 17:07

mg12 approved these changes Aug 6, 2024

snwoods merged commit bc76f7c into xapi-project:master Aug 6, 2024
15 checks passed

snwoods deleted the private/stevenwo/CP-49876 branch August 6, 2024 12:44
