⚗️ [RUMF-1181] collect telemetry events #1351

bcaudan · 2022-02-21T16:57:32Z

Motivation

Experiment with dual-shipping telemetry events to both the existing system and the new one.

Next steps:

Add telemetry sample rate configuration
Remove old system when new system is OK
Rename internal monitoring to telemetry

Changes

core: update internalMonitoring to handle new telemetry events

Behind telemetry feature flag, add telemetry events:

for RUM: in the batch of RUM events
for logs: in a new "RUM" batch

Testing

Local
Staging
Unit
End to end

I have gone over the contributing documentation.

codecov-commenter · 2022-02-25T10:01:23Z

Codecov Report

Merging #1351 (3c7f20c) into main (ae20574) will decrease coverage by 0.22%.
The diff coverage is 71.42%.

@@            Coverage Diff             @@
##             main    #1351      +/-   ##
==========================================
- Coverage   91.12%   90.89%   -0.23%     
==========================================
  Files         104      104              
  Lines        4269     4304      +35     
  Branches      950      965      +15     
==========================================
+ Hits         3890     3912      +22     
- Misses        379      392      +13

Impacted Files	Coverage Δ
packages/rum-core/src/boot/startRum.ts	`35.13% <0.00%> (-6.81%)`	⬇️
packages/logs/src/boot/startLogs.ts	`85.24% <55.55%> (-5.14%)`	⬇️
.../domain/internalMonitoring/startMonitoringBatch.ts	`81.81% <80.00%> (ø)`
...rc/domain/internalMonitoring/internalMonitoring.ts	`95.23% <88.88%> (-3.25%)`	⬇️
packages/rum-core/src/domain/lifeCycle.ts	`100.00% <100.00%> (ø)`
packages/rum-core/src/transport/startRumBatch.ts	`77.14% <100.00%> (+0.67%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ae20574...3c7f20c.

BenoitZugmeyer · 2022-02-23T15:28:27Z

packages/rum-core/src/boot/startRum.ts

+  internalMonitoring.telemetryEventObservable.subscribe((event) => {
+    lifeCycle.notify(LifeCycleEventType.TELEMETRY_EVENT_COLLECTED, event)
+  })


💭 thought: Maybe we don't need a proper LifeCycle event for this; we could call telemetryEventObservable.subscribe directly in RumBatch.

I thought about it, but:

it was an easier change: the RUM batch lives in RUM event collection, so subscribing directly would add a new dependency to a couple of method signatures and their corresponding tests

it is more consistent with how we deal with RUM_EVENT_COLLECTED

Any reason to push the other way?

As I see it, adding complexity to the LifeCycle adds complexity to the whole RUM SDK since it is a central piece used everywhere. I understand that it's a good way to decouple some modules to ease unit tests.

But here we have a chance to not use the LifeCycle and keep the complexity local instead. It would be one less indirection, making things easier to follow. Also it would be more consistent with Logs.

As I see it, that is the tradeoff of this "event bus"-like architecture: you end up with a dependency that is used almost everywhere.
If it bothers you, we should probably discuss removing lifeCycle altogether at some point; that would avoid this kind of debate.

I'll see where we are after #1351 (comment)
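
For readers following the thread, the two wiring options being debated can be sketched as below. This is a minimal illustration, not the SDK's code: the Observable class is a simplified stand-in, EventBus is a hypothetical substitute for LifeCycle, and a plain array plays the role of the batch.

```typescript
// Minimal stand-in for the SDK's Observable (illustrative only).
class Observable<T> {
  private observers: Array<(data: T) => void> = []

  subscribe(observer: (data: T) => void): void {
    this.observers.push(observer)
  }

  notify(data: T): void {
    this.observers.forEach((observer) => observer(data))
  }
}

type TelemetryEvent = { message: string }

// Hypothetical event bus standing in for LifeCycle.
class EventBus {
  private callbacks = new Map<string, Array<(data: TelemetryEvent) => void>>()

  subscribe(type: string, callback: (data: TelemetryEvent) => void): void {
    const list = this.callbacks.get(type) ?? []
    list.push(callback)
    this.callbacks.set(type, list)
  }

  notify(type: string, data: TelemetryEvent): void {
    const list = this.callbacks.get(type) ?? []
    list.forEach((callback) => callback(data))
  }
}

const TELEMETRY_EVENT_COLLECTED = 'telemetry_event_collected'

// Option A: boot code forwards events to the bus, and the batch listens on the bus.
function wireThroughBus(source: Observable<TelemetryEvent>, bus: EventBus, batch: TelemetryEvent[]): void {
  source.subscribe((event) => bus.notify(TELEMETRY_EVENT_COLLECTED, event))
  bus.subscribe(TELEMETRY_EVENT_COLLECTED, (event) => {
    batch.push(event)
  })
}

// Option B: the batch subscribes to the observable directly, with no bus in between.
function wireDirectly(source: Observable<TelemetryEvent>, batch: TelemetryEvent[]): void {
  source.subscribe((event) => {
    batch.push(event)
  })
}
```

Both options deliver the same events. The difference is where the coupling sits: option A adds an event type to a component shared by the whole SDK, while option B passes the observable to whatever creates the batch.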

BenoitZugmeyer · 2022-03-01T10:43:31Z

packages/core/src/domain/internalMonitoring/internalMonitoring.ts

@@ -31,18 +39,34 @@ const monitoringConfiguration: {
  sentMessageCount: number
 } = { maxMessagesPerPage: 0, sentMessageCount: 0 }

-let onInternalMonitoringMessageCollected: ((message: MonitoringMessage) => void) | undefined
+let monitoringMessageObservable: Observable<MonitoringMessage> | undefined


💬 suggestion: ‏FMU building the observable here would be against our treeshakability rules, but maybe you could use a getter function like this to avoid handling the "undefined" case:

function getMonitoringMessageObservable(): Observable<MonitoringMessage> {
  if (!monitoringMessageObservable) {
    monitoringMessageObservable = new Observable()
  }
  return monitoringMessageObservable
}

FMU you are proposing to do:

function addToMonitoring(message: MonitoringMessage) {
  if (monitoringConfiguration.sentMessageCount < monitoringConfiguration.maxMessagesPerPage) {
    monitoringConfiguration.sentMessageCount += 1
    getMonitoringMessageObservable().notify(message)
  }
}

I found it clearer to explicitly create the observable in the start methods. Also, since we have a reset method, I would find it surprising that adding a message after a reset recreates the observable.
wdyt?

Well, this is not an unusual pattern: we use it for the xhr and fetch observables and, even more similarly, in replayStats, where the map gets created when we add a stat.

the design seems different to me:

for xhr/fetch, we expose an init function that gets or creates the singleton

for replayStats, we only expose methods to interact with the replay stats, none to explicitly initialize it

here we expose a start method, so it feels weird to me to create it outside of it.

However, for replay stats the undefined case is also handled in getReplayStats with a ?.
Would you prefer this approach?

My point is, apart from not being able to treeshake it, there is no real reason not to do:

let monitoringMessageObservable = new Observable<MonitoringMessage>()

where monitoringMessageObservable is never undefined.

If you think that a getter to work around the treeshakability limitation is not worth it, then let's keep your solution as it is.
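
To make the tradeoff in this thread concrete, here is a minimal sketch of the lazy-getter variant together with a reset. It assumes a simplified Observable, and startMonitoring and resetMonitoring are hypothetical names for the start and reset methods mentioned above, not the SDK's actual functions.

```typescript
// Minimal stand-in for the SDK's Observable (illustrative only).
class Observable<T> {
  private observers: Array<(data: T) => void> = []

  subscribe(observer: (data: T) => void): void {
    this.observers.push(observer)
  }

  notify(data: T): void {
    this.observers.forEach((observer) => observer(data))
  }
}

type MonitoringMessage = { message: string }

const monitoringConfiguration = { maxMessagesPerPage: 0, sentMessageCount: 0 }
let monitoringMessageObservable: Observable<MonitoringMessage> | undefined

// Lazy getter: nothing is constructed at module load, so an unused module can still be tree-shaken.
function getMonitoringMessageObservable(): Observable<MonitoringMessage> {
  if (!monitoringMessageObservable) {
    monitoringMessageObservable = new Observable<MonitoringMessage>()
  }
  return monitoringMessageObservable
}

// Hypothetical start function: sets the per-page limit and hands out the observable.
function startMonitoring(maxMessagesPerPage: number): Observable<MonitoringMessage> {
  monitoringConfiguration.maxMessagesPerPage = maxMessagesPerPage
  return getMonitoringMessageObservable()
}

function addToMonitoring(message: MonitoringMessage): void {
  if (monitoringConfiguration.sentMessageCount < monitoringConfiguration.maxMessagesPerPage) {
    monitoringConfiguration.sentMessageCount += 1
    getMonitoringMessageObservable().notify(message)
  }
}

// Hypothetical reset: the next getter call builds a fresh observable with no subscribers.
function resetMonitoring(): void {
  monitoringMessageObservable = undefined
  monitoringConfiguration.maxMessagesPerPage = 0
  monitoringConfiguration.sentMessageCount = 0
}
```

With the getter, callers never see undefined, at the cost of the behavior questioned above: any getter call after a reset silently creates a new observable that nobody is subscribed to.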

BenoitZugmeyer · 2022-03-01T10:57:07Z

packages/core/src/domain/internalMonitoring/startMonitoringBatch.ts


-export function startMonitoringBatch(configuration: Configuration) {
-  const primaryBatch = createMonitoringBatch(configuration.internalMonitoringEndpointBuilder!)
+export function startMonitoringBatch<T extends Context>(


💭 thought: This is interesting, because this function now has no "monitoring"-related logic and could be used in other places (it could be factorized with startLoggerBatch or startRumBatch with a few changes). The Batch class could be seen as an internal implementation detail, and this new generic function could be used instead.

Indeed, startLoggerBatch and startMonitoringBatch were almost the same thing 🤔
I had in mind to keep this abstraction only while both systems coexist, but we could use it for logs as well and keep it.

For RUM, it seems a bit trickier since there are more behaviors related to the batch (upsert, unload, replica app id).

Wdyt of sharing it with logs only?

Indeed, for RUM it is trickier. I would still factorize it though:

although only used by RUM, upsert and unload are implemented in Batch, so it wouldn't hurt to expose them through the more abstract startBatch function.

for replica specificities, we could have a replicaContext

It could be done in a future task though
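
As a rough illustration of the generic function discussed in this thread, the sketch below shows a startBatch that knows nothing about monitoring, logs, or RUM. Everything in it is an assumption made for illustration: the Send callback stands in for the endpoint transport, buffering and flushing are omitted, and the way replicaContext is merged is only one possible reading of the suggestion above.

```typescript
type Context = { [key: string]: unknown }

// Hypothetical transport: the real SDK would send to an endpoint; here it is just a callback.
type Send = (payload: string) => void

interface BatchHandle<T extends Context> {
  add(message: T): void
}

// Generic entry point: callers only see add(), the batching details stay internal.
function startBatch<T extends Context>(
  primarySend: Send,
  replicaSend?: Send,
  replicaContext?: Context
): BatchHandle<T> {
  return {
    add(message: T): void {
      primarySend(JSON.stringify(message))
      if (replicaSend) {
        // Replica-specific fields override the message's own (an assumption of this sketch).
        replicaSend(JSON.stringify({ ...message, ...replicaContext }))
      }
    },
  }
}
```

A logs or telemetry caller would pass only the senders, while a RUM caller could pass a replicaContext carrying the replica application id.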

BenoitZugmeyer · 2022-03-01T11:20:48Z

packages/logs/src/boot/startLogs.ts

+  const monitoringBatch = startMonitoringBatch(
+    configuration,
+    configuration.rumEndpointBuilder,
+    configuration.replica?.rumEndpointBuilder
+  )
+  internalMonitoring.telemetryEventObservable.subscribe((event) => monitoringBatch.add(event))


💭 thought: This got me confused; maybe it would be clearer if startInternalMonitoring received the monitoringBatch as an argument. It would require a bit of work in RUM because the batch is currently created only once we get the first View, but we could create it earlier. In this PR, any telemetry event produced before the observable is connected to the batch is ignored, so I think it makes sense to create the batch as early as possible.

We could have a batchInterface as a dependency but with the observable we don't really need to be coupled with it and we can just let the caller do the wiring.
I think we could even refactor the current state to have a monitoringMessageObservable and let RUM and logs handle the wiring with either the monitoring batch or the bridge.

As for making sure the RUM batch exists early, I can experiment with that and see if there is any blocker.

wdyt?

Sounds good to me, let's experiment!
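
The ordering issue raised in this thread (events notified before the batch subscribes are lost) can be reproduced with a small sketch. It assumes a simplified Observable with no replay or buffering, and uses a plain array in place of the batch; collectTelemetry is a hypothetical helper written only for this illustration.

```typescript
// Minimal stand-in for the SDK's Observable: no buffering, late subscribers miss earlier events.
class Observable<T> {
  private observers: Array<(data: T) => void> = []

  subscribe(observer: (data: T) => void): void {
    this.observers.push(observer)
  }

  notify(data: T): void {
    this.observers.forEach((observer) => observer(data))
  }
}

type TelemetryEvent = { message: string }

// Hypothetical helper: emits one event during init and one later,
// connecting the batch either before or after the first event.
function collectTelemetry(connectBatchEarly: boolean): string[] {
  const telemetryEventObservable = new Observable<TelemetryEvent>()
  const batch: string[] = []
  const connect = () =>
    telemetryEventObservable.subscribe((event) => {
      batch.push(event.message)
    })

  if (connectBatchEarly) {
    connect()
  }
  telemetryEventObservable.notify({ message: 'error during init' })
  if (!connectBatchEarly) {
    connect()
  }
  telemetryEventObservable.notify({ message: 'error after first view' })
  return batch
}
```

Calling collectTelemetry(false) returns only the later event, while collectTelemetry(true) returns both, which is the argument for creating the batch as early as possible.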

bcaudan · 2022-03-03T11:02:53Z

closed in favor of #1374

bcaudan force-pushed the bcaudan/telemetry-exp branch from c175887 to 8fd540d Compare February 23, 2022 14:11

bcaudan changed the base branch from main to bcaudan/telemetry-schema February 23, 2022 14:12

bcaudan force-pushed the bcaudan/telemetry-exp branch from 8fd540d to 8e26ff2 Compare February 23, 2022 14:40

bcaudan force-pushed the bcaudan/telemetry-schema branch from e57a036 to bfcb23a Compare February 24, 2022 13:49

bcaudan force-pushed the bcaudan/telemetry-exp branch 3 times, most recently from cee4ea0 to 7efef8d Compare February 24, 2022 14:40

Base automatically changed from bcaudan/telemetry-schema to main February 24, 2022 15:25

bcaudan force-pushed the bcaudan/telemetry-exp branch 3 times, most recently from a074bff to e1cf3ae Compare February 25, 2022 09:58

bcaudan added 4 commits February 25, 2022 11:23

update internalMonitoring to handle new telemetry events

67bc5e1

add telemetry event to RUM batch

982df2d

♻️ expose startMonitoringBatch

d935cd9

Create a "RUM" batch for telemetry events in logs

5f884b7

bcaudan force-pushed the bcaudan/telemetry-exp branch from e1cf3ae to 5f884b7 Compare February 25, 2022 10:23

bcaudan marked this pull request as ready for review February 25, 2022 10:23

bcaudan requested a review from a team as a code owner February 25, 2022 10:23

bcaudan changed the title ~~⚗️ collect telemetry events~~ ⚗️ [RUMF-1181] collect telemetry events Feb 25, 2022

amortemousque approved these changes Feb 25, 2022

BenoitZugmeyer reviewed Mar 1, 2022

bcaudan added 4 commits March 1, 2022 16:10

Merge branch 'main' into bcaudan/telemetry-exp

5032033

👌 remove explicit cast

7f21383

👌 use timeStampNow

dee6ae4

👌 remove useless ?

3c7f20c

This was referenced Mar 2, 2022

♻️ [RUMF-1181] preliminary refactorings for telemetry events collection #1371

Merged

⚗️ [RUMF-1181] collect telemetry events #1374

Merged

bcaudan closed this Mar 3, 2022

bcaudan deleted the bcaudan/telemetry-exp branch October 4, 2022 12:51
