Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

⚗️ [RUMF-1181] collect telemetry events #1351

Closed
wants to merge 8 commits into from

Conversation

bcaudan
Copy link
Contributor

@bcaudan bcaudan commented Feb 21, 2022

Motivation

Experiment with dual shipping telemetry events in existing and new system

Next steps:

  • Add telemetry sample rate configuration
  • Remove old system when new system is OK
  • Rename internal monitoring to telemetry

Changes

  • core: update internalMonitoring to handle new telemetry events

Behind telemetry feature flag, add telemetry events:

  • for RUM: in the batch of RUM events
  • for logs: in a new "RUM" batch

Testing

  • Local
  • Staging
  • Unit
  • End to end

I have gone over the contributing documentation.

@bcaudan bcaudan force-pushed the bcaudan/telemetry-exp branch from c175887 to 8fd540d Compare February 23, 2022 14:11
@bcaudan bcaudan changed the base branch from main to bcaudan/telemetry-schema February 23, 2022 14:12
@bcaudan bcaudan force-pushed the bcaudan/telemetry-exp branch from 8fd540d to 8e26ff2 Compare February 23, 2022 14:40
@bcaudan bcaudan force-pushed the bcaudan/telemetry-schema branch from e57a036 to bfcb23a Compare February 24, 2022 13:49
@bcaudan bcaudan force-pushed the bcaudan/telemetry-exp branch 3 times, most recently from cee4ea0 to 7efef8d Compare February 24, 2022 14:40
Base automatically changed from bcaudan/telemetry-schema to main February 24, 2022 15:25
@bcaudan bcaudan force-pushed the bcaudan/telemetry-exp branch 3 times, most recently from a074bff to e1cf3ae Compare February 25, 2022 09:58
@codecov-commenter
Copy link

codecov-commenter commented Feb 25, 2022

Codecov Report

Merging #1351 (3c7f20c) into main (ae20574) will decrease coverage by 0.22%.
The diff coverage is 71.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1351      +/-   ##
==========================================
- Coverage   91.12%   90.89%   -0.23%     
==========================================
  Files         104      104              
  Lines        4269     4304      +35     
  Branches      950      965      +15     
==========================================
+ Hits         3890     3912      +22     
- Misses        379      392      +13     
Impacted Files Coverage Δ
packages/rum-core/src/boot/startRum.ts 35.13% <0.00%> (-6.81%) ⬇️
packages/logs/src/boot/startLogs.ts 85.24% <55.55%> (-5.14%) ⬇️
.../domain/internalMonitoring/startMonitoringBatch.ts 81.81% <80.00%> (ø)
...rc/domain/internalMonitoring/internalMonitoring.ts 95.23% <88.88%> (-3.25%) ⬇️
packages/rum-core/src/domain/lifeCycle.ts 100.00% <100.00%> (ø)
packages/rum-core/src/transport/startRumBatch.ts 77.14% <100.00%> (+0.67%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ae20574...3c7f20c. Read the comment docs.

@bcaudan bcaudan force-pushed the bcaudan/telemetry-exp branch from e1cf3ae to 5f884b7 Compare February 25, 2022 10:23
@bcaudan bcaudan marked this pull request as ready for review February 25, 2022 10:23
@bcaudan bcaudan requested a review from a team as a code owner February 25, 2022 10:23
@bcaudan bcaudan changed the title ⚗️ collect telemetry events ⚗️ [RUMF-1181] collect telemetry events Feb 25, 2022
Comment on lines +65 to +67
internalMonitoring.telemetryEventObservable.subscribe((event) => {
lifeCycle.notify(LifeCycleEventType.TELEMETRY_EVENT_COLLECTED, event)
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 thought: ‏Maybe we don't need to have a proper LifeCycle event for this, we could telemetryEventObservable.subscribe direcly in RumBatch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though about it but:

  • it was an easier change since rum batch is in rum event collection so it would add a new dependency to a couple of methods signatures and their corresponding tests
  • it is more consistent with how we deal with RUM_EVENT_COLLECTED

any reason to push the other way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see it, adding complexity to the LifeCycle adds complexity to the whole RUM SDK since it is a central piece used everywhere. I understand that it's a good way to decouple some modules to ease unit tests.

But here we have a chance to not use the LifeCycle and keep the complexity local instead. It would be one less indirection, making things easier to follow. Also it would be more consistent with Logs.

Copy link
Contributor Author

@bcaudan bcaudan Mar 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see it, it is the tradeoff with having this "event bus"-like architecture, you have this dependency used almost everywhere.
If you find it bothering, we should probably discuss removing lifeCycle all together at some point, it would avoid this kind of debates.

I'll see where we are after #1351 (comment)

@@ -31,18 +39,34 @@ const monitoringConfiguration: {
sentMessageCount: number
} = { maxMessagesPerPage: 0, sentMessageCount: 0 }

let onInternalMonitoringMessageCollected: ((message: MonitoringMessage) => void) | undefined
let monitoringMessageObservable: Observable<MonitoringMessage> | undefined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💬 suggestion: ‏FMU building the observable here would be against our treeshakability rules, but maybe you could use a getter function like this to avoid handling the "undefined" case:

function getMonitoringMessageObservable(): Observable<MonitoringMessage> {
  if (!monitoringMessageObservable) {
    monitoringMessageObservable = new Observable()
  }
  return monitoringMessageObservable
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FMU you are proposing to do:

function addToMonitoring(message: MonitoringMessage) {
  if (
    monitoringConfiguration.sentMessageCount < monitoringConfiguration.maxMessagesPerPage
  ) {
    monitoringConfiguration.sentMessageCount += 1
    getMonitoringMessageObservable().notify(message)
  }
}

I found it clearer to explicitly create the observable on the start methods and since we have a reset method, I would find it surprising that after a reset, adding a message recreate the observable.
wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is not an unusual pattern: we use it for xhr and fetch observables, and even more similarly in replayStats where when we add a stat, the map gets created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the design seems different to me:

  • for xhr/fetch, we expose an init function that get or create the singleton
  • for replayStats, we only expose methods to interact with the replay stats, none to explicitly initialize it

here we expose a start method, so it feels weird to me to create it outside of it.

However, for replay stats the undefined case is also handled in getReplayStats with a ?.
Would you prefer this approach?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is, appart from not being able to treeshake it, there is no real reason not to do:

let monitoringMessageObservable = new Observable<MonitoringMessage>()

where monitoringMessageObservable is never undefined.

If you think that a getter to work around the treeshakability limitation is not worth it, then let's keep your solution as it is.


export function startMonitoringBatch(configuration: Configuration) {
const primaryBatch = createMonitoringBatch(configuration.internalMonitoringEndpointBuilder!)
export function startMonitoringBatch<T extends Context>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 thought: ‏This is interesting, because now this function has no "monitoring"-related logic, and could be use in other places (could be factorized with startLoggerBatch or startRumBatch with a bit of changes). The Batch class could be seen as an internal implementation detail, and this new generic function could be used instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, startLoggerBatch and startMonitoringBatch were almost the same thing 🤔
I had in mind to only keep this abstraction while having both systems but we could use that for logs as well and keep it.

For RUM, it seems a bit trickier since there are more behaviors related to the batch (upsert, unload, replica app id).

Wdyt of only mutualizing that with logs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, for RUM it is trickier. I would still factorize it though:

  • while only used by RUM, upsert and unload are implemented in Batch, so it wouldn't hurt to expose it through the more abstract startBatch function.
  • for replica specificities, we could have a replicaContext

It could be done in a future task though

Comment on lines +36 to +41
const monitoringBatch = startMonitoringBatch(
configuration,
configuration.rumEndpointBuilder,
configuration.replica?.rumEndpointBuilder
)
internalMonitoring.telemetryEventObservable.subscribe((event) => monitoringBatch.add(event))
Copy link
Member

@BenoitZugmeyer BenoitZugmeyer Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 thought: ‏This got me confused, and maybe it could be clearer if startInternalMonitoring received the monitoringBatch as an argument. It would require a bit of work in RUM because currently the batch is created only when we got the first View, but we could create it earlier. In this PR, any telemetry event produced before connecting the observable to the batch is ignored, so I think it make sense to create the batch as early as possible.

Copy link
Contributor Author

@bcaudan bcaudan Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have a batchInterface as a dependency but with the observable we don't really need to be coupled with it and we can just let the caller do the wiring.
I think we could even refactor the current state to have a monitoringMessageObservable and let RUM and logs handle the wiring with either the monitoring batch or the bridge.

About ensuring to have the RUM batch early, I can experiment with that and see if there is any blocker.

wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, let's experiment!

@bcaudan
Copy link
Contributor Author

bcaudan commented Mar 3, 2022

closed in favor of #1374

@bcaudan bcaudan closed this Mar 3, 2022
@bcaudan bcaudan deleted the bcaudan/telemetry-exp branch October 4, 2022 12:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants