Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logger for CLI telemetry #2083

Open
wants to merge 5 commits into
base: refactor-bundle-init-squashed
Choose a base branch
from

Conversation

shreyas-goenka
Copy link
Contributor

@shreyas-goenka shreyas-goenka commented Jan 6, 2025

Changes

Replaces #2037.

This PR adds a telemetry logger library that allows us to log arbitrary events as long as a corresponding proto exists in the universe.

We add a timeout of 3 seconds for logging the telemetry. Manual benchmarks show that API requests can take anywhere from 1-3 seconds to finish. We start here with a pragmatic approach and can improve this it of the UX in either of two ways:

  1. Add a new command to publish telemetry, and spawn a new process so that the main process can exit without waiting for the telemetry to publish.
  2. Maintain a warm TCP connection so that logging the telemetry will only take the round trip time and avoid the overhead associated with

Tests

Integration and unit tests.

@shreyas-goenka shreyas-goenka changed the base branch from main to refactor-bundle-init-squashed January 6, 2025 14:05
// right about as the CLI command is about to exit. The API endpoint can handle
// payloads with ~1000 events easily. Thus we log all the events at once instead of
// batching the logs across multiple API calls.
func (l *defaultLogger) Flush(ctx context.Context, apiClient DatabricksApiClient) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, this function could spawn a new process to avoid the latency cost altogether. Flushing the events to disk and having a new process consume them seems more reliable than this timeout based approach.

@shreyas-goenka
Copy link
Contributor Author

Integration tests are green.

@shreyas-goenka shreyas-goenka marked this pull request as ready for review January 6, 2025 18:51
// to avoid making actual API calls. Thus WithDefaultLogger should silently
// ignore the mock logger.
default:
panic(fmt.Errorf("unexpected telemetry logger type: %T", v))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this typecheck here? Why panic (as opposed to logging a message).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reaching here typically is a developer error. Panicking informs the developer there's a problem without having to return and propagate the error. For example, we also panic here: dbr.RunsOnRuntime(ctx)

We do the type check because we do not want to error or override if a mock logger is configured.


type mockLogger struct {
events []DatabricksCliLog
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need mockLogger? The real logger already accepts interface for API call, which can be no-op in tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to avoid adding the Introspect() method to the real logger and limit it's usage to strictly unit / integration tests.

With the mock logger you can also directly assert on a typed object for the logs / events. If we do the assertions on the apiClient directly (like we did in libs/telemetry/logger_test.go) then you need to assert on the serialized JSON strings for the events.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants