Feature Request: add skaffold trace information (design + plumbing) #5756
Labels
area/performance
kind/design discussion
kind/feature-request
kind/todo (implementation task/epic for the skaffold team)
planning/Q2-21
priority/p1 (high-impact feature/bug)
Milestone
Comments
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue on Apr 29, 2021
What is the problem being solved? Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Updating our libs to the latest otel version adds useful functionality for tracing.
Why is this the best approach? This approach uses go mod (updated via "go get <pkg>") plus minor changes to our otel API usage.
What other approaches did you consider? N/A
What side effects will this approach have? There shouldn't be any side effects with this approach; the changes to otel's API were renames/moves of packages, functions, etc. The only option removed was stdout quantile aggregation (stdout.WithQuantiles), but I do not think this will have side effects. See open-telemetry/opentelemetry-go@49f699d#diff-2b283a7fb9f9b66e31a2b51a9ae9cad3599650a633f02fea9a956c4f6a714c6c
What future work remains to be done? N/A
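For context, the tracing API surface that the updated otel libs expose is roughly the following. This is a minimal sketch of standard opentelemetry-go usage, not skaffold's own code; the tracer name "skaffold" and the span names are purely illustrative.

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
)

func main() {
	ctx := context.Background()

	// Acquire a named tracer from the globally registered provider.
	// If no provider has been configured, this returns a no-op tracer.
	tracer := otel.Tracer("skaffold")

	// Wrap a unit of work in a span; End() records its duration.
	ctx, span := tracer.Start(ctx, "build")
	defer span.End()

	hashFiles(ctx) // pass ctx down so nested operations become child spans
}

func hashFiles(ctx context.Context) {
	_, span := otel.Tracer("skaffold").Start(ctx, "build/hash-files")
	defer span.End()
	// ...actual work would go here...
}
```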
trying to make triage-party happy :)
aaron-prindle added a commit that referenced this issue on May 2, 2021
aaron-prindle changed the title from "Feature Request: Skaffold Performance Observability - skaffold trace information" to "Feature Request: skaffold trace information (design + plumbing)" on May 5, 2021
aaron-prindle changed the title from "Feature Request: skaffold trace information (design + plumbing)" to "Feature Request: add skaffold trace information (design + plumbing)" on May 5, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue on May 17, 2021
…rters
What is the problem being solved? Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance-critical skaffold functions (identified in go/cloud-trace-skaffold). Also added four trace exporters: gcp-skaffold, gcp-adc, stdout, and jaeger. This PR uses env-var-based enabling/disabling of tracing, for simplicity and to keep it hidden from users for now.
Why is this the best approach? Using opentelemetry tracing is the obvious choice, as we already use open telemetry libs for metrics and it is becoming the metrics/tracing standard. Using an env var in this PR and integrating the flag setup later was considered optimal, since skaffold tracing will currently be used for benchmarking and bottleneck identification in select use cases while the user-facing UX with jaeger, etc. is still being worked out. Additionally, there was the possibility of building tracing directly into skaffold events, but given the current wrapper setup in pkg/skaffold/instrumentation/trace.go (with the minimal code required) and the fact that many trace locations will not be event locations (e.g., how long it takes to hash a file), it makes sense not to integrate them.
What other approaches did you consider? N/A
What side effects will this approach have? There shouldn't be any side effects with this approach, as the default "off" for tracing and the minimal user visibility for now should mean it is used only experimentally for select use cases. I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.
What future work remains to be done? Future work includes wiring a --trace flag through dev, build, deploy, etc. and working out how skaffold might do distributed tracing with other tools (minikube, buildpacks, etc.).
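As a rough illustration of the env-var-gated approach described above, a wrapper along the following lines could install a real exporter only when SKAFFOLD_TRACE is set and otherwise leave otel's default no-op provider in place. This is a hypothetical sketch, not the actual contents of pkg/skaffold/instrumentation/trace.go: the function names are invented, only the stdout exporter case is shown, and the stdouttrace import path comes from a newer otel release than the one pinned at the time.

```go
package instrumentation

import (
	"context"
	"fmt"
	"os"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	"go.opentelemetry.io/otel/trace"
)

// InitTracerFromEnv (hypothetical name) installs a real tracer provider only
// when SKAFFOLD_TRACE is set; when it is unset, spans stay no-ops and add no cost.
func InitTracerFromEnv() error {
	switch os.Getenv("SKAFFOLD_TRACE") {
	case "":
		return nil // tracing disabled
	case "stdout":
		exp, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
		if err != nil {
			return err
		}
		otel.SetTracerProvider(sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp)))
		return nil
	default:
		// The gcp-skaffold, gcp-adc, and jaeger exporters would be wired up similarly.
		return fmt.Errorf("unsupported SKAFFOLD_TRACE value %q", os.Getenv("SKAFFOLD_TRACE"))
	}
}

// StartTrace keeps call sites in instrumented functions to a single line.
func StartTrace(ctx context.Context, name string) (context.Context, trace.Span) {
	return otel.Tracer("skaffold").Start(ctx, name)
}
```

A performance-critical function would then simply call StartTrace(ctx, "deploy/apply") and defer span.End(), paying essentially nothing when tracing is off.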
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue on May 24, 2021
…rters
What is the problem being solved? Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance-critical skaffold functions (identified in go/cloud-trace-skaffold). Also added four trace exporters: gcp-skaffold, gcp-adc, stdout, and jaeger. This PR uses env-var-based enabling/disabling of tracing, for simplicity and to keep it hidden from users for now.
Why is this the best approach? Using opentelemetry tracing is the obvious choice, as we already use open telemetry libs for metrics and it is becoming the metrics/tracing standard. Using an env var in this PR and integrating the flag setup later was considered optimal, since skaffold tracing will currently be used for benchmarking and bottleneck identification in select use cases while the user-facing UX with jaeger, etc. is still being worked out.
What other approaches did you consider? There was the possibility of building tracing directly into skaffold events, but given the current wrapper setup in pkg/skaffold/instrumentation/trace.go (with the minimal code required) and the fact that many trace locations will not be event locations (e.g., how long it takes to hash a file), it makes sense not to integrate them.
What side effects will this approach have? There shouldn't be any side effects with this approach, as the default "off" for tracing and the minimal user visibility for now should mean it is used only experimentally for select use cases. I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.
What future work remains to be done? Future work includes wiring a --trace flag through dev, build, deploy, etc. and working out how skaffold might do distributed tracing with other tools (minikube, buildpacks, etc.). Additionally, the ability to allow more sporadic sampling (vs. AlwaysSample) should be added. Some future work mentioned in PR review: OTEL_TRACES_EXPORTER=* support (vs. SKAFFOLD_TRACE).
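To make these future-work items concrete, the sketch below shows how a provider could honor the standard OTEL_TRACES_EXPORTER variable and sample only a fraction of traces instead of using AlwaysSample. This is a speculative illustration of the suggested follow-ups, not implemented skaffold behavior; only the stdout exporter is handled and the 10% sampling ratio is arbitrary.

```go
package main

import (
	"os"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	// Hypothetical: key off the standard OTEL_TRACES_EXPORTER env var rather
	// than a skaffold-specific SKAFFOLD_TRACE variable.
	if os.Getenv("OTEL_TRACES_EXPORTER") != "stdout" {
		return // leave the default no-op provider in place
	}

	exp, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
	if err != nil {
		panic(err)
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exp),
		// Sample ~10% of traces instead of sdktrace.AlwaysSample().
		sdktrace.WithSampler(sdktrace.TraceIDRatioBased(0.1)),
	)
	otel.SetTracerProvider(tp)
}
```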
aaron-prindle
added a commit
to aaron-prindle/skaffold
that referenced
this issue
May 24, 2021
…rters What is the problem being solved? Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold). Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger. This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly. Why is this the best approach? Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard. Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out. What other approaches did you consider? There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations (eg: how long to hash a file, etc.) it makes sense to not integrate them. What side effects will this approach have? There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally. I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold. What future work remains to be done? Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.). Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added. Some future work mentioned in PR review included: - OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle
added a commit
to aaron-prindle/skaffold
that referenced
this issue
May 25, 2021
…rters What is the problem being solved? Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold). Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger. This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly. Why is this the best approach? Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard. Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out. What other approaches did you consider? There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations (eg: how long to hash a file, etc.) it makes sense to not integrate them. What side effects will this approach have? There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally. I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold. What future work remains to be done? Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.). Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added. Some future work mentioned in PR review included: - OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle
added a commit
to aaron-prindle/skaffold
that referenced
this issue
May 25, 2021
…rters What is the problem being solved? Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold). Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger. This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly. Why is this the best approach? Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard. Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out. What other approaches did you consider? There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations (eg: how long to hash a file, etc.) it makes sense to not integrate them. What side effects will this approach have? There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally. I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold. What future work remains to be done? Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.). Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added. Some future work mentioned in PR review included: - OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle
added a commit
to aaron-prindle/skaffold
that referenced
this issue
May 25, 2021
…rters What is the problem being solved? Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold). Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger. This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly. Why is this the best approach? Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard. Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out. What other approaches did you consider? There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations (eg: how long to hash a file, etc.) it makes sense to not integrate them. What side effects will this approach have? There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally. I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold. What future work remains to be done? Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.). Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added. Some future work mentioned in PR review included: - OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle
added a commit
to aaron-prindle/skaffold
that referenced
this issue
May 25, 2021
…rters What is the problem being solved? Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold). Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger. This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly. Why is this the best approach? Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard. Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out. What other approaches did you consider? There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations (eg: how long to hash a file, etc.) it makes sense to not integrate them. What side effects will this approach have? There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally. I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold. What future work remains to be done? Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.). Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added. Some future work mentioned in PR review included: - OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle
added a commit
to aaron-prindle/skaffold
that referenced
this issue
May 25, 2021
…rters What is the problem being solved? Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold). Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger. This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly. Why is this the best approach? Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard. Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out. What other approaches did you consider? There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations (eg: how long to hash a file, etc.) it makes sense to not integrate them. What side effects will this approach have? There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally. I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold. What future work remains to be done? Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.). Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added. Some future work mentioned in PR review included: - OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle
added a commit
to aaron-prindle/skaffold
that referenced
this issue
May 25, 2021
…rters What is the problem being solved? Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold). Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger. This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly. Why is this the best approach? Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard. Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out. What other approaches did you consider? There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations (eg: how long to hash a file, etc.) it makes sense to not integrate them. What side effects will this approach have? There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally. I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold. What future work remains to be done? Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.). Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added. Some future work mentioned in PR review included: - OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle
added a commit
to aaron-prindle/skaffold
that referenced
this issue
May 25, 2021
…rters What is the problem being solved? Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold). Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger. This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly. Why is this the best approach? Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard. Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out. What other approaches did you consider? There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations (eg: how long to hash a file, etc.) it makes sense to not integrate them. What side effects will this approach have? There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally. I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold. What future work remains to be done? Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.). Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added. Some future work mentioned in PR review included: - OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle
added a commit
to aaron-prindle/skaffold
that referenced
this issue
May 25, 2021
…rters What is the problem being solved? Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold). Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger. This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly. Why is this the best approach? Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard. Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out. What other approaches did you consider? There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations (eg: how long to hash a file, etc.) it makes sense to not integrate them. What side effects will this approach have? There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally. I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold. What future work remains to be done? Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.). Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added. Some future work mentioned in PR review included: - OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle
added a commit
to aaron-prindle/skaffold
that referenced
this issue
May 26, 2021
…rters What is the problem being solved? Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold). Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger. This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly. Why is this the best approach? Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard. Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out. What other approaches did you consider? There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations (eg: how long to hash a file, etc.) it makes sense to not integrate them. What side effects will this approach have? There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally. I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold. What future work remains to be done? Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.). Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added. Some future work mentioned in PR review included: - OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle
added a commit
to aaron-prindle/skaffold
that referenced
this issue
May 26, 2021
…rters What is the problem being solved? Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold). Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger. This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly. Why is this the best approach? Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard. Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out. What other approaches did you consider? There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations (eg: how long to hash a file, etc.) it makes sense to not integrate them. What side effects will this approach have? There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally. I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold. What future work remains to be done? Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.). Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added. Some future work mentioned in PR review included: - OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle
added a commit
to aaron-prindle/skaffold
that referenced
this issue
May 26, 2021
…rters What is the problem being solved? Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold). Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger. This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly. Why is this the best approach? Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard. Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out. What other approaches did you consider? There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations (eg: how long to hash a file, etc.) it makes sense to not integrate them. What side effects will this approach have? There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally. I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold. What future work remains to be done? Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.). Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added. Some future work mentioned in PR review included: - OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle
added a commit
to aaron-prindle/skaffold
that referenced
this issue
May 26, 2021
…rters What is the problem being solved? Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold). Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger. This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly. Why is this the best approach? Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard. Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out. What other approaches did you consider? There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations (eg: how long to hash a file, etc.) it makes sense to not integrate them. What side effects will this approach have? There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally. I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold. What future work remains to be done? Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.). Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added. Some future work mentioned in PR review included: - OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
Fixed with #5854
For getting more information about skaffold's performance, and to ensure that skaffold does not have performance degradation over time, skaffold should add trace information to skaffold commands. Currently skaffold metrics can track the length of a skaffold dev session, but there is no "child" trace information showing how long specific actions took - yaml parsing, builds, deploys, etc. (a sketch of this kind of child-span instrumentation follows the list below). The steps that I have outlined for this work include:
- update the opentelemetry (otel) libs (v0.13.0 -> v0.20.0) for skaffold (to use updated APIs & trace implementation) - update otel libs from v0.13.0 -> v0.20.0 #5757
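As a rough illustration of the "child" trace information the issue asks for, the sketch below wraps one dev-loop iteration in a parent span and each step in a child span. The function and span names are invented for this example and are not skaffold's actual instrumentation.

```go
// Sketch: parent/child spans around the steps of one dev-loop iteration.
package example

import (
	"context"

	"go.opentelemetry.io/otel"
)

func runDevIteration(ctx context.Context) error {
	tracer := otel.Tracer("skaffold")

	// Parent span covering one dev-loop iteration.
	ctx, iteration := tracer.Start(ctx, "dev_iteration")
	defer iteration.End()

	// Child spans record how long each step took: parsing, building, deploying.
	for _, step := range []string{"parse_yaml", "build_artifacts", "deploy"} {
		_, span := tracer.Start(ctx, step)
		doStep(step)
		span.End()
	}
	return nil
}

// doStep is a placeholder; in skaffold each step would be the real
// parse/build/deploy logic whose duration the child span captures.
func doStep(name string) {}
```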