Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: add skaffold trace information (design + plumbing) #5756

Closed
3 tasks done
aaron-prindle opened this issue Apr 29, 2021 · 2 comments
Closed
3 tasks done
Assignees
Labels
Milestone

Comments

@aaron-prindle
Copy link
Contributor

aaron-prindle commented Apr 29, 2021

For getting more information about skaffold's performance and ensuring that skaffold does not have performance degradation over time, skaffold should adding trace information to skaffold commands. Currently skaffold metrics can track the length of a skaffold dev session but there is no "child" trace information showing how specific actions took - yaml parsing, builds, deploys, etc.

The steps that I have outlined for this work include:

aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Apr 29, 2021
What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Updating out libs to the latest otel version adds additional useful functionality for tracing.

Why is this the best approach?
This approach uses go mod (updated via "go get <pkg>") and minor API changes to our otel API usage for the update.

What other approaches did you consider?
N/A

What side effects will this approach have?
There shouldn't be any side effects w/ this approach, the changes to otel's API were renaming/moving things (packages, functions, etc.).  The only option removed was stdout Quantile aggregation (stdout.WithQuantiles) but I do not think this will have side effects.  See open-telemetry/opentelemetry-go@49f699d#diff-2b283a7fb9f9b66e31a2b51a9ae9cad3599650a633f02fea9a956c4f6a714c6c

What future work remains to be done?
N/A
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Apr 29, 2021
What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Updating out libs to the latest otel version adds additional useful functionality for tracing.

Why is this the best approach?
This approach uses go mod (updated via "go get <pkg>") and minor API changes to our otel API usage for the update.

What other approaches did you consider?
N/A

What side effects will this approach have?
There shouldn't be any side effects w/ this approach, the changes to otel's API were renaming/moving things (packages, functions, etc.).  The only option removed was stdout Quantile aggregation (stdout.WithQuantiles) but I do not think this will have side effects.  See open-telemetry/opentelemetry-go@49f699d#diff-2b283a7fb9f9b66e31a2b51a9ae9cad3599650a633f02fea9a956c4f6a714c6c

What future work remains to be done?
N/A
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Apr 29, 2021
What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Updating out libs to the latest otel version adds additional useful functionality for tracing.

Why is this the best approach?
This approach uses go mod (updated via "go get <pkg>") and minor API changes to our otel API usage for the update.

What other approaches did you consider?
N/A

What side effects will this approach have?
There shouldn't be any side effects w/ this approach, the changes to otel's API were renaming/moving things (packages, functions, etc.).  The only option removed was stdout Quantile aggregation (stdout.WithQuantiles) but I do not think this will have side effects.  See open-telemetry/opentelemetry-go@49f699d#diff-2b283a7fb9f9b66e31a2b51a9ae9cad3599650a633f02fea9a956c4f6a714c6c

What future work remains to be done?
N/A
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Apr 29, 2021
What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Updating out libs to the latest otel version adds additional useful functionality for tracing.

Why is this the best approach?
This approach uses go mod (updated via "go get <pkg>") and minor API changes to our otel API usage for the update.

What other approaches did you consider?
N/A

What side effects will this approach have?
There shouldn't be any side effects w/ this approach, the changes to otel's API were renaming/moving things (packages, functions, etc.).  The only option removed was stdout Quantile aggregation (stdout.WithQuantiles) but I do not think this will have side effects.  See open-telemetry/opentelemetry-go@49f699d#diff-2b283a7fb9f9b66e31a2b51a9ae9cad3599650a633f02fea9a956c4f6a714c6c

What future work remains to be done?
N/A
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Apr 29, 2021
What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Updating out libs to the latest otel version adds additional useful functionality for tracing.

Why is this the best approach?
This approach uses go mod (updated via "go get <pkg>") and minor API changes to our otel API usage for the update.

What other approaches did you consider?
N/A

What side effects will this approach have?
There shouldn't be any side effects w/ this approach, the changes to otel's API were renaming/moving things (packages, functions, etc.).  The only option removed was stdout Quantile aggregation (stdout.WithQuantiles) but I do not think this will have side effects.  See open-telemetry/opentelemetry-go@49f699d#diff-2b283a7fb9f9b66e31a2b51a9ae9cad3599650a633f02fea9a956c4f6a714c6c

What future work remains to be done?
N/A
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Apr 29, 2021
What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Updating out libs to the latest otel version adds additional useful functionality for tracing.

Why is this the best approach?
This approach uses go mod (updated via "go get <pkg>") and minor API changes to our otel API usage for the update.

What other approaches did you consider?
N/A

What side effects will this approach have?
There shouldn't be any side effects w/ this approach, the changes to otel's API were renaming/moving things (packages, functions, etc.).  The only option removed was stdout Quantile aggregation (stdout.WithQuantiles) but I do not think this will have side effects.  See open-telemetry/opentelemetry-go@49f699d#diff-2b283a7fb9f9b66e31a2b51a9ae9cad3599650a633f02fea9a956c4f6a714c6c

What future work remains to be done?
N/A
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Apr 29, 2021
What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Updating out libs to the latest otel version adds additional useful functionality for tracing.

Why is this the best approach?
This approach uses go mod (updated via "go get <pkg>") and minor API changes to our otel API usage for the update.

What other approaches did you consider?
N/A

What side effects will this approach have?
There shouldn't be any side effects w/ this approach, the changes to otel's API were renaming/moving things (packages, functions, etc.).  The only option removed was stdout Quantile aggregation (stdout.WithQuantiles) but I do not think this will have side effects.  See open-telemetry/opentelemetry-go@49f699d#diff-2b283a7fb9f9b66e31a2b51a9ae9cad3599650a633f02fea9a956c4f6a714c6c

What future work remains to be done?
N/A
@MarlonGamez MarlonGamez added the kind/todo implementation task/epic for the skaffold team label Apr 30, 2021
@MarlonGamez
Copy link
Contributor

trying to make triage-party happy :)

aaron-prindle added a commit that referenced this issue May 2, 2021
What is the problem being solved?
Part of #5756, adding opentelemetry trace information to skaffold commands. Updating out libs to the latest otel version adds additional useful functionality for tracing.

Why is this the best approach?
This approach uses go mod (updated via "go get <pkg>") and minor API changes to our otel API usage for the update.

What other approaches did you consider?
N/A

What side effects will this approach have?
There shouldn't be any side effects w/ this approach, the changes to otel's API were renaming/moving things (packages, functions, etc.).  The only option removed was stdout Quantile aggregation (stdout.WithQuantiles) but I do not think this will have side effects.  See open-telemetry/opentelemetry-go@49f699d#diff-2b283a7fb9f9b66e31a2b51a9ae9cad3599650a633f02fea9a956c4f6a714c6c

What future work remains to be done?
N/A
@aaron-prindle aaron-prindle added this to the v1.24.0 milestone May 3, 2021
@aaron-prindle aaron-prindle self-assigned this May 4, 2021
@aaron-prindle aaron-prindle changed the title Feature Request: Skaffold Performance Observability - skaffold trace information Feature Request: skaffold trace information (design + plumbing) May 5, 2021
@aaron-prindle aaron-prindle changed the title Feature Request: skaffold trace information (design + plumbing) Feature Request: add skaffold trace information (design + plumbing) May 5, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 17, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.  Additionally there was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What other approaches did you consider?
N/A

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 17, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.  Additionally there was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What other approaches did you consider?
N/A

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 17, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.  Additionally there was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What other approaches did you consider?
N/A

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 17, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.  Additionally there was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What other approaches did you consider?
N/A

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 17, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.  Additionally there was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What other approaches did you consider?
N/A

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 17, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.  Additionally there was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What other approaches did you consider?
N/A

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 17, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 17, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.
@tejal29 tejal29 added the priority/p1 High impact feature/bug. label May 17, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 24, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 24, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 24, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 25, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 25, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 25, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 25, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 25, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 25, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 25, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 25, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
@tejal29 tejal29 modified the milestones: v1.25.0, v1.26.0 May 26, 2021
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 26, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 26, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 26, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 26, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 26, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 26, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue May 27, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Jun 1, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Jun 1, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Jun 1, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Jun 1, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Jun 1, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Jun 2, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit to aaron-prindle/skaffold that referenced this issue Jun 2, 2021
…rters

What is the problem being solved?
Part of GoogleContainerTools#5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
aaron-prindle added a commit that referenced this issue Jun 2, 2021
…rters (#5854)

What is the problem being solved?
Part of #5756, adding opentelemetry trace information to skaffold commands. Added trace information to specific performance critical skaffold functions (identified in go/cloud-trace-skaffold).  Also added 4 trace exporters - gcp-skaffold, gcp-adc, stdout, and jaeger.  This PR uses env var based enabling/disabling for the trace for simplicity and to hide it from users directly.

Why is this the best approach?
Using opentelemetry tracing is the obvious choice as we use open telemetry libs for metrics and it is becoming the metrics/tracing standard.  Using an env var in this PR and later integrating the flag setup was considered optimal as currently skaffold tracing will be used for benchmarking/bottleneck-identifying for select use cases while the user facing UX w/ jaeger, etc. is still being worked out.

What other approaches did you consider?
There was the possibility of building tracing directly into skaffold events but I think with the current wrapper setup in pkg/skaffold/instrumentation/trace.go (w/ the minimal code required) and the fact that many trace locations will not be event locations  (eg: how long to hash a file, etc.) it makes sense to not integrate them.

What side effects will this approach have?
There shouldn't be any side effects w/ this approach as the default "off" for tracing and the minimal user visibility for now should mean that it used only for select use cases experimentally.  I have done timing tests with the no-op/empty trace (SKAFFOLD_TRACE unset) and it does not change the performance of skaffold.

What future work remains to be done?
Future work includes wiring up a --trace flag through dev, build, deploy, etc. and working on how skaffold might be able to do distributed tracing w/ other tools (minikube, buildpacks, etc.).  Additionally the ability to allow for more sporadic sampling (vs AlwaysSample) should be added.  Some future work mentioned in PR review included:
- OTEL_TRACES_EXPORTER=* support (vs SKAFFOLD_TRACE)
@aaron-prindle
Copy link
Contributor Author

Fixed with #5854

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants