feat(instrumentation): add OpenTelemetry tracing and metrics with basic configurations #5175
@@ -35,6 +35,12 @@

| `port_monitoring` | The port on which the Prometheus server is exposed, default is a random port between [49152, 65535] | `string` | `random in [49152, 65535]` |
| `retries` | Number of retries per gRPC call. If <0 it defaults to max(3, num_replicas) | `number` | `-1` |
| `floating` | If set, the current Pod/Deployment cannot be further chained, and the next `.add()` will chain after the last Pod/Deployment, not this current one. | `boolean` | `False` |
| `tracing` | If set, the SDK implementation of the OpenTelemetry tracer will be available and enabled for automatic tracing of requests and custom span creation. Otherwise a no-op implementation will be provided. | `boolean` | `False` |
| `span_exporter_host` | If tracing is enabled, this hostname will be used to configure the trace exporter agent. | `string` | `None` |
| `span_exporter_port` | If tracing is enabled, this port will be used to configure the trace exporter agent. | `number` | `None` |
| `metrics` | If set, the SDK implementation of the OpenTelemetry metrics will be available for default monitoring and custom measurements. Otherwise a no-op implementation will be provided. | `boolean` | `False` |
| `metrics_exporter_host` | If metrics is enabled, this hostname will be used to configure the metrics exporter agent. | `string` | `None` |
| `metrics_exporter_port` | If metrics is enabled, this port will be used to configure the metrics exporter agent. | `number` | `None` |
| `install_requirements` | If set, install `requirements.txt` of the Hub Executor bundle locally | `boolean` | `False` |
| `force_update` | If set, always pull the latest Hub Executor bundle even if it exists locally | `boolean` | `False` |
| `compression` | The compression mechanism used when sending requests from the Head to the WorkerRuntimes. For more details, check https://grpc.github.io/grpc/python/grpc.html#compression. | `string` | `None` |

Inline review thread:

I don't understand the sentence. Isn't it going to overlap with the

Yes, my intention is to use the same terms as OpenTelemetry. If people read the OpenTelemetry documentation then the terms are aligned.

Will be renamed to
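To make the new arguments concrete, here is a minimal, hypothetical usage sketch in Python. It assumes the instrumentation arguments from the table above are accepted by the `Flow` constructor (they may equally apply per-Executor via `.add()`), and the exporter host/port values are placeholders for wherever an OpenTelemetry collector or agent is listening; neither is confirmed by this PR.

```python
from jina import Flow

# Hypothetical sketch: enable tracing and metrics with the arguments documented above.
# Argument placement (Flow-level vs. per-Executor `.add()`) and the endpoint values
# are assumptions, not something this PR confirms.
f = Flow(
    tracing=True,                       # use the OpenTelemetry SDK tracer instead of the no-op one
    span_exporter_host='localhost',     # host of the trace exporter agent/collector
    span_exporter_port=4317,            # port of the trace exporter agent/collector
    metrics=True,                       # use the OpenTelemetry metrics SDK instead of the no-op one
    metrics_exporter_host='localhost',  # host of the metrics exporter agent/collector
    metrics_exporter_port=4317,         # port of the metrics exporter agent/collector
)

with f:
    f.block()  # serve until interrupted; spans and metrics are exported while requests flow
```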
I know this is a small thing that I mentioned already, so sorry to be a PITA about this, but I really think we should switch these around to `host/port_span_exporter` to align them with the nomenclature of the Prometheus feature. It's the small things that make a good user experience, imo.
@JohannesMessner what name would you suggest?
The `port_monitoring` won't exist in the near future and there will be only the `span_exporter` attributes. I'm generally used to seeing and using `_host` as the suffix rather than as a prefix.
But then we might introduce a breaking change, right? We need to be careful.
We can deprecate an argument if needed, but this should be thought through ahead of time.
@girishc13 could you show here what the relevant arguments would be in this near future where `port_monitoring` does not exist?
Also, you can think of the naming in terms of the YAML configuration for the OpenTelemetry Collector. The hierarchy that I'm implicitly used to is: dependency -> service -> host, port, .... So this naturally follows the convention of `service.host` and `service.port`.
I'm not a big fan of deprecating our current `port_monitoring` so quickly after it was introduced, but if it leads to a nicer and more unified experience moving forward then we'll have to do it.
But apart from the argument naming, am I understanding correctly that, according to this plan, the user won't be able to use Prometheus to collect metrics anymore? Or will the setup on the user side remain the same, and we only change the way we expose these metrics from our internals? Because on the OTel Collector side I still see some Prometheus logos, but some of them are not connected to the system, so I am a bit lost.
If that is the case, then I don't think we should remove the current way users set up their metrics pipeline. That would be a huge breaking change.
The main concern, from my understanding, is introducing a breaking change for the metrics data, which requires a new setup. Do we have data on how many users are using the Prometheus client for monitoring, apart from JCloud users? Also, the lack of interoperability between OpenTelemetry monitoring and Prometheus monitoring makes it a bit hard to just remove the current monitoring setup.
I can think of the following ways to tackle this:
I would declare the old metrics system as deprecated (TO BE REMOVED in a couple of minor releases) and go with the full OpenTelemetry approach.
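If the team goes that route, a lightweight way to phase out the old argument is a standard Python `DeprecationWarning`. The sketch below is purely illustrative; the function name and argument handling are hypothetical, not the project's actual code:

```python
import warnings


def handle_port_monitoring(port_monitoring=None):
    """Illustrative only: warn when the legacy Prometheus-only argument is used."""
    if port_monitoring is not None:
        warnings.warn(
            '`port_monitoring` is deprecated and will be removed in a couple of minor releases; '
            'configure metrics via the OpenTelemetry arguments instead '
            '(`metrics`, `metrics_exporter_host`, `metrics_exporter_port`).',
            DeprecationWarning,
            stacklevel=2,
        )
    return port_monitoring
```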
The official Prometheus library already supports the OpenTelemetry APIs and SDKs. The OpenTelemetry Collector also supports scraping data from the existing Prometheus client. We might need some elaborate configuration for metrics and the OpenTelemetry Collector to support the existing mechanism, but OpenTelemetry is the way to go.
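As a rough illustration of that interoperability, the OpenTelemetry Python SDK can expose its metrics on a Prometheus scrape endpoint via the `opentelemetry-exporter-prometheus` package, so an existing Prometheus server can keep scraping while the code moves to the OpenTelemetry API. This is a generic sketch, not Jina code; the service name and port are arbitrary examples:

```python
# pip install opentelemetry-sdk opentelemetry-exporter-prometheus prometheus-client
from prometheus_client import start_http_server

from opentelemetry import metrics
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.resources import Resource

# Expose a Prometheus scrape endpoint (the port here is an arbitrary example).
start_http_server(port=9464)

# Route OpenTelemetry metrics through the Prometheus reader so the existing
# Prometheus server can keep scraping them.
reader = PrometheusMetricReader()
provider = MeterProvider(
    resource=Resource.create({'service.name': 'example-service'}),
    metric_readers=[reader],
)
metrics.set_meter_provider(provider)

meter = metrics.get_meter('example')
request_counter = meter.create_counter('requests')  # exported in Prometheus format by the reader above
request_counter.add(1)
```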