diff --git a/.chloggen/961.yaml b/.chloggen/961.yaml
new file mode 100644
index 0000000000..b2b5becd07
--- /dev/null
+++ b/.chloggen/961.yaml
@@ -0,0 +1,22 @@
+# Use this changelog template to create an entry for release notes.
+#
+# If your change doesn't affect end users you should instead start
+# your pull request title with [chore] or use the "Skip Changelog" label.
+
+# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
+change_type: bug_fix
+
+# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db)
+component: url
+
+# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
+note: Query string values are now redacted by default due to concerns around leaking sensitive data.
+
+# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
+# The values here must be integers.
+issues: [ 860 ]
+
+# (Optional) One or more lines of additional information to render under the primary note.
+# These lines will be padded with 2 spaces and then inserted directly into the document.
+# Use pipe (|) for multiline entries.
+subtext:
diff --git a/docs/attributes-registry/url.md b/docs/attributes-registry/url.md
index 69ca3fb8f3..4a46f5ec8c 100644
--- a/docs/attributes-registry/url.md
+++ b/docs/attributes-registry/url.md
@@ -12,11 +12,11 @@ linkTitle: URL
 | `url.domain` | string | Domain extracted from the `url.full`, such as "opentelemetry.io". [1] | `www.foo.bar`; `opentelemetry.io`; `3.12.167.2`; `[1080:0:0:0:8:800:200C:417A]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
 | `url.extension` | string | The file extension extracted from the `url.full`, excluding the leading dot. [2] | `png`; `gz` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
 | `url.fragment` | string | The [URI fragment](https://www.rfc-editor.org/rfc/rfc3986#section-3.5) component | `SemConv` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
-| `url.full` | string | Absolute URL describing a network resource according to [RFC3986](https://www.rfc-editor.org/rfc/rfc3986) [3] | `https://www.foo.bar/search?q=OpenTelemetry#SemConv`; `//localhost` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
+| `url.full` | string | Absolute URL describing a network resource according to [RFC3986](https://www.rfc-editor.org/rfc/rfc3986) [3] | `https://www.foo.bar/search?q=REDACTED&v=REDACTED#SemConv`; `//localhost` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 | `url.original` | string | Unmodified original URL as seen in the event source. [4] | `https://www.foo.bar/search?q=OpenTelemetry#SemConv`; `search?q=OpenTelemetry` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
 | `url.path` | string | The [URI path](https://www.rfc-editor.org/rfc/rfc3986#section-3.3) component [5] | `/search` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 | `url.port` | int | Port extracted from the `url.full` | `443` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
-| `url.query` | string | The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component [6] | `q=OpenTelemetry` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
+| `url.query` | string | The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component [6] | `q=REDACTED&v=REDACTED` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 | `url.registered_domain` | string | The highest registered url domain, stripped of the subdomain. [7] | `example.com`; `foo.co.uk` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
 | `url.scheme` | string | The [URI scheme](https://www.rfc-editor.org/rfc/rfc3986#section-3.1) component identifying the used protocol. | `https`; `ftp`; `telnet` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 | `url.subdomain` | string | The subdomain portion of a fully qualified domain name includes all of the names except the host name under the registered_domain. In a partially qualified domain, or if the qualification level of the full name cannot be determined, subdomain contains all of the names below the registered domain. [8] | `east`; `sub2.sub1` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
@@ -28,14 +28,17 @@ linkTitle: URL
 
 **[3]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.
 `url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`.
-`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.
+`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed).
+Query string values SHOULD be redacted by default and replaced by the value `REDACTED`, e.g. `https://www.example.com/path?q=REDACTED&v=REDACTED` (the query string keys SHOULD be preserved).
+Instrumentation SHOULD provide a configuration option to capture the full query string without any redaction.
 
 **[4]:** In network monitoring, the observed URL may be a full URL, whereas in access logs, the URL is often just represented as a path. This field is meant to represent the URL as it was observed, complete or not.
 `url.original` might contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case password and username SHOULD NOT be redacted and attribute's value SHOULD remain the same.
 
 **[5]:** Sensitive content provided in `url.path` SHOULD be scrubbed when instrumentations can identify it.
 
-**[6]:** Sensitive content provided in `url.query` SHOULD be scrubbed when instrumentations can identify it.
+**[6]:** Query string values SHOULD be redacted by default and replaced by the value `REDACTED`, e.g. `q=REDACTED&v=REDACTED` (the query string keys SHOULD be preserved).
+Instrumentation SHOULD provide a configuration option to capture the full query string without any redaction.
 
 **[7]:** This value can be determined precisely with the [public suffix list](http://publicsuffix.org). For example, the registered domain for `foo.example.com` is `example.com`. Trying to approximate this by simply taking the last two labels will not work well for TLDs such as `co.uk`.
 
diff --git a/docs/database/elasticsearch.md b/docs/database/elasticsearch.md
index f95b4ffe80..6440cda155 100644
--- a/docs/database/elasticsearch.md
+++ b/docs/database/elasticsearch.md
@@ -57,7 +57,9 @@ Tracing instrumentations that do so, MUST also set `http.request.method_original
 
 **[3]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.
 `url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`.
-`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.
+`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed).
+Query string values SHOULD be redacted by default and replaced by the value `REDACTED`, e.g. `https://www.example.com/path?q=REDACTED&v=REDACTED` (the query string keys SHOULD be preserved).
+Instrumentation SHOULD provide a configuration option to capture the full query string without any redaction.
 
 **[4]:** Many Elasticsearch url paths allow dynamic values. These SHOULD be recorded in span attributes in the format `db.elasticsearch.path_parts.<key>`, where `<key>` is the url path part name. The implementation SHOULD reference the [elasticsearch schema](https://raw.githubusercontent.com/elastic/elasticsearch-specification/main/output/schema/schema.json) in order to map the path part values to their names.
 
diff --git a/docs/http/http-spans.md b/docs/http/http-spans.md
index 73a6915e58..9c402b791c 100644
--- a/docs/http/http-spans.md
+++ b/docs/http/http-spans.md
@@ -127,7 +127,7 @@ For an HTTP client span, `SpanKind` MUST be `Client`.
 | [`http.request.method`](../attributes-registry/http.md) | string | HTTP request method. [1] | `GET`; `POST`; `HEAD` | `Required` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 | [`server.address`](../attributes-registry/server.md) | string | Host identifier of the ["URI origin"](https://www.rfc-editor.org/rfc/rfc9110.html#name-uri-origin) HTTP request is sent to. [2] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Required` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 | [`server.port`](../attributes-registry/server.md) | int | Port identifier of the ["URI origin"](https://www.rfc-editor.org/rfc/rfc9110.html#name-uri-origin) HTTP request is sent to. [3] | `80`; `8080`; `443` | `Required` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
-| [`url.full`](../attributes-registry/url.md) | string | Absolute URL describing a network resource according to [RFC3986](https://www.rfc-editor.org/rfc/rfc3986) [4] | `https://www.foo.bar/search?q=OpenTelemetry#SemConv`; `//localhost` | `Required` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
+| [`url.full`](../attributes-registry/url.md) | string | Absolute URL describing a network resource according to [RFC3986](https://www.rfc-editor.org/rfc/rfc3986) [4] | `https://www.foo.bar/search?q=REDACTED&v=REDACTED#SemConv`; `//localhost` | `Required` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 | [`error.type`](../attributes-registry/error.md) | string | Describes a class of error the operation ended with. [5] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If request has ended with an error. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 | [`http.request.method_original`](../attributes-registry/http.md) | string | Original HTTP method sent by the client in the request line. | `GeT`; `ACL`; `foo` | `Conditionally Required` [6] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 | [`http.response.status_code`](../attributes-registry/http.md) | int | [HTTP response status code](https://tools.ietf.org/html/rfc7231#section-6). | `200` | `Conditionally Required` If and only if one was received/sent. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
@@ -163,7 +163,9 @@ Tracing instrumentations that do so, MUST also set `http.request.method_original
 
 **[4]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.
 `url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`.
-`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.
+`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed).
+Query string values SHOULD be redacted by default and replaced by the value `REDACTED`, e.g. `https://www.example.com/path?q=REDACTED&v=REDACTED` (the query string keys SHOULD be preserved).
+Instrumentation SHOULD provide a configuration option to capture the full query string without any redaction.
 
 **[5]:** If the request fails with an error before response status code was sent or received,
 `error.type` SHOULD be set to exception type (its fully-qualified class name, if applicable)
@@ -332,7 +334,7 @@ For an HTTP server span, `SpanKind` MUST be `Server`.
 | [`http.route`](../attributes-registry/http.md) | string | The matched route, that is, the path template in the format used by the respective server framework. [6] | `/users/:userID?`; `{controller}/{action}/{id?}` | `Conditionally Required` If and only if it's available | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 | [`network.protocol.name`](../attributes-registry/network.md) | string | [OSI application layer](https://osi-model.com/application-layer/) or non-OSI equivalent. [7] | `http`; `spdy` | `Conditionally Required` [8] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 | [`server.port`](../attributes-registry/server.md) | int | Port of the local HTTP server that received the request. [9] | `80`; `8080`; `443` | `Conditionally Required` If `server.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
-| [`url.query`](../attributes-registry/url.md) | string | The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component [10] | `q=OpenTelemetry` | `Conditionally Required` If and only if one was received/sent. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
+| [`url.query`](../attributes-registry/url.md) | string | The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component [10] | `q=REDACTED&v=REDACTED` | `Conditionally Required` If and only if one was received/sent. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 | [`client.address`](../attributes-registry/client.md) | string | Client address - domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [11] | `83.164.160.102` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 | [`network.peer.address`](../attributes-registry/network.md) | string | Peer address of the network connection - IP address or Unix domain socket name. | `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 | [`network.peer.port`](../attributes-registry/network.md) | int | Peer port number of the network connection. | `65123` | `Recommended` If `network.peer.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
@@ -393,7 +395,8 @@ SHOULD include the [application root](/docs/http/http-spans.md#http-server-defin
 
 **[9]:** See [Setting `server.address` and `server.port` attributes](/docs/http/http-spans.md#setting-serveraddress-and-serverport-attributes).
 
-**[10]:** Sensitive content provided in `url.query` SHOULD be scrubbed when instrumentations can identify it.
+**[10]:** Query string values SHOULD be redacted by default and replaced by the value `REDACTED`, e.g. `q=REDACTED&v=REDACTED` (the query string keys SHOULD be preserved).
+Instrumentation SHOULD provide a configuration option to capture the full query string without any redaction.
 
 **[11]:** The IP address of the original client behind all proxies, if known (e.g. from [Forwarded#for](https://developer.mozilla.org/docs/Web/HTTP/Headers/Forwarded#for), [X-Forwarded-For](https://developer.mozilla.org/docs/Web/HTTP/Headers/X-Forwarded-For), or a similar header). Otherwise, the immediate client peer address.
 
diff --git a/docs/url/url.md b/docs/url/url.md
index a33c8f046b..e7a00d4067 100644
--- a/docs/url/url.md
+++ b/docs/url/url.md
@@ -26,18 +26,21 @@ This document defines semantic conventions that describe URL and its components.
 | Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
 |---|---|---|---|---|---|
 | [`url.fragment`](../attributes-registry/url.md) | string | The [URI fragment](https://www.rfc-editor.org/rfc/rfc3986#section-3.5) component | `SemConv` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
-| [`url.full`](../attributes-registry/url.md) | string | Absolute URL describing a network resource according to [RFC3986](https://www.rfc-editor.org/rfc/rfc3986) [1] | `https://www.foo.bar/search?q=OpenTelemetry#SemConv`; `//localhost` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
+| [`url.full`](../attributes-registry/url.md) | string | Absolute URL describing a network resource according to [RFC3986](https://www.rfc-editor.org/rfc/rfc3986) [1] | `https://www.foo.bar/search?q=REDACTED&v=REDACTED#SemConv`; `//localhost` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 | [`url.path`](../attributes-registry/url.md) | string | The [URI path](https://www.rfc-editor.org/rfc/rfc3986#section-3.3) component [2] | `/search` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
-| [`url.query`](../attributes-registry/url.md) | string | The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component [3] | `q=OpenTelemetry` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
+| [`url.query`](../attributes-registry/url.md) | string | The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component [3] | `q=REDACTED&v=REDACTED` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 | [`url.scheme`](../attributes-registry/url.md) | string | The [URI scheme](https://www.rfc-editor.org/rfc/rfc3986#section-3.1) component identifying the used protocol. | `https`; `ftp`; `telnet` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 
 **[1]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.
 `url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`.
-`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.
+`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed).
+Query string values SHOULD be redacted by default and replaced by the value `REDACTED`, e.g. `https://www.example.com/path?q=REDACTED&v=REDACTED` (the query string keys SHOULD be preserved).
+Instrumentation SHOULD provide a configuration option to capture the full query string without any redaction.
 
 **[2]:** Sensitive content provided in `url.path` SHOULD be scrubbed when instrumentations can identify it.
 
-**[3]:** Sensitive content provided in `url.query` SHOULD be scrubbed when instrumentations can identify it.
+**[3]:** Query string values SHOULD be redacted by default and replaced by the value `REDACTED`, e.g. `q=REDACTED&v=REDACTED` (the query string keys SHOULD be preserved).
+Instrumentation SHOULD provide a configuration option to capture the full query string without any redaction.
 
 ## Sensitive information
 
diff --git a/model/registry/url.yaml b/model/registry/url.yaml
index e0bd823f1c..ee7994c704 100644
--- a/model/registry/url.yaml
+++ b/model/registry/url.yaml
@@ -42,8 +42,12 @@ groups:
           In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`.
 
           `url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed).
-          Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.
-        examples: ['https://www.foo.bar/search?q=OpenTelemetry#SemConv', '//localhost']
+
+          Query string values SHOULD be redacted by default and replaced by the value `REDACTED`, e.g.
+          `https://www.example.com/path?q=REDACTED&v=REDACTED` (the query string keys SHOULD be preserved).
+
+          Instrumentation SHOULD provide a configuration option to capture the full query string without any redaction.
+        examples: ['https://www.foo.bar/search?q=REDACTED&v=REDACTED#SemConv', '//localhost']
       - id: original
         type: string
         stability: experimental
@@ -75,9 +79,12 @@ groups:
         type: string
         brief: >
           The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component
-        examples: ["q=OpenTelemetry"]
+        examples: ["q=REDACTED&v=REDACTED"]
         note: >
-          Sensitive content provided in `url.query` SHOULD be scrubbed when instrumentations can identify it.
+          Query string values SHOULD be redacted by default and replaced by the value `REDACTED`, e.g.
+          `q=REDACTED&v=REDACTED` (the query string keys SHOULD be preserved).
+
+          Instrumentation SHOULD provide a configuration option to capture the full query string without any redaction.
       - id: registered_domain
         type: string
         stability: experimental
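
The redaction rule this change introduces for `url.query` and `url.full` (replace every query string value with `REDACTED`, keep the keys, allow an opt-out) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `redact_query` and `redact_url` and the `capture_full_query` flag are hypothetical and are not defined by the conventions, and credential redaction in the authority part of the URL is deliberately left out.

```python
from urllib.parse import urlsplit, urlunsplit

PLACEHOLDER = "REDACTED"  # replacement value named in the convention text


def redact_query(query: str) -> str:
    """Replace each query string value with the placeholder, keeping keys."""
    redacted = []
    for pair in query.split("&"):
        key, sep, _value = pair.partition("=")
        # A pair without "=" has no value to redact, so it is kept as-is.
        redacted.append(key + sep + PLACEHOLDER if sep else key)
    return "&".join(redacted)


def redact_url(url: str, capture_full_query: bool = False) -> str:
    """Return the URL with query values redacted.

    `capture_full_query` is a hypothetical stand-in for the configuration
    option that lets users opt out of redaction.
    """
    if capture_full_query:
        return url
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query=redact_query(parts.query)))


if __name__ == "__main__":
    print(redact_query("q=OpenTelemetry&v=1"))
    print(redact_url("https://www.foo.bar/search?q=OpenTelemetry&v=1#SemConv"))
```

Splitting on `&` by hand instead of using `parse_qsl` and `urlencode` avoids re-encoding the keys, so the preserved keys stay byte-for-byte as they were observed.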