From cb4f9f2125c00d6344276573bc0ac8b2110278bb Mon Sep 17 00:00:00 2001 From: Anton Pirker Date: Thu, 24 Nov 2022 13:30:38 +0100 Subject: [PATCH 1/9] Initial sceleton --- src/docs/sdk/data-handling.mdx | 48 ++++++++++++++++++++++++++-------- 1 file changed, 37 insertions(+), 11 deletions(-) diff --git a/src/docs/sdk/data-handling.mdx b/src/docs/sdk/data-handling.mdx index 5fb67d9079..5fc51cbf55 100644 --- a/src/docs/sdk/data-handling.mdx +++ b/src/docs/sdk/data-handling.mdx @@ -15,19 +15,19 @@ and is **disabled by default**. That means that data that is naturally sensitive Some examples of data guarded by this flag: - - When attaching HTTP requests to events - - Request Body: "raw" bodies (bodies which cannot be parsed as JSON or formdata) are removed - - HTTP Headers: known sensitive headers such as `Authorization` or `Cookies` are removed too. - - *Note* that if a user explicitly sets a request on the scope, nothing is stripped from that request. The above rules only apply to integrations that come with the SDK. - - User-specific information (e.g. the current user ID according to the used web-framework) is not sent at all. - - On desktop applications - - The username logged in the device is not included. This is often a person's name. - - The machine name is not included, for example `Bruno's laptop` - - SDKs don't set `{{auto}}` as `user.ip`. This instructs the server to keep the connection's IP address.* +- When attaching HTTP requests to events + - Request Body: "raw" bodies (bodies which cannot be parsed as JSON or formdata) are removed + - HTTP Headers: known sensitive headers such as `Authorization` or `Cookies` are removed too. + - _Note_ that if a user explicitly sets a request on the scope, nothing is stripped from that request. The above rules only apply to integrations that come with the SDK. +- User-specific information (e.g. the current user ID according to the used web-framework) is not sent at all. 
+- On desktop applications + - The username logged in the device is not included. This is often a person's name. + - The machine name is not included, for example `Bruno's laptop` +- SDKs don't set `{{auto}}` as `user.ip`. This instructs the server to keep the connection's IP address.\* * Specifically about IP address, it's important to note that it's standard to log IP address of incoming connecting in services on the Internet. -This not only allows security tools and operations to understand abuse coming from a single IP, like spam bots and other issues. -But also developers to understand if issues in their application are being triggered by a single malicious source. + This not only allows security tools and operations to understand abuse coming from a single IP, like spam bots and other issues. + But also developers to understand if issues in their application are being triggered by a single malicious source. Sentry server is always aware of the connecting IP address and can use it for logging in some platforms. Namely JavaScript and iOS/macOS/tvOS. All other platforms require the event to include `user.ip={{auto}}` which happens if `sendDefaultPii` is set to true. @@ -51,6 +51,32 @@ Some examples of auto instrumentation that could attach sensitive data: - Desktop apps including window title. - A Web framework routing instrumentation attaching route `to` and `from`. +## Structuring Data + +For better data scrubbing on the server side, SDKs should save data in a strucutured way, when possible. + +Starting point of the discussion was this [RFC-0038](https://github.com/getsentry/rfcs/blob/main/text/0038-scrubbing-sensitive-data.md) + +Here a list of data that can be collected and how it should be stored: + +### Spans + +- `http` spans containing urls: + + tbd + +- `db` spans containing database queries: (sql, graphql, elasticsearch, mongodb, ...) 
+ + tbd + +### Breadcrumbs + +tbd + +### Local variables + +tbd + ## Variable Size Fields in the event payload that allow user-specified or dynamic values are restricted in size. This applies to most meta data fields, such as variables in a stack trace, as well as contexts, tags and extra data: From 282f29c6b5fe1e05f737a78805936e26cfd74607 Mon Sep 17 00:00:00 2001 From: Michi Hoffmann Date: Mon, 28 Nov 2022 14:18:57 +0100 Subject: [PATCH 2/9] Update data-handling.mdx --- src/docs/sdk/data-handling.mdx | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/src/docs/sdk/data-handling.mdx b/src/docs/sdk/data-handling.mdx index 5fc51cbf55..89746abe84 100644 --- a/src/docs/sdk/data-handling.mdx +++ b/src/docs/sdk/data-handling.mdx @@ -63,7 +63,16 @@ Here a list of data that can be collected and how it should be stored: - `http` spans containing urls: - tbd + The description of http spans must follow the format `HTTP_METHOD scheme://host/path` (`GET https://example.com/foo`). + If an authority is present in the URL (`https://username:password@example.com`), the authority must be omitted completely. + If query strings or fragments are present in the URL, both are set into the data attribute of the span. + + ```js + span.setData({ + 'http.query': url.getQuery(), + 'http.fragment: url.getFragment(), + }) + ``` - `db` spans containing database queries: (sql, graphql, elasticsearch, mongodb, ...) 
From c5e19170232ff27007a6150e61977d3279a09044 Mon Sep 17 00:00:00 2001 From: Anton Pirker Date: Thu, 1 Dec 2022 16:53:31 +0100 Subject: [PATCH 3/9] Added otel semantic conventions --- src/docs/sdk/data-handling.mdx | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/src/docs/sdk/data-handling.mdx b/src/docs/sdk/data-handling.mdx index 89746abe84..21fd2efcf0 100644 --- a/src/docs/sdk/data-handling.mdx +++ b/src/docs/sdk/data-handling.mdx @@ -53,17 +53,15 @@ Some examples of auto instrumentation that could attach sensitive data: ## Structuring Data -For better data scrubbing on the server side, SDKs should save data in a strucutured way, when possible. - -Starting point of the discussion was this [RFC-0038](https://github.com/getsentry/rfcs/blob/main/text/0038-scrubbing-sensitive-data.md) - -Here a list of data that can be collected and how it should be stored: +For better data scrubbing on the server side, SDKs should save data in a strucutured way, when possible. Starting point of the discussion was [RFC-0038](https://github.com/getsentry/rfcs/blob/main/text/0038-scrubbing-sensitive-data.md) ### Spans +This helps Relay to know what kind of data it receives and this helps with scrubbing sensitive data. + - `http` spans containing urls: - The description of http spans must follow the format `HTTP_METHOD scheme://host/path` (`GET https://example.com/foo`). + The description of spans with `op` set to `http` must follow the format `HTTP_METHOD scheme://host/path` (ex. `GET https://example.com/foo`). If an authority is present in the URL (`https://username:password@example.com`), the authority must be omitted completely. If query strings or fragments are present in the URL, both are set into the data attribute of the span. @@ -74,17 +72,20 @@ Here a list of data that can be collected and how it should be stored: }) ``` -- `db` spans containing database queries: (sql, graphql, elasticsearch, mongodb, ...) 
+ Additionally all semantic conventions of OpenTelementry for http spans should be set in the `span.data` if applicable: + https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/http/ - tbd +- `db` spans containing database queries: (sql, graphql, elasticsearch, mongodb, ...) -### Breadcrumbs + The `description` fields should include the saniticed database command. All sensitive data should be removed and replaced with a placeholder. -tbd + Additionally all semantic conventions of OpenTelementry for database spans should be set in the `span.data` if applicable: + https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/database/ -### Local variables +### Breadcrumbs -tbd +If the `message` in a breadcrumb contains an URL it should be formatted the same way as in `http` spans (see above). +The query and the fragment should also be set in the data attribute like with `http` spans. ## Variable Size From 39c4b1b5e0a26af57d013dd5ec06fa301e8843dd Mon Sep 17 00:00:00 2001 From: Anton Pirker Date: Wed, 14 Dec 2022 14:06:53 +0100 Subject: [PATCH 4/9] Update src/docs/sdk/data-handling.mdx Co-authored-by: Michi Hoffmann --- src/docs/sdk/data-handling.mdx | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/docs/sdk/data-handling.mdx b/src/docs/sdk/data-handling.mdx index 21fd2efcf0..3454fa217f 100644 --- a/src/docs/sdk/data-handling.mdx +++ b/src/docs/sdk/data-handling.mdx @@ -77,7 +77,8 @@ This helps Relay to know what kind of data it receives and this helps with scrub - `db` spans containing database queries: (sql, graphql, elasticsearch, mongodb, ...) - The `description` fields should include the saniticed database command. All sensitive data should be removed and replaced with a placeholder. +The description of spans with `op` set to `db` must not include any query parameters. 
+Instead, use placeholders like `SELECT FROM 'users' WHERE id = ?` Additionally all semantic conventions of OpenTelementry for database spans should be set in the `span.data` if applicable: https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/database/ From 9fcca59084df1f33b0c6bd7c0c7e5ff71087ca48 Mon Sep 17 00:00:00 2001 From: Anton Pirker Date: Wed, 14 Dec 2022 14:14:10 +0100 Subject: [PATCH 5/9] Auto formatting --- src/docs/sdk/data-handling.mdx | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/src/docs/sdk/data-handling.mdx b/src/docs/sdk/data-handling.mdx index 3454fa217f..d43edbaec6 100644 --- a/src/docs/sdk/data-handling.mdx +++ b/src/docs/sdk/data-handling.mdx @@ -24,10 +24,9 @@ Some examples of data guarded by this flag: - The username logged in the device is not included. This is often a person's name. - The machine name is not included, for example `Bruno's laptop` - SDKs don't set `{{auto}}` as `user.ip`. This instructs the server to keep the connection's IP address.\* - -* Specifically about IP address, it's important to note that it's standard to log IP address of incoming connecting in services on the Internet. - This not only allows security tools and operations to understand abuse coming from a single IP, like spam bots and other issues. - But also developers to understand if issues in their application are being triggered by a single malicious source. +- Specifically about IP address, it's important to note that it's standard to log IP address of incoming connecting in services on the Internet. + - This not only allows security tools and operations to understand abuse coming from a single IP, like spam bots and other issues. + - But also developers to understand if issues in their application are being triggered by a single malicious source. Sentry server is always aware of the connecting IP address and can use it for logging in some platforms. 
Namely JavaScript and iOS/macOS/tvOS. All other platforms require the event to include `user.ip={{auto}}` which happens if `sendDefaultPii` is set to true. @@ -67,9 +66,9 @@ This helps Relay to know what kind of data it receives and this helps with scrub ```js span.setData({ - 'http.query': url.getQuery(), - 'http.fragment: url.getFragment(), - }) + "http.query": url.getQuery(), + "http.fragment": url.getFragment(), + }); ``` Additionally all semantic conventions of OpenTelementry for http spans should be set in the `span.data` if applicable: @@ -77,11 +76,11 @@ This helps Relay to know what kind of data it receives and this helps with scrub - `db` spans containing database queries: (sql, graphql, elasticsearch, mongodb, ...) -The description of spans with `op` set to `db` must not include any query parameters. -Instead, use placeholders like `SELECT FROM 'users' WHERE id = ?` + The description of spans with `op` set to `db` must not include any query parameters. + Instead, use placeholders like `SELECT FROM 'users' WHERE id = ?` - Additionally all semantic conventions of OpenTelementry for database spans should be set in the `span.data` if applicable: - https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/database/ +Additionally all semantic conventions of OpenTelementry for database spans should be set in the `span.data` if applicable: +https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/database/ ### Breadcrumbs From d92805ff7ad572728be6869ff03a9960705fbce8 Mon Sep 17 00:00:00 2001 From: Anton Pirker Date: Wed, 11 Jan 2023 11:30:58 +0100 Subject: [PATCH 6/9] Update src/docs/sdk/data-handling.mdx Co-authored-by: Manoel Aranda Neto <5731772+marandaneto@users.noreply.github.com> --- src/docs/sdk/data-handling.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/docs/sdk/data-handling.mdx b/src/docs/sdk/data-handling.mdx index d43edbaec6..bfc67dee36 100644 --- 
a/src/docs/sdk/data-handling.mdx +++ b/src/docs/sdk/data-handling.mdx @@ -15,7 +15,7 @@ and is **disabled by default**. That means that data that is naturally sensitive Some examples of data guarded by this flag: -- When attaching HTTP requests to events +- When attaching HTTP requests and responses to events - Request Body: "raw" bodies (bodies which cannot be parsed as JSON or formdata) are removed - HTTP Headers: known sensitive headers such as `Authorization` or `Cookies` are removed too. - _Note_ that if a user explicitly sets a request on the scope, nothing is stripped from that request. The above rules only apply to integrations that come with the SDK. From a974157981a718ff71a2d6fa72a21815ef47407e Mon Sep 17 00:00:00 2001 From: Anton Pirker Date: Wed, 11 Jan 2023 14:13:03 +0100 Subject: [PATCH 7/9] Made breadcrumbs more explicit. --- src/docs/sdk/data-handling.mdx | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/src/docs/sdk/data-handling.mdx b/src/docs/sdk/data-handling.mdx index bfc67dee36..ec090f4a6b 100644 --- a/src/docs/sdk/data-handling.mdx +++ b/src/docs/sdk/data-handling.mdx @@ -8,7 +8,7 @@ Data handling is the standardized context in how we want SDKs help users filter ## Sensitive Data SDKs should not include PII or other sensitive data in the payload by default. -When building an SDK we can come across to some API that can give useful information to debug a problem. +When building an SDK we can come across some API that can give useful information to debug a problem. In the event that API returns data considered PII, we guard that behind a flag called _Send Default PII_. This is an option in the SDK called [_send-default-pii_](https://docs.sentry.io/platforms/python/configuration/options/#send-default-pii) and is **disabled by default**. That means that data that is naturally sensitive is not sent by default. @@ -16,14 +16,14 @@ and is **disabled by default**. 
That means that data that is naturally sensitive Some examples of data guarded by this flag: - When attaching HTTP requests and responses to events - - Request Body: "raw" bodies (bodies which cannot be parsed as JSON or formdata) are removed + - Request Body: "raw" HTTP bodies (bodies which cannot be parsed as JSON or formdata) are removed - HTTP Headers: known sensitive headers such as `Authorization` or `Cookies` are removed too. - _Note_ that if a user explicitly sets a request on the scope, nothing is stripped from that request. The above rules only apply to integrations that come with the SDK. - User-specific information (e.g. the current user ID according to the used web-framework) is not sent at all. - On desktop applications - The username logged in the device is not included. This is often a person's name. - The machine name is not included, for example `Bruno's laptop` -- SDKs don't set `{{auto}}` as `user.ip`. This instructs the server to keep the connection's IP address.\* +- SDKs don't set `{{auto}}` as `user.ip`. This instructs the server to keep the connection's IP address. - Specifically about IP address, it's important to note that it's standard to log IP address of incoming connecting in services on the Internet. - This not only allows security tools and operations to understand abuse coming from a single IP, like spam bots and other issues. - But also developers to understand if issues in their application are being triggered by a single malicious source. @@ -87,6 +87,22 @@ https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions If the `message` in a breadcrumb contains an URL it should be formatted the same way as in `http` spans (see above). The query and the fragment should also be set in the data attribute like with `http` spans. 
+```js +getCurrentHub().addBreadcrumb({ + type: "http", + category: "xhr", + data: { + method: "POST", + url: "https://example.com/api/users/create.php", + "http.query": "username=ada&password=123&newsletter=0", + "http.fragment": "#foo", + }, +}); +``` + +Additionally all semantic conventions of OpenTelemetry for database spans should be set in the `data` if applicable: +https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/database/ + ## Variable Size Fields in the event payload that allow user-specified or dynamic values are restricted in size. This applies to most meta data fields, such as variables in a stack trace, as well as contexts, tags and extra data: From 3717ed22d8ccfd063cd94866b3729159e38d4083 Mon Sep 17 00:00:00 2001 From: Anton Pirker Date: Wed, 11 Jan 2023 14:25:34 +0100 Subject: [PATCH 8/9] Clarified how IP addresses are handled, and removed general info not related to Sentry. --- src/docs/sdk/data-handling.mdx | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/docs/sdk/data-handling.mdx b/src/docs/sdk/data-handling.mdx index ec090f4a6b..7ec7b4101d 100644 --- a/src/docs/sdk/data-handling.mdx +++ b/src/docs/sdk/data-handling.mdx @@ -24,9 +24,7 @@ Some examples of data guarded by this flag: - The username logged in the device is not included. This is often a person's name. - The machine name is not included, for example `Bruno's laptop` - SDKs don't set `{{auto}}` as `user.ip`. This instructs the server to keep the connection's IP address. -- Specifically about IP address, it's important to note that it's standard to log IP address of incoming connecting in services on the Internet. - - This not only allows security tools and operations to understand abuse coming from a single IP, like spam bots and other issues. - - But also developers to understand if issues in their application are being triggered by a single malicious source. +- Server SDKs remove the IP address of incoming HTTP requests. 
Sentry server is always aware of the connecting IP address and can use it for logging in some platforms. Namely JavaScript and iOS/macOS/tvOS. All other platforms require the event to include `user.ip={{auto}}` which happens if `sendDefaultPii` is set to true. From f42708ed7fd5291261eaa9b19a145ddefd7f8d26 Mon Sep 17 00:00:00 2001 From: Anton Pirker Date: Wed, 11 Jan 2023 14:26:48 +0100 Subject: [PATCH 9/9] Wording --- src/docs/sdk/data-handling.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/docs/sdk/data-handling.mdx b/src/docs/sdk/data-handling.mdx index 7ec7b4101d..2267f558a7 100644 --- a/src/docs/sdk/data-handling.mdx +++ b/src/docs/sdk/data-handling.mdx @@ -15,7 +15,7 @@ and is **disabled by default**. That means that data that is naturally sensitive Some examples of data guarded by this flag: -- When attaching HTTP requests and responses to events +- When attaching data of HTTP requests and/or responses to events - Request Body: "raw" HTTP bodies (bodies which cannot be parsed as JSON or formdata) are removed - HTTP Headers: known sensitive headers such as `Authorization` or `Cookies` are removed too. - _Note_ that if a user explicitly sets a request on the scope, nothing is stripped from that request. The above rules only apply to integrations that come with the SDK.
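
The `http` span rules introduced in this series (description as `HTTP_METHOD scheme://host/path`, authority credentials omitted, query and fragment moved into span data) could be sketched roughly as follows. This is only an illustration and not part of the patches: `describeHttpSpan` is a hypothetical helper name, not an SDK API, and it assumes the standard WHATWG `URL` class that is available globally in browsers and Node.js. The query is stored without its leading `?` and the fragment with its leading `#`, mirroring the breadcrumb example in patch 7.

```js
// Hypothetical helper (not a Sentry SDK function): builds a scrubbed span
// description plus the data attributes described in the "Spans" section.
function describeHttpSpan(method, rawUrl) {
  const url = new URL(rawUrl);

  // Rebuild the URL from protocol, host, and path only, so that any
  // `username:password@` userinfo is left out of the description.
  const description = `${method.toUpperCase()} ${url.protocol}//${url.host}${url.pathname}`;

  // Query string and fragment go into the span data, not the description.
  const data = {};
  if (url.search) {
    data["http.query"] = url.search.slice(1); // drop the leading "?"
  }
  if (url.hash) {
    data["http.fragment"] = url.hash; // keeps the leading "#"
  }

  return { description, data };
}

const span = describeHttpSpan(
  "get",
  "https://username:password@example.com/foo?a=1&b=2#frag"
);
console.log(span.description); // GET https://example.com/foo
console.log(span.data);
```

An SDK integration would presumably pass `description` to the span and hand `data` to something like the `span.setData(...)` call shown in the diff; how that wiring looks in a given SDK is outside what this sketch assumes.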