[receiver/mysqlreceiver] Mysql receiver - Uint64 to Int64 overflow #35495

Closed
dloucasfx opened this issue Sep 30, 2024 · 10 comments · Fixed by #36914
Labels
bug Something isn't working help wanted Extra attention is needed receiver/mysql

Comments

@dloucasfx
Contributor

dloucasfx commented Sep 30, 2024

Component(s)

receiver/mysql

What happened?

Description

A user is hitting this error:
2024-09-12T10:05:08.788Z error scraperhelper/scrapercontroller.go:197 Error scraping metrics {"kind": "receiver", "name": "mysql/replica", "data_type": "metrics", "error": "sql: Scan error on column index 7, name "SUM_TIMER_FETCH": converting driver.Value type uint64 ("10607806269779284266") to a int64: value out of range; sql: Scan error on column index 7, name "SUM_TIMER_FETCH": converting driver.Value type uint64 ("10607806392347803736") to a int64: value out of range; failed to parse int64 for MysqlBufferPoolPages, value was 18446744073709551264: strconv.ParseInt: parsing "18446744073709551264": value out of range", "scraper": "mysql"}

Analysis

  • For SUM_TIMER_FETCH, we can safely avoid the overflow by storing timeFetch as a uint64 and converting it to int64 only after the division:

    m.mb.RecordMysqlTableIoWaitTimeDataPoint(
    now, s.timeFetch/picosecondsInNanoseconds, metadata.AttributeIoWaitsOperationsFetch, s.name, s.schema,
    )

  • For MysqlBufferPoolPages, it is not clear whether we support uint64 at all, or what options we have for sending this value.
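
To illustrate the first point, here is a minimal sketch of dividing in the unsigned domain before converting. The helper name picosToNanos is hypothetical and not taken from the receiver; only the constant name picosecondsInNanoseconds comes from the snippet above. The largest possible uint64 divided by 1000 is far below the int64 maximum, so the final conversion cannot overflow.

```go
package main

import (
	"fmt"
	"math"
)

// picosecondsInNanoseconds mirrors the constant name used in the snippet above.
const picosecondsInNanoseconds = 1000

// picosToNanos (hypothetical helper) divides while the value is still
// unsigned, then converts. math.MaxUint64/1000 is well below math.MaxInt64,
// so the conversion is always in range.
func picosToNanos(picos uint64) int64 {
	return int64(picos / picosecondsInNanoseconds)
}

func main() {
	raw := uint64(10607806269779284266) // value taken from the error log
	fmt.Println(raw > math.MaxInt64)    // true: it does not fit in int64 as-is
	fmt.Println(picosToNanos(raw))      // 10607806269779284
}
```

This only works if the column is scanned into a uint64 field in the first place; scanning into int64 fails before any division can happen.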

Collector version

v0.110.0

@dloucasfx dloucasfx added bug Something isn't working needs triage New item requiring triage labels Sep 30, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dmitryax
Member

dmitryax commented Oct 1, 2024

OTLP doesn't support uint64, unfortunately.

mysql.buffer_pool.pages is marked as non-monotonic sum. But such a huge number makes it sound like it's actually a plain monotonic sum. In that case, we can probably just drop the first bit of uint64, because we care more about the difference between the scrapes rather than the magnitude of each value...
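
As a rough sketch of what "dropping the first bit" could look like, assuming the value has already been parsed as a uint64: the function name dropTopBit is hypothetical. Clearing the most significant bit always yields a value that fits in int64, and the difference between two scrapes is preserved as long as both raw values lie on the same side of 2^63.

```go
package main

import "fmt"

// dropTopBit (hypothetical helper) clears the most significant bit so the
// result always fits in a non-negative int64.
func dropTopBit(v uint64) int64 {
	return int64(v &^ (1 << 63))
}

func main() {
	prev := uint64(18446744073709551264) // first value from the error log
	curr := uint64(18446744073709551265)

	fmt.Println(dropTopBit(prev))                    // 9223372036854775456
	fmt.Println(dropTopBit(curr) - dropTopBit(prev)) // 1
}
```

The absolute magnitude is lost, so this is only meaningful if the metric really is a monotonic counter.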

@crobert-1
Member

Removing needs triage based on a path forward being proposed by a project maintainer.

@crobert-1 crobert-1 removed the needs triage New item requiring triage label Oct 1, 2024
@dloucasfx
Contributor Author

mysql.buffer_pool.pages is marked as non-monotonic sum. But such a huge number makes it sound like it's actually a plain monotonic sum. In that case, we can probably just drop the first bit of uint64, because we care more about the difference between the scrapes rather than the magnitude of each value...
@dmitryax
Going through the logs, it does not look like a monotonically increasing counter. I used awk to drop duplicate values while preserving their order of first appearance.

perl -nle'print $& while m{failed to parse int64 for MysqlBufferPoolPages, value was\s+\K\d+}g' ~/Downloads/extract-2024-10-02T01_12_38.009Z.csv | awk '!a[$0]++'
18446744073709551264
18446744073709551265
18446744073709551266
18446744073709551271
18446744073709551267
18446744073709551276
18446744073709551272
18446744073709551270
18446744073709551284
18446744073709551306
18446744073709551269
18446744073709551275
18446744073709551285
18446744073709551287
18446744073709551321
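
One observation that may help when reading these numbers: they all sit just below 2^64, so reinterpreting the same 64 bits as a signed integer gives small negative values. The sketch below only does that arithmetic (asSigned is a hypothetical name); it would be consistent with an unsigned underflow somewhere upstream, but it does not prove one.

```go
package main

import "fmt"

// asSigned (hypothetical helper) reinterprets the 64 bits of v as a
// two's-complement signed integer.
func asSigned(v uint64) int64 {
	return int64(v)
}

func main() {
	// A few of the values extracted from the logs above.
	values := []uint64{
		18446744073709551264,
		18446744073709551271,
		18446744073709551321,
	}
	for _, v := range values {
		fmt.Println(v, "->", asSigned(v))
	}
	// 18446744073709551264 -> -352
	// 18446744073709551271 -> -345
	// 18446744073709551321 -> -295
}
```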

Example of full log statement showing the timestamp and value not always incrementing:

1- 18446744073709551264 at 2024-10-02T01:12:07.891Z

"2024-10-02T01:12:07.891Z","""splunk-otel-collector-k8s-cluster-receiver-6864bbf5-pc6px""","""splunk-otel-collector""","2024-10-02T01:12:06.163Z        error   scraperhelper/scrapercontroller.go:197  Error         scraping metrics  {""kind"": ""receiver"", ""name"": ""mysql/primary"", ""data_type"": ""metrics"", ""error"": ""failed to parse int64 for MysqlBufferPoolPages, value was 18446744073709551264: strconv.ParseInt:    parsing \""18446744073709551264\"": value out of range"", ""scraper"": ""mysql""}"

2- 18446744073709551271 at 2024-10-02T00:21:24.036Z

"2024-10-02T00:21:24.036Z","""splunk-otel-collector-k8s-cluster-receiver-6864bbf5-pc6px""","""splunk-otel-collector""","2024-10-02T00:12:06.162Z        error   scraperhelper/scrapercontroller.go:197  Error         scraping metrics  {""kind"": ""receiver"", ""name"": ""mysql/primary"", ""data_type"": ""metrics"", ""error"": ""failed to parse int64 for MysqlBufferPoolPages, value was 18446744073709551271: strconv.ParseInt:    parsing \""18446744073709551271\"": value out of range"", ""scraper"": ""mysql""}

3- 18446744073709551269 at 2024-10-01T22:15:54.667Z

"2024-10-01T22:15:54.667Z","""splunk-otel-collector-k8s-cluster-receiver-6864bbf5-pc6px""","""splunk-otel-collector""","2024-10-01T22:06:06.155Z        error   scraperhelper/scrapercontroller.go:197  Error         scraping metrics  {""kind"": ""receiver"", ""name"": ""mysql/primary"", ""data_type"": ""metrics"", ""error"": ""failed to parse int64 for MysqlBufferPoolPages, value was 18446744073709551269: strconv.ParseInt:    parsing \""18446744073709551269\"": value out of range"", ""scraper"": ""mysql""}"

Contributor

github-actions bot commented Dec 3, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Dec 3, 2024
@mohammedx3

Any updates on this?
I am seeing the same issue on version 0.110.3.

@djaglowski djaglowski added help wanted Extra attention is needed and removed Stale labels Dec 3, 2024
@atoulme
Contributor

atoulme commented Dec 18, 2024

I'm not sure this is it, but here is a modest attempt. Since time is reported in picoseconds, the scraper currently divides all values by 1000 at recording time. I moved the division up into the SQL query, so we get nanoseconds back instead, and those are unlikely to overflow.
This fix doesn't solve the whole problem, but it has the benefit of being a no-op functionality-wise.

Let's see if it passes review: #36879

@djaglowski
Member

Closed by #36879

@atoulme
Contributor

atoulme commented Dec 20, 2024

Hang on, I don't know how to address MysqlBufferPoolPages yet.

@djaglowski djaglowski reopened this Dec 20, 2024
@atoulme
Contributor

atoulme commented Dec 20, 2024

I have done more digging and found this interesting tidbit:
https://dev.mysql.com/doc/refman/8.4/en/server-status-variables.html#statvar_Innodb_buffer_pool_pages_misc

When using compressed tables, Innodb_buffer_pool_pages_misc may report an out-of-bounds value (Bug #59550).

Here is the bug:
https://bugs.mysql.com/bug.php?id=59550

The bug mentions this type of value:

Seeing same on 5.5.24 on Debian Squeeze, with compression enabled on one huge table.

| Innodb_buffer_pool_pages_data         | 116915               |
| Innodb_buffer_pool_pages_dirty        | 206                  |
| Innodb_buffer_pool_pages_misc         | 18446744073709532994 |
| Innodb_buffer_pool_pages_total        | 98304                |

I recommend we deal with this issue for this particular metric by not recording it and not reporting an error, instead choosing to report a warning with a reference to the bug report. Please let me know if this path is advisable.

EDIT: I have opened this PR for your review which implements the fix outlined in this comment.
#36914
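
For readers who want a feel for the approach recommended above, here is a minimal sketch, not the code from the PR. The function name parseStat and its return shape are hypothetical. It relies only on the standard library: strconv.ParseInt reports an out-of-range value with an error that wraps strconv.ErrRange, which errors.Is can detect.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"strconv"
)

// parseStat (hypothetical helper) parses a status variable as int64.
// ok is false when the value should not be recorded. For
// Innodb_buffer_pool_pages_misc an out-of-range value is logged as a
// warning and skipped instead of being returned as an error.
func parseStat(name, raw string) (value int64, ok bool, err error) {
	value, err = strconv.ParseInt(raw, 10, 64)
	if err == nil {
		return value, true, nil
	}
	if name == "Innodb_buffer_pool_pages_misc" && errors.Is(err, strconv.ErrRange) {
		log.Printf("warning: skipping %s=%s: out-of-bounds value, see https://bugs.mysql.com/bug.php?id=59550", name, raw)
		return 0, false, nil
	}
	return 0, false, err
}

func main() {
	// Value quoted from the MySQL bug report above: skipped with a warning.
	fmt.Println(parseStat("Innodb_buffer_pool_pages_misc", "18446744073709532994")) // 0 false <nil>
	// An in-range value is returned normally.
	fmt.Println(parseStat("Innodb_buffer_pool_pages_total", "98304")) // 98304 true <nil>
}
```

Any other variable with an out-of-range value still surfaces as an error, so the special case stays limited to the one stat covered by the bug report.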

djaglowski pushed a commit that referenced this issue Jan 6, 2025
…36914)

#### Description
Ignore the value returned by the stat Innodb_buffer_pool_pages_misc if out-of-bounds.

#### Link to tracking issue
Fixes #35495

#### Testing
Add an integration test.
AkhigbeEromo pushed a commit to sematext/opentelemetry-collector-contrib that referenced this issue Jan 13, 2025
…pen-telemetry#36914)

6 participants