AsynchronousMetrics Cannot read from file /sys/block/rbd4/stat #34702

Closed
electrical opened this issue Feb 17, 2022 · 14 comments

Comments

@electrical

Describe the unexpected behaviour

At the moment we are receiving this alert every second:

<Error> void DB::AsynchronousMetrics::update(std::chrono::system_clock::time_point): Code: 74, e.displayText() = DB::ErrnoException: Cannot read from file /sys/block/rbd4/stat

However, checking the filesystem indicates that the file is there.

clickhouse@chi-clickhouse-main-0-0-0:/$ cat /sys/block/rbd4/stat
     181      272    18778      202       57       41      784      207        0      340      100        0        0        0        0

clickhouse@chi-clickhouse-main-0-0-0:/$ ls -lash /sys/block/rbd4/stat
0 -r--r--r-- 1 root root 4.0K Feb 17 22:13 /sys/block/rbd4/stat

ClickHouse version: 21.8.14.5
OS: Docker/Ubuntu (running the official yandex/clickhouse-server Docker image)
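
For readers unfamiliar with the file in question, the line printed by `cat /sys/block/rbd4/stat` above can be decoded with a short sketch like the one below. The field names follow the Linux kernel's block-layer `stat` documentation; the function name `parse_block_stat` and the `_ms` suffixes are illustrative choices, not anything taken from ClickHouse. The number of fields depends on the kernel version, so the sketch accepts any prefix of the known list.

```python
from typing import Dict

# Field order as documented for /sys/block/<dev>/stat in the Linux kernel docs.
# Names are illustrative; tick values are in milliseconds.
STAT_FIELDS = (
    "read_ios", "read_merges", "read_sectors", "read_ticks_ms",
    "write_ios", "write_merges", "write_sectors", "write_ticks_ms",
    "in_flight", "io_ticks_ms", "time_in_queue_ms",
    "discard_ios", "discard_merges", "discard_sectors", "discard_ticks_ms",
    "flush_ios", "flush_ticks_ms",
)


def parse_block_stat(line: str) -> Dict[str, int]:
    """Map the whitespace-separated counters of a stat line to field names."""
    values = [int(token) for token in line.split()]
    if len(values) > len(STAT_FIELDS):
        raise ValueError(f"unexpected number of fields: {len(values)}")
    # zip() stops at the shorter sequence, so older kernels with fewer
    # fields simply yield a smaller dictionary.
    return dict(zip(STAT_FIELDS, values))


# The exact line reported in this issue (15 fields).
sample = (
    "     181      272    18778      202       57       41      784"
    "      207        0      340      100        0        0        0        0"
)
stats = parse_block_stat(sample)
print(stats["read_ios"], stats["write_sectors"], stats["io_ticks_ms"])  # 181 784 340
```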

@dhwell

dhwell commented Feb 18, 2022

2022.02.18 14:31:06.003181 [ 197 ] {} <Error> void DB::AsynchronousMetrics::update(std::chrono::system_clock::time_point): Code: 74, e.displayText() = DB::ErrnoException: Cannot read from file /sys/block/sdc/stat, errno: 19, strerror: No such device, Stack trace (when copying this message, always include the lines below):

We had the same issue on k8s.

But the stat file does exist.

@alexey-milovidov
Member

alexey-milovidov commented Feb 19, 2022

It may be related to Linux namespaces in containers.
The error message is correct.

This is related to gathering of system metrics inside ClickHouse and does not affect queries processing and normal server operation. You can ignore this message.

alexey-milovidov added the "not planned" label (Known issue, no plans to fix it currently) Feb 19, 2022
alexey-milovidov self-assigned this Feb 19, 2022
@electrical
Author

Hi @alexey-milovidov, I strongly disagree with your statement and with closing the issue. Can you please re-open it?
Although it doesn't affect query processing, it does fill up the log file with useless information, since the error is logged every second.
It's interesting that the other block devices don't have this issue (I have rbd0 to rbd8).

@alexey-milovidov
Member

Will it be sufficient if we change the log level of these messages to "Debug"?

@den-crane
Contributor

den-crane commented Feb 21, 2022

related #32807 #28852

@SaltTan
Contributor

SaltTan commented Feb 21, 2022

I was getting an exception like this one after a disk was replaced in a server.
A restart fixed the issue.

@electrical
Author

> Will it be sufficient if we change log level of these messages to "Debug"?

That is one option, yes, but I'm still confused as to why we can access the file inside the container while ClickHouse can't.
Restarting ClickHouse solves it for a while, but at some point the errors start again.

@jpiper

jpiper commented Mar 10, 2022

@alexey-milovidov I think setting the log level to debug makes sense. I can't just ignore these logs in our production environment, as they're really frequent and they confuse people who aren't aware that these logs do not indicate a problem.

not a contribution

@jun0tpyrc

Same issue for me; it is spamming our log pipelines.

@aadant

aadant commented Jun 2, 2022

Can you make that happen (debug instead of error)?

@filimonov
Contributor

filimonov commented Jun 6, 2022

Related to #24416

Similar to #27031 and #33639

I think that exception should either be suppressed entirely or maybe reported only once.
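
The two proposals in this thread (demote the message to debug, or report it only once) can be combined. The sketch below is a hypothetical illustration in Python, not ClickHouse code: the class name `OncePerKeyReporter` and its methods are invented for the example. It logs the first failure for a given device at error level and every repeat at debug level, until the device is marked healthy again.

```python
import logging
from typing import Set


class OncePerKeyReporter:
    """Hypothetical helper: log the first failure per key loudly, repeats quietly."""

    def __init__(self, log: logging.Logger) -> None:
        self.log = log
        self.seen: Set[str] = set()

    def report(self, key: str, message: str) -> bool:
        """Return True if the message was logged at error level."""
        if key in self.seen:
            # Repeat for a key that has already been reported: keep it out
            # of the error log.
            self.log.debug(message)
            return False
        self.seen.add(key)
        self.log.error(message)
        return True

    def clear(self, key: str) -> None:
        """Forget a key once it reads successfully, so a new failure is reported."""
        self.seen.discard(key)


reporter = OncePerKeyReporter(logging.getLogger("async_metrics_sketch"))
first = reporter.report("rbd4", "Cannot read from file /sys/block/rbd4/stat")
second = reporter.report("rbd4", "Cannot read from file /sys/block/rbd4/stat")
```

With a once-per-second update loop, this would turn a continuous stream of identical errors into a single error line per affected device.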

@Slach
Contributor

Slach commented Jun 28, 2022

@electrical
According to
https://github.com/ClickHouse/ClickHouse/blame/master/src/Interpreters/AsynchronousMetrics.cpp#L1082-L1096
the device list should be re-created after an error.
This change is available in 21.9+.
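
The re-creation described here can be pictured with the following sketch. It rests on an assumption that is not confirmed anywhere in this thread: that the metrics collector keeps one open descriptor per `/sys/block/<device>/stat` file and re-reads it from offset 0. Under that assumption a descriptor can go stale when a device is detached and re-attached (errno 19, "No such device", as in the log above), while a fresh `cat` of the same path succeeds. The class `BlockStatReader` and its methods are hypothetical names for illustration only.

```python
import os
from typing import Dict


class BlockStatReader:
    """Hypothetical reader: one cached descriptor per device, rebuilt on error."""

    def __init__(self, sys_block: str = "/sys/block") -> None:
        self.sys_block = sys_block
        self.fds: Dict[str, int] = {}

    def close(self) -> None:
        for fd in self.fds.values():
            try:
                os.close(fd)
            except OSError:
                pass  # descriptor was already invalid
        self.fds.clear()

    def open_devices(self) -> None:
        """Drop all cached descriptors and re-scan the device directory."""
        self.close()
        for name in sorted(os.listdir(self.sys_block)):
            path = os.path.join(self.sys_block, name, "stat")
            try:
                self.fds[name] = os.open(path, os.O_RDONLY)
            except OSError:
                continue  # device vanished between listing and opening

    def read_all(self) -> Dict[str, str]:
        """Read every cached stat file; rebuild the device list if any read fails."""
        result: Dict[str, str] = {}
        stale = False
        for name, fd in self.fds.items():
            try:
                # pread at offset 0 re-reads the file without re-opening it.
                result[name] = os.pread(fd, 4096, 0).decode().strip()
            except OSError:
                stale = True
        if stale:
            # The next call works with fresh descriptors.
            self.open_devices()
        return result
```

In this sketch `open_devices()` would be called once at startup and `read_all()` on every metrics tick; a failed read costs one incomplete sample instead of an error on every subsequent tick.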

alexey-milovidov added the st-fixed label and removed the "not planned" label (Known issue, no plans to fix it currently) Aug 8, 2022
@Slach
Contributor

Slach commented Aug 8, 2022

@alexey-milovidov could you share the pull request where the issue was fixed?

@filimonov
Contributor

filimonov commented Jan 13, 2023

#33639 (22.2)
#44895 (23.1)
