[BUG]smartctl checksum errors #46

Closed
paulmorabito opened this issue Sep 28, 2020 · 26 comments · Fixed by #88
Labels
bug Something isn't working waiting for response

Comments

@paulmorabito

Hi,

I'm using the linuxserver.io Docker image (latest tag) and am currently getting the following errors when running "scrutiny-collector-metrics run":

`root@abc9cc899866:/# scrutiny-collector-metrics run


 ___   ___  ____  __  __  ____  ____  _  _  _  _
/ __) / __)(  _ \(  )(  )(_  _)(_  _)( \( )( \/ )
\__ \( (__  )   / )(__)(   )(   _)(_  )  (  \  /
(___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
AnalogJ/scrutiny/metrics dev-0.1.13

INFO[0000] Verifying required tools type=metrics
INFO[0000] Sending detected devices to API, for filtering & validation type=metrics
INFO[0000] Main: Waiting for workers to finish type=metrics
INFO[0000] Collecting smartctl results for sdd type=metrics
INFO[0000] Collecting smartctl results for sda type=metrics
INFO[0000] Collecting smartctl results for sdb type=metrics
INFO[0000] Collecting smartctl results for sdc type=metrics
{
"json_format_version": [
1,
0
],
"smartctl": {
"version": [
7,
1
],
"svn_revision": "5022",
"platform_info": "x86_64-linux-4.14.24-qnap",
"build_info": "(local build)",
"argv": [
"smartctl",
"-a",
"-j",
"/dev/sda"
],
"exit_status": 4
},
"device": {
"name": "/dev/sda",
"info_name": "/dev/sda",
"type": "scsi",
"protocol": "SCSI"
},
"vendor": "WDC",
"product": "WD100EMAZ-00WJTA",
"model_name": "WDC WD100EMAZ-00WJTA",
"revision": "83.H",
"scsi_version": "SPC-3",
"user_capacity": {
"blocks": 19532873728,
"bytes": 10000831348736
},
"logical_block_size": 512,
"physical_block_size": 4096,
"rotation_rate": 5400,
"form_factor": {
"scsi_value": 2,
"name": "3.5 inches"
},
"serial_number": "2YJDN6SD",
"device_type": {
"scsi_value": 0,
"name": "disk"
},
"local_time": {
"time_t": 1601292556,
"asctime": "Mon Sep 28 20:29:16 2020 KST"
},
"temperature": {
"current": 0,
"drive_trip": 0
}
}
ERRO[0000] smartctl returned an error code (4) while processing sda type=metrics
ERRO[0000] smartctl detected a checksum error type=metrics
INFO[0000] Publishing smartctl results for unknown type=metrics
{
"json_format_version": [
1,
0
],
"smartctl": {
"version": [
7,
1
],
"svn_revision": "5022",
"platform_info": "x86_64-linux-4.14.24-qnap",
"build_info": "(local build)",
"argv": [
"smartctl",
"-a",
"-j",
"/dev/sdb"
],
"exit_status": 4
},
"device": {
"name": "/dev/sdb",
"info_name": "/dev/sdb",
"type": "scsi",
"protocol": "SCSI"
},
"vendor": "WDC",
"product": "WD100EMAZ-00WJTA",
"model_name": "WDC WD100EMAZ-00WJTA",
"revision": "83.H",
"scsi_version": "SPC-3",
"user_capacity": {
"blocks": 19532873728,
"bytes": 10000831348736
},
"logical_block_size": 512,
"physical_block_size": 4096,
"rotation_rate": 5400,
"form_factor": {
"scsi_value": 2,
"name": "3.5 inches"
},
"serial_number": "2YJ8S5BD",
"device_type": {
"scsi_value": 0,
"name": "disk"
},
"local_time": {
"time_t": 1601292556,
"asctime": "Mon Sep 28 20:29:16 2020 KST"
},
"temperature": {
"current": 0,
"drive_trip": 0
}
}
ERRO[0000] smartctl returned an error code (4) while processing sdb type=metrics
ERRO[0000] smartctl detected a checksum error type=metrics
INFO[0000] Publishing smartctl results for unknown type=metrics
{
"json_format_version": [
1,
0
],
"smartctl": {
"version": [
7,
1
],
"svn_revision": "5022",
"platform_info": "x86_64-linux-4.14.24-qnap",
"build_info": "(local build)",
"argv": [
"smartctl",
"-a",
"-j",
"/dev/sdd"
],
"exit_status": 4
},
"device": {
"name": "/dev/sdd",
"info_name": "/dev/sdd",
"type": "scsi",
"protocol": "SCSI"
},
"vendor": "WDC",
"product": "WD100EMAZ-00WJTA",
"model_name": "WDC WD100EMAZ-00WJTA",
"revision": "83.H",
"scsi_version": "SPC-3",
"user_capacity": {
"blocks": 19532873728,
"bytes": 10000831348736
},
"logical_block_size": 512,
"physical_block_size": 4096,
"rotation_rate": 5400,
"form_factor": {
"scsi_value": 2,
"name": "3.5 inches"
},
"serial_number": "2YJDUTKD",
"device_type": {
"scsi_value": 0,
"name": "disk"
},
"local_time": {
"time_t": 1601292556,
"asctime": "Mon Sep 28 20:29:16 2020 KST"
},
"temperature": {
"current": 0,
"drive_trip": 0
}
}
ERRO[0000] smartctl returned an error code (4) while processing sdd type=metrics
ERRO[0000] smartctl detected a checksum error type=metrics
INFO[0000] Publishing smartctl results for unknown type=metrics
{
"json_format_version": [
1,
0
],
"smartctl": {
"version": [
7,
1
],
"svn_revision": "5022",
"platform_info": "x86_64-linux-4.14.24-qnap",
"build_info": "(local build)",
"argv": [
"smartctl",
"-a",
"-j",
"/dev/sdc"
],
"exit_status": 4
},
"device": {
"name": "/dev/sdc",
"info_name": "/dev/sdc",
"type": "scsi",
"protocol": "SCSI"
},
"vendor": "WDC",
"product": "WD100EMAZ-00WJTA",
"model_name": "WDC WD100EMAZ-00WJTA",
"revision": "83.H",
"scsi_version": "SPC-3",
"user_capacity": {
"blocks": 19532873728,
"bytes": 10000831348736
},
"logical_block_size": 512,
"physical_block_size": 4096,
"rotation_rate": 5400,
"form_factor": {
"scsi_value": 2,
"name": "3.5 inches"
},
"serial_number": "JEHN4M1N",
"device_type": {
"scsi_value": 0,
"name": "disk"
},
"local_time": {
"time_t": 1601292556,
"asctime": "Mon Sep 28 20:29:16 2020 KST"
},
"temperature": {
"current": 0,
"drive_trip": 0
}
}
ERRO[0000] smartctl returned an error code (4) while processing sdc type=metrics
ERRO[0000] smartctl detected a checksum error type=metrics
INFO[0000] Publishing smartctl results for unknown type=metrics
INFO[0001] Main: Completed type=metrics
root@abc9cc899866:/# `

After running, I can only see /dev/sda in the web UI and it has no details (SMART reports as failed).
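
For anyone triaging similar output, the fields worth comparing across disks are `smartctl.exit_status`, `device.type` and `temperature.current`. The following is a minimal sketch, assuming one of the JSON reports above has been saved to a string; the `summarize` helper is hypothetical and not part of Scrutiny.

```python
import json

def summarize(report_json: str) -> dict:
    """Pull the fields relevant to this bug out of one `smartctl -j` report."""
    report = json.loads(report_json)
    return {
        "device": report["device"]["name"],
        "detected_type": report["device"]["type"],
        "exit_status": report["smartctl"]["exit_status"],
        # .get() because not every report carries a temperature block
        "temperature": report.get("temperature", {}).get("current"),
    }

# Trimmed copy of the /dev/sda report from the log above
sample = """
{
  "smartctl": {"exit_status": 4},
  "device": {"name": "/dev/sda", "type": "scsi", "protocol": "SCSI"},
  "temperature": {"current": 0, "drive_trip": 0}
}
"""

print(summarize(sample))
```

In the log above, all four disks report a detected type of `scsi` and an exit status of 4.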

I'm running this on a QNAP TS453Be.

Thanks,

@paulmorabito paulmorabito added the bug Something isn't working label Sep 28, 2020
@Ma3-a

Ma3-a commented Sep 28, 2020

I had the same issue on unraid with the initial release but 0.2.1 fixed it. It looks like https://hub.docker.com/r/linuxserver/scrutiny is not updated, see if you have the same issue with https://hub.docker.com/r/analogj/scrutiny

@AnalogJ
Owner

AnalogJ commented Sep 28, 2020

Yeah, please try my latest image on Docker Hub @paulmorabito. There's been a lot of fixes recently related to missing SMART data, and I'm not sure how far behind the LSIO image is.

@paulmorabito
Author

Thanks @AnalogJ & @Ma3a-exe.

I changed the docker image to analogj/scrutiny:latest and am still getting the same errors:

ERRO[0001] smartctl returned an error code (4) while processing sdd type=metrics
ERRO[0001] smartctl detected a checksum error type=metrics
INFO[0001] Publishing smartctl results for unknown type=metrics
INFO[0001] Main: Completed type=metrics

Is there anything I can do to debug further?

@AnalogJ
Owner

AnalogJ commented Sep 29, 2020

Yeah, can you run the following commands? Make sure to replace the --device flags with the paths to the devices on your machine.

docker run -it --rm -p 8080:8080 \
-v /run/udev:/run/udev:ro \
--cap-add SYS_RAWIO \
--device=/dev/sda \
--device=/dev/sdb \
-e DEBUG=true \
-e COLLECTOR_LOG_FILE=/tmp/collector.log \
-e SCRUTINY_LOG_FILE=/tmp/web.log \
--name scrutiny \
analogj/scrutiny

# in another terminal trigger the collector
docker exec scrutiny scrutiny-collector-metrics run

# then use docker cp to copy the log files out of the container.
docker cp scrutiny:/tmp/collector.log collector.log
docker cp scrutiny:/tmp/web.log web.log

Once you've copied the log files from your container, please attach them here.
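
With more than two disks, the `--device` flags are easy to get wrong by hand. The sketch below builds the same debug command from a list of device paths; `debug_run_command` is a hypothetical helper written for this thread, and the device paths are assumptions to replace with your own.

```python
import shlex

def debug_run_command(devices, image="analogj/scrutiny", host_port=8080):
    """Build the debug `docker run` argument list, one --device flag per disk."""
    args = [
        "docker", "run", "-it", "--rm",
        "-p", f"{host_port}:8080",
        "-v", "/run/udev:/run/udev:ro",
        "--cap-add", "SYS_RAWIO",
    ]
    for dev in devices:
        args.append(f"--device={dev}")
    args += [
        "-e", "DEBUG=true",
        "-e", "COLLECTOR_LOG_FILE=/tmp/collector.log",
        "-e", "SCRUTINY_LOG_FILE=/tmp/web.log",
        "--name", "scrutiny",
        image,
    ]
    return args

# Example device list (assumed); print the command instead of running it
cmd = debug_run_command(["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"])
print(shlex.join(cmd))
```

Printing the command rather than executing it lets you check it before pasting it into a shell.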

@paulmorabito
Author

web.log
collector.log

Here you go. See attached.

@AnalogJ
Owner

AnalogJ commented Sep 29, 2020

Can you run the container with --privileged and see if docker exec scrutiny smartctl -a -j /dev/sda works?

Also can you give me some information about your system/OS?

@paulmorabito
Author

I'm running the following command:

docker run -it --rm -p 8180:8080 \
-v /lib/udev:/run/udev:ro \
--cap-add SYS_RAWIO \
--device=/dev/sda \
--device=/dev/sdb \
--device=/dev/sdc \
--device=/dev/sdd \
--privileged \
-e DEBUG=true \
-e COLLECTOR_LOG_FILE=/tmp/collector.log \
-e SCRUTINY_LOG_FILE=/tmp/web.log \
--name scrutiny \
analogj/scrutiny

Logs are attached. The error looks the same as before.
collector.log
web.log

I don't have /run/udev but do have /lib/udev so I changed it. However, even if it is kept at /run/udev, the result is the same.

I'm running this on a QNAP TS453Be.

[~] # docker -v
Docker version 17.09.1-ce, build 0bbe3ac

[~] # uname -a
Linux jamong 4.14.24-qnap #1 SMP Fri May 29 01:04:45 CST 2020 x86_64 GNU/Linux

It's running QNAP's custom Linux, so if there is a particular library or anything else you want me to check for, let me know.

@maxxie85

maxxie85 commented Oct 2, 2020

Hello,

I'm also getting the same error output when running the collector. The version I have installed is linux.amd-0.2.4.

ERRO[0000] smartctl returned an error code (4) while processing sdb type=metrics
ERRO[0000] smartctl detected a checksum error type=metrics
INFO[0000] Publishing smartctl results for 0x50026b727a01b7c2 type=metrics
INFO[0000] Main: Completed type=metrics

I used the manual install guide to install this on my proxmox host.

Command I use to run the collector
./bin/scrutiny-collector-metrics-linux-amd64 run --api-endpoint "http://localhost:8080"

Smartctl version
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.65-1-pve] (local build)

@AnalogJ
Owner

AnalogJ commented Oct 4, 2020

The checksum error you mentioned is similar to this: https://www.smartmontools.org/ticket/347

Basically, the checksum error is from smartctl, and if smartctl doesn't like the firmware/device/controller, there's not much that Scrutiny can do about it.

Having said that, the issue seems to imply that the checksum error is mostly found with empty Error Logs, which Scrutiny doesn't even utilize yet.
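
For context, smartctl's exit status is a bitmask, and the value 4 seen in the logs corresponds to bit 2. A minimal sketch of decoding it follows; the bit descriptions are paraphrased from the EXIT STATUS section of the smartctl man page, and `decode_exit_status` is a hypothetical helper, not something Scrutiny ships.

```python
# Bit meanings paraphrased from the EXIT STATUS section of the smartctl man page
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART/ATA command failed or a SMART data structure had a checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains errors",
    7: "device self-test log contains errors",
}

def decode_exit_status(status: int) -> list:
    """Return the description of every bit set in a smartctl exit status."""
    return [meaning for bit, meaning in EXIT_BITS.items() if status & (1 << bit)]

for line in decode_exit_status(4):
    print(line)
```

Bit 2 covers both a failed command and a checksum error, so the exit code alone does not say which of the two occurred.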

Can you try getting smartctl to work directly on your host (outside the container)?
Possibly play around with the various device flags?

  • smartctl -x -j -d scsi /dev/sda
  • smartctl -x -j -d sat /dev/sda
  • smartctl -x -j -d ata /dev/sda

Once you have smartctl working on your host, we can figure out how to fix/override the detection in Scrutiny

@AnalogJ AnalogJ changed the title from "smartctl errors [BUG]" to "[BUG]smartctl checksum errors" Oct 4, 2020
@paulmorabito
Author

I installed smartmontools 6.5 on the host via a QNAP package. Using -j always gives an unknown option error; however, this works:

smartctl -x -d sat /dev/sda

@AnalogJ
Owner

AnalogJ commented Oct 5, 2020

@paulmorabito does it work correctly if you run it in the container?

@paulmorabito
Author

@AnalogJ "smartctl -x -j -d sat /dev/sda" works fine (as does with -j).

@AnalogJ
Owner

AnalogJ commented Oct 8, 2020

Hey everyone,

I just released a beta version of the Scrutiny docker image with support for overriding the collector device detection.

The instructions for how to create the collector config file, and the new docker image tag, are available in the PR description:

#88

All feedback (success & failure) is appreciated :)

@TheCatLady

@AnalogJ Beta image is working great over here on a QNAP TS-653D with both SATA & NVMe drives!

@paulmorabito
Author

I'm getting no devices detected. Not sure why. Can you please check below:

scrutiny.yaml:

# Version
#
# version specifies the version of this configuration file schema, not
# the scrutiny binary. There is only 1 version available at the moment
version: 1

# This block allows you to override/customize the settings for devices detected by
# Scrutiny via `smartctl --scan`
# See the "--device=TYPE" section of https://linux.die.net/man/8/smartctl
# type can be a 'string' or a 'list'
devices:
  # example for forcing device type detection for a single disk
  - device: /dev/sda
    type: 'sat'
  - device: /dev/sdb
    type: 'sat'
  - device: /dev/sdc
    type: 'sat'
  - device: /dev/sdd
    type: 'sat'

docker-compose:

  scrutiny:
    image: analogj/scrutiny:detect
    container_name: scrutiny
    privileged: true
    cap_add:
      - SYS_RAWIO
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Asia/Seoul
      - SCRUTINY_API_ENDPOINT=http://localhost:8080
      - SCRUTINY_WEB=true
      - SCRUTINY_COLLECTOR=true
    volumes:
      - /share/persistent/scrutiny:/scrutiny/config
    devices:
      - /dev/sda:/dev/sda
      - /dev/sdb:/dev/sdb
      - /dev/sdc:/dev/sdc
      - /dev/sdd:/dev/sdd
    ports:
      - 8180:8080
    restart: always

logs:

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] done.
[services.d] starting services
[services.d] done.
starting jobber/cron
starting scrutiny
2020/10/08 11:33:44 Loading configuration file: /scrutiny/config/scrutiny.yaml

 ___   ___  ____  __  __  ____  ____  _  _  _  _
/ __) / __)(  _ \(  )(  )(_  _)(_  _)( \( )( \/ )
\__ \( (__  )   / )(__)(   )(   _)(_  )  (  \  /
(___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
github.com/AnalogJ/scrutiny                             dev-0.2.4

Start the scrutiny server
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)

Trying to connect to database stored: /scrutiny/config/scrutiny.db
[GIN-debug] GET    /api/health               --> github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Setup.func1 (5 handlers)
[GIN-debug] POST   /api/health/notify        --> github.com/analogj/scrutiny/webapp/backend/pkg/web/handler.SendTestNotification (5 handlers)
[GIN-debug] POST   /api/devices/register     --> github.com/analogj/scrutiny/webapp/backend/pkg/web/handler.RegisterDevices (5 handlers)
[GIN-debug] GET    /api/summary              --> github.com/analogj/scrutiny/webapp/backend/pkg/web/handler.GetDevicesSummary (5 handlers)
[GIN-debug] POST   /api/device/:wwn/smart    --> github.com/analogj/scrutiny/webapp/backend/pkg/web/handler.UploadDeviceMetrics (5 handlers)
[GIN-debug] POST   /api/device/:wwn/selftest --> github.com/analogj/scrutiny/webapp/backend/pkg/web/handler.UploadDeviceSelfTests (5 handlers)
[GIN-debug] GET    /api/device/:wwn/details  --> github.com/analogj/scrutiny/webapp/backend/pkg/web/handler.GetDeviceDetails (5 handlers)
[GIN-debug] GET    /web/*filepath            --> github.com/gin-gonic/gin.(*RouterGroup).createStaticHandler.func1 (5 handlers)
[GIN-debug] HEAD   /web/*filepath            --> github.com/gin-gonic/gin.(*RouterGroup).createStaticHandler.func1 (5 handlers)
[GIN-debug] GET    /                         --> github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Setup.func2 (5 handlers)
[GIN-debug] Listening and serving HTTP on 0.0.0.0:8080

@AnalogJ
Owner

AnalogJ commented Oct 8, 2020

Hey @paulmorabito it's actually a new config file named collector.yaml (separate from the existing scrutiny.yaml file).
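
For readers setting this up, the sketch below renders a `collector.yaml` body using the same `version` and `devices` schema posted earlier in this thread. `collector_config` is a hypothetical helper, and the device list is an assumption. Where the file must live is not stated in this thread, so check the PR description for the expected location.

```python
def collector_config(overrides: dict) -> str:
    """Render a collector.yaml body that forces a smartctl device type per disk."""
    lines = ["version: 1", "devices:"]
    for device, dev_type in overrides.items():
        lines.append(f"  - device: {device}")
        lines.append(f"    type: '{dev_type}'")
    return "\n".join(lines) + "\n"

# Force 'sat' on four example disks (assumed paths)
config = collector_config(
    {dev: "sat" for dev in ("/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd")}
)
print(config)
```

Generating the file keeps the two-space list indentation consistent, which is an easy thing to break when editing YAML by hand.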

@paulmorabito
Author

> Hey @paulmorabito it's actually a new config file named collector.yaml (separate from the existing scrutiny.yaml file).

I tried that first as I thought there might have been a typo with the yaml file naming. Either way, I am still getting no devices detected using either docker directly or with a compose file. I guess it's something specific to my setup though, as it's confirmed working elsewhere.

@AnalogJ
Owner

AnalogJ commented Oct 9, 2020

@paulmorabito and just to confirm, you ran the collector manually right? It runs daily at midnight, but for now you need to trigger the first run manually:

docker exec scrutiny scrutiny-collector-metrics run

Apologies, I saw that you had uploaded your collector logs above. Can you open up another issue? We can try to debug this there.

@paulmorabito
Author

@AnalogJ I removed the container, stack, db and yaml file and started again with the detect branch. Everything worked. Thank you for your help troubleshooting and also for fixing this. Really appreciated.

@maxxie85

maxxie85 commented Oct 9, 2020

@AnalogJ I'm running the collector without Docker (using the manual install). How can I point the collector to the config file? Is there a command line option for that, e.g.:

/opt/scrutiny/bin/scrutiny-collector-metrics-linux-amd64 run --config /opt/scrutiny/config/collector.yaml --api-endpoint "http://localhost:8080"

Output of running the collector
output.txt

@AnalogJ
Owner

AnalogJ commented Oct 9, 2020

Hey @paulmorabito the code for the detect branch has been merged into master, you should be able to pull the latest analogj/scrutiny and it should just work for you :)

@AnalogJ
Owner

AnalogJ commented Oct 9, 2020

@maxxie85 that's actually the correct command.
--api-endpoint "http://localhost:8080" is actually unnecessary since that's the default value already.

Your log file says

time="2020-10-09T14:33:05+02:00" level=error msg="smartctl returned an error code (4) while processing sdb\n" type=metrics
time="2020-10-09T14:33:05+02:00" level=error msg="smartctl detected a checksum error" type=metrics
time="2020-10-09T14:33:05+02:00" level=info msg="Publishing smartctl results for 0x50026b727a01b7c2\n" type=metrics
time="2020-10-09T14:33:05+02:00" level=info msg="Main: Completed" type=metrics

But do you see anything in your dashboard?

@paulmorabito
Author

> Hey @paulmorabito the code for the detect branch has been merged into master, you should be able to pull the latest analogj/scrutiny and it should just work for you :)

I just tried and it's still getting 0.2.1-dev on :latest rather than 0.2.4 which is the detect branch. I'll try again in another day and hopefully the docker image has been built and propagated to the docker hub repository by then.

@maxxie85

@AnalogJ
Yes, I can see the drives in the Dashboard. They do have both attributes that are shown; that wasn't the case for one drive in the 0.2.x release.

@AnalogJ
Owner

AnalogJ commented Oct 11, 2020

@maxxie85 Ah perfect. Yeah, previously non-zero smartctl exit codes would stop data from being sent back to the API.
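
To make the change in behaviour concrete, here is a small sketch contrasting the two policies. It is not Scrutiny's actual code. The strict version mirrors the old behaviour described above, while the lenient rule (publish when smartctl produced parseable output and exit bits 0 and 1 are clear) is an assumption made purely for illustration.

```python
def should_publish_strict(exit_status: int) -> bool:
    """Old behaviour as described above: any non-zero exit code drops the report."""
    return exit_status == 0

def should_publish_lenient(exit_status: int, has_json: bool) -> bool:
    """Assumed policy: publish if output was parseable and the device could be opened.

    Bits 0 and 1 (bad command line, device open failed) are treated as fatal;
    higher bits only flag problems found while reading the disk.
    """
    return has_json and (exit_status & 0b11) == 0

# Exit status 4 is the checksum/command-failed bit seen throughout this thread
print(should_publish_strict(4))         # False
print(should_publish_lenient(4, True))  # True
```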

@enoch85
Contributor

enoch85 commented Dec 15, 2024

Sorry for necroposting, but I have the exact same issue. I've been reading this issue and have also looked elsewhere to try to solve it. My situation is that the logs show:

ERRO[0049] smartctl returned an error code (4) while processing sdb  type=metrics
ERRO[0049] smartctl detected a checksum error            type=metrics
INFO[0049] Publishing smartctl results for 0x55cdredacted  type=metrics
INFO[0049] Collecting smartctl results for sdc           type=metrics
INFO[0049] Executing command: smartctl --xall --json --device sat /dev/sdc  type=metrics
INFO[0049] Publishing smartctl results for 0x5000redacted  type=metrics

But it looks totally fine in Scrutiny GUI. Running smartctl manually doesn't show any error either.

Any advice?
