[BUG] Displayed failed on passed disks #270

Closed
azukaar opened this issue May 29, 2022 · 13 comments
Labels
bug Something isn't working

Comments

azukaar commented May 29, 2022

Describe the bug

A whole bunch of disks are marked as failed, but in the details they actually pass everything.
It might be worth noting that the test seems to have been run while my 10-disk RAID 6 array had 2 disks missing.

[screenshots attached]

Log gathering did not work: the first command seems to run fine, but when copying the logs out, the log file doesn't exist.

azukaar added the bug label May 29, 2022
AnalogJ (Owner) commented May 29, 2022

Hm, I think there may be a logic bug causing Scrutiny attribute warnings to mark the disk as failed.
Let me confirm.
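
To illustrate the kind of logic bug being described, here is a hypothetical Go sketch (not Scrutiny's actual source; the type and function names are made up for illustration) of how a warning-level attribute can get promoted to a device-level failure:

// Hypothetical illustration of the suspected bug, not Scrutiny's actual code.
// If the device-level status is derived by checking "anything other than
// passed" on each attribute, a mere warning is promoted to a device failure.
package main

import "fmt"

type AttributeStatus int

const (
	AttributeStatusPassed AttributeStatus = iota
	AttributeStatusWarning
	AttributeStatusFailed
)

// deviceFailedBuggy treats warnings the same as failures.
func deviceFailedBuggy(attrs []AttributeStatus) bool {
	for _, s := range attrs {
		if s != AttributeStatusPassed { // a warning also lands here
			return true
		}
	}
	return false
}

// deviceFailedFixed lets only genuine failures mark the device as failed.
func deviceFailedFixed(attrs []AttributeStatus) bool {
	for _, s := range attrs {
		if s == AttributeStatusFailed {
			return true
		}
	}
	return false
}

func main() {
	attrs := []AttributeStatus{AttributeStatusPassed, AttributeStatusWarning}
	fmt.Println(deviceFailedBuggy(attrs)) // true  -> dashboard shows "failed"
	fmt.Println(deviceFailedFixed(attrs)) // false -> dashboard shows "passed"
}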

shamoon (Contributor) commented May 29, 2022

You also mentioned it’s not possible to “un-set” the failed status for disks that have already failed. Is that the case even for these? Is there a simple way to do that manually? Thanks

darkknight7777777 commented
I'm seeing similar failures on the dashboard (though the disks show as passing in the details) after migrating from the linuxserver Docker container to the official omnibus image.

AnalogJ (Owner) commented May 31, 2022

You can do it manually by connecting to the web/api container, and running the following commands:

# connect to scrutiny docker container
docker exec -it scrutiny bash

# install sqlite CLI tools (inside container)
apt update && apt install -y sqlite3

# connect to the scrutiny database
sqlite3 /opt/scrutiny/config/scrutiny.db

# reset/update the devices table, unsetting the failure status for all devices
UPDATE devices SET device_status = null;

# exit sqlite CLI
.exit

Those steps will reset the devices, but you'll want to re-run the collector afterwards (or wait for a scheduled run) to ensure that a failing disk was not accidentally set to passing.

Closing this issue for now, please feel free to reopen/comment if you have any questions.

AnalogJ closed this as completed May 31, 2022
azukaar (Author) commented May 31, 2022

Sorry, I'd like to re-open the issue, since my original report was not about previously failed disks failing to reset, but about disks that never seem to have failed at all: they pass every test but still show as failed.

AnalogJ (Owner) commented Jun 1, 2022

Ah, apologies @azukaar, you're correct.

Can you follow these instructions to generate some debug log files for me?

docker run -it --rm -p 8080:8080 \
-v `pwd`/config:/opt/scrutiny/config \
-v /run/udev:/run/udev:ro \
--cap-add SYS_RAWIO \
--device=/dev/sda \
--device=/dev/sdb \
-e DEBUG=true \
-e COLLECTOR_LOG_FILE=/opt/scrutiny/config/collector.log \
-e SCRUTINY_LOG_FILE=/opt/scrutiny/config/web.log \
--name scrutiny \
ghcr.io/analogj/scrutiny:master-omnibus

# in another terminal trigger the collector
docker exec scrutiny scrutiny-collector-metrics run

The log files will be available on your host in the config directory. Please attach them to this issue.

AnalogJ reopened this Jun 1, 2022
azukaar (Author) commented Jun 7, 2022

I just tried starting with the following setup:
[screenshot attached]

But I don't seem to be able to access the web UI anymore. Removing the new ENV variables does NOT fix the issue.
[screenshot attached]

Is that a known issue? Did I do something wrong?
Thanks

AnalogJ (Owner) commented Jun 8, 2022

@azukaar sorry, that was an issue with the automated build system (#287). It's been fixed in the latest images. Please do a docker pull.

azukaar (Author) commented Jun 8, 2022

Thanks @AnalogJ, I pulled again and it works, except that / redirects to /web instead of /web/dashboard.
Here are the logs: https://pastebin.com/TiC1PzW5

AnalogJ (Owner) commented Jun 11, 2022

Hey everyone, I think there's another, unrelated logic bug related to the sort order of data coming from InfluxDB.
I'm fixing it right now and should have a fix in the beta branch momentarily.
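
To illustrate the kind of sort-order bug being described, here is a minimal hypothetical Go sketch (not Scrutiny's actual query code; the struct and values are made up), assuming the dashboard reads the first returned point as the latest sample:

package main

import (
	"fmt"
	"sort"
	"time"
)

// smartPoint stands in for one SMART sample returned by the time-series query.
type smartPoint struct {
	Collected time.Time
	Failed    bool
}

func main() {
	// Without an explicit sort in the query, "newest first" is not guaranteed.
	points := []smartPoint{
		{Collected: time.Now().Add(-48 * time.Hour), Failed: true}, // stale sample
		{Collected: time.Now(), Failed: false},                     // current, healthy sample
	}

	// Buggy assumption: index 0 is the newest sample.
	fmt.Println("unsorted head:", points[0].Failed) // true -> dashboard shows failed

	// Fix: sort newest-first before reading the head.
	sort.Slice(points, func(i, j int) bool {
		return points[i].Collected.After(points[j].Collected)
	})
	fmt.Println("sorted head:  ", points[0].Failed) // false -> dashboard shows passed
}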

AnalogJ (Owner) commented Jun 11, 2022

@azukaar the / redirect to /web is correct; the Angular app should then pick it up and route it to /web/dashboard.

Are you not seeing that happen?
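
For reference, a minimal sketch of the redirect chain being described, assuming a gin-style Go router (an illustration only, not Scrutiny's actual source):

package main

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

func main() {
	r := gin.Default()

	// Server side: "/" only redirects to the Angular app's base path.
	r.GET("/", func(c *gin.Context) {
		c.Redirect(http.StatusFound, "/web")
	})

	// From /web onward the client-side Angular router takes over and is
	// expected to rewrite /web to /web/dashboard; a stale cached bundle
	// can prevent that second hop from happening.

	r.Run(":8080")
}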

AnalogJ closed this as completed Jun 11, 2022
azukaar (Author) commented Jun 13, 2022

It wasn't doing it for a while (maybe a cache issue), but the redirect is working now.
So, just so I understand, you found the issue with the logic causing the disks to show as failed?

AnalogJ (Owner) commented Jun 14, 2022

I'll be posting a summary in #255 soon; please take a look there if I forget to update this issue.
