
[FEAT] Option to 'remove' disk from monitoring #69

Closed
teambvd opened this issue Sep 29, 2020 · 13 comments · Fixed by #263
Labels
enhancement New feature or request

Comments

@teambvd
Contributor

teambvd commented Sep 29, 2020

Is your feature request related to a problem? Please describe.
As drives fail and are replaced, the ability to remove previously existing drives will become paramount. I came across this while testing, and had an 'orphaned' version of a drive - it still shows as Green status, but hasn't reported since the 21st. Currently the only method to do this is to manually edit the DB.

Describe the solution you'd like
In the menu for a given drive, below the 'View Details' option, add a 'Remove' or 'Unmonitor' button. This button could have two potential actions, depending on how you wish to implement it:

  • Simply drop all records of the drive from the DB, OR
  • Move the data for that drive to an 'Archive' section, away from the main dashboard.

The first option is far easier in the short run, but the second allows the user to maintain a full historical record of their system and would be preferable long term.

@AnalogJ AnalogJ added the enhancement New feature or request label Oct 1, 2020
@AnalogJ
Owner

AnalogJ commented Oct 2, 2020

Yep, this is a great idea, which I definitely plan on implementing.

@joe-eklund

I just had to replace two drives and would love this feature! I agree with @teambvd that having a history of drives would be useful, so my vote is for some sort of archive section.

In the meantime, since I have so many drives and needed to clear out the old ones, I edited the database manually as @teambvd suggested. For anyone wanting to do the same, here are the commands I used:

  • Identify the serial number of the drive through Scrutiny Web that you want to remove.
  • Open scrutiny.db with sqlite3 by running sqlite3 scrutiny.db. You may have to use sudo to get write privileges. Also make sure to spin down Scrutiny Web and Scrutiny Collector while you are editing the database.
  • Make sure you have the right entry by running SELECT * FROM devices WHERE serial_number LIKE 'YOUR_SERIAL_HERE';
  • Delete by running DELETE FROM devices WHERE serial_number LIKE 'YOUR_SERIAL_HERE';

It is now removed 😃
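The steps above can be sketched end-to-end with Python's built-in sqlite3 module. This is purely illustrative: the table is reduced to a few columns, and the WWNs and serial numbers below are made up, not real Scrutiny data.

```python
import sqlite3

# Illustrative stand-in for scrutiny.db: reduced devices table, fake rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (wwn TEXT, serial_number TEXT, device_name TEXT)")
conn.execute("INSERT INTO devices VALUES ('0xabc', 'SER123', 'sda')")
conn.execute("INSERT INTO devices VALUES ('0xdef', 'SER456', 'sdb')")

serial = "SER456"  # hypothetical serial of the drive to remove

# Step 3: confirm exactly one matching row before deleting.
rows = conn.execute(
    "SELECT * FROM devices WHERE serial_number LIKE ?", (serial,)
).fetchall()
assert len(rows) == 1

# Step 4: delete it and commit.
conn.execute("DELETE FROM devices WHERE serial_number LIKE ?", (serial,))
conn.commit()

remaining = conn.execute("SELECT serial_number FROM devices").fetchall()
print(remaining)  # → [('SER123',)]
```

Against the real database, the same SELECT-before-DELETE habit is the main point: verify the row count first so a broad LIKE pattern cannot wipe more than intended.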

@AnalogJ
Owner

AnalogJ commented Dec 19, 2020

Thanks for documenting this workaround, @joe-eklund! It's definitely on my todo list.

@zilexa

zilexa commented Apr 26, 2021

@joe-eklund
What if there is a device that has no details? How can I remove it just by its path?
[Screenshot from 2021-04-26 11-11-14]

This appeared after adding a disk with type sat to collector.yaml; it wasn't recognised, and I had to use sat,auto instead. But now there are two entries: one fully recognised with all details, and this dangling item.

@FingerlessGlov3s

+1

@joe-eklund

@zilexa you could try to find the entry in the database by running SELECT * FROM devices; and seeing which row has missing information.

I believe the WWN is used as the primary key, given this is the output of sqlite> PRAGMA table_info(devices);

0|created_at|datetime|0||0
1|updated_at|datetime|0||0
2|deleted_at|datetime|0||0
3|wwn|text|0||1
4|host_id|text|0||0
5|device_name|text|0||0
6|manufacturer|text|0||0
7|model_name|text|0||0
8|interface_type|text|0||0
9|interface_speed|text|0||0
10|serial_number|text|0||0
11|firmware|text|0||0
12|rotation_speed|integer|0||0
13|capacity|integer|0||0
14|form_factor|text|0||0
15|smart_support|numeric|0||0
16|device_protocol|text|0||0
17|device_type|text|0||0
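Given the table layout above, one way to spot a dangling row like @zilexa's is to select rows whose identifying columns are empty. A minimal sketch with Python's sqlite3 module (illustrative schema and fake rows, not real Scrutiny data):

```python
import sqlite3

# Stand-in devices table with one healthy row and one dangling row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (wwn TEXT, device_name TEXT, serial_number TEXT)")
conn.execute("INSERT INTO devices VALUES ('0x5000c500d3e24eae', 'sde', 'WCJ3V582')")
conn.execute("INSERT INTO devices VALUES ('', 'nvme0n1', '')")  # no WWN, no serial

# Rows with neither a WWN nor a serial number are the orphans.
orphans = conn.execute(
    "SELECT device_name FROM devices WHERE wwn = '' AND serial_number = ''"
).fetchall()
print(orphans)  # → [('nvme0n1',)]
```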

@zilexa

zilexa commented Apr 28, 2021

In this case there is no WWN for that one that needs to be removed:

2021-04-26 10:51:27.726505126+02:00|2021-04-28 17:39:00.935623477+02:00||0x5000c500d3e24eae||sde||ST5000LM000-2AN170||6.0 Gb/s|WCJ3V582|0001|5526|5000981078016|2.5 inches|0|ATA|scsi
2021-04-26 10:57:04.961882191+02:00|2021-04-26 10:57:04.961882191+02:00||||nvme0n1|||||||0|0||0||sat

The first one is OK and I want to keep it; the second one needs to be deleted from the DB.

@joe-eklund

@zilexa ok I believe you can accomplish what you want by doing this:

SELECT * FROM devices WHERE created_at LIKE '2021-04-26 10:57:04.961882191+02:00';

Verify the only entry returned is the one you want to delete, then run:

DELETE FROM devices WHERE created_at LIKE '2021-04-26 10:57:04.961882191+02:00';

It should then be deleted.

@joe-eklund

Side note, @AnalogJ it's probably a good idea to also generate an internal UUID as part of the PK so you can fix issues like this more easily. That way we can always count on some part of the PK to be there, instead of having to use something like the created_at.
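As a sketch of that idea (hypothetical schema and migration, not Scrutiny's actual code), a surrogate UUID column could be added and backfilled so every row stays addressable even when wwn and serial are empty:

```python
import sqlite3
import uuid

# Hypothetical reduced devices table with a row that has no WWN.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (wwn TEXT, device_name TEXT)")
conn.execute("INSERT INTO devices VALUES ('', 'nvme1n1')")

# Hypothetical migration: add a surrogate key and backfill it per row.
conn.execute("ALTER TABLE devices ADD COLUMN uuid TEXT")
for (rowid,) in conn.execute("SELECT rowid FROM devices").fetchall():
    conn.execute(
        "UPDATE devices SET uuid = ? WHERE rowid = ?", (str(uuid.uuid4()), rowid)
    )
conn.commit()

row = conn.execute("SELECT uuid FROM devices").fetchone()
print(len(row[0]))  # → 36 (canonical UUID string length)
```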

@zilexa

zilexa commented Apr 29, 2021

@joe-eklund that worked, thanks! This also gives me a bit more insight into how to use SQL in general :)

@zilexa

zilexa commented May 1, 2021

@AnalogJ
Unfortunately, after every run nvme1n1 keeps appearing twice: once with all info correctly and once completely empty in the db.

2021-04-26 10:59:18.768063702+02:00|2021-05-01 10:18:35.865140488+02:00||eui.e8238fa6bf530001001b444a465b5590||nvme1n1||WDC WDS100T2B0C-00PXH0|||2043CC458603|211070WD|0|1000204886016||0|NVMe|auto
2021-05-01 10:18:34.174438725+02:00|2021-05-01 10:18:34.174438725+02:00||||nvme1n1|||||||0|0||0||sat

My log:

  • /dev/nvme0n1: m.2 system nvme ssd.
  • /dev/nvme1n1: m.2 in a PCI Express card (without chip).
  • /dev/sdc is a SATA SSD. All others are the same type/model Seagate 5TB 2.5".
  • /dev/sda and /dev/sdd had no mountpoint (unmounted) during this run.
  • I do not know why there is a checksum error with /dev/sde
time="2021-05-01T10:18:33+02:00" level=info msg="Verifying required tools" type=metrics
time="2021-05-01T10:18:33+02:00" level=info msg="Executing command: smartctl --scan -j" type=metrics
time="2021-05-01T10:18:33+02:00" level=info msg="Executing command: smartctl --info -j /dev/sde" type=metrics
time="2021-05-01T10:18:33+02:00" level=info msg="Generating WWN" type=metrics
time="2021-05-01T10:18:33+02:00" level=info msg="Executing command: smartctl --info -j -d sat /dev/nvme0n1" type=metrics
time="2021-05-01T10:18:33+02:00" level=error msg="Could not retrieve device information for nvme0n1: exit status 2" type=metrics
time="2021-05-01T10:18:33+02:00" level=info msg="Executing command: smartctl --info -j -d auto /dev/nvme0n1" type=metrics
time="2021-05-01T10:18:33+02:00" level=info msg="Using WWN Fallback" type=metrics
time="2021-05-01T10:18:33+02:00" level=info msg="Executing command: smartctl --info -j -d sat /dev/nvme1n1" type=metrics
time="2021-05-01T10:18:33+02:00" level=error msg="Could not retrieve device information for nvme1n1: exit status 2" type=metrics
time="2021-05-01T10:18:33+02:00" level=info msg="Executing command: smartctl --info -j -d auto /dev/nvme1n1" type=metrics
time="2021-05-01T10:18:33+02:00" level=info msg="Using WWN Fallback" type=metrics
time="2021-05-01T10:18:33+02:00" level=info msg="Executing command: smartctl --info -j /dev/sda" type=metrics
time="2021-05-01T10:18:33+02:00" level=info msg="Generating WWN" type=metrics
time="2021-05-01T10:18:33+02:00" level=info msg="Executing command: smartctl --info -j /dev/sdb" type=metrics
time="2021-05-01T10:18:34+02:00" level=info msg="Generating WWN" type=metrics
time="2021-05-01T10:18:34+02:00" level=info msg="Executing command: smartctl --info -j /dev/sdc" type=metrics
time="2021-05-01T10:18:34+02:00" level=info msg="Generating WWN" type=metrics
time="2021-05-01T10:18:34+02:00" level=info msg="Executing command: smartctl --info -j /dev/sdd" type=metrics
time="2021-05-01T10:18:34+02:00" level=info msg="Generating WWN" type=metrics
time="2021-05-01T10:18:34+02:00" level=info msg="Sending detected devices to API, for filtering & validation" type=metrics
time="2021-05-01T10:18:34+02:00" level=info msg="Collecting smartctl results for sde\n" type=metrics
time="2021-05-01T10:18:34+02:00" level=info msg="Executing command: smartctl -x -j /dev/sde" type=metrics
time="2021-05-01T10:18:35+02:00" level=error msg="smartctl returned an error code (4) while processing sde\n" type=metrics
time="2021-05-01T10:18:35+02:00" level=error msg="smartctl detected a checksum error" type=metrics
time="2021-05-01T10:18:35+02:00" level=info msg="Publishing smartctl results for 0x5000c500d3e24eae\n" type=metrics
time="2021-05-01T10:18:35+02:00" level=info msg="Collecting smartctl results for nvme0n1\n" type=metrics
time="2021-05-01T10:18:35+02:00" level=info msg="Executing command: smartctl -x -j -d sat /dev/nvme0n1" type=metrics
time="2021-05-01T10:18:35+02:00" level=error msg="smartctl returned an error code (2) while processing nvme0n1\n" type=metrics
time="2021-05-01T10:18:35+02:00" level=error msg="smartctl could not open device" type=metrics
time="2021-05-01T10:18:35+02:00" level=info msg="Publishing smartctl results for \n" type=metrics
time="2021-05-01T10:18:35+02:00" level=info msg="Collecting smartctl results for nvme0n1\n" type=metrics
time="2021-05-01T10:18:35+02:00" level=info msg="Executing command: smartctl -x -j -d auto /dev/nvme0n1" type=metrics
time="2021-05-01T10:18:35+02:00" level=info msg="Publishing smartctl results for eui.0025388601d346a9\n" type=metrics
time="2021-05-01T10:18:35+02:00" level=info msg="Collecting smartctl results for nvme1n1\n" type=metrics
time="2021-05-01T10:18:35+02:00" level=info msg="Executing command: smartctl -x -j -d sat /dev/nvme1n1" type=metrics
time="2021-05-01T10:18:35+02:00" level=error msg="smartctl returned an error code (2) while processing nvme1n1\n" type=metrics
time="2021-05-01T10:18:35+02:00" level=error msg="smartctl could not open device" type=metrics
time="2021-05-01T10:18:35+02:00" level=info msg="Publishing smartctl results for \n" type=metrics
time="2021-05-01T10:18:35+02:00" level=info msg="Collecting smartctl results for nvme1n1\n" type=metrics
time="2021-05-01T10:18:35+02:00" level=info msg="Executing command: smartctl -x -j -d auto /dev/nvme1n1" type=metrics
time="2021-05-01T10:18:35+02:00" level=info msg="Publishing smartctl results for eui.e8238fa6bf530001001b444a465b5590\n" type=metrics
time="2021-05-01T10:18:35+02:00" level=info msg="Collecting smartctl results for sda\n" type=metrics
time="2021-05-01T10:18:35+02:00" level=info msg="Executing command: smartctl -x -j /dev/sda" type=metrics
time="2021-05-01T10:18:37+02:00" level=error msg="smartctl returned an error code (4) while processing sda\n" type=metrics
time="2021-05-01T10:18:37+02:00" level=error msg="smartctl detected a checksum error" type=metrics
time="2021-05-01T10:18:37+02:00" level=info msg="Publishing smartctl results for 0x5000c500cf9b57ba\n" type=metrics
time="2021-05-01T10:18:37+02:00" level=info msg="Collecting smartctl results for sdb\n" type=metrics
time="2021-05-01T10:18:37+02:00" level=info msg="Executing command: smartctl -x -j /dev/sdb" type=metrics
time="2021-05-01T10:18:38+02:00" level=error msg="smartctl returned an error code (4) while processing sdb\n" type=metrics
time="2021-05-01T10:18:38+02:00" level=error msg="smartctl detected a checksum error" type=metrics
time="2021-05-01T10:18:38+02:00" level=info msg="Publishing smartctl results for 0x5000c500d3e33f1d\n" type=metrics
time="2021-05-01T10:18:38+02:00" level=info msg="Collecting smartctl results for sdc\n" type=metrics
time="2021-05-01T10:18:38+02:00" level=info msg="Executing command: smartctl -x -j /dev/sdc" type=metrics
time="2021-05-01T10:18:38+02:00" level=info msg="Publishing smartctl results for 0x5001b444a750cd9f\n" type=metrics
time="2021-05-01T10:18:38+02:00" level=info msg="Collecting smartctl results for sdd\n" type=metrics
time="2021-05-01T10:18:38+02:00" level=info msg="Executing command: smartctl -x -j /dev/sdd" type=metrics
time="2021-05-01T10:18:40+02:00" level=error msg="smartctl returned an error code (4) while processing sdd\n" type=metrics
time="2021-05-01T10:18:40+02:00" level=error msg="smartctl detected a checksum error" type=metrics
time="2021-05-01T10:18:40+02:00" level=info msg="Publishing smartctl results for 0x5000c500d4da47bc\n" type=metrics
time="2021-05-01T10:18:40+02:00" level=info msg="Main: Completed" type=metrics

To solve this AND still allow my NVMe drives to be recognised (as in the workaround here: #153 (comment)), I had to use auto only instead of sat,auto:

devices:
  - device: /dev/nvme0n1
    type: 'auto'
  - device: /dev/nvme1n1
    type: 'auto'

@FingerlessGlov3s

FingerlessGlov3s commented Aug 6, 2021

Hi Guys,

I've got some updated SQL commands, so you can remove a device and then clean up the rest of the DB.

Enter the container: docker exec -it scrutiny bash
Install the sqlite3 command: apt update && apt install sqlite3 -y
Enter the database: sqlite3 /scrutiny/config/scrutiny.db

Then you'll need to get either the serial number or the WWN of the device you want to remove, and run the respective SQL command.
Serial: DELETE FROM devices WHERE serial_number LIKE 'YOUR_SERIAL_HERE';
WWN: DELETE FROM devices WHERE wwn LIKE 'YOUR_WWN_HERE';

Then to clean up the rest of the tables, run the commands below. Make sure they are run in the order shown.

DELETE FROM smart_ata_attributes WHERE smart_id IN (SELECT id FROM smarts WHERE device_wwn NOT IN (SELECT wwn FROM devices));
DELETE FROM smart_nvme_attributes WHERE smart_id IN (SELECT id FROM smarts WHERE device_wwn NOT IN (SELECT wwn FROM devices));
DELETE FROM smart_scsi_attributes WHERE smart_id IN (SELECT id FROM smarts WHERE device_wwn NOT IN (SELECT wwn FROM devices));
DELETE FROM smarts WHERE device_wwn NOT IN (SELECT wwn FROM devices);
DELETE FROM self_tests WHERE device_wwn NOT IN (SELECT wwn FROM devices);
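
Those statements can also be run as a single transaction so a failure partway through leaves nothing half-deleted. A sketch with Python's sqlite3 module, using minimal stand-in tables (the real Scrutiny tables have more columns; only the columns the cleanup touches are modelled, and the WWNs are fake):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal stand-ins for the Scrutiny tables referenced above.
conn.executescript("""
CREATE TABLE devices (wwn TEXT);
CREATE TABLE smarts (id INTEGER, device_wwn TEXT);
CREATE TABLE smart_ata_attributes (smart_id INTEGER);
CREATE TABLE smart_nvme_attributes (smart_id INTEGER);
CREATE TABLE smart_scsi_attributes (smart_id INTEGER);
CREATE TABLE self_tests (device_wwn TEXT);
INSERT INTO devices VALUES ('0xkeep');
INSERT INTO smarts VALUES (1, '0xkeep'), (2, '0xgone');
INSERT INTO smart_ata_attributes VALUES (1), (2);
INSERT INTO self_tests VALUES ('0xgone');
""")

# Same order as above: attribute tables first, then smarts, then self_tests.
cleanup = [
    "DELETE FROM smart_ata_attributes WHERE smart_id IN "
    "(SELECT id FROM smarts WHERE device_wwn NOT IN (SELECT wwn FROM devices))",
    "DELETE FROM smart_nvme_attributes WHERE smart_id IN "
    "(SELECT id FROM smarts WHERE device_wwn NOT IN (SELECT wwn FROM devices))",
    "DELETE FROM smart_scsi_attributes WHERE smart_id IN "
    "(SELECT id FROM smarts WHERE device_wwn NOT IN (SELECT wwn FROM devices))",
    "DELETE FROM smarts WHERE device_wwn NOT IN (SELECT wwn FROM devices)",
    "DELETE FROM self_tests WHERE device_wwn NOT IN (SELECT wwn FROM devices)",
]
with conn:  # one transaction: all statements commit together or roll back
    for stmt in cleanup:
        conn.execute(stmt)

print(conn.execute("SELECT count(*) FROM smart_ata_attributes").fetchone()[0])  # → 1
print(conn.execute("SELECT count(*) FROM self_tests").fetchone()[0])  # → 0
```

The order matters because the attribute deletes look up orphaned smart ids via the smarts table; once smarts is purged, those ids can no longer be resolved.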

AnalogJ added a commit that referenced this issue May 26, 2022
@AnalogJ
Owner

AnalogJ commented May 28, 2022

fixed in v0.4.8 🎉
