Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] ApiServerCommunicationError appears when SQLITE db is busy when serving concurrent clients (collectors) #347

Closed
KF5JWC opened this issue Aug 2, 2022 · 3 comments · Fixed by #352
Labels
bug Something isn't working

Comments

@KF5JWC
Copy link
Contributor

KF5JWC commented Aug 2, 2022

Describe the bug

When multiple collectors initiate a connection to the server at the same time, the server returns HTTP 500. The logs contain hints that the database lock is the root cause, but clients are not able to recover/retry.

Expected behavior

Multiple collectors initiate a connection to the server at the same time, and are able to complete their upload even if the request must be delayed and/or retried.

Example: 2 clients make the same request at the same time. One gets served (database locked), and the other gets a HTTP 503 with a Retry-After: rand(2,30) header.

Log Files

Client/Collector:

time="2022-08-01T18:55:02-05:00" level=info msg="Sending detected devices to API, for filtering & validation" type=metrics
time="2022-08-01T18:55:07-05:00" level=error msg="An error occurred while retrieving filtered devices" type=metrics
2022/08/01 18:55:07 ERROR: ApiServerCommunicationError: "An error occurred while retrieving filtered devices"

Server:

2022/08/01 18:55:07 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:23 database is locked (5) (SQLITE_BUSY)
[1.007ms] [rows:0] INSERT INTO `devices` (`created_at`,`updated_at`,`deleted_at`,`wwn`,`device_name`,`device_uuid`,`device_serial_id`,`device_label`,`manufacturer`,`model_name`,`interface_type`,`interface_speed`,`serial_number`,`firmware`,`rotation_speed`,`capacity`,`form_factor`,`smart_support`,`device_protocol`,`device_type`,`label`,`host_id`,`device_status`) VALUES ("2022-08-01 18:55:07.13","2022-08-01 18:55:07.13",NULL,"","nvme1n1","","","","","","","","","",0,,"",false,"NVMe","nvme","","synology.local","0") ON CONFLICT (`wwn`) DO UPDATE SET `host_id`=`excluded`.`host_id`,`device_name`=`excluded`.`device_name`,`device_type`=`excluded`.`device_type`,`device_uuid`=`excluded`.`device_uuid`,`device_serial_id`=`excluded`.`device_serial_id`,`device_label`=`excluded`.`device_label`

2022/08/01 18:55:07 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:23 database is locked (5) (SQLITE_BUSY)
[0.509ms] [rows:0] INSERT INTO `devices` (`created_at`,`updated_at`,`deleted_at`,`wwn`,`device_name`,`device_uuid`,`device_serial_id`,`device_label`,`manufacturer`,`model_name`,`interface_type`,`interface_speed`,`serial_number`,`firmware`,`rotation_speed`,`capacity`,`form_factor`,`smart_support`,`device_protocol`,`device_type`,`label`,`host_id`,`device_status`) VALUES ("2022-08-01 18:55:07.131","2022-08-01 18:55:07.131",NULL,"","nvme0","","","","","","","","","",0,,"",false,"NVMe","nvme","","synology.local","0") ON CONFLICT (`wwn`) DO UPDATE SET `host_id`=`excluded`.`host_id`,`device_name`=`excluded`.`device_name`,`device_type`=`excluded`.`device_type`,`device_uuid`=`excluded`.`device_uuid`,`device_serial_id`=`excluded`.`device_serial_id`,`device_label`=`excluded`.`device_label`

2022/08/01 18:55:07 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:23 database is locked (5) (SQLITE_BUSY)
[0.174ms] [rows:0] INSERT INTO `devices` (`created_at`,`updated_at`,`deleted_at`,`wwn`,`device_name`,`device_uuid`,`device_serial_id`,`device_label`,`manufacturer`,`model_name`,`interface_type`,`interface_speed`,`serial_number`,`firmware`,`rotation_speed`,`capacity`,`form_factor`,`smart_support`,`device_protocol`,`device_type`,`label`,`host_id`,`device_status`) VALUES ("2022-08-01 18:55:07.132","2022-08-01 18:55:07.132",NULL,"","nvme1","","","","","","","","","",0,,"",false,"NVMe","nvme","","synology.local","0") ON CONFLICT (`wwn`) DO UPDATE SET `host_id`=`excluded`.`host_id`,`device_name`=`excluded`.`device_name`,`device_type`=`excluded`.`device_type`,`device_uuid`=`excluded`.`device_uuid`,`device_serial_id`=`excluded`.`device_serial_id`,`device_label`=`excluded`.`device_label`

2022/08/01 18:55:07 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:23 database is locked (5) (SQLITE_BUSY)
[0.225ms] [rows:0] INSERT INTO `devices` (`created_at`,`updated_at`,`deleted_at`,`wwn`,`device_name`,`device_uuid`,`device_serial_id`,`device_label`,`manufacturer`,`model_name`,`interface_type`,`interface_speed`,`serial_number`,`firmware`,`rotation_speed`,`capacity`,`form_factor`,`smart_support`,`device_protocol`,`device_type`,`label`,`host_id`,`device_status`) VALUES ("2022-08-01 18:55:07.132","2022-08-01 18:55:07.132",NULL,"","sata1","","","","","","","6.0 Gb/s","","",7200,,"3.5 inches",false,"ATA","sat","","synology.local","0") ON CONFLICT (`wwn`) DO UPDATE SET `host_id`=`excluded`.`host_id`,`device_name`=`excluded`.`device_name`,`device_type`=`excluded`.`device_type`,`device_uuid`=`excluded`.`device_uuid`,`device_serial_id`=`excluded`.`device_serial_id`,`device_label`=`excluded`.`device_label`

2022/08/01 18:55:07 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:23 database is locked (5) (SQLITE_BUSY)
[0.213ms] [rows:0] INSERT INTO `devices` (`created_at`,`updated_at`,`deleted_at`,`wwn`,`device_name`,`device_uuid`,`device_serial_id`,`device_label`,`manufacturer`,`model_name`,`interface_type`,`interface_speed`,`serial_number`,`firmware`,`rotation_speed`,`capacity`,`form_factor`,`smart_support`,`device_protocol`,`device_type`,`label`,`host_id`,`device_status`) VALUES ("2022-08-01 18:55:07.133","2022-08-01 18:55:07.133",NULL,"","sata2","","","","","","","6.0 Gb/s","","",7200,,"3.5 inches",false,"ATA","sat","","synology.local","0") ON CONFLICT (`wwn`) DO UPDATE SET `host_id`=`excluded`.`host_id`,`device_name`=`excluded`.`device_name`,`device_type`=`excluded`.`device_type`,`device_uuid`=`excluded`.`device_uuid`,`device_serial_id`=`excluded`.`device_serial_id`,`device_label`=`excluded`.`device_label`

2022/08/01 18:55:07 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:23 database is locked (5) (SQLITE_BUSY)
[0.189ms] [rows:0] INSERT INTO `devices` (`created_at`,`updated_at`,`deleted_at`,`wwn`,`device_name`,`device_uuid`,`device_serial_id`,`device_label`,`manufacturer`,`model_name`,`interface_type`,`interface_speed`,`serial_number`,`firmware`,`rotation_speed`,`capacity`,`form_factor`,`smart_support`,`device_protocol`,`device_type`,`label`,`host_id`,`device_status`) VALUES ("2022-08-01 18:55:07.133","2022-08-01 18:55:07.133",NULL,"","sata3","","","","","","","6.0 Gb/s","","",5425,,"3.5 inches",false,"ATA","sat","","synology.local","0") ON CONFLICT (`wwn`) DO UPDATE SET `host_id`=`excluded`.`host_id`,`device_name`=`excluded`.`device_name`,`device_type`=`excluded`.`device_type`,`device_uuid`=`excluded`.`device_uuid`,`device_serial_id`=`excluded`.`device_serial_id`,`device_label`=`excluded`.`device_label`

2022/08/01 18:55:07 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:23 database is locked (5) (SQLITE_BUSY)
[0.219ms] [rows:0] INSERT INTO `devices` (`created_at`,`updated_at`,`deleted_at`,`wwn`,`device_name`,`device_uuid`,`device_serial_id`,`device_label`,`manufacturer`,`model_name`,`interface_type`,`interface_speed`,`serial_number`,`firmware`,`rotation_speed`,`capacity`,`form_factor`,`smart_support`,`device_protocol`,`device_type`,`label`,`host_id`,`device_status`) VALUES ("2022-08-01 18:55:07.134","2022-08-01 18:55:07.134",NULL,"","sata4","","","","","","","6.0 Gb/s","","",5425,,"3.5 inches",false,"ATA","sat","","synology.local","0") ON CONFLICT (`wwn`) DO UPDATE SET `host_id`=`excluded`.`host_id`,`device_name`=`excluded`.`device_name`,`device_type`=`excluded`.`device_type`,`device_uuid`=`excluded`.`device_uuid`,`device_serial_id`=`excluded`.`device_serial_id`,`device_label`=`excluded`.`device_label`

2022/08/01 18:55:07 /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository_device.go:23 database is locked (5) (SQLITE_BUSY)
[0.215ms] [rows:0] INSERT INTO `devices` (`created_at`,`updated_at`,`deleted_at`,`wwn`,`device_name`,`device_uuid`,`device_serial_id`,`device_label`,`manufacturer`,`model_name`,`interface_type`,`interface_speed`,`serial_number`,`firmware`,`rotation_speed`,`capacity`,`form_factor`,`smart_support`,`device_protocol`,`device_type`,`label`,`host_id`,`device_status`) VALUES ("2022-08-01 18:55:07.134","2022-08-01 18:55:07.134",NULL,"","nvme0n1","","","","","","","","","",0,,"",false,"NVMe","nvme","","synology.local","0") ON CONFLICT (`wwn`) DO UPDATE SET `host_id`=`excluded`.`host_id`,`device_name`=`excluded`.`device_name`,`device_type`=`excluded`.`device_type`,`device_uuid`=`excluded`.`device_uuid`,`device_serial_id`=`excluded`.`device_serial_id`,`device_label`=`excluded`.`device_label`
time="2022-08-01T18:55:07-05:00" level=error msg="An error occurred while registering devices [database is locked (5) (SQLITE_BUSY) database is locked (5) (SQLITE_BUSY) database is locked (5) (SQLITE_BUSY) database is locked (5) (SQLITE_BUSY) database is locked (5) (SQLITE_BUSY) database is locked (5) (SQLITE_BUSY) database is locked (5) (SQLITE_BUSY) database is locked (5) (SQLITE_BUSY)]"
time="2022-08-01T18:55:07-05:00" level=error msg="10.2.2.1 - scrutiny-5894d5497b-xpj94 [01/Aug/2022:18:55:07 -0500] \"POST /api/devices/register\" 500 17 \"\" \"Go-http-client/2.0\" (6ms)" clientIP=10.2.2.1 hostname=scrutiny-5894d5497b-xpj94 latency=6 method=POST path=/api/devices/register referer= respLength=17 statusCode=500 userAgent=Go-http-client/2.0

more logs here: #220 (comment)

@KF5JWC KF5JWC added the bug Something isn't working label Aug 2, 2022
@KF5JWC KF5JWC changed the title [BUG] ApiServerCommunicationError appears when SQLITE db is locked [BUG] ApiServerCommunicationError appears when SQLITE db is busy Aug 2, 2022
@KF5JWC
Copy link
Contributor Author

KF5JWC commented Aug 2, 2022

Scheduling the collectors to fire 2 minutes apart from each other avoids hitting this error, so I suspect it is accurate that the clients are in a race to be the first served and to get the database lock in their favor.

@AnalogJ
Copy link
Owner

AnalogJ commented Aug 2, 2022

hey @KF5JWC
Thanks for debugging this issue for me, this is great.
I'm going to close this as a dupe of #341 , but please follow up there if you have any other information you'd like to share.

@KF5JWC
Copy link
Contributor Author

KF5JWC commented Aug 2, 2022

That makes sense :)

Thanks for your work on this project!

@KF5JWC KF5JWC changed the title [BUG] ApiServerCommunicationError appears when SQLITE db is busy [BUG] ApiServerCommunicationError appears when SQLITE db is busy when serving concurrent clients (collectors) Aug 2, 2022
@AnalogJ AnalogJ closed this as completed Aug 3, 2022
@AnalogJ AnalogJ mentioned this issue Aug 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants