Disk error on NetBSD #99

Closed
fish4terrisa-MSDSM opened this issue May 9, 2023 · 8 comments
Labels
bug (Something isn't working), inefficiency (Better implementation is desired)

Comments

@fish4terrisa-MSDSM
Contributor

I have just got this error:

WARN: Possible deadlock at src/devices/nvme.c@346
WARN: The lock was previously held at src/devices/nvme.c@330
WARN: Version: RVVM v0.6-unknown
WARN: Attempting to recover execution...
* * * * * * *
WARN: Possible deadlock at src/devices/nvme.c@346
* WARN: The lock was previously held at src/devices/nvme.c@330
WARN: Version: RVVM v0.6-unknown
WARN: Attempting to recover execution...
* * * * * * *

Is that a bug of RVVM or SDF?
P.S. The timestamps of the files on SDF are inaccurate. That might be the problem(?)
Is it possible to disable the lock by the user?

fish4terrisa-MSDSM added the bug label May 9, 2023
@fish4terrisa-MSDSM
Contributor Author

Maybe it just waited a bit too long, or the time is wrong. So is it possible to keep waiting even though a deadlock is possible?

@fish4terrisa-MSDSM
Contributor Author

fish4terrisa-MSDSM commented May 9, 2023

I have now just linked spin_lock() to spin_lock_slow_real(). It seems to work fine, and file reads and writes seem faster, but I will wait for more testing.

@fish4terrisa-MSDSM
Contributor Author

Also, disk reads and writes were slow while I was installing perl; that might give you some clues.

@LekKit
Owner

LekKit commented May 9, 2023

It seems your host storage takes a very long time to execute IO syscalls, and the device is impatient and thinks that something has locked up somewhere (yeah, that's how I debug those things).
I'll see what can be done to improve locking efficiency. Thanks for the report.
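
To illustrate the "impatient" lock described above, here is a minimal sketch of a spinlock that gives up and warns after a patience window. It is not RVVM's implementation: `demo_lock_t`, `demo_lock_try_for`, `demo_lock_release` and the `patience_sec` parameter are hypothetical names, and what RVVM actually does after printing the warning is not shown in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical lock type for illustration; not RVVM's real spinlock. */
typedef struct {
    atomic_flag flag;           /* set while the lock is held */
    const char* _Atomic holder; /* location of the last successful acquisition */
} demo_lock_t;

#define DEMO_LOCK_INIT { ATOMIC_FLAG_INIT, "(never held)" }

/* Wall-clock time in seconds; a monotonic clock would be preferable in real code. */
static double demo_now(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Spin until the lock is acquired or patience_sec has elapsed.
 * On timeout, print a warning and return false; the caller decides
 * whether to retry or to attempt some form of recovery.
 */
static bool demo_lock_try_for(demo_lock_t* lock, const char* where, double patience_sec)
{
    assert(lock != NULL && where != NULL);
    double deadline = demo_now() + patience_sec;
    while (atomic_flag_test_and_set_explicit(&lock->flag, memory_order_acquire)) {
        if (demo_now() > deadline) {
            fprintf(stderr, "WARN: Possible deadlock at %s\n", where);
            fprintf(stderr, "WARN: The lock was previously held at %s\n",
                    atomic_load(&lock->holder));
            return false;
        }
    }
    atomic_store(&lock->holder, where);
    return true;
}

static void demo_lock_release(demo_lock_t* lock)
{
    assert(lock != NULL);
    atomic_flag_clear_explicit(&lock->flag, memory_order_release);
}
```

With a lock like this, a holder that is merely stuck in a slow IO syscall is indistinguishable from a real deadlock once the patience window expires, which would explain a warning of this kind on slow storage.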

> Is that a bug of RVVM or SDF?

Inefficient SDF disk access coupled with RVVM's inefficient locking in NVMe emulation.

> Is it possible to disable the lock by the user?

No, the lock is there to protect against data races. Without it, chaos and corruption ensue.
It is unrelated to the on-disk file locks that protect the image from being used by more than one VM at once.

LekKit added the inefficiency label May 9, 2023
@fish4terrisa-MSDSM
Contributor Author

So maybe the user could choose to use the _slow_real lock in place of the _real lock?

@LekKit
Owner

LekKit commented May 10, 2023

Maybe. But I think this needs a more elaborate fix that keeps the lock from being held in the IO path.

@LekKit
Owner

LekKit commented Jun 20, 2023

Working on re-queuing IO commands from the NVMe ring to the threadpool queue. This could mean more IO parallelism, running IO without any locks, and less stuttering when many VMs are involved.

So far it is working well enough, but there is a slight drop in maximum IOPS. I hope to get this completed by the v0.6 release.
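
The general idea of running IO without any locks can be sketched as follows: the lock is held only long enough to copy a command out of the ring, and the slow IO runs after it is released. This is a simplified, single-threaded illustration rather than the actual RVVM change. All `demo_*` names are hypothetical, and `demo_fake_io` is a stand-in for a real `pread()`/`pwrite()` call that also records whether the ring lock was held while it ran.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this is not RVVM's NVMe code. */
#define DEMO_RING_SIZE 8

typedef struct {
    uint64_t offset; /* byte offset of the request */
    uint32_t length; /* byte length of the request */
} demo_cmd_t;

typedef struct {
    atomic_flag lock;                 /* protects head, tail and slots */
    demo_cmd_t  slots[DEMO_RING_SIZE];
    size_t      head;                 /* next slot to pop */
    size_t      tail;                 /* next slot to push */
    bool        lock_seen_held_in_io; /* diagnostic, set by demo_fake_io */
} demo_ring_t;

#define DEMO_RING_INIT { .lock = ATOMIC_FLAG_INIT }

static void demo_ring_lock(demo_ring_t* r)
{
    while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire)) {}
}

static void demo_ring_unlock(demo_ring_t* r)
{
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* Queue a command; returns false if the ring is full. */
static bool demo_ring_push(demo_ring_t* r, demo_cmd_t cmd)
{
    bool ok = false;
    demo_ring_lock(r);
    if (r->tail - r->head < DEMO_RING_SIZE) {
        r->slots[r->tail++ % DEMO_RING_SIZE] = cmd;
        ok = true;
    }
    demo_ring_unlock(r);
    return ok;
}

/*
 * Stand-in for slow host IO. It probes the ring lock and records if it
 * was held. The probe is only meaningful in this single-threaded demo.
 */
static uint32_t demo_fake_io(demo_ring_t* r, const demo_cmd_t* cmd)
{
    if (atomic_flag_test_and_set(&r->lock)) {
        r->lock_seen_held_in_io = true; /* lock was held while IO ran */
    } else {
        atomic_flag_clear(&r->lock);    /* lock was free; undo the probe */
    }
    return cmd->length;
}

/* Pop one command under the lock, then run the IO with the lock released. */
static bool demo_process_one(demo_ring_t* r, uint32_t* bytes_done)
{
    assert(r != NULL && bytes_done != NULL);
    demo_cmd_t cmd = { 0, 0 };
    bool have_cmd = false;

    demo_ring_lock(r);
    if (r->head != r->tail) {
        cmd = r->slots[r->head++ % DEMO_RING_SIZE];
        have_cmd = true;
    }
    demo_ring_unlock(r); /* critical section ends before any IO starts */

    if (!have_cmd) return false;
    *bytes_done = demo_fake_io(r, &cmd);
    return true;
}
```

In a real threadpool, several workers would call something like `demo_process_one` concurrently. Because the critical section only covers the copy out of the ring, a slow IO call in one worker would not make other lock users wait.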

@LekKit
Owner

LekKit commented Jun 21, 2023

Commit 5a6466f should fix this. The locks are never held during IO. I purposely inserted a long (30-second) sleep into IO commands and can no longer reproduce the lockup warnings.
I think this issue can be closed.

Projects
Status: Testing