Fixup correct nvme controller and make child add v1 idempotent #695

Merged 3 commits on Dec 6, 2023

@tiagolobocastro commented Dec 5, 2023

feat(csi/node/timeout): add nvme-io-engine timeout and parse humantime

Adds a new parameter, "--nvme-io-timeout", which sets the timeout per nvme block device.
TODO: Check whether this is enough to avoid setting the global timeout.
Also parses "--nvme-core-io-timeout" as humantime.

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>
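
To illustrate what parsing a timeout flag "as humantime" means, here is a minimal, self-contained sketch. It is not the code from this PR: `parse_timeout` is a hypothetical helper that accepts only a small subset of the format (an integer followed by `ms`, `s`, `m` or `h`), whereas the `humantime` crate accepts many more forms.

```rust
use std::time::Duration;

/// Parses a small subset of humantime-style durations: an integer followed
/// by one of the units `ms`, `s`, `m` or `h` (for example "30s" or "2m").
/// Hypothetical helper for illustration only; not taken from this PR.
fn parse_timeout(input: &str) -> Result<Duration, String> {
    let input = input.trim();
    // Digits are on the left of the first non-digit character, the unit on the right.
    let split = input
        .find(|c: char| !c.is_ascii_digit())
        .ok_or_else(|| format!("missing unit in '{input}'"))?;
    let (number, unit) = input.split_at(split);
    let value: u64 = number
        .parse()
        .map_err(|e| format!("invalid number in '{input}': {e}"))?;
    match unit {
        "ms" => Ok(Duration::from_millis(value)),
        "s" => Ok(Duration::from_secs(value)),
        "m" => Ok(Duration::from_secs(value.saturating_mul(60))),
        "h" => Ok(Duration::from_secs(value.saturating_mul(3600))),
        other => Err(format!("unknown unit '{other}' in '{input}'")),
    }
}
```

With this sketch, `parse_timeout("30s")` yields a 30 second `Duration`, while a bare number such as `"30"` is rejected because it carries no unit.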

fix(nexus/add-child/v1): make add child v1 idempotent

When the v1 nexus add child call was added, it was not made idempotent.
This is not an issue per se, as the child eventually gets garbage-collected
and re-added, but it can cause strange logging.
TODO: Should the behaviour depend on the child's state?
For example, if the child is faulted, should we remove and re-add it?
Bonus: Fixes an old test which stopped working a long time ago, when
pstor was enabled for the data-plane, by not enabling pstor for that
particular test.

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>
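
A rough sketch of the idempotency idea, under stated assumptions: `Nexus`, `AddChild` and the child URI are hypothetical stand-ins rather than the io-engine types, and the state-dependent behaviour raised in the TODO is deliberately left out.

```rust
/// Hypothetical, simplified stand-in for a nexus; not the io-engine type.
#[derive(Debug, Default)]
struct Nexus {
    children: Vec<String>,
}

/// Outcome of an add request, so callers can tell a no-op from a real add.
#[derive(Debug, PartialEq)]
enum AddChild {
    Added,
    AlreadyPresent,
}

impl Nexus {
    /// Adds a child by URI. Repeating the call with the same URI leaves the
    /// nexus unchanged and still succeeds, which is what makes it idempotent.
    fn add_child(&mut self, uri: &str) -> AddChild {
        if self.children.iter().any(|c| c == uri) {
            return AddChild::AlreadyPresent;
        }
        self.children.push(uri.to_string());
        AddChild::Added
    }
}
```

Returning a distinct `AlreadyPresent` outcome instead of an error lets a retrying caller treat both results as success.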

fix(csi-node/nvmf/fixup): fixup correct nvme controller

When we replace an existing path, the new path has a different controller number.
The controller number and device number then no longer match, so we cannot safely dereference
/sys/class/nvme/nvme{major}
Instead, we can simply dereference
/sys/class/block/nvme{major}c*n1/queue
The major ensures we use the original device number, and the glob ensures we modify the
timeout for all controllers.

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>
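
The glob-based lookup could be sketched roughly as follows. This is an illustration, not the PR's implementation: `is_path_device` and `set_io_timeout` are hypothetical names, the `io_timeout` attribute under `queue` and the millisecond unit are assumptions (the commit message only names the `queue` directory), and the match is narrowed to numeric controller ids where the glob would accept anything. The directory is a parameter so the sketch can run against a scratch directory instead of `/sys/class/block`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns true when `name` has the form `nvme{major}c{digits}n1`.
/// Narrower than the `c*n1` glob: only numeric controller ids are accepted.
fn is_path_device(name: &str, major: u32) -> bool {
    let prefix = format!("nvme{major}c");
    match name
        .strip_prefix(prefix.as_str())
        .and_then(|rest| rest.strip_suffix("n1"))
    {
        Some(ctrl) => !ctrl.is_empty() && ctrl.chars().all(|c| c.is_ascii_digit()),
        None => false,
    }
}

/// Writes `timeout_ms` to `queue/io_timeout` of every matching entry under
/// `block_dir` and returns the files written, sorted by path.
/// The `io_timeout` file name is an assumption made for this sketch.
fn set_io_timeout(block_dir: &Path, major: u32, timeout_ms: u64) -> io::Result<Vec<PathBuf>> {
    let mut updated = Vec::new();
    for entry in fs::read_dir(block_dir)? {
        let entry = entry?;
        let name = entry.file_name();
        if !is_path_device(&name.to_string_lossy(), major) {
            continue;
        }
        let target = entry.path().join("queue").join("io_timeout");
        fs::write(&target, timeout_ms.to_string())?;
        updated.push(target);
    }
    updated.sort();
    Ok(updated)
}
```

Because the match keys on the device major rather than on a controller number, a replaced path with a new controller id is still picked up.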

@tiagolobocastro changed the title from "fix(csi-node/nvmf/fixup): fixup correct nvme controller" to "Fixup correct nvme controller and make child add v1 idempotent" on Dec 6, 2023
@tiagolobocastro
Contributor Author

bors merge

bors-openebs-mayastor bot pushed a commit that referenced this pull request Dec 6, 2023
695: Fixup correct nvme controller and make child add v1 idempotent r=tiagolobocastro a=tiagolobocastro

    feat(csi/node/timeout): add io-engine timeout and parse humantime
    
    Adds a new parameter, "--io-engine-io-timeout".
    This is used as the base for the nvme core io timeout: we add a slack of
    10s to this value, allowing the backend to fail io first.
    Also parses "--nvme-core-io-timeout" as humantime.
    
    Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>


Co-authored-by: Tiago Castro <tiagolobocastro@gmail.com>
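
The slack arithmetic described in this earlier revision of the first commit message can be written down as a small sketch. The names `IO_TIMEOUT_SLACK` and `nvme_core_io_timeout` are hypothetical; only the 10s slack comes from the commit message.

```rust
use std::time::Duration;

/// Slack added on top of the io-engine timeout (10s, per the commit message).
const IO_TIMEOUT_SLACK: Duration = Duration::from_secs(10);

/// Derives the nvme core io timeout from the io-engine timeout, so that the
/// backend gets the chance to fail io before the initiator times out.
/// Hypothetical helper for illustration only.
fn nvme_core_io_timeout(io_engine_timeout: Duration) -> Duration {
    io_engine_timeout.saturating_add(IO_TIMEOUT_SLACK)
}
```

For an example io-engine timeout of 30s (a value chosen here only for illustration), this gives a core timeout of 40s.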
@tiagolobocastro
Contributor Author

bors cancel

@bors-openebs-mayastor

Canceled.

@tiagolobocastro
Contributor Author

bors merge

bors-openebs-mayastor bot pushed a commit that referenced this pull request Dec 6, 2023
695: Fixup correct nvme controller and make child add v1 idempotent r=tiagolobocastro a=tiagolobocastro

@tiagolobocastro
Contributor Author

bors cancel

@bors-openebs-mayastor

Canceled.

@tiagolobocastro
Contributor Author

bors cancel

@tiagolobocastro
Contributor Author

bors merge

@bors-openebs-mayastor

Build succeeded:

@bors-openebs-mayastor bot merged commit cbc833a into develop on Dec 6, 2023
6 checks passed
@bors-openebs-mayastor bot deleted the csi-timeout branch on December 6, 2023 17:01