
e2e: add a test case for rbd-nbd mounter #1839

Merged: 6 commits into ceph:devel on May 26, 2021
Conversation


@pkalever pkalever commented Jan 21, 2021

Describe what this PR does

Validate the basic working of rbd-nbd

Test cases:

  1. Create a PV with rbd-nbd backend and start the application pod using it
  2. After the application pod is started, restart the node plugin and expect the IO to fail as the rbd-nbd process is killed
  3. After restarting the node plugin, restart the rbd-nbd process by reattaching/re-mapping the device connection and expect the IO from the application pod to continue
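For context, the same three steps can be reproduced by hand outside the e2e suite. The transcript below is illustrative only: the pool/image names and the nbd device are made up, and `rbd-nbd attach` needs a Ceph release that supports reattach.

```console
$ rbd-nbd map replicapool/test-image          # step 1: map, device shows up as e.g. /dev/nbd0
$ dd if=/dev/zero of=/dev/nbd0 bs=4K count=1  # IO works while the rbd-nbd process is alive
$ pkill -9 rbd-nbd                            # step 2: simulate the node-plugin restart
$ dd if=/dev/zero of=/dev/nbd0 bs=4K count=1  # IO fails: no userspace process backs the device
$ rbd-nbd attach --device /dev/nbd0 replicapool/test-image  # step 3: reattach the mapping
$ dd if=/dev/zero of=/dev/nbd0 bs=4K count=1  # IO continues
```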

Dependencies

Updates: #667

@pkalever pkalever marked this pull request as draft January 21, 2021 11:41
@nixpanic (Member)

You may want to have a look at #1840 for an idea how to test it in the CI before the minikube PR is merged+released.

@nixpanic nixpanic added component/testing Additional test cases or CI work dependency/k8s depends on Kubernetes features labels Jan 22, 2021
@pkalever (Author)

@nixpanic I hope the CI picks up the modifications to the minikube iso URL as part of this PR's testing, is that right?
Or will the CI only pick them up once the changes are merged?

@nixpanic (Member)

> @nixpanic I hope the CI picks up the modifications to the minikube iso URL as part of this PR's testing, is that right?
> Or will the CI only pick them up once the changes are merged?

The CI uses scripts/minikube.sh from the PR that modifies it, so --iso-url will be included in this testing.

@nixpanic (Member)

Note that #1831 contains a modification to the EXTRA_CONFIG options for minikube, done right; CI jobs will likely fail now with the change you made here.

@pkalever (Author)

> Note that #1831 contains a modification to the EXTRA_CONFIG options for minikube, done right; CI jobs will likely fail now with the change you made here.

PR #1831 seems to be open at the moment; when it gets merged I shall rebase this one, which I hope sounds like a good plan.
Also, while PR #1831 is still open, I don't see a reason why this PR should fail.

Thanks!

@pkalever (Author)

Looks to be a PSP issue: https://jenkins-ceph-csi.apps.ocp.ci.centos.org/blue/organizations/jenkins/mini-e2e-helm_k8s-1.19/detail/mini-e2e-helm_k8s-1.19/567/pipeline

Thanks @Madhu-1, now I understand why @nixpanic was warning me in the comments above.
At least until 73d9428 is merged, I should copy those changes into my PR, just to keep the CI happy.

@nixpanic (Member)

With #1811 the minikube start commands in scripts/minikube.sh have been adapted; there is an additional --cni option now. You'll need to rebase this manually.

@pkalever pkalever force-pushed the e2e-nbd branch 3 times, most recently from 3938ced to c18459a Compare February 26, 2021 09:02
Base automatically changed from master to devel March 1, 2021 05:22
@pkalever pkalever force-pushed the e2e-nbd branch 7 times, most recently from 6e8a43b to 728e5df Compare March 3, 2021 10:57
e2e/pod.go Outdated
cmd := []string{"/bin/sh", "-c", c}
podList, err := f.PodClientNS(ns).List(context.TODO(), *opt)
framework.ExpectNoError(err)
if len(podList.Items) == 0 {
return framework.ExecOptions{}, errors.New("podlist is empty")
}
found := false
Collaborator

Finding the container name in the pod can be wrapped into a new function.
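As a self-contained sketch of that suggestion (the helper name `containerExists` and the plain string slice are made up here; the real helper would walk `pod.Spec.Containers`):

```go
package main

import "fmt"

// containerExists reports whether a container with the given name is
// present in the container list. In the real e2e code the input would be
// pod.Spec.Containers; a plain string slice keeps this sketch self-contained.
func containerExists(containers []string, name string) bool {
	for _, c := range containers {
		if c == name {
			return true
		}
	}
	return false
}

func main() {
	containers := []string{"csi-rbdplugin", "liveness-prometheus"}
	fmt.Println(containerExists(containers, "csi-rbdplugin")) // prints true
}
```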

Author

Done now!

Collaborator

Thanks @pkalever 👍

e2e/pod.go Outdated
}

func execCommandInContainer(f *framework.Framework, c, ns string, cn string, opt *metav1.ListOptions) (string, string, error) {
podPot, err := getCommandInPodOpts(f, c, ns, cn, opt)
Collaborator

Even though it's not introduced here, isn't podPot a weird name? Maybe we should rename it to podOpt?

Author

Addressed this one too. Thanks!

Collaborator

Cool, thanks!

@pkalever pkalever force-pushed the e2e-nbd branch 2 times, most recently from 81706a7 to 9299867 Compare March 3, 2021 12:47
@pkalever pkalever requested a review from Madhu-1 May 5, 2021 06:47
@pkalever pkalever force-pushed the e2e-nbd branch 2 times, most recently from 77bf75d to 1009c06 Compare May 6, 2021 06:13
e2e/pod.go Outdated
@@ -305,7 +305,7 @@ func deletePod(name, ns string, c kubernetes.Interface, t int) error {
})
}

func deletePodWithLabel(label, ns string, skipNotFound bool) error {
func deletePodWithLabel(label, ns string, skipNotFound bool) error { //nolint:unparam
Author

@nixpanic this should fix the golint warnings!

@pkalever pkalever force-pushed the e2e-nbd branch 3 times, most recently from 1972041 to 0175b5c Compare May 6, 2021 11:55
@pkalever (Author)

pkalever commented May 7, 2021

@Madhu-1 @nixpanic As we don't see any objections, could you please merge this PR now?

Thanks!

@humblec (Collaborator)

humblec commented May 7, 2021

@pkalever one quick check: have we tested the scenario where the node actually reboots, or goes to NOT READY state and comes back after some time?

@pkalever (Author)

pkalever commented May 7, 2021

> @pkalever one quick check: have we tested the scenario where the node actually reboots, or goes to NOT READY state and comes back after some time?

When a node is rebooted, the application pod might be migrated to a new node or scheduled on the same node again; in either case, a new NodeStageVolume call is expected to take care of mapping the device.

These tests mainly focus on the restart of the node plugin, where there is currently no NodeStageVolume call. We are working in parallel on filling that gap: a new init container or sidecar is supposed to fetch the list of volume attachments on a node (upon restart of the node plugin) and issue a new NodeStageVolume gRPC call per device to reattach (remap) the rbd-nbd processes/devices.

Thanks!
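A rough sketch of that planned recovery loop; all names here (attachment, listAttachedVolumes, reattach) are hypothetical stand-ins, since the real sidecar would query VolumeAttachment objects and issue NodeStageVolume gRPC calls:

```go
package main

import "fmt"

// attachment is a simplified stand-in for a Kubernetes VolumeAttachment
// that targets this node.
type attachment struct {
	volumeID string
	device   string
}

// listAttachedVolumes stands in for querying the API server for the
// VolumeAttachments on a given node; stubbed with fixed data here.
func listAttachedVolumes(node string) []attachment {
	return []attachment{
		{volumeID: "vol-1", device: "/dev/nbd0"},
		{volumeID: "vol-2", device: "/dev/nbd1"},
	}
}

// reattach stands in for the NodeStageVolume gRPC call that would re-map
// the rbd-nbd device after a node-plugin restart.
func reattach(a attachment) error {
	fmt.Printf("re-staging %s at %s\n", a.volumeID, a.device)
	return nil
}

func main() {
	// On node-plugin restart, walk every attachment and re-stage it.
	for _, a := range listAttachedVolumes("node-1") {
		if err := reattach(a); err != nil {
			fmt.Printf("reattach of %s failed: %v\n", a.volumeID, err)
		}
	}
}
```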

@humblec (Collaborator)

humblec commented May 7, 2021

> @pkalever one quick check: have we tested the scenario where the node actually reboots, or goes to NOT READY state and comes back after some time?
>
> When a node is rebooted, the application pod might be migrated to a new node or scheduled on the same node again; in either case, a new NodeStageVolume call is expected to take care of mapping the device.

There are some (corner) cases, like network errors, the kubelet going wrong, or the node going down completely, which are a bit different from the volume point of view. We have also seen issues pop up with race conditions in how the CO detects these situations and issues RPC calls in between. The reason for asking about testing the node NOT READY scenario was to make sure we don't easily land in issues when we remount the shares with these options. So I think it's good to check once, if we have a setup :)

@pkalever (Author)

pkalever commented May 7, 2021

> There are some (corner) cases, like network errors, the kubelet going wrong, or the node going down completely, which are a bit different from the volume point of view. We have also seen issues pop up with race conditions in how the CO detects these situations and issues RPC calls in between. The reason for asking about testing the node NOT READY scenario was to make sure we don't easily land in issues when we remount the shares with these options. So I think it's good to check once, if we have a setup :)

Thanks Humble, got you. These test cases are less focused on the corner cases, but I will make sure to test those corner cases with the logic we are implementing to handle the node plugin restart automatically.

@pkalever (Author)

@Madhu-1 @nixpanic can you please help take this PR to merge?
I would like to see these tests running in the CI, which will help us improve confidence in rbd-nbd.

Thanks!

@nixpanic (Member) left a comment

It is good to have this tested regularly, so let's get it in.

In case a manual rebase is needed, please correct the formatting of the // nolint: comments.

@@ -204,7 +204,7 @@ func execCommandInPod(f *framework.Framework, c, ns string, opt *metav1.ListOpti
return stdOut, stdErr, err
}

func execCommandInContainer(f *framework.Framework, c, ns, cn string, opt *metav1.ListOptions) (string, string, error) {
func execCommandInContainer(f *framework.Framework, c, ns, cn string, opt *metav1.ListOptions) (string, string, error) { //nolint:unparam,lll // cn can be used with different inputs later
Member

We usually put these comments above the line that is affected, not after the line (this line becomes really long now).

Author

I thought putting it above the line would affect the whole function block, not just this line in this case?

Member

Probably, but that is not really an issue.

In general, all nolint issues need to be addressed at one point. That means, either the argument is not needed and can be dropped, or the function gets used with other arguments somewhere.
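For illustration, a minimal example of the placement being discussed (assuming golangci-lint; the function and the reason text are made up). A directive on the line above a declaration applies to the whole declaration, while the end-of-line form applies to that single line:

```go
package main

import "fmt"

// Placing the directive above the declaration suppresses the linter for
// the whole function, which keeps the signature line short:
//
//nolint:unparam // name currently always receives the same value
func greet(name string) string {
	return "hello " + name
}

func main() {
	fmt.Println(greet("world")) // prints "hello world"
}
```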

@pkalever (Author)

> In case a manual rebase is needed, please correct the formatting of the // nolint: comments.

The recommendation was to not have a space between the slashes and nolint.

@Madhu-1 (Collaborator)

Madhu-1 commented May 26, 2021

@Mergifyio rebase

@mergify (Contributor)

mergify bot commented May 26, 2021

Command rebase: success

Branch has been successfully rebased

Prasanna Kumar Kalever added 6 commits May 26, 2021 09:21
To validate the basic working of rbd-nbd

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
This is a negative testcase to showcase that, as per the current design,
the IO will fail because of the missing mappings

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Bring up the rbd-nbd map/attach process on the rbd node plugin and expect the
IO to continue uninterrupted.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
This testcase exercises the journaling/exclusive-lock image features with
the rbd-nbd mounter

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Ignoring the warnings below:

e2e/pod.go:207:60: `execCommandInContainer` - `cn` always receives
`"csi-rbdplugin"` (unparam)
func execCommandInContainer(f *framework.Framework, c, ns, cn string,
opt *metav1.ListOptions) (string, string, error) {
                                                           ^
e2e/pod.go:308:43: `deletePodWithLabel` - `skipNotFound` always receives
`false` (unparam)
func deletePodWithLabel(label, ns string, skipNotFound bool) error {

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
@mergify mergify bot merged commit 6984da5 into ceph:devel May 26, 2021
Labels
component/testing Additional test cases or CI work dependency/ceph depends on core Ceph functionality dependency/k8s depends on Kubernetes features