vmware clustering issues with ceph/rbd iscsi HA support #341
Comments
Hi Mike, do you have information on where those design changes are being proposed or discussed?
No. Do you want to work on it or are you just wondering? If you want to work on it I can bug the rbd maintainer about the rbd parts and I can write up what needs to be done for the kernel.
We are switching our failover type to explicit: ceph/ceph-iscsi-config#54. For ESX, when the followover feature is enabled (alua_followover=on) we will not hit the above problem. There is a similar feature for Linux. I have not found one for Windows though.
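For reference, on the Linux initiator side the equivalent behavior is normally handled through dm-multipath. A minimal `multipath.conf` device stanza along the lines of what the Ceph iSCSI gateway documentation suggests for LIO-backed LUNs; treat the exact values as an illustrative sketch rather than a tuned recommendation:

```
# Example /etc/multipath.conf stanza for ceph-iscsi (LIO/TCMU) LUNs.
# path_grouping_policy "failover" keeps a single active path group,
# and "prio alua" lets the ALUA state decide which group is preferred,
# so the initiator sticks to one gateway instead of spreading IO.
devices {
    device {
        vendor                 "LIO-ORG"
        product                "TCMU device"
        hardware_handler       "1 alua"
        path_grouping_policy   "failover"
        path_selector          "queue-length 0"
        failback               60
        path_checker           tur
        prio                   alua
        prio_args              exclusive_pref_bit
        fast_io_fail_tmo       25
        no_path_retry          queue
    }
}
```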
Mike, would this also address the issue with PGRs not working if you have two gateways? From what I understand, alua_followover=no would solve the potential issue of rbd locks being bounced between two gateway nodes, but this wouldn't necessarily address PGRs (aka the ability to share a LUN across multiple hosts without potential data corruption)?
No. It will not help. For PGRs, even though one node is in standby it still needs to be able to set up and report the PGR state. I think the Windows cluster validation test you need to run when you set up a Windows cluster even tests for this.
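To illustrate why the standby path still matters for PGRs: cluster software registers and queries persistent reservations over every path, so both gateways have to present consistent PGR state. A hedged example using sg3_utils (the device names and reservation key below are made up):

```
# Register a reservation key through the path behind gateway 1.
sg_persist --out --register --param-sark=0x123abc /dev/sdc

# Take a Write Exclusive - Registrants Only reservation (type 5).
sg_persist --out --reserve --param-rk=0x123abc --prout-type=5 /dev/sdc

# Reading the keys through the path behind gateway 2 (here /dev/sdd)
# must show the same registration, even though that path is standby.
sg_persist --in --read-keys /dev/sdd
```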
Just to make sure there is no mix-up: that should be "on" and not "no" :)
Hi Mike, I see you are working on this when looking through the commits. It would be nice to have some information about the current usability of ceph/rbd iSCSI HA with VMware HA. Or is there a target release that is supposed to support the lock sharing?
I am currently working on LIO changes and the tcmu kernel/user interface to support shared locking, and on fixing another possible data corruption issue with VMware HA similar to the one found for the single path case here #384. For the 1.4.0 release, which I am trying to complete by the end of August, the data corruption issues will be fixed in all configurations. The specific problem this issue was created for, the path ping-ponging with VMware HA, will not be fixed in that release. For both shared locking and PGRs, the primary part is being able to tell tcmu-runner what I_T nexus (an iscsi path, basically) a command is coming from. This is taking me a lot longer than I thought and I am currently fixing up related bugs in the PGR code. Specifically this patch: https://www.spinics.net/lists/target-devel/msg16945.html ended up leading to a lot of other changes.
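To make the I_T nexus part concrete, here is a purely hypothetical C sketch (not the actual kernel patch, and the field names are invented): the idea is that each SCSI command handed to userspace through the tcmu ring would carry an identifier for the I_T nexus it arrived on, so tcmu-runner could track lock and PGR state per initiator port.

```c
/* Hypothetical sketch only -- not the real target_core_user ABI.
 * Each command entry passed from the kernel to tcmu-runner would be
 * extended with information identifying the I_T nexus (the pairing of
 * initiator port and target port) the command came in on. */
struct tcmu_cmd_entry_ext {
        /* ... existing command entry fields (iovecs, cdb, ...) ... */
        __u64 it_nexus_id;          /* kernel-assigned nexus handle */
        char  initiator_name[224];  /* e.g. the initiator's iqn. name */
};
```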
Hello @mikechristie - does that issue still exist? If yes, are there other workarounds than the one-VM-per-LUN approach?
What is the current status here, @lxbsz? I tagged you because I assume Mike is not working on this anymore, and I would like somebody to shed light on this. This is a pretty big showstopper for us.
I am mainly focused on and occupied by CephFS stuff, and have not had a chance to work on this yet.
I'm currently working on a Ceph iSCSI PoC. The goal is to have a multi-RBD shared datastore across 3 ESXi hosts. Currently only one host at a time can actually see and write to the datastore. Is this something that can work?
This is a placeholder for a known issue with VMware HA and ceph/rbd iscsi HA support.
It is not an issue when using single host VMware setups with ceph/rbd iscsi HA support, and it is not an issue when using VMware HA setups with ceph/rbd iSCSI non-HA setups.
The problem is that in ceph/rbd iscsi HA mode, rbd requires the rbd exclusive lock when executing IO. If a LUN contains multiple VMs with different active hosts, and one host cannot access the active-optimized iscsi gateway, then that host will fail over to one of the non-primary gateways while the other hosts continue to use the primary gateway. The rbd lock will then bounce between both gateways and cause performance problems and possibly crashes/hangs in the iscsi gateways.
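To sketch the mechanics (illustrative only, written against the public librbd C API rather than tcmu-runner's actual handler code): each gateway must own the exclusive lock before servicing IO, breaking the other gateway's ownership if needed, so two gateways receiving IO for the same image keep stealing the lock from each other.

```c
#include <rbd/librbd.h>

/* Illustrative sketch of why IO arriving at two gateways makes the
 * exclusive lock ping-pong. This mirrors the idea, not the real code. */
static int service_io(rbd_image_t image)
{
        /* If this gateway does not currently own the exclusive lock,
         * librbd takes it over (breaking the current owner's lock)
         * before the IO can proceed. */
        int r = rbd_lock_acquire(image, RBD_LOCK_MODE_EXCLUSIVE);
        if (r < 0)
                return r;

        /* ... submit the read/write ... */

        /* The lock stays with this gateway until the other gateway,
         * also receiving IO for the same image, steals it back. With
         * hosts split across gateways this repeats over and over. */
        return 0;
}
```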
A kernel and tcmu-runner change that allows tcmu-runner to share an rbd image's lock between multiple iscsi gateways is being designed.
A temporary workaround for smaller setups is to use one VM per LUN. There is no workaround for larger setups.
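For the workaround, a hedged gwcli example following the usual ceph-iscsi flow (the pool, image name, size, and target IQN below are placeholders):

```
# Create one RBD image per VM and export each as its own LUN.
$ gwcli
/> cd /disks
/disks> create pool=rbd image=vm1-disk size=100G
/disks> cd /iscsi-targets/iqn.2003-01.com.redhat.iscsi-gw:ceph-gw/disks
> add rbd/vm1-disk
```

With one image per VM, each rbd exclusive lock follows a single VM's active host, so a failover of one VM no longer fights with other VMs over the same lock.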