vmware clustering issues with ceph/rbd iscsi HA support #341
Comments
Hi Mike, do you have information on where those design changes are being proposed or discussed?
No. Do you want to work on it or are you just wondering? If you want to work on it I can bug the rbd maintainer about the rbd parts and I can write up what needs to be done for the kernel.
We are switching our failover type to explicit: ceph/ceph-iscsi-config#54. For ESX, when the followover feature is enabled (alua_followover=on) we will not hit the above problem. There is a similar feature for Linux. I have not found one for Windows though.
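For reference, on the Linux initiator side the equivalent behavior is normally handled through dm-multipath. A minimal `multipath.conf` device stanza along the lines of what the Ceph iSCSI gateway documentation suggests for LIO-backed LUNs; treat the exact values as an illustrative sketch rather than a tuned recommendation:

```
# Example /etc/multipath.conf stanza for ceph-iscsi (LIO/TCMU) LUNs.
# path_grouping_policy "failover" keeps a single active path group,
# and "prio alua" lets the ALUA state decide which group is preferred,
# so the initiator sticks to one gateway instead of spreading IO.
devices {
    device {
        vendor                 "LIO-ORG"
        product                "TCMU device"
        hardware_handler       "1 alua"
        path_grouping_policy   "failover"
        path_selector          "queue-length 0"
        failback               60
        path_checker           tur
        prio                   alua
        prio_args              exclusive_pref_bit
        fast_io_fail_tmo       25
        no_path_retry          queue
    }
}
```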
Mike, would this also address the issue with PGRs not working if you have two gateways? From what I understand, alua_followover=no would solve the potential issue of rbd locks being bounced between two gateway nodes, but this wouldn't necessarily address PGRs (aka the ability to share a LUN across multiple hosts without potential data corruption)?
No. It will not help. For PGRs, even though one node is in standby it still needs to be able to set up and report the PGR state. I think the Windows cluster validation test you need to run when you set up a Windows cluster even tests for this.
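To illustrate why the standby path still matters for PGRs: cluster software registers and queries persistent reservations over every path, so both gateways have to present consistent PGR state. A hedged example using sg3_utils (the device names and reservation key below are made up):

```
# Register a reservation key through the path behind gateway 1.
sg_persist --out --register --param-sark=0x123abc /dev/sdc

# Take a Write Exclusive - Registrants Only reservation (type 5).
sg_persist --out --reserve --param-rk=0x123abc --prout-type=5 /dev/sdc

# Reading the keys through the path behind gateway 2 (here /dev/sdd)
# must show the same registration, even though that path is standby.
sg_persist --in --read-keys /dev/sdd
```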
Just to make sure there is no mix-up: that should be "on" and not "no" :)
Hi Mike, I see you are working on this when looking through the commits. It would be nice to have some information about the current usability of ceph/rbd iSCSI HA with VMware HA. Or is there a target release that is supposed to support the lock sharing?
I am currently working on LIO changes and the tcmu kernel/user interface to support shared locking, and on fixing another possible data corruption issue with VMware HA similar to the one found for the single path case here #384. For the 1.4.0 release, which I am trying to complete by the end of August, the data corruption issues will be fixed in all configurations. The specific problem this issue was created for, the path ping-ponging with VMware HA, will not be fixed in that release. For both shared locking and PGRs, the primary part is being able to tell tcmu-runner what I_T nexus (an iscsi path, basically) a command is coming from. This is taking me a lot longer than I thought and I am currently fixing up related bugs in the PGR code. Specifically this patch: https://www.spinics.net/lists/target-devel/msg16945.html ended up leading to a lot of other changes.
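To make the I_T nexus part concrete, here is a purely hypothetical C sketch (not the actual kernel patch, and the field names are invented): the idea is that each SCSI command handed to userspace through the tcmu ring would carry an identifier for the I_T nexus it arrived on, so tcmu-runner could track lock and PGR state per initiator port.

```c
/* Hypothetical sketch only -- not the real target_core_user ABI.
 * Each command entry passed from the kernel to tcmu-runner would be
 * extended with information identifying the I_T nexus (the pairing of
 * initiator port and target port) the command came in on. */
struct tcmu_cmd_entry_ext {
        /* ... existing command entry fields (iovecs, cdb, ...) ... */
        __u64 it_nexus_id;          /* kernel-assigned nexus handle */
        char  initiator_name[224];  /* e.g. the initiator's iqn. name */
};
```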
Hello @mikechristie - does that issue still exist? If yes, are there other workarounds than the one-VM-per-LUN approach?
What is the current status here, @lxbsz? I tagged you because I assume Mike is not working on this anymore, and I would like somebody to shed light on this. This is a pretty big showstopper for us.
I am mainly focused on and occupied by CephFS stuff, and have not had a chance to work on this yet.
I'm currently working on a Ceph iSCSI PoC. The goal is to have a multi-RBD shared datastore across 3 ESXi hosts. Currently only one host at a time can actually see and write to the datastore. Is this something that can work?
This is a placeholder for a known issue with VMware HA and ceph/rbd iscsi HA support.
It is not an issue when using single host VMware setups with ceph/rbd iscsi HA support, and it is not an issue when using VMware HA setups with ceph/rbd iSCSI non-HA setups.
The problem is that in ceph/rbd iscsi HA mode, rbd requires the rbd exclusive lock when executing IO. If a LUN contains multiple VMs with different active hosts, and one host cannot access the active-optimized iscsi gateway, then that host will fail over to one of the non-primary gateways while the other hosts continue to use the primary gateway. The rbd lock will then bounce between both gateways and cause performance problems and possibly crashes/hangs in the iscsi gateways.
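To sketch the mechanics (illustrative only, written against the public librbd C API rather than tcmu-runner's actual handler code): each gateway must own the exclusive lock before servicing IO, breaking the other gateway's ownership if needed, so two gateways receiving IO for the same image keep stealing the lock from each other.

```c
#include <rbd/librbd.h>

/* Illustrative sketch of why IO arriving at two gateways makes the
 * exclusive lock ping-pong. This mirrors the idea, not the real code. */
static int service_io(rbd_image_t image)
{
        /* If this gateway does not currently own the exclusive lock,
         * librbd takes it over (breaking the current owner's lock)
         * before the IO can proceed. */
        int r = rbd_lock_acquire(image, RBD_LOCK_MODE_EXCLUSIVE);
        if (r < 0)
                return r;

        /* ... submit the read/write ... */

        /* The lock stays with this gateway until the other gateway,
         * also receiving IO for the same image, steals it back. With
         * hosts split across gateways this repeats over and over. */
        return 0;
}
```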
A kernel and tcmu-runner change that allows tcmu-runner to share an rbd image's lock between multiple iscsi gateways is being designed.
A temporary workaround for smaller setups is to use one VM per LUN. There is no workaround for larger setups.
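For the workaround, a hedged gwcli example following the usual ceph-iscsi flow (the pool, image name, size, and target IQN below are placeholders):

```
# Create one RBD image per VM and export each as its own LUN.
$ gwcli
/> cd /disks
/disks> create pool=rbd image=vm1-disk size=100G
/disks> cd /iscsi-targets/iqn.2003-01.com.redhat.iscsi-gw:ceph-gw/disks
> add rbd/vm1-disk
```

With one image per VM, each rbd exclusive lock follows a single VM's active host, so a failover of one VM no longer fights with other VMs over the same lock.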