VNX: alu_hlu cache having entries for deleted luns #305

Closed
Murray-LIANG opened this issue Mar 26, 2020 · 1 comment

The OpenStack VNX tempest CI kept failing, though the issue was not always reproducible.
When the CI failed, the following log message was printed repeatedly:

Searching for a device in session 2 and hctl ('4', '0', '0', 101) yield: None

The Cinder/Nova host could not find the device because it had never actually been attached:

Mar 23 02:13:32 vnxvm1 cinder-volume[27333]: INFO storops.vnx.resource.sg [None req-9f5f1115-5f0c-4f69-afab-f03c1e76a76c tempest-Volume
RetypeWithMigrationTest-934831485 None] ALU(alu=29) is already attached, found in the cache(hlu=101).

An ALU that is found in the cache will not be attached again on VNX. So the problem is that the alu-hlu cache is dirty.

When two or more pools are configured for OpenStack VNX Cinder, more than one backend runs, one per pool. Each running backend has its own sg (storage group) cache, and inside the sg cache there is an alu-hlu mapping cache.

This issue occurred in the following scenario (a minimal sketch follows the list):

  1. Backend-A's sg cache had an entry for LUN-A's alu.
  2. Backend-B's sg cache had an entry for LUN-A's alu too.
  3. Backend-B detached and deleted LUN-A.
  4. Backend-A created a new LUN-B, which was assigned the same alu that LUN-A had.
  5. Backend-A tried to attach LUN-B but found its alu already in the cache, so nothing was done on VNX. The cache was dirty.
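
A minimal sketch, in plain Python rather than storops code, of why two per-process caches drift apart; all class and method names below are made up for illustration:

```python
# A minimal sketch (not storops code) of why per-process alu-hlu caches go stale.
# The class and method names below are hypothetical, for illustration only.

class SgCache:
    """In-process cache of a storage group's alu-hlu mapping."""

    def __init__(self):
        self.alu_hlu_map = {}  # alu -> hlu, private to this backend process

    def add(self, alu, hlu):
        self.alu_hlu_map[alu] = hlu

    def delete(self, alu):
        self.alu_hlu_map.pop(alu, None)

    def is_attached(self, alu):
        return alu in self.alu_hlu_map


backend_a, backend_b = SgCache(), SgCache()

# Steps 1-2: both backends learn about LUN-A's alu (e.g. alu=29).
backend_a.add(29, 101)
backend_b.add(29, 101)

# Step 3: Backend-B detaches and deletes LUN-A -- only its own cache is updated.
backend_b.delete(29)

# Step 4: a new LUN-B is created and VNX reuses alu=29.
# Step 5: Backend-A still believes alu=29 is attached, so it skips the attach on VNX.
print(backend_a.is_attached(29))  # True -> "already attached, found in the cache"
```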
Murray-LIANG self-assigned this Mar 26, 2020
Murray-LIANG added a commit that referenced this issue Mar 26, 2020
This commit first reverts "Fix alu/hlu cache issue (#181)",
whose commit id is e82db79.

The root cause of #181 was that `sg.update()` would flush the alu-hlu
mapping cache with data that did not yet include a LUN/alu that had
already been attached successfully on VNX. VNX was slow, so the
information returned by `sg.update()` was not the latest.

To fix the issue of #181, a retry is added to `detach_alu` instead: when
the alu-hlu mapping is not found in the cache, it updates the sg's
alu-hlu map and tries to get the hlu again.
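
Roughly, the retry described above could look like the sketch below; the helper names `get_hlu`, `update`, and `detach_hlu` are assumptions for illustration, not the actual storops API:

```python
# A rough sketch of the retry idea described above; not the actual storops patch.
# `get_hlu`, `update`, and `detach_hlu` are hypothetical helpers on the sg object.

def detach_alu(sg, alu, max_refreshes=2):
    """Detach an alu, refreshing the cached alu-hlu map if the entry is missing."""
    hlu = sg.get_hlu(alu)  # look up the in-memory alu-hlu cache
    refreshes = 0
    while hlu is None and refreshes < max_refreshes:
        # Cache miss: the attach may have succeeded on VNX while sg.update()
        # returned stale data. Refresh the map from the array and retry.
        sg.update()
        hlu = sg.get_hlu(alu)
        refreshes += 1
    if hlu is None:
        raise ValueError('alu %s is not attached to the storage group' % alu)
    sg.detach_hlu(hlu)  # actually remove the alu/hlu mapping from the sg
```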
Murray-LIANG (Contributor, Author) commented:

+------------------+      +-----------------+
|Cinder Backend-A  |      |Cinder Backend-B |
|                  |      |                 |
|    sg cache      |      |    sg cache     |
|                  |      |                 |
+--------+---------+      +-------+---------+
         |                        |
         v                        |
+--------+---------+              |
| sg._add_alu      |              |
| (alu=21, hlu=xxx)|              |
|                  |              |
+--------+---------+              |
         |                        v
         |                +-------+---------+
         |                | sg.update()     |
         |                | {21: xxx} is in |
         |                | sg.alu_hlu_map  |
         v                +-------+---------+
+--------+---------+              |
| sg._delete_hlu   |              |
| (alu=21)         |              |
| lun(21).delete() |              v
+------------------+      +-------+---------+
                          | lun created with|
                          | same alu=21     |
                          +-------+---------+
                                  |
                                  |
                                  |
                                  v
                          +-------+---------------+
                          | sg.attach_alu(alu=21) |
                          |                       |
                          | !!!cache dirty!!!     |
                          | Nothing is done on VNX|
                          | because alu=21 is     |
                          | found in the cache    |
                          |                       |
                          +-----------------------+

Murray-LIANG added a commit that referenced this issue Jun 8, 2020
Use `persist-queue.PDict` to share the cache of storage group between
different processes running on the same machine, i.e. different OpenStack
Cinder backends.

`PDict` leverages `pickle` to serialize and deserialize Python objects.
Objects are modified to be picklable in this commit.
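
For reference, a minimal sketch of how `persist-queue`'s `PDict` can back such a shared cache; the on-disk path and the keying by alu are assumptions for illustration, not the actual storops change:

```python
# A minimal sketch of sharing the alu-hlu cache across Cinder backend processes
# with persist-queue's PDict (a dict persisted to SQLite on local disk).
# The cache path and the key layout here are assumptions for illustration.

from persistqueue import PDict

# Every backend process on the same host opens the same on-disk dict,
# so updates made by one process are visible to the others.
shared_cache = PDict('/var/lib/storops/sg_cache', 'sg_alu_hlu_map')

# Backend-B attaches LUN-A and records the mapping, then detaches and
# removes the entry -- the removal is visible to all other processes.
shared_cache[29] = 101   # attach: record alu 29 -> hlu 101
del shared_cache[29]     # detach: drop the entry for everyone

# Backend-A later checks the same shared cache before attaching a reused alu.
try:
    hlu = shared_cache[29]
    print('already attached, hlu=%s' % hlu)
except KeyError:
    print('not in cache, attach on VNX')
```

Because `PDict` pickles the stored values, everything put into the cache has to be picklable, which is why this commit also makes the cached objects picklable.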
Murray-LIANG added a commit that referenced this issue Jun 9, 2020
Use `persist-queue.PDict` to share the cache of storage group between
different processes running on the same machine, i.e. different OpenStack
Cinder backends.

`PDict` leverages `pickle` to serialize and deserialize Python objects.
Objects are modified to be picklable in this commit.
Murray-LIANG added a commit that referenced this issue Jul 28, 2020
Use `persist-queue.PDict` to share the cache of storage group between
different processes running on the same machine, i.e. different OpenStack
Cinder backends.

`PDict` leverages `pickle` to serialize and deserialize Python objects.
Objects are modified to be picklable in this commit.
Murray-LIANG added a commit that referenced this issue Jul 28, 2020
- [#305] VNX: add shared cache for sg (#315)
- Unity: Add new item NVMe_Extreme_PerformanceT for TierTypeEnum (#317)
- Add UnityQoSMaxKBPSOutOfRangeError exception (#316)