VNX: alu_hlu cache having entries for deleted luns #305

Closed
Murray-LIANG opened this issue Mar 26, 2020 · 1 comment

The OpenStack VNX tempest CI kept failing, though the issue was not always reproducible.
When the CI failed, the following log message was printed repeatedly:

Searching for a device in session 2 and hctl ('4', '0', '0', 101) yield: None

The Cinder/Nova host could not find the device because it had never actually been attached:

Mar 23 02:13:32 vnxvm1 cinder-volume[27333]: INFO storops.vnx.resource.sg [None req-9f5f1115-5f0c-4f69-afab-f03c1e76a76c tempest-Volume
RetypeWithMigrationTest-934831485 None] ALU(alu=29) is already attached, found in the cache(hlu=101).

An ALU that is found in the cache will not be attached again on VNX. So the problem is that the alu-hlu cache is dirty.

When two or more pools are configured for OpenStack VNX Cinder, more than one backend runs, one per pool. Each running backend has its own sg (storage group) cache, and inside the sg cache there is an alu-hlu mapping cache.

This issue occurred in the following scenario (a minimal sketch follows the list):

  1. Backend-A's sg cache had an entry for LUN-A's alu.
  2. Backend-B's sg cache had an entry for LUN-A's alu too.
  3. Backend-B detached and deleted LUN-A.
  4. Backend-A created a new LUN-B, which was assigned the same alu that LUN-A had.
  5. Backend-A tried to attach LUN-B but found its alu already in the cache, so nothing was done on VNX. The cache was dirty.
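
A minimal sketch, in plain Python rather than storops code, of why two per-process caches drift apart; all class and method names below are made up for illustration:

```python
# A minimal sketch (not storops code) of why per-process alu-hlu caches go stale.
# The class and method names below are hypothetical, for illustration only.

class SgCache:
    """In-process cache of a storage group's alu-hlu mapping."""

    def __init__(self):
        self.alu_hlu_map = {}  # alu -> hlu, private to this backend process

    def add(self, alu, hlu):
        self.alu_hlu_map[alu] = hlu

    def delete(self, alu):
        self.alu_hlu_map.pop(alu, None)

    def is_attached(self, alu):
        return alu in self.alu_hlu_map


backend_a, backend_b = SgCache(), SgCache()

# Steps 1-2: both backends learn about LUN-A's alu (e.g. alu=29).
backend_a.add(29, 101)
backend_b.add(29, 101)

# Step 3: Backend-B detaches and deletes LUN-A -- only its own cache is updated.
backend_b.delete(29)

# Step 4: a new LUN-B is created and VNX reuses alu=29.
# Step 5: Backend-A still believes alu=29 is attached, so it skips the attach on VNX.
print(backend_a.is_attached(29))  # True -> "already attached, found in the cache"
```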
Murray-LIANG self-assigned this Mar 26, 2020
Murray-LIANG added a commit that referenced this issue Mar 26, 2020
This commit first reverts "Fix alu/hlu cache issue (#181)",
whose commit id is e82db79.

The root cause of #181 was that `sg.update()` would flush the alu-hlu
mapping cache with data that did not yet include a LUN/alu that had
already been attached successfully on VNX. VNX was slow, so the
information returned by `sg.update()` was not the latest.

To fix the issue of #181, a retry is added to `detach_alu` instead: when
the alu-hlu mapping is not found in the cache, it updates the sg's
alu-hlu map and tries to get the hlu again.
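
Roughly, the retry described above could look like the sketch below; the helper names `get_hlu`, `update`, and `detach_hlu` are assumptions for illustration, not the actual storops API:

```python
# A rough sketch of the retry idea described above; not the actual storops patch.
# `get_hlu`, `update`, and `detach_hlu` are hypothetical helpers on the sg object.

def detach_alu(sg, alu, max_refreshes=2):
    """Detach an alu, refreshing the cached alu-hlu map if the entry is missing."""
    hlu = sg.get_hlu(alu)  # look up the in-memory alu-hlu cache
    refreshes = 0
    while hlu is None and refreshes < max_refreshes:
        # Cache miss: the attach may have succeeded on VNX while sg.update()
        # returned stale data. Refresh the map from the array and retry.
        sg.update()
        hlu = sg.get_hlu(alu)
        refreshes += 1
    if hlu is None:
        raise ValueError('alu %s is not attached to the storage group' % alu)
    sg.detach_hlu(hlu)  # actually remove the alu/hlu mapping from the sg
```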
Murray-LIANG (Contributor, Author) commented:

+------------------+      +-----------------+
|Cinder Backend-A  |      |Cinder Backend-B |
|                  |      |                 |
|    sg cache      |      |    sg cache     |
|                  |      |                 |
+--------+---------+      +-------+---------+
         |                        |
         v                        |
+--------+---------+              |
| sg._add_alu      |              |
| (alu=21, hlu=xxx)|              |
|                  |              |
+--------+---------+              |
         |                        v
         |                +-------+---------+
         |                | sg.update()     |
         |                | {21: xxx} is in |
         |                | sg.alu_hlu_map  |
         v                +-------+---------+
+--------+---------+              |
| sg._delete_hlu   |              |
| (alu=21)         |              |
| lun(21).delete() |              v
+------------------+      +-------+---------+
                          | lun created with|
                          | same alu=21     |
                          +-------+---------+
                                  |
                                  |
                                  |
                                  v
                          +-------+---------------+
                          | sg.attach_alu(alu=21) |
                          |                       |
                          | !!!cache dirty!!!     |
                          | Nothing is done on VNX|
                          | because alu=21 is     |
                          | found in the cache    |
                          |                       |
                          +-----------------------+

Murray-LIANG added a commit that referenced this issue Jun 8, 2020
Use `persist-queue.PDict` to share the cache of storage group between
different processes running on the same machine, i.e. different OpenStack
Cinder backends.

`PDict` leverages `pickle` to serialize and deserialize Python objects.
Objects are modified to be picklable in this commit.
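
For reference, a minimal sketch of how `persist-queue`'s `PDict` can back such a shared cache; the on-disk path and the keying by alu are assumptions for illustration, not the actual storops change:

```python
# A minimal sketch of sharing the alu-hlu cache across Cinder backend processes
# with persist-queue's PDict (a dict persisted to SQLite on local disk).
# The cache path and the key layout here are assumptions for illustration.

from persistqueue import PDict

# Every backend process on the same host opens the same on-disk dict,
# so updates made by one process are visible to the others.
shared_cache = PDict('/var/lib/storops/sg_cache', 'sg_alu_hlu_map')

# Backend-B attaches LUN-A and records the mapping, then detaches and
# removes the entry -- the removal is visible to all other processes.
shared_cache[29] = 101   # attach: record alu 29 -> hlu 101
del shared_cache[29]     # detach: drop the entry for everyone

# Backend-A later checks the same shared cache before attaching a reused alu.
try:
    hlu = shared_cache[29]
    print('already attached, hlu=%s' % hlu)
except KeyError:
    print('not in cache, attach on VNX')
```

Because `PDict` pickles the stored values, everything put into the cache has to be picklable, which is why this commit also makes the cached objects picklable.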
Murray-LIANG added a commit that referenced this issue Jun 9, 2020
Use `persist-queue.PDict` to share the cache of storage group between
different processes running on the same machine, i.e. different OpenStack
Cinder backends.

`PDict` leverages `pickle` to serialize and deserialize Python objects.
Objects are modified to be picklable in this commit.
Murray-LIANG added a commit that referenced this issue Jul 28, 2020
Use `persist-queue.PDict` to share the cache of storage group between
different processes running on the same machine, i.e. different OpenStack
Cinder backends.

`PDict` leverages `pickle` to serialize and deserialize Python objects.
Objects are modified to be picklable in this commit.
Murray-LIANG added a commit that referenced this issue Jul 28, 2020
- [#305] VNX: add shared cache for sg (#315)
- Unity: Add new item NVMe_Extreme_PerformanceT for TierTypeEnum (#317)
- Add UnityQoSMaxKBPSOutOfRangeError exception (#316)