Fixing gang ABD child removal race condition #10511
Codecov Report
@@ Coverage Diff @@
##            master    #10511     +/-   ##
==========================================
+ Coverage  79.51%    80.13%    +0.62%
==========================================
  Files        393       292      -101
  Lines     124638     84430    -40208
==========================================
- Hits       99106     67661    -31445
+ Misses     25532     16769     -8763
module/zfs/abd.c
	 * adding it to the other gang ABD.
	 */
	mutex_enter(&cabd->abd_mtx);
	ASSERT3B(list_link_active(&cabd->abd_gang_link), ==, B_TRUE);
Suggested change:
-	ASSERT3B(list_link_active(&cabd->abd_gang_link), ==, B_TRUE);
+	ASSERT(list_link_active(&cabd->abd_gang_link));
I updated this.
module/zfs/abd.c
@@ -373,6 +384,7 @@ void
abd_gang_add(abd_t *pabd, abd_t *cabd, boolean_t free_on_free)
{
	ASSERT(abd_is_gang(pabd));
	ASSERT3B(abd_is_gang(cabd), ==, B_FALSE);
Suggested change:
-	ASSERT3B(abd_is_gang(cabd), ==, B_FALSE);
+	ASSERT(!abd_is_gang(cabd));
I updated this.
include/os/linux/spl/sys/list.h
static inline int
list_link_not_active(list_node_t *node)
{
	return ((node->next == LIST_POISON1) && (node->prev == LIST_POISON2));
}
It's a little confusing that this is different from !list_link_active(). I am guessing this is to catch cases where only one of the pointers is POISON? And I'm also guessing that should never happen (only one of the pointers being POISON).
Instead of adding this function, could we make list_link_active() assert that either both or neither pointer is POISON? e.g.
EQUIV(node->next == LIST_POISON1, node->prev == LIST_POISON2);
return (node->next != LIST_POISON1);
I added this to avoid the short circuit in the && expression in list_link_active(). Because of the race condition, we were observing that next would be pointing at a valid ABD while prev was pointing at LIST_POISON2. I am fine with just updating list_link_active() to use the EQUIV statement and return.
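The suggestion above can be sketched in isolation as follows. This is only an illustration, not the code that was merged: the list_node_t layout, the poison values, and the assert-based EQUIV macro are stand-ins chosen so the snippet compiles in user space, and the real kernel and ZFS headers define them differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel list node; assumed layout for this sketch. */
typedef struct list_node {
	struct list_node *next;
	struct list_node *prev;
} list_node_t;

/* Illustrative poison values only; the kernel's actual values may differ. */
#define	LIST_POISON1	((list_node_t *)(uintptr_t)0x100)
#define	LIST_POISON2	((list_node_t *)(uintptr_t)0x122)

/* Stand-in EQUIV: assert that a and b are both true or both false. */
#define	EQUIV(a, b)	assert(!!(a) == !!(b))

/* Mark a node as not being on any list. */
static inline void
list_link_init(list_node_t *node)
{
	node->next = LIST_POISON1;
	node->prev = LIST_POISON2;
}

/*
 * A node is active when it is linked into a list. The EQUIV check trips
 * if only one of the two pointers is poisoned, which is the half-updated
 * state described in this thread.
 */
static inline bool
list_link_active(const list_node_t *node)
{
	EQUIV(node->next == LIST_POISON1, node->prev == LIST_POISON2);
	return (node->next != LIST_POISON1);
}
```

With this shape, a separate list_link_not_active() is unnecessary: !list_link_active() gives the same answer, and an inconsistent node fails the assertion instead of being silently reported as active or inactive.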
I may be wrong, but I believe there is still a race condition with this code. I came up with the hypothetical scenario below, trying to highlight every step so others can point to them if I am wrong.
Setup: We've got an N-way mirror with devices A, B, C, D, ... N. We issue a write to the mirror which splits into N ZIOs that end up in the I/O aggregate code in question.
Step 1: Thread for device A I/O creates its ABD with its gang chain, then grabs the lock for the child ABD, adds it to the chain, and drops it - this is according to the code in
Step 2: Same thread (device A) eventually is done and gets to
Step 3: Thread for device B I/O suddenly wakes up and calls
Potential Race: Now this is the part where I may be wrong but if the device B thread was had
Again, I may be wrong but my point being that device B thread may be forcing device A thread to call
Does this seem possible?
So while your case could potentially happen if the user was not using the free_on_free flag correctly, that is not the case in the aggregate function. It is important to note that only newly allocated ABDs pass B_TRUE here. Any ABD that can potentially be part of multiple gang ABDs always passes B_FALSE. We are really putting the onus on the coder to pass the right free_on_free flag. We take care to always pass B_FALSE whenever the same ABD may be part of two gang ABDs.
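The ownership rule described in this reply can be modeled with a small user-space sketch. Everything here is hypothetical (buf_t, gang_t, gang_add(), gang_free(), and storing the flag per slot in the gang are assumptions made for illustration) and it is not the ABD implementation; it only shows why a buffer shared between two gangs must be added with the flag set to false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define	MAX_CHILDREN	8

/* Hypothetical buffer; the counter records how often it was released. */
typedef struct buf {
	int	*freed_counter;
} buf_t;

/* Hypothetical gang: a set of children plus a per-child ownership flag. */
typedef struct gang {
	buf_t	*child[MAX_CHILDREN];
	bool	free_on_free[MAX_CHILDREN];
	int	nchildren;
} gang_t;

static buf_t *
buf_alloc(int *freed_counter)
{
	buf_t *b = malloc(sizeof (*b));
	assert(b != NULL);
	b->freed_counter = freed_counter;
	return (b);
}

static void
buf_free(buf_t *b)
{
	(*b->freed_counter)++;
	free(b);
}

/* Add a child; free_on_free says whether the gang owns the child. */
static void
gang_add(gang_t *g, buf_t *b, bool free_on_free)
{
	assert(g->nchildren < MAX_CHILDREN);
	g->child[g->nchildren] = b;
	g->free_on_free[g->nchildren] = free_on_free;
	g->nchildren++;
}

/* Release the gang: only children it owns are freed along with it. */
static void
gang_free(gang_t *g)
{
	for (int i = 0; i < g->nchildren; i++) {
		if (g->free_on_free[i])
			buf_free(g->child[i]);
	}
	g->nchildren = 0;
}
```

In this model a freshly allocated buffer handed to exactly one gang can be added with true, while a buffer that two gangs reference is added with false to both and released once by whoever allocated it.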
On Linux the list debug code has been setting off a failure when checking that the node->next->prev value is pointing back at the node. At times this check evaluates to 0xdead. When removing a child from a gang ABD we must acquire the child's abd_mtx to make sure that the same ABD is not being added to another gang ABD while it is being removed from a gang ABD. This fixes a race condition when checking if an ABD's link is already active and part of another gang ABD before adding it to a gang.
Added additional debug code for the gang ABD in abd_verify() to make sure each child ABD has active links. Also check to make sure another gang ABD is not added to a gang ABD.
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
On Linux the list debug code has been setting off a failure when checking that the node->next->prev value is pointing back at the node. At times this check evaluates to 0xdead. When removing a child from a gang ABD we must acquire the child's abd_mtx to make sure that the same ABD is not being added to another gang ABD while it is being removed from a gang ABD. This fixes a race condition when checking if an ABD's link is already active and part of another gang ABD before adding it to a gang.
Added additional debug code for the gang ABD in abd_verify() to make sure each child ABD has active links. Also check to make sure another gang ABD is not added to a gang ABD.
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes openzfs#10511
Currently, when an ABD struct that was part of a gang ABD is removed with list_remove_head() in abd_free_gang_abd(), we get a failure in the Linux kernel list debug code where next->prev is pointing at LIST_POISON2 instead of the node address. There was a race condition when removing a child from a gang ABD: the child's abd_mtx needs to be acquired to make sure the child is not being removed from one gang ABD and added to another at the same time.
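The locking rule in the paragraph above can be sketched with a user-space model. This is a simplified illustration under stated assumptions, not the ZFS code: child_t, child_try_link(), and child_unlink() are hypothetical names, a pthread mutex stands in for abd_mtx, and a boolean stands in for the state of the list link.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical child buffer: a mutex plus "am I linked into a gang?". */
typedef struct child {
	pthread_mutex_t	mtx;	/* stands in for abd_mtx */
	bool		linked;	/* stands in for the list link state */
} child_t;

static void
child_init(child_t *c)
{
	pthread_mutex_init(&c->mtx, NULL);
	c->linked = false;
}

/*
 * Add path: check the link and claim it under the child's mutex.
 * Returns false when the child already belongs to another gang; what
 * the caller does in that case is outside this sketch.
 */
static bool
child_try_link(child_t *c)
{
	bool did_link = false;

	pthread_mutex_lock(&c->mtx);
	if (!c->linked) {
		c->linked = true;
		did_link = true;
	}
	pthread_mutex_unlock(&c->mtx);
	return (did_link);
}

/*
 * Remove path: take the same mutex, so a concurrent add can never see
 * a half-updated link while the child is being unlinked.
 */
static void
child_unlink(child_t *c)
{
	pthread_mutex_lock(&c->mtx);
	assert(c->linked);
	c->linked = false;
	pthread_mutex_unlock(&c->mtx);
}
```

The point of the model is that the check in the add path and the update in the remove path are serialized by the same per-child lock; taking the lock only on the add side would leave the window that the list debug code was catching.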
Motivation and Context
Fixes race condition in gang ABD between abd_gang_add() and abd_free_gang_abd().
#10401
Description
I have added an additional function, list_link_not_active(), to avoid a short circuit in the && statement so both next and prev are verified to point at the LIST_POISON1/LIST_POISON2 values. In abd_verify() I have added an additional ASSERT verifying that each child in a gang ABD has an active link.
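The kind of check added to abd_verify() can be illustrated with a self-contained sketch. The names here (node_t, gang_verify(), and so on) are hypothetical, and the sketch uses NULL pointers rather than poison values to mark an unlinked node; it only shows the shape of a walk that asserts every child is linked, is not itself a gang, and has consistent neighbor pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node in a circular doubly linked list with a sentinel head. */
typedef struct node {
	struct node	*next;
	struct node	*prev;
	bool		is_gang;
} node_t;

static void
node_init(node_t *n, bool is_gang)
{
	n->next = n->prev = NULL;	/* NULL marks "not on any list" here */
	n->is_gang = is_gang;
}

static bool
node_linked(const node_t *n)
{
	return (n->next != NULL && n->prev != NULL);
}

/* The head plays the role of the gang itself. */
static void
list_init(node_t *head)
{
	head->next = head->prev = head;
	head->is_gang = true;
}

static void
list_append(node_t *head, node_t *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Walk the children and assert the invariants; returns the child count. */
static int
gang_verify(const node_t *head)
{
	int count = 0;

	for (const node_t *n = head->next; n != head; n = n->next) {
		assert(node_linked(n));		/* child has an active link */
		assert(!n->is_gang);		/* no gang nested in a gang */
		assert(n->next->prev == n);	/* neighbor points back */
		count++;
	}
	return (count);
}
```

The third assertion mirrors the node->next->prev check that the Linux list debug code performs and that was failing before this fix.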
How Has This Been Tested?
I have run zloop.sh on CentOS 8.
CentOS 8: Kernel 4.18.0-193