
MMP delay logic changes #7068

Closed
jhammond-intel wants to merge 2 commits

Conversation

@jhammond-intel (Contributor) commented Jan 19, 2018

Description

RFC: MMP delay logic changes from @adilger, based on the discussion in #7045 and #7048. (I have not tested these yet, but I plan to.)

Motivation and Context

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

Andreas Dilger added 2 commits January 19, 2018 15:46
In mmp_thread(), if the mmp_delay is increasing beyond the fixed
mmp_interval, use the longer mmp_delay value when determining if
the pool should be suspended.  Otherwise, if the system is very
busy (but still getting MMP writes to disk) it may incorrectly
suspend the pool, even though a peer system would not import it.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
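
A minimal sketch of the check this commit describes, assuming names that only approximate those in module/zfs/mmp.c (mmp->mmp_delay, mmp->mmp_last_write); illustrative only, not the literal patch:

    /*
     * Use the observed inter-write delay (mmp_delay) rather than the
     * fixed interval when it is longer, so a busy-but-alive writer
     * does not trigger a spurious suspend.
     */
    hrtime_t interval = MSEC2NSEC(zfs_multihost_interval);
    hrtime_t effective = MAX(interval, mmp->mmp_delay);

    if (zfs_multihost_fail_intervals > 0 &&
        gethrtime() - mmp->mmp_last_write >
        zfs_multihost_fail_intervals * effective)
        zio_suspend(spa, NULL); /* too long without a completed MMP write */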
Base zfs_multihost_fail_intervals on zfs_multihost_import_intervals
by default, so that the number of failed writes for suspending a
pool is not larger than the number of intervals a peer will use when
checking if it is safe to import the pool.

Otherwise, with poorly chosen module parameters (fail_intervals >
import_intervals), the current importer could write MMP updates
irregularly yet always succeed within mmp_fail_intervals (and so
never suspend the pool), while a peer system imports the pool after
waiting an insufficient number of update intervals.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
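
A sketch of the defaulting rule this commit describes, under the assumption that a value of 0 (suspension disabled) would be left untouched; variable handling is illustrative, not the literal patch:

    /*
     * Cap the effective fail intervals at the number of intervals an
     * importing peer will wait, so the writer cannot be outrun.
     */
    uint_t fail_intervals = zfs_multihost_fail_intervals;

    if (fail_intervals > zfs_multihost_import_intervals)
        fail_intervals = zfs_multihost_import_intervals;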
@behlendorf requested a review from @ofaaland on January 19, 2018 22:09
@adilger (Contributor) commented Jan 19, 2018

I would consider these to be more of a "bug fix" than "new feature", but either way I thought it would be useful to get some feedback sooner rather than later...

This still needs updates to the man/man5/zfs-module-parameters.5 descriptions (I'm already working on that).

I was thinking about the issue of zfs_multihost_import_intervals vs. zfs_multihost_fail_intervals inconsistency between nodes. Is there a reason these values are module parameters rather than pool parameters? It would make sense to have them as pool parameters so that they are consistent between any nodes that are trying to import the pool. There may be a bit of a chicken-and-egg problem, but it should still be possible to read these parameters from the dataset without actually doing an import. That is fodder for a separate patch, however.

module/zfs/mmp.c (Outdated)

    @@ -392,6 +391,30 @@ mmp_thread(void *arg)
            else
                    next_time = start + MSEC2NSEC(MMP_DEFAULT_INTERVAL);

            /*
             * Don't allow fail_intervals larger than import_intervals,

Would this be better implemented in a callback, so you can reject bad values at the time the user tries to set them? One such example is:

  • 0582e40 Add callback for zfs_multihost_interval
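
For illustration, a hedged sketch of what such a setter callback might look like in the kernel module; the handler name and the exact rejection policy here are assumptions, not the code from 0582e40:

    #include <linux/kernel.h>
    #include <linux/moduleparam.h>

    /*
     * Hypothetical setter: reject out-of-range values when the user
     * writes the parameter, instead of clamping later in mmp_thread().
     */
    static int
    param_set_multihost_fail_intervals(const char *val,
        const struct kernel_param *kp)
    {
        unsigned long v;
        int error = kstrtoul(val, 0, &v);

        if (error)
            return (error);
        if (v > zfs_multihost_import_intervals)
            return (-EINVAL);

        return (param_set_uint(val, kp));
    }

    module_param_call(zfs_multihost_fail_intervals,
        param_set_multihost_fail_intervals, param_get_uint,
        &zfs_multihost_fail_intervals, 0644);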

@ofaaland (Contributor) commented Jan 19, 2018

> Is there a reason these values are module parameters rather than pool parameters? It would make sense to have them as pool parameters so that they are consistent between any nodes that are trying to import the pool. There may be a bit of a chicken-and-egg problem, but it should still be possible to read these parameters from the dataset without actually doing an import.

The two reasons I remember both arise from the fact that pool properties are stored in an object in the pool.

  1. We wanted to make the go/no-go decision very early in the import process, to minimize the chance that some later patch introduces dangerous code before the MMP check is performed.

  2. Getting far enough into import to read an object from the pool sometimes results in the node crashing when the pool is changing underneath it. Several areas of the ZFS kernel module code assume the pool does not change during the import process. We found that assumption was made about vdev count, log device count, and IIRC device size. There may well be more; these are just the ones that came up early.

I agree, it would be great not to depend on uniform settings among the nodes.

@adilger (Contributor) commented Jan 19, 2018

What about storing the zfs_multihost_import_interval parameter in the per-VDEV nvlist? This is relatively easily parsed (I did that in the util-linux-ng version of libblkid) and could be accessed very early in the import process without any risk of state changing.
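
A rough sketch of what reading such a value could look like; the nvlist key name is hypothetical, while nvlist_lookup_uint64() is the real libnvpair call:

    #include <libnvpair.h>

    /*
     * Return the import-interval count stored in a vdev label nvlist,
     * or 0 if the (hypothetical) key is absent.
     */
    static uint64_t
    get_import_intervals(nvlist_t *label)
    {
        uint64_t intervals = 0;

        (void) nvlist_lookup_uint64(label, "mmp_import_intervals",
            &intervals);
        return (intervals);
    }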

@ofaaland (Contributor) commented

> What about storing the zfs_multihost_import_interval parameter in the per-VDEV nvlist?

One might end up with different values from different VDEVs, for example if some devices get disconnected from host A but remain connected to host B. But if we take the MAX() of all values found we would still be guaranteed to have a safe value.

It wouldn't be a pool property in the strict sense, though the user would think of it that way. That sort of breaks the model, so I'm not sure how best to cope with it.

I agree that this storage location for zfs_multihost_import_interval has advantages.
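
Continuing the sketch above, the MAX() rule could look like this, assuming labels[] and nlabels come from scanning each device's label:

    /*
     * The largest value seen across labels is the most conservative,
     * and therefore safe, number of intervals to wait before import.
     */
    uint64_t safe_intervals = 0;

    for (int i = 0; i < nlabels; i++) {
        uint64_t v = get_import_intervals(labels[i]);
        if (v > safe_intervals)
            safe_intervals = v;
    }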

@ofaaland (Contributor) commented

Looking into this, I see other issues in MMP that are not dangerous but make MMP more likely to suspend the pool than it should be, or more difficult to debug than it needs to be. I'm working through those and will be back with more information soon.

        "zfs_multihost_import_interval %u, "
        "using smaller value",
        zfs_multihost_fail_intervals,
        zfs_multihost_import_interval);


zfs_multihost_import_interval -> zfs_multihost_import_intervals ?

@behlendorf (Contributor) commented

I'm going to close out this PR for now. Several of the changes proposed here have already been committed as part of other patches. The remaining changes in behavior could benefit from additional discussion in a new PR.

@behlendorf closed this on Jun 4, 2018