-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check and repair index under the store metadata lock #27768
Conversation
Today when we get a metadata snapshot directly from a store directory, we acquire a metadata lock, then acquire an IW lock. However, we create a CheckIndex in IndexShard without acquiring the metadata lock first. This causes a recovery failed because the IW lock can be still held by `snapshotStoreMetadata`. This commit makes sure to create a CheckIndex under the metadata lock. Closes elastic#24481 Relates elastic#24787
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left one comment LGTM otherwise
if ("fix".equals(checkIndexOnStartup)) { | ||
if (logger.isDebugEnabled()) { | ||
logger.debug("fixing index, writing new segments file ..."); | ||
store.runUnderMetadataLock(() -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of adding this runUnderMetadataLock
method would it make sense to just move the checkindex method into store like this public CheckIndex.Status checkIndex()
then we can interpret the result here and run under the appropriate locks.
@s1monw I've moved |
@elasticmachine please retest this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left one comment. LGTM otherwise
*/ | ||
public CheckIndex.Status checkIndex(PrintStream out) throws IOException { | ||
// We don't need to lock the directory here as we are not changing the index files. | ||
final Lock noDirectoryLock = new Lock() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should still acquire the lock. I don't think we should execute this while the IW is open?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed f231042
please retest this. |
@s1monw, I've moved the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM is this also fixing #27731 ?
Thanks @s1monw for reviewing. |
Today when we get a metadata snapshot directly from a store directory, we acquire a metadata lock, then acquire an IndexWriter lock. However, we create a CheckIndex in IndexShard without acquiring the metadata lock first. This causes a recovery failed because the IndexWriter lock can be still held by method snapshotStoreMetadata. This commit makes sure to create a CheckIndex under the metadata lock. Closes #24481 Closes #27731 Relates #24787
* es/master: (45 commits) Adapt scroll rest test after backport. relates #27842 Move early termination based on index sort to TopDocs collector (#27666) Upgrade beats templates that we use for bwc testing. (#27929) ingest: upgraded ingest geoip's geoip2's dependencies. [TEST] logging for update by query test #27820 Add elasticsearch-nio jar for base nio classes (#27801) Use full profile on JDK 10 builds Require Gradle 4.3 Enable grok processor to support long, double and boolean (#27896) Add unreleased v6.1.2 version TEST: reduce blob size #testExecuteMultipartUpload Check index under the store metadata lock (#27768) Fixes DocStats to not report index size < -1 (#27863) Fixed test to be up to date with the new database files. Upgrade to Lucene 7.2.0. (#27910) Disable TestZenDiscovery in cloud providers integrations test Use `_refresh` to shrink the version map on inactivity (#27918) Make KeyedLock reentrant (#27920) ingest: Upgraded the geolite2 databases. [Test] Fix IndicesClientDocumentationIT (#27899) ...
* es/6.x: (43 commits) ingest: upgraded ingest geoip's geoip2's dependencies. [TEST] logging for update by query test #27820 Use full profile on JDK 10 builds Require Gradle 4.3 Add unreleased v6.1.2 version TEST: reduce blob size #testExecuteMultipartUpload Check index under the store metadata lock (#27768) Upgrade to Lucene 7.2.0. (#27910) Fixed test to be up to date with the new database files. Use `_refresh` to shrink the version map on inactivity (#27918) Make KeyedLock reentrant (#27920) Fixes DocStats to not report index size < -1 (#27863) Disable TestZenDiscovery in cloud providers integrations test ingest: Upgraded the geolite2 databases. [Issue-27716]: CONTRIBUTING.md IntelliJ configurations settings are confusing. (#27717) [Test] Fix IndicesClientDocumentationIT (#27899) Move uid lock into LiveVersionMap (#27905) Mute testRetentionPolicyChangeDuringRecovery Increase Gradle heap space to 1536m Move GlobalCheckpointTracker and remove SequenceNumbersService (#27837) ...
Today when we get a metadata snapshot directly from a store directory, we acquire a metadata lock, then acquire an IndexWriter lock. However, we create a CheckIndex in IndexShard without acquiring the metadata lock first. This causes a recovery failed because the IndexWriter lock can be still held by method
snapshotStoreMetadata
. This commit makes sure to create aCheckIndex
under the metadata lock.Closes #24481
Closes #27731
Relates #24787