Description
Today (5f6321a) when constructing an InternalEngine we perform some potentially expensive operations, including:
at org.elasticsearch.index.IndexWarmer.warm(IndexWarmer.java:81)
at org.elasticsearch.index.IndexService.lambda$createShard$4(IndexService.java:402)
at org.elasticsearch.index.engine.InternalEngine$SearchFactory.newSearcher(InternalEngine.java:2244)
at org.apache.lucene.search.SearcherManager.getSearcher(SearcherManager.java:198)
at org.elasticsearch.index.engine.InternalEngine$ExternalSearcherManager.<init>(InternalEngine.java:326)
at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:598)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:238)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:184)
and
at org.elasticsearch.index.engine.InternalEngine.restoreVersionMapAndCheckpointTracker(InternalEngine.java:2820)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:257)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:184)
We create the Engine under IndexShard#mutex:
elasticsearch/server/src/main/java/org/elasticsearch/index/shard/IndexShard.java
Lines 1448 to 1458 in 0cfc9ff
    synchronized (mutex) {
        verifyNotClosed();
        assert currentEngineReference.get() == null : "engine is running";
        // we must create a new engine under mutex (see IndexShard#snapshotStoreMetadata).
        final Engine newEngine = engineFactory.newReadWriteEngine(config);
        onNewEngine(newEngine);
        currentEngineReference.set(newEngine);
        // We set active because we are now writing operations to the engine; this way,
        // if we go idle after some time and become inactive, we still give sync'd flush a chance to run.
        active.set(true);
    }
This can block cluster state updates, because IndexShard#updateShardState requires the same mutex:
elasticsearch/server/src/main/java/org/elasticsearch/index/shard/IndexShard.java
Lines 431 to 438 in 0cfc9ff
    public void updateShardState(final ShardRouting newRouting,
                                 final long newPrimaryTerm,
                                 final BiConsumer<IndexShard, ActionListener<ResyncTask>> primaryReplicaSyncer,
                                 final long applyingClusterStateVersion,
                                 final Set<String> inSyncAllocationIds,
                                 final IndexShardRoutingTable routingTable) throws IOException {
        final ShardRouting currentRouting;
        synchronized (mutex) {
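
The contention described above can be reproduced in isolation. The following is a minimal sketch, not Elasticsearch code: `MutexContentionSketch`, `createEngineUnderMutex`, and `measureUpdateDelayMillis` are hypothetical names, and `Thread.sleep` stands in for the expensive work done in the `InternalEngine` constructor. It only illustrates that a second thread needing the same monitor waits for roughly as long as the slow construction takes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for IndexShard; none of these names exist in Elasticsearch.
public class MutexContentionSketch {
    private final Object mutex = new Object();
    private final CountDownLatch mutexHeld = new CountDownLatch(1);

    // Stand-in for engine creation: slow work performed while holding the mutex.
    void createEngineUnderMutex(long constructionMillis) throws InterruptedException {
        synchronized (mutex) {
            mutexHeld.countDown();            // signal that the mutex is now held
            Thread.sleep(constructionMillis); // simulated warming / version map restore
        }
    }

    // Stand-in for updateShardState: needs the same mutex, so it waits for the creator.
    long updateShardState() throws InterruptedException {
        mutexHeld.await();                    // only start timing once the creator owns the mutex
        long start = System.nanoTime();
        synchronized (mutex) {
            // the routing update would happen here
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Runs both paths concurrently and returns how long the update waited, in milliseconds.
    static long measureUpdateDelayMillis(long constructionMillis) throws InterruptedException {
        MutexContentionSketch shard = new MutexContentionSketch();
        Thread creator = new Thread(() -> {
            try {
                shard.createEngineUnderMutex(constructionMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        creator.start();
        long waited = shard.updateShardState();
        creator.join();
        return waited;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("update waited ~" + measureUpdateDelayMillis(500) + " ms");
    }
}
```

In this sketch the wait scales directly with the simulated construction time, which is the same shape of problem as an engine that takes minutes to start while a cluster state update is pending.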
We should survey the things we do during the startup of all of the engines and make sure that none of them will block for too long.
Relates to https://discuss.elastic.co/t/187604, in which the engine takes multiple minutes to start up because it is loading global ordinals. This blocks a cluster state update for long enough that the node is removed from the cluster.