Avoid expensive operations when constructing an Engine

Today (5f6321aacb95d2b842c74b6f7dc064c70a15e33c) when constructing an `InternalEngine` we perform some potentially expensive operations, including:

```
	at org.elasticsearch.index.IndexWarmer.warm(IndexWarmer.java:81)
	at org.elasticsearch.index.IndexService.lambda$createShard$4(IndexService.java:402)
	at org.elasticsearch.index.engine.InternalEngine$SearchFactory.newSearcher(InternalEngine.java:2244)
	at org.apache.lucene.search.SearcherManager.getSearcher(SearcherManager.java:198)
	at org.elasticsearch.index.engine.InternalEngine$ExternalSearcherManager.<init>(InternalEngine.java:326)
	at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:598)
	at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:238)
	at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:184)
```

and 

```
	at org.elasticsearch.index.engine.InternalEngine.restoreVersionMapAndCheckpointTracker(InternalEngine.java:2820)
	at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:257)
	at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:184)
```

We create the `Engine` under `IndexShard#mutex`:

https://github.com/elastic/elasticsearch/blob/0cfc9ff77594a64737078ce6ffd4630a1a4253d5/server/src/main/java/org/elasticsearch/index/shard/IndexShard.java#L1448-L1458

This can block cluster state updates, because `IndexShard#updateShardState` requires the same mutex:

https://github.com/elastic/elasticsearch/blob/0cfc9ff77594a64737078ce6ffd4630a1a4253d5/server/src/main/java/org/elasticsearch/index/shard/IndexShard.java#L431-L438
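To illustrate the contention described above, here is a minimal, self-contained sketch. It is not Elasticsearch code: `MutexContentionSketch`, `createEngineUnderMutex`, and `measureBlockedMillis` are hypothetical names, and `Thread.sleep` stands in for the slow work done in the `InternalEngine` constructor. Only `mutex` and `updateShardState` mirror names from `IndexShard`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for IndexShard; only the locking shape is modelled. */
final class MutexContentionSketch {
    private final Object mutex = new Object();

    /** Simulates engine construction doing slow work while holding the mutex. */
    void createEngineUnderMutex(long constructionMillis, CountDownLatch holdingMutex)
            throws InterruptedException {
        synchronized (mutex) {
            holdingMutex.countDown();          // signal that the mutex is now held
            Thread.sleep(constructionMillis);  // stands in for warming, restoring the version map, etc.
        }
    }

    /** Simulates the cluster state applier, which needs the same mutex. */
    void updateShardState() {
        synchronized (mutex) {
            // the routing update would happen here
        }
    }

    /** Returns roughly how long updateShardState had to wait, in milliseconds. */
    static long measureBlockedMillis(long constructionMillis) throws InterruptedException {
        MutexContentionSketch shard = new MutexContentionSketch();
        CountDownLatch holdingMutex = new CountDownLatch(1);
        Thread recovery = new Thread(() -> {
            try {
                shard.createEngineUnderMutex(constructionMillis, holdingMutex);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        recovery.start();
        holdingMutex.await();                  // wait until the recovery thread owns the mutex
        long start = System.nanoTime();
        shard.updateShardState();              // blocks until engine construction finishes
        long waited = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        recovery.join();
        return waited;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("updateShardState waited ~" + measureBlockedMillis(300) + " ms");
    }
}
```

The wait seen by `updateShardState` tracks the construction time almost exactly, which is why a multi-minute engine startup turns into a multi-minute stall on the cluster state applier.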

We should survey what every engine implementation does during startup and make sure that none of it can block under the mutex for too long.
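One possible direction, sketched under assumptions rather than proposed as the fix: keep construction cheap while the mutex is held and run the slow warm-up after releasing it. `DeferredWarmupSketch`, `SketchEngine`, `openEngine`, and `warm` are hypothetical names, and `Thread.sleep` again stands in for work such as loading global ordinals. The engine is still created under the mutex, as the comment in `IndexShard` requires.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: cheap construction under the mutex, expensive warm-up afterwards. */
final class DeferredWarmupSketch {

    /** Illustrative engine whose constructor does no slow work. */
    static final class SketchEngine {
        private volatile boolean warmed;

        void warm(long millis) throws InterruptedException {
            Thread.sleep(millis);  // stands in for e.g. loading global ordinals
            warmed = true;
        }

        boolean isWarmed() {
            return warmed;
        }
    }

    private final Object mutex = new Object();
    private final AtomicReference<SketchEngine> currentEngineReference = new AtomicReference<>();

    /** Installs the engine under the mutex and returns how long the mutex was held, in nanoseconds. */
    long openEngine(long warmMillis) throws InterruptedException {
        final SketchEngine engine;
        final long heldNanos;
        synchronized (mutex) {
            long start = System.nanoTime();
            engine = new SketchEngine();         // cheap: no I/O, no warming
            currentEngineReference.set(engine);
            heldNanos = System.nanoTime() - start;
        }
        engine.warm(warmMillis);                 // slow part runs without holding the mutex
        return heldNanos;
    }

    SketchEngine engine() {
        return currentEngineReference.get();
    }

    public static void main(String[] args) throws InterruptedException {
        DeferredWarmupSketch shard = new DeferredWarmupSketch();
        long heldNanos = shard.openEngine(200);
        System.out.println("mutex held for " + heldNanos + " ns; warmed=" + shard.engine().isWarmed());
    }
}
```

The trade-off in this sketch is that the engine becomes visible before it is warm, so the first searches may be slow. Whether that is acceptable is exactly the kind of question the survey would need to answer.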

Relates to https://discuss.elastic.co/t/187604, in which the engine takes multiple minutes to start up because it is loading global ordinals. This blocks a cluster state update for long enough that the node is removed from the cluster.

For reference, the engine is created under the mutex here (`IndexShard.java#L1448-L1458`):

	synchronized (mutex) {
	    verifyNotClosed();
	    assert currentEngineReference.get() == null : "engine is running";
	    // we must create a new engine under mutex (see IndexShard#snapshotStoreMetadata).
	    final Engine newEngine = engineFactory.newReadWriteEngine(config);
	    onNewEngine(newEngine);
	    currentEngineReference.set(newEngine);
	    // We set active because we are now writing operations to the engine; this way,
	    // if we go idle after some time and become inactive, we still give sync'd flush a chance to run.
	    active.set(true);
	}

`IndexShard#updateShardState` begins by taking the same mutex (`IndexShard.java#L431-L438`; the snippet stops where the linked range ends):

	public void updateShardState(final ShardRouting newRouting,
	                             final long newPrimaryTerm,
	                             final BiConsumer<IndexShard, ActionListener<ResyncTask>> primaryReplicaSyncer,
	                             final long applyingClusterStateVersion,
	                             final Set<String> inSyncAllocationIds,
	                             final IndexShardRoutingTable routingTable) throws IOException {
	    final ShardRouting currentRouting;
	    synchronized (mutex) {

Avoid expensive operations when constructing an Engine #43699

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Avoid expensive operations when constructing an Engine #43699

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions