[Pull-based Ingestion][WIP] Introduce the new pull-based ingestion engine, APIs, and Kafka plugin #16958
❌ Gradle check result for 16dd9d0: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
new Translog.Snapshot() {
    @Override
    public void close() {}

    @Override
    public int totalOperations() {
        return 0;
    }

    @Override
    public Translog.Operation next() {
        return null;
    }
}
);
Maybe create a static EMPTY_TRANSLOG_SNAPSHOT and reuse it across this class and NoOpEngine.
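The shared-constant idea could look roughly like the sketch below. It is only an illustration: TranslogSketch and its nested Snapshot and Operation types are hypothetical stand-ins reduced to the three methods visible in the snippet under review, not the real org.opensearch.index.translog.Translog API, and where the constant should live in the codebase is left open.

```java
// Hypothetical stand-in for Translog, reduced to the methods shown in the diff.
final class TranslogSketch {
    interface Operation {}

    interface Snapshot extends java.io.Closeable {
        int totalOperations();

        Operation next();

        @Override
        void close();
    }

    // Stateless, so a single instance can be shared by every engine that
    // never produces translog operations (name taken from the review comment).
    static final Snapshot EMPTY_TRANSLOG_SNAPSHOT = new Snapshot() {
        @Override
        public void close() {}

        @Override
        public int totalOperations() {
            return 0;
        }

        @Override
        public Operation next() {
            return null;
        }
    };

    private TranslogSketch() {}
}
```

Both call sites would then return TranslogSketch.EMPTY_TRANSLOG_SNAPSHOT instead of allocating a new anonymous class each time.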
String clientId = engineConfig.getIndexSettings().getNodeName()
    + "-"
    + engineConfig.getIndexSettings().getIndex().getName()
    + "-"
    + engineConfig.getShardId().getId();
Should we use IDs instead of names here, e.g. the index UUID and the node ID?
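A minimal sketch of an ID-based variant, assuming the node ID and index UUID are already available as strings. ClientIdSketch and buildClientId are hypothetical names, and how the two identifiers would be obtained from engineConfig is deliberately not shown.

```java
final class ClientIdSketch {
    private ClientIdSketch() {}

    // Hypothetical helper: joins stable identifiers rather than display names.
    // An index UUID does not collide when an index is deleted and recreated
    // under the same name.
    static String buildClientId(String nodeId, String indexUuid, int shardId) {
        return String.join("-", nodeId, indexUuid, Integer.toString(shardId));
    }
}
```

The trade-off is readability: an ID-based client ID is harder to recognize in Kafka-side logs than one built from names.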
I'm curious how the FGAC (fine-grained access control) security model would work, especially with the security plugin, which intercepts transport actions to validate whether authorized users can perform bulk actions on certain indices. Is the intent to handle permissions at the Kafka partition level?
Another aspect is maintaining Kafka checkpoints durably. I have yet to read that part, but it would be good to understand how failovers and recoveries are handled.
 *
 * @opensearch.api
 */
public interface IngestionConsumerPlugin {
Let's put the @ExperimentalApi annotation on this as well.
 */

/** Indices ingestion module package. */
package org.opensearch.indices.ingest;
The term "ingest" is definitely overloaded: _bulk is a type of ingestion, there are ingest pipelines, etc. I'd suggest using polling.ingest or pollingingest or anything else that helps disambiguate this area of the code from the other ingest-related pieces.
/**
 * Start the poller
 */
void start();;
We do have the LifecycleComponent interface (and an abstract implementation). I don't know if that would be useful here, but please take a look if you haven't already considered extending it.
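For illustration, a poller built on a lifecycle base class might be shaped like the sketch below. These are not the OpenSearch classes: LifecycleSketch and PollerSketch are hypothetical stand-ins that only imitate the doStart/doStop/doClose template style of the existing abstract implementation, and the state handling is simplified.

```java
// Hypothetical, simplified lifecycle base class (not the OpenSearch one).
abstract class LifecycleSketch {
    enum State { INITIALIZED, STARTED, STOPPED, CLOSED }

    private State state = State.INITIALIZED;

    final synchronized void start() {
        // Starting is only valid from INITIALIZED or STOPPED; otherwise no-op.
        if (state != State.INITIALIZED && state != State.STOPPED) {
            return;
        }
        doStart();
        state = State.STARTED;
    }

    final synchronized void stop() {
        if (state != State.STARTED) {
            return;
        }
        doStop();
        state = State.STOPPED;
    }

    final synchronized void close() {
        if (state == State.CLOSED) {
            return;
        }
        stop(); // a running component is stopped before it is closed
        doClose();
        state = State.CLOSED;
    }

    final synchronized State state() {
        return state;
    }

    protected abstract void doStart();

    protected abstract void doStop();

    protected abstract void doClose();
}

// Hypothetical poller: counters stand in for starting and stopping a poll loop.
final class PollerSketch extends LifecycleSketch {
    int starts;
    int stops;
    int closes;

    @Override
    protected void doStart() {
        starts++;
    }

    @Override
    protected void doStop() {
        stops++;
    }

    @Override
    protected void doClose() {
        closes++;
    }
}
```

The benefit of this shape is that repeated start() calls and start-after-close are handled once in the base class rather than in every poller implementation.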
private final TranslogManager translogManager;
private final DocumentMapperForType documentMapperForType;
private final IngestionConsumerFactory ingestionConsumerFactory;
protected StreamPoller streamPoller;
It looks like streamPoller is assigned in the constructor and never accessed outside this class. Why is it not private final?
}

versions << [
  'kafka': '2.8.2',
This looks quite old (September 2022 according to https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients). Why not use the newest available?
Description
This PR implements the basics of the pull-based ingestion described in this RFC, including:
This is currently a WIP: a few improvements remain to be made, and test coverage needs to increase.
Related Issues
Resolves #16927 #16929 #16928
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.