The item feeder consists of this PR and a Lambda function.
There are some things to discuss, so I'm writing everything here.
The main idea is that deleted items must be flagged and kept in the ES index for a while; otherwise they can be unexpectedly recreated by out-of-order and concurrent item operations.
The item feeding system is currently designed to use a standard (not FIFO-guaranteed) SQS queue, but even a FIFO queue would not solve the problem of out-of-order index/delete messages: there is no guarantee that messages from dataserver will reach SQS in the right order, nor that Lambda functions will consume them in the right order.
Out-of-order `add` or `modify` messages don't matter, but if they are mixed with an out-of-order `delete` message, the deletion can happen (and fail) before the creation, which leaves the item still in the index.
Although there's a built-in ES mechanism that could help, it doesn't actually fit our case.
It works like this: when a document is deleted, its version is automatically incremented, regardless of whether internal or external versioning is used. The document id and version remain available for as long as `index.gc_deletes` specifies (60s by default), and can be used for version checks when indexing an item with the same id.
Here are examples of what happens when an item is deleted or operations arrive out of order:
So the idea is that when dataserver deletes an item, ES, instead of deleting it too, indexes its id, version, a deletion flag, and the time of deletion. After a safe time period, those items can be automatically deleted from ES.
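To make the discussion concrete, here is a minimal in-memory sketch of the flag-and-keep idea, not the actual feeder code. `TombstoneIndex`, its method names, and the `retention_seconds` parameter are hypothetical stand-ins for the ES index and the "safe time period". It also assumes every message carries a per-item version that never decreases and that a delete wins a version tie, which this PR does not state.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Doc:
    version: int
    deleted: bool = False
    deleted_at: Optional[float] = None
    data: Optional[dict] = None


class TombstoneIndex:
    """Hypothetical in-memory stand-in for the ES index."""

    def __init__(self, retention_seconds: float = 3600.0):
        self.docs: Dict[str, Doc] = {}
        self.retention_seconds = retention_seconds

    def index(self, item_id: str, version: int, data: dict) -> bool:
        """Apply an add/modify message unless a same-or-newer version exists."""
        current = self.docs.get(item_id)
        if current is not None and current.version >= version:
            return False  # stale message, or the item was already deleted
        self.docs[item_id] = Doc(version=version, data=data)
        return True

    def delete(self, item_id: str, version: int, now: Optional[float] = None) -> bool:
        """Store a flagged tombstone instead of removing the document."""
        current = self.docs.get(item_id)
        if current is not None and current.version > version:
            return False  # a newer operation has already been applied
        deleted_at = time.time() if now is None else now
        self.docs[item_id] = Doc(version=version, deleted=True, deleted_at=deleted_at)
        return True

    def is_live(self, item_id: str) -> bool:
        doc = self.docs.get(item_id)
        return doc is not None and not doc.deleted

    def purge(self, now: Optional[float] = None) -> int:
        """Drop tombstones older than the retention period; return the count."""
        now = time.time() if now is None else now
        expired = [
            item_id
            for item_id, doc in self.docs.items()
            if doc.deleted and now - doc.deleted_at >= self.retention_seconds
        ]
        for item_id in expired:
            del self.docs[item_id]
        return len(expired)


if __name__ == "__main__":
    idx = TombstoneIndex(retention_seconds=60)
    # The delete (version 2) overtakes the add (version 1) on the way to ES.
    idx.delete("item-1", version=2, now=100.0)
    idx.index("item-1", version=1, data={"title": "late add"})
    print(idx.is_live("item-1"))
```

In this sketch the late `add` is rejected because the tombstone still holds version 2, so the item is not recreated. The retention period has to be longer than the worst-case message delay, since a message that arrives after `purge` has removed the tombstone would recreate the item.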