Feed items to SQS queue #86

Open: wants to merge 1 commit into master

mrtcode (Member) commented Oct 19, 2018

The item feeder consists of this PR and a Lambda function.

There are some things to discuss, so I'm writing everything here.

The main idea is that deleted items must be flagged and kept in the ES index for a while, because otherwise they can be unexpectedly recreated by out-of-order and concurrent item operations.

The item feeding system is currently designed to use a standard (non-FIFO) SQS queue, but even a FIFO queue wouldn't solve the problem of out-of-order index/delete messages: there is no guarantee that messages from the dataserver reach SQS in the right order, nor that the Lambda functions consume them in the right order.

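Out-of-order add or modify messages don't matter on their own, because external versioning keeps whichever version is newest. But if a delete message is also out of order, the deletion can run (and fail) before the creation, which results in an item that still exists after it was deleted.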

ES has a built-in mechanism that could help, but it doesn't fit our case.
It works like this: when a document is deleted, its version is automatically incremented, no matter whether internal or external versioning is used. The document id and version still exist for as long as `index.gc_deletes` is configured (60s by default), and they are used for version checks when an item with the same id is indexed.
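To make the timing in the following examples concrete, here is a simplified JavaScript model of these version checks. It is not how Elasticsearch implements them: `applyDelete`, `acceptsExternalGt`, and the `state` shape are hypothetical, time is a plain millisecond counter, and the real engine prunes tombstones periodically, so the `gc_deletes` cut-off is not exact. As in the examples, the delete uses an internal version check and indexing uses `external_gt`.

```javascript
// Simplified, hypothetical model of ES version checks around deletes.
const GC_DELETES_MS = 60 * 1000; // index.gc_deletes default: 60s

// state is null (unknown id), { version, deleted: false } for a live document,
// or { version, deleted: true, deletedAt } for a delete tombstone.

// Delete with an internal version check: the version must match, then it is bumped.
function applyDelete(state, expectedVersion, now) {
  if (state === null || state.deleted || state.version !== expectedVersion) {
    return { ok: false, state };
  }
  return { ok: true, state: { version: state.version + 1, deleted: true, deletedAt: now } };
}

// Index request with version_type external_gt.
function acceptsExternalGt(state, newVersion, now, gcDeletesMs = GC_DELETES_MS) {
  if (state === null) return true;
  // Once the tombstone is older than gc_deletes, its version is forgotten.
  if (state.deleted && now - state.deletedAt >= gcDeletesMs) return true;
  return newVersion > state.version;
}

// Example 1: indexed with version 6, deleted at t=0, tombstone keeps version 7.
const deleted = applyDelete({ version: 6, deleted: false }, 6, 0).state;
console.log(deleted.version);                      // 7
console.log(acceptsExternalGt(deleted, 7, 30000)); // Example 2, 30s later: false
console.log(acceptsExternalGt(deleted, 7, 61000)); // Example 2, 61s later: true
console.log(acceptsExternalGt(deleted, 8, 30000)); // Example 3: true
```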

The following examples show what happens when an item is deleted or when operations arrive out of order:

Example 1:
{index: {_index: 'idx',_type:'tp',_id:'1/WCBVNWB9',_version:6,_version_type:'external_gt'}} Item indexed with version=6
{delete: {_index:'idx',_type:'tp',_id:'1/WCBVNWB9',_version:6}} Item is deleted, but its id and version=7 still exist internally for 60s


Example 2:
{index: {_index: 'idx',_type:'tp',_id:'1/WCBVNWB9',_version:6,_version_type:'external_gt'}} Item indexed with version=6
{delete: {_index:'idx',_type:'tp',_id:'1/WCBVNWB9',_version:6}} Item is deleted. Its id and version=7 stay for 60s
{index: {_index: 'idx',_type:'tp',_id:'1/WCBVNWB9',_version:7,_version_type:'external_gt'}} Indexing fails if less than 60s have passed, otherwise succeeds


Example 3:
{index: {_index: 'idx',_type:'tp',_id:'1/WCBVNWB9',_version:6,_version_type:'external_gt'}} Item indexed with version=6
{delete: {_index:'idx',_type:'tp',_id:'1/WCBVNWB9',_version:6}} Item is deleted. Its id and version=7 stay for 60s
{index: {_index: 'idx',_type:'tp',_id:'1/WCBVNWB9',_version:8,_version_type:'external_gt'}} Indexing succeeds because 7<8


Example 4 (out-of-order operations: the deletion arrives before the update, so the item exists even though it was deleted):
{index: {_index: 'idx',_type:'tp',_id:'1/WCBVNWB9',_version:5,_version_type:'external_gt'}} Item indexed with version=5
{delete: {_index:'idx',_type:'tp',_id:'1/WCBVNWB9',_version:6}} Deletion fails because the version doesn't match
{index: {_index: 'idx',_type:'tp',_id:'1/WCBVNWB9',_version:6,_version_type:'external_gt'}} Succeeds because 5<6


Example 5 (out-of-order operations: the deletion arrives before the creation, so the item exists even though it was deleted):
{delete: {_index:'idx',_type:'tp',_id:'1/WCBVNWB9',_version:3}} Deletion fails because neither the document nor its version exists
{index: {_index: 'idx',_type:'tp',_id:'1/WCBVNWB9',_version:1,_version_type:'external_gt'}} Item is indexed
{index: {_index: 'idx',_type:'tp',_id:'1/WCBVNWB9',_version:2,_version_type:'external_gt'}} Item is indexed
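Examples 4 and 5 can be replayed with a small JavaScript model, assuming (as in the examples) that deletes are real deletes with an exact version check, indexing uses `external_gt`, and a failed delete is dropped rather than retried. `replay` is a hypothetical helper for illustration; tombstones are left out because no delete succeeds in these two sequences.

```javascript
// Hypothetical model: hard deletes with a version check, external_gt indexing.
function replay(ops) {
  let doc = null; // null = no document, otherwise { version }
  const log = [];
  for (const op of ops) {
    if (op.type === 'index') {
      // external_gt: accept only a strictly newer version
      if (doc === null || op.version > doc.version) {
        doc = { version: op.version };
        log.push(`index v${op.version}: ok`);
      } else {
        log.push(`index v${op.version}: version conflict`);
      }
    } else if (doc === null) {
      log.push(`delete v${op.version}: failed, document missing`);
    } else if (doc.version !== op.version) {
      log.push(`delete v${op.version}: failed, version conflict`);
    } else {
      doc = null; // a real delete; nothing remembers it in this model
      log.push(`delete v${op.version}: ok`);
    }
  }
  return { doc, log };
}

// Example 4: the delete (expecting version 6) overtakes the update to version 6.
const ex4 = replay([
  { type: 'index', version: 5 },
  { type: 'delete', version: 6 },
  { type: 'index', version: 6 },
]);
// Example 5: the delete overtakes both the creation and the update.
const ex5 = replay([
  { type: 'delete', version: 3 },
  { type: 'index', version: 1 },
  { type: 'index', version: 2 },
]);
console.log(ex4.doc, ex5.doc); // { version: 6 } { version: 2 }
```

In both sequences the item ends up present even though the dataserver deleted it.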

So the idea is that when the dataserver deletes an item, ES should not delete it too; instead, it must index the item's id, version, a deletion flag, and the time it was deleted. After a safe time period, those flagged items can be deleted from ES automatically.
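A minimal JavaScript sketch of this proposal, assuming that every dataserver delete message carries a version higher than any earlier write to the same item (so Example 4's delete is given version 7 below, not 6). The fields `deleted` and `dateDeleted` and the helpers `toBulkAction`, `applyAll`, and `cleanupQuery` are hypothetical. `applyAll` is an in-memory stand-in for ES applying `external_gt` to each bulk action, and `cleanupQuery` builds a query body that could be sent to the `_delete_by_query` API. Bulk metadata follows the ES 6 format used in the examples.

```javascript
// Hypothetical sketch: deletes become tombstone documents, so every write
// goes through the same external_gt check and arrival order stops mattering.

// Build one bulk action pair (metadata line + source line).
function toBulkAction(op) {
  const meta = {
    _index: 'idx', _type: 'tp', _id: op.id,
    _version: op.version, _version_type: 'external_gt',
  };
  const source = op.type === 'delete'
    ? { deleted: true, dateDeleted: op.date } // tombstone instead of a real delete
    : Object.assign({}, op.data, { deleted: false });
  return [{ index: meta }, source];
}

// In-memory stand-in for the index: keep a write only if its version is newer.
function applyAll(ops) {
  const docs = new Map();
  for (const op of ops) {
    const [{ index: meta }, source] = toBulkAction(op);
    const current = docs.get(meta._id);
    if (current === undefined || meta._version > current.version) {
      docs.set(meta._id, { version: meta._version, source });
    } // otherwise a version conflict: the stale write is dropped
  }
  return docs;
}

// Query body for _delete_by_query: tombstones older than the safe period.
function cleanupQuery(safePeriodMs, now) {
  return {
    query: {
      bool: {
        filter: [
          { term: { deleted: true } },
          { range: { dateDeleted: { lt: new Date(now - safePeriodMs).toISOString() } } },
        ],
      },
    },
  };
}

// Example 5 order (the delete arrives first): the item now stays deleted.
const id = '1/WCBVNWB9';
const docs = applyAll([
  { type: 'delete', id, version: 3, date: '2018-10-19T00:00:00Z' },
  { type: 'index', id, version: 1, data: { title: 'v1' } },
  { type: 'index', id, version: 2, data: { title: 'v2' } },
]);
console.log(docs.get(id).version, docs.get(id).source.deleted); // 3 true
```

The safe period has to exceed the longest delay a message can realistically spend between the dataserver and ES; otherwise a late, stale write could recreate the item after its tombstone is purged.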
