feat(new sink): aws sqs #4675

Merged (35 commits) on Nov 18, 2020

@fanatid (Contributor) commented on Oct 21, 2020

The successor of #2755.

Sorry, @francescop, it was hard to rebase/merge, so I copied your code from that PR (hope that's OK).

Signed-off-by: Kirill Fomichev <fanatid@ya.ru>
@fanatid added the labels domain: sinks, provider: aws, and type: feature on Oct 21, 2020
@fanatid added this to the 2020-10-12: Son of Flynn milestone on Oct 21, 2020
@fanatid self-assigned this on Oct 21, 2020
@fanatid mentioned this pull request on Oct 21, 2020

@jszwedko (Member) left a comment:

I'm happy to see this resurrected!

In general this looks pretty good. I have a few questions:

Regarding our use of the SendMessageBatch API here:

  • For SendMessageBatch, AWS can report that it ingested only some of the messages (https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_SendMessageBatch.html). It looks like we treat any successful response as an indication that every message in the batch was processed, but one or more of them could have failed and should be retried.
  • I realize the risk of collision is low, but I think it would be better to ensure that the id of each message in the batch is unique. You could simply number them 0-9, since ids only need to be unique within a batch.
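
A minimal, std-only sketch of both suggestions, assuming simplified stand-in types rather than rusoto's real SendMessageBatch request and result structs (BatchEntry, FailedEntry, build_batch, and entries_to_retry are hypothetical names). Entries are numbered "0" to "9" (a batch holds at most 10 entries), and the failed ids from the response are mapped back to the original entries so that only those are retried. Entries flagged as sender faults are skipped, on the assumption that resending them would fail again.

```rust
// Simplified stand-ins for SendMessageBatch request entries and the failed
// entries in its response; these are NOT rusoto's actual types.
#[derive(Debug, Clone)]
struct BatchEntry {
    id: String,
    body: String,
}

#[derive(Debug)]
struct FailedEntry {
    id: String,
    sender_fault: bool,
}

// Number entries "0".."9": ids only need to be unique within one batch,
// and SendMessageBatch accepts at most 10 entries.
fn build_batch(messages: &[&str]) -> Vec<BatchEntry> {
    assert!(messages.len() <= 10, "SendMessageBatch accepts at most 10 entries");
    messages
        .iter()
        .enumerate()
        .map(|(i, body)| BatchEntry { id: i.to_string(), body: body.to_string() })
        .collect()
}

// Map failed ids back to their entries so that only those are retried.
// Sender faults (e.g. a malformed message) are skipped, assuming a resend would fail again.
fn entries_to_retry(batch: &[BatchEntry], failed: &[FailedEntry]) -> Vec<BatchEntry> {
    failed
        .iter()
        .filter(|f| !f.sender_fault)
        .filter_map(|f| batch.iter().find(|e| e.id == f.id).cloned())
        .collect()
}

fn main() {
    let batch = build_batch(&["a", "b", "c"]);
    let failed = vec![
        FailedEntry { id: "1".to_string(), sender_fault: false },
        FailedEntry { id: "2".to_string(), sender_fault: true },
    ];
    for entry in entries_to_retry(&batch, &failed) {
        println!("retry id={} body={}", entry.id, entry.body);
    }
}
```

The alternative raised below (plain SendMessage) avoids this bookkeeping entirely, at the cost of one request per message.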

If we don't want to take on the complexity of handling partially processed batches right now, we could opt to just use SendMessage. This will result in slower processing, of course.

For FIFO queues, I think it is a bit odd to use a unique message_group_id for each batch of messages, as this could result in consumers of the SQS queue processing messages out of order. I would make this option configurable so that users could set it to something like the file name (if they are using the file source) to ensure that the messages for a given file are processed in order. We could default it to "vector" or perhaps the hostname.
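
A sketch of the configurable group id, assuming a hypothetical option (message_group_id_field) that names an event field whose value becomes the MessageGroupId, with "vector" as the fallback. Events are modeled as a plain HashMap here rather than Vector's event type. Because SQS FIFO queues preserve ordering within a message group, every line from one file would share a group and stay in order.

```rust
use std::collections::HashMap;

// Hypothetical sink option: the event field used as the FIFO MessageGroupId
// (e.g. "file" when using the file source).
struct FifoOptions {
    message_group_id_field: Option<String>,
}

// Fallback suggested in the review; the hostname would be another option.
const DEFAULT_GROUP_ID: &str = "vector";

// Use the configured field's value if present on the event, else the default.
fn message_group_id(opts: &FifoOptions, event: &HashMap<String, String>) -> String {
    opts.message_group_id_field
        .as_ref()
        .and_then(|field| event.get(field))
        .cloned()
        .unwrap_or_else(|| DEFAULT_GROUP_ID.to_string())
}

fn main() {
    let opts = FifoOptions { message_group_id_field: Some("file".to_string()) };

    let mut event = HashMap::new();
    event.insert("file".to_string(), "/var/log/app.log".to_string());
    event.insert("message".to_string(), "hello".to_string());
    println!("{}", message_group_id(&opts, &event));

    // An event without the configured field falls back to the default.
    let empty = HashMap::new();
    println!("{}", message_group_id(&opts, &empty));
}
```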

Can you explain how this handles failed SQS messages that are not retried? Will it leave the messages in the buffer for that sink to be retried later?

docs/reference/components/sinks/aws_sqs.cue (outdated, resolved)
src/internal_events/aws_sqs.rs (outdated, resolved)
src/sinks/aws_sqs.rs (outdated, resolved)
src/sinks/aws_sqs.rs (outdated, resolved)
@fanatid requested a review from jszwedko on November 10, 2020 at 15:52

@jszwedko (Member) left a comment:

Looks good! This is much simpler as an initial implementation.

I left a few more comments, but nothing major.

src/internal_events/aws_sqs.rs (outdated, resolved)
use metrics::counter;

#[derive(Debug)]
pub struct AwsSqsEventSent {

Member:

Do we need an event for failed sends too?

Contributor (author):

What do you think: should we log only RusotoError::Service in the event?
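
One way to answer this, sketched with simplified stand-ins: the enum below mirrors only a few variants of rusoto_core::RusotoError, and AwsSqsEventSendFailed with its log_line method is hypothetical. Only service errors (the AWS API rejected the request) produce a log line; transport-level failures are assumed to be covered by the retry logic instead.

```rust
use std::fmt::Debug;

// Simplified stand-in for rusoto_core::RusotoError<E>; the real enum has
// more variants (e.g. Credentials, Validation, ParseError).
#[derive(Debug)]
enum RusotoErrorLike<E> {
    Service(E),
    HttpDispatch(String),
    Unknown(u16),
}

// Hypothetical service-level error for SendMessageBatch.
#[derive(Debug)]
enum SendMessageBatchErrorLike {
    TooManyEntriesInBatchRequest(String),
}

// Hypothetical internal event emitted when a send fails.
struct AwsSqsEventSendFailed<'a, E> {
    error: &'a RusotoErrorLike<E>,
}

impl<'a, E: Debug> AwsSqsEventSendFailed<'a, E> {
    // Only service errors produce a log line; every other variant returns None.
    fn log_line(&self) -> Option<String> {
        match self.error {
            RusotoErrorLike::Service(e) => Some(format!("SQS rejected the batch: {:?}", e)),
            _ => None,
        }
    }
}

fn main() {
    let service: RusotoErrorLike<SendMessageBatchErrorLike> = RusotoErrorLike::Service(
        SendMessageBatchErrorLike::TooManyEntriesInBatchRequest("11 entries".to_string()),
    );
    let dispatch: RusotoErrorLike<SendMessageBatchErrorLike> =
        RusotoErrorLike::HttpDispatch("connection reset".to_string());
    let unknown: RusotoErrorLike<SendMessageBatchErrorLike> = RusotoErrorLike::Unknown(503);

    for err in [&service, &dispatch, &unknown].iter() {
        println!("{:?} -> {:?}", err, AwsSqsEventSendFailed { error: *err }.log_line());
    }
}
```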

src/sinks/aws_sqs.rs (outdated, resolved)
src/sinks/aws_sqs.rs (resolved)
src/sinks/aws_sqs.rs (resolved)
src/sinks/aws_sqs.rs (outdated, resolved)
}

configuration: {
queue_url: {

Member:

It occurred to me that I use a different SQS configuration in the aws_s3 source (#4779): instead of queue_url, I have name and owner_id and then call GetQueueUrl. We should probably pick one approach and use it consistently. I'll change mine, since it results in one less configuration option.

jszwedko added a commit that referenced this pull request Nov 10, 2020
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>

@jszwedko (Member) left a comment:

Thanks for making all of the updates! Just a few last comments, but this is close.

src/sinks/aws_sqs.rs (outdated, resolved)
src/sinks/aws_sqs.rs (outdated, resolved)
match error {
RusotoError::HttpDispatch(_) => true,
RusotoError::Unknown(res) if res.status.is_server_error() => true,
_ => false,

Member:

Should we handle AWS error codes here too, such as throttling and service unavailable?

It looks like rusoto does not handle this transparently like some other AWS SDKs do: rusoto/rusoto#234

Contributor (author):

I don't think we can do more than the other AWS sinks do.
I also noticed that only aws_cloudwatch_metrics handles http::StatusCode::TOO_MANY_REQUESTS. I think we should fix the other AWS sinks (in another PR).
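
For reference, a sketch of what the retry predicate above might look like if it also treated 429 Too Many Requests as retriable, assuming a simplified SendError type in place of RusotoError (the type and its variants are hypothetical). Because rusoto does not surface the common AWS error codes such as throttling, only the HTTP status is available to match on.

```rust
// Simplified stand-in for the RusotoError cases the retry logic matches on.
enum SendError {
    HttpDispatch,
    Unknown { status: u16 },
    Service,
}

// Same rules as the existing match (dispatch errors and 5xx responses are
// retried), plus 429 Too Many Requests.
fn is_retriable(error: &SendError) -> bool {
    match error {
        SendError::HttpDispatch => true,
        SendError::Unknown { status } if *status == 429 => true,
        SendError::Unknown { status } if *status >= 500 && *status <= 599 => true,
        _ => false,
    }
}

fn main() {
    let cases = [
        ("dispatch", SendError::HttpDispatch),
        ("429", SendError::Unknown { status: 429 }),
        ("503", SendError::Unknown { status: 503 }),
        ("400", SendError::Unknown { status: 400 }),
        ("service", SendError::Service),
    ];
    for (name, err) in cases.iter() {
        println!("{}: retriable = {}", name, is_retriable(err));
    }
}
```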

Member:

👍 You are right. I didn't realize that the "common" AWS errors are not propagated: rusoto/rusoto#605

We'll need to wait for that issue to be resolved before we can do more.

src/sinks/aws_sqs.rs (outdated, resolved)

@jszwedko (Member) commented:

Noting for posterity that I did try this out locally with a test SQS queue and it looks good.

@jszwedko (Member) left a comment:

Looks good! Thanks for all of your work on this.

@fanatid merged commit 287a4d8 into master on Nov 18, 2020
@fanatid deleted the aws-sqs-sink branch on November 18, 2020 at 06:25
@fanatid mentioned this pull request on Nov 18, 2020
@binarylogic (Contributor) commented:

Nice work @fanatid

@fanatid added the sink: aws_sqs label on Nov 18, 2020
casserni pushed a commit that referenced this pull request Nov 20, 2020
Signed-off-by: casserni <nicholascassera@gmail.com>
mengesb pushed a commit to jacobbraaten/vector that referenced this pull request Dec 9, 2020
Signed-off-by: Brian Menges <brian.menges@anaplan.com>
Labels: domain: sinks, provider: aws, sink: aws_sqs, type: feature
5 participants