🎉 Source Amazon SQS: New connector #6937
Conversation
Right now, in the unit tests, I am using moto to create mock AWS services, but I'm not sure how this works with the integration/acceptance testing method. All other standard tests pass.
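For illustration, a minimal sketch of the kind of moto-backed unit test described here (not the connector's actual test code); the queue name, region, and message body are arbitrary, and the mock_sqs decorator assumes a moto release that still exposes per-service decorators:

```python
import boto3
from moto import mock_sqs  # moto < 5; newer releases expose mock_aws instead


@mock_sqs
def test_read_returns_message_from_mock_queue():
    # Create an in-memory SQS queue that only exists inside this test.
    client = boto3.client("sqs", region_name="eu-west-1")
    queue_url = client.create_queue(QueueName="airbyte-test-queue")["QueueUrl"]

    # Seed the mock queue with a message for the source to read.
    client.send_message(QueueUrl=queue_url, MessageBody='{"id": 1}')

    # The source's read path would call receive_message under the hood.
    response = client.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
    bodies = [m["Body"] for m in response.get("Messages", [])]

    assert bodies == ['{"id": 1}']
```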
Have managed to get a successful run of the acceptance & unit tests - output below.
Awesome @sdairs, I'm going to review and test your connector later today.
### Deletes
Optionally, it can delete messages after reading - the delete_message() call is made __after__ yielding the message to the generator.
This means that messages aren't deleted unless read by a Destination - however, there is still potential that this could result in
missed messages if the Destination fails __after__ taking the message, but before committing to its own downstream.
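For illustration only, a minimal sketch of the read-then-delete ordering described above (not the connector's actual implementation); the batch size, timeouts, and function signature are assumed values:

```python
import boto3


def read_messages(queue_url: str, region: str, delete_messages: bool = False,
                  visibility_timeout: int = 30):
    """Yield SQS message bodies; optionally delete each one after it has been yielded."""
    client = boto3.client("sqs", region_name=region)
    response = client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,          # assumed batch size
        VisibilityTimeout=visibility_timeout,
        WaitTimeSeconds=20,              # long polling
    )
    for msg in response.get("Messages", []):
        # Hand the record to the caller (the Destination) first...
        yield {"id": msg["MessageId"], "body": msg["Body"]}
        # ...and only delete once the caller has taken it, so an unread message
        # is never removed. If the Destination fails after this point but before
        # committing downstream, the message is already gone - the caveat above.
        if delete_messages:
            client.delete_message(QueueUrl=queue_url,
                                  ReceiptHandle=msg["ReceiptHandle"])
```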
@sherifnada can you review this feature of the Amazon SQS connector? Does it make sense to enable this for an Airbyte connector?
I'm hesitant to do this exactly for the reason mentioned. I would suggest we skip deleting messages. But my question then is: how do we do incremental sync? Is it possible to only read new messages?
It would not be possible to do incremental sync without delete as there is no concept of offsets in SQS - I think it's an important feature to have, but we should make the caveats clear with warnings & further details in documentation.
Unit and integration tests are running with the CI Sandbox account. I'll wait for @sherifnada's feedback about the delete feature.
Can you move the bootstrap.md content to the connector doc page? It makes more sense for people who are setting up the connector to have already read this info.
Awesome, thanks for reviewing. Deleting messages is a standard SQS feature that I think should be supported - it is false by default to avoid accidental deletes, but I would expect most SQS use cases to want deletes enabled.
There is of course potential that messages could be lost with Delete enabled: if the Destination consumes the message and then fails before committing it to whatever is downstream, the Source will already have deleted it after yielding it. In my limited understanding of Airbyte so far, there is no persistence of messages that are 'in flight', i.e. being processed by a Destination or Source, so there would be no way to recover such a message and have the Destination resume from where it left off. A more robust method would be some sort of callback handle that a Destination could call after committing the message downstream, to notify the Source that the message is persisted and can be deleted. But I don't know if that is possible, and it would have performance considerations.
(Two review comments on airbyte-integrations/connectors/source-amazon-sqs/source_amazon_sqs/spec.json - resolved.)
Have added explicit warnings to the Spec and further details to the Readme to cover the delete option & potential data loss scenarios.
LGTM, thanks for your contribution @sdairs!
* Initial commit, working source with static Creds
* Typo in example queue url
* Adds auto delete of messages after read
* Adds visibility timeout
* remove insecure comments from AWS IAM Key spec
* explicitly set supported sync modes
* explicit sync mode should be lower case
* Adds unit tests for check, discover, read
* remove incremental acceptance test block
* remove incremental from conf catalog sample
* remove test requirement moto from main req
* align int catalog sample with sample_files
* fixing catalog configs
* acceptance testing config
* adds expected records txt
* automated formatting changes
* remove expected records block from acpt test
* Adds Docs page
* Ammends formatting on readme
* Adds doc link to summary
* Improve error handling & debug logging
* Adds bootstrap.md
* Add a todo suggestion for batch output
* Adds SQS to integrations readme list
* lower case properties
* removed unused line
* uses enum for aws region
* updates sample configs to use lowercase
* required props to lower case
* add missed property to lowercase
* gradle formatting
* Fixing issues from acceptance tests
* annotate secrets in spec.json with airbyte_secret
* Adds explicit warnings about data less when using Delete Message option
What
Adds an Amazon SQS Source connector
Has unit tests for read/check/discover that use mock AWS resources to simulate behaviour without needing to create a real SQS queue or IAM roles
airbytehq/connector-contest#68
How
- Connector built with the Python CDK
- Uses boto3 (the Python AWS SDK) for SQS functionality (a connection-check sketch is shown below)
- Uses moto to mock AWS resources in unit tests
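For illustration, a minimal sketch (not the connector's actual code) of how a connection check might use boto3 to verify the configured queue is reachable; the config keys queue_url, region, access_key, and secret_key are assumed names, not necessarily those used in spec.json:

```python
from typing import Optional, Tuple

import boto3
from botocore.exceptions import ClientError


def check_connection(config: dict) -> Tuple[bool, Optional[str]]:
    """Return (True, None) if the configured SQS queue can be queried, else (False, error)."""
    try:
        client = boto3.client(
            "sqs",
            region_name=config["region"],                    # assumed config key
            aws_access_key_id=config.get("access_key"),      # assumed config key
            aws_secret_access_key=config.get("secret_key"),  # assumed config key
        )
        # Fetching queue attributes fails fast on a bad URL, region, or
        # credentials, without consuming any messages from the queue.
        client.get_queue_attributes(
            QueueUrl=config["queue_url"],                    # assumed config key
            AttributeNames=["ApproximateNumberOfMessages"],
        )
        return True, None
    except ClientError as error:
        return False, str(error)
```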
Recommended reading order
x.java
y.python
Pre-merge Checklist
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
- Secrets in the connector's spec are annotated with airbyte_secret
- Unit & integration tests added and passing (./gradlew :airbyte-integrations:connectors:<name>:integrationTest)
- Connector's README.md
- Connector's bootstrap.md. See description and examples
- docs/SUMMARY.md
- docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
- docs/integrations/README.md
- airbyte-integrations/builds.md

Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- /test connector=connectors/<name> command is passing
- Connector version released on Dockerhub by running the /publish command described here

Updating a connector
Community member or Airbyter
- Secrets in the connector's spec are annotated with airbyte_secret
- Unit & integration tests added and passing (./gradlew :airbyte-integrations:connectors:<name>:integrationTest)
- Connector's README.md
- Connector's bootstrap.md. See description and examples
- docs/integrations/<source or destination>/<name>.md including changelog. See changelog example

Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- /test connector=connectors/<name> command is passing
- Connector version released on Dockerhub by running the /publish command described here

Connector Generator
- The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes