Improve AWS SQS Sensor (#16880) #16904

Merged: 6 commits into apache:main on Aug 2, 2021

Conversation

@baolsen (Contributor) commented on Jul 9, 2021

Improve the AWS SQS Sensor as follows:

  • Add optional visibility_timeout parameter
  • Add a customisable / overridable filter capability so that irrelevant messages can be filtered out or ignored

closes: #16880
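For illustration, a rough usage sketch of the improved sensor, assuming the class in airflow/providers/amazon/aws/sensors/sqs.py (named SQSSensor at the time) and parameter names along the lines described above; the queue URL and filter values are placeholders, and the exact argument names in the merged change may differ:

```python
# Illustrative sketch only: parameter names are assumptions based on the PR
# description, and the queue URL / filter values are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.sensors.sqs import SQSSensor

with DAG(
    dag_id="sqs_sensor_example",
    start_date=datetime(2021, 8, 1),
    schedule_interval=None,
) as dag:
    wait_for_message = SQSSensor(
        task_id="wait_for_relevant_message",
        sqs_queue="https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue",
        aws_conn_id="aws_default",
        max_messages=5,
        # New optional parameter: keep received messages hidden from other
        # consumers for this many seconds instead of using the queue default.
        visibility_timeout=30,
        # New filtering capability: extract a field from each message body via
        # JSONPath and only react to messages whose value matches.
        message_filtering="jsonpath",
        message_filtering_config="status",
        message_filtering_match_values=["ready"],
    )
```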

@boring-cyborg bot added labels: area:providers, provider:amazon-aws (AWS/Amazon - related issues) on Jul 9, 2021
Review thread on setup.cfg (outdated, resolved)
@potiuk (Member) left a comment

LGTM, just one small comment: move the jsonpath_ng dependency to the Amazon-specific provider dependencies.
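Something along these lines, as a rough sketch only (the actual structure of the Amazon provider dependency list in setup.py may look different, and the pinned versions here are illustrative):

```python
# setup.py (sketch): Amazon provider extras; versions are illustrative.
amazon = [
    'boto3>=1.15.0,<1.18.0',
    'watchtower~=1.0.6',
    'jsonpath-ng>=1.5.3',  # moved here from the core dependencies
]
```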

airflow/providers/amazon/aws/sensors/sqs.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/sensors/sqs.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/sensors/sqs.py Outdated Show resolved Hide resolved
@baolsen requested review from potiuk and uranusjr on July 16, 2021
@baolsen (Contributor, Author) commented on Jul 16, 2021

Review comments have been addressed and I have rebased onto main.

@baolsen (Contributor, Author) commented on Jul 26, 2021

The second round of review comments has been addressed and I have rebased onto main.

@baolsen (Contributor, Author) commented on Jul 30, 2021

Random CI issues. Rebasing and pushing again.

@uranusjr self-requested a review on July 31, 2021
@uranusjr (Member) left a comment

Thanks for the fixups! (I don’t know enough about SQS to comment further)

@github-actions (bot) commented:

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
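For reference, the rebase-and-push the bot asks for typically looks like this (assuming the Apache repo is configured as the `apache` remote, the fork as `origin`, and `my-feature-branch` is a placeholder branch name):

```bash
git fetch apache
git rebase apache/main
# resolve any conflicts, then push the rewritten branch to the fork
git push --force-with-lease origin my-feature-branch
```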

@github-actions bot added the label 'full tests needed' (We need to run full set of tests for this PR to merge) on Jul 31, 2021
@potiuk (Member) commented on Jul 31, 2021

There is a real error: ImportError: cannot import name 'Literal'
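(Background: typing.Literal only exists on Python 3.8+, while Airflow still supported Python 3.6 and 3.7 at this point, so an unguarded import fails there. A minimal sketch of the usual guarded import, not necessarily the exact fix applied in this PR:)

```python
import sys

if sys.version_info >= (3, 8):
    from typing import Literal
else:
    from typing_extensions import Literal  # backport used on Python 3.6/3.7

# Illustrative use only: restrict a parameter to the known filtering modes.
MessageFilteringMode = Literal["literal", "jsonpath"]
```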

@potiuk (Member) left a comment

Literal cannot be imported.

@baolsen (Contributor, Author) commented on Jul 31, 2021

Thanks @potiuk. I must have missed that; I will get it fixed.
I've been checking the CI but it seems quite flaky, which makes it hard to separate the signal from the noise. Is it just me, or is this a more general issue?

@potiuk (Member) commented on Jul 31, 2021

> Thanks @potiuk. I must have missed that; I will get it fixed.
> I've been checking the CI but it seems quite flaky, which makes it hard to separate the signal from the noise. Is it just me, or is this a more general issue?

Recently we have had more issues than usual, and we are working on getting the noise out. Sorry for that; it's difficult to keep such a big project with so many dependencies in "excellent" shape all the time, and recently the shape was indeed "bad". Hopefully we will move it to "pretty good", and there will be long stretches of "really good". It takes a lot of time not only to get there, but also to stay there.

@potiuk (Member) commented on Aug 1, 2021

It seems like it is mostly a recurring problem with the integration tests; one more rebase should indicate whether it is a real error.

@baolsen (Contributor, Author) commented on Aug 2, 2021

Great. I've rebased & pushed. Comparison below.

Edit:
I wanted to compare the previous run to the current run, but both links now redirect to the same run. I realised this when the previous run showed a redis-related failure that I can no longer find, and I'm not sure where to view the previous run for comparison. I've listed the failures from the run anyway, so that if we redo the CI (again) we can compare.

TL;DR: From what I can see, the failures below are unrelated to my changes.

Previous run:
https://github.com/apache/airflow/pull/16904/checks?check_run_id=3208579465

Current run:
https://github.com/apache/airflow/pull/16904/checks?check_run_id=3217819682

Postgres 9.6, Py 3.8: Always API Core Other CLI Providers WWW Integration
Failures: Core: tests/jobs/test_scheduler_job.py

Postgres 13, Py 3.9: Always API Core Other CLI Providers WWW Integration
Failures: Core: tests/jobs/test_scheduler_job.py

MySQL 5.7, Py 3.6: Always API Core Other CLI Providers WWW Integration
Failures: Error: Process completed with exit code 137. ??

MySQL 8, Py 3.8: Always API Core Other CLI Providers WWW Integration
Failures: Integration: TLS timeout

MSSQL 2019-latest, Py 3.7: Always API Core Other CLI Providers WWW Integration
Failures: Core: tests/jobs/test_scheduler_job.py

Sqlite, Py 3.6: Always API Core Other CLI Providers WWW Integration
Failures: Integration: no failure log ??

Sqlite, Py 3.9: Always API Core Other CLI Providers WWW Integration
Failures: TLS timeout

Helm Chart; KubernetesExecutor (pull_request)
Failures: no failure log ??

Helm Chart; CeleryExecutor (pull_request)
Failures: no failure log ??

@potiuk (Member) commented on Aug 2, 2021

LGTM. And the test failures seem to be intermittent.

@potiuk merged commit d28efbf into apache:main on Aug 2, 2021
@baolsen (Contributor, Author) commented on Aug 3, 2021

Hey @potiuk, thanks for the review and merge.

I have one quick question regarding CI.
See attached screenshot from a different PR.

It seems that the "wait for images" job is running while the "build images" job is still queued.
I would have thought that "wait for images" couldn't run until "build images" is done. What am I missing?

ci.yml says the CI images job needs build-info, but that isn't what I'm seeing (see the sketch below for what I mean by that dependency).

[screenshot of the CI job view from another PR]
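For context, within a single workflow that kind of ordering is expressed with `needs:`. A minimal illustrative sketch, not Airflow's actual ci.yml; job names and runner labels are placeholders:

```yaml
# Minimal illustrative workflow, not the real ci.yml.
jobs:
  build-info:
    runs-on: ubuntu-20.04
    steps:
      - run: echo "collect build info"
  ci-images:
    needs: [build-info]  # this job starts only after build-info succeeds
    runs-on: ubuntu-20.04
    steps:
      - run: echo "wait for / use CI images"
```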

@potiuk (Member) commented on Aug 3, 2021

The problem is that we can't do it differently until we have better support from GitHub for cross-workflow communication.

All details are described in CI.rst, but here is the gist of it:

  • The PR workflows by definition have only 'read' access to the Airflow GitHub repo. They cannot have 'write' access because that would open up a security issue (anyone running a PR could modify the Airflow repo).

  • On the other hand, we need to build (and share) CI docker images from the incoming PR so that all jobs can use them. This is needed because we have different kinds of jobs (tests, building providers, testing images, Helm tests, etc.); having common CI docker images with all the sources and, more importantly, all the dependencies built in makes it possible to have a common 'execution' environment for all tests. The problem is that with read-only access in the PR workflow we cannot build such an image only once and share it with the other jobs, because we cannot store the image anywhere. Each build job runs on a separate machine, and ideally you build the images once, push them somewhere, and every job uses them. That could save a LOT of build time, especially when someone adds a new dependency (which does not happen very often, but often enough).

  • Therefore we build the images in a separate workflow (pull_request_target) where we have write access, and where the CI workflow and scripts come from the 'main' branch, so there is no risk of someone's PR injecting something into our repo. Those 'Build image' workflows can build and push images with write permissions.

  • The 'pull_request_target' workflow is triggered in parallel to the 'ci' workflow. GitHub Actions currently has no feature for 'cross-workflow' dependencies, so we cannot explicitly wait in one workflow for the result of another. As a workaround, the 'wait for image' job in the ci workflow simply polls actively for the images produced by the 'build image' workflow. It is wasteful: an active job running and repeatedly checking whether the images have already been pushed. But there is no other way; Actions does not yet have the ability to 'pause' and 'resume' workflows. It is planned, but not there yet.

  • Also, the 'build image' workflow is 'safe', so we run it on our self-hosted runners, while the 'ci build' is potentially unsafe (GitHub detected people using Actions to mine Bitcoin!), so we need to run it on public GitHub runners, which get isolation and monitoring from GitHub. There we have 150 slots in the queue for all Apache projects (more than 350 of them), which makes those jobs susceptible to long queues. On the other hand, we also limit our 'build image' job queue, as we have limited funds from Astronomer and AWS to run those jobs on Amazon infrastructure (GCP also donated credits, but we still need to switch to GCP). That's why the two queues run at different capacity and one is sometimes faster than the other (in either direction).

In short, 'build' and 'wait' are designed to run in parallel in any order, and we currently have no way to enforce the sequence in which they run.
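A very rough sketch of the polling pattern, purely illustrative (the registry, image name, runner label, and job layout below are placeholders, not the real Airflow workflow files):

```yaml
# pull_request workflow (read-only): it cannot push images, so it polls
# for the image that the pull_request_target workflow builds and pushes.
jobs:
  wait-for-ci-images:
    runs-on: ubuntu-20.04
    steps:
      - name: Wait for the CI image built by the other workflow
        run: |
          # Image name is a placeholder; loop until the image can be pulled.
          until docker pull "ghcr.io/example/airflow-ci:${GITHUB_SHA}"; do
            echo "Image not ready yet, sleeping..."
            sleep 60
          done
```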

I hope this makes things clearer :) (or maybe reveals the hidden complexities we are dealing with).

Labels: area:providers, full tests needed (We need to run full set of tests for this PR to merge), provider:amazon-aws (AWS/Amazon - related issues)
Projects: none yet
Development: Successfully merging this pull request may close these issues: Improve AWS SQS Sensor
3 participants