Draft Common Message Queue #46694
Here is a very early draft PR to introduce and socialize the concept of a "common message queue" abstraction similar to the "Common SQL" and "Common IO" abstractions in Airflow. This will be a provider package similar to those and is intended to be an abstraction over Apache Kafka, Amazon SQS, and Google PubSub to begin with. It can then be expanded to other messaging systems based on community adoption. The initial goal is to provide a simple abstraction for integrating the Event Driven Scheduling coming with Airflow 3 with message notification systems such as Kafka, which are currently being used to publish data availability. At this stage, this is very much a WIP draft intended to solicit input from the community.
Updated the Common Message Queue Readme with an example of an Event Driven Dag
Updated the message queue Operator and Sensor to fix an issue in my sync
Changed the Message Queue Sensor Operator to be a Deferrable Trigger
Fixed typos and import errors in the MsgQueueHook
Implementation wise, here is my thinking. I am starting by Given |
Updated invocation of MsqQueueSensorTrigger to MsgQueueTrigger in example invocation
You are right, Vincent. I did think about the "Composition vs. Inheritance" trade-off. The composition-style interface as defined here is easier for the DAG author, but it means more maintenance for us.
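To make the trade-off concrete, here is a minimal sketch of the two styles. Every class name below is hypothetical and none of it is taken from the PR; the string return values stand in for real client calls. In the composition style the DAG author uses one class and the wrapper has to know about every backend (the maintenance cost mentioned above). In the inheritance style the DAG author picks the system-specific subclass.

```python
from abc import ABC, abstractmethod


class QueueBackend(ABC):
    """Hypothetical provider-specific backend (Kafka, SQS, ...)."""

    @abstractmethod
    def publish(self, message: str) -> str: ...


class KafkaBackend(QueueBackend):
    def publish(self, message: str) -> str:
        return f"kafka:{message}"  # placeholder for a real Kafka client call


class SqsBackend(QueueBackend):
    def publish(self, message: str) -> str:
        return f"sqs:{message}"  # placeholder for a real SQS client call


# Composition: one class for DAG authors; it holds a backend and forwards to it.
# The mapping below is what the maintainers must keep up to date.
class ComposedQueueHook:
    _backends = {"kafka": KafkaBackend, "sqs": SqsBackend}

    def __init__(self, kind: str) -> None:
        self._backend = self._backends[kind]()

    def publish(self, message: str) -> str:
        return self._backend.publish(message)


# Inheritance: a shared base class and one subclass per system.
# The DAG author must choose (and import) the right subclass.
class BaseQueueHook(ABC):
    @abstractmethod
    def publish(self, message: str) -> str: ...


class KafkaQueueHook(BaseQueueHook):
    def publish(self, message: str) -> str:
        return f"kafka:{message}"
```

With composition the DAG code only changes a string (`ComposedQueueHook("sqs")`), while with inheritance it changes the class it instantiates (`KafkaQueueHook()`).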
jscheffl
left a comment
In general this looks good. I have a few more nits on the Python code/interface, but we can leave those until it is in real review.
Would be great to add an example DAG as well for the showcase.
providers/common/msgq/src/airflow/providers/common/msgq/operators/msg_queue.py
I am iterating on that PR but the new provider is not recognized. I get: With the new restructure, what is the process to add a new provider? Do I just need to create |
I updated the PR. I focused only on the trigger side. Please let me know if this is what you had in mind in terms of implementation regarding the trigger. I really see it as a proxy of the provider triggers. I could not test it because the new provider is not recognized, but once that is solved I should be able to test it.
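A rough sketch of what "a proxy of the provider triggers" could look like, using plain `asyncio` rather than Airflow's trigger classes. The registry, the URI-scheme lookup, the class name, and the two stand-in triggers are all assumptions made for illustration; they are not the implementation in this PR.

```python
import asyncio
from typing import AsyncIterator


async def fake_sqs_trigger(queue: str) -> AsyncIterator[dict]:
    """Stand-in for a provider-specific trigger; yields a single event."""
    await asyncio.sleep(0)
    yield {"source": "sqs", "queue": queue}


async def fake_kafka_trigger(queue: str) -> AsyncIterator[dict]:
    """Stand-in for a provider-specific trigger; yields a single event."""
    await asyncio.sleep(0)
    yield {"source": "kafka", "queue": queue}


# Hypothetical registry: URI scheme -> provider trigger factory.
PROVIDER_TRIGGERS = {"sqs": fake_sqs_trigger, "kafka": fake_kafka_trigger}


class MsgQueueTriggerSketch:
    """Picks a provider trigger from the queue URI and forwards its events."""

    def __init__(self, queue: str) -> None:
        scheme, sep, _ = queue.partition("://")
        if not sep or scheme not in PROVIDER_TRIGGERS:
            raise ValueError(f"No provider trigger registered for queue {queue!r}")
        self.queue = queue
        self._delegate = PROVIDER_TRIGGERS[scheme]

    async def run(self) -> AsyncIterator[dict]:
        # The proxy adds no logic of its own: it re-yields the delegate's events.
        async for event in self._delegate(self.queue):
            yield event


async def first_event(queue: str) -> dict:
    """Helper for trying the sketch: return the first event from the proxy."""
    async for event in MsgQueueTriggerSketch(queue).run():
        return event
    raise RuntimeError("trigger finished without producing an event")
```

Calling `asyncio.run(first_event("kafka://broker/topic"))` exercises the Kafka stand-in, and an unregistered scheme fails early in the constructor instead of at run time.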
You need to look at the main And yes I updated https://github.com/apache/airflow/blob/main/providers/MANAGING_PROVIDERS_LIFECYCLE.rst#creating-a-new-community-provider - with the new structure and how to add a new provider, but that part is likely missing so after you figure it out, PRs there are most welcome. BTW. It will likely slightly change in the future as we will move airflow-core and others, but still it would be great to keep it updated. |
Generally @vincbeck -> look at everything below |
Thank you :D |
I don't understand what Sphinx is complaining about: |
Sphinx is - as usual - speaking in riddles :) . It means that autoapi generated an index.rst file (for the base_provider module) and that index is not mentioned anywhere. You have to add it to some "table of contents" file and refer to it, because otherwise that file is not reachable from anywhere. It also means that some documentation explaining what it is is missing - usually a reference doc (see other providers).
I think it would be great to somehow explain that "toctree" better :) |
Likely some documentation about base_provider should be added here https://github.com/apache/airflow/blob/f4fd6fd5ae45cd20924149aa0201d2da08a63112/providers/common/messaging/docs/providers.rst
It is already there: https://github.com/apache/airflow/pull/46694/files#diff-f54feaaca8fd8ecfad946ef2cc5b389e082660ba53d305843bab44f5a014d582R36 |
So if the index to the "__init__.py" is not linked (and does not need to be linked) from anywhere, it should be excluded explicitly in "docs/conf.py" for provider package builds.
And yes - I reverse engineered it after having similar issues. Likely it should be done better, so we do not have to do it manually.
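As a rough sketch of the explicit exclusion described above, assuming the docs build honours Sphinx's standard `exclude_patterns` setting in `docs/conf.py`. Both paths are hypothetical placeholders, not the ones added in this PR, and the real `conf.py` may assemble its list differently.

```python
# docs/conf.py (fragment). Both paths are hypothetical placeholders.
UNLINKED_AUTOAPI_INDEXES = [
    "_api/airflow/providers/common/messaging/index.rst",
    "_api/airflow/providers/common/messaging/providers/index.rst",
]

# `exclude_patterns` is Sphinx's standard list of glob patterns for source
# files to skip. Skipped files are not built, so Sphinx no longer reports
# them as documents that are unreachable from any toctree.
exclude_patterns: list[str] = []
exclude_patterns.extend(UNLINKED_AUTOAPI_INDEXES)
```

The alternative discussed earlier in the thread is to leave the generated index in the build and reference it from a toctree instead.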
Yeah, I did not want to do that but I think I'll do that; I really cannot find a solution. What I do not understand is that there are a lot of modules in other providers that are not documented (for good reasons like the module But anyway, thanks for your help and I'll add these 2 paths to |
They are likely referred to in class docstrings or elsewhere. The thing is that if your class or module is not referred to ANYWHERE - the only way you can reach it is by direct URL. And this is what Sphinx complains about.
It is probably that! Makes a bit more sense in all that Sphinx dialect :) Thanks |
All green :) I also tested it manually and triggered a few DAGs using |
NICE! |
This is a provider package similar to those and is intended to be an abstraction over Apache Kafka, Amazon SQS, and Google PubSub to begin with. It can then be expanded to other messaging systems based on community adoption. The initial goal is to provide a simple abstraction for integrating the Event Driven Scheduling coming with Airflow 3 with message notification systems such as Kafka, which are currently being used to publish data availability. --------- Co-authored-by: vincbeck <vincbeck@amazon.com> Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com> Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>