Add Google Cloud Ray Job integration #59558

Merged
shahar1 merged 2 commits into apache:main from VladaZakharova:ray-job-operators on Feb 10, 2026

@VladaZakharova
Contributor

@boring-cyborg boring-cyborg bot added area:providers kind:documentation provider:google Google (including GCP) related issues labels Dec 17, 2025
@VladaZakharova VladaZakharova force-pushed the ray-job-operators branch 2 times, most recently from 2eaa156 to 74eff5f Compare January 12, 2026 17:48
@raphaelauv
Contributor

Thanks for this contribution.

I'm thinking about Ray users who do not use Google Cloud (GCP).

Since Ray is not tied to GCP, why not put the Ray operators in a separate provider and extend them in the Google provider for the Vertex-specific connection,

the same way as with the KPO and GKEStartPodOperator?

WDYT?

@jscheffl
Contributor

There is a Ray provider maintained by Astronomer: https://github.com/astronomer/astro-provider-ray

@pankajkoti @pankajastro WDYT?

@VladaZakharova
Contributor Author

Hi there!
Previously there was a discussion about what to do with Ray in general, and the decision was made to keep it in the Google provider, at least the part that is actually connected to GCP. So the logic I am adding here is just a continuation of what we already added for Vertex AI:
https://lists.apache.org/thread/b94cwcmfqh32q99r3zprmwq05x8qb9wj

@tatiana
Contributor

tatiana commented Jan 19, 2026

It’s great to see renewed interest and progress around improving how Ray runs in Airflow — thanks for the work here, @VladaZakharova!

I agree with @raphaelauv that Ray operators, sensors, and hooks would be better suited to a dedicated provider (either pytorch/ray or ray) rather than living in the Google provider. We’ve seen similar separations work well elsewhere, for example:

Any new provider should also align with AIP-95.

Between June and October 2024, Astronomer worked with a few customers on a proof-of-concept to expose Ray in Airflow, as referenced by @jscheffl in https://github.com/astronomer/astro-provider-ray. This was actually our second attempt in this space; there was an earlier company-led initiative back in 2021 as well.

Unfortunately, we (Astronomer, including @pankajkoti and @pankajastro) don’t currently have the bandwidth to continue this work. That said, if @VladaZakharova or others are interested in picking it up, we’d be happy to hand over what we’ve built so far. We’d also be supportive of either this PR or the Astronomer provider serving as a starting point for Ray support as a new provider in the Apache Airflow repository, with the understanding that we wouldn’t be able to actively steer the effort.

We discussed a potential donation from this provider with the Anyscale team last Friday, but they don’t have the capacity to take this on right now. Given that PyTorch has recently adopted Ray, one possible next step could be reaching out to the PyTorch Foundation to see whether they are interested in supporting this workstream as well.

@potiuk
Member

potiuk commented Jan 20, 2026

> We discussed a potential donation from this provider with the Anyscale team last Friday, but they don’t have the capacity to take this on right now. Given that PyTorch has recently adopted Ray, one possible next step could be reaching out to the PyTorch Foundation to see whether they are interested in supporting this workstream as well.

Thanks @tatiana, this is a good idea. What I would propose is to follow up with the current "extensions" to the Google provider, and maybe, as part of the AIP-95 implementation, we could start building a list of providers that people would like to see in Airflow and ask whether anyone would like to lead their introduction. In this case, someone who sees a need for a Ray provider could take on the task of reaching out to find out whether the PyTorch Foundation would be interested in being a steward, and make the initial effort to create such a provider.

I think we should gravitate towards a solution where we, as maintainers, do not do it ourselves. Instead, people who want to have it in Airflow should take the leadership on it, and we should empower and enable them to build the "stewardship" around such a provider. In this case, if @raphaelauv would like to see it, we should likely have a blueprint for how to approach it and what needs to be done as part of such "leadership". And of course anyone could be such a leader: it does not have to be a PMC member, or even the actual steward. It could be someone who will encourage and convince the stewards-to-be to spend some time on it and commit to maintenance.

I think the spirit of AIP-95 is that we should do a lot to make it easy to accept such providers and to set up a framework around it, while the actual work of a) creating the provider and b) building a stewardship around it should be led by those who want such providers to be part of Airflow and who contribute a bit of their leadership and energy to make it happen. This would also make the whole process much more scalable and maintainable in the long term. @vikramkoka @kaxil - as you were in the initial discussions we had on it, would you agree that this seems like a good way to frame it?

@raphaelauv -> would you be willing to take it on and be one of the first to try it?

We also need some first attempts at "finding out" how to do it in order to learn what is needed, so I guess more maintainer help will be needed here, and I am happy to support anyone who takes on such a task. I would also see @VladaZakharova and her team participating in finding good ways to do proper coupling with such a provider when it appears, and possibly @tatiana and her team could help by sharing their learnings (or even code) with whoever agrees to be the steward, at least initially.

@VladaZakharova
Contributor Author

hi @potiuk
Thanks for your comment.
Yes, this PR is intentionally scoped to integration and does not attempt to introduce or commit to a standalone Ray provider.
Please tell me what you think about the changes themselves; maybe I can add more labels like "experimental" or "integration helper".

Contributor

@shahar1 shahar1 left a comment

I've reviewed the suggested changes, as part of a "clean-up" of all Google provider open PRs. Please handle them and ping me when you're done.
I don't have much to add regarding the support of Ray as a provider. However, until that happens, I think that each dependent provider should handle the API calls to Ray separately.

@VladaZakharova VladaZakharova force-pushed the ray-job-operators branch 2 times, most recently from f9498f7 to 2f3550a Compare February 2, 2026 11:29
Contributor

@shahar1 shahar1 left a comment

Almost there, some tests still fail (you may ignore the celery test, as it seems unrelated):

FAILED providers/google/tests/unit/google/cloud/hooks/test_ray.py::TestRayJobHook::test_get_job_status - AttributeError: 'object' object has no attribute 'value'
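
The error text suggests that the code under test reads `.value` from a status object (as it would from an enum member) while the test supplied a bare `object()` instead. The sketch below reproduces that pattern under this assumption; `JobStatus`, `get_job_status`, and the mocked client are hypothetical stand-ins and are not taken from the PR.

```python
from enum import Enum
from unittest import mock


class JobStatus(Enum):
    """Hypothetical stand-in for a job status enum."""

    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"


def get_job_status(client, job_id: str) -> str:
    """Hypothetical hook method: return the job status as a plain string."""
    status = client.get_job_status(job_id)
    return status.value  # fails if the stub does not look like an enum member


def reproduce_failure() -> str:
    """Stub the client with a bare object() and capture the resulting error text."""
    client = mock.Mock()
    client.get_job_status.return_value = object()
    try:
        get_job_status(client, "job-1")
    except AttributeError as err:
        return str(err)
    return ""


def fixed_stub() -> str:
    """Stub the client with a real enum member so that .value resolves."""
    client = mock.Mock()
    client.get_job_status.return_value = JobStatus.SUCCEEDED
    return get_job_status(client, "job-1")
```

If this is indeed the cause, having the stub return an enum member (or a mock with a `value` attribute) would likely be enough to make the test pass.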

Contributor

@shahar1 shahar1 left a comment

Almost good to go! Please replace the remaining usages of AirflowException in the operator, and I'll approve and merge.
Also, I'd appreciate it if you could address my question regarding the RuntimeError (as I said in the comment, it is not a blocker for either approval or merging, from my perspective).

@shahar1 shahar1 dismissed their stale review February 6, 2026 09:31

Critical changes were handled

@VladaZakharova
Contributor Author

VladaZakharova commented Feb 9, 2026

> Almost good to go! Please replace the remaining usages of AirflowException in the operator, and I'll approve and merge. Also, I'd appreciate it if you could address my question regarding the RuntimeError (as I said in the comment, it is not a blocker for either approval or merging, from my perspective).

I will remove it, thank you.

@shahar1
Contributor

shahar1 commented Feb 10, 2026

Was about to merge it, but then there was a conflict.
I handled it, and I'll merge it once the CI is green :)

@shahar1 shahar1 changed the title from "Add new operators for Jobs on Ray" to "Add Google Cloud Ray operators" Feb 10, 2026
@shahar1 shahar1 changed the title from "Add Google Cloud Ray operators" to "Add Google Cloud Ray Job integration" Feb 10, 2026
@shahar1 shahar1 merged commit 0085ca6 into apache:main Feb 10, 2026
129 checks passed
@jscheffl
Contributor

It seems the merge broke the Python 3.13 tests. In CI, unless something special is specified, only Python 3.10 is tested. See https://github.com/apache/airflow/actions/runs/21879965340/job/63161114299

Can somebody check whether we need to revert? I assume we cannot exclude the whole Google provider from Python 3.13 for all its other features...

@jscheffl
Contributor

I assume the culprit is https://github.com/apache/airflow/blob/main/providers/google/pyproject.toml#L83 - is there any reason for limiting the ray version on Python 3.13? With that limit, the dependency is simply missing in the Py 3.13 tests...

@shahar1
Contributor

shahar1 commented Feb 10, 2026

> It seems the merge broke the Python 3.13 tests. In CI, unless something special is specified, only Python 3.10 is tested. See https://github.com/apache/airflow/actions/runs/21879965340/job/63161114299
>
> Can somebody check whether we need to revert? I assume we cannot exclude the whole Google provider from Python 3.13 for all its other features...

Thanks for tracking that!
I'll try to figure it out - probably excluding only the Ray operator for Python 3.13 would do, if it's possible to do that granularly.

Edit: posted this before your last message, I'll figure it out.

@jscheffl
Contributor

Test balloon: #61749

@shahar1 shahar1 mentioned this pull request Feb 10, 2026
1 task
potiuk added a commit to potiuk/airflow that referenced this pull request Feb 10, 2026
The apache#59558 added Ray to google provider but so far Python 3.13 does
not allow to install ray as it has conflicting dependencies.

We should turn ray into optional feature.
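
One common way to treat a package as an optional feature is to import it lazily and fail with a clear message only when the feature is actually used. The sketch below illustrates that general pattern with the standard library only; `OptionalFeatureUnavailable` and `require_module` are hypothetical names, and this is not a description of what the referenced commit does.

```python
import importlib
import importlib.util


class OptionalFeatureUnavailable(RuntimeError):
    """Hypothetical error raised when an optional dependency is not installed."""


def require_module(name: str, feature: str):
    """Import `name` on demand, or fail with a message naming the feature that needs it."""
    if importlib.util.find_spec(name) is None:
        raise OptionalFeatureUnavailable(
            f"{feature} requires the optional '{name}' package, which is not installed."
        )
    return importlib.import_module(name)
```

With a guard like this, a call such as `require_module("ray", "Ray job operators")` would happen inside the code path that needs Ray, so the rest of the package stays importable on interpreters where Ray cannot be installed.
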
Alok-kumar-priyadarshi pushed a commit to Alok-kumar-priyadarshi/airflow that referenced this pull request Feb 11, 2026
Ratasa143 pushed a commit to Ratasa143/airflow that referenced this pull request Feb 15, 2026
choo121600 pushed a commit to choo121600/airflow that referenced this pull request Feb 22, 2026