Add Google Cloud Ray Job integration #59558
Thanks for this contribution. I'm thinking about the Ray users who do not use Google Cloud (GCP). Since Ray is not tied to GCP, why not put the Ray operators in a separate provider and extend them in the GCP provider for the Vertex-specific connection, the same way it is done for KPO and GKEStartPodOperator? WDYT?
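To make the suggestion concrete, here is a minimal sketch of the layering, assuming a generic base class in a standalone Ray provider and a thin Google subclass that only changes how the cluster address is resolved. All class names, the `submit` method, and the address format are hypothetical; none of them are taken from this PR or from an existing provider.

```python
from dataclasses import dataclass


@dataclass
class RayJobSubmitter:
    """Hypothetical generic piece that would live in a standalone Ray provider."""

    entrypoint: str
    address: str = "http://localhost:8265"

    def resolve_address(self) -> str:
        # Generic behaviour: use the address exactly as configured.
        return self.address

    def submit(self) -> dict:
        # A real operator would call the Ray Jobs API here; this sketch only
        # returns the request it would send, so the layering stays visible.
        return {"address": self.resolve_address(), "entrypoint": self.entrypoint}


@dataclass
class VertexRayJobSubmitter(RayJobSubmitter):
    """Hypothetical Google-specific subclass, analogous to GKEStartPodOperator."""

    project_id: str = ""
    location: str = ""
    cluster_id: str = ""

    def resolve_address(self) -> str:
        # Assumed, illustrative address scheme; only the lookup differs from the base.
        return (
            f"vertex_ray://projects/{self.project_id}"
            f"/locations/{self.location}/clusters/{self.cluster_id}"
        )


generic = RayJobSubmitter(entrypoint="python train.py").submit()
vertex = VertexRayJobSubmitter(
    entrypoint="python train.py",
    project_id="my-project",
    location="us-central1",
    cluster_id="123",
).submit()
```

In this shape, non-GCP users would depend only on the base class, and the Google provider would carry nothing but the connection-specific override.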
There is a Ray provider maintained by Astronomer: https://github.com/astronomer/astro-provider-ray @pankajkoti @pankajastro WDYT?
Hi there!

It’s great to see renewed interest and progress around improving how Ray runs in Airflow. Thanks for the work here, @VladaZakharova! I agree with @raphaelauv that Ray operators, sensors, and hooks would be better suited to a dedicated provider (either [...]). Any new provider should also align with AIP-95.

Between June and October 2024, Astronomer worked with a few customers on a proof-of-concept to expose Ray in Airflow, as referenced by @jscheffl in https://github.com/astronomer/astro-provider-ray. This was actually our second attempt in this space; there was an earlier company-led initiative back in 2021 as well.

Unfortunately, we (Astronomer, including @pankajkoti and @pankajastro) don’t currently have the bandwidth to continue this work. That said, if @VladaZakharova or others are interested in picking it up, we’d be happy to hand over what we’ve built so far. We’d also be supportive of either this PR or the Astronomer provider serving as a starting point for Ray support as a new provider in the Apache Airflow repository, with the understanding that we wouldn’t be able to actively steer the effort.

We discussed a potential donation of this provider with the Anyscale team last Friday, but they don’t have the capacity to take this on right now. Given that PyTorch has recently adopted Ray, one possible next step could be reaching out to the PyTorch Foundation to see whether they are interested in supporting this workstream as well.
Thanks @tatiana, this is a good idea. What I would propose is to proceed with the current "extensions" to the Google provider, and maybe, as part of the AIP-95 implementation, we could start building a list of providers that people would like to see in Airflow and ask whether there are people who would like to lead the introduction of those. In this case, someone who sees a need for a Ray provider could take on the task of reaching out, finding out whether the PyTorch Foundation would be interested in being a steward, and making the initial effort to create such a provider.

I think we should gravitate towards a solution where we, as maintainers, do not do it ourselves. Instead, the people who want to have it in Airflow should take the leadership on it, and we should empower and enable such people to build the "stewardship" around such a provider. In this case, if @raphaelauv would like to see it, we should likely have a blueprint on how to approach it and what needs to be done as part of such "leadership". And anyone could of course be such a leader: it does not have to be a PMC member, or even the actual steward. It could be someone who will encourage and convince the stewards-to-be to spend some time on it and commit to maintenance.

I think the spirit of AIP-95 is that we should do a lot to make it easy to accept such providers and to set up a framework around it, while the actual work of a) creating the provider and b) building stewardship around it should be led by those who want such providers to be part of Airflow and who contribute a bit of their leadership and energy to make it happen. This would also make the whole process much more scalable and maintainable in the long term.

@vikramkoka @kaxil, as you were in the initial discussions we had on it: would you agree that this seems like a good way to frame it? @raphaelauv, would you be willing to take it on and be one of the first to try it?

We also need some first attempts at "finding out" how to do it, to learn what is needed, so I guess more maintainer help will be needed here, and I am happy to support anyone who takes on such a task. I would also see @VladaZakharova and her team participating in finding good ways to do proper coupling with such a provider when it appears, and possibly @tatiana and her team could help by sharing their learnings (or even code) with whoever agrees to be the steward, at least initially.
hi @potiuk |
I've reviewed the suggested changes as part of a "clean-up" of all open Google provider PRs. Please handle them and ping me when you're done.
I don't have much to add regarding support for Ray as a separate provider. However, until that happens, I think each dependent provider should handle its API calls to Ray separately.
shahar1
left a comment
Almost there, some tests still fail (you may ignore the celery test, as it seems unrelated):
FAILED providers/google/tests/unit/google/cloud/hooks/test_ray.py::TestRayJobHook::test_get_job_status - AttributeError: 'object' object has no attribute 'value'
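For readers unfamiliar with this error message: one common way to get it in a unit test is a mocked client returning a bare `object()` where the code under test expects an enum-like value with a `.value` attribute. The sketch below reproduces that pattern under assumptions; `JobStatus`, `get_job_status`, and the mocked client are hypothetical stand-ins, and this is not necessarily the cause of the failure in this PR.

```python
from enum import Enum
from unittest import mock


class JobStatus(Enum):
    """Hypothetical stand-in for a job status enum."""

    SUCCEEDED = "SUCCEEDED"


def get_job_status(client, job_id: str) -> str:
    # Assumed hook logic: unwrap the enum into its string value.
    return client.get_job_status(job_id).value


client = mock.Mock()

# A bare object() has no `.value`, so unwrapping it raises AttributeError.
client.get_job_status.return_value = object()
try:
    get_job_status(client, "job-1")
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)

# Returning an enum member (or any object with `.value`) avoids the error.
client.get_job_status.return_value = JobStatus.SUCCEEDED
status = get_job_status(client, "job-1")
```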
Almost good to go! Please replace the remaining usages of AirflowException in the operator, and I'll approve and merge.
Also, I'd appreciate it if you could address my question regarding the RuntimeError (as I said in the comment, it is not a blocker for either approval or merging, from my perspective).
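As a rough illustration of this kind of change, the sketch below raises built-in exceptions where an operator might previously have raised AirflowException. The function name, the status strings, and the choice of ValueError versus RuntimeError are assumptions made for the example; they are not the code in this PR.

```python
# Assumed status names, for illustration only.
TERMINAL_FAILURES = {"FAILED", "STOPPED"}


def ensure_job_succeeded(job_id: str, status: str) -> str:
    """Hypothetical helper showing built-in exceptions in place of AirflowException."""
    if not job_id:
        # Bad input from the caller maps naturally onto ValueError.
        raise ValueError("job_id must be a non-empty string")
    if status in TERMINAL_FAILURES:
        # A job that ended badly is a runtime condition, not an input error.
        raise RuntimeError(f"Ray job {job_id} finished with status {status}")
    return status
```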
Was about to merge it, but then there was a conflict.
It seems the merge broke the Python 3.13 tests. In CI, unless something special is requested, only Python 3.10 is tested. See https://github.com/apache/airflow/actions/runs/21879965340/job/63161114299 Can somebody check whether we need to revert? I assume we cannot exclude the whole Google provider from Python 3.13 for all its other features...
I assume the culprit is https://github.com/apache/airflow/blob/main/providers/google/pyproject.toml#L83. Is there any reason for limiting the ray version on Python 3.13? With that limit, the dependency is simply missing in the Python 3.13 tests...
Thanks for tracking that! Edit: I posted this before your last message. I'll figure it out.
Test balloon: #61749
apache#59558 added Ray to the Google provider, but so far ray cannot be installed on Python 3.13 because it has conflicting dependencies. We should turn ray into an optional feature.
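A minimal sketch of what "optional feature" can mean in practice, assuming the dependency is checked lazily at the point of use and a clear error is raised when it is absent. The exception class, the helper name, and the extra name in the message are hypothetical; this is not the implementation in #61749.

```python
import importlib
import importlib.util


class OptionalFeatureUnavailable(RuntimeError):
    """Hypothetical error raised when an optional dependency is not installed."""


def require_optional(module_name: str, extra: str):
    """Import a top-level module on demand, or fail with an actionable message."""
    if importlib.util.find_spec(module_name) is None:
        raise OptionalFeatureUnavailable(
            f"'{module_name}' is not installed; install the '{extra}' extra to use this feature"
        )
    return importlib.import_module(module_name)


# Usage: the check happens only when the feature is actually used, so the rest
# of the package still imports on interpreters where the dependency is missing.
json_module = require_optional("json", extra="stdlib-example")
```

With this shape, a hook or operator that needs ray would call the helper inside its methods rather than importing ray at module level.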