Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Kedro compatible with Airflow 2.4.2 #2030

Merged
merged 32 commits into from
Nov 30, 2022
Merged

Make Kedro compatible with Airflow 2.4.2 #2030

merged 32 commits into from
Nov 30, 2022

Conversation

rxm7706
Copy link
Contributor

@rxm7706 rxm7706 commented Nov 16, 2022

NOTE: Kedro datasets are moving from kedro.extras.datasets to a separate kedro-datasets package in
kedro-plugins repository. Any changes to the dataset implementations
should be done by opening a pull request in that repository.

Description

To Make Kedro Compatible with Airflow 2.4.2
kedro-org/kedro-plugins#73

Development notes

Updates the requirements on attrs to permit the latest version.
Changed dependency/requirements.txt
From attrs~=21.3 to attrs >=22.1.0,<23.0

Development notes

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • [] Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • [] Added tests to cover my changes

Signed-off-by: rxm7706 <95496360+rxm7706@users.noreply.github.com>
Signed-off-by: rxm7706 <95496360+rxm7706@users.noreply.github.com>
Signed-off-by: rxm7706 <95496360+rxm7706@users.noreply.github.com>
@rxm7706 rxm7706 marked this pull request as ready for review November 16, 2022 09:59
@rxm7706 rxm7706 requested a review from idanov as a code owner November 16, 2022 09:59
@noklam
Copy link
Contributor

noklam commented Nov 16, 2022

Thanks for the PR.

Since airflow isn't a core dependency of Kedro, I think it is ok to lift the upper bound to make it compatible with airflow, but at the same time we should keep the lower bound. There could be some other libraries pinned a lower version and kedro should allow that.

What do you think?

@rxm7706
Copy link
Contributor Author

rxm7706 commented Nov 16, 2022

Thanks for the PR.

Since airflow isn't a core dependency of Kedro, I think it is ok to lift the upper bound to make it compatible with airflow, but at the same time we should keep the lower bound. There could be some other libraries pinned a lower version and kedro should allow that.

What do you think?

@noklam I thought about using attrs >=21.3,<23.0 ; but looking at the limited Kedro code dependency and usage of attrs ; and assuming Airflow compatibility was more important - I went with >=22.1.0,<23.0.

But thinking through it a bit more, you have a good point - Pinning the lower bound of attrs so high, will actually break compatibility with older versions of airflow (e.g. airflow 2.2)

I think a better pin will be attrs >=20.0,<23.0 - which will give us backward compatibility to Airflow 2.0 & Airflow 1.X ; from the time attrs was introduced as an airflow dependency. Let me give that a shot - attrs >=20.0,<23.0

Signed-off-by: rxm7706 <95496360+rxm7706@users.noreply.github.com>
Signed-off-by: rxm7706 <95496360+rxm7706@users.noreply.github.com>
@noklam
Copy link
Contributor

noklam commented Nov 16, 2022

@rxm7706 attrs was always a 2nd-level of dependencies until couple of months ago I added it into our core dependency. I think we are using some features only introduced in 21.3, thus the pinned version. Feel free to try it out tho!

See the release note of attrs here:
https://github.com/python-attrs/attrs/releases

attrs>=20.0, <23.0  # The minimum version required for Airflow 2 is >=20.0 and Airflow 2.4.2 is >=22.1
Signed-off-by: rxm7706 <95496360+rxm7706@users.noreply.github.com>
@rxm7706
Copy link
Contributor Author

rxm7706 commented Nov 16, 2022

@noklam Is it odd that only one set to tests failed - Just E2E Windows and Python ; and it seems to be a connection time out while launching a notebook, Attrs bounds itself don't seem to have caused any failures

https://app.circleci.com/pipelines/github/kedro-org/kedro/15100/workflows/6d48b1bb-790c-4d42-ab2c-1256c5472630/jobs/192465

Is there a way to rerun the CI suite ? Without a resorting to a PR just to trigger it.

@noklam
Copy link
Contributor

noklam commented Nov 16, 2022

@rxm7706 The only reason it works is that it is installing the latest version. If you want to test the lower bound, you need to pin attrs==20.1.0 to make sure it installed the lower version.

See this screenshot from the CI, and check out the Pip freeze section that lists what versions are installed.
image

I suggest keeping the 21.3 lower bound, I test it manually and it is specified in the release note that is a newly added API.

Signed-off-by: rxm7706 <95496360+rxm7706@users.noreply.github.com>
Signed-off-by: rxm7706 <95496360+rxm7706@users.noreply.github.com>
Rxm7706 patch 2
Signed-off-by: rxm7706 <95496360+rxm7706@users.noreply.github.com>
@rxm7706
Copy link
Contributor Author

rxm7706 commented Nov 16, 2022

@noklam changes made as suggested attrs>=21.3, <23.0
The failure in CI is again jupyter launch timeout
https://app.circleci.com/pipelines/github/kedro-org/kedro/15106/workflows/bcb867ba-061d-4d47-b4c1-863ec8e9fd4b/jobs/192591

@noklam
Copy link
Contributor

noklam commented Nov 16, 2022

@rxm7706 Thank you :) I have re-trigger the CI pipeline, let's see if it works. It fails occasionally and that may be a CircleCI issue.

@noklam noklam linked an issue Nov 21, 2022 that may be closed by this pull request
Copy link
Member

@deepyaman deepyaman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Looks good overall, but left a couple small comments and thoughts.

Also, can you shorten the PR title a bit, please? :) Maybe just drop the part after the hyphen.

RELEASE.md Outdated Show resolved Hide resolved
@@ -1,5 +1,5 @@
anyconfig~=0.10.0
attrs~=21.3
attrs>=21.3, <23.0 # The minimum version required for Airflow 2 is >=20.0 and Airflow 2.4.2 is >=22.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we shouldn't have an upper bound? https://www.attrs.org/en/22.1.0/changelog.html?highlight=upgrade#changelog They have a very strong backward-compatibility policy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm not sure if we need to explain Airflow bounds here; the 21.3 is a Kedro-based lower bound, and we should just strive to be generally loose beyond that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok to remove the upper bound since I don't remember attrs have ever created a problem for us. Though technically it wouldn't matter too much since it's a CalVer release.

@deepyaman deepyaman changed the title Make Kedro Compatible with Airflow 2.4.2 - Update attrs requirement from ~=21.3 to >=22.1.0,<23.0 in /dependency Make Kedro compatible with Airflow 2.4.2 - Update attrs requirement from ~=21.3 to >=22.1.0,<23.0 in /dependency Nov 21, 2022
@noklam noklam self-assigned this Nov 21, 2022
@rxm7706 rxm7706 changed the title Make Kedro compatible with Airflow 2.4.2 - Update attrs requirement from ~=21.3 to >=22.1.0,<23.0 in /dependency Make Kedro compatible with Airflow 2.4.2 Nov 21, 2022
Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
@noklam
Copy link
Contributor

noklam commented Nov 24, 2022

@rxm7706 The DCO is failing, would you be able to fix it? I removed the upper bound, it should be fine unless the tests tell otherwise.

Copy link
Contributor Author

@rxm7706 rxm7706 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed-off-by: rxm7706

Copy link
Contributor Author

@rxm7706 rxm7706 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed-off-by: rxm7706

@rxm7706
Copy link
Contributor Author

rxm7706 commented Nov 25, 2022

@rxm7706 The DCO is failing, would you be able to fix it? I removed the upper bound, it should be fine unless the tests tell otherwise.

@noklam I can't seem to fix the DCO - which is a strange fail, because I used the github UI to approve & make the change - If you cant fix it, just let me know - might just close this PR and do a new one

daniel-falk and others added 19 commits November 25, 2022 14:08
This commit adds a video dataset that can read and write video files. The
dataset is backed by OpenCVs video reader and writer and different
buffer protocols such as PIL.Image and numpy.ndarray. There is one class
for iterables (e.g. list of frames) and one for generators. Since large
videos might not fit in memory, this ability allows us to read one frame
from the video file, transform the frame and write it to another video
file. Thus there is only need for one frame at a time in memory.

The different codecs that are supported depends on the codecs installed
and how OpenCV was compiled.

Signed-off-by: Daniel Falk <daniel.falk.1@fixedit.ai>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Updates the requirements on [fsspec](https://github.com/fsspec/filesystem_spec) to permit the latest version.
- [Release notes](https://github.com/fsspec/filesystem_spec/releases)
- [Commits](fsspec/filesystem_spec@2021.04.0...2022.11.0)

---
updated-dependencies:
- dependency-name: fsspec
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Updates the requirements on [pip-tools](https://github.com/jazzband/pip-tools) to permit the latest version.
- [Release notes](https://github.com/jazzband/pip-tools/releases)
- [Changelog](https://github.com/jazzband/pip-tools/blob/master/CHANGELOG.md)
- [Commits](jazzband/pip-tools@6.9.0...6.10.0)

---
updated-dependencies:
- dependency-name: pip-tools
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
* Add execution_options to SQLQueryDataSet

Signed-off-by: Clotilde Guinard <clotilde_guinard@hotmail.fr>

* Update RELEASE.md

Signed-off-by: Clotilde Guinard <clotilde_guinard@hotmail.fr>

* Fix lint issues

Signed-off-by: Clotilde Guinard <clotilde_guinard@hotmail.fr>

* Add a check in the unit test

Signed-off-by: Clotilde Guinard <clotilde_guinard@hotmail.fr>

* Add test for connection reuse behaviour

Signed-off-by: Clotilde Guinard <clotilde_guinard@hotmail.fr>

* Inject execution_options at load time

Signed-off-by: Clotilde Guinard <clotilde_guinard@hotmail.fr>

* enhance doc and fix incorrect example

Signed-off-by: Nok <nok_lam_chan@mckinsey.com>

* More docs change

Signed-off-by: Nok <nok_lam_chan@mckinsey.com>

* Fixing typos

Signed-off-by: Nok <nok_lam_chan@mckinsey.com>

Signed-off-by: Clotilde Guinard <clotilde_guinard@hotmail.fr>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Co-authored-by: Nok <nok_lam_chan@mckinsey.com>
Co-authored-by: Nok Lam Chan <mediumnok@gmail.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Signed-off-by: rxm7706 <95496360+rxm7706@users.noreply.github.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Signed-off-by: rxm7706 <95496360+rxm7706@users.noreply.github.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Signed-off-by: rxm7706 <95496360+rxm7706@users.noreply.github.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Signed-off-by: rxm7706 <95496360+rxm7706@users.noreply.github.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Signed-off-by: rxm7706 <95496360+rxm7706@users.noreply.github.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Signed-off-by: rxm7706 <95496360+rxm7706@users.noreply.github.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Signed-off-by: rxm7706 <95496360+rxm7706@users.noreply.github.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
* Add SVMLight DataSet

Add DataSet for svmlight/libsvm files using scikit-learn library as backend.

Resolves #1972

Signed-off-by: Kirill Korotkov <korotkovkm@gmail.com>

* Pin scikit-learn version to work with python 3.7

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update dataset docstring

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>

* Pin requirements to work with python 3.7

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add requirements to setup.py

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add changes to dataset docstring

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Dummy commit to retrigger CI pt1

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Dummy commit to retrigger CI pt2

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update dataset dockstring

Signed-off-by: Kirill Korotkov <korotkovkm@gmail.com>

Signed-off-by: Kirill Korotkov <korotkovkm@gmail.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Co-authored-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>
Co-authored-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
* Revised the Introduction to make it short and sweet.
* Revised the Get Started section. Gone is "Hello Kedro". Gone are the installation pre-requisites (that's just part of the Install Kedro page now). Gone is the "Standalone use of the data catalog - woot woot" and GONE is the page on Kedro starters.
* Reordered the create project material to put the project structure breakdown in the section that introduces key concepts and shorten the Iris tutorial to the bare minimum. I did add visualisation at this point though, to highlight Kedro Viz, as I felt it was coming far too late in the spaceflights tutorial and needed to be more prominent as a feature.
* Added a TL;DR page to Get Started which some people could probably just use as-is and ignore the rest of the section.
* Starters material has moved into a new section all about "Kedro project setup". Much of that section still needs review/revision but I have updated the Starters page so it reads more clearly.
* Improved the Kedro-Viz page somewhat (still more to come for Plotly)
* Notebooks/IPython materials now merged and simplified

Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
@noklam noklam merged commit 5fbcdf9 into kedro-org:main Nov 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make Kedro Compatible with Airflow 2.4.2
8 participants