Proof of concept: Rework Windows CI #2841
Closed
Description
Windows tests are noticeably slow, and most of the time (up to 20 minutes!) is spent recreating the conda cache and installing the dependencies.
In addition, many system dependencies are installed manually with choco or other tools, even though conda could install them all in one step. These include `make`, `pyspark`, and others. The problem with this approach is that `pip install .[test]` ignores the packages already present in the conda environment and reinstalls them. This causes problems not only with pyspark but also with numpy: many packages are compiled against numpy, so it should never be reinstalled. This lack of interoperability between pip and conda is unfortunately a known issue.

The only way I see forward is to generate an `environment.yml` that conda understands. It would more or less duplicate `setup.py` and list the dependencies in a form conda can install. Then only `pip install .` would be left, to install the development version of kedro. This would introduce some duplication, but it might also help us keep the conda recipe up to date (see conda-forge/kedro-feedstock#42).

For now, as an experiment, I duplicated the table of extra requirements in `setup.py`, but to be honest I don't think it looks great. What do folks think?
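A minimal sketch of the generation step, assuming the extras table keeps the `extras_require` shape used in `setup.py`. The table entries, the `tables` → `pytables` name override, the environment name `kedro-dev`, and the Python pin are hypothetical placeholders. Environment markers are dropped and extras in requirement strings are rejected. Deriving the file from a single table would at least keep the duplication mechanical rather than hand-maintained.

```python
import re

# Hypothetical extras table in the shape of setup.py's ``extras_require``;
# kedro's real table has different entries and pins.
EXTRAS_REQUIRE = {
    "test": ["pytest>=7.2", "numpy>=1.21,<2"],
    "spark": ["pyspark>=3.0", "numpy>=1.21"],
    "hdf": ["tables>=3.6"],
}

# Illustrative PyPI -> conda-forge name overrides (verify every entry);
# most packages use the same name on both indexes.
CONDA_NAME_OVERRIDES = {"tables": "pytables"}

_REQ = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*?)\s*$")


def parse_requirement(req):
    """Split a PyPI requirement into a normalized name and version constraints."""
    req = req.split(";", 1)[0]  # assumption: environment markers are dropped
    match = _REQ.match(req)
    if match is None:
        raise ValueError(f"cannot parse requirement: {req!r}")
    name, spec = match.group(1), match.group(2)
    if spec.startswith("["):
        raise ValueError(f"extras are not handled in this sketch: {req!r}")
    # PEP 503-style normalization; conda-forge names usually follow it.
    name = re.sub(r"[-_.]+", "-", name).lower()
    constraints = [c.strip().replace(" ", "") for c in spec.split(",") if c.strip()]
    return name, constraints


def render_environment_yml(extras, groups, python_version="3.10", name="kedro-dev"):
    """Render environment.yml text for the selected extras groups."""
    merged = {}  # package -> list of constraints (comma means AND in conda too)
    for group in groups:
        for req in extras[group]:
            pkg, constraints = parse_requirement(req)
            pkg = CONDA_NAME_OVERRIDES.get(pkg, pkg)
            bucket = merged.setdefault(pkg, [])
            for constraint in constraints:
                if constraint not in bucket:
                    bucket.append(constraint)
    lines = [
        f"name: {name}",
        "channels:",
        "  - conda-forge",
        "dependencies:",
        f"  - python={python_version}",
    ]
    for pkg in sorted(merged):
        lines.append(f"  - {pkg}{','.join(merged[pkg])}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_environment_yml(EXTRAS_REQUIRE, ["test", "spark", "hdf"]), end="")
```

The rendered file could then be consumed with `conda env create --file environment.yml`, leaving `pip install .` to install kedro itself.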
Development notes
This uses some newer capabilities of conda, such as the libmamba solver, which is much faster: https://www.anaconda.com/blog/a-faster-conda-for-a-growing-community
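A hedged sketch of how a CI step might opt into the libmamba solver. It assumes a recent conda with the `conda-libmamba-solver` plugin available, and relies on conda reading its `solver` setting from the `CONDA_SOLVER` environment variable. Running `conda config --set solver libmamba` once would persist the same choice instead. The environment name, the `ci_commands`/`run` helpers, and the `dry_run` switch are illustrative, not part of the PR.

```python
import os
import shlex
import subprocess


def solver_env(base=None):
    """Copy the environment and select the libmamba solver via CONDA_SOLVER."""
    env = dict(os.environ if base is None else base)
    env["CONDA_SOLVER"] = "libmamba"
    return env


def ci_commands(env_file="environment.yml", env_name="kedro-dev", conda="conda"):
    """Create the env from the file, then pip-install only kedro itself.

    On Windows runners ``conda`` may need to be the full path to the
    executable, for example as resolved by ``shutil.which("conda")``.
    """
    return [
        [conda, "env", "create", "--name", env_name, "--file", env_file],
        [conda, "run", "--name", env_name, "python", "-m", "pip", "install", "."],
    ]


def run(commands, dry_run=True):
    """Log each command; execute it only when dry_run is False."""
    env = solver_env()
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True, env=env)


if __name__ == "__main__":
    run(ci_commands())
```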
Checklist
`RELEASE.md` file