Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pin pip<23.2 and skip one micropkg test to unblock CI from breaking changes #2816

Closed
wants to merge 17 commits into from

Conversation

noklam
Copy link
Contributor

@noklam noklam commented Jul 18, 2023

NOTE: Kedro datasets are moving from kedro.extras.datasets to a separate kedro-datasets package in
kedro-plugins repository. Any changes to the dataset implementations
should be done by opening a pull request in that repository.

Description

Partly fix #2813

Development notes

Checklist

It could be psutils or pytest-xdist, root cause unclear.

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

RELEASE.md Outdated
@@ -16,6 +16,7 @@

## Bug fixes and other changes
* Consolidated dependencies and optional dependencies in `pyproject.toml`.
* Pin `pip<23.2` due to a breaking change. See https://github.com/kedro-org/kedro/pull/2813
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully we will not have to release Kedro like this 😢

Also, notice that we only changed the CI requirements, Kedro pyproject.toml is intact

@noklam
Copy link
Contributor Author

noklam commented Jul 18, 2023

https://app.circleci.com/pipelines/github/kedro-org/kedro/24640/workflows/2ca3adec-dbbf-45f7-a9a3-0133ee7bc981/jobs/284236

Interesting, it seems that our window test didn't run this step "Install pip setuptools" at all.

@noklam noklam requested a review from astrojuanlu July 18, 2023 17:34
@noklam noklam marked this pull request as ready for review July 18, 2023 17:47
@noklam noklam requested a review from merelcht as a code owner July 18, 2023 17:47
@astrojuanlu
Copy link
Member

Weird, there's still a stack overflow on Windows. We'll have to look more closely.

@noklam
Copy link
Contributor Author

noklam commented Jul 18, 2023

I suspect it is still pip issue but I can confirm later.

@noklam
Copy link
Contributor Author

noklam commented Jul 18, 2023

Noted Window tests are failing in different way.

Python 3.7 fails with 1 specific test

FAILED
tests/framework/cli/micropkg/test_micropkg_package.py::TestMicropkgPackageCommand::test_package_micropkg

Python >3.7 fails with StackOverflow and pytest just panic.

@noklam
Copy link
Contributor Author

noklam commented Jul 18, 2023

image
On my laptop everything passed even with pip==23.2

@noklam
Copy link
Contributor Author

noklam commented Jul 20, 2023

Good sign that win e2e test are passing, I bumped the pytest-xdist which is very old and only support pytest<=3.9

@noklam
Copy link
Contributor Author

noklam commented Jul 20, 2023

only win_unit_test 3.7 is failing now. https://app.circleci.com/pipelines/github/kedro-org/kedro/24745/workflows/d07eff5f-f9a7-4537-b812-f47450a7f7ca/jobs/285418

I will try to make it run sequentially now.

@noklam
Copy link
Contributor Author

noklam commented Jul 20, 2023

Temporarily for 3.7 tests run in sequential.

@noklam
Copy link
Contributor Author

noklam commented Jul 21, 2023

image
Everything pass when I SSH into CCI🤯

@noklam
Copy link
Contributor Author

noklam commented Jul 21, 2023

Revert 5567507 so it doesn't confuse people. Only Win 3.7 should fails after this.

Initially I suspect it the parallel execution acting weird, but it get the same error when I try to run sequentially.

@astrojuanlu
Copy link
Member

astrojuanlu commented Jul 21, 2023

Good insight @noklam. Does it happen if only the offending tests are run? e.g. doing pytest -k micropkg_package.

@noklam
Copy link
Contributor Author

noklam commented Jul 21, 2023

I'll open a separate PR as I want to keep this CI job running. It's very tricky to test now because it happens only on CCI (even sshing and run the exact command is passing)

@noklam
Copy link
Contributor Author

noklam commented Jul 21, 2023

#2828

Waiting for CI now.

@noklam
Copy link
Contributor Author

noklam commented Jul 24, 2023

CI still failed - #2828

circleci@packer-62f2a351-716f-548b-6ad8-257cea95429e MSYS ~/project (fix/pip-att
empt-2)                                                                         
$ make test-no-spark                                                            
pytest tests/framework/cli/micropkg --no-cov --ignore tests/extras/datasets/spar
k --numprocesses logical --dist loadfile                                        
============================= test session starts ============================= 
platform win32 -- Python 3.7.12, pytest-7.4.0, pluggy-1.2.0                     
rootdir: C:\Users\circleci\project                                              
configfile: pyproject.toml                                                      
plugins: anyio-3.7.1, cov-3.0.0, mock-1.13.0, xdist-3.3.1, requests-mock-1.11.0 
4 workers [79 items]                                                            
........................................................................ [ 91%] 
.......                                                                  [100%] 
======================= 79 passed in 863.79s (0:14:23) ======================== 

This is getting to nowhere, I cannot reproduce the issue even on CircleCI SSH.

Trying to debug the test itself now. Observing the failing message on CI

=========================== short test summary info ===========================
FAILED tests/framework/cli/micropkg/test_micropkg_package.py::TestMicropkgPackageCommand::test_package_micropkg[options0-my_pipeline-'dummy_package.pipelines.my_pipeline' packaged!] - AssertionError: assert 1 == 0
 +  where 1 = <Result PermissionError(13, 'The process cannot access the file because it is being used by another process')>.exit_code
================== 1 failed, 78 passed in 938.52s (0:15:38) ===================

It may be related to these lines that file handler are not closed properly and blocked other process?

        with tarfile.open(sdist_file, "r") as tar:
            sdist_contents = set(tar.getnames())

@merelcht
Copy link
Member

@noklam So it's just the windows unit tests on 3.7 failing? If you feel like it's sensible, should we just change this check to not be mandatory?

@noklam noklam changed the title Pin pip<23.2 to unblock CI from breaking changes Pin pip<23.2 and skip one micropkg test to unblock CI from breaking changes Jul 24, 2023
@noklam
Copy link
Contributor Author

noklam commented Jul 24, 2023

Suppress the test in Window Python 3.7

@noklam noklam force-pushed the fix/ci-breaking-pip-tools branch from de319ff to d84e39d Compare July 24, 2023 14:07
@noklam noklam closed this Jul 24, 2023
@noklam noklam force-pushed the fix/ci-breaking-pip-tools branch from d84e39d to e2464b8 Compare July 24, 2023 14:12
noklam added 3 commits July 24, 2023 14:12
Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
noklam added 2 commits July 24, 2023 22:42
Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
@noklam
Copy link
Contributor Author

noklam commented Jul 25, 2023

  • Suppress few more test Win Python 3.7 test_micropkg_packge.py - TestMicropkgPackageCommand::test_micropkg_package_same_name_as_package_name (Nok)
  • Recover a passing test from last week - pin the exact version for python 3.7 (Ahdra)

@AhdraMeraliQB AhdraMeraliQB mentioned this pull request Jul 25, 2023
5 tasks
Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
@AhdraMeraliQB
Copy link
Contributor

It may be related to these lines that file handler are not closed properly and blocked other process?

        with tarfile.open(sdist_file, "r") as tar:
            sdist_contents = set(tar.getnames())

Just a note that the with context should be automatically closing the files after the lines are run

Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
@noklam
Copy link
Contributor Author

noklam commented Jul 25, 2023

The issue seems to be slightly random, Win 3.7 consistently fail, while the 3.8, 3.9 fails occasionally.

  • This fails in 3.7, 3.8, 3.9 cc3c6a9
  • It was only failing in Win 3.7 27fcd1a
  • Now it fails for Win 3.8 too - 46a34e4

This looks like a related issue found by @AhdraMeraliQB , which suggest could be unknown interaction with pytest-xdist.

@noklam
Copy link
Contributor Author

noklam commented Jul 25, 2023

And as expected, the error on Window 3.7 is something different else.

Noted that initially there is only 1 failing test

FAILED tests/framework/cli/micropkg/test_micropkg_package.py::TestMicropkgPackageCommand::test_pacakge_micropkg

Then I suppress it, it become another test

FAILED tests/framework/cli/micropkg/test_micropkg_package.py::TestMicropkgPackageCommand::test_micropkg_package_same_name_as_package_name_alias

and finally

FAILED tests/framework/cli/micropkg/test_micropkg_package.py::TestMicropkgPackageCommand::test_micropkg_package_to_destination

@AhdraMeraliQB
Copy link
Contributor

Looks like all 3.8 and 3.9 fails are a result of the node crashing and the process timing out, not any particular tests failing. I would expect that on a re-run or with raising the time allowance they would pass more consistently. However this does not address the node crashing (looks like a seg fault) or 3.7's error

@noklam
Copy link
Contributor Author

noklam commented Jul 25, 2023

Just a note that the with context should be automatically closing the files after the lines are run

@AhdraMeraliQB This is probably true. I only say that because when I was working on reading the compressed config .tar.gz, I recall there was similar issue with Window. However that is because of an implementation issue with fsspec, in this case it is pure tarfile which is Python Standard Library and I don't expect it to be buggy.

@AhdraMeraliQB
Copy link
Contributor

Could be the case, after playing around, the file permissions conflict is arising from the lines

result = CliRunner().invoke(
            fake_project_cli, ["pipeline", "create", pipeline_name], obj=fake_metadata
        )

My guess at the moment is that several tests are trying to kedro pipeline create the same modular pipeline simultaneously. Unsure why it has only started failing now, an intermediate fix could be to specify that these should not be run in parallel (essentially calling pytest twice, parallel for those unmarked, and sequentially for anything marked as such). It is puzzling that it's only just arisen 🤔

@astrojuanlu
Copy link
Member

Wait, that looks familiar: #2568 (comment)

This is my attempt at a diagnose then: #2568 (comment)

Essentially, it seemed like not all the tests were getting their own temporary directory, and under some circumstances this seems to fail. Maybe it's the same problem?

I was not able to fix it back then, I just sidestepped it or reverted the changes.

noklam added 2 commits July 25, 2023 15:08
Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
@noklam
Copy link
Contributor Author

noklam commented Jul 25, 2023

I suspect is our custom fixture which creates problem, so I use pytest fixture but the problem remains. 51986d5

noklam added 3 commits July 25, 2023 16:21
This reverts commit 0a11440.
Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
@astrojuanlu
Copy link
Member

astrojuanlu commented Jul 25, 2023

If I understand correctly, there are two issues:

(1) Incompatibility between latest pip-tools and latest pip on Python 3.7. This is because pip-tools 7 dropped support for Python 3.7, and latest pip-tools 6.* is incompatible with latest pip. I managed to fix that issue with this diff:

diff --git a/features/environment.py b/features/environment.py
index 218ff097..671bee01 100644
--- a/features/environment.py
+++ b/features/environment.py
@@ -103,7 +103,9 @@ def _setup_minimal_env(context):
             "pip",
             "install",
             "-U",
-            "pip>=21.2",
+            # pip==23.2 breaks pip-tools<7.0, and pip-tools>=7.0 does not support Python 3.7
+            "pip>=21.2,<23.2; python_version < '3.8'",
+            "pip>=21.2; python_version >= '3.8'",
             "setuptools>=65.5.1",
             "wheel",
         ],
diff --git a/pyproject.toml b/pyproject.toml
index ae5fc961..50bf4a84 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -28,7 +28,7 @@ dependencies = [
     "more_itertools~=9.0",
     "omegaconf~=2.3",
     "parse~=1.19.0",
-    "pip-tools~=6.5",
+    "pip-tools>=6.5,<8",
     "pluggy~=1.0",
     "PyYAML>=4.2, <7.0",
     "rich>=12.0, <14.0",

(2) Problems with the micropackaging tests only on Windows and only when run in parallel, unknown origin. As @noklam said above, suppressing 1 test doesn't work, there seems to be a systematic issue with those.

Maybe it's related to this observation by @SajidAlamQB in #1328:

# configure_project does imports that add PACKAGE_NAME.pipelines,
# PACKAGE_NAME.settings to sys.modules. These need to be removed.
# Ideally we would reset sys.modules to exactly what it was before
# running anything, but removal of distutils.build.commands from
# sys.modules mysteriously makes some tests for `kedro micropkg package`
# fail on Windows, Python 3.7 and 3.8.
for module in list(sys.modules.keys()):
if module.startswith(PACKAGE_NAME):
del sys.modules[module]

an intermediate fix could be to specify that these should not be run in parallel (essentially calling pytest twice, parallel for those unmarked, and sequentially for anything marked as such).

@AhdraMeraliQB I agree. Could be something like this:

(Edit: micropkg tests might fail even if run sequentially, the solution below is not enough)

diff --git a/.circleci/continue_config.yml b/.circleci/continue_config.yml
index 1632c4bd..bc7cf756 100644
--- a/.circleci/continue_config.yml
+++ b/.circleci/continue_config.yml
@@ -285,8 +285,11 @@ jobs:
             equal: [ "3.10", <<parameters.python_version>> ]
           steps:
             - run:
-                name: Run unit tests without spark in parallel
-                command: conda activate kedro_builder; make test-no-spark
+                name: Run unit tests without spark and without micropkg in parallel
+                command: conda activate kedro_builder; pytest --no-cov --ignore tests/extras/datasets/spark -k "not micropkg" --numprocesses 4 --dist loadfile
+            - run:
+                name: Run micropkg unit tests sequentially
+                command: conda activate kedro_builder; pytest --no-cov --ignore tests/extras/datasets/spark -k "micropkg" --dist no
       - when:
           condition:
             equal: [ "3.10", <<parameters.python_version>> ]

@astrojuanlu
Copy link
Member

astrojuanlu commented Jul 25, 2023

In astrojuanlu#1 I experimented with the two ideas above and I still got a test failure with the dreaded "The process cannot access the file because it is being used by another process". The funniest thing is that the tests were running sequentially, and this test was the first to run. On the other hand, after my changes in #2614, kedro micropkg is slower, and the CI times can get out of hand.

Fundamentally, the fact that we are testing the micropkg workflow almost exclusively through the CLI makes debugging more difficult:

  • The actual errors are hidden, because we're calling kedro micropkg as a subprocess.
  • The way we mock and create temporary directories on the fly contributes to this.

Addressing this requires a significant rewrite of the tests.

I don't have very good ideas left on how to proceed beyond marking these tests as xfail on Windows + Python 3.7 and move on. I honestly don't think there's anything wrong with the code, but rather with our tests.

@noklam
Copy link
Contributor Author

noklam commented Jul 25, 2023

@astrojuanlu Agree with most of it, just want to note that it also cause this flaky stackoverflow issue (not show up everytime) - See https://app.circleci.com/pipelines/github/kedro-org/kedro/24923/workflows/99bc8992-4935-44bb-9aa4-75dfc2402e23/jobs/286915

In this PR the stackoverflow issue fails in 3.8 & 3.9 as well, I suspect it is more related to pytest-xdist base on what @AhdraMeraliQB found. If it's only ""The process cannot access the file because it is being used by another process", then I agree with xfail, otherwise we may need xskip.

So on solution, I think there are 2 options

  1. xskip on Window Python 3.7-3.9
  2. xfail on Window Python 3.7 + change all the test to run sequentially.

On refactoring - I agree on this. #2816 (comment) is an attempt to improve how we create and clean up tests. We should leverage fixture as much as we could to reduce the chance that it conflicts with other pytest plugin. From my experience, touching the sys.modules often create unexpected issues, we made a lot of effort to remove most of these tests recently.

However, we need to decide whether micropkg is a priority. It was sort of in maintenance mode, partly due to what alloy is doing already. (we need to keep it in mind this isn't available in OS)

Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
@noklam
Copy link
Contributor Author

noklam commented Jul 25, 2023

On the other hand, it may worth to enable Github Action on kedro, it would not take too much effort since we already have the more complicated setup in kedro-plugins. The reason for this is I believe there is no bug in the code or dependency itself, so one possibility is CircleCI did something funny behind the scene (which could explains why this error suddenly show up without code change).

@noklam
Copy link
Contributor Author

noklam commented Jul 26, 2023

Closed in favor of moving to Github Action which solves the problem. Still unsure what's the root cause, could be CircleCI infrastructure as we didn't make any code change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants