Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KED-2518] kedro new --starter argument is broken on Windows for python<3.8 #722

Closed
Galileo-Galilei opened this issue Mar 13, 2021 · 20 comments
Labels
Issue: Bug Report 🐞 Bug that needs to be fixed

Comments

@Galileo-Galilei
Copy link
Member

Description

I tried to create a kedro project by running kedro new on Windows in a conda environment with python 3.7.10. The command fails with a Error: Failed to generate project. error.

The command should work, since the badge in the README claims support for python=3.6, 3.7, 3.8.

After investigation, the error comes from tempfile library and the context manager used to create a temporary file in _prompt_user_for_config

https://github.com/quantumblacklabs/kedro/blob/6d9ffaba46ed1e984500a86ff5177327857c74f9/kedro/framework/cli/starters.py#L259-L268

This context manager __exit__ method calls _rmtree_unsafe on a read only file. This discussion in the official python issue tracker (msg262584 has the exact same stack trace than me, and msg344037 reference the commit which fixes the issue) show that this is a known bug which has been resolved when python==3.8 was released.

I checked and everything works fine with python=3.8 on the same computer. The compatibility with different os and python version seems important to me, especially for enterprise support. For my personal kedro use I can easily upgrade my python version, but in a professional setup I often have to deal with the constraints of the team I work with, and I barely choose if I work on windows/linux and if I can upgrade to a newer python version (a lot of teams are conservative and do not want to upgrade their python version by fear of breaking something in production).

Context

I can't create a new kedro project with kedro new --starter=pandas-iris command on Windows with python=3.7.

Steps to Reproduce

On a Windows 7 or Windows 10 computer, create a conda environment with python=3.7 and call kedro new:

conda create -n ked171_py37 python=3.7 -y
pip install kedro==0.17.1
kedro new --starter==pandas-iris

Expected Result

The kedro project should be created.

Actual Result

The usual questions are asked, then the message Error: Failed to generate project. is displayed and no project is created.

Running the command with --verbose flag return the following stacktrace:

Traceback (most recent call last):
  File "c:\users\xxx\anaconda3\envs\kp_171\lib\site-packages\kedro\framework\cli\cli.py", line 300, in _create_project
    config = _prompt_user_for_config(template_path, checkout, directory)
  File "c:\users\xxx\anaconda3\envs\kp_171\lib\site-packages\kedro\framework\cli\cli.py", line 385, in _prompt_user_for_config
    return config
  File "c:\users\xxx\anaconda3\envs\kp_171\lib\tempfile.py", line 807, in __exit__
    self.cleanup()
  File "c:\users\xxx\anaconda3\envs\kp_171\lib\tempfile.py", line 811, in cleanup
    _shutil.rmtree(self.name)
  File "c:\users\xxx\anaconda3\envs\kp_171\lib\shutil.py", line 516, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "c:\users\xxx\anaconda3\envs\kp_171\lib\shutil.py", line 395, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "c:\users\xxx\anaconda3\envs\kp_171\lib\shutil.py", line 395, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "c:\users\xxx\anaconda3\envs\kp_171\lib\shutil.py", line 395, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  [Previous line repeated 1 more time]
  File "c:\users\xxx\anaconda3\envs\kp_171\lib\shutil.py", line 400, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "c:\users\xxx\anaconda3\envs\kp_171\lib\shutil.py", line 398, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 5] Accès refusé: 'C:\\Users\\xxx\\AppData\\Local\\Temp\\tmpntjghogt\\kedro-starters\\.git\\objects\\pack\\pack-26ae52b934aecc262c73b726202a49ce0bb01487.idx'
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\users\xxx\anaconda3\envs\kp_171\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "c:\users\xxx\anaconda3\envs\kp_171\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\xxx\anaconda3\envs\kp_171\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\xxx\anaconda3\envs\kp_171\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "c:\users\xxx\anaconda3\envs\kp_171\lib\site-packages\kedro\framework\cli\cli.py", line 229, in new
    directory=directory,
  File "c:\users\xxx\anaconda3\envs\kp_171\lib\site-packages\kedro\framework\cli\cli.py", line 332, in _create_project
    raise KedroCliError("Failed to generate project.") from exc
kedro.framework.cli.utils.KedroCliError: Failed to generate project.
Error: Failed to generate project.

This is the very same stacktrace that the one in msg262584 in above link on python issue tracker.

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): 0.17.1
  • Python version used (python -V): 3.7.10
  • Operating system and version: Windows 10
@Galileo-Galilei Galileo-Galilei added the Issue: Bug Report 🐞 Bug that needs to be fixed label Mar 13, 2021
@antonymilne antonymilne changed the title kedro new --starter argument is broken on Windows for python<3.8 [KED-2518] kedro new --starter argument is broken on Windows for python<3.8 Mar 15, 2021
@antonymilne
Copy link
Contributor

Thank you very much for this really excellent bug report! I have added the issue to our internal tracker.

@noklam
Copy link
Contributor

noklam commented Mar 18, 2021

Is there an example of how to use the 0.17.2 kedro? The starter repository seems broken, I cannot get the pandas-iris running.

I can't find where the register_pipelines was call, it kept saying no pipeline is registered.

@antonymilne
Copy link
Contributor

Hi @noklam, using 0.17.2 kedro should be very similar to previous versions, and I believe that in general all the starters are working. The issue @Galileo-Galilei brought up is specifically a problem with Windows and Python <3.8.

Between 0.17.1 and 0.17.2,register_pipelines has moved from hooks.py to pipeline_registry.py (though this should be a backwards compatible change, i.e. you don't have to move it there if you don't want to). In the pandas-iris starter you can see this file here

@noklam
Copy link
Contributor

noklam commented Mar 18, 2021

This does not work for me as register_pipelines was never imported.

After digging into source code, I found that I have to add from pipeline_registry import register_pipelines in the __init__.py file.

https://github.com/quantumblacklabs/kedro-starters/blob/master/pandas-iris/%7B%7B%20cookiecutter.repo_name%20%7D%7D/src/%7B%7B%20cookiecutter.python_package%20%7D%7D/__init__.py

@noklam
Copy link
Contributor

noklam commented Mar 18, 2021

https://github.com/quantumblacklabs/kedro/blob/8c981a454431b0a32a207fbaeb8e3b5890a87bd6/kedro/framework/project/__init__.py#L106-L112

   def _get_register_pipelines(  # pylint: disable=no-self-use
        self, pipelines_module: str
    ):
        module_obj = importlib.import_module(pipelines_module)
        register_pipelines = getattr(module_obj, "register_pipelines")
        return register_pipelines

@antonymilne
Copy link
Contributor

This seems strange. I just ran this:

conda create -n temp python=3.7
conda activate temp
pip install kedro
kedro new -s pandas-iris
kedro install
kedro run

... and all worked ok for me, without needing to change anything. Please can you post exactly what you have in your pipeline_registry.py file, your Python version, your OS, and check kedro --version?

@noklam
Copy link
Contributor

noklam commented Mar 18, 2021

Without running kedro install, I get this error. I don't remember kedro install is a straight requirement before?

python: 3.8.3
OS: Windows 10
kedro: 0.17.2

    raise KedroContextError(
kedro.framework.context.context.KedroContextError: Failed to find the pipeline named '__default__'. It needs to be generated and returned by the 'register_pipelines' function.

@noklam
Copy link
Contributor

noklam commented Mar 18, 2021

It works after running kedro install, I am a bit confused here as I thought it is installing the dependencies only, it should not cause Kedro fails to discover the pipeline?

@noklam
Copy link
Contributor

noklam commented Mar 18, 2021

It fails to import package.pipeline_registry without kedro install.

After running kedro install , I try to run import package and it fails, I don't understand what did kedro did to register the pipelines?

It's bug me that importlib.import_module(pipelines_module) this code apparent run successfully in kedro pipeline. But if I open a python shell, import package will fails.

@noklam
Copy link
Contributor

noklam commented Mar 18, 2021

OK, I was finally able to debug. It is a silly mistake, I just didn't install pandas

I understand it's for backward compatibility, but may it should at least show some log?

        try:
            register_pipelines = self._get_register_pipelines(pipelines_module)
        except (ModuleNotFoundError, AttributeError):
            # for backwards compatibility, this is fine
            project_pipelines = {}
        else:
...

https://github.com/quantumblacklabs/kedro/blob/8c981a454431b0a32a207fbaeb8e3b5890a87bd6/kedro/framework/project/__init__.py#L117-L122

@antonymilne
Copy link
Contributor

antonymilne commented Mar 18, 2021

Hi @noklam, thanks for digging into this and sharing. You make a very good point about not just hiding the ModuleNotFoundError but raising some sort of log. I'll see if we can easily get this added (though in due course, when 0.18 is released, the whole except block should disappear anyway).

@noklam
Copy link
Contributor

noklam commented Mar 18, 2021

Thanks for getting back.

I have noticed Kedro changed the API in the last few versions. from project context to hooks, then hooks to pipeline registry. Is there some migration guide for what need to be changed?

Also, is there a roadmap for Kedro release? As we maintain an internal library that integrate with Kedro, it would be nice to expect potential broken API and plans ahead.

@antonymilne
Copy link
Contributor

antonymilne commented Mar 18, 2021

The release notes should generally explain how to migrate when new versions are released. e.g. this explains the move of pipeline registry. If it's a particularly tricky set of changes then there's sometimes whole sections called "Migration guide" to help out. I definitely appreciate that these sorts of changes can make upgrading kedro more frustrating than it should be though. Sorry!

As for a roadmap for kedro releases, I don't think we have such a thing available. However, we do note upcoming deprecations with runtime warnings and also in the release notes, so definitely worth reading that for planning purposes. We were also thinking about holding a session for us to explain to users more about where the framework is going and what changes to expect in the future - @yetudada might be able to comment more 🙂

@noklam
Copy link
Contributor

noklam commented Mar 19, 2021

Don't worry! Thanks for open sourcing kedro, it definitely helps to solve some of our problems. :) It would be nice to learn more about where is kedro heading to.

@limdauto
Copy link
Contributor

This will be patched in the next release using the workaround in the Python documentation.

@antonymilne
Copy link
Contributor

@noklam the fix about the ModuleNotFoundError error with the pipeline registry will be in 0.17.3, coming soon. Investigating this actually revealed a deeper underlying bug to do with lazy loading of pipelines which we've also fixed. So thank you very much for pointing this out as it turned into something even more important than we first thought!

@noklam
Copy link
Contributor

noklam commented Apr 19, 2021

@AntonyMilneQB Awesome!

@lorenabalan
Copy link
Contributor

Hi everyone, this should be fixed in 0.17.3, released today. As always, feel free to raise a new issue if you encounter problems. Thank you all for your patience and insights!

@FranciscoReveriano
Copy link

I actually think its broken again. I am trying your Spaceflight example and going through the long-guide.
It seems that the "pipeline_registry" function is not called. Results in the infamous:

"ValueError: Pipeline does not contain named ["preprocess_companies_node"]

@antonymilne
Copy link
Contributor

@FranciscoReveriano I think (hope) this will not be a kedro problem now. It should be either because you have not followed the documentation quite right or because the documentation itself isn't quite right.

Try doing kedro new -s spaceflights? Does that work with kedro run straightaway? If so, then try comparing the code in that to the code you have after the following the tutorial. Can you see what the difference is?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Bug Report 🐞 Bug that needs to be fixed
Projects
None yet
Development

No branches or pull requests

6 participants