Kedro-Airflow not working with Astrocloud #13

Closed
yetudada opened this issue Mar 28, 2022 · 13 comments
Labels
airflow bug Something isn't working

Comments

@yetudada
Contributor

yetudada commented Mar 28, 2022

Raised by @jweiss-ocurate:

Description

I am trying to run a simple spaceflights example with Astrocloud. I wasn't sure if anyone has been able to get it to work.

Here is the Dockerfile:
FROM quay.io/astronomer/astro-runtime:4.1.0

RUN pip install --user new_kedro_project-0.1-py3-none-any.whl --ignore-requires-python

Context

I am trying to use kedro-airflow with astrocloud.

Steps to Reproduce

  1. Follow directions here https://kedro.readthedocs.io/en/latest/10_deployment/11_airflow_astronomer.html
  2. Replace the Dockerfile contents with the image shown above.

Expected Result

Complete Kedro Run on local Airflow image.

Actual Result

Failure in local Airflow image.
[2022-02-26, 16:43:26 UTC] {store.py:32} INFO - read() not implemented for BaseSessionStore. Assuming empty store.
[2022-02-26, 16:43:26 UTC] {session.py:78} WARNING - Unable to git describe /usr/local/airflow
[2022-02-26, 16:43:29 UTC] {local_task_job.py:154} INFO - Task exited with return code Negsignal.SIGKILL

Your Environment

Include as many relevant details about the environment you experienced the bug in:

  • Kedro-Airflow plugin version used (get it by running pip show kedro-airflow): 0.4.1
  • Airflow version (airflow --version):
  • Kedro version used (pip show kedro or kedro -V): 0.17.7
  • Python version used (python -V): > 2.0.0
  • Operating system and version: Ubuntu Linux 20.04
@yetudada
Contributor Author

yetudada commented Mar 28, 2022

From @jacobweiss2305:

Kedro-Airflow plugin version used (get it by running pip show kedro-airflow): 0.4.1
Airflow version (airflow --version): > 2.0.0
Kedro version used (pip show kedro or kedro -V): 0.17.7
Python version used (python -V): >3.9
Operating system and version: Ubuntu Linux 20.04

merelcht moved this to Todo in Kedro Framework Mar 28, 2022
@yetudada
Contributor Author

From @limdauto:

Hi @jacobweiss2305, please try Python 3.8. Support for 3.9 hasn't been released yet.

@yetudada
Contributor Author

yetudada commented Mar 28, 2022

From @jacobweiss2305:

Hi @limdauto

Support for Kedro and Python 3.9 is available using pip install kedro --ignore-requires-python (kedro-org/kedro#710)

@yetudada
Contributor Author

yetudada commented Mar 28, 2022

From @jweiss-ocurate:

Hi @limdauto

Here are the exact steps I am taking:

Kedro + Airflow + AstronomerCloud

Environment

  1. Python 3.9.6
  2. Ubuntu 20.04
  3. Kedro == 0.17.7
  4. Kedro-Airflow == 0.4.1

Steps

  1. mkdir astro_cloud_kedro
  2. cd astro_cloud_kedro
  3. astrocloud dev init
  4. python -m venv venv && source venv/bin/activate
  5. pip install kedro --ignore-requires-python
  6. pip install kedro-airflow --ignore-requires-python
  7. kedro new --starter=spaceflights
  8. cp -r new-kedro-project/* . && rm -rf new-kedro-project
  9. pip install -r src/requirements.txt --ignore-requires-python
  10. kedro package
  11. Edit the Dockerfile
FROM quay.io/astronomer/astro-runtime:4.1.0

RUN pip install --user src/dist/new_kedro_project-0.1-py3-none-any.whl --ignore-requires-python
  12. kedro airflow create --target-dir=dags/ --env=base
  13. astrocloud dev start

Error

  1. Go to localhost:8080
  2. Activate new-kedro-project dag in Airflow
  3. The first step should fail with the following logs:
*** Failed to verify remote log exists s3:///new-kedro-project/data-processing-preprocess-companies-node/2022-02-28T14:47:01.235178+00:00/1.log.
Please provide a bucket_name instead of "s3:///new-kedro-project/data-processing-preprocess-companies-node/2022-02-28T14:47:01.235178+00:00/1.log"
*** Falling back to local log
*** Reading local file: /usr/local/airflow/logs/new-kedro-project/data-processing-preprocess-companies-node/2022-02-28T14:47:01.235178+00:00/1.log
[2022-02-28, 15:17:11 UTC] {taskinstance.py:1037} INFO - Dependencies all met for <TaskInstance: new-kedro-project.data-processing-preprocess-companies-node scheduled__2022-02-28T14:47:01.235178+00:00 [queued]>
[2022-02-28, 15:17:12 UTC] {taskinstance.py:1037} INFO - Dependencies all met for <TaskInstance: new-kedro-project.data-processing-preprocess-companies-node scheduled__2022-02-28T14:47:01.235178+00:00 [queued]>
[2022-02-28, 15:17:12 UTC] {taskinstance.py:1243} INFO - 
--------------------------------------------------------------------------------
[2022-02-28, 15:17:12 UTC] {taskinstance.py:1244} INFO - Starting attempt 1 of 2
[2022-02-28, 15:17:12 UTC] {taskinstance.py:1245} INFO - 
--------------------------------------------------------------------------------
[2022-02-28, 15:17:12 UTC] {taskinstance.py:1264} INFO - Executing <Task(KedroOperator): data-processing-preprocess-companies-node> on 2022-02-28 14:47:01.235178+00:00
[2022-02-28, 15:17:12 UTC] {standard_task_runner.py:52} INFO - Started process 220 to run task
[2022-02-28, 15:17:12 UTC] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'new-kedro-project', 'data-processing-preprocess-companies-node', 'scheduled__2022-02-28T14:47:01.235178+00:00', '--job-id', '2', '--raw', '--subdir', 'DAGS_FOLDER/new_kedro_project_dag.py', '--cfg-path', '/tmp/tmpmr1pmxmb', '--error-file', '/tmp/tmpqzqs8xs8']
[2022-02-28, 15:17:12 UTC] {standard_task_runner.py:77} INFO - Job 2: Subtask data-processing-preprocess-companies-node
[2022-02-28, 15:17:12 UTC] {logging_mixin.py:109} INFO - Running <TaskInstance: new-kedro-project.data-processing-preprocess-companies-node scheduled__2022-02-28T14:47:01.235178+00:00 [running]> on host 3d8fc15ee46a
[2022-02-28, 15:17:12 UTC] {taskinstance.py:1429} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=new-kedro-project
AIRFLOW_CTX_TASK_ID=data-processing-preprocess-companies-node
AIRFLOW_CTX_EXECUTION_DATE=2022-02-28T14:47:01.235178+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-02-28T14:47:01.235178+00:00
[2022-02-28, 15:17:12 UTC] {store.py:32} INFO - `read()` not implemented for `BaseSessionStore`. Assuming empty store.
[2022-02-28, 15:17:12 UTC] {session.py:78} WARNING - Unable to git describe /usr/local/airflow
[2022-02-28, 15:17:12 UTC] {logging_mixin.py:109} WARNING - /home/astro/.local/lib/python3.9/site-packages/kedro/config/config.py:296 UserWarning: Duplicate environment detected! Skipping re-loading from configuration path: /usr/local/airflow/conf/base
[2022-02-28, 15:17:13 UTC] {local_task_job.py:154} INFO - Task exited with return code Negsignal.SIGKILL
[2022-02-28, 15:17:13 UTC] {taskinstance.py:1272} INFO - Marking task as UP_FOR_RETRY. dag_id=new-kedro-project, task_id=data-processing-preprocess-companies-node, execution_date=20220228T144701, start_date=20220228T151711, end_date=20220228T151713
[2022-02-28, 15:17:14 UTC] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check

@yetudada
Contributor Author

yetudada commented Mar 28, 2022

From @sunkickr:

@jweiss-ocurate this may be a memory issue based on the task logs showing Negsignal.SIGKILL. Could you try increasing the amount of local memory allocated to docker?
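For readers unfamiliar with the return code in the log: the sketch below illustrates, on a POSIX system, how a process terminated by SIGKILL surfaces as a negative return code with no traceback, which is consistent with the task log ending abruptly. The child command is a stand-in for illustration only; it is not what Airflow or Kedro runs, and it does not show that memory was the cause here.

```python
import signal
import subprocess
import sys

# Stand-in child process that sends SIGKILL to itself (illustrative only).
# SIGKILL cannot be caught, so the child gets no chance to log an error.
child_code = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"
result = subprocess.run([sys.executable, "-c", child_code])

# On POSIX, a return code of -N means the child was terminated by signal N.
killed_by = signal.Signals(-result.returncode)
print(result.returncode, killed_by.name)
```

A process killed this way leaves nothing in its own output, so the signal name in the parent's log is usually the only clue.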

@yetudada
Contributor Author

From @idanov:

@jweiss-ocurate I can confirm we could reproduce that. We'll try to debug what's causing it and update you with any findings we have here.

@yetudada
Contributor Author

From @jweiss-ocurate:

Astronomer worked on this with me. The current docker image for Astronomer Cloud requires python 3.9. So I had to install kedro using --ignore-requires-python.

Astronomer was able to add a quick fix by reinstalling python 3.7 in the dockerfile.
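Because --ignore-requires-python bypasses pip's interpreter check, an unsupported Python version only shows up later as a runtime failure. A minimal sketch of an explicit guard that fails fast instead is below. The version bounds and the helper names (is_supported, check_interpreter) are assumptions for illustration, not taken from Kedro's metadata; check the installed release for its real supported range.

```python
import sys

# Assumed bounds for illustration only; not read from any package metadata.
MIN_SUPPORTED = (3, 7)
MAX_SUPPORTED_EXCLUSIVE = (3, 9)


def is_supported(version, minimum=MIN_SUPPORTED, maximum_exclusive=MAX_SUPPORTED_EXCLUSIVE):
    """Return True if the (major, minor) pair lies in [minimum, maximum_exclusive)."""
    return minimum <= tuple(version[:2]) < maximum_exclusive


def check_interpreter():
    """Raise a readable error early instead of failing later inside a task."""
    if not is_supported(sys.version_info):
        found = ".".join(str(part) for part in sys.version_info[:3])
        raise RuntimeError(f"Unsupported Python {found} for this project")
```

Calling check_interpreter() at the top of the generated DAG file or in a container entrypoint would make a mismatch visible before any task starts.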

@yetudada
Contributor Author

From @noklam:

@jweiss-ocurate Does it work after downgrading the Python version?

@yetudada
Contributor Author

From @jweiss-ocurate:

Yes it does.

noklam added the bug label Mar 29, 2022
noklam moved this from Todo to In Progress in Kedro Framework Mar 30, 2022
noklam self-assigned this Mar 30, 2022
@noklam
Contributor

noklam commented Mar 30, 2022

I tried to get it running with the develop branch but was not successful.

  1. astrocloud dev start doesn't really allow volume mounting, so I can't install a local copy of Kedro.
  2. There is no permission to use git, and even copying the entire repo into the Docker image and installing from it seems to be blocked (see the error below).

I wonder if there is anything special about astrocloud, or whether we could just test with a custom Airflow setup to get rid of these restrictions.

I also noticed it uses quay.io/astronomer/astro-runtime instead of astronomer/ap-airflow, which is what the documentation uses.

#13 0.247 + pip install kedro_develop
#13 0.589 Defaulting to user installation because normal site-packages is not writeable
#13 0.610 Looking in links: https://pip.astronomer.io/simple/astronomer-fab-security-manager/
#13 0.973 ERROR: Could not find a version that satisfies the requirement kedro_develop (from versions: none)
#13 0.973 ERROR: No matching distribution found for kedro_develop
#13 1.274 WARNING: You are using pip version 21.3.1; however, version 22.0.4 is available.
#13 1.274 You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.

@noklam
Contributor

noklam commented Apr 5, 2022

@jweiss-ocurate Could you share the latest Dockerfile that runs successfully?

@noklam
Contributor

noklam commented Apr 5, 2022

After some investigation, the exact line causing the issue is logging.config.dictConfig(logging_config).

Tested with the latest image + Python 3.9 + Kedro==0.18.0. The following workaround makes it work.

Update this line in logging.yml

"disable_existing_loggers": True

Dockerfile

FROM quay.io/astronomer/astro-runtime:4.2.1
RUN pip install --user dist/new_kedro_project-0.1-py3-none-any.whl --ignore-requires-python

Minimal example to reproduce the error

A minimal example of KedroOperator.execute() to reproduce the issue. It's not entirely clear what the underlying issue is, but disabling the existing loggers fixes the crash. It is potentially conflicting with Airflow's own logger. We will revisit the way Kedro does logging soon and hopefully fix this issue at the same time.

    def execute(self, context):
        print("Hello World")
        config = {
            "version": 1,
            "disable_existing_loggers": False,
            "formatters": {
                "simple": {
                    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
                },

            },
            "handlers": {
                "console": {
                    "class": "logging.StreamHandler",
                    "level": "INFO",
                    "formatter": "simple",
                    "stream": "ext://sys.stdout",
                },
            },
            # Try uncommenting this block; the task will fail
            # "root": {
            #     "level": "INFO",
            #     "handlers": ["console",  ],
            # },
        }

        logging.config.dictConfig(
            config
        )  # Comment out this line, everything will break
        print("End of the Program")
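
To make the effect of the disable_existing_loggers flag concrete, here is a standalone sketch using only the standard library. It shows how dictConfig treats a logger that already existed before the call. The logger name "host_app.task" and the helper apply_config are hypothetical stand-ins for a logger owned by the host application (such as Airflow); this does not reproduce the crash itself, only the documented behaviour of the flag.

```python
import logging
import logging.config


def apply_config(disable_existing: bool) -> bool:
    """Apply a console-only config and report whether a pre-existing logger is disabled."""
    # Hypothetical logger created by the host application before dictConfig runs.
    existing = logging.getLogger("host_app.task")
    existing.disabled = False  # reset state so repeated calls are independent

    logging.config.dictConfig(
        {
            "version": 1,
            "disable_existing_loggers": disable_existing,
            "formatters": {
                "simple": {
                    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
                },
            },
            "handlers": {
                "console": {
                    "class": "logging.StreamHandler",
                    "level": "INFO",
                    "formatter": "simple",
                    "stream": "ext://sys.stdout",
                },
            },
            "root": {"level": "INFO", "handlers": ["console"]},
        }
    )
    return existing.disabled


# Map each flag value to the resulting state of the pre-existing logger.
results = {flag: apply_config(flag) for flag in (False, True)}
print(results)
```

With the flag set to True, loggers created earlier and not named in the config are switched off, which may explain why the workaround sidesteps a conflict with loggers that were configured before Kedro's config is applied.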

@DimedS
Member

DimedS commented May 24, 2024

The Airflow Astronomer and AstroCloud deployment documentation was updated in #3792. Due to issues with the Rich library logging in Airflow deployments, one of the updated steps advises setting Kedro logging to [console] only. Deployments are now successfully working with Astro and other cloud providers.
