requirements.txt contains the packages (vertica_python, pandas, numpy, etc.) along with the versions needed for my code.
I wrote a little shell script, based on the one provided in the docs, for creating packaged DAGs:
set -eu -o pipefail
# Print usage and stop when no arguments are given.
if [ $# -eq 0 ]; then
    echo "First param should be /srv/user_name/virtualenvs/name_virtual_env"
    echo "Second param should be name of temp_directory"
    echo "Third param should be git url"
    echo "Fourth param should be dag zip name, i.e dag_zip.zip to be copied into AIRFLOW__CORE__DAGS_FOLDER"
    echo "Fifth param should be package name, i.e classify_business"
    exit 1
fi
venv_path=${1}
dir_tmp=${2}
git_url=${3}
dag_zip=${4}
pkg_name=${5}
# Create and activate the virtualenv.
python3 -m venv $venv_path
source $venv_path/bin/activate
# Install the package and its dependencies into the temp directory.
mkdir $dir_tmp
cd $dir_tmp
python3 -m pip install --prefix=$PWD git+$git_url
# Zip everything and copy the archive into the DAGs folder.
zip -r $dag_zip *
cp $dag_zip $AIRFLOW__CORE__DAGS_FOLDER
rm -r $dir_tmp
The script installs my package along with its dependencies directly from GitLab, zips everything, and then copies the archive to the DAGs folder.
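For readers who want to inspect what ends up in the archive, the zip step can be sketched in Python as follows. This is only an illustration, not part of the original script: the helper name zip_directory and the sample file contents are hypothetical, and the point is simply that members are stored relative to the staging directory, so they sit at the archive root.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_directory(src_dir, zip_path):
    """Zip the contents of src_dir so members sit at the archive root."""
    src_dir = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir, not the absolute path.
                zf.write(path, path.relative_to(src_dir).as_posix())


# Hypothetical staging directory standing in for tmp_dir.
work = Path(tempfile.mkdtemp())
staging = work / "tmp_dir"
(staging / "my_custom_package").mkdir(parents=True)
(staging / "my_custom_package" / "__init__.py").write_text("")
(staging / "predict_dag.py").write_text("# DAG definition goes here\n")

# The archive is kept outside staging so it is not zipped into itself.
archive = work / "dag_zip.zip"
zip_directory(staging, archive)

with zipfile.ZipFile(archive) as zf:
    members = sorted(zf.namelist())
print(members)
```

Listing the members this way makes it easy to see whether a dependency landed at the top level of the archive or under a subdirectory such as lib.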
This is the content of the folder tmp_dir before being zipped.
bin
lib
lib64
predict_dag.py
train_dag.py
Airflow doesn't seem to be able to import packages installed in lib or lib64.
I'm getting this error:
ModuleNotFoundError: No module named 'vertica_python'
I even tried to move my custom package outside of lib:
bin
my_custom_package
lib
lib64
predict_dag.py
train_dag.py
But I still get the same error.
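A likely explanation, assuming Airflow makes a packaged DAG importable by putting the zip file itself on sys.path: Python's zip import then only sees modules at the archive root, not modules buried under lib/pythonX.Y/site-packages. The sketch below reproduces that behavior with two hypothetical modules; the module names and the python3.8 directory are made up for the demonstration.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def build_zip(zip_path, files):
    """Write a {archive_name: source_text} mapping into a new zip archive."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        for name, source in files.items():
            zf.writestr(name, source)


def can_import(module_name):
    """Return True if module_name imports with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return True
    except ModuleNotFoundError:
        return False


zip_path = Path(tempfile.mkdtemp()) / "dag_zip.zip"
build_zip(
    zip_path,
    {
        # Module at the archive root (the layout a --target install gives).
        "zipdemo_root_level_mod.py": "VALUE = 1\n",
        # Module buried under lib/ (the layout a --prefix install gives).
        "lib/python3.8/site-packages/zipdemo_nested_mod.py": "VALUE = 2\n",
    },
)

# Only the archive root becomes an import location.
sys.path.insert(0, str(zip_path))
root_ok = can_import("zipdemo_root_level_mod")
nested_ok = can_import("zipdemo_nested_mod")
print(root_ok, nested_ok)
```

Under that assumption, moving only my_custom_package to the root would not help, because vertica_python would still live under lib.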
I think one of the problems lies in how to use pip to install packages in a specific location.
The Airflow example uses --install-option="--install-lib=/path/", but that is unsupported:
Location-changing options found in --install-option: ['--install-lib']
from command line. This configuration may cause unexpected behavior
and is unsupported. pip 20.2 will remove support for this
functionality. A possible replacement is using pip-level options like
--user, --prefix, --root, and --target. You can find discussion regarding this at pypa/pip#7309.
Using --prefix leads to a structure like the one above, with the module-not-found error.
Using --target installs every package directly into the specified directory.
In this case I get a pandas-related error:
C extension: No module named 'pandas._libs.tslibs.conversion' not built
I guess that it's related to dynamic libraries that should be available at a system level?
I really don't know how to do that.
Thanks
Packaged DAGs cannot contain dynamic libraries (e.g. libz.so); these need to be available on the system if a module needs them. In other words, only pure Python modules can be packaged.
Packaged DAGs are only a partial solution to the dependency problem because they only allow you to load simple Python libraries. Complex math libraries are not supported.
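One way to check an archive against this restriction is to scan it for compiled files before copying it to the DAGs folder. The following is a minimal sketch, not an Airflow feature: the helper find_native_files, the suffix list, and the sample member names are assumptions chosen for illustration. It relies on the fact that Python's zip import handles only .py and .pyc files and cannot load extension modules.

```python
import tempfile
import zipfile
from importlib.machinery import EXTENSION_SUFFIXES
from pathlib import Path

# Suffixes of compiled artifacts that cannot be imported from inside a zip.
NATIVE_SUFFIXES = tuple(set(EXTENSION_SUFFIXES) | {".so", ".pyd", ".dll", ".dylib"})


def find_native_files(zip_path):
    """Return archive members that look like extension modules or shared libraries."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(
            name
            for name in zf.namelist()
            # ".so." also catches versioned libraries such as libz.so.1
            if name.endswith(NATIVE_SUFFIXES) or ".so." in name
        )


# Hypothetical archive: one pure-Python DAG file plus one compiled pandas module.
zip_path = Path(tempfile.mkdtemp()) / "dag_zip.zip"
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("predict_dag.py", "# DAG definition goes here\n")
    zf.writestr(
        "pandas/_libs/tslibs/conversion.cpython-38-x86_64-linux-gnu.so",
        b"placeholder bytes, not a real shared object",
    )

native = find_native_files(zip_path)
print(native)
```

A non-empty result suggests that the affected packages (here pandas) need to be installed in the environment Airflow runs in rather than shipped inside the zip.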
I'm trying to use Apache Airflow with packaged DAGs. I've written my code as a Python package, and my code depends on other libraries such as numpy, scipy, etc.
This is the setup.py of my custom Python package: requirements.txt contains the packages (vertica_python, pandas, numpy, etc.) along with the versions needed for my code.