Airflow Scheduler on WSL fails to execute Windows EXE #13108

Closed
oferze opened this issue Dec 16, 2020 · 24 comments

Comments

@oferze
Contributor

oferze commented Dec 16, 2020

Apache Airflow version: 1.10.11

Kubernetes version (if you are using kubernetes) (use kubectl version): N/A

Environment:

  • Cloud provider or hardware configuration: Local dev machine. Issue is unrelated to hardware.
  • OS (e.g. from /etc/os-release): Ubuntu 20.04.1 LTS (Focal Fossa), running within WSL 2 on Windows 10 Pro 20H2, OS Build 19042.685 and Windows Feature Experience Pack 120.2212.551.0
  • Kernel (e.g. uname -a): Linux LifeMapPC 4.19.128-microsoft-standard #1 SMP Tue Jun 23 12:58:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Others:

What happened:

I have a BashOperator task which invokes an .EXE on Windows (via /mnt/c/... or via symlink).
The task fails. Log shows:

[2020-12-16 18:34:11,833] {bash_operator.py:134} INFO - Temporary script location: /tmp/airflowtmp2gz6d79p/download.legacyFilesnihvszli
[2020-12-16 18:34:11,833] {bash_operator.py:146} INFO - Running command: /mnt/c/Windows/py.exe
[2020-12-16 18:34:11,836] {bash_operator.py:153} INFO - Output:
[2020-12-16 18:34:11,840] {bash_operator.py:159} INFO - Command exited with return code 1
[2020-12-16 18:34:11,843] {taskinstance.py:1150} ERROR - Bash command failed
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.8/dist-packages/airflow/operators/bash_operator.py", line 165, in execute
    raise AirflowException("Bash command failed")
airflow.exceptions.AirflowException: Bash command failed
[2020-12-16 18:34:11,844] {taskinstance.py:1187} INFO - Marking task as FAILED. dag_id=test-dag, task_id=download.files, execution_date=20201216T043701, start_date=20201216T073411, end_date=20201216T073411

And that's it. Return code 1 with no further useful info.

Running the very same EXE via bash works perfectly, with no error (I also tried it on my own program which emits something to the console - in bash it emits just fine, but via airflow scheduler it's the same error 1).

What you expected to happen:

Expected the Windows executable to run successfully via airflow scheduler, same as when I run it directly in Bash. That is: emit any output to the console and return success (error 0).

Alternatively, happy to learn a way to get more insight into the log produced by the airflow scheduler run. i.e. to see "what happened" that makes it return error 1 on certain commands.

How to reproduce it:

I do not know the circumstances / environment "fault" that is causing it so can't supply reproduction steps.

Anything else we need to know:

Some more data and things I've done to rule out any other issue:

  • airflow scheduler runs as root. I also confirmed it's running in a root context by putting a whoami command in my BashOperator, which indeed emitted root. (I should also note that all native Linux programs run just fine; only the Windows programs don't.)
  • The Windows EXE I'm trying to execute and its directory have full 'Everyone' permissions (on my own program of course, wouldn't dare doing it on my Windows folder - that was just an example.)
  • The failure happens both when accessing via /mnt/c as well as via symlink. In the case of a symlink, the symlink has 777 permissions.
  • I tried running airflow test on a BashOperator task - it runs perfectly - emits output to the console and returns 0 (success).
  • Tried with various EXE files - both "native" (e.g. ones that come with Windows) as well as my C#-made programs. Same behavior in all.
  • Didn't find any similar issue documented in Airflow's GitHub repo or on Stack Overflow.
@boring-cyborg

boring-cyborg bot commented Dec 16, 2020

Thanks for opening your first issue here! Be sure to follow the issue template!

@potiuk
Member

potiuk commented Dec 16, 2020

Interesting problem. Did you try to run a bash script that would execute your binary via

cmd.exe /C <command>

https://docs.microsoft.com/en-us/windows/wsl/interop

@oferze
Contributor Author

oferze commented Dec 16, 2020

@potiuk tried (even though it's intended for non-executables like dir). Same error 1. Even /mnt/c/Windows/System32/cmd.exe /C dir fails the same way, and succeeds when I plainly run it in Bash.

@potiuk
Member

potiuk commented Dec 16, 2020

@oferze
Contributor Author

oferze commented Dec 16, 2020

@potiuk don't think so, all of my airflow environment (including Python) is installed within Ubuntu.

@oferze
Contributor Author

oferze commented Dec 17, 2020

@potiuk any other things to check? Is there a way to get the task runner / airflow scheduler to emit verbose output, or to see exactly what it did when it attempted to run the command and received error 1?

@potiuk
Member

potiuk commented Dec 17, 2020

I am not a WSL2 user, unfortunately :(. Maybe others who use it can help? You can ask in troubleshooting or try asking the question on Stack Overflow in general.

@oferze
Contributor Author

oferze commented Dec 17, 2020

Had already asked in Stack Overflow, with no luck: https://stackoverflow.com/questions/65319176/airflow-scheduler-on-wsl-fails-to-execute-windows-exe

What about my other question about the scheduler?

Also, I'm not 100% sure it's WSL2's fault.
Those files run perfectly fine from Bash (within WSL2).
They even also run fine with airflow test.
It's only via airflow scheduler that they don't. Even though the scheduler runs as root I suspect there's something there that makes it run in a different "context" than elsewhere.
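
One way to test the "different context" suspicion is to record the environment in both places and compare the two. The following is a minimal sketch using only the Python standard library; the function names and the snapshot file layout are hypothetical, and nothing here is part of Airflow.

```python
import json
import os


def snapshot_environment(path):
    """Write the current environment, user id and working directory to a JSON file."""
    snapshot = {
        "env": dict(os.environ),
        "uid": os.getuid(),
        "cwd": os.getcwd(),
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(snapshot, fh, indent=2, sort_keys=True)
    return snapshot


def diff_environments(left, right):
    """Report variables present on one side only, or with different values."""
    left_env, right_env = left["env"], right["env"]
    return {
        "only_left": sorted(set(left_env) - set(right_env)),
        "only_right": sorted(set(right_env) - set(left_env)),
        "changed": sorted(
            key
            for key in set(left_env) & set(right_env)
            if left_env[key] != right_env[key]
        ),
    }
```

Calling `snapshot_environment` once from an interactive Bash session and once from inside a scheduled task, then passing the two loaded snapshots to `diff_environments`, would show what differs. Variables such as `PATH` or, on WSL 2, `WSL_INTEROP` would be the first ones to look at; that either of them matters in this case is an assumption, not something this thread establishes.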

@potiuk
Member

potiuk commented Dec 17, 2020

Too bad - maybe someone will answer it. I am not sure about the other question - maybe it makes sense to ask one question per thread? Then people will be able to focus on one thing and maybe someone will answer it.

@oferze
Contributor Author

oferze commented Dec 21, 2020

Thanks; I've just created a question, but may I note that the question is not separate from the issue; it's just a means to help the community and myself figure out the root cause. This will ultimately lead to a fix in either Airflow or WSL2, depending on the findings.

It can indeed be a WSL2 issue but I suspect it's more likely something with Airflow, since the way it runs BashOperators seems to be slightly different than "normal" Bash.

@oferze
Contributor Author

oferze commented Dec 21, 2020

Furthermore, if I try to open an issue with WSL2, they will very rightfully say that since Bash executes these Windows programs successfully, it must be something within airflow itself which executes Bash commands differently.

So I think this place is the best avenue of trying to solve this...

@potiuk
Member

potiuk commented Dec 21, 2020

Sure - just please note that Airflow is not "guaranteed" to work on WSL2. WSL2 might be used for development of Airflow itself, but this is not the "target" execution environment for any production use. So while you might get help from someone who finds the problem, it's not really "expected" that this case is going to work.

@oferze
Contributor Author

oferze commented Dec 21, 2020

The project is open source and not for profit... the whole idea is to make the product thrive by figuring out cases that should legitimately work but don't. More and more users are using WSL / WSL2. The only way to invoke a Windows program from airflow is via WSL. Meaning, you can't just put airflow on a Linux VM or bare metal and invoke Windows programs, because it won't work. By the same token, I should probably not "expect" anyone to help with a genuine inner-Linux issue either, because the project is mainly volunteer based.

Airflow is a product, a platform, for scheduling, triggering and executing DAGs so it runs on its designed environment, Linux, but I think that by ruling out support for Windows targets (not Windows as an OS for airflow - Windows programs as targets to airflow tasks) it effectively sends a message: "run it inside the Linux world only".

@potiuk
Member

potiuk commented Dec 21, 2020

> Airflow is a product, a platform, for scheduling, triggering and executing DAGs so it runs on its designed environment, Linux, but I think that by ruling out support for Windows targets (not Windows as an OS for airflow - Windows programs as targets to airflow tasks) it effectively sends a message: "run it inside the Linux world only".

This is precisely what Airflow is right now. Linux is the only 'execution environment' for Airflow. There are open issues to improve Windows support - for example #12874 and #10388 - but those are not being actively worked on for now.

We are going to discuss the scope of Airflow 2.1 right after New Year's, and I will make sure to mention Windows support as a possibly important topic to cover. But until this is picked up by someone in the community, the most you can get is support from other community members who are also on Windows.

Sorry if that is not helpful, but this is the current state. Maybe you can also start a discussion thread on the devlist of Airflow about that and see if there is interest in improving support for Windows - maybe there will be some other community members who would like to join forces and improve Windows support.

@oferze
Contributor Author

oferze commented Dec 21, 2020

Thanks for raising it.

It works on a different environment so I think it's environment/permission based (even though I mentioned I'm root all the way) rather than code based.

The other takeaway might be to add some more environment-related info when a Bash command fails, to STDERR / STDOUT.
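
As a rough illustration of that idea, here is a standalone sketch that runs a command through `bash -c` and logs some context next to the exit code. It is not Airflow's code; the function name, the logger name, and the choice of what to log (working directory, user id, `PATH`) are assumptions made for this example.

```python
import logging
import os
import subprocess

log = logging.getLogger("bash_diagnostics")  # hypothetical logger name


def run_bash_with_diagnostics(command, env=None):
    """Run `command` via `bash -c` and log context that a bare exit code hides."""
    run_env = dict(os.environ if env is None else env)
    log.info("cwd=%s uid=%s", os.getcwd(), os.getuid())
    log.info("PATH=%s", run_env.get("PATH"))
    completed = subprocess.run(
        ["bash", "-c", command],
        env=run_env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so error text lands in the same stream
        text=True,
    )
    for line in completed.stdout.splitlines():
        log.info("output: %s", line)
    if completed.returncode != 0:
        log.error(
            "command %r exited with return code %s", command, completed.returncode
        )
    return completed.returncode, completed.stdout
```

Running the failing command through a helper like this from inside a task would at least show which user, directory and `PATH` the command actually ran with.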

UPDATE: I've read the 2 issues you linked to; they talk about having Windows host airflow itself.

@potiuk
Member

potiuk commented Dec 21, 2020

Yep. But I think those are very much related - i.e. the same people might be interested in solving both issues.

BTW - maybe you can simply try to add this info yourself and make a PR. There are not many people in the community who run Airflow on Windows - it is not very common. But that might be a good start for you to dig a bit deeper and try to add more logging yourself. Airflow is a community-driven project, so you might make it your first contribution if you find a fix to the problem. It's not as complex as you might think. This is all Python, so you can edit the Python code directly in your installation. The code is here for bash operator and, depending on how you installed it, you will find airflow installed under one of the paths displayed by python -m site.

Then you can add your own logging information there and do some more experimentation with it.

@oferze
Contributor Author

oferze commented Dec 22, 2020

Yes, I've already inspected bash_operator.py. I saw that it's using a subprocess etc., but couldn't figure out how to tweak it to emit more detailed logging or to expose what different environment a subprocess runs in. I'm not a Python developer; I can google of course and try debugging my way through - which I'll do - but I was hoping someone within the project could immediately point out the environmental difference when running via airflow scheduler / Python subprocess vs. plain bash.

@potiuk
Member

potiuk commented Dec 22, 2020

I don't think we have anyone who uses this setup in the core project team. We could also try googling (which I did) but I believe at this stage some more experimentation is needed.

@potiuk
Member

potiuk commented Dec 22, 2020

This is likely more related to the way Windows treats processes spawned as sub-processes of such a process - so this is not an "airflow" issue but rather a subprocess issue in this setup. We fork the processes when we are running the task - so this might be the issue. You can try to do export AIRFLOW__CORE__EXECUTE_TASKS_NEW_PYTHON_INTERPRETER=True before starting Airflow to see if it changes anything (this spawns new processes rather than forking). If that works for you, then we can at least narrow down the problem.
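
To make the fork-versus-new-interpreter distinction concrete, here is a small standalone sketch using only the standard library. It does not use Airflow; the marker module name is hypothetical and exists only for the demonstration. A forked child starts with a copy of the parent's in-memory state, while a freshly started interpreter does not.

```python
import os
import subprocess
import sys
import types

MARKER = "parent_only_marker"  # hypothetical name, used only for this demo
sys.modules[MARKER] = types.ModuleType(MARKER)  # state that lives only in this process


def marker_visible_after_fork():
    """Fork this process and ask the child whether it can see the marker."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: report through the pipe, then exit without cleanup handlers
        os.close(read_fd)
        os.write(write_fd, b"1" if MARKER in sys.modules else b"0")
        os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        answer = reader.read()
    os.waitpid(pid, 0)
    return answer == b"1"


def marker_visible_in_new_interpreter():
    """Start a fresh Python interpreter and ask it the same question."""
    code = f"import sys; print(int({MARKER!r} in sys.modules))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "1"
```

The forked child reports the marker as visible and the new interpreter does not. Which piece of inherited state, if any, would interfere with launching Windows executables is not known from this thread; the sketch only shows why the two start-up modes can behave differently.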

@oferze
Contributor Author

oferze commented Dec 23, 2020

Thanks for the idea! For now I've re-built a fresh Ubuntu machine with Airflow 2 and the problem doesn't exist. If it happens again in the future, I'll try that env flag. But I'm not sure it's a Windows thing, since I have another Windows machine (running Ubuntu 18.04, as opposed to my "problematic" WSL2 that was running Ubuntu 20.04) which works just fine.

@potiuk potiuk closed this as completed Jun 17, 2021
@BrunoSerafim

I had the same issue, and I believe the problem is how the BashOperator parses the shell script. In the Airflow BashOperator code, the execute function tries to run "bash" + command, and when you run "bash *.exe" in WSL you get the error "cannot execute binary file". So either adapting the BashOperator to handle this case (shell scripts that start with a Windows executable file) or creating your own operator (changing the execute function in the BashOperator code so that it runs only self.bash_command instead of ['bash', '-c', self.bash_command]) should solve the issue.
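
The difference between handing bash a binary as if it were a script and handing it a command string can be reproduced outside Airflow. The sketch below is a minimal illustration with hypothetical helper names. It uses the running Python interpreter as a stand-in for a Windows .exe, which is an assumption, and it does not confirm that this is what happens inside BashOperator.

```python
import shlex
import subprocess
import sys


def run_as_script_argument(binary):
    """`bash <binary>`: bash tries to read the binary file as a shell script."""
    return subprocess.run(
        ["bash", binary],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )


def run_as_command_string(command):
    """`bash -c <command>`: bash parses the string and executes the program it names."""
    return subprocess.run(
        ["bash", "-c", command],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )


if __name__ == "__main__":
    binary = sys.executable  # stand-in for an executable such as a Windows .exe
    as_script = run_as_script_argument(binary)
    as_command = run_as_command_string(f"{shlex.quote(binary)} -c pass")
    print("bash <binary>     ->", as_script.returncode, as_script.stderr.strip())
    print("bash -c <command> ->", as_command.returncode)
```

On a typical Linux system the first call fails with a non-zero return code because bash refuses to interpret a binary file, while the second call succeeds.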

@oferze
Contributor Author

oferze commented Jul 20, 2021

@BrunoSerafim thanks, I'll try!

@zarria

zarria commented Mar 9, 2022

I think this is one of the best explanations I have found for executing exe files via WSL in Airflow:

https://stackoverflow.com/questions/65402424/airflow-scheduler-fails-to-execute-windows-exe-via-wsl/69487192#69487192

Maybe a little bit late, but might be of some help to others

@d0al3x

d0al3x commented Apr 6, 2022

> I think this is one of the best explanations I have found for executing exe files via WSL in Airflow:
>
> https://stackoverflow.com/questions/65402424/airflow-scheduler-fails-to-execute-windows-exe-via-wsl/69487192#69487192
>
> Maybe a little bit late, but might be of some help to others

Thanks for this @zarria, it helped me!
