Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use gunicorn to serve logs generated by worker #17591

Merged
merged 6 commits into from
Aug 14, 2021
Merged

Conversation

mik-laj
Copy link
Member

@mik-laj mik-laj commented Aug 12, 2021

Flask build-in webserver is not intended to meet security and performance requirements for a production server. It should be used only for development.

We even see warnings when we start worker.

   WARNING: This is a development server. Do not use it in a production deployment.

For production deployment, we should production-ready WSGI server. In this contribution, I used Gunicorn because it is already used by the airflow webserver command, so we don't need to add a dependency.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

from flask import Flask, abort, request, send_from_directory
from itsdangerous import TimedJSONWebSignatureSerializer
from setproctitle import setproctitle

from airflow.configuration import conf


def flask_app():
flask_app = Flask(__name__)
def create_app():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def flask_app():
flask_app = Flask(__name__)
def create_app():
flask_app = Flask(__name__, static_folder=None)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't support this feature, so we can safely turn it off.

@mik-laj mik-laj requested a review from potiuk August 12, 2021 23:31
Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain a bit more about this change, why do we need it?

@potiuk
Copy link
Member

potiuk commented Aug 13, 2021

Can you explain a bit more about this change, why do we need it?

Yeah. Agree with @kaxil I figured it is after seeing this slack conversation: https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1626866127472700 - but commit message why we are doing it and what problem are we solving (what was the effect of running the dev server rather than gunicorn).

@mik-laj
Copy link
Member Author

mik-laj commented Aug 13, 2021

@potiuk @kaxil Updated PR description.

app.run(host='0.0.0.0', port=worker_log_server_port)
options = {
'bind': f"0.0.0.0:{worker_log_server_port}",
'workers': 2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need 2 workers or do away with just 1?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 should work too, but I configured 2 to failover in case one worker had problems.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah 2 is better just in case of memory errors/crashes. Just one small nit (in case somene has problems with memory usage etc.) i think it would be great to mention in the docs of worker that we are using Gunicorm and that the configuration options can be overridden by GUNiCORN_CMD_ARGS env variable https://docs.gunicorn.org/en/latest/settings.html#settings

It's not at all obvious from docs without looking at the code now that wlwe have separate Gunicorm processes forked and that you can configure their behaviour via ENV vars.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Added docs.

@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Aug 13, 2021
@@ -107,3 +107,16 @@ External Links
When using remote logging, users can configure Airflow to show a link to an external UI within the Airflow Web UI. Clicking the link redirects a user to the external UI.

Some external systems require specific configuration in Airflow for redirection to work but others do not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@potiuk potiuk merged commit bd4f0c3 into apache:main Aug 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants