Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timetable registration a la OperatorLinks #17931

Closed
uranusjr opened this issue Aug 31, 2021 · 3 comments · Fixed by #17989
Closed

Timetable registration a la OperatorLinks #17931

uranusjr opened this issue Aug 31, 2021 · 3 comments · Fixed by #17989
Assignees

Comments

@uranusjr
Copy link
Member

Currently (as implemented in #17414), timetables are serialised by their classes’ full import path. This works most of the time, but not in some cases, including:

  • Nested in class or function
  • Declared directly in a DAG file without a valid import name (e.g. 12345.py)

It’s fundamentally impossible to fix some of the cases (e.g. function-local class declaration) due to how Python works, but by requiring the user to explicitly register the timetable class, we can at least expose that problem so users don’t attempt to do that.

However, since the timetable actually would work a lot of times without any additional mechanism, I’m also wondering if we should require registration.

  1. Always require registration. A DAG using an unregistered timetable class fails to serialise.
  2. Only require registration when the timetable class has wonky import path. “Normal” classes work out of the box without registering, and user sees a serialisation error asking for registration otherwise.
  3. Don’t require registration. If a class cannot be correctly serialised, tell the user we can’t do it and the timetable must be declared another way.
@ashb
Copy link
Member

ashb commented Sep 2, 2021

The main reason I've previously gone down the "require classes the scheduler/webserver will load to require registration" is I have medium term plans to allow a dag definition to be submitted via an API (i.e. an Airflow user to post a JSON blob) -- and I don't want to open that up to allowing reverse-shell or similar by creating an instance of a random class.

@uranusjr
Copy link
Member Author

uranusjr commented Sep 2, 2021

So it seems like method 1 is desired? I don’t think it technically changes the equation, but sounds like always requiring registration is the safest route. (Although it does make writing customised timtable more difficult since now you can only use pre-defined timetable classes instead of inventing things right inside a DAG file.)

@ashb
Copy link
Member

ashb commented Sep 2, 2021

Although it does make writing customised timtable more difficult since now you can only use pre-defined timetable classes instead of inventing things right inside a DAG file.

Since the scheduler needs to load the timetable class, and Airflow tries very hard to never load dag code in to a long running process, I think not being able to define it in a DAG file is, sadly, desired behaviour.

You'd have to define it in a plugin which can live in plugins/ along side dags/, so users won't have to create a Python dist etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants