Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add back dag_run to docs #35142

Merged
merged 3 commits into from
Oct 31, 2023
Merged

Add back dag_run to docs #35142

merged 3 commits into from
Oct 31, 2023

Conversation

fritz-astronomer
Copy link
Contributor


Adds in dag_runs so it can be linked to and viewed - such as from the templates.rst page.

Should others be here? Like pool? 🤷

@potiuk potiuk added this to the Airflow 2.7.3 milestone Oct 23, 2023
@Taragolis
Copy link
Contributor

Are we want to make all methods of DagRun and DagRunNote as part of Airflow Public interface?
Should we explicitly set :meta private: for all internal methods?

@fritz-astronomer fritz-astronomer changed the title Add back dag_run to docs [CSE-1089] Add back dag_run to docs Oct 23, 2023
@potiuk
Copy link
Member

potiuk commented Oct 23, 2023

Are we want to make all methods of DagRun and DagRunNote as part of Airflow Public interface? Should we explicitly set :meta private: for all internal methods?

The last time when we added all the other classes we relied on what have been already "intended" - i.e _underscored methods and meta private ones.

Do you think of some concrete methods that we should also mark as private in this case? Maybe that is a good opportunity to mark them as private indeed.

docs/conf.py Outdated Show resolved Hide resolved
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
@dstandish
Copy link
Contributor

i wish we could actually flip it and say everything is private unless explicitly declared public

https://public.readthedocs.io/en/stable/

@Taragolis
Copy link
Contributor

Do you think of some concrete methods that we should also mark as private in this case? Maybe that is a good opportunity to mark them as private indeed.

Difficult to say which one should be accessed in context and which one shouldn't.
I think r/o property it is safe to use, a lot of others might break something if it run in runtime or it might required additional access to DB. So better double check it.

I assume that provide access to this methods to DAG Authors might be not a good idea. Some of them not documented, some of them could change behaviour of scheduled tasks

def update_state(
self, session: Session = NEW_SESSION, execute_callbacks: bool = True
) -> tuple[list[TI], DagCallbackRequest | None]:

@provide_session
def schedule_tis(
self,
schedulable_tis: Iterable[TI],
session: Session = NEW_SESSION,
max_tis_per_query: int | None = None,
) -> int:
"""
Set the given task instances in to the scheduled state.
Each element of ``schedulable_tis`` should have its ``task`` attribute already set.
Any EmptyOperator without callbacks or outlets is instead set straight to the success state.

def notify_dagrun_state_changed(self, msg: str = ""):
if self.state == DagRunState.RUNNING:
get_listener_manager().hook.on_dag_run_running(dag_run=self, msg=msg)
elif self.state == DagRunState.SUCCESS:
get_listener_manager().hook.on_dag_run_success(dag_run=self, msg=msg)
elif self.state == DagRunState.FAILED:
get_listener_manager().hook.on_dag_run_failed(dag_run=self, msg=msg)
# deliberately not notifying on QUEUED
# we can't get all the state changes on SchedulerJob, BackfillJob
# or LocalTaskJob, so we don't want to "falsely advertise" we notify about that

@provide_session
def task_instance_scheduling_decisions(self, session: Session = NEW_SESSION) -> TISchedulingDecision:
tis = self.get_task_instances(session=session, state=State.task_states)
self.log.debug("number of tis tasks for %s: %s task(s)", self, len(tis))

def set_state(self, state: DagRunState) -> None:
if state not in State.dag_states:

@potiuk
Copy link
Member

potiuk commented Oct 23, 2023

i wish we could actually flip it and say everything is private unless explicitly declared public

https://public.readthedocs.io/en/stable/

That was the idea (on module level) to only include the modules we want (with the risk that we will not add something like dag_run initially - it's always OK to add something we forgot),

But on member level it is not possible with autoapi, I am afraid https://sphinx-autoapi.readthedocs.io/en/latest/reference/config.html

@potiuk
Copy link
Member

potiuk commented Oct 23, 2023

Difficult to say which one should be accessed in context and which one shouldn't.

Yeah. The list might be a good start.

@Taragolis
Copy link
Contributor

@fritz-astronomer You should manually add DagRun toctree into the Public Interface of Airflow, some example: https://github.com/apache/airflow/pull/35032/files#diff-4c4bd448d0ce0ccad92e624513f28e532403582532f05fc4a53b0141e2f7acd7

@Taragolis
Copy link
Contributor

Yeah. The list might be a good start.

Ideally we should provide some copy of DagRun (as well as others) into task context. Something like pydantic models which are use in internal API. But I think we can’t forced r/о context until:

  • AIP-44 completed
  • Airflow 3.0

@potiuk
Copy link
Member

potiuk commented Oct 24, 2023

Yeah. The list might be a good start.

Ideally we should provide some copy of DagRun (as well as others) into task context. Something like pydantic models which are use in internal API. But I think we can’t forced r/о context until:

  • AIP-44 completed
  • Airflow 3.0

This is actually one of the "soft" mechanism that AIP-44 aims to do. The DagRun | DagRunPydantic in Context.pyi is a way how users can check if they are only using the "public" and "static" parts of the dagrun via context. It's been a very bad idea to expose DB objects with some logic via context, but I agree we cannot do much until either AIP-44 or Airflow 3.0 becaue this might be heavily breaking change .

BTW. In a way - AIP-44 will be a step towards this compatibility breaking if we decide to exclusively use Pydantic vs. DB object for the future (if any) Airlfow 3.0.

When you will enable isolation in AIP-44, it will be "mostly" compatible for existing DAGs / custom operators etc. (that was the goal) but any direct DB features will be disallowed, so switching 2.x -> DB API (possibly with a gradual "soft" mechanism that will allow to migrate dag-by-dag) -> 3.x is quite a nice migration path.

@Taragolis
Copy link
Contributor

In a way - AIP-44 will be a step towards this compatibility breaking if we decide to exclusively use Pydantic vs. DB object for the future (if any) Airlfow 3.0.

I believe that this would prevent a some of non-obvious ways to break scheduler pipeline. That is not something new when DAG authors tried to change states in the middle of task execution or create some Airflow objects thought Models (and sessions when they found that code from SO doesn't work anymore).

However right now better we could do it is hide methods/properties from Auto-API

@potiuk
Copy link
Member

potiuk commented Oct 24, 2023

However right now better we could do it is hide methods/properties from Auto-API

agree

@hussein-awala hussein-awala changed the title [CSE-1089] Add back dag_run to docs Add back dag_run to docs Oct 24, 2023
@fritz-astronomer
Copy link
Contributor Author

fritz-astronomer commented Oct 26, 2023

Updated Public Interface Doc. Screenshots of locally rendered docs (as testing):

image image

Ready to merge, from me ✅

@eladkal eladkal added the type:doc-only Changelog: Doc Only label Oct 29, 2023
@ferruzzi ferruzzi merged commit ab80ca8 into apache:main Oct 31, 2023
44 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants