AIRFLOW-128 Optimize and refactor process_dag #1514

Merged: 1 commit merged into apache:master on Jun 1, 2016

Conversation

@bolkedebruin (Contributor) commented May 18, 2016:

This addresses AIRFLOW-128.

@aoen @artwr @mistercrunch @r39132 @jlowin : ready for review.

Goals:

  • Improve readability of the code and generic assumptions (getters should not change state) for DagRuns
  • Improve robustness and lower the risk of race conditions in the scheduler
  • Reduce the number of calls to the database and limit connections in the scheduler
  • Identify speed optimization possibilities

What has changed:

  • Two new TaskInstance states have been introduced: "REMOVED" and "SCHEDULED". REMOVED is set when task instances are encountered that no longer exist in the DAG. This happens when a DAG is changed (i.e. a new version). The "REMOVED" state exists for lineage purposes. "SCHEDULED" is used when a Task that did not have a state before is sent to the executor. It is used by both the scheduler and backfills. This state almost removes the race condition that exists when using multiple schedulers: because UP_FOR_RETRY is managed by the TaskInstance (I think that is the wrong place), the race condition still exists for that state.
  • get_active_runs was a getter that was also writing to the database. This patch refactors get_active_runs into two different functions that are now part of DagRun: 1) update_state updates the state of the dagrun based on the taskinstances of the dagrun; 2) verify_integrity checks and updates the dag run based on whether the dag contains new or missing tasks. (A rough usage sketch follows this list.)
  • DagRun.update_state has been updated to not call the database twice for the same functions. This reduces the time spent here by 50% in certain cases when a Dag has many tasks that need to be evaluated. Still, this needs to be faster: for Dags with many tasks the aggregation query in TaskInstance.are_dependencies_met is very expensive. It should be refactored.
  • process_dag has been updated to use these functions and cleaned up, making it much more readable. Tasks are now properly locked by the database. I have played with multiprocessing here (on dagruns and taskinstances) but left it out for now. Fixing the above will help more, I think.
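A rough usage sketch of how these pieces fit together (this is not the actual process_dag implementation; helper names, signatures and the import path are assumptions for illustration only):

# Simplified sketch; error handling, concurrency limits and session management are omitted.
from airflow.utils import State  # moved to airflow.utils.state in later Airflow versions

def process_dag_sketch(dag, executor, session):
    for run in get_running_dagruns(dag, session=session):  # hypothetical query helper
        run.verify_integrity(session=session)  # reconcile TIs with the DAG (new tasks / REMOVED)
        run.update_state(session=session)      # derive the DagRun state from its task instances
        if run.state != State.RUNNING:
            continue
        for ti in run.get_task_instances(state=None, session=session):  # signature assumed
            if ti.are_dependencies_met(session=session):
                ti.state = State.SCHEDULED     # claim the TI before handing it to the executor
                session.merge(ti)
                executor.queue_task_instance(ti)
    session.commit()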

Stats:

  • New scheduler: time spent at earlier stages is a bit higher, due to eager creation of TaskInstances
  • Old scheduler: MAX is higher because "are_dependencies_met()" is called twice
  • New scheduler: fluctuates a bit more due to database locking and "are_dependencies_met" scanning the table (it needs to wait for the lock)

Old:

2016-05-23 11:17:39,031 INFO - Loop took: 0.018685 seconds
2016-05-23 11:17:44,022 INFO - Loop took: 0.013455 seconds
2016-05-23 11:17:49,033 INFO - Loop took: 0.018868 seconds
2016-05-23 11:17:54,031 INFO - Loop took: 0.019578 seconds
2016-05-23 11:17:59,024 INFO - Loop took: 0.013051 seconds
2016-05-23 11:18:05,026 INFO - Loop took: 1.010049 seconds
2016-05-23 11:18:10,399 INFO - Loop took: 1.390761 seconds
2016-05-23 11:18:32,112 INFO - Loop took: 18.104715 seconds
2016-05-23 11:19:03,432 INFO - Loop took: 31.089109 seconds
2016-05-23 11:19:14,140 INFO - Loop took: 10.492135 seconds
2016-05-23 11:19:38,339 INFO - Loop took: 24.05553 seconds
2016-05-23 11:20:06,281 INFO - Loop took: 27.887196 seconds
2016-05-23 11:20:30,215 INFO - Loop took: 23.9155 seconds
2016-05-23 11:20:53,953 INFO - Loop took: 23.375444 seconds
2016-05-23 11:21:29,168 INFO - Loop took: 35.191994 seconds
2016-05-23 11:22:42,276 INFO - Loop took: 72.384736 seconds
2016-05-23 11:23:06,276 INFO - Loop took: 23.831495 seconds
2016-05-23 11:23:38,852 INFO - Loop took: 32.333608 seconds

New:

2016-05-23 11:26:12,021 INFO - Loop took: 0.011257 seconds
2016-05-23 11:26:17,031 INFO - Loop took: 0.018259 seconds
2016-05-23 11:26:22,021 INFO - Loop took: 0.01233 seconds
2016-05-23 11:26:27,026 INFO - Loop took: 0.017952 seconds
2016-05-23 11:26:32,031 INFO - Loop took: 0.017606 seconds
2016-05-23 11:26:37,707 INFO - Loop took: 0.697367 seconds
2016-05-23 11:26:43,268 INFO - Loop took: 1.255278 seconds
2016-05-23 11:27:01,234 INFO - Loop took: 14.225399 seconds
2016-05-23 11:27:03,832 INFO - Loop took: 2.580292 seconds
2016-05-23 11:27:35,556 INFO - Loop took: 29.534056 seconds
2016-05-23 11:27:55,896 INFO - Loop took: 20.321862 seconds
2016-05-23 11:28:10,192 INFO - Loop took: 14.250471 seconds
2016-05-23 11:28:40,778 INFO - Loop took: 30.337702 seconds
2016-05-23 11:28:49,003 INFO - Loop took: 8.135393 seconds
2016-05-23 11:29:09,132 INFO - Loop took: 19.923375 seconds
2016-05-23 11:29:46,856 INFO - Loop took: 37.393256 seconds
2016-05-23 11:30:30,984 INFO - Loop took: 43.79019 seconds
2016-05-23 11:31:09,856 INFO - Loop took: 38.444254 seconds
2016-05-23 11:31:31,164 INFO - Loop took: 21.061177 seconds
2016-05-23 11:32:29,776 INFO - Loop took: 58.153763 seconds
2016-05-23 11:33:12,128 INFO - Loop took: 42.105758 seconds
2016-05-23 11:33:50,796 INFO - Loop took: 38.137385 seconds
2016-05-23 11:34:40,290 INFO - Loop took: 49.115855 seconds
2016-05-23 11:35:22,900 INFO - Loop took: 42.269646 seconds
2016-05-23 11:35:38,456 INFO - Loop took: 15.453262 seconds

** Note: unit tests have been added to cover process_dag **

Dag used for testing:

  • Please note that with both schedulers this Dag does not even finish. It only finishes properly with the fully implemented new scheduler (including connecting dagruns by using previous ids).
from datetime import timedelta, datetime
from airflow.models import DAG, Pool
from airflow.operators import BashOperator, SubDagOperator, DummyOperator
from airflow.executors import SequentialExecutor
import airflow


# -----------------------------------------------------------------\
# DEFINE THE POOLS
# -----------------------------------------------------------------/
session = airflow.settings.Session()
for p in ['test_pool_1', 'test_pool_2', 'test_pool_3']:
    pool = (
        session.query(Pool)
        .filter(Pool.pool == p)
        .first())
    if not pool:
        session.add(Pool(pool=p, slots=8))
        session.commit()


# -----------------------------------------------------------------\
# DEFINE THE DAG
# -----------------------------------------------------------------/

# Define the Dag Name. This must be unique.
dag_name = 'hanging_subdags_n16_sqe'

# Default args are passed to each task
default_args = {
    'owner': 'Airflow',
    'depends_on_past': False,
    'start_date': datetime(2016, 4, 10),  # was 04; leading-zero literals are invalid in Python 3
    'retries': 0,
    'retry_interval': timedelta(minutes=5),
    'email': ['your@email.com'],
    'email_on_failure': True,
    'email_on_retry': True,
    'wait_for_downstream': False,
}

# Create the dag object
dag = DAG(dag_name,
          default_args=default_args,
          schedule_interval='0 0 * * *'
          )

# -----------------------------------------------------------------\
# DEFINE THE TASKS
# -----------------------------------------------------------------/


def get_subdag(dag, sd_id, pool=None):
    subdag = DAG(
        dag_id='{parent_dag}.{sd_id}'.format(
            parent_dag=dag.dag_id,
            sd_id=sd_id),
        params=dag.params,
        default_args=dag.default_args,
        template_searchpath=dag.template_searchpath,
        user_defined_macros=dag.user_defined_macros,
    )

    t1 = BashOperator(
        task_id='{sd_id}_step_1'.format(
            sd_id=sd_id
        ),
        bash_command='echo "hello" && sleep 1',
        dag=subdag,
        pool=pool
    )

    t2 = BashOperator(
        task_id='{sd_id}_step_two'.format(
            sd_id=sd_id
        ),
        bash_command='echo "hello" && sleep 2',
        dag=subdag,
        pool=pool
    )

    t2.set_upstream(t1)

    sdo = SubDagOperator(
        task_id=sd_id,
        subdag=subdag,
        retries=0,
        retry_delay=timedelta(seconds=5),
        dag=dag,
        depends_on_past=False,
        executor=SequentialExecutor()
    )

    return sdo

start_task = DummyOperator(
    task_id='start',
    dag=dag
)

for n in range(1, 17):
    sd_i = get_subdag(dag=dag, sd_id='level_1_{n}'.format(n=n), pool='test_pool_1')
    sd_ii = get_subdag(dag=dag, sd_id='level_2_{n}'.format(n=n), pool='test_pool_2')
    sd_iii = get_subdag(dag=dag, sd_id='level_3_{n}'.format(n=n), pool='test_pool_3')

    sd_i.set_upstream(start_task)
    sd_ii.set_upstream(sd_i)
    sd_iii.set_upstream(sd_ii)

@bolkedebruin force-pushed the process_dag branch 2 times, most recently from dab8009 to 320285b on May 19, 2016
@@ -1061,6 +1060,7 @@ def are_dependencies_met(

task = self.task

logging.info("Checkpoint A")
Review comment (Contributor): Assuming this is for debugging, and will go away before the final merge.

@bolkedebruin (Author): Absolutely, everything after my initial commit in this PR is actually profiling stuff (i.e. WIP). A lot of time is spent in are_dependencies_met as it now iterates over all tasks every time.

@landscape-bot: Repository health decreased by 0.48% when pulling 0a84f4a on bolkedebruin:process_dag into ccfc4c8 on apache:master.
@landscape-bot: Repository health decreased by 0.42% when pulling 632ed76 on bolkedebruin:process_dag into ccfc4c8 on apache:master.
@landscape-bot: Repository health decreased by 0.50% when pulling b0bfa3f on bolkedebruin:process_dag into ccfc4c8 on apache:master.
@landscape-bot: Repository health decreased by 0.43% when pulling e60fa08 on bolkedebruin:process_dag into ccfc4c8 on apache:master.
@landscape-bot: Repository health decreased by 0.52% when pulling 726de7c on bolkedebruin:process_dag into ccfc4c8 on apache:master.
@landscape-bot: Repository health decreased by 0.71% when pulling d5d63bc on bolkedebruin:process_dag into ccfc4c8 on apache:master.
@landscape-bot: Repository health decreased by 0.52% when pulling 529ceb2 on bolkedebruin:process_dag into ccfc4c8 on apache:master.
@landscape-bot: Repository health decreased by 0.48% when pulling cf20a11 on bolkedebruin:process_dag into ccfc4c8 on apache:master.
@landscape-bot: Repository health decreased by 0.48% when pulling dfd18bc on bolkedebruin:process_dag into ccfc4c8 on apache:master.
@landscape-bot: Repository health decreased by 0.48% when pulling aad7554 on bolkedebruin:process_dag into ccfc4c8 on apache:master.

@bolkedebruin changed the title from "AIRFLOW-128 Refactor get_active_runs into DagRun and reduce roundtrips to database in process_dag" to "AIRFLOW-128 Optimize and refactor process_dag" on May 22, 2016
@landscape-bot: Repository health decreased by 0.46% when pulling 696063b on bolkedebruin:process_dag into 88f895a on apache:master.
@landscape-bot: Repository health decreased by 0.40% when pulling 3f97b2b on bolkedebruin:process_dag into 88f895a on apache:master.

@bolkedebruin force-pushed the process_dag branch 2 times, most recently from 825866c to ff7ebba on May 23, 2016 08:54
@landscape-bot: Repository health decreased by 0.33% when pulling ff7ebba on bolkedebruin:process_dag into 88f895a on apache:master.
@landscape-bot: Repository health decreased by 0.25% when pulling 68ec4dc on bolkedebruin:process_dag into 88f895a on apache:master.
@landscape-bot: Repository health decreased by 0.25% when pulling 5a1fed9 on bolkedebruin:process_dag into 88f895a on apache:master.
@landscape-bot: Repository health decreased by 0.34% when pulling a774a2e on bolkedebruin:process_dag into 88f895a on apache:master.
@landscape-bot: Repository health decreased by 0.33% when pulling d0ed971 on bolkedebruin:process_dag into 88f895a on apache:master.

@criccomini (Contributor): Got it. Thanks! :)

        dag_id=dag.dag_id).filter(
            or_(DagRun.external_trigger == False,
                # add % as a wildcard for the like query
                DagRun.run_id.like(DagRun.ID_PREFIX+'%')))
Review comment (Member): my new preferred way to indent method chains is

qry = (
    session.query(func.max(DagRun.execution_date))
    .filter_by(dag_id=dag.dag_id)
    .filter(or_(
        DagRun.external_trigger == False,
        DagRun.run_id.like(DagRun.ID_PREFIX + '%')
    ))
)

@mistercrunch (Member):

@plypaul, how does this play with your upcoming PR? Can you submit yours or share the branch you've been working with, to get a sense of whether there's duplicated effort/overlap here?

@bolkedebruin (Author) commented Jun 1, 2016:

@plypaul @mistercrunch I'm assuming #1559 is referred to. That one does raise some questions with me and @jlowin, but I don't mind working with @plypaul to make it work together. It should not be too difficult as #1559 is basically wrapping the areas I touched.

Commit message:

This patch addresses the following issues:

get_active_runs was a getter that was also writing to the
database. This patch refactors get_active_runs into two
different functions that are part of DagRun. update_state
updates the state of the dagrun based on the taskinstances
of the dagrun. verify_integrity checks and updates the dag
run based on whether the dag contains new or missing tasks.

Deadlock detection has been refactored to ensure that the
database does not get hit twice; in some circumstances
this can reduce the time spent by 50%.

process_dag has been refactored to use the functions of
DagRun, reducing complexity and reducing pressure on the
database. In addition, locking now works properly under
the assumption that the heartrate is longer than the time
process_dag spends.

Two new TaskInstance states have been introduced: "REMOVED"
and "SCHEDULED". REMOVED is set when taskinstances are
encountered that no longer exist in the DAG. This happens
when a DAG is changed (i.e. a new version). The "REMOVED"
state exists for lineage purposes. "SCHEDULED" is used when
a Task that did not have a state before is sent to the
executor. It is used by both the scheduler and backfills.
This state almost removes the race condition that exists
when using multiple schedulers: because UP_FOR_RETRY is
managed by the TaskInstance (I think that is the wrong
place), it still exists for that state.
@asfgit merged commit b18c995 into apache:master on Jun 1, 2016
@plypaul (Contributor) commented Jun 1, 2016:

Yeah, as @bolkedebruin mentioned, this should be mostly compatible with #1559. To clarify, what's the difference between the SCHEDULED and QUEUED states?

@bolkedebruin (Author):

@plypaul SCHEDULED is set at handover from the scheduler to the executor only. It can only be set by the scheduler. It prevents race conditions. Ideally, it should always be the state of the task when it is sent to the executor. It isn't now, due to UP_FOR_RETRY being handled by the TI instead of in the scheduler.

QUEUED indicates that a ti is waiting for a slot in the executor, from either a pool or max parallelism. It is a bit ambiguous as it is managed in different locations. Some of the questions @jlowin and I have around #1559 are due to this ambiguity and your broader use of QUEUED.

@criccomini (Contributor):

@bolkedebruin so is the order SCHEDULED -> QUEUED?

@plypaul (Contributor) commented Jun 3, 2016:

Can you elaborate on the race condition that the SCHEDULED state helps to resolve?

@criccomini (Contributor) commented Jun 3, 2016:

@plypaul via https://cwiki.apache.org/confluence/display/AIRFLOW/Scheduler+Basics

The scheduler processes tasks that have a state of NONE, QUEUED, and UP_FOR_RETRY. NONE is a newly created TaskInstance, QUEUED is a task that is waiting for a slot in an executor, and UP_FOR_RETRY means a task that failed before but needs to be retried. At the moment the scheduler will not change the state of a Task. This creates the possibility of a race condition: if the executor cannot execute a task quickly enough, its state will not be updated and it can get scheduled again. Due to the loop times being quite large in complex cases, this won't occur too often. As people try to limit scheduler loop times, or just prefer the LocalExecutor over the CeleryExecutor, they will run multiple schedulers at the same time, which increases the chances of this happening.

For this reason a "SCHEDULED" state is proposed in one of the PRs. This will not fully close the door on this race condition, because TaskInstances evaluate their own state in the case of UP_FOR_RETRY (i.e. checking if they should run now). Normally one would move this check to the scheduler, were it not for the fact that we have backfills...

Basically, without a SCHEDULED state, the task is given to the executor. If the scheduler loops around fast enough again (or you have multiple schedulers running) before the executor updates the task's state, the scheduler will schedule it again, since the task's state hasn't changed (and it's still runnable).
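In other words, the SCHEDULED state lets the scheduler claim a task instance before handing it over. A minimal sketch of that guard (the state names follow this PR; everything else, including the import path and executor method, is illustrative):

from airflow.utils import State  # import path differs in later Airflow versions

# States the scheduler is willing to (re)schedule. SCHEDULED is deliberately absent, so a
# fast second loop or a second scheduler skips a task instance that was already handed over.
SCHEDULABLE = (None, State.QUEUED, State.UP_FOR_RETRY)

def maybe_send(ti, executor, session):
    if ti.state not in SCHEDULABLE:
        return                       # already claimed elsewhere: do not queue it twice
    ti.state = State.SCHEDULED       # claim it
    session.merge(ti)
    session.commit()                 # once committed, other schedulers see SCHEDULED and skip it
    executor.queue_task_instance(ti)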

@plypaul (Contributor) commented Jun 3, 2016:

I share the same concern as @jlowin. Schedulers can be restarted at random points for deployment / instance replacements, and having orphaned tasks is something that needs to be addressed.

@plypaul (Contributor) commented Jun 3, 2016:

Prior to adding the SCHEDULED state, I take it that you were seeing task instances getting queued multiple times in the executor? Since the code in TaskInstance.run() seems to (somewhat) mitigate this by detecting the RUNNING state, what sort of issues were you seeing?

@bolkedebruin (Author):

@plypaul arbitrary means a SIGKILL in this case, which leaves any process in an undetermined state. So the chances of it occurring are small (a kill of the scheduler also needs to coincide with the window between the change to SCHEDULED and the executor receiving the task). There is also a small chance the executor gets killed, which would leave the task in limbo too.

To remove number 1, the with_for_update needs to wrap the "send" to the executor. The transaction to the db will then fail in case of a kill and the state will not be changed. To remove number 2, probably a kind of garbage collection needs to be added (at the time of checking for zombie tasks?).
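Roughly, wrapping the handover in the row lock could look like this (a sketch under the assumptions above; the helper name is made up and only standard SQLAlchemy calls are used):

from airflow.models import TaskInstance
from airflow.utils import State  # import path differs in later Airflow versions

def send_with_lock(ti, executor, session):
    # Lock the TI row; if the scheduler dies before the commit, the transaction rolls
    # back and the state never becomes SCHEDULED, so nothing is orphaned.
    locked = (
        session.query(TaskInstance)
        .filter_by(dag_id=ti.dag_id, task_id=ti.task_id, execution_date=ti.execution_date)
        .with_for_update()
        .one()
    )
    executor.queue_task_instance(locked)  # hand over while the lock is held
    locked.state = State.SCHEDULED
    session.commit()                      # state change and handover succeed or fail together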

Chances are slim, and in general one should not go around kill -9'ing processes.

@bolkedebruin (Author) commented Jun 3, 2016:

On the occurrence of the race condition, please see the comment in the executor by @jlowin. As said, when people get more complex environments and start running two local executors, for example, the chance of this occurring would be there for every scheduler loop, so for it to occur at some point in time would be 99% certain. People might just not have seen it.

@plypaul (Contributor) commented Jun 8, 2016:

The condition where task instances can be orphaned doesn't require SIGKILL - if the scheduler is stopped by SIGTERM after the task is sent to the executor but before the task is run, all those task instances will be left in the SCHEDULED state, right?

This could be a problem in our setup where there can be 1000's of task instances waiting for a slot in the executor to run. That queue can take hours to clear, and if the scheduler is restarted at any point in that window, we'll have a bunch of orphaned tasks.

We also use the CeleryExecutor, which relies on an external queue mechanism. The queue can be cleared (for example, the Redis instance dies or needs to be restarted), and when this happens, we'll also have a bunch of orphaned tasks.

Overall, it would be great for Airflow to be resilient to restarts and failures. Machines die and services restart all the time, and the oncall who wakes up needs to have simple remediation procedures for handling the failure. Without a way to recover from these cases, it would make the oncall's life much more difficult.

@bolkedebruin (Author):

@plypaul in the case of celery that is a really short time frame: it gets set to SCHEDULED at the moment of sending it to the executor (thus celery). So if you get in between at that moment, yes, you can get an orphaned task, although chances imho are very slim. This can be further eliminated by surrounding the "send to executor" with the for_update block; this way the record won't get updated if sending to the executor fails.

In addition, the executor should set a state when it picks up the task. This way you can do a bit of garbage collection on tasks that have been in a certain state without a heartbeat for some time.

@plypaul (Contributor) commented Jun 8, 2016:

So this was my understanding of how the state changes with this PR:

  1. Scheduler processes DAGs, and if a task instance needs to be run, it is created in the ORM in the SCHEDULED state and added to the queue of tasks that should be sent to the executor. Task instances already in the SCHEDULED state do not get added to the queue.
  2. Once all the DAG files have been processed, task instances are examined and sent to the executor.
  3. The executor runs the task. When the task runs, the state of the task changes from SCHEDULED to RUNNING.

When there are many DAGs, step 1 can take a long time. In our case, it's about 6 minutes. If the scheduler exits between steps 1 and 2, there will be orphaned tasks and the 6 minute window is fairly large.

Also, the state of the task does not change when it's sent to the executor. The state of the task only changes when the task is actually run. With the CeleryExecutor, the task would stay in the SCHEDULED state until a worker becomes free, picks the task from the Redis queue, and runs it. On busy moments, the task can be in either the internal executor queue or the Redis queue for many minutes. If the scheduler restarts while the task is in the internal executor queue, there will be orphaned tasks. Likewise, if the Redis queue is cleared, there will be orphaned tasks.

@bolkedebruin (Author):

Ah, so yes, like I mentioned, we probably should move setting the SCHEDULED state to the moment the task is really sent to the executor, and indeed not at evaluation time.

And yes, at the moment the executor does not set a state, but it should.

I'll have a look at #1 as that is the biggest issue and think about a garbage collector.

@bolkedebruin (Author):

Btw, the issue with the scheduler can/should only occur on a SIGKILL; a SIGTERM allows for clean termination (which might need some work).

@plypaul (Contributor) commented Jun 8, 2016:

Considering the operational issues with the SCHEDULED state, it seems like the garbage collector is needed. Why don't we go back to the previous state logic, and then put in the state change + garbage collector together?

@bolkedebruin (Author):

Ha, that sounds like it is not needed (reverting, that is - working together is fine :) ). Applying the previous state logic would just mean also evaluating the SCHEDULED status, re-scheduling those taskinstances every time, and letting the taskinstance figure out at run time what to do. This is what the previous scheduler did with "none" states. That's kind of a one-liner.

What I would suggest is to move the "set to scheduled state" to when the taskinstance is sent to the executor. This resolves the orphaning when the scheduler is killed and is a small change.

For the garbage collector we can add a timestamp last_updated to the task_instance, which gets set every time the record is updated (or a state change happens). The scheduler can then do a simple sweep by SQL statement to set the state to "none", or to a new "reschedule" state, by comparing it against an arbitrary timeout value. This would catch orphaning down the line and would be future proof. Also a one-liner.
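A rough sketch of such a sweep (last_updated is the proposed column, which does not exist yet; the timeout value is arbitrary):

from datetime import datetime, timedelta
from airflow.models import TaskInstance
from airflow.utils import State  # import path differs in later Airflow versions

def reset_stale_scheduled_tis(session, timeout=timedelta(minutes=30)):
    cutoff = datetime.utcnow() - timeout
    (session.query(TaskInstance)
     .filter(TaskInstance.state == State.SCHEDULED,
             TaskInstance.last_updated < cutoff)   # hypothetical last_updated column
     .update({TaskInstance.state: None}, synchronize_session=False))
    session.commit()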

Or even better, we could ask the executor which tasks it knows about and compare that to the list of scheduled tasks; if they are not there, reschedule them. This is a little bit more work and might need to be combined with the above, as in the past the reporting by the executor was not really trustworthy.
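A sketch of that reconciliation against the executor (queued_tasks and running exist on BaseExecutor, but treating them as a complete picture is exactly the trust issue mentioned above):

from airflow.models import TaskInstance
from airflow.utils import State  # import path differs in later Airflow versions

def find_lost_scheduled_tis(session, executor):
    known = set(executor.queued_tasks) | set(executor.running)   # keyed by the TI key
    scheduled = (session.query(TaskInstance)
                 .filter(TaskInstance.state == State.SCHEDULED)
                 .all())
    return [ti for ti in scheduled if ti.key not in known]       # candidates to reschedule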

What do you think? I think I can have a patch for both over the weekend.

@plypaul (Contributor) commented Jun 9, 2016:

Changing the state of the task instance when the executor gets it reduces the window, but there are still cases where we can have a bunch of orphaned tasks. Hence, we want to try for a more robust solution.

As you point out, having the scheduler check the executor to see if a particular task instance has been submitted already, and only queue it if it hasn't been queued, seems like a simple solution to the original problem. With that solution, there wouldn't need to be an additional timeout, right?

@bolkedebruin (Author) commented Jun 9, 2016:

You are right about the possibility of orphaned tasks, but please approach them as two separate windows. Moving the "set SCHEDULED state" to when the task gets sent to the executor closes one, as the DB safeguards against a kill of the scheduler by not committing the transaction. The second one is the time between the handover from the executor to the worker by means of the MQ (and, after that, no different state being set).

Indeed, asking the executor for the information should work. The timeout would just defensively safeguard against the executor not reporting back correctly. It 'lost' tasks in the past, which is why @jlowin reverted some logic in process_queued_tasks before the release of 1.7.1.

So yes, I think we are on the right track :). Let's see if asking the executor solves the issue, but I do think scanning for orphaned tasks might also be nice to have, to safeguard against any future changes. I.e. the scan has a holistic view, whereas asking the executor gives only a particular view.

@bolkedebruin (Author):

@plypaul I did a first iteration in #1581 (untested); let's continue the discussion there.

state=State.unfinished(),
session=session
)
none_depends_on_past = all(t.task.depends_on_past for t in unfinished_tasks)
Review comment (Contributor): Should this be not t.task.depends_on_past?
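If the reviewer is right, the corrected line would presumably read as follows (shown only to illustrate the point of the comment; unfinished_tasks comes from the snippet above):

# True only when none of the unfinished tasks depends on the past.
none_depends_on_past = all(not t.task.depends_on_past for t in unfinished_tasks)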
