Should Worker.data_needed be priority-ordered? #5323

Closed
@gjoseph92


Worker.data_needed is the queue of tasks whose dependencies still need to be fetched. Currently it's plain FIFO, ordered by when each task was submitted to the worker. So in theory, if a higher-priority task is submitted after a lower-priority one, the lower-priority task will have its data fetched first, and will therefore probably run first.

I'm not sure this is much of an issue in practice. We tend to see the biggest priority-vs-FIFO ordering problems with root tasks, but since those have no dependencies, they're not relevant here. Still, it feels odd. The real question is how often tasks with dependencies get submitted to workers out of priority order.

Switching this to a priority heap would be pretty easy and probably pretty cheap. There's even a TODO for it:

```python
self.data_needed = deque()  # TODO: replace with heap?
```
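For illustration, here is a rough sketch of what a heap-backed replacement might look like. It is not the actual distributed code: the class name `DataNeededHeap` and its `push`/`pop` methods are hypothetical, and it assumes priorities are tuples where smaller values sort first. A monotonic counter breaks ties so that tasks with equal priority still come out in FIFO order.

```python
import heapq
from itertools import count


class DataNeededHeap:
    """Hypothetical stand-in for Worker.data_needed (illustrative only)."""

    def __init__(self):
        self._heap = []
        # Tie-breaker: keeps FIFO order among entries with equal priority
        # and avoids ever comparing the task keys themselves.
        self._counter = count()

    def push(self, priority, key):
        # Assumption: smaller priority tuples should be fetched first.
        heapq.heappush(self._heap, (priority, next(self._counter), key))

    def pop(self):
        _priority, _seq, key = heapq.heappop(self._heap)
        return key

    def __len__(self):
        return len(self._heap)


data_needed = DataNeededHeap()
data_needed.push((0, 1, 5), "low-priority-task")   # submitted first
data_needed.push((0, 1, 2), "high-priority-task")  # submitted later, sorts earlier
fetch_order = [data_needed.pop() for _ in range(len(data_needed))]
```

With a plain deque the first task submitted would be fetched first. Here `fetch_order` follows priority instead, at a cost of O(log n) per push and pop.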
