Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add task duration plot across dagruns #40755

Merged
merged 5 commits into from
Jul 30, 2024
Merged

Conversation

tirkarthi
Copy link
Contributor

closes: #40337
related: #40337

This implements multi-line chart to plot task duration across all dagruns in the grid. The duration includes only run duration and is stored as seconds. During rendering the y-axis value is converted to hh:mm:ss format along with the tooltip. In task duration for a specific task we can calculate max duration but calculating it across all instances means when an outlier task ran in hours that takes only seconds then deselecting the task from legend will still display all others in hours. Clicking on the data point takes the user to specific task instance detail tab.

I am facing an issue with moment import which I couldn't figure out. Any help is appreciated.

static/js/dag/details/task/AllTaskDuration.tsx:67:27 - error TS2686: 'moment' refers to a UMD global, but the current file is a module. Consider adding an import instead.

67       const runDuration = moment
                             ~~~~~~

static/js/dag/details/task/AllTaskDuration.tsx:83:12 - error TS2686: 'moment' refers to a UMD global, but the current file is a module. Consider adding an import instead.

83     return moment.utc(value * 1000).format("HH[h]:mm[m]:ss[s]");
              ~~~~~~

static/js/dag/details/task/AllTaskDuration.tsx:127:28 - error TS2686: 'moment' refers to a UMD global, but the current file is a module. Consider adding an import instead.

127           const duration = moment.utc(value * 1000);

Screenshot

image

echarts example : https://echarts.apache.org/examples/en/editor.html?c=line-stack&lang=js

@boring-cyborg boring-cyborg bot added area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues labels Jul 12, 2024
@tirkarthi
Copy link
Contributor Author

tirkarthi commented Jul 12, 2024

Another option that I thought will be cool is a stacked bar chart that shows the percentage of time each task spent and showing variations. Like a gantt view of all dagruns but normalized to 1 and shows task time percentage of dagrun duration. Demo link and screenshot.

Demo Link : stacked bar chart

image

@bbovenzi
Copy link
Contributor

Love the idea of showing task duration even when nothing is selected.

I'm still partial to a bar chart over a line graph. The discrete task instance or dag run is more important than the trendline. So the stacked bar chart is cool. Of course over a certain number of tasks, it isn't worth displaying. But maybe, we can actually merge the dag run and task graphs into a single view and provide various options to users to adjust the chart to best fit how each DAG works.

@ketozhang
Copy link
Contributor

Amongst the suggestion, just want to remind the use case of #40337

Currently, we can only view the duration of tasks individually, which makes it difficult to identify spikes in specific tasks, especially if the total DAG run duration appears normal

The proportional stacked bar chart would make the use case difficult. Classical stacked bar chart would better suit this use case, so does line chart—but I agree with @bbovenzi bar chart is better (lines are for trends).

@ketozhang
Copy link
Contributor

Clicking on the data point takes the user to specific task instance detail tab.

❤️ this. Do you predict the user clicking this would be more interested in the details tab or the durations tab (task durations bar chart)?

@tirkarthi
Copy link
Contributor Author

Clicking on the data point takes the user to specific task instance detail tab.

❤️ this. Do you predict the user clicking this would be more interested in the details tab or the durations tab (task durations bar chart)?

Details tab was my first guess but it could even be logs since user might want to know why it took sometime. It can be changed as PR is reviewed and I don't have strong opinions here.

@tirkarthi
Copy link
Contributor Author

It looks easier to implement switching between line and bar chart. In the latest commit I have added a checkbox similar to "show landing time" in run duration without persistence to make it easier to play around locally if someone is interested. I also wanted to see how the chart is for dags with lot of tasks where many of our dags have minimum 10 tasks and below is a sample dag to generate random sleep tasks.

Line chart to me looks unusable when there are lot of data points that finish around same time but useful to see trends as noted with fewer number of tasks with varying time. Bar chart also looks useful where user can click on the bar to get to task detail like line chart and looks better in some cases. With a lot of tasks the bars/lines can get crammed and might be harder to select the exact one without unselecting few tasks in the legend.

Few thoughts and questions :

  • Lot of tasks means legend can be larger and possibly collides with y axis. I tried various options in echarts like padding, nameGap but couldn't get it working.
  • Task groups don't filter down to further tasks as separate bars.
  • I have seen the chart go negative while running might be some issue in calculation of duration for running tasks.
  • Maybe keep the checkbox so that user can select the best representation as per the situation.
  • Tooltip could probably display only the relevant datapoint instead of time for everything. I am not sure how to select the right option for that in echarts.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.decorators import task, task_group

with DAG(
    dag_id="all_task_duration",
    start_date=datetime(2024, 7, 1),
    catchup=True,
    schedule_interval="@daily",
) as dag:

    def base(i, j):
        import random, time

        duration = random.randrange(i, j)

        for index in range(duration):
            time.sleep(1)
            print(index)

    @task
    def extract():
        base(10, 20)

    @task
    def load():
        base(20, 30)

    @task
    def transform():
        base(2, 5)

    @task
    def transform_1():
        base(2, 5)

    @task
    def transform_2():
        base(2, 5)

    @task
    def transform_3():
        base(2, 5)

    @task
    def transform_4():
        base(2, 5)

    @task
    def transform_5():
        base(2, 5)

    @task
    def transform_6():
        base(2, 5)

    @task
    def transform_7():
        base(2, 5)

    @task
    def transform_8():
        base(2, 5)

    @task
    def transform_9():
        base(2, 5)

    @task
    def transform_10():
        base(2, 5)

    @task
    def transform_11():
        base(2, 5)

    @task_group(group_id="my_task_group")
    def tg1():

        @task
        def tg11():
            base(11, 20)

        @task
        def tg12():
            base(20, 30)

        tg11() >> tg12()

    (
        extract()
        >> load()
        >> tg1()
        >> transform()
        >> transform_1()
        >> transform_2()
        >> transform_3()
        >> transform_4()
        >> transform_5()
        >> transform_6()
        >> transform_7()
        >> transform_8()
        >> transform_9()
        >> transform_10()
        >> transform_11()
    )

Screenshots :

image

image

Lot of tasks with same duration where bar chart looks better than line chart

image

image

@tirkarthi tirkarthi force-pushed the task-duration-all branch from 5f10a1d to 592e9d2 Compare July 18, 2024 13:50
@tirkarthi
Copy link
Contributor Author

Love the idea of showing task duration even when nothing is selected.
I'm still partial to a bar chart over a line graph. The discrete task instance or dag run is more important than the trendline. So the stacked bar chart is cool. Of course over a certain number of tasks, it isn't worth displaying. But maybe, we can actually merge the dag run and task graphs into a single view and provide various options to users to adjust the chart to best fit how each DAG works

Just thought to add about merging dagrun and task duration that sometimes the individual task duration doesn't really add to dagrun time since there could be cases where pool slot is not available, Airflow is down, maintenance etc where the task will execute fine once the resources/dependencies are met but the might not indicate an overall delay in the chart thus introducing discrepancy between the bar charts on the left of grid view not corresponding to the chart on the right.

@tirkarthi
Copy link
Contributor Author

@bbovenzi @ketozhang If someone can also help in fixing the typescript issue in the PR description moment import that would also help in making the PR green for further review. I am not sure what I am doing wrong here. The class name AllTaskDuration could also be renamed later to be something more meaningful.

Thanks

@bbovenzi
Copy link
Contributor

  • Consolidating the run and task charts. I think adding an additional sub-bar which is all the non-task run time would be really helpful for users!
  • I love selecting and deselecting various tasks. We might want to choose a maximum number of tasks to show for very large DAGs, maybe 20? Later, we can add some additional logic to consolidate all short duration tasks into a "Other" category to keep things readable.
  • I'm still not a fan of the line chart but toggling between is fine.

I'll pull down your branch later today and help with the ts work

@tirkarthi tirkarthi force-pushed the task-duration-all branch from 592e9d2 to d267aef Compare July 24, 2024 17:31
@eladkal eladkal added this to the Airflow 2.10.0 milestone Jul 25, 2024
@eladkal eladkal added the type:improvement Changelog: Improvements label Jul 25, 2024
Copy link
Contributor

@bbovenzi bbovenzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since 2.10 is due soon, let's get this merged and we can make enhancements in smaller PRs before release.

@bbovenzi bbovenzi merged commit f10fbe6 into apache:main Jul 30, 2024
48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues type:improvement Changelog: Improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request to Reintroduce Tasks Duration Multi Line Graph
4 participants