Add filter ratio dynamic metric #24

Open
wants to merge 5 commits into
base: main
from

Conversation

andreiNek

  • calcNodeMetrics, previously called in updateSqlNodeMetrics (for live updates) and calculateSql (for completed runs),
    is replaced by a new updateNodeMetrics function.
  • updateNodeMetrics accepts the node graph and can add extra insights based on Spark metrics and graph insights.
  • New logic for metric add-ons can be added there.
  • For now it makes a single call, to addFilterRatioMetric, which adds an extra metric named filter_ratio (a percentage) to Filter/Join nodes.
  • Basic use cases are implemented: no input, a stage followed by a filter, more than one input node with rows, and join filtering.
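For readers following along, here is a minimal sketch of how such a per-node ratio could be derived from the graph. It is not the code in this PR: the `SketchNode` and `SketchEdge` types and the `filterRatioFor` helper are hypothetical, and summing the rows of all direct inputs is an assumption about how the "more than one input node" case is handled. The caller is assumed to pick the Filter/Join nodes.

```typescript
// Hypothetical, simplified shapes; the project's real node and edge types differ.
interface SketchNode {
  id: number;
  outputRows?: number; // undefined when no row count was reported
}

interface SketchEdge {
  fromId: number;
  toId: number;
}

// Returns the percentage of input rows the node dropped, or undefined when it
// cannot be computed (no output count, no inputs, or inputs with zero rows).
function filterRatioFor(
  node: SketchNode,
  nodes: Map<number, SketchNode>,
  edges: SketchEdge[],
): number | undefined {
  const outputRows = node.outputRows;
  if (outputRows === undefined) return undefined;

  // Assumption: inputs are the direct predecessors, and their rows are summed.
  let totalInputRows = 0;
  for (const edge of edges) {
    if (edge.toId !== node.id) continue;
    const inputNode = nodes.get(edge.fromId);
    if (inputNode !== undefined && inputNode.outputRows !== undefined) {
      totalInputRows += inputNode.outputRows;
    }
  }
  if (totalInputRows === 0) return undefined; // avoids dividing by zero

  return ((totalInputRows - outputRows) / totalInputRows) * 100;
}
```

Returning `undefined` rather than 0 keeps "nothing was filtered" distinguishable from "the ratio is unknown", which matters for the no-input case.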

return updatedMetrics;
}

const filterRatio = ((totalInputRows - outputRows) / totalInputRows) * 100;
Contributor

Use the formatPercentage util method.

Contributor

Also, I think "filter ratio" reads as the opposite: if you filter 1000 from 1M, it's 0.1% filtered, not 99%. Maybe a clearer name would help here, like "filtered rows percentage".
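To make the naming question concrete, the two readings can be put side by side. This is only an illustrative sketch; the function names `filteredOutPercentage` and `keptPercentage` are hypothetical and do not exist in the PR.

```typescript
// Share of input rows that the node dropped. This matches the formula
// under review: ((totalInputRows - outputRows) / totalInputRows) * 100.
function filteredOutPercentage(inputRows: number, outputRows: number): number {
  return ((inputRows - outputRows) / inputRows) * 100;
}

// Share of input rows that survived the node.
function keptPercentage(inputRows: number, outputRows: number): number {
  return (outputRows / inputRows) * 100;
}

// 1,000,000 rows in, 1,000 rows out:
const droppedShare = filteredOutPercentage(1000000, 1000); // about 99.9
const keptShare = keptPercentage(1000000, 1000); // about 0.1
```

The two values always add up to 100, so whichever one is shown, the metric name should say which side it reports.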

Author

I didn't find a formatPercentage util I can use in TypeScript, but I found NumberFormat. I hope that is what you intended :)
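For illustration, assuming the NumberFormat mentioned here is the standard `Intl.NumberFormat` (the project may wrap a different helper), percentage formatting could look like the sketch below. The `formatPercent` name, the `en-US` locale, and the two-decimal limit are assumptions, not taken from the PR.

```typescript
// Intl.NumberFormat with style "percent" expects a fraction (0.5 -> "50%"),
// so a value already expressed in percent must be divided by 100 first.
const percentFormatter = new Intl.NumberFormat("en-US", {
  style: "percent",
  maximumFractionDigits: 2, // assumption: two decimals are enough for display
});

// Formats a value given in percent (0 to 100) for display.
function formatPercent(valueInPercent: number): string {
  return percentFormatter.format(valueInPercent / 100);
}
```

Creating the formatter once and reusing it avoids rebuilding it on every metric update.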

@@ -399,6 +400,7 @@ export function updateSqlNodeMetrics(

const notEffectedSqls = currentStore.sqls.filter((sql) => sql.id !== sqlId);
const runningSql = runningSqls[0];
const graph = generateGraph(runningSql.edges, runningSql.nodes);
Contributor

This means we are generating the graph for every metric update cycle. This is not ideal.
We should cache the graph if there is no change.
Please at least add a todo comment to cache the graph.

Author

Yes, I thought about it; my bad for forgetting the todo.
I think a map keyed by sqlId should work easily.
Detecting actual changes is a different matter, though. We could compare the node metrics before and after, or compare their hashes. What do you think?
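One possible shape for the cache discussed here, as a rough sketch only. The `GraphCache` class, `fingerprintOf`, and the simplified node and edge types are hypothetical. The sketch also assumes the graph's shape depends only on node ids and edges, so it fingerprints the structure instead of comparing node metrics as suggested above.

```typescript
// Hypothetical, simplified shapes; the project's real types are richer.
interface CacheNode { id: number }
interface CacheEdge { fromId: number; toId: number }

interface CachedGraph<G> {
  fingerprint: string;
  graph: G;
}

// Cheap structural fingerprint: changes when a node or edge is added or removed.
function fingerprintOf(nodes: CacheNode[], edges: CacheEdge[]): string {
  const nodePart = nodes.map((n) => n.id).sort((a, b) => a - b).join(",");
  const edgePart = edges.map((e) => `${e.fromId}>${e.toId}`).sort().join(",");
  return `${nodePart}|${edgePart}`;
}

class GraphCache<G> {
  private readonly entries = new Map<string, CachedGraph<G>>();

  // Returns the cached graph for sqlId when the structure is unchanged;
  // otherwise calls build() and stores the result.
  getOrBuild(
    sqlId: string,
    nodes: CacheNode[],
    edges: CacheEdge[],
    build: (edges: CacheEdge[], nodes: CacheNode[]) => G,
  ): G {
    const fingerprint = fingerprintOf(nodes, edges);
    const cached = this.entries.get(sqlId);
    if (cached !== undefined && cached.fingerprint === fingerprint) {
      return cached.graph;
    }
    const graph = build(edges, nodes);
    this.entries.set(sqlId, { fingerprint, graph });
    return graph;
  }
}
```

The `build` callback takes edges first to mirror the `generateGraph(runningSql.edges, runningSql.nodes)` call in the diff. Computing the fingerprint still costs a pass over nodes and edges on each cycle, so whether this beats regenerating the graph depends on how expensive graph construction is.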

2 participants