Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Experiment] Distinctive MemoryDataset view on flowchart #1706

Closed
wants to merge 1 commit into from

Conversation

rashidakanchwala
Copy link
Contributor

@rashidakanchwala rashidakanchwala commented Jan 16, 2024

Description

Suggested for quick experimentation by @datajoely: MemoryDatasets are non-persistent datasets, meaning they are not saved after a run. Some flowcharts contain numerous MemoryDatasets, and Kedro-viz currently lacks a way to differentiate them from persistent datasets. This PR introduces opacity to MemoryDatasets, making them easily distinguishable on the flowchart. The transparency also signifies their non-persistent nature.

Currently this feature is under an experiment flag.

Development notes

MemoryDatasets

QA notes

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

@datajoely
Copy link
Contributor

It's beautiful! I'd suggest this should be default behaviour already - but it's should be really quick to get some user validation of that

@inigohidalgo
Copy link

I think this is very nice functionality, speaking as someone who has a loot of memorydatasets in my pipeline. In my case it's not relevant, but I know some users use "dummy" inputs/outputs to force node order execution, I'm not sure if they tend to use memorydatasets or another type.

Maybe a user-configurable toggle per dataset type to fade/not fade?

@@ -171,6 +172,10 @@ export const drawNodes = function (changed) {
)
.classed('pipeline-node--data', (node) => node.type === 'data')
.classed('pipeline-node--task', (node) => node.type === 'task')
.classed(
'pipeline-node--memory-data',
(node) => flags?.diffMemoryDatasets && node?.dataset?.includes('memory')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Can we have a DatasetType check instead of string matching ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i think we will eventually refactor and do it properly. for now this is just a quick dirty experiment

@ravi-kumar-pilla
Copy link
Contributor

This looks great already to distinguish MemoryDataset. May be we can extend this to opt-in color-code dataset feature (we color datasets based on type - for easier debugging). Thank you !

@astrojuanlu
Copy link
Member

I really like this idea (more than #1707).

As I said there, I'm wondering if eventually we could provide a more generic customisation feature that would allow users to have more control over this, instead of adding a toggle.

@rashidakanchwala
Copy link
Contributor Author

@astrojuanlu -- I agree. After speaking to a few people, I do realise that while some people might like this, others might not find it so usable. They might want to differentiate other datasets instead. Perhaps we need to build some sort of customisability, as you mentioned in the other PR, to allow users to distinguish datasets rather than us doing it for them.

Ravi also mentioned color-coding datasets, and as we have too many datasets, giving each a color-code might end up being more confusing and less visually appealing? well, this is subjective, isnt it?

hence, providing them some sort of customisability where they can decide how they want to highlight or distinguish certain datasets over others would be better.

I will try and get more feedback on this short experiment. But keep the comments coming :)

@rashidakanchwala
Copy link
Contributor Author

Closing this experiment PR for now -- allowing users to differentiating datasets on kedro-viz seems like a good idea. We are going to try and do #1148 first and see how that picks up with users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants