[Experiment] Memory Datasets on the flowchart #1709

NeroOkwa · 2024-01-17T12:42:37Z

Description

MemoryDatasets are non-persistent datasets that are not saved after a run. Some users have flowcharts containing numerous MemoryDatasets, and Kedro-Viz currently lacks a way to show them and differentiate them from persistent datasets.
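
To make the distinction concrete, here is a minimal sketch. It assumes that any dataset a pipeline uses without a catalog entry falls back to an in-memory dataset, and it ignores explicit MemoryDataset entries and dataset factory patterns. The helper `split_by_persistence` and the sample dataset names are hypothetical and are not part of Kedro or Kedro-Viz.

```python
def split_by_persistence(pipeline_datasets, catalog_entries):
    """Partition dataset names into (persistent, memory) lists.

    Assumption: a dataset with no catalog entry is treated as a MemoryDataset.
    """
    persistent = sorted(n for n in pipeline_datasets if n in catalog_entries)
    memory = sorted(n for n in pipeline_datasets if n not in catalog_entries)
    return persistent, memory


# Hypothetical project: only the raw input and the model output are persisted.
pipeline_datasets = {"raw_orders", "clean_orders", "features", "model_output"}
catalog_entries = {"raw_orders", "model_output"}

persistent, memory = split_by_persistence(pipeline_datasets, catalog_entries)
print(persistent)  # ['model_output', 'raw_orders']
print(memory)      # ['clean_orders', 'features']
```

In this example half of the datasets never touch disk, which is the situation the flowchart currently cannot make visible.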

This was suggested for quick experimentation by @datajoely, but related questions have also been asked by Slack users:

"Does anyone know how layers are inferred for datasets which are not tagged in the catalog and just exist as memory datasets.
For example: If you have a pipeline where the only elements you persist are lets say till primary layer, and then you jump to model output in the end. How will the layers be inferred in that case ?"

Context

This is particularly useful for larger and more complex pipelines, where there is a greater need to track datasets. Another benefit is that it supports the debugging of the pipeline.

Possible Implementation

The first step would be to show MemoryDatasets on the Kedro-Viz flowchart, and the second to add the ability to filter them out of the view as required. Implementation would require:

Design
Engineering

@rashidakanchwala has created two experimental PRs to address this:

[Experiment] Distinctive MemoryDataset view on flowchart #1706 - This PR reduces the opacity of MemoryDatasets, making them easily distinguishable on the flowchart. The transparency also signifies their non-persistent nature.
[Experiment] Show/Hide Memory Datasets on the flowchart #1707 - This PR offers an experimental toggle to show/hide Memory Datasets, functioning similarly to the existing show/hide datasets toggle.
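
The opacity treatment in the first PR could be sketched roughly as follows. This is an illustration only, not the code from #1706: the `DatasetNode` class, its `is_memory` flag, and both opacity values are assumptions, since the issue does not state how Kedro-Viz marks MemoryDatasets or which opacity it applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetNode:
    name: str
    is_memory: bool  # hypothetical flag; not a confirmed Kedro-Viz field


# Assumed values: the issue does not state the opacity the PR uses.
PERSISTENT_OPACITY = 1.0
MEMORY_OPACITY = 0.5


def node_style(node: DatasetNode) -> dict:
    """Return the style attributes a renderer could apply to a dataset node."""
    opacity = MEMORY_OPACITY if node.is_memory else PERSISTENT_OPACITY
    return {"opacity": opacity}


nodes = [
    DatasetNode("raw_orders", is_memory=False),
    DatasetNode("features", is_memory=True),
]
styles = {node.name: node_style(node) for node in nodes}
print(styles)
```

Keeping the style decision in one small function means the only thing the renderer needs from the backend is a single boolean per dataset node.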

Acceptance Criteria

The user launches Kedro-Viz and sees their project in the flowchart UI.
They go to the settings panel and select the toggle 'Show MemoryDatasets in flowchart', saving their option.
A semi-transparent version of the MemoryDataset is shown in the flowchart.
When the MemoryDataset is selected in the flowchart, the side panel opens with details of the dataset.
Users can also go to the filter panel and select Memory Datasets. This shows/hides them in the flowchart view.
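
One way to read these criteria is as a small piece of view state with two controls. The sketch below assumes that a MemoryDataset is visible only when both the settings toggle and the filter-panel selection are on; the issue does not spell out how the two interact, so that rule, the `FlowchartState` class, and its field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowchartState:
    """Hypothetical view state; field names are illustrative only."""

    datasets: dict  # name -> details, including an assumed "is_memory" flag
    show_memory_setting: bool = False  # settings-panel toggle
    memory_filter_on: bool = True  # filter-panel selection
    selected: Optional[str] = None  # currently selected node, if any

    def visible(self) -> list:
        # Assumption: both controls must be on for MemoryDatasets to appear.
        show_memory = self.show_memory_setting and self.memory_filter_on
        return [
            name
            for name, details in self.datasets.items()
            if show_memory or not details["is_memory"]
        ]

    def side_panel(self) -> Optional[dict]:
        # The side panel shows details only for a visible, selected dataset.
        if self.selected in self.visible():
            return self.datasets[self.selected]
        return None


state = FlowchartState(
    datasets={
        "raw_orders": {"is_memory": False},
        "features": {"is_memory": True},
    }
)
before = state.visible()

state.show_memory_setting = True  # user enables the settings toggle
state.selected = "features"  # user clicks the MemoryDataset node
after = state.visible()
panel = state.side_panel()

state.memory_filter_on = False  # user deselects Memory Datasets in the filter
filtered = state.visible()
```

Walking through the criteria in order, `before` holds only the persistent dataset, `after` includes the MemoryDataset, `panel` carries its details, and `filtered` drops it again.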

datajoely · 2024-01-30T17:04:02Z

Are we going to revisit the UI side of things?

rashidakanchwala · 2024-01-30T17:49:05Z

Yes, we have prioritised a ticket for this: #1148. We will explore the concept of allowing users to customise their datasets (icons/color) through the catalog.

NeroOkwa added the Issue: Feature Request label Jan 17, 2024

NeroOkwa added this to the Improve large pipeline experience milestone Jan 17, 2024

NeroOkwa added this to Kedro-Viz Jan 17, 2024

NeroOkwa assigned NeroOkwa and rashidakanchwala and unassigned NeroOkwa Jan 17, 2024

NeroOkwa added the Quick Win (Low/Medium priorities but quick to do) and Design: UX labels Jan 17, 2024

NeroOkwa assigned stephkaiser Jan 17, 2024

This was referenced Jan 17, 2024

[Experiment] Show/Hide Memory Datasets on the flowchart #1707

Closed

Add attribute to flag persistence in Dataset classes kedro-org/kedro#3520

Merged

rashidakanchwala moved this to Inbox in Kedro-Viz Jan 30, 2024

rashidakanchwala closed this as completed Jan 30, 2024

github-project-automation bot moved this from Inbox to Done in Kedro-Viz Jan 30, 2024
