Vizro Discussion #2898

Closed
merelcht opened this issue Aug 4, 2023 · 22 comments
Assignees
Labels
Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation Type: Technical DR 💾 Decision Records (technical decisions made)

Comments

@merelcht
Member

merelcht commented Aug 4, 2023

Description

Placeholder ticket to be filled in by @rashidakanchwala for the VizX Tech Discussion on 23/8

Aim of the discussion

Context

Pre-reading

Other useful links

@merelcht merelcht converted this from a draft issue Aug 4, 2023
@merelcht merelcht added the Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation label Aug 4, 2023
@noklam
Contributor

noklam commented Aug 14, 2023

Moved to 30/8

@rashidakanchwala
Contributor

rashidakanchwala commented Aug 29, 2023

For more information on the VizX integration with Kedro-viz, please refer to this issue: kedro-viz issue #1457.

First Part of Technical Design: Learning and Introduction

  • An introduction to VizX: VizX on GitHub.
  • A demo of the VizX and Kedro-viz integration built during the hackathon.

Second Part of Technical Design: Integration Discussion

We aim to explore the methods for integrating VizX with Kedro.

There are two primary user journeys to consider:

  • Existing Kedro users who could enhance their experience with a dashboard by incorporating VizX into their Kedro projects.
  • VizX users interested in viewing their app within Kedro-viz. How can we streamline the connection to Kedro-viz for them?

To address these user journeys, we're evaluating the following options:

  • Should we directly integrate VizX into the Kedro project for a more seamless experience for existing Kedro users?
  • Alternatively, should we keep VizX and Kedro separate? When starting Kedro-viz, users could specify a path to the VizX folder, which might be located inside or outside the Kedro project. This approach would enable the VizX app to run concurrently and be visible as an iframe within Kedro-viz.

@rashidakanchwala
Contributor

rashidakanchwala commented Aug 30, 2023

Integration Methods for VizX and Kedro/Kedro-Viz

1. Seamless Integration with VizX

Overview:

  • VizX is seamlessly integrated.
  • VizX users currently use various methods for dashboard configuration, including from_yaml, from_dict, from_json, or the default pydantic way. Given Kedro's use of yaml, here's a proposed approach:
    • Implement the yaml method for dashboard configuration in Kedro, preferably in the conf folder.
    • Use data transformation functions required for the dashboard as nodes with raw data inputs and reporting data outputs, similar to Plotly.
    • On the Kedro-viz side, we do a VizX integration that identifies the dashboard.yaml, runs VizX on a different port, and introduces an iframe connected to that port in kedro-viz.
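
To make the discovery step above concrete, here is a minimal sketch of how a dashboard.yaml could be located under a Kedro-style conf folder. The file name, the function name find_dashboard_config, and the "local before base" precedence are all assumptions for illustration; nothing in this thread has fixed them, and this is not an existing Kedro or Kedro-viz API.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_dashboard_config(
    project_path: Path,
    envs: Sequence[str] = ("local", "base"),  # assumed precedence order
    filename: str = "dashboard.yaml",  # assumed name, not an agreed convention
) -> Optional[Path]:
    """Return the first matching config file under conf/<env>/, or None."""
    for env in envs:
        candidate = project_path / "conf" / env / filename
        if candidate.is_file():
            return candidate
    return None
```

If nothing is found, the caller can simply skip the dashboard tab rather than fail, which keeps the feature opt-in for projects without a dashboard.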

Pros:

  • Seamless experience for Kedro users familiar with the Kedro approach.

Cons:

  • VizX has other components, such as the assets folder, which contains images and advanced CSS configurations. Its placement within a Kedro project is unclear. Maybe images could reside in the data folder?
  • VizX users who build dashboards with the other methods will need to migrate their code to yaml.
  • The yaml method lacks features like code completion, available in the pydantic method.

2. Independent VizX Integration

Overview:

  • VizX operates separately from Kedro-viz.
  • Proposed structure:
    • A VizX folder with:
      • app.py for data transformation functions.
      • A dashboard configuration file in dict, json, yaml, or py format.
      • An assets folder.
    • A path reference to the VizX folder in the Kedro project, possibly in settings.py.
    • Kedro-viz would read this folder, run the VizX app, and show it in an iframe.
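
As a rough sketch of what "reading this folder" could involve, the snippet below checks a folder against the proposed layout and reports what it found. The class name VizXApp, the function read_vizx_folder, and the candidate config file names are hypothetical; the thread only says the folder holds app.py, a configuration file in one of several formats, and an assets folder.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Assumed file names; the proposal only lists the formats (dict, json, yaml, py).
CONFIG_NAMES = ("dashboard.yaml", "dashboard.json", "dashboard.py")


@dataclass(frozen=True)
class VizXApp:
    root: Path
    entrypoint: Path
    config: Optional[Path]
    assets: Optional[Path]


def read_vizx_folder(root: Path) -> VizXApp:
    """Validate the proposed folder layout and collect the paths it contains."""
    entrypoint = root / "app.py"
    if not entrypoint.is_file():
        raise FileNotFoundError(f"expected an app.py at {entrypoint}")
    # Pick the first config file that exists, if any.
    config = next((root / n for n in CONFIG_NAMES if (root / n).is_file()), None)
    assets = root / "assets"
    return VizXApp(root, entrypoint, config, assets if assets.is_dir() else None)
```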

Pros:

  • Simplified integration for VizX users.
  • Kedro users can refer to VizX documentation to learn how to build a VizX dashboard.
  • Kedro users can enjoy code completion with the pydantic method.

Cons:

  • The fit within the Kedro project structure remains ambiguous.
  • VizX isn't connected to the Kedro project but is run in Kedro-viz. The integration might feel disjointed.

3. Kedro-Viz Support for Generic Applications

(Technical feasibility still under investigation)

Overview:

  • This resembles the independent VizX integration, but we keep things generic so that we can extend it to other apps such as Streamlit.
  • Key difference: The run command is also in the VizX folder.
  • A location in the Kedro project, like settings.py, that registers applications for Kedro-viz.
  • Apps like streamlit or VizX use Kedro-sessions to connect to the Kedro Data Catalog. After registration, the app's path is shared with Kedro-viz, which runs the app in an iframe.
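
One way to picture the registration step is a small registry that stores each app's folder together with its own run command. This is only a sketch under assumptions: RegisteredApp and AppRegistry are hypothetical names, and where the registry would actually live (settings.py or elsewhere) is still open in this discussion.

```python
import shlex
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


@dataclass(frozen=True)
class RegisteredApp:
    name: str
    path: Path
    run_command: str  # shipped with the app, e.g. "python app.py"

    def argv(self) -> List[str]:
        """Split the run command into an argument list for a process launcher."""
        return shlex.split(self.run_command)


class AppRegistry:
    """Hypothetical registry of apps to show alongside the pipeline view."""

    def __init__(self) -> None:
        self._apps: Dict[str, RegisteredApp] = {}

    def register(self, app: RegisteredApp) -> None:
        if app.name in self._apps:
            raise ValueError(f"app {app.name!r} is already registered")
        self._apps[app.name] = app

    def get(self, name: str) -> RegisteredApp:
        return self._apps[name]

    def names(self) -> List[str]:
        return sorted(self._apps)
```

Because the registry only knows a path and a command, it stays agnostic about whether the app is Vizro, Streamlit, or something else.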

Pros:

  • Allows diverse applications to run in Kedro-viz, not just VizX.

Cons:

  • The cons mentioned in the second method.
  • The method isn't solely for VizX integration, although through our documentation we can recommend VizX for dashboard development.

@datajoely
Contributor

datajoely commented Aug 30, 2023

3! 3! 3! I can't find the issue, but I've wanted it for ages - especially if you can expose the session and catalog programmatically to any generic application

@astrojuanlu
Member

especially if you can expose the session and catalog programmatically to any generic application

Could you elaborate?

@deepyaman
Member

  • Should we directly integrate VizX into the Kedro project for a more seamless experience for existing Kedro users?
  • Alternatively, should we keep VizX and Kedro separate? When starting Kedro-viz, users could specify a path to the VizX folder, which might be located inside or outside the Kedro project. This approach would enable the VizX app to run concurrently and be visible as an iframe within Kedro-viz.

Keep it separate.

There are already calls to further componentize Kedro core (e.g. separate Kedro data catalog so it can be used independently); this would do the opposite, for something that's not core to every Kedro workflow (a lot of data pipelines may not require visualization, especially in production).

Integration Methods for VizX and Kedro/Kedro-Viz

Strong preference for #2. If you want the most flexible way to build and serve plots, and have a way to do so linked with your Kedro pipeline, it would make sense to not be forced to do it via Kedro-Viz. I would imagine there are a lot of use cases where people want a hosted dashboard (e.g. maybe as part of a website) powered by some data coming from Kedro, but it's very unlikely IMO they would want to see all of Kedro-Viz there.

A more "cohesive" experience for showing plots via Kedro-Viz could, at some later point, leverage the above integration/plugin, but I don't see this as a priority.

@datajoely
Contributor

especially if you can expose the session and catalog programmatically to any generic application

If tools can access the catalog data and session metadata via an API then they can do lots of things without creating their own session.
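
A minimal sketch of what that could look like from the tool's side: the tool depends only on something with a load(name) method, which is how Kedro's DataCatalog is used, and never creates a session itself. CatalogLike, InMemoryCatalog, and summarise are hypothetical names; the in-memory class exists only so the example runs without a Kedro project.

```python
from typing import Any, Dict, Protocol


class CatalogLike(Protocol):
    """The only capability the tool needs from whatever exposes the catalog."""

    def load(self, name: str) -> Any: ...


class InMemoryCatalog:
    """Stand-in for a real catalog, used only to make this sketch runnable."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = data

    def load(self, name: str) -> Any:
        return self._data[name]


def summarise(catalog: CatalogLike, name: str) -> Dict[str, Any]:
    """Example tool logic: load a dataset by name and report its size."""
    data = catalog.load(name)
    return {"dataset": name, "rows": len(data)}
```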

@rashidakanchwala
Contributor

rashidakanchwala commented Aug 30, 2023

I would imagine there are a lot of use cases where people want a hosted dashboard (e.g. maybe as part of a website) powered by some data coming from Kedro, but it's very unlikely IMO they would want to see all of Kedro-Viz there.

True, so that part is already available. VizX has already built its own Kedro integration, which allows it to access the Kedro catalog and run a dashboard that can be hosted or integrated elsewhere.

The above is specifically for running VizX inside Kedro-viz.

@deepyaman
Member

I would imagine there are a lot of use cases where people want a hosted dashboard (e.g. maybe as part of a website) powered by some data coming from Kedro, but it's very unlikely IMO they would want to see all of Kedro-Viz there.

True, so that part is already available. VizX has already built its own Kedro integration, which allows it to access the Kedro catalog and run a dashboard that can be hosted or integrated elsewhere.

The above is specifically for running VizX inside Kedro-viz.

Oh, that's good to know. In that case, I don't have a particularly strong opinion; I don't particularly have anything against #1 even, as long as the additional config handling is in Kedro-Viz (i.e. Kedro core should not explicitly load conf/base/dashboards.yaml or whatever, because the functionality is entirely to enable something in Kedro-Viz).

I also personally don't see this as a high priority, since I am not aware of users asking to see dashboards in Kedro-Viz, UNLESS it's clear to the Kedro-Viz team that doing so would increase adoption of Kedro-Viz. There seem to be some "evidence markers" in kedro-org/kedro-viz#1457 to support that, but they're all around translators?

From the same issue:

  • Translators also seem to rely on VizX dashboards in senior stakeholder conversations, and less so with Kedro-Viz

I may be missing some context, but why should translators show Kedro-Viz in stakeholder conversations? It seems perfectly reasonable to me that they should show the dashboard. I've only really seen Kedro-Viz used with clients to show progress ("look at how our baby pipeline has grown up over the past few months", "see how we connect to your various data sources") and highlight the work of data scientists and engineers; I have not personally seen it used in conversations to showcase insights, because a dashboard is the better choice there, and it doesn't necessarily need to live in Kedro-Viz.

@rashidakanchwala rashidakanchwala changed the title VizX Discussion Vizro Discussion Sep 5, 2023
@noklam
Contributor

noklam commented Sep 5, 2023

Both 1 and 2 seem tightly coupled to Kedro to me: (1) requires dumping stuff in data and (2) requires changing settings.py.

Option 3 seems ideal, but I'm not sure how feasible it is or what other integrations we have in mind. In any case, I don't think settings.py is the correct place for plugins to inject their settings; see Spike: Provide a way for plugins to have runtime configuration and extend CLI arguments #2866. I prefer it to be a plugin, either a separate kedro-vizro plugin or an optional extra, kedro-viz[vizro].

VizX isn't connected to the Kedro project but is run in Kedro-viz. The integration might feel disjointed.

Kedro-viz is already integrated with Kedro, so kedro-viz[vizro] may not be a bad option?
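
For what it's worth, an optional extra like kedro-viz[vizro] usually comes down to a runtime check for whether the optional package is installed. A minimal sketch follows; the helper names and the feature-flag dictionary are hypothetical, and only the standard-library importlib.util.find_spec call is real.

```python
from importlib.util import find_spec
from typing import Dict


def is_available(module_name: str) -> bool:
    """True if a top-level module can be imported in this environment."""
    return find_spec(module_name) is not None


def enabled_features() -> Dict[str, bool]:
    # Hypothetical feature flags: the dashboard tab is only switched on
    # when the optional dependency from the extra is actually installed.
    return {"vizro": is_available("vizro")}
```

With this shape, users who never install the extra pay no cost, and the integration code stays out of Kedro core.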

@deepyaman
Member

Kedro-viz already integrated with Kedro so kedro-viz[vizro] may not be a bad option?

*kedro-viz[ro] 😁

@rashidakanchwala
Contributor

Both 1 and 2 seem tightly coupled to Kedro to me: (1) requires dumping stuff in data and (2) requires changing settings.py.

Option 3 seems ideal, but I'm not sure how feasible it is or what other integrations we have in mind. In any case, I don't think settings.py is the correct place for plugins to inject their settings; see Provide a way for plugins to have runtime configuration #2866. I prefer it to be a plugin, either a separate kedro-vizro plugin or an optional extra, kedro-viz[vizro].

VizX isn't connected to the Kedro project but is run in Kedro-viz. The integration might feel disjointed.

Kedro-viz is already integrated with Kedro, so kedro-viz[vizro] may not be a bad option?

I want to understand this more. What would the user journey look like? Where would they define the path to vizro_app?

@rashidakanchwala
Contributor

Summary from the last tech discussion

  • @idanov likes the first approach
  • @yetudada wants to also extend support to streamlit and other apps.

@idanov mentioned building this using plugins (kedro/ vizro/ kedro-viz plugins?) which we are currently investigating.

  • In the next technical discussion we will look at implementation using plugins. For now, we will start with support for Vizro only, but we want to build it in a way that lets us extend it to other apps in the future.

@noklam
Contributor

noklam commented Sep 18, 2023

The three approaches overlap, and the final solution is likely a mix of them (for example, we could take approach 1 but still enable integration with Streamlit or Vizro). It would help to look at the following dimensions and see what is or isn't possible with each approach.

  • Use of ConfigLoader
  • Where is the application folder stored?
  • Where should the transformation logic go? (It should go in the Kedro pipeline, but might we need to keep some filtering script at the Vizro level? For example, Kedro computes a master dashboard table and Vizro trims the data down for a specific use or explores it with different filters.)
  • Where does the integration go? (kedro, kedro-viz, or a new kedro-vizro plugin?)
  • How do users interact with the application? (Via a new CLI or the existing kedro viz command?)
  • How do users provide configuration/settings at runtime? (settings.py or some new configuration?)
  • How would the backend work?
  • How would the frontend work?
  • How do users install this package?
  • Code completion

@rashidakanchwala
Contributor

rashidakanchwala commented Sep 18, 2023

Thanks @noklam for this - above questions are exactly what we need to discuss in our next technical design session.

@astrojuanlu
Member

A user was asking today whether there's a way to "automate the dashboard creation using VizX". This is an interesting use case we could look into.

For example, something like

# catalog.yaml
input_data:
  type: polars.GenericDataset
  format: parquet
  filepath: data/03_primary/data.pq

dashboard:
  type: vizro.VizroDataset
  filepath: data/08_reporting/dashboard.yaml

# nodes.py
import polars as pl
import vizro.models as vm


def create_dashboard(input_data: pl.DataFrame) -> vm.Dashboard:
    # Build the page(s) from input_data; left elided in this sketch.
    page = ...
    dashboard = vm.Dashboard(pages=[page])
    return dashboard

But this assumes that a dashboard built from Vizro pydantic models can be serialized to YAML or JSON (I couldn't verify that from the docs). The resulting YAML also needs a Python script to load it and run it: https://vizro.readthedocs.io/en/latest/pages/user_guides/dashboard/

# app.py
from pathlib import Path

import yaml

import vizro.plotly.express as px
from vizro import Vizro
from vizro.managers import data_manager
from vizro.models import Dashboard

data_manager["iris"] = px.data.iris
dashboard = yaml.safe_load(Path("dashboard.yaml").read_text(encoding="utf-8"))
dashboard = Dashboard(**dashboard)

if __name__ == "__main__":
    Vizro().build(dashboard).run()
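
To illustrate the save/load round trip that a dataset like the vizro.VizroDataset above would need, here is a minimal sketch. It assumes the dashboard can first be reduced to a plain dictionary, which is exactly the part that is unverified. DashboardConfigDataset is a hypothetical name, it is not a real Kedro or Vizro class, and it uses JSON rather than YAML only to stay within the standard library.

```python
import json
from pathlib import Path
from typing import Any, Dict


class DashboardConfigDataset:
    """Hypothetical stand-in for the dataset idea; not a Kedro or Vizro class."""

    def __init__(self, filepath: str) -> None:
        self._path = Path(filepath)

    def save(self, config: Dict[str, Any]) -> None:
        # Assumes the dashboard was already converted to plain dicts and lists.
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_text(json.dumps(config, indent=2), encoding="utf-8")

    def load(self) -> Dict[str, Any]:
        return json.loads(self._path.read_text(encoding="utf-8"))
```

A node would then return the dictionary, the dataset would persist it, and a small app.py like the one above would read it back and build the dashboard.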

@inigohidalgo
Contributor

Hi all!

@noklam pointed me this way and asked me to share some messages I wrote in Slack about Kedro/Vizro integration.

Some months back I experimented with using the kedro datacatalog as a dataloader for streamlit dashboards. It worked very well by allowing me to build a dropdown which would return the selected dataset to then be visualized in different ways. It was useful for building reusable mini-dashboards and my idea was to try to include an app/ in every kedro project which would automatically provide some basic visualizations out of the box. I didn't get to that stage, but I am convinced a clean datacatalog integration, including some prebuilt components like the one I mentioned for the dropdown would be a highly valuable tool.

I also briefly considered using kedro pipelines as the source of transformation steps for the actual running dashboard but 1. it was too complex, 2. it introduced too much overhead for streamlit's constant recalculation of stuff and 3. it simply did not add that much functionality beyond just using the datacatalog and doing the final filtering on the streamlit side. I'm sure y'all will come up with a much cleaner integration than that.
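
To give an idea of the dropdown component described above, here is a minimal, framework-free sketch: it takes a mapping of dataset names to loader callables, returns the sorted names for the dropdown, and caches each load so that repeated reruns do not reload the data. The name make_selector and the loader mapping are hypothetical; in a real project the loaders would call the catalog.

```python
from functools import lru_cache
from typing import Any, Callable, Dict, List, Tuple


def make_selector(
    loaders: Dict[str, Callable[[], Any]],
) -> Tuple[List[str], Callable[[str], Any]]:
    """Return dropdown options plus a cached loader for the selected dataset."""

    @lru_cache(maxsize=None)
    def load(name: str) -> Any:
        # Each dataset is loaded at most once per selector.
        return loaders[name]()

    return sorted(loaders), load
```

The options list feeds the dropdown widget, and the selected value is passed to load to get the data to plot.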

@stephkaiser

@inigohidalgo that is awesome, I had a similar idea of providing some out-of-the-box visualisations using vizro in kedro-viz. Could I reach out to you to understand more when we start research on this? 😊

@rashidakanchwala
Contributor

@inigohidalgo that is awesome, I had a similar idea of providing some out-of-the-box visualisations using vizro in kedro-viz. Could I reach out to you to understand more when we start research on this? 😊

EDA would be a very good out-of-the-box visualisation.

@inigohidalgo
Contributor

Could I reach out to you to understand more when we start research on this? 😊

Sure :)

@rashidakanchwala
Contributor

Summary of what was discussed in technical design today:

  • Firstly, we need to define a standard way for users to define dashboard configurations in their Kedro projects, both through a dashboard.py (the Pydantic way) and a dashboard.yaml file.
  • Secondly, we are planning to create a Vizro plugin. This plugin will act as an entry point to kedro-viz, enabling the simultaneous running of both Vizro and kedro-viz. In the future, we will develop a similar plugin for streamlit, and also define conventions to support Streamlit applications.
  • For the initial release (version 1.0), we will use IFrames to display dashboards on kedro-viz. We can enhance this solution in the future.
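
A rough sketch of the iframe approach: pick a free local port for the dashboard server, then build the iframe markup that would point at it. The function names, default host, and iframe dimensions are assumptions for illustration; how kedro-viz would actually embed the frame is not decided here.

```python
import socket
from html import escape


def free_port() -> int:
    """Ask the OS for an unused local port (it may be taken again before use)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def iframe_html(port: int, host: str = "127.0.0.1", title: str = "Dashboard") -> str:
    """Build iframe markup pointing at the separately running dashboard."""
    src = f"http://{host}:{port}/"
    return (
        f'<iframe src="{escape(src)}" title="{escape(title)}" '
        'width="100%" height="600"></iframe>'
    )
```

The dashboard process would be started on that port next to the kedro-viz server, and the frontend would render the returned markup in a new tab or panel.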

@merelcht merelcht added the Type: Technical DR 💾 Decision Records (technical decisions made) label Dec 13, 2023
@merelcht
Member Author

Closing this, because the initial discussions are completed. A new issue should be opened when more decisions need to be made.

8 participants