[SIP-81] - Chart creation without a dataset #19953

eschutho · 2022-05-04T16:35:03Z

[SIP-81] Proposal for Chart creation without a dataset

Motivation

Currently, a user needs to create a dataset for each chart they want to create. Many of these charts aren’t kept for long: they either never make it to a dashboard, or someone just wants a quick view of their data to share for feedback or to gain insight into their own queries, tables, etc. Many new users don’t understand what a dataset is or why they need one. We want to let people move into dataset usage progressively and create a chart quickly from a query, saved query, table, or dataset. When they save, we will prompt them to name a dataset, which is a much lower barrier to visualizing their data quickly.

Proposed Change

Users should be able to create a chart from the chart page, from SQL Lab, or from a dataset. From Explore or SQL Lab, they need to be able to view a chart, apply filters, and see a list of columns in their query or table just as they do now, but without creating a dataset. If coming from a dataset view, they should be able to continue using a dataset to back a chart, as they can currently.

This solution is based on the recently approved flow: #18584. Per this flow, users will be able to create a chart from any of the data types listed above. When saving the chart, they would be required to create a dataset. We may relax the requirement to save a dataset in the future.

The first PR for chart creation with a query is here: https://github.com/apache/superset/pull/19812/files

As part of SIP 68, we will be creating a mixin that contains all of the necessary functionality to power a chart. By extending that mixin to other models that have the necessary relationships (database, schema, columns) those models can also be used to power a chart.
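The mixin idea above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `ChartPoweringMixin`, `source_sql`, and `chart_query` are hypothetical, and plain dataclasses stand in for the real models. It only shows how two unrelated models that expose a database, schema, and columns could share chart-powering behavior.

```python
from dataclasses import dataclass, field


class ChartPoweringMixin:
    """Hypothetical mixin: the host class is assumed to provide
    `database`, `schema`, and `columns` (a list of {"name": ...} dicts)."""

    def column_names(self) -> list:
        return [col["name"] for col in self.columns]

    def source_sql(self) -> str:
        # Each model decides what SQL it exposes as its source.
        raise NotImplementedError

    def chart_query(self, selected: list, limit: int = 1000) -> str:
        # Reject columns that the model does not know about.
        unknown = [name for name in selected if name not in self.column_names()]
        if unknown:
            raise ValueError(f"Unknown columns: {unknown}")
        cols = ", ".join(selected)
        return f"SELECT {cols} FROM ({self.source_sql()}) AS src LIMIT {limit}"


@dataclass
class Query(ChartPoweringMixin):
    database: str
    schema: str
    sql: str
    columns: list = field(default_factory=list)

    def source_sql(self) -> str:
        return self.sql


@dataclass
class Table(ChartPoweringMixin):
    database: str
    schema: str
    name: str
    columns: list = field(default_factory=list)

    def source_sql(self) -> str:
        return f"SELECT * FROM {self.schema}.{self.name}"
```

With this shape, a chart only needs to call `chart_query` and does not care whether the object behind it is a query or a table.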

We currently have two types of datasources in the config, SqlaTable (Dataset) and the Druid Datasource. If a chart connects to something, the proposal is that it should be a datasource. This is in line with what we are trying to achieve, doesn’t add any complicated middle layers, and is very extensible. With SIP 68 and Superset 2.0 we are in the process of removing the Druid NoSQL Datasource and the datasource as a config, and instead limiting the datasources to those classes that have the functionality needed to power a chart.

As part of SIP 68 there is also a PR to convert the ConnectorRegistry, which uses the configs, to a [DatasourceDAO](#19811). This DatasourceDAO will be used to retrieve any type of object that is configured to be a datasource.
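As a rough sketch of that lookup, assuming a simple mapping from a datasource type string to a model class: the `sources` mapping, the `get_datasource` signature, and the in-memory `rows` list below are illustrative stand-ins, not the code in the linked PR.

```python
from dataclasses import dataclass


@dataclass
class SqlaTable:
    id: int
    name: str


@dataclass
class Query:
    id: int
    sql: str


class DatasourceNotFound(Exception):
    """Raised when the type is not registered or no object matches the id."""


class DatasourceDAO:
    # Hypothetical registry: datasource type string -> model class.
    sources = {"table": SqlaTable, "query": Query}

    def __init__(self, rows):
        # `rows` is an in-memory stand-in for a database session.
        self._rows = rows

    def get_datasource(self, datasource_type, datasource_id):
        model = self.sources.get(datasource_type)
        if model is None:
            raise DatasourceNotFound(f"Unknown datasource type: {datasource_type}")
        for row in self._rows:
            if isinstance(row, model) and row.id == datasource_id:
                return row
        raise DatasourceNotFound(f"No {datasource_type} with id {datasource_id}")
```

Adding a new kind of datasource then means registering one more entry in `sources` rather than changing configuration.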

Examples of specific work to be done per datasource type:

Charts by Tables:
- Import/export:
  - since a chart cannot be saved until it has a dataset, this is n/a for now
- Explore/Dashboard view:
  - When selecting a table as a datasource, we would create a sl_table instance and save it to the chart as a datasource. The sl_table would have all the column information needed to power the explore view.
  - On save, we just create the dataset to point to the already created Table.
- SQL Lab to explore:
  - This only applies to queries
Charts by Queries:
- Import/export:
  - since a chart cannot be saved until it has a dataset, this is n/a for now
- Explore/Dashboard view:
  - The Query will store the column information needed to power the explore view in a new Columns column as a JSON blob. Since queries are immutable, we will only need to read this data, which is currently sent all at once to the client in bootstrap-data and then searched/filtered client side.
  - We are evaluating whether to skip caching for Queries if that saves time and effort.
- SQL Lab to explore:
  - A chart will be linked to a Query from this flow. This is the only way that someone can create a chart from a Query.
  - On save, we create a dataset and add the query as the expression
Charts by SavedQueries:
- Import/export:
  - since a chart cannot be saved until it has a dataset, this is n/a for now
- Explore/Dashboard view:
  - When selecting a SavedQuery as a datasource, we would tie that object to the chart. The SavedQuery would have a new relationship to Column (i.e., sl_columns) for all the column information needed to power the explore view.
  - On save, we create a dataset and add the query as the expression
- SQL Lab to explore:
  - n/a
Charts by Dataset:
- We need to update the old SqlaTable to a new Sl_dataset as part of SIP 68. Everything else will be the same.
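The "Charts by Queries" flow above (column metadata kept as a read-only JSON blob, then a dataset created on save with the query as its expression) could look roughly like this. The field name `columns_json`, the `Dataset` shape, and the function `save_chart_from_query` are assumptions made for the example, not the actual models.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen mirrors the "queries are immutable" point
class Query:
    sql: str
    columns_json: str  # hypothetical "Columns" column holding a JSON blob

    def columns(self) -> list:
        # Read-only access: the blob is parsed, never rewritten.
        return json.loads(self.columns_json)


@dataclass
class Dataset:
    name: str
    expression: str
    columns: list


def save_chart_from_query(query: Query, dataset_name: str) -> Dataset:
    """On save, require a dataset name and use the query SQL as the expression."""
    name = dataset_name.strip()
    if not name:
        raise ValueError("A dataset name is required to save the chart")
    return Dataset(name=name, expression=query.sql, columns=query.columns())
```

Until the user saves, only the `Query` exists; the dataset is created at the moment a name is supplied.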

New or Changed Public Interfaces

New UI flows are described here:
#18584

New dependencies

None

Migration Plan and Compatibility

We will need to add a relationship to sl_columns for Queries and SavedQueries

Rejected Alternatives

1. Create a temporary dataset without explicitly asking the user to do anything

Pros: Simple for engineering, seamless, not much extra work.

Cons: Users will see a bloated list of datasets in their dataset CRUD view and won’t know what they are.

1b. Mark these datasets as hidden and don’t show them on the CRUD page.

Pros: Simple, easy to build. Users don’t see extra datasets.

Cons: It gets complicated to have two different types of datasets, especially now that we are cleaning up the virtual vs physical. Now we would have hidden and visible, but we’re saying that the chart is backed by a query table, when in reality it’s not.

2. Create a dataset just during the request cycle

Pros: Doesn’t bloat the user’s CRUD list. There aren’t two types of datasets that we have to deal with.

Cons: It’s also complicated to create a dataset each time and could slow down performance, especially if we have to query their database too often.

3. Request the column data from the db each time we need that information

Pros: We don’t need to store any extra data except on the client side.

Cons: Poor performance, and could incur extra cost to the user for db usage.

4. Make a lightweight dataset by storing just column data in redis

Pros: We don’t need to deal with any database models and/or database.

Cons: We are adding a separate middleware to the models when we don’t need to. Plus we would need to write up all of the logic for storing/retrieving the data.


simonvanderveldt · 2022-09-06T13:19:25Z

@eschutho This issue is marked as done, I see #19981 is merged (although I don't understand the relation to this SIP tbh) and available in 2.0.0. We're running 2.0.0, but I don't see a way to create a chart without creating a dataset. Just wanted to check if this is really done? Or maybe I am missing a setting somewhere?

eschutho · 2022-09-07T21:24:20Z

Hi @simonvanderveldt, charts by queries will be available in version 2.1 which is in the early stages of the release process now. There were a few breaking changes that went into 2.0 that were necessary in order for the charts by queries feature to be built. The charts by table and saved queries features are currently on hold while we work on some other features.

The SIP is marked as done as an indication that it was approved, not necessarily that the work has been completed. So beginning with 2.1 you should be able to go from SQL Lab to Explore without creating a dataset.

simonvanderveldt · 2022-09-07T22:14:23Z

@eschutho All clear, thanks for the clarifications! I'll keep an eye on the 2.1 release then :)

superset-github-bot bot added preset-io Superset-Community-Partners Preset community partner program participants labels May 4, 2022

eschutho changed the title ~~SIP- Chart creation without a dataset~~ DRAFT SIP- Chart creation without a dataset May 4, 2022

eschutho removed the Superset-Community-Partners Preset community partner program participants label May 4, 2022

eschutho changed the title ~~DRAFT SIP- Chart creation without a dataset~~ [SIP-81] - Chart creation without a dataset May 10, 2022

eschutho added the sip Superset Improvement Proposal label May 10, 2022

eschutho mentioned this issue May 17, 2022

feat!: pass datasource_type and datasource_id to form_data #19981

Merged

9 tasks

eschutho closed this as completed Jun 7, 2022

michael-s-molina mentioned this issue Jun 22, 2022

chore: Restructure explore redux state #20448

Merged

9 tasks

rusackas added this to SIPs (Superset Improvement Proposals) Dec 8, 2022

rusackas moved this to IMPLEMENTED / DONE in SIPs (Superset Improvement Proposals) Dec 8, 2022
