[SIP-81] - Chart creation without a dataset #19953
Comments
@eschutho This issue is marked as done, I see #19981 is merged (although I don't understand the relation to this SIP tbh) and available in 2.0.0. We're running 2.0.0, but I don't see a way to create a chart without creating a dataset. Just wanted to check if this is really done? Or maybe I am missing a setting somewhere?
Hi @simonvanderveldt, charts by queries will be available in version 2.1 which is in the early stages of the release process now. There were a few breaking changes that went into 2.0 that were necessary in order for the charts by queries feature to be built. The charts by table and saved queries features are currently on hold while we work on some other features. The SIP is marked as done as an indication that it was approved, but not necessarily that the work has been completed. So beginning 2.1 you should be able to go from SqlLab to explore without creating a dataset.
@eschutho All clear, thanks for the clarifications! I'll keep an eye on the 2.1 release then :)
[SIP-81] Proposal for Chart creation without a dataset
Motivation
Currently a user needs to create a dataset for each chart that they want to create. Many of these charts aren’t kept for long: they never make it to a dashboard, or someone just wants a quick view of their data to share for feedback or to gain insight into their own queries, tables, etc. A lot of new users don’t understand what a dataset is or why they need it. We want to allow people to move progressively into dataset usage, and to create a chart quickly based on a query, saved query, table or dataset. When they save, we will prompt them to name a dataset, which is a much lower barrier to visualizing their data quickly.
Proposed Change
Users should be able to create a chart from the chart page, from SQL Lab, or from a dataset. From explore or SQL Lab, they need to be able to view a chart, apply filters, and see a list of columns in their query or table just as they do now, but without creating a dataset. If coming from a dataset view, they should be able to continue to use a dataset to back a chart as they can currently.
This solution is based on the recently approved flow: #18584. Per this flow, users will be able to create a chart from any of the above listed data types. When saving the chart, they would be required to create a dataset. It's possible that we may relax the requirement to save a dataset in the future.
1st PR for chart creation with a query is here: https://github.com/apache/superset/pull/19812/files
As part of SIP 68, we will be creating a mixin that contains all of the necessary functionality to power a chart. By extending that mixin to other models that have the necessary relationships (database, schema, columns) those models can also be used to power a chart.
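The mixin idea can be illustrated with a minimal sketch. All names here (`ExploreMixin`, `Column`, `Dataset`, `Query`, `explore_payload`) are hypothetical and are not taken from the Superset codebase; the sketch only assumes that a model exposing a database, a schema and columns can inherit the behavior needed to power a chart.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Column:
    # Hypothetical stand-in for a stored column record (name plus SQL type).
    name: str
    type: str


class ExploreMixin:
    """Hypothetical mixin holding the shared chart-powering behavior.

    It assumes the host class provides `database`, `schema` and `columns`.
    """

    database: str
    schema: str
    columns: List[Column]

    @property
    def column_names(self) -> List[str]:
        return [column.name for column in self.columns]

    def explore_payload(self) -> Dict[str, object]:
        # The data an explore view would need to list columns and build filters.
        return {
            "type": type(self).__name__.lower(),
            "database": self.database,
            "schema": self.schema,
            "columns": self.column_names,
        }


@dataclass
class Dataset(ExploreMixin):
    database: str
    schema: str
    columns: List[Column] = field(default_factory=list)


@dataclass
class Query(ExploreMixin):
    # A query gains the same behavior simply by having the same relationships.
    database: str
    schema: str
    sql: str = ""
    columns: List[Column] = field(default_factory=list)
```

Under these assumptions, a `Query` and a `Dataset` produce the same kind of payload, so the explore view would not need to know which one is backing the chart.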
We currently have two types of datasources in the config, SqlaTable (Dataset) and the Druid Datasource. If a chart connects to something, the proposal is that it should be a datasource. This is in line with what we are trying to achieve, doesn’t add any complicated middle layers, and will be very extendable. With SIP 68 and Superset 2.0 we are in the process of removing the Druid NoSQL Datasource and the datasource as a config, and instead limiting the datasources to those classes that have the functionality needed to power a chart.
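The principle that anything a chart connects to is a datasource can be sketched as a lookup keyed by datasource type. This is an illustration only: `Datasource`, `DATASOURCE_TYPES`, `resolve_datasource` and the type strings are hypothetical names, and a real implementation would load a record from the metadata database rather than construct an object.

```python
from typing import Dict, Type


class Datasource:
    """Hypothetical base class for anything able to power a chart."""

    def __init__(self, id_: int) -> None:
        self.id = id_


class Dataset(Datasource):
    pass


class Query(Datasource):
    pass


class SavedQuery(Datasource):
    pass


# Hypothetical mapping from a datasource type string to its class.
# Only classes able to power a chart are listed here.
DATASOURCE_TYPES: Dict[str, Type[Datasource]] = {
    "table": Dataset,
    "query": Query,
    "saved_query": SavedQuery,
}


def resolve_datasource(datasource_type: str, datasource_id: int) -> Datasource:
    """Return the datasource object a chart points at, or raise ValueError."""
    try:
        cls = DATASOURCE_TYPES[datasource_type]
    except KeyError:
        raise ValueError(f"Unsupported datasource type: {datasource_type!r}") from None
    # Stand-in for a database lookup by primary key.
    return cls(datasource_id)
```

With a mapping like this, supporting a new kind of chart source would mean adding one class and one entry, with no extra layer between the chart and its data.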
As part of SIP 68 there is also a PR to convert the ConnectorRegistry, which uses the configs, to a [DatasourceDAO](#19811). This DatasourceDAO will be used to retrieve any type of object that is configured to be a datasource.
Examples of specific work to be done per datasource type:
sl_table instance and save it to the chart as a datasource. The sl_table would have all the column information needed to power the explore view.
Queries
Query from this flow. This is the only way that someone can create a chart from a Query.
expression
Column (i.e., sl_columns) for all the column information needed to power the explore view.
expression
SqlaTable to a new Sl_dataset as part of SIP68. Everything else will be the same.
New or Changed Public Interfaces
New UI flows are described here:
#18584
New dependencies
None
Migration Plan and Compatibility
We will need to add a relationship to sl_columns for Queries and SavedQueries.
Rejected Alternatives
1. Create a temporary dataset without explicitly asking the user to do anything
Pros: Simple for engineering, seamless, not much extra work.
Cons: Users will see a bloated list of datasets in their dataset CRUD view and won’t know what they are.
1b. Mark these datasets as hidden and don’t show them on the CRUD page.
Pros: Simple, easy to build. Users don’t see extra datasets.
Cons: It gets complicated to have two different types of datasets, especially now that we are cleaning up the virtual vs physical. Now we would have hidden and visible, but we’re saying that the chart is backed by a query table, when in reality it’s not.
2. Create a dataset just during the request cycle
Pros: Doesn’t bloat the user’s CRUD list. There aren’t two types of datasets that we have to deal with.
Cons: It’s also complicated to create a dataset each time and could slow down performance, especially if we have to query their database too often.
3. Request the column data from the db each time we need that information
Pros: We don’t need to store any extra data except on the client side.
Cons: Poor performance, and could incur extra cost to the user for db usage.
4. Make a lightweight dataset by storing just column data in redis
Pros: We don’t need to deal with any database models and/or database
Cons: We are adding a separate middleware to the models when we don’t need to. Plus we would need to write up all of the logic for storing/retrieving the data.