Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIP-83] Proposal for copying virtual datasets #20232

Closed
reesercollins opened this issue Jun 1, 2022 · 15 comments
Closed

[SIP-83] Proposal for copying virtual datasets #20232

reesercollins opened this issue Jun 1, 2022 · 15 comments
Labels
sip Superset Improvement Proposal

Comments

@reesercollins
Copy link
Contributor

reesercollins commented Jun 1, 2022

[SIP-83] Proposal for copying virtual datasets

Motivation

There is currently no way for users to copy datasets in superset. Even though they can copy the SQL of a dataset, any changes we make to the field types (e.g. in cases where we turn long, complex types into JSON, or advanced types as described in #17852), to column labels, etc. cannot be easily copied.

Proposed Change

Add a duplicate action to virtual datasets within the dataset list:
image

When clicked, it will open a modal which will allow the user to choose a name for the new dataset:
image

When OK is clicked, a duplicate of the dataset is created with the chosen name.

Here it is in action:

2022-06-01.15-40-22.mp4

See this pull request for a working demo.

New dependencies

None.

Migration Plan and Compatibility

No migration needed.

Rejected Alternatives

No alternatives have been considered at this time.

@reesercollins reesercollins changed the title [SIP-82] Proposal for copying virtual datasets [SIP-83] Proposal for copying virtual datasets Jun 1, 2022
@rusackas
Copy link
Member

@reesercollins Thanks for the proposal and the demo. Could you please post this like (or both links) to the dev@superset.apache.org email list for a proper discussion? If you need help navigating those waters, just say the word and I'll be happy to assist.

@rusackas rusackas added the sip Superset Improvement Proposal label Jun 13, 2022
@eschutho
Copy link
Member

Thank you for the SIP! @yousoph. Do you have any concerns with having the copy dataset only for virtual datasets? I wonder how this will look after SIP68 work is complete where there will no longer be a column for virtual or physical?

@mistercrunch
Copy link
Member

Seems like there's a use case for "sibling" tables too, where say you have the same table in prod/staging/dev and you wan to sync the dataset definitions

@yousoph
Copy link
Member

yousoph commented Jun 29, 2022

Thank you for the SIP! @yousoph. Do you have any concerns with having the copy dataset only for virtual datasets? I wonder how this will look after SIP68 work is complete where there will no longer be a column for virtual or physical?

I think it's okay to have the copy only for virtual datasets (especially since there's a limitation on the physical dataset not being able to point to the same db table) but agreed that it might be more confusing later on if virtual/physical aren't displayed as a column in the list view.

@jess-dillard do you think physical datasets that can't be copied need some explanation, like a disabled duplicate icon with some tooltip text?

@ktmud
Copy link
Member

ktmud commented Jun 29, 2022

I think it'd be valuable if there is a copy virtual dataset flow in Explore, too.

@zuzana-vej
Copy link
Contributor

Definitely +1 for NOT enabling this for physical datasets, only virtual. In terms of virtual datasets, one small concern I have is with the recent changes on ownership of datasets, we would love to prevent cases where users copy dataset, just because they can no longer edit it. Example: Jane D. build dataset and left the company. John Doe wants to edit the dataset, but because he doesn't have access he just goes and copies it, and perhaps changes the chart to use the new dataset. Same scenario can happen in a simpler case when John just doesn't know he needs to ask Jane to be co-onwer to have edit access. As a result, the old dataset is no longer used, although it's not deleted, and it can just cause clutter and confusion.

What I am proposing is to make he UX clear in a way that users can decide whether they should edit or copy, and if only thing preventing them from editing it is the fact they are not co-owners, it should be clear to them how to get themselves on the co-owner list (e.g. ask current owner or admins)

@nytai
Copy link
Member

nytai commented Jun 29, 2022

At some point I think it would be good to add a clone dataset for physical dataset, but give the user the option to choose a new table for the dataset. It is quite common to have the exact same schema for datasets (eg, dev, test, prod data). So far I've been suggesting users use the export dataset feature to change the table and the UUID and reimport as a workaround.

@reesercollins
Copy link
Contributor Author

reesercollins commented Jun 29, 2022

At some point I think it would be good to add a clone dataset for physical dataset, but give the user the option to choose a new table for the dataset. It is quite common to have the exact same schema for datasets (eg, dev, test, prod data). So far I've been suggesting users use the export dataset feature to change the table and the UUID and reimport as a workaround.

The duplication of datasets only occurs within a single instance of Superset, not across deployments. Export/Import is still the only way to move data between deployments.

EDIT: I think I misunderstood your statement. I guess that could be possible.

@ghost
Copy link

ghost commented Jun 29, 2022

Definitely agree with @zuzana-vej and would add only those with dataset edit rights should be able to duplicate to avoid the situation described.

@yousoph Yes - once we combine we'll want that to be the functionality. So we can add in now or address once we get into those dataset CRUD changes.

Small request @reesercollins - Can we change the primary button label from "Ok" to either "Duplicate" or "Create"? Typically we only use "Ok" as a button label if no action was taken in the modal.

@reesercollins
Copy link
Contributor Author

Definitely agree with @zuzana-vej and would add only those with dataset edit rights should be able to duplicate to avoid the situation described.

Currently there is a permission can_duplicate. If someone is concerned with the described situation occurring, they can assign can_duplicate and can_edit together.

@eschutho
Copy link
Member

eschutho commented Jul 6, 2022

At some point I think it would be good to add a clone dataset for physical dataset, but give the user the option to choose a new table for the dataset. It is quite common to have the exact same schema for datasets (eg, dev, test, prod data). So far I've been suggesting users use the export dataset feature to change the table and the UUID and reimport as a workaround.

I think @nytai's idea could be the solution for how to have a duplicate button for both datasets when they get merged together when SIP68 work is done.

@rusackas
Copy link
Member

rusackas commented Oct 2, 2023

Closing this as the vote passed. @reesercollins @cccs-rc is there a timeline on which this might get implemented?

@rusackas rusackas closed this as completed Oct 2, 2023
@rusackas
Copy link
Member

rusackas commented Apr 3, 2024

This SIP is at risk of being considered discarded due to lack of activity, if nobody has plans to carry it forward in the near future. Please let us know. You can always revisit/reopen.

@cccs-rc
Copy link
Contributor

cccs-rc commented Apr 3, 2024

Sorry for missing your previous comment, @rusackas! We seem to have gotten this merged back in August 2022. Here's the PR: #20309

There may be a couple bugs hiding in there that are in our backlog to fix, but the bulk of it seems to have been merged upstream. :)

@rusackas
Copy link
Member

rusackas commented Apr 3, 2024

Thanks for the reminder/clarification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sip Superset Improvement Proposal
Projects
Status: Implemented / Done
Development

No branches or pull requests

10 participants