Archiving Histories #14771
Thanks @davelopez for writing this up, very appreciated! If we export a history in some way, I think we need a dataset state that indicates "archived". We do not want to show datasets as "deleted", even if they are - in this case they are archived. I'm not sure how the "deferred" concept comes in here, and whether those archived datasets should be turned into "deferred" ones referencing the exported archive. One other tricky point: if we have a frozen history and we export it, can we change the dataset state to "archived" or "deleted"? Can frozen datasets be deleted?
A couple random considerations:
any dataset modifications, datatype, name, tags, etc. No modifications period. |
Yeah, dealing with the contents of the archived history needs a bit more thinking... I was hoping to use only the "archived/frozen" state of the history to display a "virtual" state on the contents. In other words, the dataset can be deleted or in any other state, but as long as it is part of an archived history it will be displayed as "Archived" in the UI regardless of the real internal state. The idea sounded simple enough for a start, but we probably need to set the "frozen" or "archived" state also at the dataset level to prevent any further mutations.
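A minimal sketch of the "virtual" state idea, assuming a hypothetical `display_state` helper and plain string states (none of these names come from Galaxy's codebase). The stored dataset state is left untouched, and only the state shown in the UI is derived from the parent history:

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Real internal state, e.g. "ok" or "deleted" (illustrative values only).
    state: str


def display_state(dataset: Dataset, history_archived: bool) -> str:
    """Return the state to show in the UI without changing the stored state."""
    if history_archived:
        # Anything inside an archived history is presented as "archived".
        return "archived"
    return dataset.state
```

Note that this only changes what is displayed; preventing further mutations would still need a separate check.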
That would be an interesting approach, but I'm not sure if the concept is the same or if we can effectively address a dataset inside of a remote package. In any case, this will likely only "work" with published histories. Worth thinking a bit about it though.
Yep, another tricky situation. I guess if we can combine the Archive/Export/Freeze steps into one "kind of atomic" process in the Storage Dashboard, we can allow controlled mutations in that particular case: unfreeze the history if it is already frozen, then archive/export, and then freeze again. But yeah... lots of tricky scenarios... 😵‍💫
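One way the unfreeze, export, refreeze sequence could look, as a rough sketch only. `History`, `temporarily_unfrozen`, and `archive_and_export` are hypothetical names, and the `export` callable stands in for whatever export mechanism is used:

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class History:
    name: str
    frozen: bool = False


@contextmanager
def temporarily_unfrozen(history: History):
    """Lift the freeze for a controlled operation, then restore the prior state."""
    was_frozen = history.frozen
    history.frozen = False
    try:
        yield history
    finally:
        # Restore the previous flag even if the wrapped step raises.
        history.frozen = was_frozen


def archive_and_export(history: History, export) -> None:
    """Hypothetical combined step: unfreeze, export, then leave the history frozen."""
    with temporarily_unfrozen(history):
        export(history)
    history.frozen = True
```

The `finally` clause is what makes the process "kind of atomic" from the user's point of view: a failed export does not leave a previously frozen history unfrozen.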
Sure, the idea is to associate and keep track of all exports of a particular history (whether the export implies publishing or not). The "Archiving" state can just associate one of the exports for recovery purposes when the contents need to be removed from Galaxy.
In this context, "published" means the history has been packaged and stored in a remote public repository. So I don't think "unpublishing" is an option... But I haven't thought about it yet 🤔 |
Does it need to stay accessible in Galaxy, tracked in the DB? == Edit == I really think that's the way to go:
So you're down to
as ro-crate is done #14595 |
That's on the history level, I wouldn't make it so complicated on a first pass. If the history indicates it's archived that should be good enough IMO. A simple thing is to just transform the state on the client.
👍 it's a very cool idea and I think this might be possible, but I wouldn't bother with this in a first pass. That's something we can look into later and that wouldn't affect decisions we'd have to make now. |
So, if I understand it correctly, when the user selects "Archive history" at some point:
This is certainly easier 😆 |
I think this wouldn't hurt to have, and it wouldn't be a huge implementation burden right? Everything else, yeah, YAGNI :) |
As a user, that's all I want. I just want to back up histories that are important to S3, and publish even more important ones to e.g. zenodo. Everything else, ok, it was never a desired feature for me. For the use case of my old boss, which prompted #1734, this would have solved his problem perfectly. He had old analyses he wanted to keep the results of (maybe because he wanted to look at them randomly) but they could easily be on cheaper storage. His issue would be solved by this simplified version. For the user story of my medical university coworkers, we want to archive these results because they can be relevant later, but generally they're not looked at. Packing it into an RO Crate and throwing it in the cheap storage is fine there too. For the user story of publishing a history for e.g. a paper, I think it's probably a great test of our export/import system if it must go to the location of publication (e.g. zenodo) and then can be re-imported there or on other servers to function as a demo.
My impression was that we cannot purge stuff from Galaxy's database. The entries always stay, unfortunately - I would love to really throw everything away from the COVID histories :) So if the entries stay, I think there is a lot of value in rendering an exported history nicely, showing the provenance, making tools reloadable, linking to the workflow, and making it re-usable in general. An exported history is not easily inspectable for the foreseeable future and we always need to import it again. This can easily create more data than we wish to have. I just imagine a link from a paper to the DOI that takes you directly to the data but also to the Galaxy rendered view.
Yes, of course. But it solves the immediate problem of "editing" histories without a new feature, and removes the data (which is the bigger problem for most non-COVID users). The COVID histories will always need another solution, right? Even if you have this, you're still going to have to write your own purging to remove metadata, right?
What I imagine is something like a DOI pointing to WFHub, where there's an "import in Galaxy" button. Or you paste the DOI in Galaxy and can import and explore it.
I think metadata and individual files should be retrievable quite easily, I wouldn't say this needs to be far into the future. If you have the resources I'd have a frontend person look at what data needs to be in a RO-crate manifest in order for us to render a read-only history preview of an export without ingesting everything. Or alternatively ask the RO-crate people to develop a frontend component we can reuse to list contents.
So if you export you always want to mark history as deleted? I think archiving, exporting, and freezing are all different concepts. They can go together but should not strictly. I know that a few people want to freeze a history for example without exporting.
And we have been discussing this for a year or so ;)
I guess that's the main point. We can work on these things separately, rather than as one unified concept, since they address different needs. To solve the archive/export case, we can do that very simply today; we just need to make it easier for users (and help them remember where they sent the history, maybe via a HistoryExportAssociation). To solve the freezing case, we need pretty much everything @davelopez mentioned above. Freezing addresses a completely different concept, good to know there is a real use case for it though. When I filed the initial issue I surely conflated these terms, but what I and my users have always needed is "export to an archive + delete". Could you clarify the difference between archiving and exporting? I see those as essentially the same function.
could be optional, but it's useful for most people. (Deleted after the export/archive is successful.) |
I've run into this same issue; the terms would benefit from some clarification. I think it would also be beneficial to compare and align these terms with other common apps like Gmail, Trello, etc. "Archive" caused the most confusion for me personally while following this discussion. It seems better to stick with what it generally means elsewhere: a means of hiding unused items, while export would be used for moving data out of Galaxy into an external medium. The term freezing sounds like something that could potentially be replaced by read-only?
I think #16003 closes this |
After discussing this long-standing feature request a bit more a couple of weeks ago during the backend working group meeting (slides), here are a couple of use cases and a high-level breakdown of the tasks.
Use Case: Archive a history
The most basic use case will be just "marking" a history as archived. This action will freeze the history by restricting any further mutation. Example of mutation operations:
An archived history will not be displayed when listing histories by default, but there should be a filter option to show them on demand or a dedicated view.
Those histories will display a badge or some clear indication of their state with possibly additional information on how to restore them.
The Storage Dashboard can detect these histories and suggest exporting them to external storage (see the following use cases) in order to free some space in your Galaxy.
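The basic archive use case could be sketched roughly as follows. This is only an illustration under stated assumptions: the `History` class, the `archived` flag, `HistoryArchivedError`, and `list_histories` are hypothetical names, not Galaxy's actual model or API.

```python
from dataclasses import dataclass
from typing import List


class HistoryArchivedError(Exception):
    """Raised when a mutation is attempted on an archived history."""


@dataclass
class History:
    name: str
    archived: bool = False

    def rename(self, new_name: str) -> None:
        # Every mutating operation would go through the same kind of guard.
        if self.archived:
            raise HistoryArchivedError(f"History '{self.name}' is archived")
        self.name = new_name


def list_histories(histories: List[History], include_archived: bool = False) -> List[History]:
    """Hide archived histories by default; include them only on demand."""
    return [h for h in histories if include_archived or not h.archived]
```

With this shape, the default listing stays uncluttered, while a filter option or dedicated view simply passes `include_archived=True`.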
Use Case: Package and Publish an Archived History
After a history is archived, the user can recover its storage space. One way is by publishing the contents to a remote repository.
In this scenario, the users should be able to configure the API credentials in the settings to connect to an external DOI repository (like Zenodo, InvenioRDM, etc.). Then the history and its contents can be packaged into a structured container like RO-Crate and published to the desired external repository.
Galaxy will track the DOI returned by the repository and associate it with the archived history so it can be restored later (as a new copy). The DOI publication also enables other users to import the history for reproducibility.
The packaging and publishing steps should be easy to do by using the Storage Dashboard.
Use Case: Package and Export an Archived History
This scenario replaces the publishing step by exporting the package to any private storage or remote file source.
After exporting the packaged history, the export destination is associated with the history (similar to the DOI case above) and used to recover it in the future. However, the user is responsible for maintaining the exported package.
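Both the publish and export cases come down to remembering where a package went. A minimal sketch of that bookkeeping, assuming hypothetical `ExportRecord` and `ArchivedHistory` classes and treating the DOI or file source URI as an opaque string:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExportRecord:
    # A DOI or a remote file source URI; treated as an opaque string here.
    target: str


@dataclass
class ArchivedHistory:
    name: str
    exports: List[ExportRecord] = field(default_factory=list)
    # The export chosen for recovery once contents are removed from Galaxy.
    recovery_export: Optional[ExportRecord] = None

    def track_export(self, target: str) -> ExportRecord:
        """Record every export of this history, published or private."""
        record = ExportRecord(target=target)
        self.exports.append(record)
        return record

    def set_recovery_export(self, record: ExportRecord) -> None:
        """Pick one tracked export as the source for a later restore."""
        if record not in self.exports:
            raise ValueError("Recovery export must be a tracked export of this history")
        self.recovery_export = record
```

Keeping all exports and pointing at one of them for recovery mirrors the idea from the discussion above; whether the target stays valid is up to the repository or, for private storage, the user.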
Implementation Tasks
trackable
Add task-based history export tracking #14839
Bonus
freezable
frozen
boolean column to the History model and restricting mutations based on it.
Please feel free to add your ideas or concerns in the comments :)
xref #1734, #3088