-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Collection Operation to Relabelling Identifiers #3603
Implement Collection Operation to Relabelling Identifiers #3603
Conversation
The last point could be solved with a tool that just dumps the element identifiers (that wouldn't necessarily need to be shipped with galaxy), and then you're free to use all the fantastic column/ regex tools in galaxy land to build a 2 column list (if we make this tool accept a 2 column list ...) . |
@jmchilton thanks again for making this first implementation, it looks really useful. I've patched our local Galaxy and the tool works as advertised for me. A couple of observations/suggestions from initial usage:
HTH. Thanks again! |
If you click on the collection in the history panel, it will show the list of the collection datasets. There you can rename the collection by clicking on its name. |
@nsoranzo thanks! I never noticed that before (I was looking for something like the "pencil" icon on regular datasets). |
@pjbriggs @jmchilton I've opened a PR with a simple implementation of a 2-column list rename here |
</element> | ||
</collection> | ||
</param> | ||
<param name="labels" value="new_labels_2.txt" ftype="txt" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ftype needs to be tabular, that's why the test failed :(.
I think it needs to be touched up but the basic operation seems to work so far. I think what remains to be done is: - Validate uniqueness of identifiers and provide nice messages if they are not unique. - Validate that at least the required number of lines are present in the file and provide a nice message if not. - Add strict mode to ensure exactly the correct number of lines is added. - Find where validation of identifiers happens in the API and apply same validation here - try not to let unsafe identifiers be created. - Consider more advanced modes - selecting a column, apply a regex replace, pick two columns for nested lists, etc.... None of this may need to be needed in the first iteration. - Consider another mode where a collection is labelled against an existing collection - should that be a separate tool of the same tool.
- Better error handling (check for bad characters when creating collections). - Implement a strict mode parameter to do even more validation. - Rework tabular vs txt mode to be explicit user choice. - Mirror fix in release_17.01 for datasets not having a history,
0057dc6
to
09bcdd9
Compare
A couple more fixes rebased into 09bcdd9 so I am pulling this out of WIP. @mvdbeek Your mapping approach is pretty cool and it makes me think the two missing collections - "group by" and "filter" could easily be implemented by inspecting a table this way. I'd say someday we would still want a "filter by expression" and "group by expression" tools - just like we'd like to have "relabel by expression" - but these should exist beside "relabel by file", "group by file", and "filter by file" tools. I'll try to get something ready for 17.05 - it might be a stretch though. |
if how_type == "tabular": | ||
# We have a tabular file, where the first column is an existing element identifier, | ||
# and the second column is the new element identifier. | ||
source_new_label = (line.strip().split('\t') for line in new_labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should use strip('\r\n')
here? See the discussion here: galaxyproject/tools-iuc#1233 (comment)
That's better, I agree.
Is that for the "filter by expression" and "group by expression" tools ? I think this PR looks good to me, from your bullet-points I think 1,2 and 3 are addressed, right? |
No - I meant I hope to open the "from file" variants this week. The more I think about it - the more I like them. The expression stuff will come in a future release.
Now those bullet points are indeed - I added one more test case and fix to cover duplicate identifiers in particular. I think the only bullet not addressed now is "Consider another mode where a collection is labelled against an existing collection - should that be a separate tool of the same tool." Which I am happy to push off for now. |
... against supplied labels from an input dataset of type txt.
I think it needs to be touched up but the basic operation seems to work so far. I think what remains to be done is:
Consider another mode where a collection is labelled against an existing collection - should that be a separate tool of the same tool.A collection operation variant of the tool from @pjbriggs at https://github.com/pjbriggs/Amplicon_analysis-galaxy/blob/77340d8bb2470a646deba4933625413fc70985d1/relabel_samples.xml.