Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating tools/pangolin from version 3.1.20 to 4.0.5 #4494

Merged
merged 9 commits into from
Apr 19, 2022

Conversation

planemo-autoupdate
Copy link
Contributor

Hello! This is an automated update of the following tool: tools/pangolin. I created this PR because I think the tool's main dependency is out of date, i.e. there is a newer version available through conda.

I have updated tools/pangolin from version 3.1.20 to 4.0.4.

Project home page: https://github.com/cov-lineages/pangolin/releases

@wm75
Copy link
Contributor

wm75 commented Apr 11, 2022

That's an important update. I guess @pvanheus or me will take care of it. Whoever has some time first :)

Prevents also the use of incompatible cached data from previous versions.
Adds a version command.
@wm75
Copy link
Contributor

wm75 commented Apr 12, 2022

Here's my preliminary work on this. Makes the tool run again at least and updates the documentation.
Tests and the optional header line still need to be fixed (maybe more).

@wm75
Copy link
Contributor

wm75 commented Apr 13, 2022

The test in usher mode is passing now. pangolearn mode (all the failing tests) has some issue, but debugging this from the error alone is prevented by cov-lineages/pangolin#419.
Also cannot inspect local files cause pangolin in pangolearn mode gets killed by my OS OOM protection.

Hopefully this gives us a clearer error message.
Also tries to be explicit about $TMPDIR.
@wm75 wm75 changed the title Updating tools/pangolin from version 3.1.20 to 4.0.4 Updating tools/pangolin from version 3.1.20 to 4.0.5 Apr 13, 2022
@wm75
Copy link
Contributor

wm75 commented Apr 13, 2022

After updating the misleading tempdir message has gone away, but no other error message gets revealed :(
After rebooting my own machine, it survives the pangolin run, but all tests are passing then.
Mysterious.

@bgruening
Copy link
Member

Job in error state.. tool_id: pangolin, exit_code: 1, stderr: �[36mWarning: Ignoring specified datadir /tmp/tmp3pcvfcra/job_working_directory/000/2/working/datadir - could not find init.py file to check versions

That looks like the error.

@wm75
Copy link
Contributor

wm75 commented Apr 13, 2022

No, that you're getting with the --update-data option (because the datadir gets checked before its populated) and if the data was already up to date (because in that case --update-data is not doing anything, but I checked locally and in both cases things work afterwards (also you're getting the same warnings in usher mode).

@wm75
Copy link
Contributor

wm75 commented Apr 13, 2022

So to focus on the important bits:

mkdir datadir && pangolin --update-data --datadir datadir && pangolin --threads ${GALAXY_SLOTS:-1} --tempdir "${TMPDIR:-.}" --datadir datadir --analysis-mode pangolearn --alignment --alignment-file '/tmp/tmp3pcvfcra/files/1/d/9/dataset_1d9bc2f0-cfde-4d3c-8b9e-4cdd7a6f3563.dat' --outfile report.csv --max-ambig 0.5 --min-length 10000 '/tmp/tmp3pcvfcra/files/3/2/f/dataset_32f305da-3f5a-46fd-bf5b-65d6171c8bf8.dat'

is failing with this stderr:

[36mWarning: Ignoring specified datadir /tmp/tmp3pcvfcra/job_working_directory/000/6/working/datadir - could not find __init__.py file to check versions 
�[0mpangolin-data already latest release (v1.3)
constellations already latest release (v0.1.6)
�[36mWarning: Ignoring specified datadir /tmp/tmp3pcvfcra/job_working_directory/000/6/working/datadir - could not find __init__.py file to check versions 
�[0mJob stats:
job                      count    min threads    max threads
---------------------  -------  -------------  -------------
align_to_reference           1              1              1
all                          1              1              1
cache_sequence_assign        1              1              1
create_seq_hash              1              1              1
get_constellations           1              1              1
merged_info                  1              1              1
scorpio                      1              1              1
sequence_qc                  1              1              1
total                        8              1              1

Job stats:
job                  count    min threads    max threads
-----------------  -------  -------------  -------------
all                      1              1              1
pangolearn               1              1              1
pangolearn_output        1              1              1
total                    3              1              1

Exiting because a job execution failed. Look above for error message

Unfortunately without any sign (at least for me) of an error message.

stdout is:

Latest for pangolin-data is v1.3
Latest for constellations is v0.1.6
�[32m****
Query sequences collapsed from �[0m1�[32m to �[0m1�[32m unique sequences.�[0m
�[32m****
�[0m0�[32m sequences assigned via designations.�[0m
�[32m****
Running sequence QC�[0m
�[32mTotal passing QC: �[0m1
�[32m****
Pangolin running in pangolearn mode.
****�[0m
�[36mWarning: pangoLEARN mode may use a significant amount of RAM, be aware that it will not suit every system.�[0m
Converting minimum length of 10000.0 to maximum ambiguity of 0.666.
�[32mMaximum ambiguity allowed is 0.5.
****�[0m
�[32mQuery file:	�[0m/tmp/tmp3pcvfcra/files/3/2/f/dataset_32f305da-3f5a-46fd-bf5b-65d6171c8bf8.dat
�[32m****
Data files found:�[0m
plearn_model:	/usr/local/lib/python3.8/site-packages/pangolin_data/data/randomForest_v1.joblib
plearn_header:	/usr/local/lib/python3.8/site-packages/pangolin_data/data/randomForestHeaders_v1.joblib
�[32m****�[0m

but again that's not very revealing.

@wm75
Copy link
Contributor

wm75 commented Apr 13, 2022

I got some helpful hints over in cov-lineages/pangolin#433 and cov-lineages/pangolin#395 has more information.
So it seems we cannot test the pangolearn branch right now.
Another consideration might be this: if the pangolearn mode uses that much more memory than usher mode, Galaxy admins would have to give more memory to the tool just because someone might run it not in the default mode. Do we want this?
The alternative would be to split the tool into two.

@bgruening
Copy link
Member

Another consideration might be this: if the pangolearn mode uses that much more memory than usher mode, Galaxy admins would have to give more memory to the tool just because someone might run it not in the default mode. Do we want this?

We can control memory requirements based on parameters. Its not super easy, but possible.

Copy link
Member

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style-wise the outputs are locking strange, but overall this looks good.

I think we need to play with the tool and estimate the memory consumption in this one mode.

@wm75
Copy link
Contributor

wm75 commented Apr 14, 2022

@bgruening strange because of the conditional actions you mean?
Have never used those myself before but apparently they work just as described in the tool xml syntax docs.

I agree that we should maybe just gather some experience with the new tool. If the pangolearn mode causes problems we can still decide to split it out into a separate tool.

@wm75
Copy link
Contributor

wm75 commented Apr 14, 2022

If you are talking about the strange control chars on stdout/stderr - those come from https://github.com/cov-lineages/pangolin/blob/f2f30b07059e1c959ad943ce3e4aa4704a65079b/pangolin/utils/custom_logger.py.
If you have any idea how we could control that from outside, that would be great.

@bgruening
Copy link
Member

Oh sorry, I was not clear. The XML, you broke the lines very strange. But maybe thats the new editor :)

Feel free to merge.

@wm75
Copy link
Contributor

wm75 commented Apr 14, 2022

It's just my old-fashioned preference for being able to read things at an 80 chars width without much line wrapping :)

https://galaxy-iuc-standards.readthedocs.io/en/latest/best_practices/tool_xml.html#coding-style is somewhat ambiguous here, but not sold to anything anyway.

@wm75 wm75 merged commit e9ddcd4 into galaxyproject:master Apr 19, 2022
@planemo-autoupdate planemo-autoupdate deleted the tools/pangolin branch April 25, 2022 04:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants