Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gzipped FASTA input support #102

Merged
merged 12 commits into from
Jun 30, 2018
Merged

gzipped FASTA input support #102

merged 12 commits into from
Jun 30, 2018

Conversation

nekrut
Copy link

@nekrut nekrut commented May 20, 2018

No description provided.

@peterjc
Copy link
Owner

peterjc commented May 21, 2018

Cross reference #98, I will merge that now.

@peterjc
Copy link
Owner

peterjc commented May 21, 2018

This includes support for gzip'ed FASTA input - see #101

@peterjc
Copy link
Owner

peterjc commented May 21, 2018

Judging from the cron jobs, the failing TravisCI tests on the master are in part upstream changes and/or other issues with the dependencies stack. Sigh.

@nekrut
Copy link
Author

nekrut commented May 21, 2018

@peterjc please take a look at makeblastdb <command> section to see if my changes are reasonable.

@peterjc
Copy link
Owner

peterjc commented May 21, 2018

@nekrut I'm hoping to pull in your changes by hand (keeping authorship intact), and ought to be able to test locally even if TravisCI is giving me trouble. Thanks for working on this.

See also galaxyproject/galaxy#6180 which is one of the issues at play.

@nekrut
Copy link
Author

nekrut commented May 21, 2018

@peterjc I don't care about the authorship as long as things work, so do whatever is easiest

@peterjc
Copy link
Owner

peterjc commented May 21, 2018

Maintaining git authorship is simple enough, I just need to get my master branch tests working again...

#103 seems to have fixed testing with the Galaxy dev branch :)

peterjc added a commit to nekrut/galaxy_blast that referenced this pull request May 24, 2018
peterjc added a commit to nekrut/galaxy_blast that referenced this pull request May 24, 2018
@peterjc
Copy link
Owner

peterjc commented May 24, 2018

Rebased, squashed a few commits, removed the change to the version scheme, and updated the README.

Let's see if TravisCI is happy, then think about adding tests for building databases from gzipped FASTA files. I note that as written this adds additional dependencies on the magic library...

@peterjc
Copy link
Owner

peterjc commented May 25, 2018

The TIGRFAM URL seems to be causing more trouble, see also https://twitter.com/pjacock/status/998941086168076288 and d31e21d

Anton Nekrutenko and others added 9 commits May 28, 2018 12:15
Also wrap tokens in Cheetah with braces, and
more quoting of arguments in the command line.
This is to simplify deployment, especially on older
Galaxy installations which are not yet using conda.

This was tested under both Python 2 and 3.
This is several command line arguments in one token.
@peterjc
Copy link
Owner

peterjc commented May 28, 2018

Still a couple of tests failing,

Exception: Request to upload dataset failed [{"err_data": {"file_type": "An invalid option was selected for file_type, u'fasta.gz', please verify."}, "traceback": "Traceback (most recent call last):\n  File \"/home/travis/build/peterjc/galaxy_blast/galaxy-master/lib/galaxy/web/framework/decorators.py\", line 281, in decorator\n    rval = func(self, trans, *args, **kwargs)\n  File \"/home/travis/build/peterjc/galaxy_blast/galaxy-master/lib/galaxy/webapps/galaxy/api/tools.py\", line 415, in create\n    return self._create(trans, payload, **kwd)\n  File \"/home/travis/build/peterjc/galaxy_blast/galaxy-master/lib/galaxy/webapps/galaxy/api/tools.py\", line 473, in _create\n    vars = tool.handle_input(trans, incoming, history=target_history, use_cached_job=use_cached_job)\n  File \"/home/travis/build/peterjc/galaxy_blast/galaxy-master/lib/galaxy/tools/__init__.py\", line 1370, in handle_input\n    raise exceptions.MessageException(', '.join(msg for msg in err_data.values()), err_data=err_data)\nMessageException: An invalid option was selected for file_type, u'fasta.gz', please verify.\n", "err_msg": "An invalid option was selected for file_type, u'fasta.gz', please verify.", "err_code": 0}]

We need to update .travis.datatypes_conf.xml (a quirk of my TravisCI setup from testing datatypes which were outside the Galaxy core), I'll do that next...

@peterjc
Copy link
Owner

peterjc commented May 28, 2018

That took a while - does this look OK now to merge from your point of view @nekrut? Thanks!

@peterjc peterjc changed the title Blast update WIP gzipped FASTA input support Jun 5, 2018
@peterjc
Copy link
Owner

peterjc commented Jun 5, 2018

If I don't hear otherwise this week, I aim to merge this.

The main gotcha (which the test suite caught) was reverting some of the quoting of arguments in the wrapper (normally good practise in Galaxy wrappers, but here can't be used in a couple of cases where a UI option is mapped to multiple command line switches).

@ADV_FILTER_QUERY@
@ADV_MAX_HITS@
@ADV_WORD_SIZE@
#if (str($adv_opts.identity_cutoff) and float(str($adv_opts.identity_cutoff)) > 0 ):
-perc_identity $adv_opts.identity_cutoff
-perc_identity '${adv_opts.identity_cutoff}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need quotes here, identity_cutoff is a float.

#if str($adv_opts.matrix_gapcosts.matrix):
-matrix $adv_opts.matrix_gapcosts.matrix
$adv_opts.matrix_gapcosts.gap_costs
-matrix '${adv_opts.matrix_gapcosts.matrix}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quotes not needed.

-qcov_hsp_perc $adv_opts.qcov_hsp_perc
<token name="@ADV_QCOV_HSP_PERC@"><![CDATA[
#if float(str($adv_opts.qcov_hsp_perc)) > 0:
-qcov_hsp_perc '${adv_opts.qcov_hsp_perc}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quotes not needed.

## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string
## Note -max_target_seqs used to simply override -num_descriptions and -num_alignments
## but this was changed in BLAST+ 2.2.27 onwards to force their use (raised with NCBI)
#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):
#if str($output.out_format) in ["6", "ext", "cols", "5"]:
## Most output formats use this, including tabular and XML:
-max_target_seqs $adv_opts.max_hits
-max_target_seqs '${adv_opts.max_hits}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quotes not needed.

#else
## Text and HTML output formats 0-4 currently need this instead:
-num_descriptions $adv_opts.max_hits -num_alignments $adv_opts.max_hits
#end if
#end if
#if str($adv_opts.max_hsps)
-max_hsps $adv_opts.max_hsps
-max_hsps '${adv_opts.max_hsps}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quotes not needed.

#if str($adv_opts.comp_based_stats):
-comp_based_stats $adv_opts.comp_based_stats
-comp_based_stats '${adv_opts.comp_based_stats}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quotes not needed.

#if str($adv_opts.gapopen):
-gapopen $adv_opts.gapopen
-gapopen '${adv_opts.gapopen}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quotes not needed.

#if str($adv_opts.gapextend):
-gapextend $adv_opts.gapextend
-gapextend '${adv_opts.gapextend}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quotes not needed.

#if str($adv_opts.matrix):
-matrix $adv_opts.matrix
-matrix '${adv_opts.matrix}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quotes not needed.

@@ -46,8 +55,8 @@ $hash_index
#end if
## --------------------------------------------------------------------
## Capture the stdout log information to the primary file (plain text):
&gt; "$outfile"
</command>
> "$outfile"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use single quotes.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good catch on the outfile quotes.

@peterjc
Copy link
Owner

peterjc commented Jun 5, 2018

@nsoranzo You flagged a lot of the quoting Anton added as unnecessary - does the IUC have any thing more explicit to say on this beyond:

All Cheetah variables for text parameters, input and output files must be single-quoted, e.g. '${var_name}'.

https://galaxy-iuc-standards.readthedocs.io/en/latest/best_practices/tool_xml.html

@nsoranzo
Copy link
Contributor

nsoranzo commented Jun 5, 2018

Not really, it's probably more of a de facto standard. I guess using quotes only when needed makes the final command shorter/cleaner and may help to spot errors.

Hat tip Nicola Soranzo during pull request review.
@peterjc peterjc merged commit 6786936 into peterjc:master Jun 30, 2018
@peterjc
Copy link
Owner

peterjc commented Jun 30, 2018

Thanks @nekrut - merged during the GCCBOSC 2018 CollaborationFest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants