[MRG] pe correction, low_complexity_filter & abund_cutoff as config params #107
These are just suggestions to speed up the trimming step as it seems to be very slow.
I'm not capable of reviewing the fastp change without a lot of effort, so I'll have to leave the actual review up to @bluegenes. @mr-eyes have you observed any performance improvement?
Yes, the
@mr-eyes do you think you could
These changes make sense for making things faster.
It would be really great to have at least one timed comparison, with corresponding info on the difference in the number of trimmed reads!
Otherwise LGTM!
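For the read-count half of such a comparison, a minimal sketch could look like the following. It assumes standard four-line FASTQ records (no wrapped sequence lines), and the helper names and file paths are hypothetical. Wall-clock time and memory for each run can come from Snakemake's benchmark directive on the trimming rule.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a FASTQ file (plain or .gz).

    Assumes every record is exactly 4 lines (header, sequence, '+', quality),
    i.e. no wrapped sequence lines.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as fh:
        n_lines = sum(1 for _ in fh)
    return n_lines // 4


def trimming_summary(before_path, after_path):
    """Compare read counts before and after a trimming step."""
    before = count_fastq_reads(before_path)
    after = count_fastq_reads(after_path)
    removed = before - after
    pct_removed = 100.0 * removed / before if before else 0.0
    return {"before": before, "after": after,
            "removed": removed, "pct_removed": pct_removed}


# Hypothetical paths: raw input vs. trimmed output; run once per parameter setting.
# print(trimming_summary("sample.raw.fq.gz", "sample.trim.fq.gz"))
```

Running it once with the old trimming parameters and once with the new ones gives the difference in surviving reads to report alongside the timings.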
I will work on it soon and report. Thanks @bluegenes
Thanks, @bluegenes, for the ping. I updated the PR description with more flexible changes. It's ready for review now, and I will post the benchmark details here later.
## Error trimming flags
# fastp_correction: set to ON or 1 for base correction for PE data
fastp_correction: OFF
I think we should use YAML booleans - it looks like OFF is a valid one (per https://yaml.org/type/bool.html), but your code below doesn't use boolean value checking.
Isn't that checking for a boolean?
genome-grist/genome_grist/conf/Snakefile
Line 97 in 8f7180f
correction_flag = "--correction" if config_correction in [True, '1'] else ''
The 'if' here worries me - if it's a boolean, then you should just be able to say 'if config_correction'. Most specifically, '1' should not be magic (which it seems to be here). I don't really know how snakemake's config YAML parser interprets valid vs. invalid YAML boolean values, but I would be disturbed to see 'yes' interpreted as false here.
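To make the concern concrete, here is a small sketch that evaluates the existing expression against several values a config loader might hand back. The set of candidate values is an assumption about a YAML 1.1 loader: unquoted ON/yes load as True, an unquoted 1 loads as the int 1, and quoted values stay strings. The function name current_correction_flag is hypothetical.

```python
def current_correction_flag(config_correction):
    # Same expression as the Snakefile line quoted earlier in this thread.
    return "--correction" if config_correction in [True, '1'] else ''


# Assumed Python values a YAML 1.1 loader could return for fastp_correction:
# unquoted ON/yes -> True, unquoted 1 -> int 1, quoted values stay strings.
for value in [True, False, 1, 0, '1', 'yes', 'ON']:
    print(f"{value!r:>6} -> {current_correction_flag(value)!r}")
```

The int 1 enables correction only because bool is a subclass of int in Python, so 1 == True; a quoted 'yes' or 'ON' silently turns correction off.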
YAML doesn't seem to have a simple answer for what is a valid bool - this is a mess! The YAML 1.1 boolean type accepts all of these:
y|Y|yes|Yes|YES|n|N|no|No|NO
|true|True|TRUE|false|False|FALSE
|on|On|ON|off|Off|OFF
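The list above is the YAML 1.1 boolean pattern from https://yaml.org/type/bool.html. The following minimal sketch classifies a raw, unquoted scalar against it. The function name yaml11_bool is hypothetical, and the code does not reproduce any particular loader's internals. It shows that 'OFF' is a bool while '1' or mixed-case 'oN' is not. The YAML 1.2 core schema narrows booleans to the true/false spellings, so the same config file can load differently depending on which spec a parser follows.

```python
import re

# Boolean literals from the YAML 1.1 type repository
# (https://yaml.org/type/bool.html), split into true and false forms.
YAML11_TRUE = "y|Y|yes|Yes|YES|true|True|TRUE|on|On|ON"
YAML11_FALSE = "n|N|no|No|NO|false|False|FALSE|off|Off|OFF"


def yaml11_bool(scalar):
    """Return True/False if scalar is a YAML 1.1 boolean literal, else None."""
    if re.fullmatch(YAML11_TRUE, scalar):
        return True
    if re.fullmatch(YAML11_FALSE, scalar):
        return False
    return None  # e.g. '1', 'oN', 'enabled' are not booleans


for s in ["OFF", "yes", "y", "1", "oN"]:
    print(repr(s), "->", yaml11_bool(s))
```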
I guess the simplest thing to do would be to insist that it be a valid bool (True or False), not a '1' or anything else, and complain if it's not.
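A minimal sketch of that "insist on a real bool and complain otherwise" approach follows. The helper require_bool is hypothetical, and the usage line assumes the config dict that Snakemake exposes inside a Snakefile.

```python
def require_bool(config, key, default=False):
    """Return config[key] only if it is a real bool; raise otherwise.

    Hypothetical helper: rejects 1, '1', 'yes', etc. instead of guessing.
    """
    value = config.get(key, default)
    # bool is a subclass of int (1 == True), so check the type explicitly
    # rather than comparing against True.
    if not isinstance(value, bool):
        raise ValueError(
            f"config option {key!r} must be true or false, got {value!r}"
        )
    return value


# Usage inside a Snakefile, where `config` is the dict Snakemake loads:
# correction_flag = "--correction" if require_bool(config, "fastp_correction") else ""
```

Failing at workflow start is cheaper than discovering after a long run that base correction was silently skipped.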
(I'm disgruntled with YAML because of this kind of thing - Luiz mentioned an increasing preference for TOML, because YAML isn't really that well specified.)
Hi @mr-eyes, I left some comments; looking good overall! Main requests other than that:
Co-authored-by: C. Titus Brown <titus@idyll.org>
I did some benchmarks and found no significant change in time or memory. So while it is good to have a flexible configuration, this does not speed up genome-grist the way I expected. However, I think it could affect performance depending on the size and quality of the dataset. I wanted to ask: was there a specific criterion for selecting the abundtrim and trimming parameters? I can't predict how changing them would affect the results biologically.
Did you take a look at https://peerj.com/preprints/890/?
These are just suggestions to speed up the trimming step, since it seems to be very slow. I don't know the implications, though...
cc @bluegenes