I just realized a potential pitfall with this pipeline (and similar approaches).
Like me, you may include empty/blank samples (or have real samples with no reads matching the tags and primer sequences). When processing such a sample, Cutadapt eventually has nothing left to work on, producing an empty file after removal of tags and primers. vsearch, however, will simply operate on the (fasta) tmp-file left over from the previous sample, so the final dereplicated file (S00x.fas) for the current empty/blank/negative sample ends up identical to that of the previous "real" sample. This is of course easy to spot in a project with only a few samples, but may go unnoticed in large datasets.
You will need to identify samples that contained no reads matching their actual tags, and remove them before downstream processing. They can be identified, e.g., by searching the logfiles for the sentence "Unable to read from file". For example like this:
grep -c "Unable to read from file" S[0-9][0-9][0-9]*log
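Building on that grep, a small script could move the affected output files aside automatically. This is only a sketch: the log and output file names (`S001.log`, `S001.fas`, etc.) and the `excluded_samples` directory are assumptions here, so adjust them to your own naming scheme before use.

```shell
#!/bin/bash
# Hypothetical helper: find samples whose logs contain the vsearch error,
# then move their dereplicated files aside before downstream processing.
# Assumes logs named S001.log ... and output files S001.fas ... (adjust as needed).
mkdir -p excluded_samples
for log in S[0-9][0-9][0-9]*.log; do
    [ -e "$log" ] || continue                     # glob matched nothing
    if grep -q "Unable to read from file" "$log"; then
        sample="${log%.log}"
        echo "Excluding empty sample: $sample"
        # the .fas file may be a stale copy of the previous sample, so set it aside
        [ -e "${sample}.fas" ] && mv "${sample}.fas" excluded_samples/
    fi
done
```

Run it from the directory holding the per-sample logs and fasta files; anything moved into `excluded_samples/` should then be checked by hand.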
Regards
Tobias
tobiasgf changed the title from "Tak care with empty files" to "Take care with empty files" on Sep 12, 2016
tobiasgf changed the title from "Take care with empty files" to "Take care with samples with no matching tags&primers" on Sep 12, 2016