deblur 2021.09 #3141

Merged
merged 7 commits into qiita-spots:dev from antgonza:deblur_2021.09
Sep 9, 2021

Conversation

antgonza
Member

@antgonza antgonza commented Sep 3, 2021

No description provided.

@coveralls

coveralls commented Sep 3, 2021

Coverage Status

Coverage increased (+0.008%) to 91.171% when pulling 24af5a7 on antgonza:deblur_2021.09 into dcf6cbe on qiita-spots:dev.

general rule of thumb, as a first analytical pass for meta-analysis for 16S data, we use
5,000 sequences per sample and we prefer 150 base pair trimming. Thus, we directly
contacted all study owners that would recover more than 5% of the samples in their study
(total 24).
Contributor

Would it be OK to recommend to users that they reach out to qiita-help if they are concerned about how this affected them?

Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>
Sample counts implications
--------------------------

At the time of writing, Qiita had 978,052 16S deblurred private or public samples.
Contributor

It is unclear whether "the time of writing" refers to when the bogus parser was written or to when this text was written.

Member Author

this text, I can add more text to make it clearer ...

The figure below shows, for different trimming lengths, how many samples we recover
based on the minimum number of sequences per sample. This is an important consideration,
as we normally need to remove samples below a given threshold for beta diversity
calculations (via rarefaction) or differential abundance testing.
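
The threshold-based recovery described above can be sketched in a few lines. This is only an illustration, not the code used to build the figure: the helper `samples_recovered`, the sample names, and all counts are hypothetical values made up for the example.

```python
def samples_recovered(seqs_per_sample, min_depth):
    """Count the samples that have at least `min_depth` sequences."""
    return sum(1 for n in seqs_per_sample.values() if n >= min_depth)


# Hypothetical per-sample sequence counts at two trim lengths (made-up values).
counts_by_trim = {
    90: {"s1": 12000, "s2": 4800, "s3": 7000},
    150: {"s1": 9000, "s2": 3100, "s3": 5200},
}

# Samples kept at each trim length for a 5,000 sequences-per-sample threshold.
recovered = {
    trim: samples_recovered(counts, 5000)
    for trim, counts in counts_by_trim.items()
}
```

Sweeping `min_depth` over several thresholds for each trim length gives the kind of curve the figure summarizes.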
Contributor

This only applies if rarefaction was based on Faith's PD. All other metrics should not have been affected, as long as you did not additionally filter the deblur tables provided by Qiita down to the features contained in the insertion tree file (which comes together with the BIOM file in Qiita).

Member Author

Good point, but this is true only for the full/all table, not for the reference-hit table, because that table needs to be filtered to match what's in the tree, which is done automatically via the meta-analysis construction in Qiita. Do you have any specific text suggestions to cover this?

Member Author

I tried to add some more information about this at the end of the intro paragraph, please let me know if that works.

- 96.6% of preparations had 0-10% of features lost
- 12.6% had 10-20% of the features lost
- 9.7% 30-40%
- 6.9% 40-50%

I think it's important to give the full list, all the way to 90%.

- 6.9% 40-50%

Remember that the percentages reported above are cumulative: for example, the studies with
40-50% lost are also counted at the lower levels.
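
One possible reading of this cumulative reporting is that each level counts every preparation that lost at least that fraction of features. The sketch below assumes that reading. The helper `cumulative_share` and the input fractions are hypothetical and are not taken from the Qiita data.

```python
def cumulative_share(fractions_lost, lower_bounds):
    """Percentage of preparations that lost at least each lower bound."""
    total = len(fractions_lost)
    return {
        bound: 100.0 * sum(1 for f in fractions_lost if f >= bound) / total
        for bound in lower_bounds
    }


# Made-up fractions of features lost for four preparations.
fractions_lost = [0.02, 0.05, 0.15, 0.45]

# The preparation at 0.45 is counted at the 0.4 level and at every lower level.
shares = cumulative_share(fractions_lost, [0.0, 0.1, 0.4])
```

Under this reading the percentages across levels do not sum to 100%, because a heavily affected preparation appears in several levels.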

Maybe also include a comment that we did not find any strong patterns among the most affected studies, whether by sample type (according to EMPO category) or by target 16S variable region.

Contributor

@wasade left a comment

All requests are in suggested changes, so if they look good this should be quick to merge.

Thank you @wasade!

Co-authored-by: Daniel McDonald <d3mcdonald@eng.ucsd.edu>
@ElDeveloper ElDeveloper merged commit d97a99c into qiita-spots:dev Sep 9, 2021
@ElDeveloper
Contributor

Thanks @antgonza!

@antgonza
Member Author

antgonza commented Sep 9, 2021

Thank you all for the feedback.
