Not an issue but a question #78

Open
stevenjblair opened this issue Apr 15, 2022 · 2 comments

stevenjblair commented Apr 15, 2022

Hello,

First of all, I love bustools.

With that said, I ended up with bustools 0.41.0 on my laptop and it is lovely. I have no idea how it got there; all of my other machines have an older version that gets the job done but is not the multitool of BUS goodness that I have at hand on my Mac. It runs without Anaconda or a module loader. I would love the tarball if you have it handy.

Anyhow, my question is: do you have documentation on bustools v0.41? I would love to see a little info on some of these new (to me) functions like umicorrect, clusterhist, and linker, ideally in a manual like the one you have for version 0.3x here: https://bustools.github.io/manual

Best regards and keep up the great work!
Steve


redst4r commented May 11, 2022

Same here. I use bustools a lot in everyday work, but I'm a little unclear on some of the new stuff in v0.41. Some updated documentation would be extremely helpful; I tried to dig through the source to understand some of the newer features, but no luck...

In particular, I'm trying to figure out what bustools count --umi-gene is supposed to do!

Yenaled (Collaborator) commented Jul 5, 2022

Hi Steve and @redst4r,

We apologize for the delay in releasing updated documentation. We are a bit behind on things, and most of the features you describe are for exploratory and advanced use cases and analyses that aren't part of the typical scRNA-seq workflow. The new features were used to produce the analyses in this paper: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02386-z

You might find umicorrect useful: it tries to account for sequencing errors in UMIs (e.g. if one read's UMI is "slightly off" from another read's UMI due to a sequencing error, we'll correct the UMIs so that they're the same sequence).
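To make the idea concrete, here is a minimal Python sketch of UMI error correction. It is not the bustools implementation; it assumes "slightly off" means a Hamming distance of 1 and that the rarer UMI is the erroneous one. The names `hamming` and `correct_umis` are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_umis(umis):
    """Collapse each rare UMI onto a more abundant UMI one mismatch away.

    Assumption (not taken from bustools): distance 1 counts as
    "slightly off", and the less frequent UMI is treated as the error.
    """
    counts = Counter(umis)
    # Visit UMIs from most to least abundant; ties broken alphabetically.
    ranked = sorted(counts, key=lambda u: (-counts[u], u))
    kept = []      # UMIs accepted as genuine so far
    mapping = {}   # observed UMI -> corrected UMI
    for umi in ranked:
        parent = next(
            (k for k in kept if len(k) == len(umi) and hamming(k, umi) == 1),
            None,
        )
        if parent is not None and counts[parent] > counts[umi]:
            mapping[umi] = parent
        else:
            mapping[umi] = umi
            kept.append(umi)
    return [mapping[u] for u in umis]


reads = ["AACG", "AACG", "AACG", "AACT", "TTGC"]
corrected = correct_umis(reads)
print(corrected)
```

In this toy input, AACT differs from the more abundant AACG at one position, so it is rewritten to AACG, while TTGC is left alone.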

--umi-gene is something that is extremely useful, especially when you have short UMIs. Let's say your UMIs are only 4 base pairs long: that means there are only 4^4 (=256) possible UMI sequences. That's not enough to ensure "uniqueness". Indeed, two distinct molecules might end up with the same UMI sequence (e.g. the sequence TCCG might be assigned to molecule A, an RNA molecule that originated from gene X, as well as molecule B, an RNA molecule that originated from gene Y). If you don't include --umi-gene, the TCCG UMIs will not be counted at all (they'll just be tossed out because bustools count will be unable to figure out why that TCCG UMI sequence belongs to both gene X and gene Y). With --umi-gene, bustools count is able to recognize that we should count that UMI twice: once for gene X and once for gene Y.
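The difference can be illustrated with a small Python sketch. This is a simplification under stated assumptions, not how bustools count works internally: it treats each read as a hypothetical (barcode, UMI, gene) triple, whereas bustools works on BUS records and equivalence classes. The function name `count_umis` and its `umi_gene` parameter are made up for illustration.

```python
from collections import defaultdict


def count_umis(records, umi_gene=False):
    """Count molecules per (barcode, gene) from (barcode, umi, gene) triples.

    Simplified model (an assumption, not bustools internals):
    - umi_gene=False: a UMI seen with more than one gene in the same
      cell is ambiguous and is discarded.
    - umi_gene=True: each distinct (UMI, gene) pair is counted as its
      own molecule.
    """
    genes_per_umi = defaultdict(set)
    for barcode, umi, gene in records:
        genes_per_umi[(barcode, umi)].add(gene)

    counts = defaultdict(int)
    for (barcode, _umi), genes in genes_per_umi.items():
        if len(genes) == 1 or umi_gene:
            for gene in genes:
                counts[(barcode, gene)] += 1
    return dict(counts)


# With 4-base UMIs there are only 4**4 = 256 possible sequences,
# so collisions between molecules from different genes are likely.
records = [
    ("CELL1", "TCCG", "geneX"),  # molecule A
    ("CELL1", "TCCG", "geneY"),  # molecule B, same UMI by chance
    ("CELL1", "AAGT", "geneX"),
]
without_flag = count_umis(records)
with_flag = count_umis(records, umi_gene=True)
print(without_flag)
print(with_flag)
```

Without the flag, the colliding TCCG UMI contributes nothing and only the AAGT molecule is counted for gene X; with it, TCCG contributes one count to gene X and one to gene Y.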
