10x use case #50
base: master
Thanks for the pull request! I didn't see it earlier... I will take a look, but I suspect it will be a good addition!
A couple of initial comments. I'm hoping to recruit someone with more 10x knowledge to review for correctness.
Thanks for the updates @ayeaton. Two things I'm trying to understand: (1) what the function will do when given multiple bed files as input, whether they share some barcodes or have entirely different ones, and (2) whether there is an easy way to assemble the matrix as a sparse matrix from the start (as is done when using RG tags for BAMs), since that could help with memory. The current implementation first builds a dense matrix, which is then converted to sparse by the call to Matrix. Given that 10x samples are likely to be large, it might be better to construct the matrix as a sparse matrix from the start.
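One way to skip the dense intermediate, sketched here in Python for illustration rather than as chromVAR's R code: accumulate a count for each (peak, barcode) hit in a dictionary and emit coordinate (triplet) vectors `i`, `j`, `x`. In R, vectors of this shape could be passed to `Matrix::sparseMatrix(i = i + 1, j = j + 1, x = x, dims = ...)` (which expects 1-based indices by default), so memory grows with the number of nonzero entries rather than with peaks × cells. The function name `triplets_from_overlaps` and its `(peak_index, barcode)` input shape are hypothetical.

```python
from collections import defaultdict


def triplets_from_overlaps(overlaps, barcodes):
    """Accumulate (peak, barcode) hits into COO/triplet vectors.

    overlaps: iterable of (peak_index, barcode) pairs, one per fragment
              that overlaps a peak (hypothetical input shape).
    barcodes: ordered list of all cell barcodes; defines column order.
    Returns 0-based row indices i, column indices j, and counts x.
    """
    col = {bc: j for j, bc in enumerate(barcodes)}
    counts = defaultdict(int)  # only nonzero cells are ever stored
    for peak_idx, bc in overlaps:
        counts[(peak_idx, col[bc])] += 1
    # Sort the keys so the output order is deterministic (row-major).
    keys = sorted(counts)
    i = [k[0] for k in keys]
    j = [k[1] for k in keys]
    x = [counts[k] for k in keys]
    return i, j, x
```

For example, `triplets_from_overlaps([(0, "AAAC-1"), (0, "AAAC-1"), (2, "TTTG-1")], ["AAAC-1", "TTTG-1"])` returns `([0, 2], [0, 1], [2, 1])`: two hits collapse into a single stored entry, and empty cells are never allocated.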
Hi Alicia, thanks for your comments! To address your first question: if the same barcodes appear in more than one bed file, the user can give each bed file a different name using the colData field in getCounts(). These names are then appended to the barcodes from that bed file, so every barcode name is unique. I added a test case to demonstrate this. To address your second question: I moved the creation of the sparse matrix a little earlier in the function, but I'm not sure how to make the portion of the function that relies on GRanges objects more memory efficient.
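A minimal sketch of the barcode-renaming idea described above, written in Python for illustration. The `_` separator and placing the sample name after the barcode are assumptions (the exact naming scheme used in the PR is not shown here), and `unique_barcodes` is a hypothetical helper.

```python
def unique_barcodes(barcodes_by_file, sample_names, sep="_"):
    """Suffix each file's barcodes with that file's sample name.

    Identical barcodes coming from different bed files then map to
    different column names. Separator and suffix position are assumptions.
    """
    if len(barcodes_by_file) != len(sample_names):
        raise ValueError("need exactly one sample name per bed file")
    renamed = []
    for barcodes, name in zip(barcodes_by_file, sample_names):
        renamed.append([f"{bc}{sep}{name}" for bc in barcodes])
    return renamed
```

For example, `unique_barcodes([["AAAC-1", "TTTG-1"], ["AAAC-1"]], ["rep1", "rep2"])` returns `[["AAAC-1_rep1", "TTTG-1_rep1"], ["AAAC-1_rep2"]]`, so the barcode shared by both files no longer collides. Uniqueness still depends on the user supplying distinct sample names.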
Thanks for the clarification re: the multiple files (and the new test case!). I will take a closer look at the matrix creation to see if I have a more concrete suggestion (but might not get to it for a few days).
Hi Alicia,
I wanted to use chromVAR for 10x ATAC-seq data. I saw a suggestion to use the fragments bed file and alter the chromVAR functions to treat one of its columns as a barcode. I implemented these changes in my branch and added a small test case. The data file test_x10_bed.tsv contains the first 1,000 rows of the 10x ATAC-seq fragments file (http://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_5k/atac_v1_pbmc_5k_fragments.tsv.gz).
I changed the following things:
In the get_inputs.R script, I added a function called get_counts_from_x10_beds for 10x input bed files. I also made some minor changes to the functions readAlignmentFromBed, left_right_to_grglist, and getCounts.
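The counting step that a function like get_counts_from_x10_beds has to perform can be sketched as follows (Python, for illustration only; this is not the PR's R code). It treats the fourth column of each 10x fragments line as the cell barcode and ignores the fifth column (the number of read pairs supporting the fragment). The function name, the naive scan over every peak, and the counting rule (a fragment counts once for a peak if either of its ends lies inside that peak) are all assumptions.

```python
def count_fragments_per_peak(fragment_lines, peaks):
    """Count 10x fragments per (peak_index, barcode).

    fragment_lines: tab-separated lines: chrom, start, end, barcode, count.
    peaks: list of (chrom, start, end), 0-based half-open like BED.
    Counting rule (assumption): a fragment counts once for a peak if
    either end falls inside it. O(fragments x peaks), for clarity only.
    """
    counts = {}
    for line in fragment_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        chrom, barcode = fields[0], fields[3]  # column 4 is the barcode
        start, end = int(fields[1]), int(fields[2])
        ends = (start, end - 1)  # first and last base of the fragment
        for p, (pchrom, pstart, pend) in enumerate(peaks):
            if pchrom == chrom and any(pstart <= e < pend for e in ends):
                key = (p, barcode)
                counts[key] = counts.get(key, 0) + 1
    return counts
```

A real implementation would replace the nested loop with an interval-overlap query such as `findOverlaps()` from the Bioconductor IRanges/GenomicRanges packages; the resulting (peak, barcode) counts map directly onto the nonzero entries of a sparse peaks × cells matrix.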