CiteSeqCount on empty droplet whitelist #29
Hi, this is more of a CITE-seq-Count question upstream of dsb, but I'm happy to help out. Can you post the command you used for CITE-seq-Count? Just so I understand: you ran Cell Ranger on the RNA data, used the filtered output to get the cell-containing droplet barcodes, then tried to run CITE-seq-Count with a whitelist made up only of what you think are empty drops, without those cells? If so, I don't think that will work. The simplest way to use CITE-seq-Count is to not use a whitelist of barcodes: just specify a high number for the … Also, what antibodies are you using? You may be able to align your ADTs with Cell Ranger, which would be simpler (that is basically shown in the vignette).
Sorry for the late response. Given that the number of 'good' cells ranges between 3k and 10k, should I just set -cells to around 1M (10-100x the number of filtered barcodes)? I will give it a try and see what happens. For this experiment I used BioLegend TotalSeq-A antibodies. I tried to map the ADTs using Cell Ranger, but the mapping efficiency was much lower than with CITE-seq-Count, which gives optimal results (which also makes sense).
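The 10-100x rule of thumb is simple arithmetic. The minimal Python sketch below shows it. The helper cells_argument_range is hypothetical and is not part of CITE-seq-Count. It only produces candidate values for the -cells argument, using the 10 and 100 factors quoted above.

```python
# Hypothetical helper: not part of CITE-seq-Count, just the rule-of-thumb arithmetic.
def cells_argument_range(min_cells: int, max_cells: int,
                         low_factor: int = 10, high_factor: int = 100) -> tuple[int, int]:
    """Scale the observed range of filtered cell counts by the 10-100x rule of thumb."""
    if min_cells <= 0 or max_cells < min_cells:
        raise ValueError("expected 0 < min_cells <= max_cells")
    # Smallest sample times the low factor, largest sample times the high factor.
    return min_cells * low_factor, max_cells * high_factor


low, high = cells_argument_range(3_000, 10_000)
print(low, high)  # 30000 1000000
```

So "around 1M" sits at the top of that range. Anything from a few tens of thousands upward still leaves room for a large background population.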
@MattPM A related question: what do you think is the optimal number of negative/empty droplets to use for dsb? The number of negatives from the HTODemux output is much smaller than the difference between the raw and filtered feature-barcode matrices, i.e. background = setdiff(colnames(raw$ … Thanks
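The truncated R line above treats every raw barcode that is not in the filtered matrix as background. A minimal Python sketch of the same idea follows. It assumes the barcodes are plain lists of strings, and the toy barcodes and the setdiff helper are illustrative only, not from dsb. The helper reproduces R's setdiff semantics: it keeps the unique elements of the first vector that are absent from the second, in their original order.

```python
# Illustrative re-implementation of R's setdiff(x, y); not part of dsb.
def setdiff(x: list[str], y: list[str]) -> list[str]:
    """Unique elements of x that are not in y, kept in x's order."""
    exclude = set(y)
    seen: set[str] = set()
    out: list[str] = []
    for item in x:
        if item not in exclude and item not in seen:
            seen.add(item)
            out.append(item)
    return out


# Toy barcodes standing in for colnames of the raw and filtered matrices.
raw_barcodes = ["AAAC-1", "AAAG-1", "AACT-1", "ACGT-1"]
filtered_barcodes = ["AAAG-1", "ACGT-1"]
background = setdiff(raw_barcodes, filtered_barcodes)
print(background)  # ['AAAC-1', 'AACT-1']
```

On real data this set is typically far larger than the HTODemux negatives, which is the discrepancy being asked about.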
@kristiangu |
If you want to take a deep dive into the differences between using hashing negatives vs. negatives defined from the library size distribution of the aligned data, and how each impacts the actual normalized counts, take a look at the Supplementary Note of the preprint; the choice did not impact the normalized counts in the data we tested. As this thread suggests, the 'right' number of background droplets depends on the experiment and the number of cells you expect to recover. If you are using Cell Ranger, I would use 10X the number of recovered cells from the raw output, subset to exclude cells and droplets with high mRNA content, as shown in the vignette on CRAN. The exact cutoffs for mRNA content or library size for the empty drops depend on your data. However, the thresholds we show as a guide (e.g. fewer than 80 unique mRNAs for a droplet to be considered empty) are a good starting point and work on most datasets. @grothja @kristiangu Note that as long as you retain the major background peak, the normalized values will be very interpretable even if there is bimodality in the total library size distribution across droplets.
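The empty-drop selection described above can be sketched in Python. The sketch assumes a mapping from every raw barcode to its number of unique mRNAs, plus a set of called cell barcodes. The cutoff of 80 is the guide threshold from the comment, not a universal value. The function name select_background and the toy counts are hypothetical.

```python
# Hypothetical helper illustrating the guide above; not dsb's API.
def select_background(raw_mrna_counts: dict[str, int], cell_barcodes: set[str],
                      max_unique_mrna: int = 80) -> list[str]:
    """Candidate empty drops: raw barcodes that are not called cells and
    have fewer than `max_unique_mrna` unique mRNAs."""
    return [bc for bc, n in raw_mrna_counts.items()
            if bc not in cell_barcodes and n < max_unique_mrna]


# Toy data: unique mRNA counts per raw barcode (made up for illustration).
raw = {"AAAC-1": 12, "AAAG-1": 2400, "AACT-1": 95, "ACGT-1": 1800, "AGGA-1": 40}
cells = {"AAAG-1", "ACGT-1"}
background = select_background(raw, cells)
print(background)                    # ['AAAC-1', 'AGGA-1']  (AACT-1 is above the cutoff)
print(len(background) / len(cells))  # 1.0  (background-to-cell ratio)
```

Comparing len(background) with the number of cells is one way to see how far a given cutoff is from the ~10X guideline.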
@MattPM Thanks for the insight. When I use the HTODemux negatives from the filtered feature-barcode matrix as empty droplets, I have ~1K empty droplets vs. 8K positive cells. Then I tried a hybrid approach. I still use the HTODemux singlets from the filtered feature-barcode matrix as the positive cells, but I use negative_mtx_rawprot, processed from the raw feature-barcode matrix, as the background. That gives me ~72K empty droplets. So the dsb function call looks like … Another question is about dead/dying cells: do you remove dead cells identified from the RNA-seq data before dsb normalization, or does it not matter? Thanks for your time.
@gt7901b The function should not take much longer with a differently sized background matrix. The warning you're reporting suggests that your matrices do not have the same dimensions. Check whether your background and cell matrices have the same number of rows (the same proteins in the same positions). That is important, so look into what is causing the warning; if you can't find it, I can help if you send your data. The range you're reporting (~0-20) is typical for dsb values. Overall, the hybrid approach you're describing sounds better to me. FYI, re: "When I use HTODemux negative as empty droplets from filtered feature barcode matrix" …
Many thanks, @MattPM. The warning “longer object length is not a multiple of shorter object length” is caused by a difference of one Ab. One of the negative control Abs has zero counts in negative_mtx_rawprot (there are two negative control Abs in total), so I removed that Ab from negative_mtx_rawprot. Despite the warning, dsb still finished the normalization. Is that fine?
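One way to keep the rows matched is to drop the all-zero isotype control from both matrices, instead of from negative_mtx_rawprot only. Both matrices then share the same proteins in the same order. The minimal Python sketch below assumes each matrix is a dict from protein name to per-droplet counts. The helper drop_from_both and the protein names are hypothetical. Whether dropping an isotype control is appropriate for a given panel is a judgment call.

```python
# Hypothetical helper; assumes each matrix is {protein name: per-droplet counts}.
def drop_from_both(cell_mtx: dict[str, list[int]],
                   background_mtx: dict[str, list[int]],
                   proteins_to_drop: list[str]) -> tuple[dict[str, list[int]], dict[str, list[int]]]:
    """Drop the same proteins from both matrices and return both with rows
    in the cell matrix's order."""
    drop = set(proteins_to_drop)
    kept = [p for p in cell_mtx if p not in drop]
    kept_background = {p for p in background_mtx if p not in drop}
    # Any remaining difference would reproduce the row mismatch, so stop instead.
    if set(kept) != kept_background:
        raise ValueError(f"proteins still differ: {sorted(set(kept) ^ kept_background)}")
    cells = {p: cell_mtx[p] for p in kept}
    background = {p: background_mtx[p] for p in kept}
    return cells, background


cell_mtx = {"CD3": [50, 3], "CD19": [2, 80], "IgG1_iso": [1, 0], "IgG2a_iso": [0, 1]}
background_mtx = {"CD19": [0, 1, 0], "CD3": [1, 0, 2],
                  "IgG1_iso": [0, 1, 1], "IgG2a_iso": [0, 0, 0]}  # all-zero isotype
cells, background = drop_from_both(cell_mtx, background_mtx, ["IgG2a_iso"])
print(list(cells), list(background))  # ['CD3', 'CD19', 'IgG1_iso'] ['CD3', 'CD19', 'IgG1_iso']
```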
@gt7901b Thanks very much for bringing this to my attention. It’s a corner case, but I will make the function stop if the two input matrices do not have matched rows; that’s quite important.
@MattPM Thanks for the advice. I fixed that and it now runs very fast. What about dead/dying cells? Will removing dead cells before dsb affect the result?
@gt7901b It’s better to remove low-quality/dead cells before running the function. That’s basically the idea in the vignette: run QC with ADT- and mRNA-based metrics to get the high-quality cells first, then normalize.
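The QC-then-normalize order can be sketched as a filter applied before building the cell matrix. This is a minimal Python illustration only. The metric names (n_genes, mito_fraction, adt_library_size) and every threshold in the usage lines are hypothetical placeholders, not values from dsb or its vignette. Choose them from your own data.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    barcode: str
    n_genes: int           # unique genes detected (mRNA-based metric)
    mito_fraction: float   # fraction of mitochondrial reads (mRNA-based metric)
    adt_library_size: int  # total ADT counts (ADT-based metric)


def passes_qc(cell: CellQC, min_genes: int, max_mito_fraction: float,
              min_adt: int, max_adt: int) -> bool:
    """True if the droplet passes every mRNA- and ADT-based cutoff."""
    return (cell.n_genes >= min_genes
            and cell.mito_fraction <= max_mito_fraction
            and min_adt <= cell.adt_library_size <= max_adt)


# Hypothetical droplets and thresholds, for illustration only.
droplets = [
    CellQC("AAAC-1", n_genes=1500, mito_fraction=0.05, adt_library_size=3000),
    CellQC("AAAG-1", n_genes=300, mito_fraction=0.40, adt_library_size=800),     # likely dead/dying
    CellQC("AACT-1", n_genes=2000, mito_fraction=0.08, adt_library_size=90000),  # unusually high ADT
]
kept = [d.barcode for d in droplets
        if passes_qc(d, min_genes=500, max_mito_fraction=0.2, min_adt=500, max_adt=50000)]
print(kept)  # ['AAAC-1']
```

Only the droplets in kept would then go into the cell matrix passed to dsb.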
Additional error checking on input cell and background matrices:
- stop if the input matrix rows are not of equivalent length #29
- stop if any names in the input matrices are not equivalent
- warn if the rows are not in the same order, and reorder them to match.
Improve warning and error messages for isotype control name-matching issues.
Add protein-level (mean, sd) and cell-level stats to the output if return.stats is set to TRUE.
Add unit tests for the changes.
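The three input checks listed in this commit can be sketched in Python. This is not dsb's R implementation. It is a hedged illustration that assumes the background matrix is a dict from protein name to counts and the cell matrix's row names are given as a list. The helper name align_background is hypothetical.

```python
import warnings


# Hypothetical helper mirroring the three rules in the commit message; not dsb's code.
def align_background(cell_proteins: list[str],
                     background: dict[str, list[int]]) -> dict[str, list[int]]:
    """Check the background rows against the cell rows and return the
    background reordered to the cell matrix's protein order."""
    bg_proteins = list(background)
    # Rule 1: stop if the two matrices do not have the same number of rows.
    if len(cell_proteins) != len(bg_proteins):
        raise ValueError(f"row counts differ: {len(cell_proteins)} cell rows vs "
                         f"{len(bg_proteins)} background rows")
    # Rule 2: stop if any protein names differ between the two matrices.
    mismatched = set(cell_proteins) ^ set(bg_proteins)
    if mismatched:
        raise ValueError(f"protein names differ: {sorted(mismatched)}")
    # Rule 3: warn if the order differs, then reorder the background to match.
    if list(cell_proteins) != bg_proteins:
        warnings.warn("rows are not in the same order; reordering background to match cells")
    return {p: background[p] for p in cell_proteins}


cells = ["CD3", "CD19", "IgG1_iso"]
bg = {"CD19": [0, 1], "IgG1_iso": [1, 1], "CD3": [2, 0]}
aligned = align_background(cells, bg)  # emits a UserWarning about row order
print(list(aligned))  # ['CD3', 'CD19', 'IgG1_iso']
```

The dropped-antibody case from this thread fails rule 1 with an explicit error. Silently recycling vectors of different lengths is what produced the R warning.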
Hi,
I am trying to perform dsb normalisation on my ADTs. However, I am running into an issue when I run CITE-seq-Count with a whitelist of the empty droplets (i.e. the Cell Ranger barcodes with the filtered barcodes excluded). The analysis gets stuck at "testing cell barcode collapsing thresholds".
This happens on samples that have 2.5 to 3 million barcodes in their empty-droplet whitelists.
Any suggestions? I may be doing something wrong.
Thanks