whitelist is not defined error when running umi_tools extract #593

Closed
prmunn opened this issue May 10, 2023 · 4 comments

Comments

@prmunn

prmunn commented May 10, 2023

I have an issue similar to #509, where I need to use the regex option when also using a whitelist. However, my BC pattern is XXXCCCCCCCCCCCCXXXCCCCCCCCCCCCXXXCCCCCCCCCCCCXXXXXXXXXXXXXXXXXNNNNNNNN and I'm not sure what the regex is for this (previously, I've only seen regex patterns for N's and C's). Is there a regex pattern that can also include X's, or alternatively, is there a way to pass in the pattern as a string? (Currently the string option does not appear to work with a whitelist.)

@IanSudbery
Member

This should have been fixed several releases ago, can you give the exact error that you have?

The regex pattern can include Xs, but how that is achieved depends on what you want to do with the Xs. Do you want to keep or discard the bases that match the Xs?

@prmunn
Author

prmunn commented May 10, 2023

I'm running version 1.1.2 - is it fixed in that version?
I would like to keep the bases that match the X's

Here is the command I'm running and the resulting error:
umi_tools extract --extract-method=string \
-p XXXCCCCCCCCCCCCXXXCCCCCCCCCCCCXXXCCCCCCCCCCCCXXXXXXXXXXXXXXXXXNNNNNNNN \
--filtered-out=sciRNA-10K_extract_filtered_out.txt \
--filtered-out2=sciRNA-10K_extract_filtered_out2.txt \
--error-correct-cell \
--quality-filter-mask=20 \
--quality-encoding=phred33 \
--whitelist=sciRNA-10K_predictedBCwhitelist.txt \
-I sciRNA-10K_whitelist_out_R2.fastq \
-S sciRNA-10K_hBC_UMI_R2.fastq.gz \
--read2-in=sciRNA-10K_whitelist_out_R1.fastq \
--read2-out=sciRNA-10K_hBC_UMI_R1.fastq.gz \
-L sciRNA-10K_extractBC.log
Traceback (most recent call last):
  File "/programs/UMI-tools/bin/umi_tools", line 8, in <module>
    sys.exit(main())
  File "/programs/UMI-tools/lib64/python3.9/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/programs/UMI-tools/lib64/python3.9/site-packages/umi_tools/extract.py", line 314, in main
    whitelist is None):
NameError: name 'whitelist' is not defined

@IanSudbery
Member

This particular problem was fixed in 1.1.3. I recommend you update.

Anyway, to specify the barcode as a regex that keeps the bases matching the Xs, you could use:

XXXCCCCCCCCCCCCXXXCCCCCCCCCCCCXXXCCCCCCCCCCCCXXXXXXXXXXXXXXXXXNNNNNNNN

'^...(?P<cell_1>.{12})...(?P<cell_2>.{12})...(?P<cell_3>.{12}).{17}(?P<umi_1>.{8})'
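To see how the string pattern lines up with that regex, here is a minimal sketch using Python's standard `re` module. It is not UMI-tools code: `pattern_to_regex` is a hypothetical helper written only for this illustration, the read is made up, and the sketch assumes that each run of Cs becomes a `cell_N` group, each run of Ns a `umi_N` group, and each run of Xs an unnamed stretch that stays on the read.

```python
import re
from itertools import groupby

# Hypothetical helper (not part of UMI-tools): translate a string-style
# barcode pattern (X = keep on read, C = cell barcode, N = UMI) into a
# regex with named groups in the cell_N / umi_N style used above.
GROUP_PREFIX = {"C": "cell", "N": "umi"}


def pattern_to_regex(pattern):
    parts = ["^"]
    counts = {"C": 0, "N": 0}
    for symbol, run in groupby(pattern):
        n = len(list(run))
        if symbol == "X":
            # Unnamed stretch: matched but not captured as barcode or UMI.
            parts.append(".{%d}" % n)
        elif symbol in GROUP_PREFIX:
            counts[symbol] += 1
            name = "%s_%d" % (GROUP_PREFIX[symbol], counts[symbol])
            parts.append("(?P<%s>.{%d})" % (name, n))
        else:
            raise ValueError("unexpected pattern character: %r" % symbol)
    return "".join(parts)


# The layout from this issue: 3 X, 12 C, 3 X, 12 C, 3 X, 12 C, 17 X, 8 N.
pattern = "XXX" + "C" * 12 + "XXX" + "C" * 12 + "XXX" + "C" * 12 + "X" * 17 + "N" * 8
generated = pattern_to_regex(pattern)

# The regex suggested in the comment above.
suggested = (r"^...(?P<cell_1>.{12})...(?P<cell_2>.{12})"
             r"...(?P<cell_3>.{12}).{17}(?P<umi_1>.{8})")

# Made-up read for illustration only: barcode region plus 12 trailing bases.
read = ("AAA" + "ACGTACGTACGT" + "CCC" + "TTTTGGGGCCCC" + "GGG"
        + "GATTACAGATTA" + "T" * 17 + "ACGTTGCA" + "GGGTTTAAACCC")

generated_groups = re.match(generated, read).groupdict()
suggested_groups = re.match(suggested, read).groupdict()

# Joining the three cell groups gives a 36-base combined barcode.
cell_barcode = "".join(suggested_groups["cell_%d" % i] for i in (1, 2, 3))
umi = suggested_groups["umi_1"]
print(cell_barcode, umi)
```

Both regexes capture the same groups on this read; the only difference is that the sketch writes `.{3}` where the suggested regex writes `...`. On the command line the regex is passed to `-p` together with `--extract-method=regex`.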

@prmunn
Author

prmunn commented May 11, 2023

Thanks for your quick reply. I'll upgrade to the latest version and try the regex you suggested.
