Positional arguments (especially seqkit_stats_nosecondary) in duplex_tools assess_split_on_adapter #40

rocpv1977 · 2023-04-03T17:06:47Z

Hi!

I am trying to assess how well duplex_tools split_on_adapter is doing its job. duplex_tools assess_split_on_adapter asks for the following positional arguments:
seqkit_stats_nosecondary
edited_reads
unedited_reads
split_multiple_times

I imagine the last three are the .pkl files that are created in the folder for split files, but I am not sure what "seqkit_stats_nosecondary" is. I have tried passing the output of

seqkit stats path/to/file --all

but I get this error:

/media/seq-ur/65225E7076CF2AF3/basecalling_bacterias/K_oxytoca/K_oxytoca_29_03_2023/pass/split/seqkit_stats contains 1 reads
Traceback (most recent call last):
File "/home/seq-ur/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3652, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'read'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/seq-ur/venv/bin/duplex_tools", line 33, in <module>
sys.exit(load_entry_point('duplex-tools==0.3.2', 'console_scripts', 'duplex_tools')())
File "/home/seq-ur/venv/lib/python3.9/site-packages/duplex_tools/__init__.py", line 39, in main
args.func(args)
File "/home/seq-ur/venv/lib/python3.9/site-packages/duplex_tools/assess_split_on_adapter.py", line 129, in main
assess(
File "/home/seq-ur/venv/lib/python3.9/site-packages/duplex_tools/assess_split_on_adapter.py", line 32, in assess
txt = txt[txt['read'].isin(expected_read_ids)]
File "/home/seq-ur/venv/lib/python3.9/site-packages/pandas/core/frame.py", line 3760, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/seq-ur/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3654, in get_loc
raise KeyError(key) from err
KeyError: 'read'

Could you help me understand what "seqkit_stats_nosecondary" is?

Thanks!

ollenordesjo · 2023-04-04T08:01:05Z

Hi @rocpv1977!

Thanks for the question. You're definitely on the right track. The seqkit_stats_nosecondary argument expects the output of seqkit bam (not seqkit stats) run on a BAM file that does not contain secondary alignments. If your alignment was done in a way that includes secondary alignments, you need to filter them out first, for example with samtools view:

samtools view -b -F 256 input.bam > nosecondary.bam
seqkit bam nosecondary.bam 2> nosecondary.txt
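
The KeyError: 'read' in the traceback above suggests the tool looks for a per-read column named "read", which a per-file summary from seqkit stats does not have. As a rough sanity check before running assess_split_on_adapter, something like the following sketch could confirm that a text file has such a column. The helper names are hypothetical, the case-insensitive match is an assumption about how the tool normalises column names, and the per-read header shown is only illustrative rather than the exact seqkit bam output.

```python
import io


def header_columns(text):
    """Return lower-cased column names from the first non-empty line
    of a whitespace- or tab-delimited table."""
    for line in io.StringIO(text):
        if line.strip():
            return [name.lower() for name in line.split()]
    return []


def has_read_column(text):
    """True if the table header contains a 'read' column.
    Case-insensitive matching is an assumption, not confirmed tool behaviour."""
    return "read" in header_columns(text)


# Default `seqkit stats` header: a per-file summary with no per-read column.
stats_header = "file  format  type  num_seqs  sum_len  min_len  avg_len  max_len\n"

# Illustrative per-read header (assumed, abbreviated column set).
bam_like_header = "Read\tRef\tPos\tEndPos\tMapQual\n"

print(has_read_column(stats_header))
print(has_read_column(bam_like_header))
```

In practice you would read the first line of nosecondary.txt and pass it to has_read_column; a False result would point to the wrong kind of input file.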

Excuse the confusing naming and the lack of documentation regarding this. It's worth tidying up.

Best regards
