Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[From_CTB] fasta descriptions confound extract-partitions.py #175

Closed
RamRS opened this issue Oct 15, 2013 · 10 comments
Closed

[From_CTB] fasta descriptions confound extract-partitions.py #175

RamRS opened this issue Oct 15, 2013 · 10 comments

Comments

@RamRS
Copy link
Contributor

RamRS commented Oct 15, 2013

Transferring ctb#11 opened on 08-May-2012

I got this error trying to follow the partitioning instructions:

python /Users/jmeppley/tmp/tools/diginorm/khmer/scripts/extract-partitions.py all_pe.p32 all_pe.fasta.part

---
reading partitioned files: ['all_pe.fasta.part']
outputting to files named "all_pe.p32.groupN.fa"
min reads to keep a partition: 5
max size of a group file: 1000000
partition size distribution will go to all_pe.p32.dist

---
Traceback (most recent call last):
  File "/Users/jmeppley/tmp/tools/diginorm/khmer/scripts/extract-partitions.py", line 177, in <module>
    main()
  File "/Users/jmeppley/tmp/tools/diginorm/khmer/scripts/extract-partitions.py", line 81, in main
    for n, name, pid, seq in read_partition_file(filename):
  File "/Users/jmeppley/tmp/tools/diginorm/khmer/scripts/extract-partitions.py", line 26, in read_partition_file
    name, partition_id = name.rsplit('\t', 1)
ValueError: need more than 1 value to unpack

It looks like the code that created the partition-labelled fasta did not account for descriptions in the fasta header. Because the first line in my partition file looks like this:

>M00180:26:000000000-A0JVV:1:1:14959:1511.1 1:N:0:0 2

But the extract-partitions script is trying to parse the partition number out of the name.

I made this change to fix the problem:

diff --git a/scripts/extract-partitions.py b/scripts/extract-partitions.py
index 3c00538..9107791 100755
--- a/scripts/extract-partitions.py
+++ b/scripts/extract-partitions.py
@@ -23,7 +23,10 @@ DEFAULT_THRESHOLD=5
 def read_partition_file(filename):
     for n, record in enumerate(screed.open(filename)):
         name = record.name
-        name, partition_id = name.rsplit('\t', 1)
+        if '\t' in name:
+            name, partition_id = name.rsplit('\t', 1)
+        else:
+            description, partition_id = record.description.rsplit('\t',1)
         yield n, name, int(partition_id), record.sequence

 ###
@RamRS
Copy link
Contributor Author

RamRS commented Oct 15, 2013

@mr-c @ctb Does this have anything to do with #68 or #46 ?

@mr-c
Copy link
Contributor

mr-c commented Feb 21, 2014

@RamRS nope

@mr-c mr-c added this to the 1.0 release milestone Feb 28, 2014
@mr-c mr-c modified the milestones: 1.1+ Release, 1.0 release Apr 2, 2014
@mr-c mr-c modified the milestones: 1.2 Release, 1.2+ Aug 19, 2014
@devbioinfoguy
Copy link

I am going to try to resolve this issue

@lexnederbragt
Copy link

I am going to write a test for this case - it seems to be fixed but I'll confirm that and add the test. I'm being coached by @mr-c who is visiting our lab :-)

@mr-c
Copy link
Contributor

mr-c commented Apr 22, 2015

Working backwards @lexnederbragt and I are unable to reproduce this issue with any version of khmer released during my employment (0.5...1.3)

@mr-c
Copy link
Contributor

mr-c commented Apr 22, 2015

And I can't find a download for an older version of khmer than v0.5. Seeing as that was released a year and a half after this issue was originally reported I am going to mark this issue as fixed once we have merged Lex's test.

@ctb
Copy link
Member

ctb commented Apr 22, 2015

Well, its probably in version control somewhere. And it might (likely) depend on screed. But if it�s fixed it�s fixed :)

@mr-c
Copy link
Contributor

mr-c commented Apr 22, 2015

@ctb Right-o, I didn't try versions of screed older than v0.7.1

@mr-c
Copy link
Contributor

mr-c commented Apr 23, 2015

Tested in #979

@mr-c mr-c closed this as completed Apr 23, 2015
@devbioinfoguy
Copy link

(thumbs-up)

Cheers,
B

On Apr 23, 2015, at 8:40 AM, Michael R. Crusoe notifications@github.com wrote:

Tested in #979 #979

Reply to this email directly or view it on GitHub #175 (comment).


Bob Freeman, Ph.D.
Research Computing Facilitator
XSEDE Campus Champion
FAS Research Computing
Harvard University
38 Oxford Street, Rm 117
Cambridge, MA 02138

617/495.8824, vox

@devbioinfoguy
About: http://bit.ly/1m8n0se

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants