Code changes to work with pysam 0.13.0 #375

Merged 22 commits on Dec 14, 2017


sebastian-luna-valero
Member

@AndreasHeger @Acribbs please review

@Acribbs
Member

Acribbs commented Nov 24, 2017

I have also updated the reference output for the bam2bam nh-flag.bam test: the new output is identical when inspected with samtools view, but the binary file differs. We replaced the old reference file with the new test output.

@sebastian-luna-valero
Member Author

Hi @Acribbs

Here are some errors that I got after testing this branch (along with AC-pipeline_pub).

Please let me know your thoughts. Both look like a pysam issue in gtf2tsv.

Best regards,
Sebastian

  • Running the tests for genesets
    Exception #1
      'builtins.OSError(---------------------------------------
    Child was terminated by signal 1: 
    The stderr was: 
    Traceback (most recent call last):
      File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/conda-install/envs/cgat-p/bin/cgat", line 11, in <module>
        load_entry_point('CGAT', 'console_scripts', 'cgat')()
      File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-scripts/CGAT/cgat.py", line 132, in main
        module.main(sys.argv)
      File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-scripts/CGAT/scripts/gtf2tsv.py", line 169, in main
        val = getattr(gtf, a)
      File "pysam/libctabixproxies.pyx", line 630, in pysam.libctabixproxies.GTFProxy.__getattr__
    KeyError: 'exon_id'
    
    zcat Homo_sapiens.GRCh38.87.gtf.gz     | cat     | grep "transcript_id"     | cgat gtf2gtf     --method=sort --sort-order=gene+transcript     | cgat gtf2tsv     --attributes-as-columns --output-only-attributes -v 0     | python /ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-scripts/CGAT/scripts/csv_cut.py     --remove exon_id transcript_id transcript_name protein_id exon_number     | /ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-pipelines/scripts/hsort 1     | uniq     | cgat csv2db      --retry  --database-backend=sqlite --database-name=csvdb --database-host= --database-user= --database-password= --database-port=3306     --add-index=gene_id --add-index=gene_name--map=gene_name:str     --table=gene_info     > ensembl.dir/gene_info.load
    -----------------------------------------)' raised in ...
       Task = def loadGeneInformation(...):
       Job  = [Homo_sapiens.GRCh38.87.gtf.gz -> ensembl.dir/gene_info.load]
    
    Traceback (most recent call last):
      File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/conda-install/envs/cgat-p/lib/python3.6/site-packages/ruffus/task.py", line 751, in run_pooled_job_without_exceptions
        register_cleanup, touch_files_only)
      File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/conda-install/envs/cgat-p/lib/python3.6/site-packages/ruffus/task.py", line 567, in job_wrapper_io_files
        ret_val = user_defined_work_func(*params)
      File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-pipelines/CGATPipelines/pipeline_genesets.py", line 891, in loadGeneInformation
        PipelineGtfsubset.loadGeneInformation(infile, outfile)
      File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-pipelines/CGATPipelines/PipelineGtfsubset.py", line 387, in loadGeneInformation
        P.run()
      File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-pipelines/CGATPipelines/Pipeline/Execution.py", line 548, in run
        ignore_errors=ignore_errors)
      File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-pipelines/CGATPipelines/Pipeline/Cluster.py", line 327, in collectSingleJobFromCluster
        "".join(stderr), statement))
    OSError: ---------------------------------------
    Child was terminated by signal 1: 
    The stderr was: 
    Traceback (most recent call last):
      File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/conda-install/envs/cgat-p/bin/cgat", line 11, in <module>
        load_entry_point('CGAT', 'console_scripts', 'cgat')()
      File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-scripts/CGAT/cgat.py", line 132, in main
        module.main(sys.argv)
      File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-scripts/CGAT/scripts/gtf2tsv.py", line 169, in main
        val = getattr(gtf, a)
      File "pysam/libctabixproxies.pyx", line 630, in pysam.libctabixproxies.GTFProxy.__getattr__
    KeyError: 'exon_id'
    
    zcat Homo_sapiens.GRCh38.87.gtf.gz     | cat     | grep "transcript_id"     | cgat gtf2gtf     --method=sort --sort-order=gene+transcript     | cgat gtf2tsv     --attributes-as-columns --output-only-attributes -v 0     | python /ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-scripts/CGAT/scripts/csv_cut.py     --remove exon_id transcript_id transcript_name protein_id exon_number     | /ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-pipelines/scripts/hsort 1     | uniq     | cgat csv2db      --retry  --database-backend=sqlite --database-name=csvdb --database-host= --database-user= --database-password= --database-port=3306     --add-index=gene_id --add-index=gene_name--map=gene_name:str     --table=gene_info     > ensembl.dir/gene_info.load
    -----------------------------------------
  • Running the tests for rnaseqqc
                                               Original exception:
                                               
                                                   Exception #1
                                                     'builtins.OSError(---------------------------------------
                                                   Child was terminated by signal 1: 
                                                   The stderr was: 
                                                   Traceback (most recent call last):
                                                     File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/conda-install/envs/cgat-p/bin/cgat", line 11, in <module>
                                                       load_entry_point('CGAT', 'console_scripts', 'cgat')()
                                                     File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-scripts/CGAT/cgat.py", line 132, in main
                                                       module.main(sys.argv)
                                                     File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-scripts/CGAT/scripts/gtf2tsv.py", line 169, in main
                                                       val = getattr(gtf, a)
                                                     File "pysam/libctabixproxies.pyx", line 630, in pysam.libctabixproxies.GTFProxy.__getattr__
                                                   KeyError: 'p_id'
                                                   
                                                   zcat geneset.dir/refcoding.gtf.gz     |cgat gtf2tsv     --attributes-as-columns     --output-only-attributes     | cgat csv_cut transcript_id gene_id     > geneset.dir/refcoding.tsv
                                                   -----------------------------------------)' raised in ...
                                                      Task = def buildTranscriptGeneMap(...):
                                                      Job  = [geneset.dir/refcoding.gtf.gz -> geneset.dir/refcoding.tsv]
                                                   
                                                   Traceback (most recent call last):
                                                     File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/conda-install/envs/cgat-p/lib/python3.6/site-packages/ruffus/task.py", line 751, in run_pooled_job_without_exceptions
                                                       register_cleanup, touch_files_only)
                                                     File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/conda-install/envs/cgat-p/lib/python3.6/site-packages/ruffus/task.py", line 567, in job_wrapper_io_files
                                                       ret_val = user_defined_work_func(*params)
                                                     File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-pipelines/CGATPipelines/pipeline_rnaseqqc.py", line 559, in buildTranscriptGeneMap
                                                       P.run()
                                                     File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-pipelines/CGATPipelines/Pipeline/Execution.py", line 548, in run
                                                       ignore_errors=ignore_errors)
                                                     File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-pipelines/CGATPipelines/Pipeline/Cluster.py", line 327, in collectSingleJobFromCluster
                                                       "".join(stderr), statement))
                                                   OSError: ---------------------------------------
                                                   Child was terminated by signal 1: 
                                                   The stderr was: 
                                                   Traceback (most recent call last):
                                                     File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/conda-install/envs/cgat-p/bin/cgat", line 11, in <module>
                                                       load_entry_point('CGAT', 'console_scripts', 'cgat')()
                                                     File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-scripts/CGAT/cgat.py", line 132, in main
                                                       module.main(sys.argv)
                                                     File "/ifs/projects/sebastian/pipeline-testing-new-pipelines/cgat-scripts/CGAT/scripts/gtf2tsv.py", line 169, in main
                                                       val = getattr(gtf, a)
                                                     File "pysam/libctabixproxies.pyx", line 630, in pysam.libctabixproxies.GTFProxy.__getattr__
                                                   KeyError: 'p_id'
                                                   
                                                   zcat geneset.dir/refcoding.gtf.gz     |cgat gtf2tsv     --attributes-as-columns     --output-only-attributes     | cgat csv_cut transcript_id gene_id     > geneset.dir/refcoding.tsv
                                                   -----------------------------------------

@Acribbs
Member

Acribbs commented Nov 27, 2017

Hmm, I've seen this error before. I will look into it tomorrow.

@Acribbs
Member

Acribbs commented Nov 27, 2017

So I have run:

zcat geneset.dir/refcoding.gtf.gz     | grep ENST00000459772

and it seems that no p_id is being generated for this gene during the cufflinks step. It is not yet clear to me what the exact problem is, so this may require more investigation.

@Acribbs
Member

Acribbs commented Nov 27, 2017

I have run:

zcat Homo_sapiens.GRCh38.87.gtf.gz     | cat     | grep "transcript_id"     | cgat gtf2gtf     --method=sort --sort-order=gene+transcript     | grep ENST00000373020

and the transcript has an exon_id that other transcripts for this gene don't have. I will look into this further.
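
The two grep checks above could also be done in one pass by counting how many records lack each attribute. This is only a rough sketch: `parse_attributes`, `missing_key_counts` and the two sample lines are made up for illustration (they are not CGAT or pysam code and not real Ensembl records), and the regular expression only handles quoted attribute values.

```python
import re
from collections import Counter

# Matches quoted GTF attribute pairs such as: gene_id "G1";
# Unquoted values (e.g. a bare number) are deliberately not handled here.
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(gtf_line):
    """Return the attribute column (9th field) of one GTF line as a dict."""
    fields = gtf_line.rstrip("\n").split("\t")
    return dict(ATTRIBUTE_PATTERN.findall(fields[8]))


def missing_key_counts(gtf_lines, wanted_keys):
    """Count, per attribute key, how many records do not carry that key."""
    counts = Counter()
    for line in gtf_lines:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        attributes = parse_attributes(line)
        for key in wanted_keys:
            if key not in attributes:
                counts[key] += 1
    return counts


# Invented sample records: the second one has no p_id.
SAMPLE_LINES = [
    'chr1\tdemo\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; p_id "P1";',
    'chr1\tdemo\texon\t200\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
]

counts = missing_key_counts(SAMPLE_LINES, ["gene_id", "transcript_id", "p_id"])
print(dict(counts))
```

A non-zero count for a key means that at least one record would trigger the lookup failure seen in the tracebacks when that key is requested as a column.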

@sebastian-luna-valero
Member Author

Hi,

Following up on our discussion at the group meeting today, I just wanted to confirm that the version of cufflinks is the same (2.2.1) in our central installation and in the conda environment, so this should not be the cause of the problem.

Best regards,
Sebastian

@Acribbs
Member

Acribbs commented Nov 29, 2017

gtf2tsv p_id issue: this seems to be related to Python 3.6, because in my py35 environment I can run the script on the command line without problems.

@Acribbs
Member

Acribbs commented Nov 29, 2017

Ah, I see what the issue is now; it should have been obvious. When an attribute is not present in the attributes column, gtf2tsv expects an AttributeError and sets val to " ". Instead a KeyError is raised, so it is not caught and val is never set. I am not sure why the AttributeError under Python 3.5 has become a KeyError under Python 3.6. I think the fix is to catch both exceptions. Where would you like me to make the fix, @sebastian-luna-valero? In a separate cgat branch?
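
Roughly what I mean by catching both exceptions, as a minimal sketch and not the actual gtf2tsv code: `FakeGTFRecord` and `attribute_values` are hypothetical names, and the fake record only imitates the behaviour seen in the traceback (a missing attribute surfacing as KeyError).

```python
class FakeGTFRecord:
    """Hypothetical stand-in for a GTF proxy record (not the pysam API)."""

    def __init__(self, attributes):
        self._attributes = attributes

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails. A missing key
        # raises KeyError here, as in the traceback, not AttributeError.
        return self.__dict__["_attributes"][name]


def attribute_values(record, names, missing=""):
    """Collect attribute values, substituting `missing` for absent ones."""
    values = []
    for name in names:
        try:
            values.append(getattr(record, name))
        except (AttributeError, KeyError):
            # Catch both, so the code works whichever exception is raised.
            values.append(missing)
    return values


record = FakeGTFRecord({"gene_id": "G1", "transcript_id": "T1"})
row = attribute_values(record, ["gene_id", "transcript_id", "p_id"])
print(row)
```

Catching the tuple `(AttributeError, KeyError)` keeps the old behaviour for environments that still raise AttributeError while also covering the KeyError seen here.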

@sebastian-luna-valero sebastian-luna-valero merged commit 433ff78 into master Dec 14, 2017
@sebastian-luna-valero sebastian-luna-valero deleted the AC-pysam_changes branch December 14, 2017 17:23