TypeError numpy.float #36

Open
brambloemen opened this issue Jun 24, 2024 · 6 comments

@brambloemen

Hello,

I would very much like to use this tool for binning my Flye metagenome assemblies.
However, I ran into the following error:

graphmb --assembly Flye/ --outdir GraphMB --depth depths.txt --numcores 8

logging to GraphMB/20240624-090003graphmb_output.log
Running GraphMB 0.2.5
using cuda: False
setting seed to 1
setting tf seed
Reading cache from
Not using SCG file: marker_gene_stats.tsv (not found)
Reading assembly info file
found circular contig contig_140 mult. 1
--- name depth markers label edges
found circular contig contig_877 mult. 1
--- name depth markers label edges
found circular contig contig_903 mult. 1
--- name depth markers label edges
found circular contig contig_897 mult. 1
--- name depth markers label edges
==============================
DATASET STATS:
number of sequences: 594
assembly length: 0.031 Gb
assembly N50: 0.152 Mb
assembly average length (Mb): 0.053 max: 4.914 min: 0.0
Uncaught exception
Traceback (most recent call last):
  File "/data/brbloemen/mambaforge/bin/graphmb", line 8, in <module>
    sys.exit(main())
  File "/data/brbloemen/mambaforge/lib/python3.10/site-packages/graphmb/main.py", line 480, in main
    dataset.print_stats()
  File "/data/brbloemen/mambaforge/lib/python3.10/site-packages/graphmb/contigsdataset.py", line 177, in print_stats
    print("coverage samples: {}".format(len(self.node_depths[0])))
TypeError: object of type 'numpy.float64' has no len()
srun: error: bioit-rd: task 0: Exited with exit code 1
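
For anyone reading the traceback above: `len()` fails this way whenever `node_depths[0]` is a NumPy scalar instead of a per-sample row, which is what you get when the depth array is one-dimensional. The sketch below reproduces the message with made-up depth values. How GraphMB actually builds `node_depths` is not shown in this thread, so the 1-D array is an assumption, and `n_samples` is a hypothetical helper, not part of GraphMB.

```python
import numpy as np

def n_samples(node_depths):
    """Count coverage samples, tolerating a 1-D depth array (hypothetical helper)."""
    arr = np.asarray(node_depths, dtype=float)
    if arr.ndim == 1:
        # One depth value per contig: treat it as a single sample column.
        arr = arr.reshape(-1, 1)
    return arr.shape[1]

# Made-up depths for three contigs and one sample.
depths_2d = np.array([[12.5], [3.0], [40.1]])  # shape (contigs, samples)
depths_1d = np.array([12.5, 3.0, 40.1])        # same values, flattened

try:
    # Indexing a 1-D array gives a numpy.float64 scalar, which has no length.
    len(depths_1d[0])
except TypeError as err:
    message = str(err)

print(message)
print(n_samples(depths_2d), n_samples(depths_1d))
```

Checking `ndim` before indexing gives the same sample count for both layouts, which is one possible way to make a stats printout robust to a flattened array.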
@AndreLamurias
Collaborator

Hi, what dataset are you using and what's the content of the depths.txt file?

@brambloemen
Author

Hi,

I'm trying to run the tool on a metagenomic assembly of nanopore reads generated with Flye v2.9.3.

I'm using the following command to generate the depths.txt:
jgi_summarize_bam_contig_depths --outputDepth {output.txt} {input.bam}
The input for this is a bam file generated by aligning nanopore reads (fastq) to a Flye --meta assembly using minimap2.

Here is the depth file:
depths.txt
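
A quick way to sanity-check a depth file like this one is to confirm that it has the expected columns and that its contig names match the assembly. `jgi_summarize_bam_contig_depths` writes a tab-separated table whose header starts with `contigName`, `contigLen`, `totalAvgDepth`, followed by a depth column and a `-var` column per BAM file. The sketch below is a minimal check under that assumption; `summarize_depth_table` is a hypothetical helper, and the table contents are made up for illustration, not taken from the attached file.

```python
import csv
import io

def summarize_depth_table(handle, assembly_names):
    """Return (sample columns, assembly contigs missing from the depth table)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Columns after the first three are per-BAM; skip the "-var" variance columns.
    sample_cols = [c for c in header[3:] if not c.endswith("-var")]
    depth_names = {row[0] for row in reader if row}
    missing = sorted(set(assembly_names) - depth_names)
    return sample_cols, missing

# Made-up example table with one BAM file and two contigs.
example = (
    "contigName\tcontigLen\ttotalAvgDepth\treads.bam\treads.bam-var\n"
    "contig_1\t5000\t12.5\t12.5\t3.1\n"
    "contig_2\t1500\t3.0\t3.0\t0.4\n"
)
sample_cols, missing = summarize_depth_table(
    io.StringIO(example), ["contig_1", "contig_2", "contig_3"]
)
print(sample_cols, missing)
```

For a real run, pass `open("depths.txt")` and the contig names read from the assembly FASTA instead of the in-memory example. A non-empty `missing` list or an empty `sample_cols` list would point at a name mismatch or a malformed header.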

@AndreLamurias
Collaborator

Your file seems fine, so it's quite possible that GraphMB is not reading it.
The file should be inside Flye/.
Can you check GraphMB/20240624-090003graphmb_output.log for any message about the depths file?

@brambloemen
Author

I did copy the file to the Flye/ dir.
Here is the log:

Running GraphMB 0.2.5
using cuda: False
setting seed to 1
Reading cache from
Not using SCG file: marker_gene_stats.tsv (not found)
Reading assembly info file
Uncaught exception
Traceback (most recent call last):
  File "/data/brbloemen/mambaforge/bin/graphmb", line 8, in <module>
    sys.exit(main())
  File "/data/brbloemen/mambaforge/lib/python3.10/site-packages/graphmb/main.py", line 480, in main
    dataset.print_stats()
  File "/data/brbloemen/mambaforge/lib/python3.10/site-packages/graphmb/contigsdataset.py", line 177, in print_stats
    print("coverage samples: {}".format(len(self.node_depths[0])))
TypeError: object of type 'numpy.float64' has no len()

@AndreLamurias
Collaborator

Can you run again with the options "--loglevel debug --reload"?

@brambloemen
Author

When rerunning with the options you mention, I seem to get this error instead:

Running GraphMB 0.2.5
Namespace(assembly='Flye/', assembly_name='assembly.fasta', graph_file='assembly_graph.gfa', edge_threshold=None, depth='depths.txt', features='features.tsv', labels=None, embs=None, model_name='gcn', activation='relu', layers_vae=2, layers_gnn=3, hidden_gnn=128, hidden_vae=512, embsize_gnn=32, embsize_vae=64, batchsize=256, batchtype='auto', dropout_gnn=0.1, dropout_vae=0.2, lr_gnn=0.01, lr_vae=0.001, graph_alpha=1, kld_alpha=200, ae_alpha=1, scg_alpha=1, clusteringalgo='vamb', kclusters=None, aggtype='lstm', decoder_input='vae', vaepretrain=500, ae_only=False, negatives=10, quick=False, classify=False, fanout='10,25', epoch=500, print=10, evalepochs=20, evalskip=50, eval_split=0.0, kmer=4, rawfeatures=False, clusteringloss=False, targetmetric='hq', concatfeatures=False, no_loss_weights=True, no_sample_weights=True, early_stopping=0.1, nruns=1, mincontig=1000, minbin=200000, mincomp=1, randomize=False, labelgraph=False, binarize=False, noedges=False, read_embs=False, reload=True, markers='marker_gene_stats.tsv', post='writeembs_contig2bin', writebins=False, skip_preclustering=False, outname='graphmb', cuda=False, noise=False, savemodel=False, tsne=False, numcores=8, outdir='GraphMB', assembly_type='flye', contignodes=False, seed=1, quiet=False, read_cache=False, version=False, loglevel='debug')
using cuda: False
setting seed to 1
Cache not found on GraphMB
processing sequences Flye/assembly.fasta
read 594 seqs
processing GFA file (edge nodes) Flye/assembly_graph.gfa
skipped contigs 904 < 1000
read 0, edges
reading depths
reading labels
Saved cache to GraphMB

Not using SCG file: marker_gene_stats.tsv (not found)
Reading assembly info file
==============Running VAE model=====================
Uncaught exception
Traceback (most recent call last):
  File "/data/brbloemen/mambaforge/envs/graphmb/bin/graphmb", line 8, in <module>
    sys.exit(main())
  File "/data/brbloemen/mambaforge/envs/graphmb/lib/python3.9/site-packages/graphmb/main.py", line 499, in main
    vae_embs, _ = train_ccvae.run_model_ccvae(dataset, args, logger, 0,
  File "/data/brbloemen/mambaforge/envs/graphmb/lib/python3.9/site-packages/graphmb/train_ccvae.py", line 154, in run_model_ccvae
    X, adj, cluster_mask, neg_pair_idx, pos_pair_idx = prepare_data_for_gnn(
  File "/data/brbloemen/mambaforge/envs/graphmb/lib/python3.9/site-packages/graphmb/train_ccvae.py", line 91, in prepare_data_for_gnn
    edge_features = edge_weights / edge_weights.max()
  File "/data/brbloemen/mambaforge/envs/graphmb/lib/python3.9/site-packages/numpy/core/_methods.py", line 40, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity
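
The debug log above reports `read 0, edges`, which is consistent with this second error: calling `.max()` on an empty NumPy array raises exactly this `ValueError`. The sketch below reproduces the message and shows one possible guard. `normalize_edge_weights` is a hypothetical helper written for illustration; it is not GraphMB's code or an official fix, and why no edges were read from the GFA file is not established in this thread.

```python
import numpy as np

def normalize_edge_weights(edge_weights):
    """Scale weights by their maximum, returning empty input unchanged (hypothetical helper)."""
    w = np.asarray(edge_weights, dtype=float)
    if w.size == 0:
        # No edges were read, so there is nothing to normalize.
        return w
    peak = w.max()
    # Avoid dividing by zero when every weight is zero.
    return w / peak if peak > 0 else w

try:
    # An empty array has no maximum, so NumPy refuses the reduction.
    np.array([]).max()
except ValueError as err:
    message = str(err)

print(message)
print(normalize_edge_weights([1, 2, 4]))
print(normalize_edge_weights([]).size)
```

A guard like this only avoids the crash. With zero edges the graph carries no information, so checking why the GFA file yields no usable edges is the more useful next step.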
