DRAMP download error #423

Open
jasmezz opened this issue Oct 11, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@jasmezz
Collaborator

jasmezz commented Oct 11, 2024

EDIT: PLEASE SEE #423 (comment) FOR LATEST STATUS

Description of the bug

The DRAMP database issue reported by J.C. Ruitenberg (https://nfcore.slack.com/archives/C02K5GX2W93/p1728552583863299) seems to affect the DRAMP download and processing permanently since it was updated on September 8 (see: http://dramp.cpu-bioinfor.org/static/update.php) – 3 days after funcscan 2.0.0 release.

I have already reported the problem to the DRAMP maintainers; if it is not solved soon, the DRAMP download script bin/ampcombi_download.py should be updated on our side so that it does not crash when non-ASCII characters are found in the database sheet during parsing.

Example error here: https://github.com/nf-core/funcscan/actions/runs/11293369456/job/31411353829?pr=422

Command used and terminal output

No response

Relevant files

No response

System information

No response

@jasmezz jasmezz added the bug Something isn't working label Oct 11, 2024
@jasmezz
Collaborator Author

jasmezz commented Oct 23, 2024

Still no reply from DRAMP maintainers (neither to my e-mail nor via uploading a fixed DRAMP database). I think we should go ahead and adapt the DRAMP download script, @Darcy220606 👾
Their homepage states that the database is under a CC-BY license, meaning we could upload it on our side (Zenodo?) to track versions and provide a working DRAMP version. Either:

@jasmezz
Collaborator Author

jasmezz commented Oct 30, 2024

Update: DRAMP maintainers fixed the reported error (removed the invalid 工 character from sequence DRAMP31926) and updated the database online. However, it turned out that other amino acid sequences contain invalid characters as well (see screenshot). Waiting for updates on that. In any case, @Darcy220606 will adapt the DRAMP download script to filter out such AMPs so that the pipeline does not crash when parsing the database.

Image
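
For illustration, the filtering idea could look roughly like the sketch below. This is not the actual change to bin/ampcombi_download.py. The helper names (`parse_fasta`, `split_valid_invalid`), the demo record IDs, and the accepted residue alphabet are all assumptions made for this example. It uses only the standard library and rejects any record whose sequence contains non-ASCII characters or letters outside the assumed alphabet.

```python
from typing import Iterator, List, Tuple

# Assumption: the 20 standard amino acids plus common ambiguity/rare codes.
VALID_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWYBXZUO")


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_valid_invalid(text: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Separate records with clean sequences from records to be skipped."""
    valid: List[Tuple[str, str]] = []
    rejected: List[str] = []
    for header, seq in parse_fasta(text):
        # isascii() is checked first so that characters whose uppercase form
        # happens to be an ASCII letter cannot slip through.
        if seq and seq.isascii() and set(seq.upper()) <= VALID_RESIDUES:
            valid.append((header, seq))
        else:
            rejected.append(header)
    return valid, rejected


# Hypothetical records; the second and third contain non-ASCII characters.
example = (
    ">DRAMP_demo_1\nGIGKFLHSAKKF\n"
    ">DRAMP_demo_2\nG\u03a6KLM\n"
    ">DRAMP_demo_3\nKW\u5de5KLL\n"
)
valid, rejected = split_valid_invalid(example)
print("kept:", [header for header, _ in valid])
print("skipped:", rejected)
```

Logging the skipped IDs rather than dropping them silently would make it easier to report the affected entries back to the DRAMP maintainers.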

@jfy133
Member

jfy133 commented Oct 31, 2024

Where on earth are these characters coming from?!

It's sort of making me question the reliability of the database...

@swelbo

swelbo commented Nov 12, 2024

Hello - has there been any progress with this?

I'm getting this error when running:

nextflow run nf-core/funcscan -profile test,docker --outdir ~/path/to/test

I assume this is the same issue.

#######

Command error:
Traceback (most recent call last):
  File "/home/harry/.nextflow/assets/nf-core/funcscan/bin/ampcombi_download.py", line 78, in <module>
    download_DRAMP("amp_ref_database")
  File "/home/harry/.nextflow/assets/nf-core/funcscan/bin/ampcombi_download.py", line 49, in download_DRAMP
    for record in seq_record:
  File "/usr/local/lib/python3.11/site-packages/Bio/SeqIO/Interfaces.py", line 72, in __next__
    return next(self.records)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/Bio/SeqIO/FastaIO.py", line 246, in iterate
    Seq(sequence), id=first_word, name=first_word, description=title
    ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/Bio/Seq.py", line 2034, in __init__
    self._data = bytes(data, encoding="ASCII")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'ascii' codec can't encode character '\u03a6' in position 1: ordinal not in range(128)
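
The last frame of the traceback shows the sequence being converted with `bytes(data, encoding="ASCII")`, which fails on any character outside the ASCII range. The sketch below reproduces the same failure without Biopython. The sequence string is made up for illustration and only places U+03A6 (Greek capital phi) at position 1, as in the error message.

```python
# Hypothetical sequence with a non-ASCII character (U+03A6) at index 1.
sequence = "G\u03a6KLM"

try:
    # Same conversion as shown in the final traceback frame.
    bytes(sequence, encoding="ASCII")
    error_message = None
except UnicodeEncodeError as err:
    error_message = str(err)

print(error_message)

# str.isascii() (Python 3.7+) is a cheap check that can be done up front.
print(sequence.isascii())
```

A check such as `sequence.isascii()` before building the sequence object is one way to detect affected records without triggering the exception.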

@Darcy220606
Contributor

Hi @swelbo. Indeed, we are currently working on a fix for ampcombi, so it should be fixed in the coming two weeks. :)

@jfy133 jfy133 pinned this issue Nov 13, 2024
@davidlyttle

Just bumping the thread, since I'm experiencing the exact same issue. Any updates?

@jasmezz
Collaborator Author

jasmezz commented Nov 28, 2024

As discussed on Slack, a fix incl. the new version of ampcombi will be out soon (aim next week). In the meantime, people can use a working DRAMP version from earlier this year; it is already processed by ampcombi, so just extract this archive and supply the resulting directory (or rather the subdirectory within it) to the pipeline with --amp_ampcombi_db <path_to_db_dir>.


5 participants