Foldseek #15

Merged
merged 48 commits into from
Mar 15, 2023

Conversation

bordin89
Collaborator

No description provided.

@bordin89 bordin89 added the draft Draft, do not merge label Oct 27, 2022
@bordin89 bordin89 requested a review from sillitoe October 27, 2022 12:15
@bordin89 bordin89 removed the draft Draft, do not merge label Oct 27, 2022
unique_af_ids.add(file_stub)
foldseek_reader = get_foldseek_reader(fs_input_file)
for foldseek_result_as_dict in foldseek_reader:
    result = FoldseekSummary(**foldseek_result_as_dict)
Contributor

@sillitoe sillitoe Oct 31, 2022

You could move this line into foldseek_reader (so that it yields a FoldseekSummary object rather than a dict).

Collaborator Author

I tried changing that, but it breaks the code. I still haven't understood how to go from a DictReader to a mapping.
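For reference, each row produced by csv.DictReader is already a dict, so it can be unpacked with ** directly inside the reader. The following is only a minimal sketch of the reviewer's suggestion: the FoldseekSummary fields shown (query, target, bits), the column order, the file-handle argument and the sample IDs are all assumptions for illustration, not the actual cath_alphaflow model or the real get_foldseek_reader signature.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class FoldseekSummary:
    # Hypothetical subset of fields; the real model may define more columns.
    query: str
    target: str
    bits: int


def get_foldseek_reader(handle, fieldnames=("query", "target", "bits")):
    """Yield FoldseekSummary objects instead of raw dicts (illustrative only)."""
    reader = csv.DictReader(handle, fieldnames=fieldnames, delimiter="\t")
    for row in reader:
        # Each row is a plain dict (a mapping), so ** unpacks it into keywords.
        row["bits"] = int(row["bits"])
        yield FoldseekSummary(**row)


# Made-up tab-separated input standing in for a Foldseek results file.
example = io.StringIO("AF-P12345\t1abcA00\t250\nAF-P12345\t2xyzB01\t120\n")
results = list(get_foldseek_reader(example))
```

With the conversion done inside the reader, the calling loop can use each result directly and no longer needs int(result.bits).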

and int(result.bits) >= DEFAULT_FS_BITS_CUTOFF
and result.query in unique_af_ids
):
best_hits.add(result.query)
Contributor

How is best_hits used?

Collaborator Author

So, I am wondering whether we should output something for every query (1) or just report what's in the Foldseek output (2).
For (1), best_hits is there to store the query IDs that have a Foldseek hit. I was then planning to compare the two sets and, for query IDs where we don't have a Foldseek hit, output 'NO_HIT' or something like that.
I could finish implementing (1), or remove it and keep the behaviour as in (2).
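Option (1) could look roughly like the sketch below, which uses a set difference to find the queries without a hit. The function name label_queries, the 'HIT' label and the sample IDs are hypothetical; only unique_af_ids, best_hits and 'NO_HIT' come from the discussion above.

```python
def label_queries(unique_af_ids, best_hits):
    """Return a label for every query ID, including those without any hit."""
    # Set difference: query IDs that never appeared in the filtered results.
    missing = unique_af_ids - best_hits
    labels = {query_id: "HIT" for query_id in best_hits}
    labels.update({query_id: "NO_HIT" for query_id in missing})
    return labels


# Made-up IDs for illustration.
unique_af_ids = {"AF-A", "AF-B", "AF-C"}
best_hits = {"AF-A", "AF-C"}
labels = label_queries(unique_af_ids, best_hits)
```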

Contributor

Sorry, only just seen this response. My original question was because this set was only keeping track of the unique query names, rather than keeping track of the best match for each query (presumably the foldseek results can have multiple targets for multiple queries).
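One way to keep the best match per query, rather than only the set of query names, is a dict keyed by query that holds the highest-scoring target seen so far. This is a sketch under assumptions: hits are represented here as (query, target, bits) tuples with made-up values, and "best" is taken to mean the highest bit score, which the thread does not confirm.

```python
def best_hit_per_query(hits):
    """Map each query to its (target, bits) pair with the highest bit score."""
    best = {}
    for query, target, bits in hits:
        # Keep the first hit seen for a query, then replace it only if a
        # later hit has a strictly higher bit score.
        if query not in best or bits > best[query][1]:
            best[query] = (target, bits)
    return best


# Made-up hits: one query with two targets, another with a single target.
hits = [
    ("AF-A", "1abcA00", 120),
    ("AF-A", "2xyzB01", 250),
    ("AF-B", "3defC02", 90),
]
best = best_hit_per_query(hits)
```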

sillitoe and others added 25 commits November 2, 2022 17:32
Conflicts:
	.gitignore
	cath_alphaflow/cli.py
	cath_alphaflow/io_utils.py
Add possibility to do symlinking for a querydb
to cath_alphaflow.models.domains
Remove temp files using shutil, removed unnecessary loop
Cleaned leftover labels from chopping in click
Introduced default for AF_version
Introduce safeguard to delete af_tmp_dir only if present,
defaults to none.
Add comments
Introduce filters
Require target databases and raw files
Expand click options on coverage and aligner
Remove temporary files
Unlink tmp file if present
Introduce FS_BINARY from config or env
Merge branch 'main' into foldseek
protein_letters_3to1
to avoid warning about deprecation
warning
@bordin89 bordin89 requested a review from sillitoe March 15, 2023 10:12
Contributor

@sillitoe sillitoe left a comment

Please update the use of subprocess.call(...) to subprocess.run(..., check=True) before you merge.
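The difference between the two calls can be illustrated with a command that exits with a non-zero status. The sketch below runs the Python interpreter itself as a stand-in for the external binary, so the command line is an assumption and not the actual Foldseek invocation from this PR.

```python
import subprocess
import sys

# Stand-in command that always fails with exit status 3.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]

# subprocess.call returns the exit code; a failure passes silently unless the
# caller remembers to inspect the return value.
exit_code = subprocess.call(failing_cmd)

# subprocess.run(..., check=True) raises CalledProcessError on a non-zero
# exit status, so a failed external command cannot go unnoticed.
try:
    subprocess.run(failing_cmd, check=True)
    failed = False
    returncode = 0
except subprocess.CalledProcessError as err:
    failed = True
    returncode = err.returncode
```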

@bordin89 bordin89 merged commit dc06740 into main Mar 15, 2023