Foldseek #15
using symlinks.
and convert fs to summary.
to summary
unique_af_ids.add(file_stub)
foldseek_reader = get_foldseek_reader(fs_input_file)
for foldseek_result_as_dict in foldseek_reader:
    result = FoldseekSummary(**foldseek_result_as_dict)
You could move this line into foldseek_reader (so that it yields a FoldseekSummary object rather than a dict).
I tried changing that, but it breaks the code. I still haven't understood how to get from a DictReader row to a mapping.
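One way to do what the reviewer suggests, sketched under assumptions: csv.DictReader already yields a dict per row, and a dict is a mapping, so each row can be unpacked straight into the model with `**row`. The FoldseekSummary dataclass, its three fields and the column order below are hypothetical stand-ins for the project's own model and the output columns actually requested from Foldseek; get_foldseek_summary_reader is an illustrative name, not an existing function.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class FoldseekSummary:
    """Hypothetical minimal stand-in for the project's FoldseekSummary model."""

    query: str
    target: str
    bits: int

    def __post_init__(self) -> None:
        # csv yields every value as a string; coerce the numeric column once here
        self.bits = int(self.bits)


# Assumed column order for headerless, tab-separated Foldseek output
FOLDSEEK_FIELDNAMES = ["query", "target", "bits"]


def get_foldseek_summary_reader(fh: TextIO) -> Iterator[FoldseekSummary]:
    """Yield one FoldseekSummary per row instead of a plain dict."""
    reader = csv.DictReader(fh, fieldnames=FOLDSEEK_FIELDNAMES, delimiter="\t")
    for row in reader:
        # Each row is a dict (a Mapping), so it unpacks directly into keyword arguments
        yield FoldseekSummary(**row)


rows = "AF-P1\tcath-1\t250\nAF-P2\tcath-2\t40\n"
hits = list(get_foldseek_summary_reader(io.StringIO(rows)))
print(hits[0])  # FoldseekSummary(query='AF-P1', target='cath-1', bits=250)
```

If a file has more columns than fieldnames, DictReader collects the extras under the key None (its default restkey), and `FoldseekSummary(**row)` then fails with a TypeError because keyword names must be strings. That is one possible cause of such breakage, so the field list has to match the columns in the file.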
        and int(result.bits) >= DEFAULT_FS_BITS_CUTOFF
        and result.query in unique_af_ids
    ):
        best_hits.add(result.query)
How is best_hits used?
So, I am wondering whether we should output something for every query (1) or just report what's in the Foldseek output (2).
For (1), best_hits is there to store the query IDs that have a Foldseek hit. I was then planning to compare the two sets and, for query IDs without a Foldseek hit (those in unique_af_ids but not in best_hits), output 'NO_HIT' or something like that.
I could finish implementing (1), or remove it and keep the behaviour as in (2).
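Option (1) could be sketched as follows, assuming unique_af_ids holds every query ID and best_hits holds those with at least one hit passing the filters. The queries that need a placeholder are the set difference `unique_af_ids - best_hits`; the intersection would instead give the queries that do have a hit. NO_HIT, the "HIT" label and every_query_report are illustrative names only.

```python
from typing import Iterator, Set, Tuple

# Hypothetical label for queries with no Foldseek hit that passed the filters
NO_HIT = "NO_HIT"


def every_query_report(
    unique_af_ids: Set[str], best_hits: Set[str]
) -> Iterator[Tuple[str, str]]:
    """Option (1): emit one row per query, flagging queries without a hit."""
    # Set difference (not intersection) yields the queries that have no hit
    missing = unique_af_ids - best_hits
    for query_id in sorted(unique_af_ids):
        yield query_id, NO_HIT if query_id in missing else "HIT"


report = list(every_query_report({"AF-A", "AF-B", "AF-C"}, {"AF-B"}))
print(report)  # [('AF-A', 'NO_HIT'), ('AF-B', 'HIT'), ('AF-C', 'NO_HIT')]
```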
Sorry, I've only just seen this response. My original question was because this set only keeps track of the unique query names, rather than the best match for each query (presumably the Foldseek results can contain multiple targets per query).
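To keep the best match rather than just the query name, one option is a dictionary keyed by query ID that holds the strongest result seen so far. This sketch assumes bit score is the ranking criterion and that bits is already an int; the minimal FoldseekSummary and best_hit_per_query are hypothetical, and the cutoff is passed in rather than taken from the project's DEFAULT_FS_BITS_CUTOFF.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass
class FoldseekSummary:
    """Hypothetical minimal stand-in for the project's FoldseekSummary model."""

    query: str
    target: str
    bits: int


def best_hit_per_query(
    results: Iterable[FoldseekSummary], unique_af_ids: Set[str], bits_cutoff: int
) -> Dict[str, FoldseekSummary]:
    """Keep only the highest-scoring result for each query that passes the filters."""
    best: Dict[str, FoldseekSummary] = {}
    for result in results:
        # Same filters as in the diff: bit-score cutoff and known query IDs
        if result.bits < bits_cutoff or result.query not in unique_af_ids:
            continue
        current = best.get(result.query)
        if current is None or result.bits > current.bits:
            best[result.query] = result
    return best


results = [
    FoldseekSummary("AF-A", "t1", 100),
    FoldseekSummary("AF-A", "t2", 300),
    FoldseekSummary("AF-B", "t3", 10),
    FoldseekSummary("AF-X", "t4", 500),
]
best = best_hit_per_query(results, {"AF-A", "AF-B"}, bits_cutoff=50)
print({q: r.target for q, r in best.items()})  # {'AF-A': 't2'}
```

Query IDs missing from the returned dictionary (here AF-B) are the ones that would get a placeholder if every query is reported.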
Use native os.symlink. Check if the symlink exists before creating it.
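A minimal sketch of the check-then-link pattern this commit describes, using only the standard library; ensure_symlink is a hypothetical helper name. Checking and then creating is not atomic, so if another process could create the link concurrently, catching FileExistsError from os.symlink is the more robust alternative.

```python
import os


def ensure_symlink(src: str, link_path: str) -> bool:
    """Create link_path pointing at src unless something already exists there.

    Returns True if a new link was created, False if link_path was already taken.
    """
    # os.path.lexists() is also True for a dangling symlink, unlike os.path.exists(),
    # so a leftover broken link is detected instead of os.symlink() raising FileExistsError
    if os.path.lexists(link_path):
        return False
    os.symlink(src, link_path)
    return True
```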
with check=True
Conflicts: .gitignore, cath_alphaflow/cli.py, cath_alphaflow/io_utils.py
Add possibility to do symlinking for a querydb
to cath_alphaflow.models.domains
Remove temp files using shutil; remove unnecessary loop. Clean leftover labels from chopping in click. Introduce default for AF_version. Introduce safeguard to delete af_tmp_dir only if present (defaults to None).
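The af_tmp_dir safeguard can be sketched roughly as below; remove_af_tmp_dir is a hypothetical name and the real code may differ. shutil.rmtree raises FileNotFoundError when the path does not exist, which is why the None default and the existence check come first.

```python
import os
import shutil
from typing import Optional


def remove_af_tmp_dir(af_tmp_dir: Optional[str] = None) -> bool:
    """Delete a temporary directory tree, but only if one was given and it exists.

    Returns True if something was removed.
    """
    # None is the default, meaning no temporary directory was created
    if af_tmp_dir is None or not os.path.isdir(af_tmp_dir):
        return False
    shutil.rmtree(af_tmp_dir)
    return True
```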
Add comments. Introduce filters.
Require target databases and raw files. Expand click options on coverage and aligner. Remove temporary files.
Merge branch 'main' into foldseek
protein_letters_3to1 to avoid a deprecation warning
Please update the use of subprocess.call(...) to subprocess.run(..., check=True) before you merge.
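For reference, the requested change looks roughly like this. The Python interpreter stands in for the real Foldseek command lines, which are not shown in this thread; run_checked is a hypothetical wrapper, and capture_output is optional (drop it to let output stream to the terminal as subprocess.call does).

```python
import subprocess
import sys
from typing import List


def run_checked(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run cmd and raise CalledProcessError if it exits with a non-zero status."""
    # subprocess.call(...) only returns the exit code, which is easy to ignore;
    # check=True turns a failed external tool (e.g. a foldseek step) into an exception
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


ok = run_checked([sys.executable, "-c", "print('ok')"])
print(ok.stdout.strip())  # ok

try:
    run_checked([sys.executable, "-c", "import sys; sys.exit(3)"])
except subprocess.CalledProcessError as err:
    print("failed with exit code", err.returncode)  # failed with exit code 3
```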
No description provided.