Foldseek #15

bordin89 · 2022-10-27T12:15:06Z

No description provided.

cath_alphaflow/commands/convert_cif_to_foldseek_db.py

sillitoe · 2022-10-31T13:01:05Z

cath_alphaflow/commands/convert_foldseek_output_to_summary.py

+        unique_af_ids.add(file_stub)
+    foldseek_reader = get_foldseek_reader(fs_input_file)
+    for foldseek_result_as_dict in foldseek_reader:
+        result = FoldseekSummary(**foldseek_result_as_dict)


You could move this line into foldseek_reader (so that it yields a FoldseekSummary object rather than a dict).

bordin89 · 2022-11-02

I tried changing that, but it breaks the code. I still haven't worked out how to go from DictReader to a mapping.
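One way the suggestion above could look is sketched below. Each row produced by csv.DictReader is already a dict (a mapping), so it can be unpacked directly into the summary class inside the reader. The column names, the tab delimiter, the dataclass definition, and the choice to pass an open file handle are all assumptions made for illustration; the project's real FoldseekSummary and get_foldseek_reader may differ.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, TextIO

# Hypothetical column layout; the real Foldseek output columns may differ.
FIELDNAMES = ["query", "target", "bits"]


@dataclass
class FoldseekSummary:
    # Stand-in for the project's FoldseekSummary (assumed fields).
    query: str
    target: str
    bits: int


def get_foldseek_reader(fh: TextIO) -> Iterator[FoldseekSummary]:
    """Yield one FoldseekSummary per tab-separated row of the input."""
    reader = csv.DictReader(fh, fieldnames=FIELDNAMES, delimiter="\t")
    for row in reader:
        # row is a dict keyed by FIELDNAMES; convert numeric columns
        # explicitly because DictReader returns every value as a string.
        yield FoldseekSummary(
            query=row["query"],
            target=row["target"],
            bits=int(row["bits"]),
        )


example = io.StringIO("q1\tt1\t250\nq1\tt2\t90\n")
results = list(get_foldseek_reader(example))
```

With this shape the calling loop no longer needs `FoldseekSummary(**foldseek_result_as_dict)`; it can iterate over the reader and use the objects directly.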

sillitoe · 2022-10-31T14:36:21Z

cath_alphaflow/commands/convert_foldseek_output_to_summary.py

+            and int(result.bits) >= DEFAULT_FS_BITS_CUTOFF
+            and result.query in unique_af_ids
+        ):
+            best_hits.add(result.query)


How is best_hits used?

bordin89 · 2022-11-02

I am wondering whether we should output something for every query (1) or just report what is in the Foldseek output (2).
For (1), best_hits is there to store the query IDs that have a Foldseek hit. I was then planning to intersect the two sets and, for query IDs where we don't have a Foldseek hit, output 'NO_HIT' or something like that.
I could finish implementing (1) or remove it to keep it like (2).

sillitoe · 2022-11-14

Sorry, I have only just seen this response. My original question was because this set only keeps track of the unique query names, rather than the best match for each query (presumably the Foldseek results can have multiple targets for multiple queries).
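A minimal sketch of how both points in this thread might be combined: keep the highest-scoring hit per query in a dict rather than a set of names, then emit a line for every known query ID, using 'NO_HIT' where nothing passed the cutoff. The Hit tuple, the cutoff value, and the helper names are hypothetical and not taken from the PR.

```python
from typing import Dict, Iterable, List, NamedTuple, Set

# Placeholder cutoff for illustration only; not the project's setting.
DEFAULT_FS_BITS_CUTOFF = 100


class Hit(NamedTuple):
    # Hypothetical stand-in for a parsed Foldseek result row.
    query: str
    target: str
    bits: int


def best_hit_per_query(
    hits: Iterable[Hit], cutoff: int = DEFAULT_FS_BITS_CUTOFF
) -> Dict[str, Hit]:
    """Return the highest-bits hit for each query that passes the cutoff."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.bits < cutoff:
            continue
        current = best.get(hit.query)
        if current is None or hit.bits > current.bits:
            best[hit.query] = hit
    return best


def summarise(query_ids: Set[str], best: Dict[str, Hit]) -> List[str]:
    """Build one output line per query ID, marking queries without a hit."""
    lines = []
    for query_id in sorted(query_ids):
        hit = best.get(query_id)
        if hit is None:
            lines.append(f"{query_id}\tNO_HIT")
        else:
            lines.append(f"{query_id}\t{hit.target}\t{hit.bits}")
    return lines


hits = [Hit("q1", "t1", 250), Hit("q1", "t2", 300), Hit("q2", "t3", 50)]
unique_af_ids = {"q1", "q2", "q3"}
summary = summarise(unique_af_ids, best_hit_per_query(hits))
```

Because the summary iterates over the known query IDs, queries absent from the Foldseek output and queries whose hits all fall below the cutoff are both reported as 'NO_HIT', which corresponds to option (1) above.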

cath_alphaflow/commands/run_foldseek.py

sillitoe

Please update the use of subprocess.call(...) to subprocess.run(..., check=True) before you merge.
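To illustrate the difference behind this request: subprocess.call returns the exit code and leaves it to the caller to check, whereas subprocess.run(..., check=True) raises subprocess.CalledProcessError on a non-zero exit. The sketch below uses the Python interpreter as a stand-in command, since the actual Foldseek invocation is not shown here; run_command is a hypothetical helper name.

```python
import subprocess
import sys


def run_command(args):
    """Run a command and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(args, check=True, capture_output=True, text=True)


# A command that succeeds: the CompletedProcess carries the captured output.
ok = run_command([sys.executable, "-c", "print('done')"])

# A command that fails: check=True turns the exit code into an exception,
# so the failure cannot be silently ignored by the caller.
try:
    run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
    failed_code = None
except subprocess.CalledProcessError as err:
    failed_code = err.returncode
```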

bordin89 added 15 commits October 17, 2022 14:33

Add CLI option for convert_cif_to_foldseek_db

1f8e46b

Add FS paths to config.env.example

776c444

Add defaults for Foldseek suffixes

e09985c

Add Foldseek paths to settings.py

a2fb1e5

Create module to generate Foldseek query database

3d91ade

using symlinks.

create run foldseek module

0127853

added settings.json to gitignore

117344b

add run foldseek, convert cif to db

ba0883d

and convert fs to summary.

add foldseek constants

1c794a9

remove default for intermediate suffix

8987027

add module to convert foldseek output

f373ed0

to summary

Add Foldseek Summary

7b4e0a3

Add Foldseek Summary Writer

c317c11

remove duplicate declaration

0bfb1a8

Revisited foldseek parser

ec83189

bordin89 added the draft label (Draft, do not merge) Oct 27, 2022

bordin89 requested a review from sillitoe October 27, 2022 12:15

bordin89 added 3 commits October 27, 2022 14:59

Include filters for overlap and bits

8dbc269

Use iterators

Create new FoldseekReader

009e067

Add query to FoldseekSummary

c92572e

bordin89 removed the draft label (Draft, do not merge) Oct 27, 2022

sillitoe reviewed Oct 31, 2022

cath_alphaflow/commands/convert_cif_to_foldseek_db.py (outdated)

sillitoe reviewed Oct 31, 2022

cath_alphaflow/commands/convert_cif_to_foldseek_db.py (outdated)

sillitoe reviewed Oct 31, 2022

cath_alphaflow/commands/convert_cif_to_foldseek_db.py (outdated)

sillitoe reviewed Oct 31, 2022

cath_alphaflow/commands/run_foldseek.py (outdated)

bordin89 added 3 commits November 2, 2022 11:06

Replace subprocess.call with subprocess.run

6ada0ef

Use native os.symlink. Check if symlink exists before creating it.

Clean to return fs_querydb_path.exists

59508d9

Change subprocess.call to subprocess.run

3dfbded

with check=True

sillitoe and others added 25 commits November 2, 2022 17:32

update description

53d87e6

Merge branch 'foldseek' of github.com:UCLOrengoGroup/cath-alphaflow i…

e556909

…nto foldseek

tidy up path calculations

783e3c7

add check for expected output file

94aa5a9

clarify usage of foldseek path

d0193e0

provide defaults for foldseek settings

d18955e

add foldseek tests

1aabd7c

add create_cli_runner

81bbe1c

install foldseek

80888e0

correct path

8bd009c

simplify path

6ee75c2

Merge branch 'main' into foldseek

478b9e9

Conflicts: .gitignore cath_alphaflow/cli.py cath_alphaflow/io_utils.py

Add DEFAULT_FS_QUERYDB_NAME to constants

45cbd3e

Add option to generate query db for set of files

c8aacbe

Add possibility to do symlinking for a querydb

Switch from cath_alphaflow.domains

8d40098

to cath_alphaflow.models.domains

Fix typo

89077db

Merge branch 'main' into foldseek

b4467e2

Add Foldseek overlap to settings

6dd1795

Replace tmp_dir with TemporaryDirectory

21d301b

Remove temp files using shutil, removed unnecessary loop. Cleaned leftover labels from chopping in click. Introduced default for AF_version. Introduce safeguard to delete af_tmp_dir only if present, defaults to none.

Switch from file_stub to id_type based reader

53dd10c

Add comments. Introduce filters.

Introduce defaults for coverage and aligner

7b426a0

Require target databases and raw files. Expand click options on coverage and aligner. Remove temporary files.

Add classmethod from_foldseek_query

030ff32

Fix test

ad5f61a

Unlink tmp file if present. Introduce FS_BINARY from config or env.

Update main branch for files untouched by this repo

2acb3e7

Merge branch 'main' into foldseek

Switch from three_to_one to

ab35109

protein_letters_3to1 to avoid the deprecation warning

bordin89 requested a review from sillitoe March 15, 2023 10:12

Remove three_to_one

c062973

sillitoe approved these changes Mar 15, 2023

Replace subprocess call with subprocess run

f3fbc33

bordin89 merged commit dc06740 into main Mar 15, 2023
