Refactor matching code #14

Open
Yoshanuikabundi opened this issue Jan 23, 2025 · 0 comments

Should do this post-MVP. It'll make adding features and maintenance easier.

The basic abstract process of matching is:

  1. Loop over all residues and find name-based matches from database
  2. Loop over name-based matches and filter for connectivity
  3. Check any unmatched residues against unknown-molecules

At the moment, this is done like so:

get_all_matches
    loop over residue_indices
        Find name-based matches
    loop over name-based matches
        Find connectivity matches
    loop over connectivity matches
        find and yield crosslink matches
 
topology_from_pdb
    <parse PDB file>
    loop over get_all_matches
        check unmatched residues against unknown-molecules
        raise if residue is unmatched
        <build up topology with _add_to_molecule>
    <finalize and return>

This makes get_all_matches really big, and means the code jumps around a lot. Instead, we should do something like:

get_name_based_matches
    loop over residue_indices
        loop over residue_definitions
            yield subset_matches_residue

filter_on_connectivity
    loop over input matches with neighbours
        yield connectivity matches

filter_on_crosslinks
    loop over input matches (or CONECT records?)
        yield crosslink matches

match_residues
    get_name_based_matches
    filter_on_connectivity
    filter_on_crosslinks
    If no matches remain:
        match_unknown_molecules
    <Other fallbacks to be implemented in future>


build_up_topology
    loop over input matched residues
        <build up topology with _add_to_molecule>

topology_from_pdb
    <parse PDB file>
    loop over match_residues
        collect match failures or degeneracies
    raise if any residues were unmatched or multiply matched
    [we now have a complete description of the PDB]
    build_up_topology
    <finalize and return>
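
As a rough illustration of the outline above, here is a minimal Python sketch of the staged pipeline built from generators. Everything in it is an assumption made for illustration: `Candidate`, `filter_matches`, `collect_matches`, and the toy name-matching rule are hypothetical stand-ins, not the existing code. The two filter stages are collapsed into one generic helper that takes a predicate, and the `match_unknown_molecules` fallback is left out.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical stand-in for a match: one residue paired with one definition."""

    residue_index: int
    definition: str


def _always(candidate: Candidate) -> bool:
    return True


def get_name_based_matches(
    residue_names: list[str], definitions: dict[str, set[str]]
) -> Iterator[Candidate]:
    # Toy rule (an assumption): a definition matches if it lists the residue name.
    for index, name in enumerate(residue_names):
        for definition, accepted_names in definitions.items():
            if name in accepted_names:
                yield Candidate(index, definition)


def filter_matches(
    candidates: Iterable[Candidate], keep: Callable[[Candidate], bool]
) -> Iterator[Candidate]:
    # Stands in for filter_on_connectivity and filter_on_crosslinks; the real
    # stages would inspect neighbours or CONECT records instead of a predicate.
    for candidate in candidates:
        if keep(candidate):
            yield candidate


def match_residues(residue_names, definitions, connectivity_ok, crosslinks_ok):
    # Chain the stages lazily, then group surviving candidates per residue.
    candidates = get_name_based_matches(residue_names, definitions)
    candidates = filter_matches(candidates, connectivity_ok)
    candidates = filter_matches(candidates, crosslinks_ok)
    by_residue = defaultdict(list)
    for candidate in candidates:
        by_residue[candidate.residue_index].append(candidate)
    for index in range(len(residue_names)):
        yield index, by_residue[index]


def collect_matches(
    residue_names, definitions, connectivity_ok=_always, crosslinks_ok=_always
):
    # Collect every failure or degeneracy first, then raise once.
    matched, problems = {}, {}
    for index, found in match_residues(
        residue_names, definitions, connectivity_ok, crosslinks_ok
    ):
        if len(found) == 1:
            matched[index] = found[0]
        else:
            problems[index] = "multiply matched" if found else "unmatched"
    if problems:
        raise ValueError(f"could not uniquely match residues: {problems}")
    return matched
```

Because each stage only consumes and yields matches, each one can be tested on its own with hand-written inputs, and all matching problems are reported together before any topology is built.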

That refactor will make the code much easier to maintain, test, and extend. To support it, we could define a protocol/ABC ("trait" in RustSpeak) for residue matches. The existing ResidueMatch class would implement this protocol, and we would add a new class:

  • MoleculeMatch for unique_molecules

Further classes could be added in future for additional substructures, bond inference, etc. _add_to_molecule would take the base class as input.
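
The protocol idea could look something like the following sketch, using `typing.Protocol`. The protocol name `Match`, its members `atom_indices` and `describe`, and the fields on both classes are assumptions made for illustration. The `ResidueMatch` here is a toy stand-in, not the existing class, and `add_to_molecule` only records what it would do.

```python
from dataclasses import dataclass
from typing import Protocol


class Match(Protocol):
    """Hypothetical interface that every kind of match would satisfy."""

    @property
    def atom_indices(self) -> tuple[int, ...]: ...

    def describe(self) -> str: ...


@dataclass(frozen=True)
class ResidueMatch:
    # Toy stand-in for the existing class; these fields are assumptions.
    residue_name: str
    atom_indices: tuple[int, ...]

    def describe(self) -> str:
        return f"residue {self.residue_name}"


@dataclass(frozen=True)
class MoleculeMatch:
    # Proposed new class for matches against unique_molecules.
    molecule_name: str
    atom_indices: tuple[int, ...]

    def describe(self) -> str:
        return f"molecule {self.molecule_name}"


def add_to_molecule(log: list[str], match: Match) -> None:
    # Depends only on the shared interface, so new match kinds need no changes
    # here. A real implementation would add atoms and bonds, not log lines.
    log.append(f"{match.describe()}: atoms {list(match.atom_indices)}")
```

With a protocol, the match classes do not need a common base class; an ABC would work equally well if shared behaviour needs to live in one place.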