This project enables the discovery and design of novel bioinsecticides targeting specific proteins. It features tools for predicting toxicity, generating bioinsecticides, and obtaining 3D structures of the designed molecules. By leveraging neural networks for toxicity prediction and bioinsecticide generation, combined with genetic algorithms for design refinement, this project enhances the efficiency and specificity of bioinsecticide development.
- FASTA Sequence: Ensure your target protein is in FASTA format (amino acid sequence).
- Neural Networks: You need two neural networks:
  - Toxicity Prediction: Use cnn_affinity.py to train a model, or use the pre-trained one.
  - Bioinsecticide Generation: Use generate_rnn.py to train a model, or use the pre-trained one.
  The pre-trained models are located in the definitive_models folder.
- Data: Use data from databases such as ChEMBL, PubChem, or the provided "insect.csv".
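Before running any of the scripts, it can help to confirm that the target file really is an amino acid FASTA. The snippet below is a minimal sketch and not part of this repository: the names parse_fasta and invalid_residues and the example record are hypothetical, and the check assumes only the 20 standard one-letter residue codes are acceptable.

```python
from typing import Dict, Set

# The 20 standard amino acids in one-letter code. Extended codes such as
# B, Z, X, U and O are rejected here (an assumption of this sketch).
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before any '>' header line")
        else:
            records[header] += line.upper()
    return records


def invalid_residues(sequence: str) -> Set[str]:
    """Return the characters in `sequence` that are not standard residues."""
    return set(sequence) - STANDARD_AMINO_ACIDS


# Hypothetical example record, used only to exercise the functions.
example = """>hypothetical_target example protein
MKTAYIAKQR
QISFVKSHFS
"""
records = parse_fasta(example)
for name, seq in records.items():
    bad = invalid_residues(seq)
    status = "OK" if not bad else f"invalid residues: {sorted(bad)}"
    print(f"{name}: {len(seq)} residues, {status}")
```

A nucleotide FASTA consisting only of A, C, G and T would pass this check, since those letters are also valid residue codes, so it catches malformed files rather than every wrong input.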
To predict toxicity using the CNN model, run:
python check_affinity.py --model_path <path_to_model> --data_path <path_to_data> --target_path <path_to_target_protein>
The program will return the toxicity of the designed bioinsecticides using the 'calculate_affinity' function.
To generate bioinsecticides using the RNN model, run:
python pretrained_rnn.py --model_path <path_to_model> --data_path <path_to_data> --target_path <path_to_target_protein>
The program will return the designed bioinsecticides using the 'generate' function.
For combining both models (generation and toxicity prediction), use:
python affinity_with_target_and_generator.py --model_path <path_to_model> --data_path <path_to_data> --target_path <path_to_target_protein> --toxicity_limit <toxicity_limit> --output_path <path_to_output>
The program will generate bioinsecticides and filter out those exceeding the specified toxicity limit. You can also specify a path to check generated molecules.
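The filtering step described above can be sketched independently of the models. This is an illustration under stated assumptions, not the code in affinity_with_target_and_generator.py: filter_by_toxicity and the lookup-table predictor are hypothetical, and a candidate whose predicted value equals the limit is assumed to be kept.

```python
from typing import Callable, Iterable, List, Tuple


def filter_by_toxicity(
    smiles_candidates: Iterable[str],
    predict_toxicity: Callable[[str], float],
    toxicity_limit: float,
) -> List[Tuple[str, float]]:
    """Keep candidates whose predicted toxicity does not exceed the limit."""
    kept = []
    for smiles in smiles_candidates:
        toxicity = predict_toxicity(smiles)
        if toxicity <= toxicity_limit:  # assumption: the limit is inclusive
            kept.append((smiles, toxicity))
    # Sort ascending by predicted value so the lowest prediction comes first.
    kept.sort(key=lambda pair: pair[1])
    return kept


# Stand-in predictor: NOT the project's CNN, only a lookup table with
# made-up values so the example runs without a trained model.
fake_scores = {"CCO": 12.0, "c1ccccc1": 250.0, "CC(=O)O": 80.0}
result = filter_by_toxicity(
    fake_scores, fake_scores.__getitem__, toxicity_limit=100.0
)
print(result)
```

Passing the predictor in as a callable keeps the filter testable without loading a model; in practice the callable would wrap whatever function returns the CNN prediction for a given SMILES string and target.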
To use the genetic algorithm, run:
python genetic_algorithm.py --smiles_list <smiles_list> --csv_file <path_to_csv_file> --rnn_model <path_to_rnn_model> --model_path <path_to_model> --generations <number_of_generations> --output_path <path_to_output>
You can provide SMILES sequences directly or via a CSV file, or use an RNN model to guide the generation. The program will return the best SMILES sequence from the last generation.
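For readers unfamiliar with genetic algorithms, the loop below shows the general select-and-mutate pattern and how "the best sequence from the last generation" falls out of it. It is a toy sketch, not the logic of genetic_algorithm.py: evolve, toy_score and toy_mutate are hypothetical, lower scores are assumed to be better, and the mutation only adds or removes a carbon on a plain alkane chain so that every string stays a valid SMILES.

```python
import random
from typing import Callable, List, Sequence


def evolve(
    population: Sequence[str],
    score: Callable[[str], float],
    mutate: Callable[[str, random.Random], str],
    generations: int,
    survivors: int = 2,
    seed: int = 0,
) -> str:
    """Return the best individual of the last generation (lower is better)."""
    rng = random.Random(seed)
    current: List[str] = list(population)
    size = len(current)
    for _ in range(generations):
        # Selection: the best-scoring individuals survive unchanged (elitism).
        parents = sorted(current, key=score)[:survivors]
        # Variation: refill the population with mutated copies of the parents.
        children = [
            mutate(rng.choice(parents), rng) for _ in range(size - len(parents))
        ]
        current = parents + children
    return min(current, key=score)


def toy_score(smiles: str) -> float:
    """Made-up objective: prefer chains of exactly eight carbons."""
    return abs(len(smiles) - 8)


def toy_mutate(smiles: str, rng: random.Random) -> str:
    """Drop or append one carbon; never shrink below a single atom."""
    if len(smiles) > 1 and rng.random() < 0.5:
        return smiles[:-1]
    return smiles + "C"


start = ["C", "CC", "CCC", "CCCC"]
best = evolve(start, toy_score, toy_mutate, generations=20)
print(best, toy_score(best))
```

Because the survivors are carried over unchanged, the best score can never get worse from one generation to the next. A real run would replace toy_score with a model prediction and toy_mutate with a chemistry-aware edit that is checked for validity.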
To obtain the 3D structure of the designed bioinsecticides, run:
python 3d_repr.py --model_path <path_to_model> --data_path <path_to_data> --target_path <path_to_target_protein> --toxicity_limit <toxicity_limit> --output_path <path_to_output>
This will generate an SDF file containing the 3D structure of the bioinsecticides. To convert the SDF file to other formats (e.g., PDB), run the pymol_3d.py script, which uses PyMOL.
Clone the repository:
git clone https://github.com/RubenVG02/BioinsecticidesDiscovery.git
Or download the latest release from the releases page:
https://github.com/RubenVG02/BioinsecticidesDiscovery/releases/latest
Ensure Python 3.7 or higher is installed. Install the required libraries using:
pip install -r requirements.txt
- Design of new bioinsecticides based on the target protein
- Improving the structure of previously designed bioinsecticides based on the target protein
- Predicting the toxicity of the designed bioinsecticides
- Obtaining CSV files and screenshots of the results
- Obtaining the 3D structure of the designed bioinsecticides in different formats (SDF, PDB, etc.)
- Fast and easy to use
- Add more databases to the CNN
- Add more databases to the RNN
- Add more complexity to the GA