Monomerizer

Monomerizer (also known as SMILES2Seq or SMILES2FASTA) is a pipeline that converts peptides and peptidomimetics, represented as SMILES strings (a line notation for chemical structures), into sequences of amino acids and terminal modifications.

For more information, see our paper (coming soon).

To use the output data to fine-tune our foundation language model for peptidomimetics, see GPepT.

Usage

To run a Monomerizer demo, use the following command:

python3 run_pipeline.py --input_file demo/example_smiles.txt

Replace demo/example_smiles.txt with the path to your own input file of SMILES strings. The input file must follow the format of the example files in the demo directory.
By default, results are saved to the output/<datetime> directory. The raw directory contains the sequences before standardization, and the standard directory contains the sequences after they have been mapped to the standard dictionary accepted by GPepT.
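Because each run writes to a new output/<datetime> directory, it can be handy to locate the most recent one from a script. The following is a minimal sketch, not part of Monomerizer itself: the helper name latest_run_dir is hypothetical, and it picks the newest run by modification time because the exact <datetime> naming format is not stated here. It also assumes that raw and standard are subdirectories of the run directory.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(output_root: str = "output") -> Optional[Path]:
    """Return the most recently modified run directory under output_root, or None."""
    root = Path(output_root)
    if not root.is_dir():
        return None
    runs = [p for p in root.iterdir() if p.is_dir()]
    if not runs:
        return None
    # Sort by modification time rather than by name, since the
    # <datetime> naming format is an unknown here.
    return max(runs, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    run = latest_run_dir()
    if run is None:
        print("No runs found under output/")
    else:
        # Assumption: raw and standard live directly inside the run directory.
        for sub in ("raw", "standard"):
            sub_dir = run / sub
            names = sorted(f.name for f in sub_dir.iterdir()) if sub_dir.is_dir() else []
            print(sub, "->", names)
```

If you pass --output_dir, point latest_run_dir at that path instead of the default output directory.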

Optional arguments

--output_dir <path>: Directory where results are saved. Default is output/<datetime>.
--min_amino_acids <int>: Minimum number of amino acids required for processing. Default is 3.
--batch_size <int>: Number of SMILES to process in each batch. Default is 100.
--max_workers <int>: Maximum number of parallel workers. Default is the number of available CPU cores.
-draw: Draws the output as an image.
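The options above can be combined in a single call. As a rough sketch for driving the pipeline from Python, the hypothetical helper build_command below assembles the argument list using only the flags listed in this README. It assumes that -draw is a plain switch that takes no value, and that the command is run from the repository root where run_pipeline.py lives.

```python
import sys
from typing import List, Optional


def build_command(
    input_file: str,
    output_dir: Optional[str] = None,
    min_amino_acids: Optional[int] = None,
    batch_size: Optional[int] = None,
    max_workers: Optional[int] = None,
    draw: bool = False,
) -> List[str]:
    """Assemble the argument list for run_pipeline.py.

    Options left as None are omitted, so the script's own defaults apply.
    """
    cmd = [sys.executable, "run_pipeline.py", "--input_file", input_file]
    options = {
        "--output_dir": output_dir,
        "--min_amino_acids": min_amino_acids,
        "--batch_size": batch_size,
        "--max_workers": max_workers,
    }
    for flag, value in options.items():
        if value is not None:
            cmd += [flag, str(value)]
    if draw:
        # Assumption: -draw is a switch without a value, spelled as in this README.
        cmd.append("-draw")
    return cmd


if __name__ == "__main__":
    cmd = build_command("demo/example_smiles.txt", min_amino_acids=5, batch_size=50)
    print(" ".join(cmd[1:]))
```

To actually launch the run, pass the returned list to subprocess.run(cmd, check=True) from the standard library.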
