- Message (Prompt) template design.
- Message templates (
code/prompt_template
). - Prompt templates (
code/prompt_template
).
- Message templates (
- Caption generation. (Optional)
- We apply pre-trained generative model to generate captions for molecules (smiles2caption).
- Generated captions are saved in
input/caption
.
- Description generation. (Optional)
- We generate descriptions of molecule graph structure accompanied by atom features (rdkit.Chem).
- Generated descriptions are saved in
input/description
.
- Predictor Message generation.
- Generated Predictor messages are saved in
input/message
. - Generated prompts are saved in
input/prompt
. llm.template = IF(D)
: ask LLM to provide useful features for the task, (with given description).llm.template = IP(D)
: ask LLM to make predictions for the task, (with given description).llm.template = IE(D)
: ask LLM to make predictions for the task and provide explanations, (with given description).llm.template = FS(D)-3
: given 3 example knowledge, ask LLM to make predictions for the task and provide explanations, (with given description).
- Generated Predictor messages are saved in
- Query LLMs for response.
- Considering the consistency and popularity, we use ChatGPT for now.
- Make predictions based on LMs.
- Generated embeddings are saved as
output/prt_lms/ogbg-molbace/IF/microsoft/deberta-base-seed42.emb
- Generated embeddings are saved as
- Make predictions based on GNNs.
- Generated predictions are saved as
output/gnns/ogbg-molbace/gin-v-raw-seed42/predictions.pt
- Generated predictions are saved as
Notebook files in code/notebook
are for demo test.
Example:
python -m code.generate_caption dataset ogbg-molbace demo_test True device 0
Example:
python -m code.generate_description dataset ogbg-molbace demo_test True
python -m code.generate_compressor_message dataset ogbg-molbace demo_test True
python -m code.query_chatgpt dataset ogbg-molbace llm.template compress_des demo_test True
We no longer use it since generated prompts differ from SOTA messages for ChatGPT. We use Message (below) instead.
Example:
python -m code.generate_prompt dataset ogbg-molbace demo_test True
Example:
python -m code.generate_predictor_message dataset ogbg-molbace demo_test True
Example:
python -m code.query_chatgpt dataset ogbg-molbace llm.template IF demo_test True
WANDB_DISABLED=True TOKENIZERS_PARALLELISM=False CUDA_VISIBLE_DEVICES=0,1 python -m code.mainLM dataset ogbg-molbace data.text raw seed 42
python -m code.mainGNN dataset ogbg-molbace data.feature IF seed 42