Source code for GraphText: Graph Reasoning in Text Space.
To install the dependencies:
pip install -r requirements.txt
Given an ego-graph, GraphText extracts text information (attributes) and relation information to construct a tree.
The node text attributes, denoted as text_info, form a set of attributes derived from the (ego-)graph. The valid items to compose the set are:
- choice: The label of the node, expressed as a choice letter; e.g. in Cora, "D" stands for the class "Neural Network". Note that if the node is not in the training set, the choice will be "NA".
- a{k}x_t: The K-Means clustering index of the original feature propagated $k$ times ($k \ge 0$). To illustrate: a0x_t is the clustering index of the raw feature, and a2x_t of the feature propagated 2 times (see the sketch after this list).
- a{k}y_t: The choice of the training labels propagated $k$ times ($k \ge 1$).
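To make the attribute construction concrete, here is a minimal sketch (not the repository's exact code; the function name and cluster count are illustrative assumptions) of how an a{k}x_t token could be derived: propagate the raw features $k$ times over the row-normalized adjacency, cluster the result with K-Means, and use each node's cluster index as its text token.

import numpy as np
from sklearn.cluster import KMeans

def propagated_feature_tokens(adj: np.ndarray, x: np.ndarray, k: int, n_clusters: int = 7) -> np.ndarray:
    """Cluster index per node for features propagated k times (k >= 0)."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)  # guard against isolated nodes
    a_norm = adj / deg  # row-normalized adjacency: each step averages over neighbors
    h = x
    for _ in range(k):
        h = a_norm @ h
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(h)

# a0x_t corresponds to k=0 (raw features); a2x_t to k=2.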
The relations, denoted as rel_info, form a set of relations derived from the (ego-)graph. The valid items to compose the set are:
- spd{k}: The shortest-path-distance relation of order $k$ ($k \ge 0$); e.g. spd0 relates a node to itself, spd1 to its direct neighbors, and spd2 to its two-hop neighbors (see the sketch after this list).
- ppr: The relation weighted by Personalized PageRank scores on the (ego-)graph.
- a{k}x_sim: The relation given by the similarity of features propagated $k$ times ($k \ge 0$); e.g. a0x_sim uses raw-feature similarity.
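As an illustration, here is a minimal sketch (an assumption for exposition, not the repository's implementation) of the spd{k} relation using networkx: collect the nodes whose shortest-path distance from the center node is exactly $k$.

import networkx as nx

def spd_relation(g: nx.Graph, center, k: int) -> list:
    """Nodes at shortest-path distance exactly k from `center`."""
    dist = nx.single_source_shortest_path_length(g, center, cutoff=k)
    return [v for v, d in dist.items() if d == k]

# spd0 -> [center]; spd1 -> direct neighbors; spd2 -> two-hop neighbors.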
Make sure to set the OpenAI API key as an environment variable before running the ICL experiments. You can set it via
export OPENAI_API_KEY="YourOwnAPIKey"
, or change configs/main.yaml for convenience:
env:
  vars:
    openai_api_key: ${oc.env:OPENAI_API_KEY,YourAPIKey} # Overwrite this to your API key
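A quick way to check this resolver's behavior (assuming OmegaConf 2.1+, which Hydra ships with): the ${oc.env:...} interpolation reads the environment variable and falls back to the literal default when it is unset.

from omegaconf import OmegaConf

cfg = OmegaConf.create({"openai_api_key": "${oc.env:OPENAI_API_KEY,YourAPIKey}"})
print(cfg.openai_api_key)  # your exported key, or "YourAPIKey" if unset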
To run the ICL experiments on each dataset:
export OPENAI_API_KEY="YourOwnAPIKey"
cd src/scripts
python run_icl.py data=cora text_info=a2y_t.a3y_t rel_info=spd0.ppr.a2x_sim.a3x_sim
python run_icl.py data=citeseer text_info=a3y_t.a0x_t rel_info=spd0.spd2.ppr.a2x_sim
python run_icl.py data=texas text_info=a2y_t.a3y_t rel_info=spd2
python run_icl.py data=wisconsin text_info=choice.a0x_t rel_info=a0x_sim.spd3
python run_icl.py data=cornell text_info=a1y_t.a4y_t rel_info=spd1.a3x_sim
To run the few-shot ICL experiments with varying numbers of shots:
export OPENAI_API_KEY="YourOwnAPIKey"
cd src/scripts
python run_icl.py data=citeseer data.n_shots=1 text_info=a0x_t.a3y_t rel_info=spd0.spd3
python run_icl.py data=citeseer data.n_shots=3 text_info=a0x_t.a3y_t rel_info=spd0.spd3.a2x_sim.a3x_sim
python run_icl.py data=citeseer data.n_shots=5 text_info=a0x_t.a3y_t rel_info=spd0.spd3.ppr.a3x_sim
python run_icl.py data=citeseer data.n_shots=10 text_info=a0x_t.a3y_t rel_info=spd0.a0x_sim.a1x_sim
python run_icl.py data=citeseer data.n_shots=15 text_info=a0x_t.a3y_t rel_info=spd0.a0x_sim.a1x_sim
python run_icl.py data=citeseer data.n_shots=20 text_info=a0x_t.a3y_t rel_info=spd0.spd3.a2x_sim.a3x_sim
python run_icl.py data=texas data.n_shots=1 text_info=a2y_t rel_info=spd0.spd2
python run_icl.py data=texas data.n_shots=3 text_info=choice rel_info=spd3
python run_icl.py data=texas data.n_shots=5 text_info=a2y_t rel_info=spd0.spd2
python run_icl.py data=texas data.n_shots=10 text_info=choice rel_info=spd2
python run_icl.py data=texas data.n_shots=15 text_info=choice rel_info=spd2
python run_icl.py data=texas data.n_shots=20 text_info=choice rel_info=spd2
GraphText supports instruction fine-tuning an LLM on graphs. An MLP is used to map the continuous features into the text space (as tokens); a minimal sketch of this projection is given after the commands below. We recommend using BF16 for stable training.
cd src/scripts
python run_sft.py exp=sft lora.r=-1 data=citeseer_tag nb_padding=false add_label_name_output=false max_bsz_per_gpu=4 eq_batch_size=16 rel_info=spd0.a0x_sim.ppr text_info=x llm.base_model=llama2-7b node_dropout=0 subgraph_size=3 total_steps=1000
python run_sft.py exp=sft lora.r=-1 data=cora_tag nb_padding=false add_label_name_output=false max_bsz_per_gpu=4 eq_batch_size=16 rel_info=spd0.a1x_sim text_info=x llm.base_model=llama2-7b node_dropout=0 subgraph_size=3 total_steps=1000
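For intuition, here is a minimal sketch of the projection idea (the class name and layer sizes are assumptions, not the repository's exact module): an MLP maps each node's continuous feature vector into the LLM's hidden dimension so it can be consumed as a soft token.

import torch
import torch.nn as nn

class FeatureProjector(nn.Module):
    """Maps continuous node features into the LLM's token-embedding space."""
    def __init__(self, feat_dim: int, llm_hidden_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, llm_hidden_dim),
            nn.GELU(),
            nn.Linear(llm_hidden_dim, llm_hidden_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, feat_dim) -> (num_nodes, llm_hidden_dim), one soft token per node
        return self.mlp(x)

Training such a projector in BF16 while keeping the LLM frozen matches the lora.r=-1 setting used in the commands above.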
We highly recommend using Wandb to track the metrics. All results, together with the prompt and the generated text, are saved to a CSV file "${out_dir}${split}-${alias}.csv".
- data.n_shots: Number of shots for few-shot settings.
- debug: Specify debug=true to use a fake/small LLM in ICL/SFT for debugging (to save time and money when developing).
- data.max_train_samples, data.max_eval_samples, data.max_test_samples: Number of samples for train/eval/test.
- use_wandb: Specify use_wandb=true or use_wandb=false to turn Wandb sync on or off.
- lora.r: The rank for LoRA (used in SFT experiments only); if lora.r < 0, LoRA is turned off and only the projection layer is trained (see the sketch after this list).
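A minimal sketch (assuming the peft library; the target modules are illustrative, not necessarily the repository's exact wiring) of what the lora.r switch controls:

from peft import LoraConfig, get_peft_model

def maybe_add_lora(model, r: int):
    if r < 0:
        return model  # LoRA off: only the projection layer is trained
    cfg = LoraConfig(r=r, lora_alpha=2 * r, target_modules=["q_proj", "v_proj"])
    return get_peft_model(model, cfg)  # LoRA adapters of rank r are attached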
If you see the error message "Error locating target 'llm.gpt.GPT', set env var HYDRA_FULL_ERROR=1 to see chained exception", go through this checklist:
- Check that openai is installed.
- Check that OPENAI_API_KEY is set in your environment. Make sure to
export OPENAI_API_KEY="YourOwnAPIKey"
before running the code.
If you find our work useful, please consider citing it:
@misc{zhao2023graphtext,
  title={GraphText: Graph Reasoning in Text Space},
  author={Jianan Zhao and Le Zhuo and Yikang Shen and Meng Qu and Kai Liu and Michael Bronstein and Zhaocheng Zhu and Jian Tang},
  year={2023},
  eprint={2310.01089},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}