TagGPT

TagGPT is a fully automated system capable of tag extraction and multimodal tagging in a completely zero-shot fashion.

Paper Link: TagGPT: Large Language Models are Zero-shot Multimodal Taggers

Open in Spaces

🔧 Dependencies

  • Python >= 3.7
  • PyTorch == 2.0.0
  • transformers == 4.27.4

pip install -r requirements.txt
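
After installing, it can help to confirm that the environment matches the pins listed above. The following is a minimal sketch, not part of TagGPT: the helper names are hypothetical, it assumes PyTorch is installed under the usual package name `torch`, and on Python 3.7 it assumes the `importlib_metadata` backport is available.

```python
import sys

# Pins copied from the dependency list; "torch" as the package name is an assumption.
PINS = {"torch": "2.0.0", "transformers": "4.27.4"}
MIN_PYTHON = (3, 7)


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        from importlib import metadata  # standard library from Python 3.8
    except ImportError:
        import importlib_metadata as metadata  # backport, assumed installed on 3.7
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return (package, wanted, found) for every pin that is not satisfied."""
    problems = []
    for package, wanted in pins.items():
        found = lookup(package)
        # Some builds report a local suffix such as "2.0.0+cu117"; compare the base part.
        base = found.split("+")[0] if found else None
        if base != wanted:
            problems.append((package, wanted, found))
    return problems


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """True when the interpreter is at least the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum
```

Calling `find_mismatches(PINS)` returns an empty list when both pinned packages are installed at the listed versions; any entry it returns names the package, the expected version, and the version actually found (or `None` if the package is absent).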

💻 How to use TagGPT

Step 1: Tagging system construction

You need a batch of data to build your tagging system. This walkthrough uses the open-source Kuaishou data, which you can download here (password: ihc2).

First, place the data in the './data/' folder and format it with the following command.

python ./scripts/main.py --data_path ./data/222k_kw.ft --func data_format

Then, generate candidate tags with an LLM using the following command.

python ./scripts/main.py --data_path ./data/sentences.txt --func tag_gen --openai_key "put your own key here" --gen_feq 5

Finally, post-process the candidate tags to obtain the tagging system.

python ./scripts/main.py --data_path ./data/tag_gen.txt --func posterior_process
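
The three Step 1 commands can also be chained from a small driver script. The sketch below is illustrative only: `step1_commands` and `run_all` are hypothetical helpers, not part of TagGPT, and the intermediate paths './data/sentences.txt' and './data/tag_gen.txt' are simply the ones that appear in the commands above (it is an assumption that each step writes the file the next step reads).

```python
import subprocess

MAIN = "./scripts/main.py"


def step1_commands(openai_key, raw_data="./data/222k_kw.ft", gen_feq=5, python="python"):
    """Return the three Step 1 commands, in order, as argument lists."""
    return [
        # 1. Format the raw data.
        [python, MAIN, "--data_path", raw_data, "--func", "data_format"],
        # 2. Generate candidate tags with the LLM.
        [python, MAIN, "--data_path", "./data/sentences.txt", "--func", "tag_gen",
         "--openai_key", openai_key, "--gen_feq", str(gen_feq)],
        # 3. Post-process the candidates into the tagging system.
        [python, MAIN, "--data_path", "./data/tag_gen.txt", "--func", "posterior_process"],
    ]


def run_all(commands, dry_run=True):
    """Announce each step by its --func value; execute it unless dry_run is set."""
    for cmd in commands:
        func = cmd[cmd.index("--func") + 1]
        print("step:", func)  # the API key is deliberately not printed
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if a step fails, so later steps do not run
```

With `dry_run=True` the script only lists the steps, which is a cheap way to check the order before spending API calls. Passing the commands as argument lists rather than one shell string avoids quoting problems with the key.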

Step 2: Data tagging

TagGPT can assign tags to the given samples based on the tagging system you built. Format your data to match './data/examples.csv'.

TagGPT provides two tagging paradigms. The commands below use paths relative to './scripts/', so they appear to be meant to run from that folder.

  1. Generative tagger
python main.py --data_path ../data/examples.csv --tag_path ../data/final_tags.csv --func generative_tagger --openai_key "put your own key here"
  2. Selective tagger
python main.py --data_path ../data/examples.csv --tag_path ../data/final_tags.csv --func selective_tagger --openai_key "put your own key here"
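
To switch between the two paradigms without retyping the command, a small wrapper can build it from the paradigm name. This is a sketch under one stated assumption: each paradigm maps to the `--func` value of the same name (`generative_tagger`, `selective_tagger`). The helper names are hypothetical and not part of TagGPT.

```python
import shlex

# Assumed mapping: each paradigm uses the --func value that carries its name.
TAGGER_FUNCS = {"generative": "generative_tagger", "selective": "selective_tagger"}


def tagger_command(paradigm, openai_key,
                   data_path="../data/examples.csv",
                   tag_path="../data/final_tags.csv"):
    """Build the Step 2 command for one paradigm as an argument list."""
    if paradigm not in TAGGER_FUNCS:
        raise ValueError("paradigm must be one of: " + ", ".join(sorted(TAGGER_FUNCS)))
    return ["python", "main.py",
            "--data_path", data_path,
            "--tag_path", tag_path,
            "--func", TAGGER_FUNCS[paradigm],
            "--openai_key", openai_key]


def as_shell(cmd):
    """Render an argument list as a single shell-safe command line."""
    return " ".join(shlex.quote(part) for part in cmd)
```

For example, `as_shell(tagger_command("selective", "KEY"))` yields a command line that can be pasted into a terminal opened in the scripts folder, and looping over `TAGGER_FUNCS` gives one command per paradigm for a side-by-side comparison on the same data.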

🤗 Acknowledgements

We thank the following open-source projects: Kuaishou, Hugging Face, LangChain.

📧 Contact Information

For help or issues using TagGPT, please submit a GitHub issue.

For other communications, please contact Chen Li (palchenli@tencent.com) or Yixiao Ge (yixiaoge@tencent.com).
