中文 | English


🔍 Explore our models:

Open in OpenXLab

📍Table of Contents

- 📖 Project Introduction
- 🗺️ Technical Architecture
- ✨ Technical Report
- 📆 Update Notes
- 🛠️ Usage Guide
- 📋 Project Code Structure
- ☕ Project Members
- 💖 Special Thanks
- License
- Star History

📖 Project Introduction

This project, "The God of Cookery," is inspired by the classic movie of the same name starring the comedy master Stephen Chow. Its goal is to provide cooking advice and recipe recommendations through artificial intelligence, helping users improve their cooking skills and lowering the barriers to cooking, in the spirit of the movie's message: "With heart, anyone can become a god of cookery."

The core of the application is the InternLM dialogue model fine-tuned on the XiaChuFang Recipe Corpus, which contains 1,520,327 Chinese recipes. The model is hosted on ModelScope, and the application is deployed on OpenXLab. Special thanks to the Moda (ModelScope) community for providing free model-hosting space and to OpenXLab for providing the deployment environment and GPU resources.

Please note that the answers provided by this application are for reference only and should not be treated as actual recipe steps. Because of the "hallucination" tendency of large language models, some generated recipes may be inaccurate or even harmful if followed literally; please do not take them at face value.

🗺️ Technical Architecture

1. Overall Technical Architecture

The project is built primarily on internlm-chat-7b, the open-source model from the Shanghai AI Lab (both the first and second generations are supported). We fine-tuned this model on the XiaChuFang Recipe Corpus of 1,520,327 Chinese recipes using XTuner with LoRA, producing the shishen2_full model. The fine-tuned model is then combined with a vector database through LangChain to provide retrieval-augmented generation (RAG), and the application supports multimodal (voice, text, image) question-answering dialogue. The user-facing frontend is built with Streamlit. A minimal retrieval sketch is shown after the architecture diagram below.

*(Figure: overall technical architecture)*
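For a rough picture of the retrieval path, the snippet below shows how a LangChain FAISS index could be queried and the retrieved recipes prepended to the model prompt. This is a minimal sketch rather than the project's actual code: the embedding model name, index path, and prompt format are assumptions, and the exact import paths depend on the installed LangChain version.

```python
# Minimal RAG retrieval sketch; embedding model, index path, and prompt format are assumed.
# Older LangChain versions expose these classes under langchain.embeddings / langchain.vectorstores.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-small-zh-v1.5")  # assumed embedding model
db = FAISS.load_local("rag_langchain/faiss_index", embeddings,
                      allow_dangerous_deserialization=True)
retriever = db.as_retriever(search_kwargs={"k": 3})

query = "How to make sour and spicy fish"
docs = retriever.get_relevant_documents(query)
context = "\n".join(doc.page_content for doc in docs)

# The retrieved recipes are prepended to the question before calling the fine-tuned model.
prompt = f"Answer using the following recipes as reference:\n{context}\n\nQuestion: {query}"
```

In the actual project, the dense retriever is complemented by a serialized BM25 retriever and the HyQE contextual compression retriever (see the rag_langchain directory).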

2. Application Workflow

Upon receiving a user request, the application loads the models (voice-recognition model, text-to-image model, fine-tuned dialogue model) and processes the user's text or voice input. If the RAG switch is off, the fine-tuned dialogue model generates a reply directly; the result is formatted, the Stable Diffusion model generates an accompanying image, and the response is returned to the user. If the RAG switch is on, LangChain first searches the vector database, the retrieved results are fed into the fine-tuned dialogue model to generate the reply, and the result is then formatted, illustrated with Stable Diffusion, and returned to the user. A simplified sketch of this control flow is shown below.

*(Figure: application workflow)*
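The branching described above can be summarized with the following sketch. It is illustrative only: every callable is a hypothetical placeholder standing in for the real speech, RAG, dialogue, formatting, and text-to-image modules.

```python
# Illustrative control-flow sketch; every callable here is a hypothetical placeholder.
from typing import Callable, Optional

def handle_request(query: str,
                   rag_enabled: bool,
                   retrieve: Callable[[str], str],
                   chat: Callable[[str, Optional[str]], str],
                   format_reply: Callable[[str], str],
                   draw: Callable[[str], bytes]) -> dict:
    if rag_enabled:
        context = retrieve(query)      # LangChain search over the vector database
        reply = chat(query, context)   # fine-tuned model conditioned on retrieved recipes
    else:
        reply = chat(query, None)      # fine-tuned model answers directly
    reply = format_reply(reply)        # structured recipe formatting
    image = draw(reply)                # Stable Diffusion (or Zhipu AI) illustration
    return {"text": reply, "image": image}
```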

✨ Technical Report

Access the technical report and explanatory videos through the following links:

1. Technical Report

2. Explanatory Video

| Section | Document Author | Technical Lead |
| --- | --- | --- |
| General Overview | zzd2001, chg001, zhanghui-china | zhanghui-china |
| Voice Recognition | zzd001 | sole fish |
| Text-to-Image | Fang Yuliang | Fang Yuliang |
| RAG | zzd2001 | Charles, Yue Zhengmeng |
| Model Fine-Tuning | zzd2001 | chg001, zzd2001, zhanghui-china |
| Web UI | Fang Yuliang | Fang Yuliang |

📆 Update Notes

- Coming soon:
  - RAG system based on llama-index and HyQE
  - Speech output
  - Support for other LLMs
- [2024.4.21] The LangChain-based HyQE RAG system proposed by team member @Yue Zhengmeng was merged into the main branch
- [2024.3.20] Updated the README
- [2024.3.19] Consolidated the documentation into the docs directory
- [2024.3.9] Based on the RAG module (FAISS) by team member @Yue Zhengmeng, integrated the text2image branch and released the fourth phase of the second-generation application on OpenXLab A100 (Click to try it out) and OpenXLab A10 (Click to try it out)
- [2024.3.4] Added the English README
- [2024.3.3] Based on the Paraformer voice input module by team member @sole fish, integrated the text2image branch and released the third phase of the second-generation application on OpenXLab A100 (Click to try it out; link deprecated)
- [2024.2.24] Based on the RAG module (Chroma) by team member @Charles, integrated the text2image branch and released the second phase of the second-generation application on OpenXLab A100 (Click to try it out; link deprecated)
- [2024.2.22] Based on the text-to-image module by team member @Fang Yuliang and the Whisper voice input module by @sole fish, integrated the text2image branch and released the first phase of the second-generation application (with InternLM2-chat-7B as the base model) on OpenXLab A100 (Click to try it out; link deprecated)
- [2024.1.30] Released the model and app fine-tuned on the full 1.5-million-recipe dataset based on InternLM-chat-7B (fine-tuned on InternStudio with A100 1/4 x 2, 40 GB memory, from 1.25 15:46 to 1.30 12:25, a total of 4 days 20 hours 39 minutes), by team member @zhanghui-china
- [2024.1.28] Released the model and app fine-tuned on a slice of the 1.5-million-recipe dataset based on InternLM-chat-7B (fine-tuned on WSL + Ubuntu 22.04 + RTX 4090 with 24 GB memory, from 1.26 18:40 to 1.28 13:46, a total of 1 day 19 hours 6 minutes), by team member @zhanghui-china

🛠️ Usage Guide

1. Dataset Preparation

Download the 1.5-million-recipe XiaChuFang fine-tuning dataset: Download Link (password: 8489)

2. Installation

- Set up a Python virtual environment:

```bash
conda create -n cook python=3.10 -y
conda activate cook
```

- Clone the repository:

```bash
git clone https://github.com/SmartFlowAI/TheGodOfCookery.git
cd ./TheGodOfCookery
```

- Install PyTorch and the remaining dependencies:

```bash
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
```

Note: choose the pytorch-cuda version to match your installed CUDA toolkit, typically 11.8 or 12.1.

3. Training

- Train the first-generation 7B model with xtuner 0.1.9, fine-tuning on internlm-chat-7b.
- Train the second-generation 7B model with xtuner 0.1.13, fine-tuning on internlm2-chat-7b.
- Train the second-generation 1.8B model with xtuner 0.1.15.dev0, fine-tuning on internlm2-chat-1.8b.

Fine-tuning command:

```bash
xtuner train ${YOUR_CONFIG} --deepspeed deepspeed_zero2
```

--deepspeed enables DeepSpeed to optimize the training process. XTuner integrates several DeepSpeed strategies, including ZeRO-1, ZeRO-2, and ZeRO-3. To disable this feature, simply remove the parameter.

Convert the saved .pth model (a directory if DeepSpeed was used) into a LoRA model:

```bash
export MKL_SERVICE_FORCE_INTEL=1
xtuner convert pth_to_hf ${YOUR_CONFIG} ${PTH} ${LoRA_PATH}
```

Merge the LoRA model into the HuggingFace base model:

```bash
xtuner convert merge ${Base_PATH} ${LoRA_PATH} ${SAVE_PATH}
```
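For reference, an equivalent merge can also be performed in Python with the peft library. This is a hedged alternative sketch, not the project's own tooling; all paths are placeholders.

```python
# Alternative LoRA merge sketch using peft; all paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path, lora_path, save_path = "./internlm2-chat-7b", "./lora_model", "./merged_model"

base = AutoModelForCausalLM.from_pretrained(base_path, trust_remote_code=True,
                                            torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, lora_path).merge_and_unload()  # fold LoRA weights into the base
merged.save_pretrained(save_path)
AutoTokenizer.from_pretrained(base_path, trust_remote_code=True).save_pretrained(save_path)
```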

4. Dialogue

```bash
xtuner chat ${SAVE_PATH} [optional arguments]
```

Arguments:

- --prompt-template: use 'internlm_chat' for the first-generation model and 'internlm2_chat' for the second-generation model.
- --system: specify the system prompt (system field) for the dialogue.
- --bits {4,8,None}: specify the LLM's quantization bit width; the default is fp16.
- --no-streamer: disable streaming output.
- --top-p: 0.8 is recommended for the second-generation models.
- --temperature: 0.8 is recommended for the second-generation models.
- --repetition-penalty: 1.002 is recommended for the second-generation 7B model and 1.17 for the 1.8B model; no need to specify it for the first-generation model.
- Run xtuner chat -h for more information.

5. Demonstration

Second-generation dialogue demo (text + image dialogue):

Demo access addresses: A100 A10


First-generation dialogue demo (text-only dialogue):

Demo examples


6. Model Addresses

Example code for model interaction:

```python
import torch
from modelscope import AutoTokenizer, AutoModelForCausalLM

# Relative path on ModelScope; for example, the path of the second-generation
# fine-tuned model is zhanghuiATchina/zhangxiaobai_shishen2_full
model_name_or_path = "zhanghuiATchina/zhangxiaobai_shishen2_full"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code=True,
                                             torch_dtype=torch.bfloat16, device_map='auto')
model = model.eval()

# Recommended generation parameters: top_p=0.8, temperature=0.8, repetition_penalty=1.002

response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
response, history = model.chat(tokenizer, "How to make sour and spicy fish", history=history)
print(response)
```

7. Practice Documentation

8. Demo Video

📋 Project Code Structure

Second Phase

```
Project Directory
|---assets  # Image directory; generated assets are also stored here temporarily (to be moved elsewhere later)
|     |---robot.png                                        # Dialogue robot icon
|     |---user.png                                         # Dialogue user icon
|     |---shishen.png                                      # Project icon (main contributor: Liu Guanglei)
|
|---config   # Configuration file directory (main contributor: Fang Yuliang)
|     |---__init__.py                                      # Initialization script
|     |---config.py                                        # Configuration script
|
|---docs   # Documentation directory
|     |---tech_report.md                                   # Technical report
|     |---Introduce_x.x.pdf                                # Project introduction PPT
|
|---eval   # RAG module evaluation directory
|
|---food_icon          # Ingredient icon directory
|     |---*.png                                            # Icons for various ingredients
|
|---gen_image    # Text-to-image directory (main contributor: Fang Yuliang)
|     |---__init__.py                                      # Initialization script
|     |---sd_gen_image.py                                  # Text-to-image module using Stable Diffusion
|     |---zhipu_ai_image.py                                # Text-to-image module using Zhipu AI
|
|---images   # Cache of images generated by the text-to-image model
|
|---rag_langchain   # Second-generation RAG code directory (main contributor: Yue Zhengmeng)
|     |---chroma_db                                        # Chroma database directory
|     |     |- chroma.sqlite3                              # Chroma database file
|     |---data                                             # Recipe dataset directory
|     |     |- tran_dataset_1000.json                      # Test recipe dataset with only 1,000 entries
|     |---faiss_index                                      # FAISS database directory
|     |     |- index.faiss
|     |     |- index.pkl
|     |---retrieve                                         # Retriever save directory
|     |     |- bm25retriever.pkl                           # Serialized BM25 retriever
|     |---CookMasterLLM.py                                 # LLM wrapper for LangChain
|     |---create_db_json.py                                # Script to create the vector database
|     |---HyQEContextualCompressionRetriever.py            # HyQE retriever
|     |---interface.py                                     # RAG module interface
|     |---README.md                                        # RAG module description
|
|---speech    # Paraformer voice recognition directory (main contributor: sole fish)
|     |---__init__.py                                      # Initialization script
|     |---utils.py                                         # Voice recognition processing script
|
|---app.py                                                 # Web demo main script
|---cli_demo.py                                            # Model testing script
|---convert_t2s.py                                         # Traditional-to-Simplified Chinese conversion tool (main contributor: Bin Bin)
|---download.py                                            # Model download script
|---parse_cur_response.py                                  # Output formatting tool (main contributor: Bin Bin)
|---start.py                                               # Streamlit start script
|---web_demo.py                                            # Web demo start script
|---requirements.txt                                       # Dependencies (install with pip install -r requirements.txt)
|---README.md                                              # This document
```

☕ Project Members (listed in no particular order)

| Name | Organization | Contribution | Remarks |
| --- | --- | --- | --- |
| Zhang Xiaobai | Graduated from Nanjing University; data engineer at a company | Project planning, testing, and miscellaneous tasks | Huawei Cloud HCDE (formerly Huawei Cloud MVP), Top 10 Huawei Cloud community blogger in 2020, outstanding Ascend community developer in 2022, outstanding Huawei Cloud community moderator in 2022, MindSpore evangelist, excellent DataWhale learner |
| sole fish | PhD student at the University of Chinese Academy of Sciences | Voice input module | |
| Charles | Bachelor's degree from Tongji University; currently applying for a master's | First-generation RAG module (based on Chroma) | |
| Yue Zhengmeng | Bachelor's degree from Shanghai Ocean University; currently applying for a master's | Second-generation RAG module (based on FAISS & Chroma) | |
| Bin Bin | Bachelor's degree from East China Normal University; algorithm developer at a company | Output formatting | |
| Fang Yuliang | Graduated from Nanjing University; algorithm engineer at a company | Text-to-image module, configuration tools | |
| Liu Guanglei | - | Icon design, frontend optimization | |
| Xuanyuan | Master's student at Nanjing University | Documentation, dataset, model fine-tuning | |
| Hong Cheng | Main maintainer of MiniSora | Resource integration and suggestions on future development | |
| usamimeri | Undergraduate at Xiamen University | First steps with the llama-index framework | |

💖 Special Thanks

We would like to extend our gratitude to the Shanghai Artificial Intelligence Laboratory for organizing the Shusheng·Puyu (InternLM) Practical Camp!

We are deeply grateful to OpenXLab for the computing resources provided for project deployment!

A heartfelt thank you to Puyu Assistant for supporting the project!

License

This project is licensed under the Apache License 2.0.

Star History
