Doc builder docs #21

Merged
merged 58 commits on May 28, 2022
Commits
94536d6
add glm blank filling tasks
wangguojim May 21, 2022
1e344a5
add glm blank filling task
wangguojim May 21, 2022
e410d04
changed README; rewrote docs for Dataset, tokenizer, and GLM
ZhaodongYan1 May 23, 2022
d8a384c
glm-title-generation docs
920232796 May 23, 2022
11d1f2d
fixed errors in tutorial 4
ZhaodongYan1 May 24, 2022
efe5143
added prefix tuning
ZhaodongYan1 May 24, 2022
0c0c483
changed the filename for prefix tuning in examples
ZhaodongYan1 May 24, 2022
2954f74
merged master
ZhaodongYan1 May 24, 2022
940c2b1
check glm blank filling
920232796 May 25, 2022
0f5630f
glm-title-generation docs
920232796 May 25, 2022
8215d72
updated seq2seq dataset pipeline to be consistent with superglue
ZhaodongYan1 May 25, 2022
f5eb4b3
check blank filling
920232796 May 25, 2022
3eee40e
docs, glm_blank_filling, title-generation
920232796 May 25, 2022
6b1f89a
fixed bugs in pretrain
ZhaodongYan1 May 25, 2022
77f67cf
add documents for model and trainer
marscrazy May 25, 2022
61807f7
update docs
ZhaodongYan1 May 25, 2022
ab2a4b1
Merge branch 'model_build_predfix_tuning' into doc_builder_docs
ZhaodongYan1 May 25, 2022
16050c2
updated GLM performance
ZhaodongYan1 May 25, 2022
2a53e93
Merge branch 'master' into doc_builder_docs
ZhaodongYan1 May 25, 2022
7463879
added a document for task examples
ZhaodongYan1 May 26, 2022
f632a0d
fixed bugs in docs
ZhaodongYan1 May 26, 2022
29724c1
updated glm docs
ZhaodongYan1 May 26, 2022
b9a03d2
Merge branch 'add_docs' into doc_builder_docs
ZhaodongYan1 May 26, 2022
22bbef5
Merge branch 'document_model_trainer' into doc_builder_docs
ZhaodongYan1 May 26, 2022
b082c49
added superglue results for GLM
ZhaodongYan1 May 26, 2022
8720688
fixed some errors in docs
ZhaodongYan1 May 26, 2022
7761a89
added prompt learning and dataset, and unified tutorial names
ZhaodongYan1 May 26, 2022
2ea4594
fixed some errors
ZhaodongYan1 May 26, 2022
12896f5
fixed some errors
ZhaodongYan1 May 26, 2022
25d116f
Add files via upload
xuanricheng May 26, 2022
fb94429
Update GLM.md
xuanricheng May 26, 2022
c1326b6
Merge pull request #15 from BAAI-Open/xuan-patch-1
Anhforth May 26, 2022
3dbc6b1
updated tutorial algorithms doc
ZhaodongYan1 May 26, 2022
9c34b57
reorganized the Chinese docs
ZhaodongYan1 May 26, 2022
9274279
fix bugs for glm-10b-en & t5-11b, fix bugs in checkpoint
marscrazy May 26, 2022
cc1684a
updated Chinese GLM docs
ZhaodongYan1 May 26, 2022
d11142c
Create TUTORIAL_4_TRAINER_eng.md
xuanricheng May 26, 2022
9d80d0d
Create TUTORIAL_2_DATASET_eng.md
xuanricheng May 26, 2022
2696d45
Create TUTORIAL_3_MODEL_eng.md
xuanricheng May 26, 2022
8fa5655
translated the superglue and clue example docs
ZhaodongYan1 May 27, 2022
e794df0
add activate checkpoint in config
marscrazy May 27, 2022
5419ede
add install with pip
marscrazy May 16, 2022
ba31301
Merge branch 'fix_bugs_for_10b_models' of https://github.com/BAAI-Ope…
marscrazy May 27, 2022
e5ffba1
merge changes on docs
marscrazy May 27, 2022
18e5a33
Update glm_model.py
marscrazy May 27, 2022
26fbb85
Merge pull request #17 from BAAI-Open/fix_bugs_for_10b_models
Anhforth May 27, 2022
e7f8d8d
fixed the bug: no PromptSpell
ZhaodongYan1 May 27, 2022
126b9c7
make docs and links consistent
ZhaodongYan1 May 27, 2022
8aeeddb
Merge pull request #16 from BAAI-Open/xuan-patch-1
Anhforth May 27, 2022
8c64639
deleted unused part in doc_zh
ZhaodongYan1 May 27, 2022
c7997e7
fix bugs
920232796 May 28, 2022
598af04
Merge pull request #19 from BAAI-Open/fix_bugs_for_10b_models_xzh
marscrazy May 28, 2022
2a8a3a7
fix bugs in readme
marscrazy May 28, 2022
e8aa473
Merge pull request #20 from BAAI-Open/fix_document_lg
marscrazy May 28, 2022
e815403
fix warning in formatting
marscrazy May 28, 2022
e5938a9
update branch
marscrazy May 28, 2022
69f97fc
remove device in train_10b_clue.py
marscrazy May 28, 2022
6c1aede
fix bug in activate_checkpoint function of bert_model
marscrazy May 28, 2022
89 changes: 89 additions & 0 deletions CLA.md
@@ -0,0 +1,89 @@
# The Contributor License Agreement

The [Cloud Native Computing Foundation](https://www.cncf.io) (CNCF) defines
the legal status of contributed code in two types of _Contributor License Agreement_
(CLA): one for [individual contributors](https://github.com/cncf/cla/blob/master/individual-cla.pdf) and one for [corporations](https://github.com/cncf/cla/blob/master/corporate-cla.pdf).

FlagAI can only accept original source code from CLA signatories.


It is important to read and understand this legal agreement.

## How do I sign?

After creating your first Pull Request, the linux-foundation-easycla bot will respond with information regarding your CLA status, along with a link to sign the CLA.

<img width="1065" alt="EasyCLA bot" src="https://user-images.githubusercontent.com/69111235/152226443-f6fe61ee-0e92-46c5-b6ea-c0deb718a585.png">

#### 1. Authorize EasyCLA to read some of your GitHub information

<img width="554" alt="GitHub EasyCLA Authorization" src="https://user-images.githubusercontent.com/69111235/152228712-7d22f9d0-9f3c-4226-9ee0-bacba4b47725.png">

Click on the "Please click here to be authorized" link to navigate to the GitHub Authorize Linux Foundation: EasyCLA page. Then click Authorize LF-Engineering to give the Linux Foundation read-only access to list the email addresses associated with your GitHub account.

#### 2. Select from the two types of contributor

<img width="1407" alt="EasyCLA" src="https://user-images.githubusercontent.com/69111235/152224818-1246453a-b086-4a57-9d14-c10d62ad438f.png">


After authorizing EasyCLA, you will be redirected to a page to identify which type of contributor you are.
Select the most appropriate option:
* Individual Contributor: You are contributing as yourself, and not as part of another organization.
* Corporate Contributor: You are contributing on behalf of your employer or other organization.

#### 3. Sign the CLA

Once you select the type of contributor, proceed to Sign the CLA and follow the instructions to complete the signing process through DocuSign.

**Ensure your GitHub e-mail address matches the e-mail address used to sign the CLA.**

After you have filled out the information, click "Finish" and you will be redirected back to your Pull Request.

#### 4. Look for an email indicating successful signup.

> Hello,
>
> This is a notification email from EasyCLA regarding the project Cloud Native Computing Foundation (CNCF).
>
> The CLA has now been signed. You can download the signed CLA as a PDF here.
>
> If you need help or have questions about EasyCLA, you can read the documentation or reach out to us for support.
>
> Thanks,
> EasyCLA Support Team



#### 5. Validate your CLA

Once you are redirected back to your GitHub Pull Request, reply with a comment `/easycla` to update the CLA status of your PR.


## Changing your Affiliation

If you've changed employers and still contribute to FlagAI, your affiliation
needs to be updated. The Cloud Native Computing Foundation uses [gitdm](https://github.com/cncf/gitdm)
to track who is contributing and from where. Create a pull request on the [gitdm](https://github.com/cncf/gitdm)
repository with a change to the corresponding developer affiliation text file.
Your entry should look similar to this:

```
Jorge O. Castro*: jorge!heptio.com, jorge!ubuntu.com, jorge.castro!gmail.com
Heptio
Canonical until 2017-03-31
```

## Troubleshooting

If you encounter any problems signing the CLA and need further assistance, log a ticket via the [submit a support request ticket](https://jira.linuxfoundation.org/plugins/servlet/theme/portal/4) link in the EasyCLA bot's response. Someone from the CNCF will respond to your ticket to help.

Should you have any issues using the LF Support Site, send a message to the
backup e-mail support address <login-issues@jira.linuxfoundation.org>.

## Setting up the CNCF CLA check

If you are a GitHub organization or repository owner and would like to set up
the Linux Foundation CNCF CLA check for your repositories, [read the docs on setting up the CNCF CLA check](/github-management/setting-up-cla-check.md).


[Linux Foundation Support Site]: https://support.linuxfoundation.org/
17 changes: 8 additions & 9 deletions CONTRIBUTING.md
@@ -8,6 +8,9 @@ side, please stick to the following process:
3. If we decide your concern needs code changes, we would be happy to accept a pull request. Please consider the
commit guidelines below.

## Sign the CLA

Before you can contribute to FlagAI, you will need to sign the [Contributor License Agreement](CLA.md).

## Git Commit Guidelines

@@ -34,17 +37,13 @@ pip install -r requirements.txt
```

### tests

-To run all basic tests execute:
-```bash
-python test.py
-```
-
-To check the test results in
-```
-tests/test_report
-```
+Install `pytest` for testing
+```
+pip install pytest
+```
+To run all basic tests execute:
+```bash
+pytest
+```

### code formatting
52 changes: 25 additions & 27 deletions README.md
@@ -8,16 +8,16 @@

FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models. Our goal is to support training, fine-tuning, and deployment of large-scale models on various downstream tasks with multi-modality. Currently, we are focusing on NLP models and tasks. In the near future, we will support other modalities.

- * Now it supports GLM, BERT, RoBERTa, GPT2, T5, and models from Huggingface Transformers.
+ * Now it supports **WuDao GLM** with a maximum of 10 billion parameters (see [Introduction to GLM](/docs/GLM.md)). It also supports **BERT**, **RoBERTa**, **GPT2**, **T5**, and models from Huggingface Transformers.

- * It provides APIs to quickly download and use those pre-trained models on a given text, fine-tune them on your own datasets, and then share them with the community on our model hub.
+ * It provides APIs to quickly download and use those pre-trained models on a given text, fine-tune them on widely-used datasets collected from the [SuperGLUE](https://super.gluebenchmark.com/) and [CLUE](https://github.com/CLUEbenchmark/CLUE) benchmarks, and then share them with the community on our model hub. It also provides a [prompt-learning](/docs/TUTORIAL_7_PROMPT_LERANING.md) toolkit for few-shot tasks.

* These models can be applied to (Chinese/English) Text, for tasks like text classification, information extraction, question answering, summarization, and text generation.

- * FlagAI is backed by the three most popular data/model parallel libraries — PyTorch/Deepspeed/Megatron-LM — with seamless integration between them. Users can parallelize their training/testing process with less than ten lines of code.
+ * FlagAI is backed by the three most popular data/model parallel libraries — [PyTorch](https://pytorch.org/)/[Deepspeed](https://www.deepspeed.ai/)/[Megatron-LM](https://github.com/NVIDIA/Megatron-LM) — with seamless integration between them. Users can parallelize their training/testing process with less than ten lines of code.


- The code is partially based on [Transformers](https://github.com/huggingface/transformers) and [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples).
+ The code is partially based on [GLM](https://github.com/THUDM/GLM), [Transformers](https://github.com/huggingface/transformers) and [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples).
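
The parallelism bullet above claims a run can be switched between backends in under ten lines. As a rough illustration of what that looks like with FlagAI's `Trainer` (the constructor arguments and `env_type` values here are assumptions drawn from the trainer tutorial added in this PR, not verbatim code):

```python
# Sketch: choosing a data/model-parallel backend via Trainer's env_type.
# Argument names and accepted values are assumptions for illustration.
from flagai.trainer import Trainer

trainer = Trainer(
    env_type="deepspeed",               # assumed options include "pytorch",
                                        # "pytorchDDP", "deepspeed", "deepspeed+mpu"
    epochs=3,
    batch_size=4,
    log_interval=10,
    deepspeed_config="deepspeed.json",  # hypothetical DeepSpeed config path
)
# trainer.train(model, train_dataset=train_dataset, collate_fn=collate_fn)
```

Switching `env_type` (plus, for DeepSpeed/Megatron runs, a config file) is the whole change; the training loop itself stays the same.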


<!-- toc -->
@@ -114,13 +114,17 @@ for text in test_data:
```
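
For orientation, the hunk above ends the README quickstart that loops over `test_data`. A minimal sketch of that AutoLoader/Predictor pattern follows; the task name, model name, and generation arguments are illustrative assumptions, not the exact quickstart code in this PR:

```python
# Minimal sketch of the README quickstart pattern; task/model names and
# predictor arguments are assumptions for illustration, not this PR's code.
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor

# Download (or load from cache) a pre-trained model plus its tokenizer.
loader = AutoLoader(task_name="title-generation", model_name="RoBERTa-base-ch")
model = loader.get_model()
tokenizer = loader.get_tokenizer()

predictor = Predictor(model, tokenizer)
test_data = ["FlagAI is a fast, easy-to-use and extensible toolkit for large models."]
for text in test_data:
    # Generate a title with beam search; the method name follows the
    # Predictor tutorial referenced in this PR.
    print(predictor.predict_generate_beamsearch(text, out_max_length=50))
```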

## Pretrained Models and examples
- * [Poetry generation with GLM-large-ch](docs/TUTORIAL_9_GLM_EXAMPLE_PEOTRY_GENERATION.md)
- * [Title Generation with RoBerta-WWM](/docs/TUTORIAL_10_BERT_EXAMPLE_TITLE_GENERATION.md)
- * [Semantic Matching with RoBerta-WWM](/docs/TUTORIAL_11_BERT_EXAMPLE_SEMANTIC_MATCHING.md)
- * [NER with RoBerta-WWM](/docs/TUTORIAL_14_BERT_EXAMPLE_NER.md)
- * [Writing with GPT-2](/docs/TUTORIAL_15_GPT2_WRITING.md)
- * [Title generation with T5](/docs/TUTORIAL_16_T5_EXAMPLE_TITLE_GENERATION.md)
- * [Supported tasks](/docs/AllSupportedTasks.md)

+ * [Blank_Filling_QA with GLM](/docs/TUTORIAL_11_GLM_BLANK_FILLING_QA.md)
+ * [Title Generation with GLM](/docs/TUTORIAL_12_GLM_EXAMPLE_TITLE_GENERATION.md)
+ * [Poetry generation with GLM-large-ch](docs/TUTORIAL_13_GLM_EXAMPLE_PEOTRY_GENERATION.md)
+ * [Using huggingface's t5-11b & tricks](docs/TUTORIAL_14_HUGGINGFACE_T5.md)
+ * [Title Generation with RoBerta-WWM](/docs/TUTORIAL_15_BERT_EXAMPLE_TITLE_GENERATION.md)
+ * [Semantic Matching with RoBerta-WWM](/docs/TUTORIAL_16_BERT_EXAMPLE_SEMANTIC_MATCHING.md)
+ * [NER with RoBerta-WWM](/docs/TUTORIAL_17_BERT_EXAMPLE_NER.md)
+ * [Writing with GPT-2](/docs/TUTORIAL_18_GPT2_WRITING.md)
+ * [Title generation with T5](/docs/TUTORIAL_19_T5_EXAMPLE_TITLE_GENERATION.md)
+ * [Supported tasks](/docs/TUTORIAL_20_SUPPORTED_TASKS.md)


This section explains how the base NLP classes work, how you can load pre-trained models to tag your
@@ -131,22 +131,16 @@ language models, sequence labeling models, and text classification models. Let u

## Tutorials
We provide a set of quick tutorials to get you started with the library:

- * [Tutorial 1: Basics](docs/TUTORIAL_1_BASICS.md)
- * [Tutorial 2: Project structure](docs/TUTORIAL_2_PROJECT_STRUCTURE.md)
- * [Tutorial 3: Supported tokenizers](docs/TUTORIAL_3_TOKENIZER.md)
- * [Tutorial 4: Supported datasets](docs/TUTORIAL_4_DATASET.md)
- * [Tutorial 5: Supported models](https://model.baai.ac.cn/models)
- * [Tutorial 6: Training a model](docs/TUTORIAL_8_TRAINING.md)
- * [Tutorial 7: AutoLoader](docs/TUTORIAL_12_INSTRUCTIONS_FOR_AutoLoader.md)
- * [Tutorial 8: Predictor](docs/TUTORIAL_13_INSTRUCTIONS_FOR_PREDICTOR.md)
-
- ## Learn More About FlagAI
- * [Datasets: supported datasets & PET integration.](docs/APPENDIX_TASK.md)
- * [Setup enviroments for data/model parallel](docs/EnvironmentSetup.md)
- * [Three types of generation](docs/Seq2seqMethod.md)
- * [Using huggingface's t5-3b & tricks](docs/Huggingface_t5.md)
- * [Transform a model into Megatron-LM version](docs/ChangeToMegatron.md)
+ * [Tutorial 1: How to construct and use Tokenizer](/docs/TUTORIAL_1_TOKENIZER.md)
+ * [Tutorial 2: Dataset Preprocessing Pipeline](/docs/TUTORIAL_2_DATASET.md)
+ * [Tutorial 3: Major Function of Model Module](/docs/TUTORIAL_3_MODEL.md)
+ * [Tutorial 4: Customize trainer for model and data-parallel training](/docs/TUTORIAL_4_TRAINER.md)
+ * [Tutorial 5: Simplify model and tokenizer Initialization by Using Autoloader](/docs/TUTORIAL_5_INSTRUCTIONS_FOR_AutoLoader.md)
+ * [Tutorial 6: Use off-the-shelf inference Algorithms with Predictor](/docs/TUTORIAL_6_INSTRUCTIONS_FOR_PREDICTOR.md)
+ * [Tutorial 7: Use FlagAI prompt-learning tool-kit to improve performance on SuperGLUE](/docs/TUTORIAL_7_PROMPT_LERANING.md)
+ * [Tutorial 8: Setup environment for training models with multi-machine](/docs/TUTORIAL_8_ENVIRONMENT_SETUP.md)
+ * [Tutorial 9: Text generation with encoder/decoder/encoder-decoder models](/docs/TUTORIAL_9_SEQ2SEQ_METHOD.md)
+ * [Tutorial 10: How to transform a customized model into a megatron-LM-style parallel model](/docs/TUTORIAL_10_MEGATRON.md)

## Contributing

54 changes: 26 additions & 28 deletions README_zh.md
@@ -5,19 +5,19 @@

--------------------------------------------------------------------------------

- FlagAI is a fast, easy-to-use and extensible toolkit for large-scale models. Our goal is to support training, fine-tuning, and deployment of large-scale models on various downstream tasks with multi-modality. Currently we focus on NLP models and tasks; in the near future we will support other modalities.
+ FlagAI is a fast, easy-to-use and extensible toolkit for large models. Our goal is to support training, fine-tuning, and deployment of large-scale models on various downstream tasks with multi-modality. Currently we focus on NLP models and tasks; in the near future we will support other modalities.
<br><br>

- * Now it supports GLM, BERT, RoBERTa, GPT2, T5 models, and models from Huggingface Transformers.
+ * Now it supports **WuDao GLM** with up to 10 billion parameters (see the [introduction to GLM](/doc_zh/GLM.md)). It also supports **BERT**, **RoBERTa**, **GPT2**, **T5** models, and models from Huggingface Transformers.

- * It provides APIs to quickly download and use those pre-trained models on a given (Chinese/English) text, fine-tune them on your own datasets, and then share them with the community on our model hub.
+ * It provides APIs to quickly download and use those pre-trained models on a given (Chinese/English) text, fine-tune them on your own datasets or apply [prompt-learning](/doc_zh/TUTORIAL_7_PROMPT_LERANING.md) to them, and then share them with the community on our model hub.

* These models can be applied to text for tasks such as text classification, information extraction, question answering, summarization, and text generation, especially Chinese text.

- * FlagAI is backed by the three most popular data/model parallel libraries (PyTorch/Deepspeed/Megatron-LM) with seamless integration between them. You can parallelize your training/testing process with less than ten lines of code.
+ * FlagAI is backed by the three most popular data/model parallel libraries ([PyTorch](https://pytorch.org/)/[Deepspeed](https://www.deepspeed.ai/)/[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)) with seamless integration between them. You can parallelize your training/testing process with less than ten lines of code.


- Part of the code in this project is based on [Transformers](https://github.com/huggingface/transformers) and [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples).
+ Part of the code in this project is based on [GLM](https://github.com/THUDM/GLM), [Transformers](https://github.com/huggingface/transformers) and [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples).

<!-- toc -->

@@ -181,36 +181,34 @@ for text_pair in test_data:
```

# Pretrained models and examples
- * [RoBERTa-base-ch for title generation](doc_zh/TUTORIAL_10_BERT_EXAMPLE_TITLE_GENERATION.md)
- * [RoBERTa-base-ch for semantic similarity matching](doc_zh/TUTORIAL_11_BERT_EXAMPLE_SEMANTIC_MATCHING.md)
- * [GLM-large-ch for poetry generation](doc_zh/TUTORIAL_9_GLM_EXAMPLE_PEOTRY_GENERATION.md)
- * [RoBERTa-base-ch for named entity recognition](/docs/TUTORIAL_14_BERT_EXAMPLE_NER.md)
- * [GPT-2 for text continuation](/docs/TUTORIAL_15_GPT2_WRITING.md)
- * [T5 for title generation](/docs/TUTORIAL_16_T5_EXAMPLE_TITLE_GENERATION.md)
- * [All supported tasks](docs/AllSupportedTasks.md)
+ * [GLM-large-ch for blank-filling QA](/doc_zh/TUTORIAL_11_GLM_BLANK_FILLING_QA.md)
+ * [GLM-large-ch for poetry generation](doc_zh/TUTORIAL_13_GLM_EXAMPLE_PEOTRY_GENERATION.md)
+ * [GLM-large-ch for title generation](doc_zh/TUTORIAL_12_GLM_EXAMPLE_TITLE_GENERATION.md)
+ * [Support for the huggingface t5-11b model, plus speed-up tricks](doc_zh/TUTORIAL_14_HUGGINGFACE_T5.md)
+ * [RoBERTa-base-ch for title generation](doc_zh/TUTORIAL_15_BERT_EXAMPLE_TITLE_GENERATION.md)
+ * [RoBERTa-base-ch for semantic similarity matching](doc_zh/TUTORIAL_16_BERT_EXAMPLE_SEMANTIC_MATCHING.md)
+ * [RoBERTa-base-ch for named entity recognition](/doc_zh/TUTORIAL_17_BERT_EXAMPLE_NER.md)
+ * [GPT-2 for text continuation](/doc_zh/TUTORIAL_18_GPT2_WRITING.md)
+ * [T5 for title generation](/doc_zh/TUTORIAL_19_T5_EXAMPLE_TITLE_GENERATION.md)
+ * [All supported tasks](doc_zh/TUTORIAL_20_SUPPORTED_TASKS.md)


This section explains how the basic NLP classes in this project work, how to load pre-trained models to tag your text, how to use different word or document embeddings to obtain representations, and how to train your own language models, sequence labeling models, and text classification models.


# Tutorials
We provide a set of tutorials to help you get started with the library quickly:
- * [Tutorial 1: Basics](doc_zh/TUTORIAL_1_BASICS.md)
- * [Tutorial 2: Project structure](doc_zh/TUTORIAL_2_PROJECT_STRUCTURE.md)
- * [Tutorial 3: Supported tokenizers](doc_zh/TUTORIAL_3_TOKENIZER.md)
- * [Tutorial 4: Supported datasets](doc_zh/TUTORIAL_4_DATASET.md)
- * [Tutorial 5: Supported models](https://model.baai.ac.cn/models)
- * [Tutorial 6: Training a model](doc_zh/TUTORIAL_8_TRAINING.md)
- * [Tutorial 7: The AutoLoader tool](doc_zh/TUTORIAL_12_INSTRUCTIONS_FOR_AutoLoader.md)
- * [Tutorial 8: The Predictor tool](doc_zh/TUTORIAL_13_INSTRUCTIONS_FOR_PREDICTOR.md)


- # Learn more about FlagAI
- * [Datasets: supported datasets and `PET` integration](doc_zh/APPENDIX_TASK.md)
- * [Environment setup for data/model parallelism](doc_zh/EnvironmentSetup.md)
- * [Three different generation methods](doc_zh/Seq2seqMethod.md)
- * [Support for the huggingface t5-3b model, plus speed-up tricks](doc_zh/Huggingface_t5.md)
- * [Converting a model into a Megatron-LM model-parallel version](doc_zh/ChangeToMegatron.md)
+ * [Tutorial 1: Construct and apply tokenizers](/doc_zh/TUTORIAL_1_TOKENIZER.md)
+ * [Tutorial 2: Dataset preprocessing pipeline](/doc_zh/TUTORIAL_2_DATASET.md)
+ * [Tutorial 3: Main functions and structure of the model module](/doc_zh/TUTORIAL_3_MODEL.md)
+ * [Tutorial 4: Model training (with parallelization support)](/doc_zh/TUTORIAL_4_TRAINER.md)
+ * [Tutorial 5: Build models quickly with the AutoLoader tool](/doc_zh/TUTORIAL_5_INSTRUCTIONS_FOR_AutoLoader.md)
+ * [Tutorial 6: Run prediction with the Predictor tool](/doc_zh/TUTORIAL_6_INSTRUCTIONS_FOR_PREDICTOR.md)
+ * [Tutorial 7: FlagAI prompt-learning features](/doc_zh/TUTORIAL_7_PROMPT_LERANING.md)
+ * [Tutorial 8: Environment setup for data/model parallelism](/doc_zh/TUTORIAL_8_ENVIRONMENT_SETUP.md)
+ * [Tutorial 9: Text generation with **encoder/decoder/encoder-decoder** models](/doc_zh/TUTORIAL_9_SEQ2SEQ_METHOD.md)
+ * [Tutorial 10: Convert a model into a Megatron-LM model-parallel version](/doc_zh/TUTORIAL_10_MEGATRON.md)


# Contributing
Thank you for your interest in contributing! There are many ways to get involved; start with our [contributor guide](CONTRIBUTING.md), then check these [open issues](https://github.com/BAAI-WuDao/Sailing/issues) for specific tasks.