Update: fix missing figure references. (#1155)
* update soc3-zh

* Update _blog.yml

Try to resolve conflicts

* Update: proofreading zh/ethics-soc-3.md

* add how-to-generate cn version

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Unity game in HF Space: translation completed

* Update: punctuation of how-to-generate.md

* hf-bitsandbytes-integration cn done

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Proofread hf-bitsandbytes-integration.md

* Proofread: red-teaming.md

* Update: add red-teaming to zh/_blog.yml

* Update _blog.yml

* Update: add red-teaming to zh/_blog.yml

Fix: red-teaming title in zh/_blog.yml

* Fix: red-teaming PPLM translation

* deep-learning-with-proteins cn done

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Add: stackllama.md

* if.md blog translation completed

* Update unity-in-spaces.md

Add a link to the AI game

* Update if.md

Fix “普罗大众” (“the general public”) to “普惠大众” (“benefit everyone”)

* deep-learning-with-proteins cn done

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* add starcoder cn

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

Update: formatting and punctuation of starcoder.md

* add starcoder cn

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update: proofreading zh/unity-in-spaces.md

* fix(annotated-diffusion.md): fix image shape description for PIL and Tensor (#1080)

Modify the comment after ToTensor to show the correct image shape (CHW)

* Add text-to-video blog (#1058)

Adds an overview of text-to-video generative models, task-specific challenges, datasets, and more.

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix broken link in text-to-video.md (#1083)

* Update: proofreading zh/unity-in-spaces.md

Fix: incorrect _blog.yml format

* Update: proofreading zh/deep-learning-with-proteins.md

* update ethics-diffusers-cn (#6)

* update ethics-diffusers

* update ethics-diffusers

---------

Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: proofreading zh/ethics-diffusers.md

* 1. introducing-csearch done (#11)

2. text-to-video done

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update: proofread zh/text-to-video.md

* Update: proofreading zh/introducing-csearch.md

* generative-ai-models-on-intel-cpu cn done (#13)

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

Update: proofread zh/generative-ai-models-on-intel-cpu.md
Signed-off-by: Yang, Zhongdong <zhongdong_y@outlook.com>

* add starchat-alpha zh translation (#10)

* Preparing blog post announcing `safetensors` security audit + official support. (#1096)

* Preparing blog post announcing `safetensors` security audit + official support.

* Taking into account comments + Grammarly.

* Update safetensors-official.md

* Apply suggestions from code review

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

* Update safetensors-official.md

* Apply suggestions from code review

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Apply suggestions from code review

* Update safetensors-official.md

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Apply suggestions from code review

* Adding thumbnail.

* Include changes from Stella.

* Update safetensors-official.md

* Update with Stella's comments.

* Remove problematic sentence.

* Rename + some rephrasing.

* Apply suggestions from code review

Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com>

* Update safetensors-security-audit.md

Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com>

* Last fixes.

* Apply suggestions from code review

Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com>

---------

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com>

* Hotfixing safetensors. (#1131)

* Removing the checklist; the formatting is busted. (#1132)

* Update safetensors-security-audit.md (#1134)

* [time series transformers] update dataloader API (#1135)

* update dataloader API

* revert comment

* add back Cached transform

* New post: Hugging Face and IBM (#1130)

* Initial version

* Minor fixes

* Update huggingface-and-ibm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update huggingface-and-ibm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Resize image

* Update blog index

---------

Co-authored-by: Julien Simon <julsimon@huggingface.co>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Show authors of safetensors blog post (#1137)

Update: proofread zh/starchat-alpha.md

* add megatron-training & assisted-generation (#8)

* add megatron-training

* add megatron-training

* add megatron-training

* add megatron-training

* add assisted-generation

* add assisted-generation

* add assisted-generation

* Update: proofreading zh/assisted-generation

* Update: proofread zh/megatron-training.md

* rwkv model blog translation completed (#12)

* rwkv model blog translation completed

* add 3 additional sections at the end of the blog post

* Update: proofread zh/rwkv.md

* Fix: missing subtitles/notes for image references.

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Signed-off-by: Yang, Zhongdong <zhongdong_y@outlook.com>
Co-authored-by: innovation64 <liyang19991126@126.com>
Co-authored-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: SuSung-boy <872414318@qq.com>
Co-authored-by: Luke Cheng <2258420+chenglu@users.noreply.github.com>
Co-authored-by: yaoqih <40328311+yaoqih@users.noreply.github.com>
Co-authored-by: Shiliang Chen <36809537+csl122@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: 李洋 <45715979+innovation64@users.noreply.github.com>
Co-authored-by: Yao Matrix <yaoweifeng0301@126.com>
Co-authored-by: Hoi2022 <120370631+Hoi2022@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com>
Co-authored-by: Victor Muštar <victor.mustar@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Julien Simon <3436143+juliensimon@users.noreply.github.com>
Co-authored-by: Julien Simon <julsimon@huggingface.co>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: gxy-gxy <57594446+gxy-gxy@users.noreply.github.com>
22 people authored May 30, 2023
1 parent 821e31b commit be8e96c
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions zh/rwkv.md
@@ -34,19 +34,19 @@ RNN 架构是最早广泛用于处理序列数据的神经网络架构之一。

| ![rnn_diagram](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/142_rwkv/RNN-scheme.png) |
| :-: |
| <b><a href="https://karpathy.github.io/2015/05/21/rnn-effectiveness/" rel="noopener" target="_blank"></a></b> |
| <b>RNN 在不同场景下 RNN 的网络配置简图。图片来源:<a href="https://karpathy.github.io/2015/05/21/rnn-effectiveness/" rel="noopener" target="_blank">Andrej Karpathy 的博文</a></b> |

由于 RNN 在计算每一时刻的预测值时使用的都是同一组网络权重,因此 RNN 很难解决长距离序列信息的记忆问题,这一定程度上也是训练过程中梯度消失导致的。为解决这个问题,相继有新的网络架构被提出,如 LSTM 或者 GRU,其中 transformer 是已被证实最有效的架构。

在 transformer 架构中,不同时刻的输入 token 可以在 self-attention 模块中并行处理。首先 token 经过 Q、K、V 权重矩阵做线性变换投影到不同的空间,得到的 Q、K 矩阵用于计算注意力分数 (通过 softmax,如下图所示),然后乘以 V 的隐状态得到最终的隐状态,这种架构设计可以有效缓解长距离序列问题,同时具有比 RNN 更快的训练和推理速度。

| ![transformer_diagram](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/142_rwkv/transformer-scheme.png) |
| :-: |
| <b><a href="https://jalammar.github.io/illustrated-transformer/" rel="noopener" target="_blank" ></a></b> |
| <b>transformer 模型中的注意力分数计算公式。图片来源:<a href="https://jalammar.github.io/illustrated-transformer/" rel="noopener" target="_blank" >Jay Alammar 的博文</a></b> |
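
A minimal NumPy sketch of the single-head attention computation the caption above refers to: tokens are projected through the Q, K, and V weight matrices, the Q·K scores go through a softmax, and the result weights the value vectors. Names and shapes are illustrative assumptions, not code from the post:

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product attention (illustrative sketch).

    x:          (seq_len, d_model) token embeddings
    Wq, Wk, Wv: (d_model, d_head) projection matrices
    """
    Q, K, V = x @ Wq, x @ Wk, x @ Wv             # project tokens into Q/K/V spaces
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # pairwise attention scores
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                           # weighted sum of value vectors
```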

| ![rwkv_attention_formula](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/142_rwkv/RWKV-formula.png) |
| :-: |
| <b><a href="https://raw.githubusercontent.com/BlinkDL/RWKV-LM/main/RWKV-formula.png" rel="noopener" target="_blank" ></a></b> |
| <b>RWKV 模型中的注意力分数计算公式。来源:<a href="https://raw.githubusercontent.com/BlinkDL/RWKV-LM/main/RWKV-formula.png" rel="noopener" target="_blank" >RWKV 博文</a></b> |
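
For comparison, a naive per-channel sketch of the RWKV-4 WKV recurrence that the formula figure above depicts, following the published RWKV-4 formulation: each past token's contribution decays exponentially with distance, and the current token gets a separate bonus term. The names `w` (decay) and `u` (bonus) and the lack of numerical stabilization are simplifying assumptions:

```python
import numpy as np

def wkv_naive(w, u, k, v):
    """Naive O(T^2) sketch of the RWKV-4 WKV operator, applied per channel.

    w, u: (d,) channel-wise decay and current-token bonus
    k, v: (T, d) keys and values for T tokens
    """
    T, d = k.shape
    out = np.zeros((T, d))
    for t in range(T):
        num, den = np.zeros(d), np.zeros(d)
        for i in range(t):                        # decayed contributions of past tokens
            decay = np.exp(-(t - 1 - i) * w + k[i])
            num += decay * v[i]
            den += decay
        bonus = np.exp(u + k[t])                  # current token handled separately
        out[t] = (num + bonus * v[t]) / (den + bonus)
    return out
```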

在训练过程中,Transformer 架构相比于传统的 RNN 和 CNN 有多个优势,最突出的优势是它能够学到上下文特征表达。不同于每次仅处理输入序列中一个 token 的 RNN 和 CNN,transformer 可以单次处理整个输入序列,这种特性也使得 transformer 可以很好地应对长距离序列 token 依赖问题,因此 transformer 在语言翻译和问答等多种任务中表现非常亮眼。

@@ -68,7 +68,7 @@ RNN 本身支持非常长的上下文长度。即使在训练时接收的上下

| ![rwkv_loss](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/142_rwkv/RWKV-loss.png) |
| :-: |
| <b><a href="https://raw.githubusercontent.com/BlinkDL/RWKV-LM/main/RWKV-ctxlen.png" rel="noopener" target="_blank"></a></b> |
| <b>LM Loss 在不同上下文长度和模型大小的曲线。图片来源:<a href="https://raw.githubusercontent.com/BlinkDL/RWKV-LM/main/RWKV-ctxlen.png" rel="noopener" target="_blank">RWKV 原始仓库</a></b> |

1. 传统的 RNN 模型无法并行训练,而 RWKV 更像一个 “线性 GPT”,因此比 GPT 训练得更快。

@@ -88,7 +88,7 @@ RWKV 模型架构与经典的 transformer 模型架构非常相似 (例如也包

| ![rwkv_loss](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/142_rwkv/RWKV-eval.png) |
| :-: |
| <b><a href="https://johanwind.github.io/2023/03/23/rwkv_overview.html" rel="noopener" target="_blank" ></a></b> |
| <b>RWKV-4 与其他常见架构的性能对比。图片来源:<a href="https://johanwind.github.io/2023/03/23/rwkv_overview.html" rel="noopener" target="_blank" >Johan Wind 的博文</a></b> |

#### 指令微调/Chat 版: RWKV-4 Raven

