
Bump release #654

Merged
742 commits merged on Jan 10, 2024

742 commits:
106d524
reviewed ep.59 (#406)
tyisme614 Dec 12, 2022
83db0cd
docs(zh-cn): Reviewed 60_what-is-the-bleu-metric.srt (#407)
tyisme614 Dec 12, 2022
51795d1
finished review (#408)
tyisme614 Dec 12, 2022
4c247e7
docs(zh-cn): Reviewed 61_data-processing-for-summarization.srt (#409)
tyisme614 Dec 12, 2022
f0e56ea
Fix subtitle - translation data processing (#411)
lewtun Dec 12, 2022
a9cb468
[FR] Final PR (#412)
lbourdois Dec 15, 2022
2880e37
[ko] Add chapter 8 translation (#417)
dlfrnaos19 Dec 23, 2022
e42c6e2
docs(zh-cn): Reviewed 62_what-is-the-rouge-metric.srt (#419)
tyisme614 Dec 23, 2022
b771e83
fixed errors (#420)
tyisme614 Dec 23, 2022
3263aec
docs(zh-cn): Reviewed 63_data-processing-for-causal-language-modeling…
tyisme614 Dec 23, 2022
a5257de
docs(zh-cn): Reviewed 65_data-processing-for-question-answering.srt (…
tyisme614 Dec 23, 2022
3a0a677
finished review (#422)
tyisme614 Dec 23, 2022
39d78ee
Add Ko chapter2 2.mdx (#418)
nsbg Dec 24, 2022
a977f4a
update textbook link (#427)
Bearnardd Dec 27, 2022
44277eb
Visual fixes (#428)
lbourdois Dec 27, 2022
78a3576
finish first round review (#429)
iCell Dec 28, 2022
af0c221
Fix French subtitles + refactor conversion script (#431)
lewtun Dec 28, 2022
1d4e07f
Add tokenizer to MLM Trainer (#432)
lewtun Dec 28, 2022
33fe207
Fix FR video descriptions (#433)
lewtun Dec 30, 2022
78bad8b
Fix dead GPT model docs link. (#430)
dnaveenr Jan 4, 2023
22d0918
Translate into Korean: 2-3 (#434)
rainmaker712 Jan 4, 2023
fae85a2
Add korean translation of chapter5 (1,2) (#441)
wonhyeongseo Jan 4, 2023
47c9050
Update 3.mdx (#444)
richardachen Jan 4, 2023
4c38b1f
docs(zh-cn): Reviewed 67_the-post-processing-step-in-question-answeri…
tyisme614 Jan 4, 2023
9b46cd2
docs(zh-cn): Reviewed 66_the-post-processing-step-in-question-answeri…
tyisme614 Jan 4, 2023
604ca67
docs(zh-cn): Reviewed 01_the-pipeline-function.srt (#452)
iCell Jan 4, 2023
5252a89
finish review (#453)
iCell Jan 4, 2023
d280d9b
Revise some unnatural translations (#458)
beyondguo Jan 4, 2023
68cb970
Fix chapter 5 links (#461)
richardachen Jan 5, 2023
041fabe
fix small typo (#460)
bsenst Jan 5, 2023
d29a927
Add Ko chapter2 3~8.mdx & Modify Ko chapter2 2.mdx typo (#446)
nsbg Jan 5, 2023
f71cf6c
Add captions for tasks videos (#464)
lewtun Jan 5, 2023
fdb61a9
[FR] Add 🤗 Tasks videos (#468)
lbourdois Jan 8, 2023
e126995
Synchronous Chinese course update
yaoqih Feb 12, 2023
2e48b14
review sync
yaoqih Feb 12, 2023
b28a4a8
Update 3.mdx
yaoqih Feb 12, 2023
17d0217
format zh_CN
yaoqih Feb 12, 2023
e5b8b08
format all mdx
yaoqih Feb 12, 2023
cc83c68
Remove temp folder
yaoqih Feb 12, 2023
52bf475
finished review (#449)
tyisme614 Feb 13, 2023
0d55521
docs(zh-cn): Reviewed 31_navigating-the-model-hub.srt (#451)
tyisme614 Feb 13, 2023
f47740c
docs(zh-cn): Reviewed No. 08 - What happens inside the pipeline funct…
innovation64 Feb 13, 2023
630513e
docs(zh-cn): Reviewed 03_what-is-transfer-learning.srt (#457)
iCell Feb 13, 2023
a5399c5
docs(zh-cn): 32_managing-a-repo-on-the-model-hub.srt (#469)
tyisme614 Feb 13, 2023
f165508
docs(zh-cn): Reviewed No. 10 - Instantiate a Transformers model (PyTo…
PowerChina Feb 13, 2023
27a5a36
docs(zh-cn): 33_the-push-to-hub-api-(pytorch).srt (#473)
tyisme614 Feb 13, 2023
caaef80
docs(zh-cn): Reviewed 34_the-push-to-hub-api-(tensorflow).srt (#479)
tyisme614 Feb 13, 2023
0a0a179
running python utils/code_formatter.py
chenglu99 Feb 18, 2023
97b3123
Merge pull request #1 from chenglu99/main
yaoqih Feb 18, 2023
f877f03
review 05 cn translations
iCell Feb 19, 2023
7507ede
review 06 cn translations
iCell Feb 19, 2023
b03d1b1
Merge pull request #498 from yaoqih/main
xianbaoqian Feb 19, 2023
b310a42
Review No.11
bon-qi Feb 19, 2023
b374b42
translate no.24
maybenotime Feb 19, 2023
aaee39f
review 06 cn translations
iCell Feb 19, 2023
f001423
review 07 cn translations
iCell Feb 19, 2023
bf69b15
Update 23_what-is-dynamic-padding.srt
nuass Feb 20, 2023
ca1ee22
Update 23_what-is-dynamic-padding.srt
nuass Feb 20, 2023
9ac39b0
Update 23_what-is-dynamic-padding.srt
nuass Feb 20, 2023
eaf6336
Update subtitles/zh-CN/23_what-is-dynamic-padding.srt
nuass Feb 21, 2023
593070d
Update subtitles/zh-CN/23_what-is-dynamic-padding.srt
nuass Feb 21, 2023
7731b66
add blank
maybenotime Feb 22, 2023
3b15333
Review No. 11, No. 12
bon-qi Feb 24, 2023
af22e13
Review No. 13
bon-qi Feb 24, 2023
73f5a98
Review No. 12
bon-qi Feb 24, 2023
33d4748
Review No. 14
bon-qi Feb 24, 2023
ace04ee
Merge pull request #512 from nuass/main
xianbaoqian Feb 27, 2023
d757b64
Merge pull request #508 from iCell/shawn/review-07
xianbaoqian Feb 27, 2023
dfcf449
Merge pull request #509 from maybenotime/my_translate
xianbaoqian Feb 27, 2023
df3e0cf
Merge pull request #506 from iCell/shawn/review-06
xianbaoqian Feb 27, 2023
e1593ee
Merge pull request #505 from iCell/shawn/review-05
xianbaoqian Feb 27, 2023
35b49aa
finished review
tyisme614 Feb 27, 2023
0077839
optimized translation
tyisme614 Feb 28, 2023
3d45b8b
optimized translation
tyisme614 Feb 28, 2023
5bf5512
docs(zh-cn): Reviewed No. 29 - Write your training loop in PyTorch
FYJNEVERFOLLOWS Mar 1, 2023
5e9a4dd
Review 15
bon-qi Mar 1, 2023
edcd017
Review 16
bon-qi Mar 2, 2023
8c09b21
Review 17
bon-qi Mar 2, 2023
7ff227a
Review 18
bon-qi Mar 2, 2023
adb4834
Review ch 72 translation
zhangchaosd Mar 5, 2023
b5202b6
Update 72 cn translation
zhangchaosd Mar 5, 2023
22ae117
To be reviewed No.42-No.54
bon-qi Mar 6, 2023
b7f0272
No.11 check-out
bon-qi Mar 6, 2023
91658cf
No.12 check-out
bon-qi Mar 6, 2023
ef56aa2
No. 13 14 check-out
bon-qi Mar 6, 2023
a37f421
No. 15 16 check-out
bon-qi Mar 6, 2023
c2bace6
No. 17 18 check-out
bon-qi Mar 6, 2023
97c8493
Add note for "token-*"
bon-qi Mar 6, 2023
c68f8d9
Reviewed No.8, 9, 10
bon-qi Mar 6, 2023
0c8cb97
Reviewed No.42
bon-qi Mar 7, 2023
f9678cc
Review No.43
bon-qi Mar 7, 2023
3818b03
finished review
tyisme614 Mar 8, 2023
0f69de8
optimized translation
tyisme614 Mar 8, 2023
6cb8ba2
finished review
tyisme614 Mar 9, 2023
e295e89
optimized translation
tyisme614 Mar 9, 2023
b1b5794
Review 44(need refine)
bon-qi Mar 10, 2023
7849039
Review 45(need refine)
bon-qi Mar 10, 2023
8589c1a
Review No. 46 (need refine)
bon-qi Mar 10, 2023
e7adb34
Review No.47
bon-qi Mar 10, 2023
c8045f7
Review No.46
bon-qi Mar 10, 2023
373fe71
Review No.45
bon-qi Mar 10, 2023
ccfb507
Review No.44
bon-qi Mar 10, 2023
acab389
Review No.48
bon-qi Mar 10, 2023
3c72012
Review No.49
bon-qi Mar 10, 2023
254cf77
Review No.50
bon-qi Mar 10, 2023
2d03f08
Modify Ko chapter2 8.mdx (#465)
nsbg Mar 10, 2023
146bdea
Fixed typo (#471)
tkburis Mar 10, 2023
3842502
fixed subtitle errors (#474)
tyisme614 Mar 10, 2023
96aa135
Fixed a typo (#475)
gdacciaro Mar 10, 2023
ca81c80
Update 3.mdx (#526)
carlos-aguayo Mar 10, 2023
32bfdff
[zh-TW] Added chapters 1-9 (#477)
ateliershen Mar 10, 2023
cff8856
finished review
tyisme614 Mar 10, 2023
1d92a90
Explain why there are more tokens, than reviews (#476)
pavel-nesterov Mar 10, 2023
92671bc
[RU] Subtitles for Chapter 1 of the video course (#489)
artyomboyko Mar 10, 2023
305f0b1
Review No.52
bon-qi Mar 10, 2023
5dfcc95
[ru] Added the glossary and translation guide (#490)
501Good Mar 10, 2023
d229ff7
[ru] Chapters 0 and 1 proofreading, updating and translating missing …
501Good Mar 10, 2023
33ace99
Review No.51
bon-qi Mar 11, 2023
40d54e0
Review No.53
bon-qi Mar 11, 2023
25c44d3
Review No.54
bon-qi Mar 11, 2023
b0c60c8
finished review
tyisme614 Mar 11, 2023
b81337f
modified translation
tyisme614 Mar 11, 2023
2435416
modified translation
tyisme614 Mar 11, 2023
ac8a0ab
modified subtitle
tyisme614 Mar 11, 2023
f1500d8
Merge branch 'main' of github.com:FYJNEVERFOLLOWS/huggingface-course
FYJNEVERFOLLOWS Mar 13, 2023
722dd7e
translated
FYJNEVERFOLLOWS Mar 13, 2023
0bc7dc0
Fix typo (#532)
jybarnes Mar 17, 2023
0ffbef4
review chapter4/2
gxy-gxy Mar 19, 2023
46ae77c
review chapter4/2
gxy-gxy Mar 19, 2023
2c4bea8
review chapter4/2
gxy-gxy Mar 19, 2023
b6a9632
Review 75
bon-qi Mar 22, 2023
3f6515a
Review No.20, need review some
bon-qi Mar 22, 2023
72c1f04
docs(zh-cn): Reviewed Chapter 7/1
jinyouzhi Mar 22, 2023
b77a8c5
Update 1.mdx
jinyouzhi Mar 22, 2023
7501ef4
Review No.22
bon-qi Mar 22, 2023
b1623c6
Review No.21 (need refinement)
bon-qi Mar 22, 2023
3c56c95
Review No.30, need review: 26 27 28 30 73 74
bon-qi Mar 23, 2023
007c858
Review 30 (good)
bon-qi Mar 23, 2023
e4f6434
Review 20
bon-qi Mar 23, 2023
c346c80
Review 21 (refine)
bon-qi Mar 23, 2023
996dc89
Review 21
bon-qi Mar 24, 2023
f832a4d
Review 22
bon-qi Mar 24, 2023
fe4c7d7
Review 26
bon-qi Mar 24, 2023
4140a21
Review 27
bon-qi Mar 24, 2023
238196c
Review 28
bon-qi Mar 24, 2023
0626ce6
Review 30
bon-qi Mar 24, 2023
071272c
Review 73
bon-qi Mar 24, 2023
1f3ab61
Review 74
bon-qi Mar 24, 2023
cfc456b
Fix typo
vhch Mar 28, 2023
0d13c12
Review 26-28, 42-54, 73-75
bon-qi Apr 1, 2023
21e6e6b
The GPT2 link is broken
tsureshkumar Apr 3, 2023
14067bb
typo in `Now your turn!` section
feeeper Apr 4, 2023
1d5471a
`chunk_size` should be instead of `block_size`
feeeper Apr 9, 2023
98218c6
Merge pull request #542 from Vermillion-de/main
xianbaoqian Apr 10, 2023
f381a75
Merge pull request #534 from gxy-gxy/main
xianbaoqian Apr 10, 2023
af00e34
Merge pull request #536 from jinyouzhi/course_review
xianbaoqian Apr 10, 2023
9254fb6
Merge branch 'main' into 0313
xianbaoqian Apr 10, 2023
1db1185
Merge pull request #531 from FYJNEVERFOLLOWS/0313
xianbaoqian Apr 10, 2023
097427c
Merge pull request #529 from tyisme614/review_ep41
xianbaoqian Apr 10, 2023
632ac11
Merge pull request #528 from tyisme614/review_ep40
xianbaoqian Apr 10, 2023
acba72a
Merge pull request #527 from tyisme614/review_ep39
xianbaoqian Apr 10, 2023
a19892b
Merge pull request #525 from tyisme614/review_ep38
xianbaoqian Apr 10, 2023
17df4bf
Merge pull request #520 from zhangchaosd/main
xianbaoqian Apr 10, 2023
5b077d8
Merge pull request #522 from tyisme614/review_ep37
xianbaoqian Apr 10, 2023
f631679
Merge pull request #515 from tyisme614/review_ep35
xianbaoqian Apr 11, 2023
aca889a
Merge pull request #530 from tyisme614/optimize_en_ep41
xianbaoqian Apr 11, 2023
de2cb27
refactor: rephrase text to improve clarity and specificity
Pranav-Bobde Apr 24, 2023
e5a8fcf
Demo link fixes (#562)
MKhalusova May 10, 2023
cccc2c9
Bump release (#566)
lewtun May 10, 2023
9c44804
Revert "Bump release (#566)" (#567)
lewtun May 10, 2023
86f4396
updated documentation links
nnoboa Jun 4, 2023
0018bb4
[doc build] Use secrets (#581)
mishig25 Jun 9, 2023
ceef93d
docs: fix broken links
vipulaSD Jun 15, 2023
f6ded40
changed 'perspires' to 'persists' in chapter 1 quiz
abzdel Jun 22, 2023
333d7fe
Update 4.mdx
JieShenAI Jun 24, 2023
b3cfa9c
Update 4.mdx : Fix Typo
Aug 5, 2023
47427cf
Fix chapter1/5 old documentation links
osanseviero Aug 6, 2023
77df728
fix link
dawoodkhan82 Sep 19, 2023
2af4831
Update 2.mdx
Sookeyy-12 Sep 20, 2023
b88cae1
Update 2.mdx
Sookeyy-12 Sep 20, 2023
52ec6a9
Update 2.mdx
Sookeyy-12 Sep 20, 2023
812f00a
Update 2.mdx
Sookeyy-12 Sep 20, 2023
5a99fa2
Update 2.mdx
Sookeyy-12 Sep 20, 2023
0e84926
Update 2.mdx
Sookeyy-12 Sep 20, 2023
82e991b
Update 2.mdx
Sookeyy-12 Sep 20, 2023
7c0ee67
Update 2.mdx
Sookeyy-12 Sep 20, 2023
5cb46d9
Update 2.mdx
Sookeyy-12 Sep 20, 2023
9093131
Update 2.mdx
Sookeyy-12 Sep 20, 2023
e003d52
Update 2.mdx
Sookeyy-12 Sep 20, 2023
b299376
Update 2.mdx
Sookeyy-12 Sep 20, 2023
02e8159
Fix syntax in vi/chapter7/7.mdx
mishig25 Sep 24, 2023
f1345db
Merge pull request #618 from huggingface/mishig25-patch-3
mishig25 Sep 24, 2023
d1ff989
Merge pull request #614 from huggingface/blocks-events-link
mishig25 Sep 24, 2023
df6be57
Fixed the broken link to the loading datasets page
osanseviero Sep 25, 2023
adf9b1a
Remove `get_lr()` from logs which refers to nonexistent function
bwindsor22 Oct 3, 2023
e7d45fb
Update 4.mdx
paschembri Oct 15, 2023
5c0b7bb
Update en-version
paschembri Oct 15, 2023
12558d7
fix: remove useless token
rtrompier Oct 19, 2023
3423281
fix: remove useless token (#635)
rtrompier Oct 19, 2023
5828ea1
Translate Chapter 3 to Spanish (#510)
mariagrandury Oct 23, 2023
1983457
Translating Chapter 6 to Spanish (#523)
datacubeR Oct 23, 2023
0916d0a
Update 5.mdx
k3ybladewielder Oct 24, 2023
ce2b5e7
Merge pull request #586 from abzdel/patch-1
merveenoyan Nov 22, 2023
e6fadfa
Merge pull request #587 from JieShenAI/patch-2
merveenoyan Nov 22, 2023
1447b5b
Merge pull request #598 from SingularityGuy/SingularityGuy-patch-1
merveenoyan Nov 22, 2023
b004f50
Merge pull request #540 from vhch/main
merveenoyan Nov 22, 2023
df2fa29
Merge pull request #543 from tsureshkumar/patch-1
merveenoyan Nov 22, 2023
3d53e6f
Merge pull request #264 from kambizG/main
merveenoyan Nov 22, 2023
632b16e
Merge pull request #558 from Pranav-Bobde/patch-1
merveenoyan Nov 22, 2023
7aab892
Update doc CI (#643)
lewtun Dec 5, 2023
3def036
Committing current results.
artyomboyko Dec 12, 2023
2857120
Fix translation
osanseviero Dec 14, 2023
cfb9e62
Removed judgmental arguments
osanseviero Dec 14, 2023
1970e1e
Remove get_lr() from logs which refers to nonexistent function from b…
osanseviero Dec 14, 2023
5097c17
Committing the current state.
artyomboyko Dec 16, 2023
ab2a8b5
Fixing the transfer results for today.
artyomboyko Dec 17, 2023
11f30d1
Translated files 3b and partially 4. Fixing the result.
artyomboyko Dec 18, 2023
6450802
Fixing today's translation.
artyomboyko Dec 19, 2023
be9d376
fix typos in Spanish translation (#511)
mariagrandury Dec 20, 2023
198d352
Fixing today's translation. Files: 6.mdx, 7.mdx and half of 8.mdx.
artyomboyko Dec 20, 2023
16f9009
Merge pull request #544 from feeeper/patch-1
osanseviero Dec 20, 2023
60f7702
Merge pull request #551 from feeeper/patch-2
osanseviero Dec 20, 2023
4c8adfa
Merge pull request #582 from vipulaSD/patch-2
osanseviero Dec 20, 2023
c27def4
Merge branch 'huggingface:main' into main
artyomboyko Dec 21, 2023
3113713
The translation of chapter 6 has been completed.
artyomboyko Dec 21, 2023
2be3db1
Delete chapters/en/.ipynb_checkpoints/_toctree-checkpoint.yml
artyomboyko Dec 21, 2023
32c9ad0
Delete chapters/en/chapter5/.ipynb_checkpoints/8-checkpoint.mdx
artyomboyko Dec 21, 2023
72e6779
Delete chapters/en/chapter6/.ipynb_checkpoints/1-checkpoint.mdx
artyomboyko Dec 21, 2023
c2f871b
Delete chapters/en/chapter6/.ipynb_checkpoints/2-checkpoint.mdx
artyomboyko Dec 21, 2023
ce3ac4d
Delete chapters/en/chapter6/.ipynb_checkpoints/8-checkpoint.mdx
artyomboyko Dec 21, 2023
8f7520a
Delete chapters/en/chapter6/.ipynb_checkpoints/9-checkpoint.mdx
artyomboyko Dec 21, 2023
73855b0
Delete chapters/ru/.ipynb_checkpoints/TRANSLATING-checkpoint.txt
artyomboyko Dec 21, 2023
02395c1
Delete chapters/ru/.ipynb_checkpoints/_toctree-checkpoint.yml
artyomboyko Dec 21, 2023
849d5dd
Delete chapters/ru/chapter5/.ipynb_checkpoints/8-checkpoint.mdx
artyomboyko Dec 21, 2023
be33220
Update 10.mdx
artyomboyko Dec 21, 2023
e9552b0
Update 10.mdx
artyomboyko Dec 21, 2023
5cffa31
Update 10.mdx
artyomboyko Dec 21, 2023
d11fc34
Update chapters/ru/chapter6/4.mdx
artyomboyko Dec 22, 2023
ccbae71
Update chapters/ru/chapter6/4.mdx
artyomboyko Dec 22, 2023
22bde78
Update chapters/ru/chapter6/3.mdx
artyomboyko Dec 22, 2023
eaafdc5
Update chapters/ru/chapter6/3.mdx
artyomboyko Dec 22, 2023
ea57588
Update chapters/ru/chapter6/3b.mdx
artyomboyko Dec 22, 2023
0d34014
Update chapters/ru/chapter6/3.mdx
artyomboyko Dec 22, 2023
8a9bbbc
Update 3.mdx
artyomboyko Dec 22, 2023
b5b2da8
Update 7.mdx
artyomboyko Dec 22, 2023
c67bdb0
Update 3.mdx
artyomboyko Dec 22, 2023
4b4f711
Update chapters/ru/chapter6/3b.mdx
artyomboyko Dec 22, 2023
f00418f
Update chapters/ru/chapter6/5.mdx
artyomboyko Dec 25, 2023
2c733c2
Merge pull request #647 from blademoon/main
MKhalusova Jan 8, 2024
b587c68
Merge branch 'release' into bump_release
lewtun Jan 10, 2024
2 changes: 1 addition & 1 deletion .github/workflows/build_documentation.yml
@@ -16,4 +16,4 @@ jobs:
additional_args: --not_python_module
languages: ar bn de en es fa fr gj he hi id it ja ko pt ru th tr vi zh-CN zh-TW
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
1 change: 0 additions & 1 deletion .github/workflows/build_pr_documentation.yml
@@ -17,4 +17,3 @@ jobs:
path_to_docs: course/chapters/
additional_args: --not_python_module
languages: ar bn de en es fa fr gj he hi id it ja ko pt ru th tr vi zh-CN zh-TW
hub_base_path: https://moon-ci-docs.huggingface.co
13 changes: 0 additions & 13 deletions .github/workflows/delete_doc_comment.yml

This file was deleted.

17 changes: 17 additions & 0 deletions .github/workflows/upload_pr_documentation.yml
@@ -0,0 +1,17 @@
name: Upload PR Documentation

on:
workflow_run:
workflows: ["Build PR Documentation"]
types:
- completed

jobs:
build:
uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@main
with:
package_name: course
hub_base_path: https://moon-ci-docs.huggingface.co
secrets:
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}
2 changes: 1 addition & 1 deletion chapters/de/chapter3/2.mdx
@@ -84,7 +84,7 @@ In diesem Abschnitt verwenden wir den MRPC-Datensatz (Microsoft Research Paraphr
<Youtube id="W_gMJF0xomE"/>
{/if}

Das Hub enthält nicht nur Modelle; Es hat auch mehrere Datensätze in vielen verschiedenen Sprachen. Du kannst die Datensätze [hier](https://huggingface.co/datasets) durchsuchen, und wir empfehlen, einen weiteren Datensatz zu laden und zu verarbeiten, sobald Sie diesen Abschnitt abgeschlossen haben (die Dokumentation befindet sich [hier](https: //huggingface.co/docs/datasets/loading_datasets.html#from-the-huggingface-hub)). Aber jetzt konzentrieren wir uns auf den MRPC-Datensatz! Dies ist einer der 10 Datensätze, aus denen sich das [GLUE-Benchmark](https://gluebenchmark.com/) zusammensetzt. Dies ist ein akademisches Benchmark, das verwendet wird, um die Performance von ML-Modellen in 10 verschiedenen Textklassifizierungsaufgaben zu messen.
Das Hub enthält nicht nur Modelle; Es hat auch mehrere Datensätze in vielen verschiedenen Sprachen. Du kannst die Datensätze [hier](https://huggingface.co/datasets) durchsuchen, und wir empfehlen, einen weiteren Datensatz zu laden und zu verarbeiten, sobald Sie diesen Abschnitt abgeschlossen haben (die Dokumentation befindet sich [hier](https://huggingface.co/docs/datasets/loading)). Aber jetzt konzentrieren wir uns auf den MRPC-Datensatz! Dies ist einer der 10 Datensätze, aus denen sich das [GLUE-Benchmark](https://gluebenchmark.com/) zusammensetzt. Dies ist ein akademisches Benchmark, das verwendet wird, um die Performance von ML-Modellen in 10 verschiedenen Textklassifizierungsaufgaben zu messen.

Die Bibliothek 🤗 Datasets bietet einen leichten Befehl zum Herunterladen und Caching eines Datensatzes aus dem Hub. Wir können den MRPC-Datensatz wie folgt herunterladen:

2 changes: 1 addition & 1 deletion chapters/en/chapter1/10.mdx
@@ -241,7 +241,7 @@ result = classifier("This is a course about the Transformers library")
choices={[
{
text: "The model is a fine-tuned version of a pretrained model and it picked up its bias from it.",
explain: "When applying Transfer Learning, the bias in the pretrained model used perspires in the fine-tuned model.",
explain: "When applying Transfer Learning, the bias in the pretrained model used persists in the fine-tuned model.",
correct: true
},
{
4 changes: 2 additions & 2 deletions chapters/en/chapter1/4.mdx
@@ -97,7 +97,7 @@ By the way, you can evaluate the carbon footprint of your models' training throu

This pretraining is usually done on very large amounts of data. Therefore, it requires a very large corpus of data, and training can take up to several weeks.

*Fine-tuning*, on the other hand, is the training done **after** a model has been pretrained. To perform fine-tuning, you first acquire a pretrained language model, then perform additional training with a dataset specific to your task. Wait -- why not simply train directly for the final task? There are a couple of reasons:
*Fine-tuning*, on the other hand, is the training done **after** a model has been pretrained. To perform fine-tuning, you first acquire a pretrained language model, then perform additional training with a dataset specific to your task. Wait -- why not simply train the model for your final use case from the start (**scratch**)? There are a couple of reasons:

* The pretrained model was already trained on a dataset that has some similarities with the fine-tuning dataset. The fine-tuning process is thus able to take advantage of knowledge acquired by the initial model during pretraining (for instance, with NLP problems, the pretrained model will have some kind of statistical understanding of the language you are using for your task).
* Since the pretrained model was already trained on lots of data, the fine-tuning requires way less data to get decent results.
@@ -144,7 +144,7 @@ We will dive into those architectures independently in later sections.

A key feature of Transformer models is that they are built with special layers called *attention layers*. In fact, the title of the paper introducing the Transformer architecture was ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762)! We will explore the details of attention layers later in the course; for now, all you need to know is that this layer will tell the model to pay specific attention to certain words in the sentence you passed it (and more or less ignore the others) when dealing with the representation of each word.

To put this into context, consider the task of translating text from English to French. Given the input "You like this course", a translation model will need to also attend to the adjacent word "You" to get the proper translation for the word "like", because in French the verb "like" is conjugated differently depending on the subject. The rest of the sentence, however, is not useful for the translation of that word. In the same vein, when translating "this" the model will also need to pay attention to the word "course", because "this" translates differently depending on whether the associated noun is masculine or feminine. Again, the other words in the sentence will not matter for the translation of "this". With more complex sentences (and more complex grammar rules), the model would need to pay special attention to words that might appear farther away in the sentence to properly translate each word.
To put this into context, consider the task of translating text from English to French. Given the input "You like this course", a translation model will need to also attend to the adjacent word "You" to get the proper translation for the word "like", because in French the verb "like" is conjugated differently depending on the subject. The rest of the sentence, however, is not useful for the translation of that word. In the same vein, when translating "this" the model will also need to pay attention to the word "course", because "this" translates differently depending on whether the associated noun is masculine or feminine. Again, the other words in the sentence will not matter for the translation of "course". With more complex sentences (and more complex grammar rules), the model would need to pay special attention to words that might appear farther away in the sentence to properly translate each word.

The same concept applies to any task associated with natural language: a word by itself has a meaning, but that meaning is deeply affected by the context, which can be any other word (or words) before or after the word being studied.

10 changes: 5 additions & 5 deletions chapters/en/chapter1/5.mdx
@@ -15,8 +15,8 @@ Encoder models are best suited for tasks requiring an understanding of the full

Representatives of this family of models include:

- [ALBERT](https://huggingface.co/transformers/model_doc/albert.html)
- [BERT](https://huggingface.co/transformers/model_doc/bert.html)
- [DistilBERT](https://huggingface.co/transformers/model_doc/distilbert.html)
- [ELECTRA](https://huggingface.co/transformers/model_doc/electra.html)
- [RoBERTa](https://huggingface.co/transformers/model_doc/roberta.html)
- [ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)
- [BERT](https://huggingface.co/docs/transformers/model_doc/bert)
- [DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)
- [ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)
- [RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)
2 changes: 1 addition & 1 deletion chapters/en/chapter2/5.mdx
@@ -329,7 +329,7 @@ With Transformer models, there is a limit to the lengths of the sequences we can
- Use a model with a longer supported sequence length.
- Truncate your sequences.

Models have different supported sequence lengths, and some specialize in handling very long sequences. [Longformer](https://huggingface.co/transformers/model_doc/longformer.html) is one example, and another is [LED](https://huggingface.co/transformers/model_doc/led.html). If you're working on a task that requires very long sequences, we recommend you take a look at those models.
Models have different supported sequence lengths, and some specialize in handling very long sequences. [Longformer](https://huggingface.co/docs/transformers/model_doc/longformer) is one example, and another is [LED](https://huggingface.co/docs/transformers/model_doc/led). If you're working on a task that requires very long sequences, we recommend you take a look at those models.

Otherwise, we recommend you truncate your sequences by specifying the `max_sequence_length` parameter:

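(The truncation code itself is collapsed in the hunk above. A minimal sketch of truncating with a 🤗 tokenizer; the checkpoint and length limit here are illustrative assumptions, not taken from the chapter:)

```python
from transformers import AutoTokenizer

# Assumed checkpoint and limit, for illustration only
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
max_sequence_length = 512

sequence = "This is a long sequence that may exceed the model's supported length ..."
# truncation=True cuts the tokenized input down to at most max_length tokens
inputs = tokenizer(sequence, truncation=True, max_length=max_sequence_length)
```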
2 changes: 1 addition & 1 deletion chapters/en/chapter3/2.mdx
@@ -84,7 +84,7 @@ In this section we will use as an example the MRPC (Microsoft Research Paraphras
<Youtube id="W_gMJF0xomE"/>
{/if}

The Hub doesn't just contain models; it also has multiple datasets in lots of different languages. You can browse the datasets [here](https://huggingface.co/datasets), and we recommend you try to load and process a new dataset once you have gone through this section (see the general documentation [here](https://huggingface.co/docs/datasets/loading_datasets.html#from-the-huggingface-hub)). But for now, let's focus on the MRPC dataset! This is one of the 10 datasets composing the [GLUE benchmark](https://gluebenchmark.com/), which is an academic benchmark that is used to measure the performance of ML models across 10 different text classification tasks.
The Hub doesn't just contain models; it also has multiple datasets in lots of different languages. You can browse the datasets [here](https://huggingface.co/datasets), and we recommend you try to load and process a new dataset once you have gone through this section (see the general documentation [here](https://huggingface.co/docs/datasets/loading)). But for now, let's focus on the MRPC dataset! This is one of the 10 datasets composing the [GLUE benchmark](https://gluebenchmark.com/), which is an academic benchmark that is used to measure the performance of ML models across 10 different text classification tasks.

The 🤗 Datasets library provides a very simple command to download and cache a dataset on the Hub. We can download the MRPC dataset like this:

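(The download command is collapsed above; sketched from the surrounding text, this is the standard 🤗 Datasets call for MRPC:)

```python
from datasets import load_dataset

# MRPC is one of the tasks bundled in the GLUE benchmark
raw_datasets = load_dataset("glue", "mrpc")
raw_datasets  # a DatasetDict with train/validation/test splits
```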
2 changes: 1 addition & 1 deletion chapters/en/chapter3/6.mdx
@@ -211,7 +211,7 @@ Test what you learned in this chapter!
explain: "This is what we did with <code>Trainer</code>, not the 🤗 Accelerate library. Try again!"
},
{
text: "It makes our training loops work on distributed strategies",
text: "It makes our training loops work on distributed strategies.",
explain: "Correct! With 🤗 Accelerate, your training loops will work for multiple GPUs and TPUs.",
correct: true
},
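(For context on the corrected quiz answer: a minimal, self-contained sketch of the 🤗 Accelerate pattern it refers to, using a toy model and data rather than anything from the chapter:)

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Toy setup so the sketch runs end to end
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
dataset = TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,)))
train_dataloader = DataLoader(dataset, batch_size=8)
loss_fn = torch.nn.CrossEntropyLoss()

accelerator = Accelerator()
# prepare() wraps the model, optimizer, and dataloader so the same loop
# runs on CPU, one GPU, several GPUs, or TPUs without code changes
model, optimizer, train_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader
)

for inputs, labels in train_dataloader:
    loss = loss_fn(model(inputs), labels)
    accelerator.backward(loss)  # replaces the usual loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```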
2 changes: 1 addition & 1 deletion chapters/en/chapter6/5.mdx
@@ -97,7 +97,7 @@ Let's take the example we used during training, with the three merge rules learn
("h", "ug") -> "hug"
```

The word `"bug"` will be tokenized as `["b", "ug"]`. `"mug"`, however, will be tokenized as `["[UNK]", "ug"]` since the letter `"m"` was not in the base vocabulary. Likewise, the word `"thug"` will be tokenized as `["[UNK]", "hug"]`: the letter `"t"` is not in the base vocabulary, and applying the merge rules results first in `"u"` and `"g"` being merged and then `"hu"` and `"g"` being merged.
The word `"bug"` will be tokenized as `["b", "ug"]`. `"mug"`, however, will be tokenized as `["[UNK]", "ug"]` since the letter `"m"` was not in the base vocabulary. Likewise, the word `"thug"` will be tokenized as `["[UNK]", "hug"]`: the letter `"t"` is not in the base vocabulary, and applying the merge rules results first in `"u"` and `"g"` being merged and then `"h"` and `"ug"` being merged.

<Tip>

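(Only the last merge rule is visible in this hunk. A runnable sketch of the tokenization behavior the corrected sentence describes; the base vocabulary and the first two merges are assumptions inferred from the surrounding text:)

```python
def bpe_tokenize(word, vocab, merges):
    # Split into characters, mapping letters outside the base vocabulary to [UNK]
    tokens = [c if c in vocab else "[UNK]" for c in word]
    for a, b in merges:  # apply merges in the order they were learned
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == a and tokens[i + 1] == b:
                tokens[i : i + 2] = [a + b]
            else:
                i += 1
    return tokens

vocab = {"b", "g", "h", "n", "p", "s", "u"}            # assumed base vocabulary
merges = [("u", "g"), ("u", "n"), ("h", "ug")]          # first two assumed from context
print(bpe_tokenize("bug", vocab, merges))   # ['b', 'ug']
print(bpe_tokenize("mug", vocab, merges))   # ['[UNK]', 'ug']
print(bpe_tokenize("thug", vocab, merges))  # ['[UNK]', 'hug']: "u"+"g", then "h"+"ug"
```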
2 changes: 1 addition & 1 deletion chapters/en/chapter6/7.mdx
@@ -58,7 +58,7 @@ So, the sum of all frequencies is 210, and the probability of the subword `"ug"`

<Tip>

✏️ **Now your turn!** Write the code to compute the the frequencies above and double-check that the results shown are correct, as well as the total sum.
✏️ **Now your turn!** Write the code to compute the frequencies above and double-check that the results shown are correct, as well as the total sum.

</Tip>

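(One way the "Now your turn!" exercise might be solved. The corpus is collapsed in this diff, so the word counts below are assumed from the chapter's running example:)

```python
# Assumed corpus; not shown in the hunk above
word_freqs = {"hug": 10, "pug": 5, "pun": 12, "bun": 4, "hugs": 5}

# Initial vocabulary: every strict substring of a corpus word
vocab = {
    word[i:j]
    for word in word_freqs
    for i in range(len(word))
    for j in range(i + 1, len(word) + 1)
    if j - i < len(word)
}

# Frequency of a subword = total count of the words that contain it
subword_freqs = {
    sub: sum(f for w, f in word_freqs.items() if sub in w) for sub in vocab
}

print(sum(subword_freqs.values()))  # 210
print(subword_freqs["ug"])          # 20, so P("ug") = 20/210
```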
4 changes: 2 additions & 2 deletions chapters/en/chapter7/3.mdx
@@ -347,7 +347,7 @@ print(f"'>>> Concatenated reviews length: {total_length}'")
'>>> Concatenated reviews length: 951'
```

Great, the total length checks out -- so now let's split the concatenated reviews into chunks of the size given by `block_size`. To do so, we iterate over the features in `concatenated_examples` and use a list comprehension to create slices of each feature. The result is a dictionary of chunks for each feature:
Great, the total length checks out -- so now let's split the concatenated reviews into chunks of the size given by `chunk_size`. To do so, we iterate over the features in `concatenated_examples` and use a list comprehension to create slices of each feature. The result is a dictionary of chunks for each feature:

```python
chunks = {
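    # (remainder collapsed in this diff) a sketch, assuming the comprehension
    # slices each feature into chunk_size pieces as the surrounding text describes:
    k: [t[i : i + chunk_size] for i in range(0, total_length, chunk_size)]
    for k, t in concatenated_examples.items()
}
```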
@@ -1035,7 +1035,7 @@ Neat -- our model has clearly adapted its weights to predict words that are more

<Youtube id="0Oxphw4Q9fo"/>

This wraps up our first experiment with training a language model. In [section 6](/course/en/chapter7/section6) you'll learn how to train an auto-regressive model like GPT-2 from scratch; head over there if you'd like to see how you can pretrain your very own Transformer model!
This wraps up our first experiment with training a language model. In [section 6](/course/en/chapter7/6) you'll learn how to train an auto-regressive model like GPT-2 from scratch; head over there if you'd like to see how you can pretrain your very own Transformer model!

<Tip>

2 changes: 1 addition & 1 deletion chapters/en/chapter7/4.mdx
@@ -114,7 +114,7 @@ split_datasets["train"][1]["translation"]
'fr': 'Par défaut, développer les fils de discussion'}
```

We get a dictionary with two sentences in the pair of languages we requested. One particularity of this dataset full of technical computer science terms is that they are all fully translated in French. However, French engineers are often lazy and leave most computer science-specific words in English when they talk. Here, for instance, the word "threads" might well appear in a French sentence, especially in a technical conversation; but in this dataset it has been translated into the more correct "fils de discussion." The pretrained model we use, which has been pretrained on a larger corpus of French and English sentences, takes the easier option of leaving the word as is:
We get a dictionary with two sentences in the pair of languages we requested. One particularity of this dataset full of technical computer science terms is that they are all fully translated in French. However, French engineers leave most computer science-specific words in English when they talk. Here, for instance, the word "threads" might well appear in a French sentence, especially in a technical conversation; but in this dataset it has been translated into the more correct "fils de discussion." The pretrained model we use, which has been pretrained on a larger corpus of French and English sentences, takes the easier option of leaving the word as is:

```py
from transformers import pipeline
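# (remainder collapsed in this diff) a sketch of the chapter's usage; the
# checkpoint below is an assumed English-to-French model, not read from the hunk
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
translator("Default to expanded threads")
```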
3 changes: 1 addition & 2 deletions chapters/en/chapter7/6.mdx
@@ -870,7 +870,6 @@ for epoch in range(num_train_epochs):
if step % 100 == 0:
accelerator.print(
{
"lr": get_lr(),
"samples": step * samples_per_step,
"steps": completed_steps,
"loss/train": loss.item() * gradient_accumulation_steps,
@@ -912,4 +911,4 @@ And that's it -- you now have your own custom training loop for causal language

</Tip>

{/if}
{/if}
2 changes: 1 addition & 1 deletion chapters/en/events/3.mdx
@@ -6,4 +6,4 @@ You can find all the demos that the community created under the [`Gradio-Blocks`

**Natural language to SQL**

<iframe src="https://huggingface.co/spaces/Curranj/Words_To_SQL" frameBorder="0" height="640" title="Gradio app" class="container p-0 flex-grow space-iframe" allow="accelerometer; ambient-light-sensor; autoplay; battery; camera; document-domain; encrypted-media; fullscreen; geolocation; gyroscope; layout-animations; legacy-image-formats; magnetometer; microphone; midi; oversized-images; payment; picture-in-picture; publickey-credentials-get; sync-xhr; usb; vr ; wake-lock; xr-spatial-tracking" sandbox="allow-forms allow-modals allow-popups allow-popups-to-escape-sandbox allow-same-origin allow-scripts allow-downloads"></iframe>
<iframe src="https://curranj-words-to-sql.hf.space" frameBorder="0" height="640" title="Gradio app" class="container p-0 flex-grow space-iframe" allow="accelerometer; ambient-light-sensor; autoplay; battery; camera; document-domain; encrypted-media; fullscreen; geolocation; gyroscope; layout-animations; legacy-image-formats; magnetometer; microphone; midi; oversized-images; payment; picture-in-picture; publickey-credentials-get; sync-xhr; usb; vr ; wake-lock; xr-spatial-tracking" sandbox="allow-forms allow-modals allow-popups allow-popups-to-escape-sandbox allow-same-origin allow-scripts allow-downloads"></iframe>
39 changes: 38 additions & 1 deletion chapters/es/_toctree.yml
@@ -39,15 +39,25 @@
title: ¡Haz completado el uso básico!
- local: chapter2/8
title: Quiz de final de capítulo
quiz: 2

- title: 3. Ajuste (fine-tuning) de un modelo preentrenado
sections:
- local: chapter3/1
title: Introducción
- local: chapter3/2
title: Procesamiento de los datos
- local: chapter3/3
title: Ajuste de un modelo con la API Trainer
- local: chapter3/3_tf
title: Ajuste de un modelo con Keras
- local: chapter3/4
title: Entrenamiento completo
- local: chapter3/5
title: Ajuste de modelos, ¡hecho!
- local: chapter3/6
title: Quiz de final de capítulo
quiz: 3

- title: 5. La librería 🤗 Datasets
sections:
@@ -66,9 +76,36 @@
- local: chapter5/7
title: 🤗 Datasets, ¡listo!
- local: chapter5/8
title: Quiz
title: Quiz de final de capítulo
quiz: 5


- title: 6. La librería 🤗 Tokenizers
sections:
- local: chapter6/1
title: Introducción
- local: chapter6/2
title: Entrenar un nuevo tokenizador a partir de uno existente
- local: chapter6/3
title: Los poderes especiales de los Tokenizadores Rápidos (Fast tokenizers)
- local: chapter6/3b
title: Tokenizadores Rápidos en un Pipeline de Question-Answering
- local: chapter6/4
title: Normalización y pre-tokenización
- local: chapter6/5
title: Tokenización por Codificación Byte-Pair
- local: chapter6/6
title: Tokenización WordPiece
- local: chapter6/7
title: Tokenización Unigram
- local: chapter6/8
title: Construir un tokenizador, bloque por bloque
- local: chapter6/9
title: Tokenizadores, listo!
- local: chapter6/10
title: Quiz de final de capítulo
quiz: 1

- title: 8. ¿Cómo solicitar ayuda?
sections:
- local: chapter8/1