Merge pull request #515 from tyisme614/review_ep35
docs(zh-cn): Reviewed 35_loading-a-custom-dataset.srt
xianbaoqian authored Apr 11, 2023
2 parents 5b077d8 + 3d45b8b commit f631679
Showing 1 changed file with 35 additions and 35 deletions.
70 changes: 35 additions & 35 deletions subtitles/zh-CN/35_loading-a-custom-dataset.srt
@@ -20,8 +20,8 @@

5
00:00:08,430 --> 00:00:09,750
尽管 Hugging Face Hub 主持
Although the Hugging Face Hub hosts
尽管 Hugging Face Hub 上承载了
Although the HuggingFace Hub hosts

6
00:00:09,750 --> 00:00:11,730
@@ -30,27 +30,27 @@ over a thousand public datasets,

7
00:00:11,730 --> 00:00:12,930
你经常需要处理数据
你可能仍然需要经常处理存储在你的笔记本电脑
you'll often need to work with data

8
00:00:12,930 --> 00:00:15,900
存储在你的笔记本电脑或某些远程服务器上
或存储在远程服务器上的数据
that is stored on your laptop or some remote server.

9
00:00:15,900 --> 00:00:18,060
在本视频中,我们将探讨数据集库如何
在本视频中,我们将探讨如何利用 Datasets 库
In this video, we'll explore how the Datasets library

10
00:00:18,060 --> 00:00:20,310
可用于加载不可用的数据集
加载 Hugging Face Hub 以外
can be used to load datasets that aren't available

11
00:00:20,310 --> 00:00:21,510
在 Hugging Face Hub 上
的数据集
on the Hugging Face Hub.

12
Expand All @@ -75,22 +75,22 @@ To load a dataset in one of these formats,

16
00:00:31,200 --> 00:00:32,730
你只需要提供格式的名称
你只需要向 load_dataset 函数
you just need to provide the name of the format

17
00:00:32,730 --> 00:00:34,350
到 load_dataset 函数
提供格式的名称
to the load_dataset function,

18
00:00:34,350 --> 00:00:35,790
连同 data_files 参数
并且连同 data_files 参数一起传入
along with a data_files argument

19
00:00:35,790 --> 00:00:37,610
指向一个或多个文件路径或 URL。
该参数指向一个或多个文件路径或 URL。
that points to one or more filepaths or URLs.
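
For reference, the loading pattern these cues describe is roughly the following; the file name is a placeholder, not one from the video:

```python
from datasets import load_dataset

# Name the format ("csv", "text", "json", ...) and point data_files
# at one or more file paths or URLs. "my_data.csv" is illustrative.
dataset = load_dataset("csv", data_files="my_data.csv")
```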

20
@@ -105,7 +105,7 @@ In this example, we first download a dataset

22
00:00:45,960 --> 00:00:48,963
关于来自 UCI 机器学习库的葡萄酒质量
该数据集是来自 UCI 机器学习库的葡萄酒质量数据
about wine quality from the UCI machine learning repository.

23
@@ -150,7 +150,7 @@ so here we've also specified

31
00:01:06,750 --> 00:01:09,030
分隔符是分号
分号作为分隔符
that the separator is a semi-colon.

32
@@ -165,7 +165,7 @@ is loaded automatically as a DatasetDict object,

34
00:01:13,020 --> 00:01:15,920
CSV 文件中的每一列都表示为一个特征
CSV 文件中的每一列都代表一个特征
with each column in the CSV file represented as a feature.
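
A sketch of the wine-quality call these cues describe; the UCI URL below is assumed rather than copied from the video:

```python
from datasets import load_dataset

# This CSV uses semicolons as delimiters, hence sep=";".
# The URL is an assumed stand-in for the file used in the video.
url = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "wine-quality/winequality-white.csv"
)
wine = load_dataset("csv", data_files=url, sep=";")
print(wine["train"].features)  # each CSV column shows up as a feature
```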

35
@@ -175,7 +175,7 @@ If your dataset is located on some remote server like GitHub

36
00:01:20,280 --> 00:01:22,050
或其他一些存储库
或其他一些数据仓库
or some other repository,

37
@@ -205,12 +205,12 @@ This format is quite common in NLP,

42
00:01:35,100 --> 00:01:36,750
你通常会找到书籍和戏剧
你常常会发现书籍和戏剧
and you'll typically find books and plays

43
00:01:36,750 --> 00:01:39,393
只是一个包含原始文本的文件
只是一个包含原始文本的独立文件
are just a single file with raw text inside.

44
@@ -220,7 +220,7 @@ In this example, we have a text file of Shakespeare plays

45
00:01:43,020 --> 00:01:45,330
存储在 GitHub 存储库中
存储在 GitHub 仓库中
that's stored on a GitHub repository.

46
@@ -245,12 +245,12 @@ As you can see, these files are processed line-by-line,

50
00:01:55,110 --> 00:01:57,690
所以原始文本中的空行也被表示
所以原始文本中的空行
so empty lines in the raw text are also represented

51
00:01:57,690 --> 00:01:58,953
作为数据集中的一行
也按照数据集中的一行表示
as a row in the dataset.
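
A sketch of raw-text loading as described above; the Shakespeare URL is illustrative, not necessarily the file shown on screen:

```python
from datasets import load_dataset

# Text files are read line by line: every line, empty ones
# included, becomes a row with a single "text" feature.
url = (
    "https://raw.githubusercontent.com/karpathy/char-rnn/"
    "master/data/tinyshakespeare/input.txt"
)
plays = load_dataset("text", data_files=url)
print(plays["train"][0])  # {'text': <first line of the file>}
```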

52
@@ -270,12 +270,12 @@ where every row in the file is a separate JSON object.

55
00:02:09,510 --> 00:02:11,100
对于这些文件,你可以加载数据集
对于这些文件,你可以通过选择 JSON 加载脚本
For these files, you can load the dataset

56
00:02:11,100 --> 00:02:13,020
通过选择 JSON 加载脚本
来加载数据集
by selecting the JSON loading script

57
@@ -285,12 +285,12 @@ and pointing the data_files argument to the file or URL.

58
00:02:17,160 --> 00:02:19,410
在这个例子中,我们加载了一个 JSON 行文件
在这个例子中,我们加载了一个多行 JSON 的文件
In this example, we've loaded a JSON Lines file

59
00:02:19,410 --> 00:02:21,710
基于 Stack Exchange 问题和答案。
其内容基于 Stack Exchange 问题和答案。
based on Stack Exchange questions and answers.
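
A sketch of the JSON Lines case; the file name is hypothetical:

```python
from datasets import load_dataset

# In a JSON Lines file every line is a separate JSON object,
# so the "json" script loads each line as one row.
qa = load_dataset("json", data_files="stack_exchange.jsonl")
```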

60
@@ -310,27 +310,27 @@ so the load_dataset function allows you to specify

63
00:02:31,200 --> 00:02:32,733
要加载哪个特定密钥
要加载哪个特定关键词
which specific key to load.

64
00:02:33,630 --> 00:02:35,910
例如,用于问答的 SQuAD 数据集
例如,用于问答的 SQuAD 数据集有它的格式,
For example, the SQuAD dataset for question and answering

65
00:02:35,910 --> 00:02:38,340
有它的格式,我们可以通过指定来加载它
我们可以通过指定我们感兴趣的数据字段
has its format, and we can load it by specifying

66
00:02:38,340 --> 00:02:40,340
我们对数据字段感兴趣
我们对 data 字段感兴趣
that we're interested in the data field.
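
A sketch of loading nested JSON with the field argument; the SQuAD URL is an assumption for illustration:

```python
from datasets import load_dataset

# SQuAD keeps its examples under a top-level "data" key,
# so we tell load_dataset which field to read.
url = "https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json"
squad = load_dataset("json", data_files=url, field="data")
```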

67
00:02:41,400 --> 00:02:42,780
最后一件事要提
最后要和大家分享的内容是
There is just one last thing to mention

68
@@ -340,7 +340,7 @@ about all of these loading scripts.

69
00:02:44,910 --> 00:02:46,410
你可以有不止一次分裂
你可以有不止一次数据切分
You can have more than one split,

70
@@ -350,7 +350,7 @@ you can load them by treating data files as a dictionary,

71
00:02:49,080 --> 00:02:52,140
并将每个拆分名称映射到其对应的文件
并将每个拆分的名称映射到其对应的文件
and map each split name to its corresponding file.

72
@@ -360,22 +360,22 @@ Everything else stays completely unchanged

73
00:02:53,970 --> 00:02:55,350
你可以看到一个加载的例子
你可以看到一个例子,
and you can see an example of loading

74
00:02:55,350 --> 00:02:58,283
SQuAD 的训练和验证拆分均在此处
加载此 SQuAD 的训练和验证分解步骤都在这里
both the training and validation splits for this SQuAD here.
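
A sketch of mapping split names to files, reusing the assumed SQuAD URLs from above:

```python
from datasets import load_dataset

# Passing data_files as a dict gives one dataset split per key.
data_files = {
    "train": "https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json",
    "validation": "https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json",
}
squad = load_dataset("json", data_files=data_files, field="data")
```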

75
00:02:59,550 --> 00:03:02,310
这样,你现在可以从笔记本电脑加载数据集
这样,你现在可以加载来自笔记本电脑的数据集,来自 Hugging Face Hub 的数据集
And with that, you can now load datasets from your laptop,

76
00:03:02,310 --> 00:03:04,653
Hugging Face Hub,或任何其他地方
或来自任何其他地方的数据集
the Hugging Face Hub, or anywhere else you want.

77