beginner_source/quickstart/dataquickstart_tutorial.py
#
# If not properly organized, code for processing data samples can quickly get messy and become hard to maintain. Since different model architectures can be applied to many data types, we ideally want our dataset code to be decoupled from our model training code. To this end, PyTorch provides a simple ``Dataset`` interface for managing collections of data.
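# The interface itself is only two methods: ``__len__`` and ``__getitem__``. A minimal sketch with a toy dataset (the class name and data here are illustrative, not part of PyTorch):

```python
from torch.utils.data import Dataset


class SquaresDataset(Dataset):
    """Toy dataset: sample ``i`` is the pair ``(i, i ** 2)``."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # number of samples in the dataset
        return self.n

    def __getitem__(self, idx):
        # return the sample at the given index
        return idx, idx ** 2


squares = SquaresDataset(10)
print(len(squares))  # 10
print(squares[3])    # (3, 9)
```

# Any object that implements these two methods can be indexed like a list and handed to the rest of the PyTorch data machinery.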
#
# A whole set of example datasets, such as Fashion-MNIST, that implement this interface are built into PyTorch extension libraries. They are subclasses of ``torch.utils.data.Dataset`` that have parameters and functions specific to the type of data and the particular dataset. The actual data samples can be downloaded from the internet. These are useful for benchmarking and testing your models before training on your own custom datasets.
#
# You can find some of them below.
#
# Iterating through a Dataset
# -----------------
#
# Once we have a Dataset ``ds``, we can index it manually like a list: ``ds[index]``.
#
# Here is an example of how to load the `Fashion-MNIST <https://research.zalando.com/welcome/mission/research-projects/fashion-mnist/>`_ dataset from torchvision.
# `Fashion-MNIST <https://research.zalando.com/welcome/mission/research-projects/fashion-mnist/>`_ is a dataset of Zalando's article images consisting of 60,000 training examples and 10,000 test examples.
# To work with your own data, we need to implement a custom class that inherits from ``Dataset``. Let's look at a custom image dataset implementation. In this example, we have a number of images stored in a directory, and their labels stored separately in a CSV annotations file.
#
import os
# Imports
# -------
#
# Import ``os`` for file handling, ``torch`` for PyTorch, `pandas <https://pandas.pydata.org/>`_ for loading labels, `torchvision <https://pytorch.org/blog/pytorch-1.7-released/>`_ to read image files, and ``Dataset`` to implement the Dataset interface.
#
# Example:
#
# Init
# -----------------
#
# The init function handles the one-time setup that happens when our Dataset is created. In this case we use it to load our annotation labels into memory and to keep track of the directory of our image files. Note that different types of data can take different init inputs; you are not limited to an annotations file, a directory path, and transforms, but for images this is standard practice.
#
# A sample csv annotations file may look as follows: ::
#
#     tshirt1.jpg, 0
#     tshirt2.jpg, 0
# The ``__len__`` function is very simple: we just need to return the number of samples in our dataset.
#
# Example:
# __getitem__
# -----------------
#
# The ``__getitem__`` function is the most important function in the Dataset interface. It takes a tensor or an index as input and returns a loaded sample from your dataset at the given indices.
#
# If provided a tensor as an index, we convert the tensor to a list first. We then load the file at the given index from our image directory, as well as the image label from our pandas annotations DataFrame. The image and label are then wrapped in a single sample dictionary, to which we can apply a Transform before returning it. To learn more about Transforms, see the next section of the Blitz.
#
# Example:
#
# Now we have an organized mechanism for managing data, which is great, but there is still a lot of manual work we would have to do to train a model with our Dataset.
#
# For example, we would have to manually maintain the code for:
#
# * Batching
# * Shuffling
# * Parallel batch distribution
#
# The PyTorch DataLoader ``torch.utils.data.DataLoader`` is an iterator that handles all of this complexity for us, enabling us to load a dataset and focus on training our model.
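# A sketch of the DataLoader in action, using random tensors in place of a real dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 100 fake 28x28 single-channel images with integer labels, standing in for real data
images = torch.randn(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
ds = TensorDataset(images, labels)

# batching, shuffling, and (via num_workers) parallel loading are handled for us
loader = DataLoader(ds, batch_size=32, shuffle=True)

for batch_images, batch_labels in loader:
    # 4 iterations: three batches of 32 samples, then one of 4
    print(batch_images.shape, batch_labels.shape)
```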