data reader for mnist #1325

qingqing01 · 2017-02-13T07:38:57Z

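The reader is iterable: each call returns one batch of data, and the data is shuffled at the start of every pass/epoch.
The last batch of each pass/epoch may contain fewer than batch_size samples. This is consistent with how Paddle has worked so far, and it is necessary for testing so that no test samples are dropped (even though on many other platforms, such as TF, Torch, and Caffe, every batch has exactly the same number of samples).
This is intended only for MNIST: all of the data is loaded into memory up front.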
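A minimal sketch of the behavior described above, assuming the whole dataset is already held as two aligned in-memory NumPy arrays. It reuses the `DataReader(data, labels, batch_size, is_shuffle=False)` signature from the diff, but the body is an illustrative reconstruction, not the code in `demo/mnist/reader.py`. Here one full iteration over the reader is one pass.

```python
import numpy as np


class DataReader(object):
    """Illustrative in-memory batch reader (a sketch, not the PR's code)."""

    def __init__(self, data, labels, batch_size, is_shuffle=False):
        if len(data) != len(labels):
            raise ValueError("data and labels must have the same length")
        self.data = data
        self.labels = labels
        self.batch_size = batch_size
        self.is_shuffle = is_shuffle

    def __iter__(self):
        # Iterating over the reader once is one pass/epoch.
        n = len(self.data)
        order = np.arange(n)
        if self.is_shuffle:
            # Reshuffle the sample order at the start of every pass.
            np.random.shuffle(order)
        for start in range(0, n, self.batch_size):
            idx = order[start:start + self.batch_size]
            # The final slice may hold fewer than batch_size samples.
            yield self.data[idx], self.labels[idx]


if __name__ == "__main__":
    data = np.arange(10, dtype='float32').reshape(10, 1)
    labels = np.arange(10)
    reader = DataReader(data, labels, batch_size=4, is_shuffle=True)
    for pass_id in range(2):
        # Each pass is reshuffled; batch sizes are 4, 4, 2.
        print(pass_id, [len(y) for _, y in reader])
```

Keeping the short last batch (rather than dropping it) means every sample is visited once per pass, which is what the test-set use case needs.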

Large datasets won't necessarily fit in memory, so some other caching mechanism may need to be designed.
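One possible direction for the large-data case, shown only as an assumption and not something this PR implements: stream samples from disk and shuffle within a bounded buffer of at most `buf_size` items, then group the stream into batches. `buffered_shuffle` and `batched` are hypothetical helper names.

```python
import random


def buffered_shuffle(samples, buf_size, seed=None):
    """Yield items from the iterable `samples`, shuffled within consecutive
    windows of at most `buf_size` items (hypothetical helper)."""
    rng = random.Random(seed)
    buf = []
    for s in samples:
        buf.append(s)
        if len(buf) >= buf_size:
            # Only buf_size samples are ever held in memory at once.
            rng.shuffle(buf)
            for item in buf:
                yield item
            buf = []
    # Flush whatever remains at the end of the stream.
    rng.shuffle(buf)
    for item in buf:
        yield item


def batched(samples, batch_size):
    """Group a stream into lists of batch_size; the last may be shorter."""
    batch = []
    for s in samples:
        batch.append(s)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

The trade-off is that shuffling is only local: a sample can move only within its window, so a larger `buf_size` gives a better shuffle at the cost of more memory.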

luotao1 · 2017-02-13T08:08:21Z

demo/mnist/reader.py

+
+
+class DataReader(object):
+    def __init__(self, data, labels, batch_size, is_shuffle=False):


Can this interface be reused when there are multiple data inputs, or when labels is empty?

This is specific to MNIST and is not general-purpose; other tasks will need their own reader.
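For reference, the case luotao1 raises could be covered by batching any number of aligned arrays together and simply omitting labels when there are none. This is a hypothetical sketch (`batch_arrays` is not part of the PR):

```python
import numpy as np


def batch_arrays(arrays, batch_size, shuffle=False):
    """Yield tuples of aligned slices from any number of arrays.

    (images,) gives unlabeled batches; (images, extra, labels) gives
    multi-input batches. Hypothetical helper, not from the PR.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    order = np.arange(n)
    if shuffle:
        # One permutation is shared by all arrays so rows stay aligned.
        np.random.shuffle(order)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        yield tuple(a[idx] for a in arrays)
```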

helinwang · 2017-02-14T00:52:22Z

demo/mnist/reader.py

+            num_magic, n, num_row, num_col = struct.unpack(">IIII", f.read(16))
+            images = np.fromfile(f, 'ubyte', count=n * num_row * num_col).\
+                reshape(n, num_row * num_col).astype('float32')
+            images = images / 255.0 * 2.0 - 1.0
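The hunk reads the MNIST IDX image header (four big-endian unsigned 32-bit integers: magic number, image count, rows, columns) and then the raw unsigned-byte pixels. Below is a self-contained sketch of the same parsing and scaling. It swaps `np.fromfile` for `np.frombuffer` so it also works on an in-memory buffer; `parse_idx_images` is a hypothetical name.

```python
import io
import struct

import numpy as np


def parse_idx_images(f):
    """Parse an IDX3 image stream into float32 rows scaled to [-1, 1]."""
    # Header: magic, count, rows, cols as big-endian uint32 (">IIII").
    magic, n, num_row, num_col = struct.unpack(">IIII", f.read(16))
    if magic != 2051:  # 0x00000803 identifies an IDX3 unsigned-byte file
        raise ValueError("not an IDX3 image file: magic=%d" % magic)
    # frombuffer works on bytes from any stream, including io.BytesIO.
    pixels = np.frombuffer(f.read(n * num_row * num_col), dtype=np.uint8)
    images = pixels.reshape(n, num_row * num_col).astype('float32')
    # Same scaling as the diff: [0, 255] -> [-1, 1].
    return images / 255.0 * 2.0 - 1.0


if __name__ == "__main__":
    # Two synthetic 2x2 "images" behind a valid header.
    header = struct.pack(">IIII", 2051, 2, 2, 2)
    buf = io.BytesIO(header + bytes([0, 255, 0, 255, 255, 255, 0, 0]))
    print(parse_idx_images(buf))
```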


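Out of curiosity: compared with images = images / 255.0, roughly how much does images = images / 255.0 * 2.0 - 1.0, which pulls the mean closer to 0.0, improve results? (For example, is it 98.55% -> 98.57%, or 98.5% -> 98.9%?) A very rough estimate is fine.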

images = images / 255.0 * 2.0 - 1.0 -> scales to [-1, 1]
images = images / 255.0 -> scales to [0, 1]. Comparing the two results would take an experiment; my guess is that the difference is not large.

This simply keeps the preprocessing used in the original MNIST demo.

The two don't even have the same value range, though: one is [-1, 1] and the other is [0, 1].
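The two scalings differ only by an exact affine map, signed = 2 * unit - 1, so they carry the same information; what changes is the range and the mean of the inputs. A quick NumPy check on a small synthetic array (not MNIST data, and not an accuracy comparison, which would still need the experiment mentioned above):

```python
import numpy as np

# Synthetic "pixels": mostly background zeros, as in MNIST digits.
pixels = np.array([0, 0, 0, 128, 255], dtype=np.uint8).astype('float32')

unit = pixels / 255.0                # range [0, 1]
signed = pixels / 255.0 * 2.0 - 1.0  # range [-1, 1]

# The two versions are related by an exact affine map.
assert np.allclose(signed, 2.0 * unit - 1.0)

print("unit:  ", unit.min(), unit.max(), unit.mean())
print("signed:", signed.min(), signed.max(), signed.mean())
```

Because MNIST images are mostly background, their mean on the [0, 1] scale is small (the commonly cited training-set value is about 0.13), so mapping to [-1, 1] moves the mean to roughly -0.74 rather than toward 0. Centering the data would instead require subtracting the dataset mean.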

helinwang · 2017-02-14T00:53:44Z

demo/mnist/reader.py

+
+def create_datasets(dir='./data/raw_data/'):
+    '''
+    Data download and loading can be simplified based on https://github.com/PaddlePaddle/Paddle/pull/872


I don't see a download function. Downloading the data automatically would make this more convenient for users.

Once PR #872 is merged, the data will be downloaded automatically, so I didn't write that here.
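Until #872 lands, a download step could be as small as the following Python 3 sketch. It is a hypothetical helper, not the mechanism in #872; the URL is left as a parameter rather than hard-coding a mirror, and the function skips the network entirely when the file is already cached.

```python
import os
import shutil
import urllib.request


def maybe_download(url, dest_dir):
    """Download `url` into `dest_dir` unless the file is already there.

    Returns the local path. Hypothetical helper for illustration only.
    """
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, os.path.basename(url))
    if os.path.exists(path):
        return path  # cached: no network access
    tmp = path + ".part"
    # Write to a temporary file first so an interrupted download is not
    # mistaken for a complete cached file on the next run.
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    os.replace(tmp, path)
    return path
```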

Commit d02adfa: data reader for mnist

luotao1 reviewed Feb 13, 2017


Commit 6a2bbf4: update

qingqing01 mentioned this pull request on Feb 13, 2017 in Data reader for api #1326 (Closed)

helinwang reviewed Feb 14, 2017


qingqing01 closed this Feb 24, 2017

qingqing01 deleted the api_reader branch July 7, 2017 13:35

Comment timeline: luotao1 (Feb 13, 2017), qingqing01 (Feb 13, 2017), helinwang (Feb 14, 2017), qingqing01 (Feb 14, 2017), hedaoyuan (Feb 14, 2017), helinwang (Feb 14, 2017), qingqing01 (Feb 14, 2017)


