it should be 8x8 not 7x7 #1

Open
dexter4455 opened this issue May 9, 2016 · 5 comments

Comments

@dexter4455

dexter4455 commented May 9, 2016

Hey Jordi,

Thanks for the great work. Correct me if I'm wrong (and if so, please explain how you got to that number), but on page 114 and in the code of cnn.py you write: "The resulting output of the convolution has a dimension of 7x7 as we are applying the 5x5 window to a 12x12 space with a stride size of 1". I assume the padding is 0, since it wasn't mentioned.

But it's actually (12 - 5 + 0) + 1 = 8, which is 8x8, not 7x7.

Thanks, waiting for your reply!

@jorditorresBCN
Owner

I'm traveling, attending ML Summer School 2016. I will review it next weekend; sorry for any inconvenience. Regards, Jordi

@jorditorresBCN
Owner

Hello dexter4455,

I discussed this with one of the engineers in our research team at BSC-CNS, Maurici Yagües, and we finally agreed on the following answer. First of all, let me thank Maurici for his help on this topic!

You are right in pointing out that the explanation and the code are
not consistent. Your computation for the 8x8 layer is correct given
the drawings on previous pages and the explanation. However, in the
convolution function, the convolution layers are initialized with the
parameter padding="SAME". This makes the size of the output equal to
the size of the input by adding enough zero-padding, so no shrinking
is done in the convolution layers.

The shrinking is done only on the pooling layers going from 28 in the
input to 14, after h_pool1, and finally 7, after h_pool2.

In the explanation, and in the drawings, the case is described for
padding="VALID", that is, no zero-padding is done, so the output
dimensions follow the formula m - k + 1 (for a stride of 1), where m
is the input size and k is the kernel size, leading to the shrinking.
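To make the difference between the two padding modes concrete, here is a minimal sketch in plain Python. The function name `conv_output_size` is made up for illustration; it only reproduces the size arithmetic that TensorFlow documents for "VALID" and "SAME" along one spatial dimension and does not call TensorFlow itself.

```python
import math


def conv_output_size(m, k, stride=1, padding="VALID"):
    """Output size along one spatial dimension (hypothetical helper).

    m: input size, k: kernel size, as in the formula m - k + 1.
    """
    if padding == "VALID":
        # No zero-padding: the window must fit entirely inside the input.
        return (m - k) // stride + 1
    if padding == "SAME":
        # Enough zero-padding is added that only the stride shrinks the output.
        return math.ceil(m / stride)
    raise ValueError(f"unknown padding mode: {padding!r}")


# The case from the book's explanation: 5x5 window on a 12x12 space.
print(conv_output_size(12, 5, padding="VALID"))  # 8

# The case in the code: padding="SAME" keeps the 28x28 input size.
print(conv_output_size(28, 5, padding="SAME"))   # 28
```

With a stride of 1, the "VALID" branch reduces to m - k + 1, and the "SAME" branch returns the input size unchanged.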

You can find a clearer explanation in "Chapter 9. Convolutional
Networks" (specifically page 350) of the book
http://www.deeplearningbook.org/.

Thanks for pointing it out, this will be better stated in future
editions of the book.

@dexter4455
Author

Thank you very much, that makes sense now!

@jorditorresBCN
Owner

From: ShuhaoWang to@shuhao.wang
In Chapter 5, MULTI-LAYER NEURAL NETWORKS IN TENSORFLOW, we have indicated the padding mode is 'SAME', meaning the input and output sizes of CONV() should be the same. The padding size for the first CONV can be calculated to be 2; therefore, the output size of the first CONV is 28x28x32 (not 24x24x32). After the MAX_POOL, the size is 14x14x32 (not 12x12x32). Then the input and output sizes of the second CONV are 14x14x32 and 14x14x64, respectively (padding size = 2). Thus, after the MAX_POOL, the output size is 7x7x64.

I think the padding mode in TensorFlow is pretty tricky. I hope you may explain more about it in your book.

Best regards,
Yours sincerely,
Shuhao
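The padding size of 2 mentioned in the message above can be reproduced with a short sketch. The helper `same_padding` is a hypothetical name; it follows the rule TensorFlow documents for "SAME" padding (total padding split between the two sides, with any odd remainder going to the end) and is not taken from the book's code.

```python
import math


def same_padding(n, k, stride=1):
    """Zeros added (before, after) along one dimension for "SAME" padding.

    Hypothetical helper: n is the input size, k the kernel size.
    """
    out = math.ceil(n / stride)
    # Total padding needed so that `out` windows of size k fit.
    total = max((out - 1) * stride + k - n, 0)
    before = total // 2
    return before, total - before


print(same_padding(28, 5))  # (2, 2): first 5x5 convolution on 28x28
print(same_padding(14, 5))  # (2, 2): second 5x5 convolution on 14x14
```

For a 5x5 kernel with stride 1 this gives 4 zeros in total, 2 on each side, which is why both convolutions keep their input size.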

@jorditorresBCN
Owner

From: Ricky Park:

In chapter 5, the first convolution layer changes the size 28x28 --> 28x28 -(pooling)-> 14x14, not 28x28 --> 24x24 --> 12x12,
because the padding in the code is 'SAME'.
So the second convolution layer changes the size 14x14 --> 14x14 -(pooling)-> 7x7.
You can check the official TensorFlow tutorial and my Jupyter notebook (https://github.com/rickiepark/tfk-notebooks/blob/master/first-contact-with-tensorflow/chapter5_convolution_neural_network.ipynb).
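
The two size chains discussed in this thread can be compared side by side with a rough sketch. It assumes two 5x5 stride-1 convolutions, each followed by 2x2 max-pooling with stride 2; the names `conv_size`, `pool_size`, and `trace` are invented for illustration and are not part of cnn.py.

```python
import math


def conv_size(n, k, padding):
    # Stride-1 convolution: "SAME" keeps the size, "VALID" gives n - k + 1.
    return n if padding == "SAME" else n - k + 1


def pool_size(n, window=2):
    # Assumed 2x2 max-pooling with stride 2: the size is halved (rounded up).
    return math.ceil(n / window)


def trace(n, k, padding, layers=2):
    """List the spatial size after each conv and pool step (hypothetical helper)."""
    sizes = [n]
    for _ in range(layers):
        n = conv_size(n, k, padding)
        sizes.append(n)
        n = pool_size(n)
        sizes.append(n)
    return sizes


print(trace(28, 5, "SAME"))   # [28, 28, 14, 14, 7]
print(trace(28, 5, "VALID"))  # [28, 24, 12, 8, 4]
```

The "SAME" chain ends at the 7x7 used in the code, while the "VALID" chain passes through the 24x24, 12x12, and 8x8 sizes that match the drawings and the original question.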
