InexactError when training "LeNet" on 1d image data #190

Open
cinvro opened this issue Apr 5, 2016 · 10 comments

Comments

cinvro commented Apr 5, 2016

I am new to Mocha and am trying to adapt the LeNet tutorial to my 1D image dataset. Basically, I slightly changed the kernel size and stride size as follows:


data_layer  = AsyncHDF5DataLayer(name="data", source="data/train.txt", batch_size=64, shuffle=true)
conv_layer  = ConvolutionLayer(name="conv1", n_filter=20, kernel=(5,1), bottoms=[:data], tops=[:conv])
pool_layer  = PoolingLayer(name="pool1", kernel=(2,1), stride=(2,1), bottoms=[:conv], tops=[:pool])
conv2_layer = ConvolutionLayer(name="conv2", n_filter=50, kernel=(5,1), bottoms=[:pool], tops=[:conv2])
pool2_layer = PoolingLayer(name="pool2", kernel=(2,1), stride=(2,1), bottoms=[:conv2], tops=[:pool2])
fc1_layer   = InnerProductLayer(name="ip1", output_dim=500, neuron=Neurons.ReLU(), bottoms=[:pool2], tops=[:ip1])
fc2_layer   = InnerProductLayer(name="ip2", output_dim=2, bottoms=[:ip1], tops=[:ip2])
loss_layer  = SoftmaxLossLayer(name="loss", bottoms=[:ip2,:label])
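
As a rough aid for following the layer stack above, the sketch below tracks how the width dimension shrinks through the two convolution/pooling pairs. The formulas are assumptions (an unpadded, stride-1 convolution and a pooling layer that rounds up), not code taken from Mocha, and the helper names and the input width of 369 are hypothetical. It is written for Julia 1.x rather than the 0.4 release used in this thread.

```julia
# Assumed output width of an unpadded convolution.
conv_out(w, k; stride=1) = div(w - k, stride) + 1

# Assumed output width of a pooling layer that rounds up (ceiling division).
pool_out(w, k; stride=1) = cld(w - k, stride) + 1

# Hypothetical helper: widths after each layer of the network above,
# for an input of width `w` and a given pooling kernel width.
function lenet1d_widths(w; pool_kernel=2)
    c1 = conv_out(w, 5)
    p1 = pool_out(c1, pool_kernel; stride=2)
    c2 = conv_out(p1, 5)
    p2 = pool_out(c2, pool_kernel; stride=2)
    return (conv1=c1, pool1=p1, conv2=c2, pool2=p2)
end

println(lenet1d_widths(369))
println(lenet1d_widths(369; pool_kernel=3))
```

Under these assumed formulas, a hypothetical input width of 369 gives conv2 widths of 179 and 178 for pooling kernels of 2 and 3, the same widths reported later in this thread. That is only a consistency check, not a confirmation of the actual data shape.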

After the network is constructed, I get the following error message:

04-Apr 23:17:53:INFO:root:## Performance on Validation Set after 0 iterations
04-Apr 23:17:53:INFO:root:---------------------------------------------------------
04-Apr 23:17:53:INFO:root:  Accuracy (avg over 15300) = 93.8627%
04-Apr 23:17:53:INFO:root:---------------------------------------------------------
04-Apr 23:17:53:INFO:root:
04-Apr 23:17:54:DEBUG:root:#DEBUG Entering solver loop
ERROR: LoadError: InexactError()
 in max_pooling_forward at /Users/cinvro/.julia/v0.4/Mocha/src/layers/pooling/julia-impl.jl:34
 in forward at /Users/cinvro/.julia/v0.4/Mocha/src/layers/pooling.jl:93
 in forward at /Users/cinvro/.julia/v0.4/Mocha/src/layers/pooling.jl:84
 in forward at /Users/cinvro/.julia/v0.4/Mocha/src/net.jl:148
 in onestep_solve at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:222
 in do_solve_loop at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:242
 in solve at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:235
 in include at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
 in include_from_node1 at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
 in process_options at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
 in _start at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib

Any idea why this happens?


My net looks like this:

[Image: net]

pluskid (Owner) commented Apr 5, 2016

The line of code reporting InexactError is this line: https://github.com/pluskid/Mocha.jl/blob/master/src/layers/pooling/julia-impl.jl#L34

It is trying to assign a value to the mask, which is unsigned. If you try to assign an invalid value (e.g. a negative value), an InexactError will occur. My guess was that the pooling range somehow goes out of bounds, producing a negative value there. But looking at the visualization you pasted above, it seems perfectly valid. Can you try inserting a print statement

println((maxh-1) * width + maxw-1)

right before that line to see what value we got that caused the error?
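
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, assuming Julia 1.x and using `UInt` as a stand-in for the mask's element type (the actual type used in Mocha is not stated in this thread):

```julia
# The mask is described above as unsigned; UInt stands in for its element type.
mask = zeros(UInt, 4)

mask[1] = 3   # a non-negative index converts without trouble

# Assigning a negative Int forces a conversion to UInt, which cannot represent it.
err = try
    mask[2] = -1
    nothing
catch e
    e
end

println(typeof(err))   # InexactError
```

The point is only that the error comes from the integer conversion on assignment, so the value being assigned is the thing to inspect.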

cinvro (Author) commented Apr 5, 2016

@pluskid you are right, I got -180, where maxh=0, maxw=0 and width=179.
What does that mean? Is it a problem with my data or a bug?

pluskid (Owner) commented Apr 6, 2016

It seems like some pooling region is empty. Just as a sanity check, can you change the kernel for the pooling layer from (2,1) to larger values like (3,1) to see if it runs? Thanks!

cinvro (Author) commented Apr 6, 2016

Thank you for the reply.
Yes. I got the following error after changing the kernel size of the pooling layers from (2,1) to (3,1).

ERROR: LoadError: AssertionError: is_similar_shape(params[j],net.states[i].parameters[j].blob)
 in load_network at /Users/cinvro/.julia/v0.4/Mocha/src/utils/io.jl:102
 in anonymous at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:158
 in jldopen at /Users/cinvro/.julia/v0.4/JLD/src/JLD.jl:245
 in load_snapshot at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:157
 in init_solve at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:184
 in solve at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:234
 in include at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
 in include_from_node1 at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
 in process_options at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
 in _start at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib

pluskid (Owner) commented Apr 6, 2016

@cinvro That is due to previously saved snapshots. Can you remove the saved snapshot files and try again? Thanks!

cinvro (Author) commented Apr 6, 2016

@pluskid oh, I didn't realize that.
Now I get -179, where maxh=0, maxw=0 and width=178.

pluskid (Owner) commented Apr 9, 2016

@cinvro I checked the code and did not find the bug. It seems the pooling loop is not executed (otherwise maxh and maxw would not be zero). Can you print the values of hstart, hend, wstart and wend, as well as val and maxval, at the same place? One potential problem is that your matrix contains NaN. In that case, NaN > -Inf is false, so the maximum is never updated and the pooling fails.
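
To make the NaN hypothesis concrete, here is a simplified stand-in for the window scan of a max-pooling forward pass, based only on what is described in this thread (a running maximum starting at -Inf, position variables starting at 0, and a strict `>` comparison). The function name `scan_window` is hypothetical and this is not Mocha's actual implementation; it assumes Julia 1.x.

```julia
# Simplified stand-in for the inner loop of a max-pooling forward pass.
# Returns the maximum and its (w, h) position. The position stays (0, 0)
# if no element ever compares greater than the running maximum.
function scan_window(input::AbstractMatrix, wrange, hrange)
    maxval = -Inf
    maxw, maxh = 0, 0
    for w in wrange, h in hrange
        val = input[w, h]
        if val > maxval
            maxval, maxw, maxh = val, w, h
        end
    end
    return maxval, maxw, maxh
end

good = reshape([0.5, 2.0, 1.0], 3, 1)
bad  = fill(NaN, 3, 1)

println(scan_window(good, 1:3, 1:1))   # (2.0, 2, 1)
println(scan_window(bad, 1:3, 1:1))    # (-Inf, 0, 0)
```

With an all-NaN window the loop body runs, but the comparison never succeeds, so the position is left at its initial (0, 0) and the index computed from it becomes negative.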

cinvro (Author) commented Apr 12, 2016

@pluskid I got hstart=1,hend=1,wstart=89,wend=90 and maxval=-Inf.
I cannot print out val because it says val is undefined, which is very strange.

cinvro (Author) commented Apr 12, 2016

However, I can print out val inside the for loop, which gives me val = -Inf in this case.
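
A value of -Inf inside the window has the same effect as NaN on a strict comparison against an initial maximum of -Inf. The sketch below, assuming Julia 1.x, checks the comparisons, recomputes the index from the values reported earlier in this thread, and adds a hypothetical helper (`count_nonfinite`, not part of Mocha) for spotting non-finite entries in an array before training.

```julia
# Neither comparison succeeds, so a running maximum starting at -Inf is never updated.
println(-Inf > -Inf)   # false
println(NaN > -Inf)    # false

# Index expression from the thread, evaluated with the reported values.
maxh, maxw, width = 0, 0, 179
println((maxh - 1) * width + maxw - 1)   # -180

# Hypothetical helper: count NaN and infinite entries in an array.
count_nonfinite(x) = (nan = count(isnan, x), inf = count(isinf, x))

sample = [0.1, -Inf, NaN, 2.0, -Inf]
println(count_nonfinite(sample))
```

Running such a check on the input data and on the output of the layer feeding the pooling layer would show where the non-finite values first appear.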

@davidparks21

I can reproduce this error when I do not set the neuron property on the convolutional layer. It took me a while to narrow it down, but once I set neuron=Neurons.ReLU() on the convolutional layer the InexactError (NaN value for maxval in function max_pooling_forward) went away.

I see that the code posted here also doesn't have a neuron defined on the convolutional layer, so I suspect the same is the case here.
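
Applied to the convolution layers from the original post, the change described above would look roughly like the fragment below. It only adds the `neuron=Neurons.ReLU()` argument mentioned in this comment; all other arguments are copied from the original post. Whether this addresses the root cause or only hides non-finite values is not established in this thread.

```julia
# Same layers as in the original post, with the neuron property set
# as suggested in the comment above.
conv_layer  = ConvolutionLayer(name="conv1", n_filter=20, kernel=(5,1),
                               neuron=Neurons.ReLU(),
                               bottoms=[:data], tops=[:conv])
conv2_layer = ConvolutionLayer(name="conv2", n_filter=50, kernel=(5,1),
                               neuron=Neurons.ReLU(),
                               bottoms=[:pool], tops=[:conv2])
```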
