Some questions about RCF training #50
Comments
I haven't run experiments quite like this myself, so I'll just share what I know.
@yun-liu Thank you very much!
layer {
  name: "upsample_8"
  type: "Deconvolution"
  bottom: "score-dsn4"
  top: "score-dsn4-up"
@yun-liu Haha, after switching to RCF's sigmoid_cross_entropy_loss_layer_caffe.cpp, the extracted edges are indeed much sharper. I also reduced the channel counts of the later convolution layers: conv1 and conv2 keep their original channels, while conv3: 144, conv4: 160, conv5: 160. The final model is only 8.39 MB, much smaller, but it seems more noisy edges are extracted as well. More training iterations might help — the 20k-iteration model looks a bit better than the 6k one. The most likely cause is that fewer channels in the higher layers hurt global edge extraction. I'll try lowering the weights of all losses except fuse_loss and dsn5_loss and see whether that helps too.
Congratulations!
@yun-liu The results keep improving, but I still have a few questions for you.
F0810 15:14:40.643846 12080 math_functions.cu:79] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
I0810 15:14:40.632021 12080 solver.cpp:224] Iteration 0 (0 iter/s, 0.790295s/50 iters), loss = 2907.62
I uploaded my trained model and solver.prototxt to Baidu Cloud. Could you take a look when you have time? It feels very strange.
Since your higher layers can't be initialized from 5stage-vgg.caffemodel, did you use random initialization for them?
I just left them as-is — if no filler is specified, the default is all zeros, right? In theory, with enough training it should still work, even starting from all-zero initialization.
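For reference, Caffe initializes convolution weights to zero unless a filler is specified; a random initialization can be requested per layer in the prototxt. A sketch (the layer name and surrounding blob names here are hypothetical, not from the actual RCF net):

```prototxt
layer {
  name: "conv5_3_new"   # hypothetical new layer not present in 5stage-vgg.caffemodel
  type: "Convolution"
  bottom: "conv5_2"
  top: "conv5_3_new"
  convolution_param {
    num_output: 160
    kernel_size: 3
    pad: 1
    # Request random Gaussian weights instead of the all-zero default
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
```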
@piaobuliao
I'd like to ask: in `(1.90468 iter/s, 26.2511s/50 iters), loss = 15294.9`, is this the loss averaged over every 50 iterations? The curve I plotted looks like this — the loss gradually decreases, so it should be heading toward convergence, right? (figure: dsn1_loss curve)
The total loss is the averaged loss; each dsn1_loss is the per-iteration loss, without averaging.
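The distinction above can be illustrated with a small sketch: Caffe's displayed total loss is a running mean over the solver's `average_loss` window, while the per-output losses (like dsn1_loss) are instantaneous values. A minimal Python sketch of the smoothing, not Caffe's actual code:

```python
from collections import deque

def smoothed_loss(losses, average_loss=50):
    """Mimic Caffe's displayed total loss: a running mean over the
    last `average_loss` iterations. Per-output losses are reported raw."""
    window = deque(maxlen=average_loss)
    smoothed = []
    for l in losses:
        window.append(l)
        smoothed.append(sum(window) / len(window))
    return smoothed

# The raw per-iteration loss can jump around while the smoothed curve stays flat.
print(smoothed_loss([100.0, 10.0, 100.0, 10.0], average_loss=2))
```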
Hello, I'd like to know what the input size is for training. The data augmentation generates images of many different sizes, and directly shuffling and concatenating them into batches gives mismatched sizes, which seems to make training impossible.
@zHanami Hello — sorry, I don't quite follow. Why would the sizes need to be the same? What do you mean by shuffle concatenation?
@yun-liu Doesn't the data augmentation rescale the images by [0.5, 1, 1.5]? My understanding is that each batch is NCHW, so how can training proceed when the image sizes differ?
@zHanami Besides batch_size, Caffe has another parameter called iter_size. Here batch_size is set to 1 and iter_size to 10, which has the same effect as batch_size=10. That way the problem never arises.
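The trick above can be sketched in the solver configuration: gradients are accumulated over `iter_size` forward/backward passes before each weight update, so a net with `batch_size: 1` (allowing each image to keep its own size) trains like an effective batch of 10. File paths here are placeholders:

```prototxt
# solver.prototxt sketch (paths hypothetical)
net: "train_val.prototxt"   # its data layer uses batch_size: 1
iter_size: 10               # accumulate gradients over 10 images per update
base_lr: 1e-6
momentum: 0.9
```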
@yun-liu I see, thank you very much! My mistake — I had simply resized the images and labels to a fixed size, while wondering whether that didn't defeat the rescale augmentation entirely... and in the end training didn't converge. Many thanks, I'll go try again.
@yun-liu Hello, I'm working from the Caffe code at https://github.com/happynear/caffe-windows.
I added the image_labelmap_data_layer and related layers, but did not add the AutoCrop layer; following #24, I replaced it with a Crop layer.
I successfully built libcaffe.lib and caffe.exe (without building Caffe's Python interface), and I'm currently training on http://mftp.mmcheng.net/liuyun/rcf/data/HED-BSDS.tar.gz.
Since I didn't build the Python interface, I'm not using solve.py;
instead I run directly: "caffe.exe" train --solver=/solver.prototxt -gpu=0
Some logs follow; fuse_loss fluctuates between a few thousand and tens of thousands:
I0805 23:26:13.153472 20696 solver.cpp:336] Iteration 0, Testing net (#0)
I0805 23:26:16.867991 20696 solver.cpp:224] Iteration 0 (-2.73941e-35 iter/s, 3.73482s/50 iters), loss = 542030
I0805 23:26:16.867991 20696 solver.cpp:243] Train net output #0: dsn1_loss = 125365 (* 1 = 125365 loss)
I0805 23:26:16.867991 20696 solver.cpp:243] Train net output #1: dsn2_loss = 125365 (* 1 = 125365 loss)
I0805 23:26:16.867991 20696 solver.cpp:243] Train net output #2: dsn3_loss = 125365 (* 1 = 125365 loss)
I0805 23:26:16.867991 20696 solver.cpp:243] Train net output #3: dsn4_loss = 125365 (* 1 = 125365 loss)
I0805 23:26:16.867991 20696 solver.cpp:243] Train net output #4: dsn5_loss = 125365 (* 1 = 125365 loss)
I0805 23:26:16.867991 20696 solver.cpp:243] Train net output #5: fuse_loss = 125365 (* 1 = 125365 loss)
I0805 23:26:16.867991 20696 sgd_solver.cpp:137] Iteration 0, lr = 0.0001
I0805 23:26:17.039825 20696 sgd_solver.cpp:200] weight diff/data:nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 0.000000 nan nan 0.000000 nan nan nan 0.000000 nan nan nan 0.000000 nan nan nan 0.000000 0.000000
I0805 23:28:52.964601 20696 solver.cpp:224] Iteration 50 (0.320336 iter/s, 156.086s/50 iters), loss = 394900
I0805 23:28:52.964601 20696 solver.cpp:243] Train net output #0: dsn1_loss = 4003.08 (* 1 = 4003.08 loss)
I0805 23:28:52.964601 20696 solver.cpp:243] Train net output #1: dsn2_loss = 18340.7 (* 1 = 18340.7 loss)
I0805 23:28:52.964601 20696 solver.cpp:243] Train net output #2: dsn3_loss = 18340.7 (* 1 = 18340.7 loss)
I0805 23:28:52.964601 20696 solver.cpp:243] Train net output #3: dsn4_loss = 18340.7 (* 1 = 18340.7 loss)
I0805 23:28:52.964601 20696 solver.cpp:243] Train net output #4: dsn5_loss = 18340.7 (* 1 = 18340.7 loss)
I0805 23:28:52.964601 20696 solver.cpp:243] Train net output #5: fuse_loss = 3454.18 (* 1 = 3454.18 loss)
......
......
......
I0806 08:36:02.148245 20696 solver.cpp:224] Iteration 10900 (0.338876 iter/s, 147.546s/50 iters), loss = 372257
I0806 08:36:02.148245 20696 solver.cpp:243] Train net output #0: dsn1_loss = 2914.58 (* 1 = 2914.58 loss)
I0806 08:36:02.148245 20696 solver.cpp:243] Train net output #1: dsn2_loss = 18340.7 (* 1 = 18340.7 loss)
I0806 08:36:02.148245 20696 solver.cpp:243] Train net output #2: dsn3_loss = 18340.7 (* 1 = 18340.7 loss)
I0806 08:36:02.148245 20696 solver.cpp:243] Train net output #3: dsn4_loss = 18340.7 (* 1 = 18340.7 loss)
I0806 08:36:02.148245 20696 solver.cpp:243] Train net output #4: dsn5_loss = 18340.7 (* 1 = 18340.7 loss)
I0806 08:36:02.148245 20696 solver.cpp:243] Train net output #5: fuse_loss = 2914.54 (* 1 = 2914.54 loss)
I0806 08:36:02.148245 20696 sgd_solver.cpp:137] Iteration 10900, lr = 1e-05
I0806 08:36:02.288869 20696 sgd_solver.cpp:200] weight diff/data:nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 0.001091 nan nan 0.000000 nan nan nan 0.000000 nan nan nan 0.000000 nan nan nan 0.000000 0.000010
I0806 08:38:34.147261 20696 solver.cpp:224] Iteration 10950 (0.328973 iter/s, 151.988s/50 iters), loss = 385102
I0806 08:38:34.147261 20696 solver.cpp:243] Train net output #0: dsn1_loss = 12236.3 (* 1 = 12236.3 loss)
I0806 08:38:34.147261 20696 solver.cpp:243] Train net output #1: dsn2_loss = 51659.6 (* 1 = 51659.6 loss)
I0806 08:38:34.147261 20696 solver.cpp:243] Train net output #2: dsn3_loss = 51659.6 (* 1 = 51659.6 loss)
I0806 08:38:34.147261 20696 solver.cpp:243] Train net output #3: dsn4_loss = 51659.6 (* 1 = 51659.6 loss)
I0806 08:38:34.147261 20696 solver.cpp:243] Train net output #4: dsn5_loss = 51659.6 (* 1 = 51659.6 loss)
I0806 08:38:34.147261 20696 solver.cpp:243] Train net output #5: fuse_loss = 12235.3 (* 1 = 12235.3 loss)
I0806 08:38:34.147261 20696 sgd_solver.cpp:137] Iteration 10950, lr = 1e-05
Then I tried extracting edges in OpenCV with the model trained for 10,000 iterations, and the result is an all-black image. Does this mean I haven't trained for enough iterations, or is something else wrong? Edges extracted with your rcf_pretrained_bsds.caffemodel look very good.
So I'd like to ask you a few questions.
Is training via solve.py required? As far as I can tell, solve.py mainly does two things:
First, it initializes the Deconvolution layers with the interp_surgery function. Is this necessary? Can a Deconvolution layer learn its weights through training like an ordinary convolution? According to https://www.zhihu.com/question/63890195/answer/214223863, a Deconvolution layer can also use weight_filler: { type: "bilinear" } — does that have the same effect as interp_surgery?
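For reference, the bilinear kernel that interp_surgery writes into the Deconvolution weights in FCN/HED-style code (and that `weight_filler { type: "bilinear" }` is meant to reproduce) can be computed like this — a sketch following the FCN/HED convention:

```python
import numpy as np

def upsample_filt(size):
    """Bilinear upsampling kernel of shape (size, size), as used by
    interp_surgery in FCN/HED-style code to initialize Deconvolution
    weights so the layer starts out doing bilinear interpolation."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return (1 - abs(og[0] - center) / factor) * (1 - abs(og[1] - center) / factor)

# A 4x4 kernel for 2x upsampling: symmetric, peaking at the center.
print(upsample_filt(4))
```

The kernel is then copied into every (channel, channel) slice of the Deconvolution blob, so the layer performs per-channel bilinear upsampling before any training step.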
Second, it restores weights from '5stage-vgg.caffemodel' for fine-tuning. Is it feasible to skip fine-tuning and train directly on HED-BSDS? And if I modify the model structure so that it can't be fine-tuned from '5stage-vgg.caffemodel' at all, what then?
Also, I noticed that caffe-windows's sigmoid_cross_entropy_loss_layer_caffe.cpp is the same as official Caffe's, but different from the sigmoid_cross_entropy_loss_layer_caffe.cpp in your project. I didn't notice this difference at first. Is your version required for training RCF?
Could it simply be that I haven't trained for enough iterations?
Since I'm not fine-tuning, I increased base_lr from 1e-6 to 1e-4 — does that matter much?