Error when running train.py #1

Open
zhenyezi opened this issue Sep 6, 2018 · 24 comments

Comments

zhenyezi commented Sep 6, 2018

When I run train.py, I get this error:
  File "/ghome/zhenye/ALFNet-master/keras_alfnet/data_generators.py", line 7, in <module>
    from .utils.cython_bbox import bbox_overlaps
ImportError: No module named cython_bbox
Besides that:
  File "/ghome/zhenye/ALFNet-master/keras_alfnet/data_generators.py", line 8, in <module>
    from .utils.bbox import box_op
ImportError: No module named bbox
I'd like to ask the author: are these two modules missing from the repository, or am I missing some important setup step?
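A quick way to see which of the compiled helpers actually fail to load is to try importing each one and collect the error messages. This is a minimal sketch; the helper name check_imports is hypothetical, the dotted module names are assumptions taken from the paths in the tracebacks, and it should be run from the ALFNet root so that keras_alfnet is importable.

```python
import importlib


def check_imports(module_names):
    """Try to import each module and map its name to the error text (None on success)."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:  # also covers "undefined symbol" failures of .so files
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    # Module paths assumed from the tracebacks in this issue.
    report = check_imports([
        "keras_alfnet.utils.cython_bbox",
        "keras_alfnet.utils.bbox",
    ])
    for name, error in report.items():
        print("{}: {}".format(name, "OK" if error is None else error))
```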

@zhangxydlut

I have the same problem.

@MADONOKOUKI

@zhenyezi
num of training samples: 2112
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
  File "train.py", line 35, in <module>
    from keras_alfnet.model.model_1step import Model_1step
  File "/var/docker/share/madono/summer/ALFNet/keras_alfnet/model/model_1step.py", line 1, in <module>
    from .base_model import Base_model
  File "/var/docker/share/madono/summer/ALFNet/keras_alfnet/model/base_model.py", line 2, in <module>
    from keras_alfnet import data_generators
  File "/var/docker/share/madono/summer/ALFNet/keras_alfnet/data_generators.py", line 7, in <module>
    from .utils.cython_bbox import bbox_overlaps
ImportError: No module named cython_bbox
I also have similar problems...

@pnnnnnnn

Run git clone --recursive https://github.com/rbgirshick/py-faster-rcnn.git and then cd py-faster-rcnn/lib and run make.
Then copy the utils directory from py-faster-rcnn into ALFNet's utils directory.
Then uncomment all "from .utils.bbox import box_op" and change "box_op" to "bbox_overlaps".
It works for me.
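The copy step can also be scripted. A minimal sketch, assuming only the compiled *.so files built by make need to move (pnnnnnnn copied the whole directory); the helper name copy_compiled_utils and the paths passed to it are hypothetical, so adjust them to your checkout.

```python
import shutil
import sys
from pathlib import Path


def copy_compiled_utils(src_dir, dst_dir, patterns=("*.so",)):
    """Copy files matching `patterns` from src_dir into dst_dir; return the copied names."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        for path in sorted(src.glob(pattern)):
            shutil.copy2(str(path), str(dst / path.name))
            copied.append(path.name)
    return copied


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python copy_utils.py py-faster-rcnn/lib/utils ALFNet/keras_alfnet/utils
    print("copied:", copy_compiled_utils(sys.argv[1], sys.argv[2]))
```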

@yongqiangzhang1

@pnnnnnnn, are the training results correct?

@pnnnnnnn

> @pnnnnnnn , Is the trained results right?

Still training. So far I've trained for 70 epochs, and the total loss has dropped from 0.66 to 0.19.

@yongqiangzhang1

@pnnnnnnn, what do you mean by "uncomment all "from .utils.bbox import box_op" and change "box_op" to "bbox_overlaps""? Comment or uncomment?

@pnnnnnnn

> @pnnnnnnn , what is the meaning of "uncomment all "from .utils.bbox import box_op" and change "box_op" to "bbox_overlaps""? comment or uncomment?

Oh, sorry, I meant "comment":
comment out all "from .utils.bbox import box_op"
change the remaining "box_op" to "bbox_overlaps"
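The two edits above can be applied mechanically. A minimal sketch; the helper name patch_source and the script name in the comment are hypothetical, and since it only rewrites text, it is worth checking afterwards that each former box_op call passes arguments that bbox_overlaps accepts.

```python
import re
import sys
from pathlib import Path

BROKEN_IMPORT = "from .utils.bbox import box_op"


def patch_source(text):
    """Comment out the box_op import and rename remaining box_op uses to bbox_overlaps."""
    patched = []
    for line in text.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.rstrip() == BROKEN_IMPORT:
            indent = line[: len(line) - len(stripped)]
            patched.append(indent + "# " + stripped)
        else:
            # \b keeps identifiers that merely contain "box_op" untouched.
            patched.append(re.sub(r"\bbox_op\b", "bbox_overlaps", line))
    return "".join(patched)


if __name__ == "__main__":
    # Hypothetical usage: python patch_box_op.py keras_alfnet/data_generators.py
    for filename in sys.argv[1:]:
        path = Path(filename)
        path.write_text(patch_source(path.read_text()))
```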

@yongqiangzhang1

@pnnnnnnn, did you check that box_op and bbox_overlaps do the same thing?

@pnnnnnnn

> @pnnnnnnn do you check the box_op and bbox_overlaps have the same function?

There's no box_op function.
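For reference, bbox_overlaps in py-faster-rcnn returns the IoU of every box against every query box as an (N, K) array. If the Cython build keeps failing, a pure-NumPy stand-in like the sketch below computes the same matrix; it is my own vectorized rewrite, not code from either repository, and it keeps py-faster-rcnn's inclusive-pixel +1 convention.

```python
import numpy as np


def bbox_overlaps(boxes, query_boxes):
    """IoU matrix between boxes (N, 4) and query_boxes (K, 4), returned as (N, K).

    Boxes are [x1, y1, x2, y2] with inclusive pixel coordinates, hence the +1
    in widths and heights.
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    query_boxes = np.asarray(query_boxes, dtype=np.float64).reshape(-1, 4)
    box_area = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    query_area = (query_boxes[:, 2] - query_boxes[:, 0] + 1) * (query_boxes[:, 3] - query_boxes[:, 1] + 1)
    # Broadcast (N, 1) against (K,) to get all pairwise intersections at once.
    iw = np.minimum(boxes[:, 2:3], query_boxes[:, 2]) - np.maximum(boxes[:, 0:1], query_boxes[:, 0]) + 1
    ih = np.minimum(boxes[:, 3:4], query_boxes[:, 3]) - np.maximum(boxes[:, 1:2], query_boxes[:, 1]) + 1
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
    union = box_area[:, None] + query_area[None, :] - inter
    return inter / union
```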

@VideoObjectSearch

@yongqiangzhang1 @zhangxydlut @MADONOKOUKI @pnnnnnnn
Please try these compiled files:
utils.zip

yongqiangzhang1 commented Sep 27, 2018

Your compiled utils.zip files solve "No module named cython_bbox" and "No module named bbox". But there is a new error at from nms.gpu_nms import gpu_nms: ImportError: No module named gpu_nms. Could you compile nms and upload the compiled nms directory? Thanks.
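If gpu_nms cannot be built at all, a CPU fallback is possible. The sketch below is standard greedy non-maximum suppression in NumPy, following the same scheme as py_cpu_nms.py in py-faster-rcnn. The name numpy_nms and the [x1, y1, x2, y2, score] row layout are assumptions about how ALFNet's bbox_process calls nms, so check the call site before swapping it in; it will also be slower than the GPU version.

```python
import numpy as np


def numpy_nms(dets, thresh):
    """Greedy NMS on dets (N, 5) = [x1, y1, x2, y2, score]; returns kept row indices."""
    dets = np.asarray(dets, dtype=np.float64).reshape(-1, 5)
    x1, y1, x2, y2, scores = dets.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the current best box with every remaining box.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1 + 1) * np.maximum(0.0, yy2 - yy1 + 1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Keep only boxes that overlap the chosen one by at most thresh.
        order = order[1:][iou <= thresh]
    return keep
```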

@VideoObjectSearch

@yongqiangzhang1
You can give it a try:
nms.zip

@yongqiangzhang1

nms works, thank you very much.

@Chen94yue

> still training, for now i've trained for 70 epochs and the total loss dropped from 0.66 to 0.19

Did you get the same MR as the paper?

@pnnnnnnn

> Did you get the same MR as the paper?

Not yet, I think. I've trained for 200 epochs (2k iterations per epoch, batch size 4, on a 1050 Ti GPU) and got an MR of 16.53 with the best model. Now I'm decreasing the learning rate from 1e-4 to 1e-5 for 100 epochs.
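For scale, those numbers work out to 200 × 2,000 = 400,000 iterations, i.e. about 1.6 million training samples processed at batch size 4. The learning-rate drop is a simple step schedule; the helpers below are a hypothetical sketch, and a function of the epoch index like step_lr could, for example, be handed to Keras' LearningRateScheduler callback if the training loop uses fit-style callbacks.

```python
def total_iterations(epochs, iters_per_epoch):
    """Number of optimizer steps in a run."""
    return epochs * iters_per_epoch


def step_lr(epoch, base_lr=1e-4, decay_epoch=200, decayed_lr=1e-5):
    """Hypothetical step schedule: base_lr before decay_epoch, decayed_lr from then on."""
    return base_lr if epoch < decay_epoch else decayed_lr


if __name__ == "__main__":
    steps = total_iterations(200, 2000)
    print(steps, "iterations,", steps * 4, "samples at batch size 4")
    print([step_lr(e) for e in (0, 199, 200, 299)])
```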

@youtang1993

> not yet(?), i've trained for 200 epochs(2k iterations per epoch, batchsize 4, gpu 1050ti) and got 16.53 on the best model, and now i'm decreasing the lr from 1e-4 to 1e-5 for 100 epochs

Hi, same question again: did you get the same MR as the paper? The best score I have gotten is 16.33. That's a BIG GAP.

pnnnnnnn commented Nov 5, 2018

> Hi, still the question, did you get the same MR as the paper? The best score I have got is 16.33. A BIG GAP.

The best I've got is 13.18. Maybe my small batch size (only 4) is why I can't reach 12.01.
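As far as I understand, the MR values compared here are log-average miss rates as used on Caltech and CityPersons: the miss rate is sampled at nine FPPI (false positives per image) values spaced evenly in log space between 10^-2 and 10^0, and the samples are averaged geometrically. A simplified sketch of that computation, not the official evaluation code (whose edge-case handling may differ); it returns a fraction, so multiply by 100 to compare with numbers like 13.18.

```python
import numpy as np


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate sampled at log-spaced FPPI values in [1e-2, 1]."""
    fppi = np.asarray(fppi, dtype=np.float64)
    miss_rate = np.asarray(miss_rate, dtype=np.float64)
    order = np.argsort(fppi)
    fppi, miss_rate = fppi[order], miss_rate[order]
    refs = np.logspace(-2.0, 0.0, num_points)
    # Assumption: where the curve has no point at or below a reference FPPI, count MR = 1.
    sampled = np.ones(num_points)
    for k, ref in enumerate(refs):
        idx = np.where(fppi <= ref)[0]
        if idx.size:
            sampled[k] = miss_rate[idx[-1]]  # miss rate at the largest FPPI <= ref
    return float(np.exp(np.mean(np.log(np.maximum(sampled, 1e-10)))))
```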

ou525 commented Dec 8, 2018

Hi @VideoObjectSearch, I get the same problem when I run test.py. I'm using Python 3.5.
Traceback (most recent call last):
  File "test.py", line 32, in <module>
    from keras_alfnet.model.model_1step import Model_1step
  File "/home/ou/workplace/ALFNet/keras_alfnet/model/model_1step.py", line 1, in <module>
    from .base_model import Base_model
  File "/home/ou/workplace/ALFNet/keras_alfnet/model/base_model.py", line 2, in <module>
    from keras_alfnet import data_generators
  File "/home/ou/workplace/ALFNet/keras_alfnet/data_generators.py", line 7, in <module>
    from .utils.cython_bbox import bbox_overlaps
ImportError: /home/ou/workplace/ALFNet/keras_alfnet/utils/cython_bbox.so: undefined symbol: _Py_ZeroStruct
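_Py_ZeroStruct is part of Python 2's C API and does not exist in Python 3, so this error usually means the cython_bbox.so being loaded was compiled for Python 2; rebuilding it from the .pyx source under the Python 3.5 interpreter you actually run is the usual fix. A rough heuristic for spotting such mismatches is to scan the .so for a few telltale symbol names. This is a plain byte search rather than a real symbol-table parser, and the symbol list and hints are my own, so treat any hit as a hint only.

```python
import sys
from pathlib import Path

# Hypothetical list of names that commonly appear in "undefined symbol" ImportErrors
# when an extension was built for a different Python than the one loading it.
SUSPECT_SYMBOLS = {
    b"_Py_ZeroStruct": "Python 2 C-API symbol: the module was probably built for Python 2",
    b"PyFPE_jbuf": "usually means it was built against a Python configured with --with-fpectl",
}


def find_suspect_symbols(so_path):
    """Return {symbol: hint} for every suspect name found in the raw bytes of so_path."""
    data = Path(so_path).read_bytes()
    return {name.decode(): hint for name, hint in SUSPECT_SYMBOLS.items() if name in data}


if __name__ == "__main__":
    print("running Python", sys.version.split()[0])
    for path in sys.argv[1:]:
        for symbol, hint in find_suspect_symbols(path).items():
            print("{}: {} ({})".format(path, symbol, hint))
```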

m1nt07 commented Jan 2, 2019

> @yongqiangzhang1
> You can have a try.
> nms.zip

Hi @VideoObjectSearch, when I use nms.zip, I get this error:
ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory

I'm using CUDA 9.0. How can I compile it to make it work?
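The prebuilt gpu_nms in nms.zip was linked against the CUDA 8.0 runtime, which is why the loader asks for libcudart.so.8.0; with only CUDA 9.0 installed, it has to be recompiled from source against the local toolkit (or a CUDA 8.0 runtime has to be made available). A small check of which runtimes the dynamic loader can open, using only ctypes; the helper name can_load and the version list are just examples.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open library_name, e.g. 'libcudart.so.8.0'."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    for version in ("8.0", "9.0", "9.2", "10.0"):
        name = "libcudart.so." + version
        print(name, "loadable" if can_load(name) else "not found")
```

The result depends on the loader's search path (LD_LIBRARY_PATH and the ldconfig cache), so an installed CUDA 9.0 can still show as "not found" if its lib64 directory is not on that path.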

@xiaoshang123

> @yongqiangzhang1
> You can have a try.
> nms.zip
Hi, nms.zip solved "ImportError: No module named gpu_nms", but now a new problem comes up:
Traceback (most recent call last):
  File "train.py", line 40, in <module>
    from keras_alfnet.model.model_2step import Model_2step
  File "/home/by/ma/ALFNet-master/keras_alfnet/model/model_2step.py", line 7, in <module>
    from keras_alfnet import bbox_process
  File "/home/by/ma/ALFNet-master/keras_alfnet/bbox_process.py", line 7, in <module>
    from nms_wrapper import nms
  File "/home/by/ma/ALFNet-master/keras_alfnet/nms_wrapper.py", line 9, in <module>
    from nms.cpu_nms import cpu_nms
ImportError: /home/by/ma/ALFNet-master/keras_alfnet/nms/cpu_nms.so: undefined symbol: PyFPE_jbuf
How can I solve it? Thank you.
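PyFPE_jbuf is another symbol that depends on how the Python interpreter was configured, so this cpu_nms.so was most likely compiled against a different Python than the one running train.py; recompiling the nms extensions with the same interpreter is the real fix. Until then, one option is to try several implementations in order and fall back to a pure-Python one. A minimal sketch of such a loader (the helper name import_first is hypothetical):

```python
import importlib


def import_first(candidates):
    """Import and return the first importable module in candidates, plus the errors seen."""
    errors = {}
    for name in candidates:
        try:
            return importlib.import_module(name), errors
        except ImportError as exc:  # "undefined symbol" load failures are ImportErrors too
            errors[name] = str(exc)
    raise ImportError("none of {} could be imported: {}".format(list(candidates), errors))
```

For example, calling import_first(["nms.gpu_nms", "nms.cpu_nms"]) where nms_wrapper.py runs would return whichever compiled module loads (those module names are taken from the traceback); if both fail, the caller can catch the ImportError and switch to a NumPy implementation.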

@weizheliu

> hi, @VideoObjectSearch , when i use the nms.zip, i have the problem:
> ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory
>
> i use CUDA9.0, how can i compile to make it work?

I have the same problem. Did you find a CUDA 9 version of nms?

@whitenightwu

> I meet the same problem, do you find the cuda 9 version of nms?

You can try nms.zip.

@xiefeiwhu

> hi, when i run the test.py, also have the same problem. i use python3.5 @VideoObjectSearch
> Traceback (most recent call last):
>   File "test.py", line 32, in <module>
>     from keras_alfnet.model.model_1step import Model_1step
>   File "/home/ou/workplace/ALFNet/keras_alfnet/model/model_1step.py", line 1, in <module>
>     from .base_model import Base_model
>   File "/home/ou/workplace/ALFNet/keras_alfnet/model/base_model.py", line 2, in <module>
>     from keras_alfnet import data_generators
>   File "/home/ou/workplace/ALFNet/keras_alfnet/data_generators.py", line 7, in <module>
>     from .utils.cython_bbox import bbox_overlaps
> ImportError: /home/ou/workplace/ALFNet/keras_alfnet/utils/cython_bbox.so: undefined symbol: _Py_ZeroStruct

Hi, I have the same problem. Have you solved it?

@nankeermeng

@yongqiangzhang1 @pnnnnnnn
Following the code, I can train for 150 epochs, but when I run test.py with the trained weights resnet_e3_l1.15433712553.hdf5, I don't get any test results: val_det.txt is empty. Why?
