
Jetson TX2 issues running DeDe #278

Closed

RiC0MD opened this issue Mar 17, 2017 · 20 comments
RiC0MD commented Mar 17, 2017

Configuration

  • Version of DeepDetect:
    Ubuntu 16.04.2 LTS on Jetson TX2
  • Commit (shown by the server when starting):
    b42115e

Your question / the problem you're facing:

Error message (if any) / steps to reproduce the problem:

  • list of API calls:
    https://deepdetect.com/tutorials/txt-training/
    If I follow the directions exactly, leave the archive in models/n20, and execute the two wget API calls, the server consumes 100% of the CPU and eventually crashes with no output to the console (Example 1 below).

After removing the archive, cleaning, and rerunning, it looks like it's processing; I can clearly see it mentioning the Jetson CUDA cores, but the training ends with an error (Example 2 below).
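For reference, the tutorial's two calls look roughly like the following (a sketch from memory of the tutorial; the exact parameter values there may differ):

```shell
# Sketch of the txt-training tutorial calls (values illustrative, not verbatim).
# 1) Create the 'n20' service from the MLP Caffe template:
curl -X PUT "http://localhost:8080/services/n20" -d '{
  "mllib": "caffe",
  "description": "newsgroup classification",
  "type": "supervised",
  "parameters": {
    "input": {"connector": "txt"},
    "mllib": {"template": "mlp", "nclasses": 20, "layers": [200, 200], "activation": "relu"}
  },
  "model": {"templates": "../templates/caffe/", "repository": "models/n20"}
}'

# 2) Launch an asynchronous training job on the extracted dataset:
curl -X POST "http://localhost:8080/train" -d '{
  "service": "n20",
  "async": true,
  "parameters": {
    "input": {"shuffle": true, "test_split": 0.2},
    "mllib": {"gpu": true, "net": {"batch_size": 300}},
    "output": {"measure": ["mcll", "f1"]}
  },
  "data": ["models/n20/news20"]
}'
```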

  • Server log output:
    Example 1:

INFO - 18:07:27 - Creating layer / name=inputl / type=MemoryData
INFO - 18:07:27 - Creating Layer inputl
INFO - 18:07:27 - inputl -> data
INFO - 18:07:27 - inputl -> label
INFO - 18:07:27 - Setting up inputl
INFO - 18:07:27 - Top shape: 359 88631 1 1 (31818529)
INFO - 18:07:27 - Top shape: 359 (359)
INFO - 18:07:27 - Memory required for data: 127275552
INFO - 18:07:27 - Creating layer / name=ip0 / type=InnerProduct
INFO - 18:07:27 - Creating Layer ip0
INFO - 18:07:27 - ip0 <- data
Killed

Example 2:
DeepDetect [ commit b42115e ]

INFO - 18:10:13 - Running DeepDetect HTTP server on localhost:8080
loaded vocabulary of size=88631
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0317 18:10:38.373723 14722 caffelib.cc:120] instantiating model template mlp
I0317 18:10:38.373878 14722 caffelib.cc:124] source=../templates/caffe//mlp/
I0317 18:10:38.373900 14722 caffelib.cc:125] dest=models/n20/mlp.prototxt

INFO - 18:10:38 - Fri Mar 17 18:10:38 2017 UTC - 127.0.0.1 "PUT /services/n20" 201 2099

INFO - 18:10:38 - Fri Mar 17 18:10:38 2017 UTC - 127.0.0.1 "POST /train" 201 0
I0317 18:10:38.557075 14736 txtinputfileconn.cc:68] txtinputfileconn: list subdirs size=20
I0317 18:10:51.864900 14736 txtinputfileconn.cc:186] vocabulary size=88631
data split test size=3770 / remaining data size=15078
vocab size=88631
I0317 18:11:04.393987 14736 caffelib.cc:2583] user batch_size=300 / inputc batch_size=15078
I0317 18:11:04.394039 14736 caffelib.cc:2620] batch_size=359 / test_batch_size=290 / test_iter=13

INFO - 18:11:04 - Device id: 0
INFO - 18:11:04 - Major revision number: 6
INFO - 18:11:04 - Minor revision number: 2
INFO - 18:11:04 - Name: GP10B
INFO - 18:11:04 - Total global memory: 8235577344
INFO - 18:11:04 - Total shared memory per block: 49152
INFO - 18:11:04 - Total registers per block: 32768
INFO - 18:11:04 - Warp size: 32
INFO - 18:11:04 - Maximum memory pitch: 2147483647
INFO - 18:11:04 - Maximum threads per block: 1024
INFO - 18:11:04 - Maximum dimension of block: 1024, 1024, 64
INFO - 18:11:04 - Maximum dimension of grid: 2147483647, 65535, 65535
INFO - 18:11:04 - Clock rate: 1300500
INFO - 18:11:04 - Total constant memory: 65536
INFO - 18:11:04 - Texture alignment: 512
INFO - 18:11:04 - Concurrent copy and execution: Yes
INFO - 18:11:04 - Number of multiprocessors: 2
INFO - 18:11:04 - Kernel execution timeout: No
INFO - 18:11:04 - Initializing solver from parameters:
INFO - 18:11:04 - Creating training net specified in net_param.
INFO - 18:11:04 - The NetState phase (0) differed from the phase (1) specified by a rule in layer inputlt
INFO - 18:11:04 - The NetState phase (0) differed from the phase (1) specified by a rule in layer losst
INFO - 18:11:04 - Initializing net from parameters:

INFO - 18:11:04 - Creating layer / name=inputl / type=MemoryData
INFO - 18:11:04 - Creating Layer inputl
INFO - 18:11:04 - inputl -> data
INFO - 18:11:04 - inputl -> label
INFO - 18:11:04 - Setting up inputl
INFO - 18:11:04 - Top shape: 359 88631 1 1 (31818529)
INFO - 18:11:04 - Top shape: 359 (359)
INFO - 18:11:04 - Memory required for data: 127275552
INFO - 18:11:04 - Creating layer / name=ip0 / type=InnerProduct
INFO - 18:11:04 - Creating Layer ip0
INFO - 18:11:04 - ip0 <- data
INFO - 18:11:04 - ip0 -> ip0
INFO - 18:11:04 - Setting up ip0
INFO - 18:11:04 - Top shape: 359 200 (71800)
INFO - 18:11:04 - Memory required for data: 127562752
INFO - 18:11:04 - Creating layer / name=act0 / type=ReLU
INFO - 18:11:04 - Creating Layer act0
INFO - 18:11:04 - act0 <- ip0
INFO - 18:11:04 - act0 -> ip0 (in-place)
INFO - 18:11:04 - Setting up act0
INFO - 18:11:04 - Top shape: 359 200 (71800)
INFO - 18:11:04 - Memory required for data: 127849952
INFO - 18:11:04 - Creating layer / name=drop0 / type=Dropout
INFO - 18:11:04 - Creating Layer drop0
INFO - 18:11:04 - drop0 <- ip0
INFO - 18:11:04 - drop0 -> ip0 (in-place)
INFO - 18:11:04 - Setting up drop0
INFO - 18:11:04 - Top shape: 359 200 (71800)
INFO - 18:11:04 - Memory required for data: 128137152
INFO - 18:11:04 - Creating layer / name=ip1 / type=InnerProduct
INFO - 18:11:04 - Creating Layer ip1
INFO - 18:11:04 - ip1 <- ip0
INFO - 18:11:04 - ip1 -> ip1
INFO - 18:11:04 - Setting up ip1
INFO - 18:11:04 - Top shape: 359 200 (71800)
INFO - 18:11:04 - Memory required for data: 128424352
INFO - 18:11:04 - Creating layer / name=act1 / type=ReLU
INFO - 18:11:04 - Creating Layer act1
INFO - 18:11:04 - act1 <- ip1
INFO - 18:11:04 - act1 -> ip1 (in-place)
INFO - 18:11:04 - Setting up act1
INFO - 18:11:04 - Top shape: 359 200 (71800)
INFO - 18:11:04 - Memory required for data: 128711552
INFO - 18:11:04 - Creating layer / name=drop1 / type=Dropout
INFO - 18:11:04 - Creating Layer drop1
INFO - 18:11:04 - drop1 <- ip1
INFO - 18:11:04 - drop1 -> ip1 (in-place)
INFO - 18:11:04 - Setting up drop1
INFO - 18:11:04 - Top shape: 359 200 (71800)
INFO - 18:11:04 - Memory required for data: 128998752
INFO - 18:11:04 - Creating layer / name=ip2 / type=InnerProduct
INFO - 18:11:04 - Creating Layer ip2
INFO - 18:11:04 - ip2 <- ip1
INFO - 18:11:04 - ip2 -> ip2
INFO - 18:11:04 - Setting up ip2
INFO - 18:11:04 - Top shape: 359 20 (7180)
INFO - 18:11:04 - Memory required for data: 129027472
INFO - 18:11:04 - Creating layer / name=loss / type=SoftmaxWithLoss
INFO - 18:11:04 - Creating Layer loss
INFO - 18:11:04 - loss <- ip2
INFO - 18:11:04 - loss <- label
INFO - 18:11:04 - loss -> loss
INFO - 18:11:04 - Creating layer / name=loss / type=Softmax
INFO - 18:11:04 - Setting up loss
INFO - 18:11:04 - Top shape: (1)
INFO - 18:11:04 - with loss weight 1
INFO - 18:11:04 - Memory required for data: 129027476
INFO - 18:11:04 - loss needs backward computation.
INFO - 18:11:04 - ip2 needs backward computation.
INFO - 18:11:04 - drop1 needs backward computation.
INFO - 18:11:04 - act1 needs backward computation.
INFO - 18:11:04 - ip1 needs backward computation.
INFO - 18:11:04 - drop0 needs backward computation.
INFO - 18:11:04 - act0 needs backward computation.
INFO - 18:11:04 - ip0 needs backward computation.
INFO - 18:11:04 - inputl does not need backward computation.
INFO - 18:11:04 - This network produces output loss
INFO - 18:11:04 - Network initialization done.
INFO - 18:11:04 - Creating test net (#0) specified by net_param
INFO - 18:11:04 - The NetState phase (1) differed from the phase (0) specified by a rule in layer inputl
INFO - 18:11:04 - The NetState phase (1) differed from the phase (0) specified by a rule in layer loss
INFO - 18:11:04 - Initializing net from parameters:

INFO - 18:11:04 - Creating layer / name=inputlt / type=MemoryData
INFO - 18:11:04 - Creating Layer inputlt
INFO - 18:11:04 - inputlt -> data
INFO - 18:11:04 - inputlt -> label
INFO - 18:11:04 - Setting up inputlt
INFO - 18:11:04 - Top shape: 290 88631 1 1 (25702990)
INFO - 18:11:04 - Top shape: 290 (290)
INFO - 18:11:04 - Memory required for data: 102813120
INFO - 18:11:04 - Creating layer / name=ip0 / type=InnerProduct
INFO - 18:11:04 - Creating Layer ip0
INFO - 18:11:04 - ip0 <- data
INFO - 18:11:04 - ip0 -> ip0
INFO - 18:11:04 - Setting up ip0
INFO - 18:11:04 - Top shape: 290 200 (58000)
INFO - 18:11:04 - Memory required for data: 103045120
INFO - 18:11:04 - Creating layer / name=act0 / type=ReLU
INFO - 18:11:04 - Creating Layer act0
INFO - 18:11:04 - act0 <- ip0
INFO - 18:11:04 - act0 -> ip0 (in-place)
INFO - 18:11:04 - Setting up act0
INFO - 18:11:04 - Top shape: 290 200 (58000)
INFO - 18:11:04 - Memory required for data: 103277120
INFO - 18:11:04 - Creating layer / name=drop0 / type=Dropout
INFO - 18:11:04 - Creating Layer drop0
INFO - 18:11:04 - drop0 <- ip0
INFO - 18:11:04 - drop0 -> ip0 (in-place)
INFO - 18:11:04 - Setting up drop0
INFO - 18:11:04 - Top shape: 290 200 (58000)
INFO - 18:11:04 - Memory required for data: 103509120
INFO - 18:11:04 - Creating layer / name=ip1 / type=InnerProduct
INFO - 18:11:04 - Creating Layer ip1
INFO - 18:11:04 - ip1 <- ip0
INFO - 18:11:04 - ip1 -> ip1
INFO - 18:11:04 - Setting up ip1
INFO - 18:11:04 - Top shape: 290 200 (58000)
INFO - 18:11:04 - Memory required for data: 103741120
INFO - 18:11:04 - Creating layer / name=act1 / type=ReLU
INFO - 18:11:04 - Creating Layer act1
INFO - 18:11:04 - act1 <- ip1
INFO - 18:11:04 - act1 -> ip1 (in-place)
INFO - 18:11:04 - Setting up act1
INFO - 18:11:04 - Top shape: 290 200 (58000)
INFO - 18:11:04 - Memory required for data: 103973120
INFO - 18:11:04 - Creating layer / name=drop1 / type=Dropout
INFO - 18:11:04 - Creating Layer drop1
INFO - 18:11:04 - drop1 <- ip1
INFO - 18:11:04 - drop1 -> ip1 (in-place)
INFO - 18:11:04 - Setting up drop1
INFO - 18:11:04 - Top shape: 290 200 (58000)
INFO - 18:11:04 - Memory required for data: 104205120
INFO - 18:11:04 - Creating layer / name=ip2 / type=InnerProduct
INFO - 18:11:04 - Creating Layer ip2
INFO - 18:11:04 - ip2 <- ip1
INFO - 18:11:04 - ip2 -> ip2
INFO - 18:11:04 - Setting up ip2
INFO - 18:11:04 - Top shape: 290 20 (5800)
INFO - 18:11:04 - Memory required for data: 104228320
INFO - 18:11:04 - Creating layer / name=losst / type=Softmax
INFO - 18:11:04 - Creating Layer losst
INFO - 18:11:04 - losst <- ip2
INFO - 18:11:04 - losst -> losst
INFO - 18:11:04 - Setting up losst
INFO - 18:11:04 - Top shape: 290 20 (5800)
INFO - 18:11:04 - Memory required for data: 104251520
INFO - 18:11:04 - losst does not need backward computation.
INFO - 18:11:04 - ip2 does not need backward computation.
INFO - 18:11:04 - drop1 does not need backward computation.
INFO - 18:11:04 - act1 does not need backward computation.
INFO - 18:11:04 - ip1 does not need backward computation.
INFO - 18:11:04 - drop0 does not need backward computation.
INFO - 18:11:04 - act0 does not need backward computation.
INFO - 18:11:04 - ip0 does not need backward computation.
INFO - 18:11:04 - inputlt does not need backward computation.
INFO - 18:11:04 - This network produces output label
INFO - 18:11:04 - This network produces output losst
INFO - 18:11:04 - Network initialization done.
I0317 18:11:05.002513 14736 caffelib.cc:1614] filling up net prior to training
INFO - 18:11:04 - Solver scaffolding done.
ERROR - 18:13:09 - service n20 training status call failed

ERROR - 18:13:09 - {"code":500,"msg":"InternalError","dd_code":1007,"dd_msg":"./include/caffe/syncedmem.hpp:26 / Check failed (custom): *ptr"}

INFO - 18:13:09 - Fri Mar 17 18:13:09 2017 UTC - 127.0.0.1 "GET /train?service=n20&job=1" 200 2

I'm very new to deep learning/machine learning and hoping to learn more by getting everything running on my Jetson dev kit and using it from my C# code. Even though I was able to get everything compiled, it seems something might still be amiss.

ctest

Test project /usr/src/deepdetect/build
Start 1: ut_apidata
1/6 Test #1: ut_apidata ....................... Passed 0.50 sec
Start 2: ut_conn
2/6 Test #2: ut_conn ..........................***Failed 1.40 sec
Start 3: ut_jsonapi
3/6 Test #3: ut_jsonapi ....................... Passed 1.64 sec
Start 4: ut_caffe_mlp
4/6 Test #4: ut_caffe_mlp ..................... Passed 0.14 sec
Start 5: ut_caffeapi
5/6 Test #5: ut_caffeapi ......................***Exception: Other 1.82 sec
Start 6: ut_httpapi
6/6 Test #6: ut_httpapi .......................***Exception: Other 0.46 sec

50% tests passed, 3 tests failed out of 6

Total Test time (real) = 5.97 sec

The following tests FAILED:
2 - ut_conn (Failed)
5 - ut_caffeapi (OTHER_FAULT)
6 - ut_httpapi (OTHER_FAULT)
Errors while running CTest

Any help would be greatly appreciated,
Thank you!

beniz (Collaborator) commented Mar 17, 2017

Hi, the error involving the ptr is (with very high certainty) because the board runs out of memory (RAM) when creating the features from the text. Try min_count:10 and min_word_length:4 (or higher values).
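Those connector parameters can be passed on the training call; a sketch of how that might look (the surrounding values are illustrative, not taken from the tutorial):

```shell
# Pass min_count / min_word_length through the txt input connector
# to shrink the vocabulary, and thus the RAM needed for the input layer.
curl -X POST "http://localhost:8080/train" -d '{
  "service": "n20",
  "async": true,
  "parameters": {
    "input": {
      "shuffle": true,
      "test_split": 0.2,
      "min_count": 10,
      "min_word_length": 4
    },
    "mllib": {"gpu": true, "net": {"batch_size": 300}}
  },
  "data": ["models/n20/news20"]
}'
```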

FYI, DD runs fine on the TX1, so there shouldn't be too many issues with the TX2. The Jetson is nice for prediction, but the GPU may not have enough memory for training the default architectures for images, for instance. I would recommend starting with https://deepdetect.com/tutorials/imagenet-classifier/ if you haven't already, as well as https://deepdetect.com/tutorials/object-detector/.

Please provide the output of ut_caffeapi. To do this, cd tests, then run ./ut_caffeapi. You can redirect the output to a file and provide a gist of its contents; that would be useful to validate/invalidate the hypothesis above about GPU memory.

Btw, thank you for using the issue template; it makes things much cleaner and faster for us to understand!

@beniz beniz self-assigned this Mar 17, 2017
@beniz beniz added the kind:GPU label Mar 17, 2017

RiC0MD commented Mar 17, 2017

./ut_caffeapi

Running main() from gtest_main.cc
[==========] Running 13 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 13 tests from caffeapi
[ RUN ] caffeapi.service_train

INFO - 18:43:24 - Device id: 0
INFO - 18:43:24 - Major revision number: 6
INFO - 18:43:24 - Minor revision number: 2
INFO - 18:43:24 - Name: GP10B
INFO - 18:43:24 - Total global memory: 8235577344
INFO - 18:43:24 - Total shared memory per block: 49152
INFO - 18:43:24 - Total registers per block: 32768
INFO - 18:43:24 - Warp size: 32
INFO - 18:43:24 - Maximum memory pitch: 2147483647
INFO - 18:43:24 - Maximum threads per block: 1024
INFO - 18:43:24 - Maximum dimension of block: 1024, 1024, 64
INFO - 18:43:24 - Maximum dimension of grid: 2147483647, 65535, 65535
INFO - 18:43:24 - Clock rate: 1300500
INFO - 18:43:24 - Total constant memory: 65536
INFO - 18:43:24 - Texture alignment: 512
INFO - 18:43:24 - Concurrent copy and execution: Yes
INFO - 18:43:24 - Number of multiprocessors: 2
INFO - 18:43:24 - Kernel execution timeout: No
INFO - 18:43:24 - Initializing solver from parameters:
INFO - 18:43:24 - Creating training net specified in net_param.
INFO - 18:43:24 - The NetState phase (0) differed from the phase (1) specified by a rule in layer mnist
INFO - 18:43:24 - The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
INFO - 18:43:24 - Initializing net from parameters:

INFO - 18:43:24 - Creating layer / name=mnist / type=Data
INFO - 18:43:24 - Creating Layer mnist
INFO - 18:43:24 - mnist -> data
terminate called after throwing an instance of 'CaffeErrorException'
what(): ./include/caffe/util/db_lmdb.hpp:15 / Check failed (custom): (mdb_status) == (0)
INFO - 18:43:24 - mnist -> label
Aborted (core dumped)

Not sure if the above helps or not; I will give the ImageNet tests a try and report back.
Thanks for the help!


beniz commented Mar 17, 2017

TX2 bears a Pascal GPU, and you'd need to compile DD / Caffe with the correct CUDA compute capability code. The compute capability for TX1 and its Maxwell GPU is 5.3 (see https://developer.nvidia.com/cuda-gpus). There's no mention of the TX2 yet on this page, but I'd guess this should be 6.3. Look at the documentation that comes with the card and/or ask on Nvidia forums (I've quickly looked with Google, but haven't found anything relevant).

To specify the compute capability, look at the README, but basically you'd need to add

-DCUDA_ARCH="-gencode arch=compute_63,code=sm_63"

to your cmake call. Make sure to run make clean before you run cmake again with the line above.
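A minimal rebuild sequence along those lines (build-directory layout assumed):

```shell
# From the DeepDetect build directory: clean, reconfigure with the
# compute-capability flag, then rebuild.
cd deepdetect/build
make clean
cmake .. -DCUDA_ARCH="-gencode arch=compute_63,code=sm_63"
make
```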


RiC0MD commented Mar 17, 2017

I was able to dig up that the TX2's compute capability is 6.2. When I tried

-DCUDA_ARCH="-gencode arch=compute_63,code=sm_63"

I saw it fall back to 6.2 as well, so I compiled with

-DCUDA_ARCH="-gencode arch=compute_62,code=sm_62"

Same results from ctest and from ./ut_caffeapi as above.

I tried my same text test from before and at least saw it get to about 20% training progress before the process stopped. I then tried your suggested changes: no hard crash, but the job failed with the following messages at about the 15% mark:

INFO - 23:02:44 - Ignoring source layer inputl
E0317 23:02:45.111093 8500 caffelib.cc:1938] Error while proceeding with test forward pass
INFO - 23:02:44 - Ignoring source layer loss
INFO - 23:02:45 - Fri Mar 17 23:02:45 2017 UTC - 127.0.0.1 "GET /train?service=n20&job=1" 200 0
ERROR - 23:02:45 - service n20 training status call failed
ERROR - 23:02:45 - {"code":500,"msg":"InternalError","dd_code":1007,"dd_msg":"src/caffe/layers/relu_layer.cu:26 / Check failed (custom): (error) == (cudaSuccess)"}

I will do the image test next once the images finish downloading. In the meantime, is there anything else I should check or verify?
Thanks again for the help!


beniz commented Mar 18, 2017

Have you tried the image classifier test? You don't need to download much for this.


beniz commented Mar 18, 2017

You can also provide the full build log. We have some of the newest cards but no TX2 handy yet.


RiC0MD commented Mar 18, 2017

Here's a full build log (including me updating vars to get things to build, showing 63 vs 62, etc.):
https://gist.github.com/RiC0MD/6e5ea5c23dc8aea14ac22ffd0e002824

Ran out of time today, will try the image tests tomorrow when I awake.
Thanks again!


RiC0MD commented Mar 18, 2017

So I was doing the full training for the image side; it was going well for a good long while (a couple of hours), and then I got this on the server output:

INFO - 21:26:00 - This network produces output loss3/top-1
INFO - 21:26:00 - This network produces output probt
INFO - 21:26:00 - Network initialization done.
E0318 21:26:00.624101 24078 caffelib.cc:1745] exception while forward/backward pass through the network
INFO - 21:26:00 - Solver scaffolding done.
ERROR - 21:26:00 - service imageserv training status call failed

ERROR - 21:26:00 - {"code":500,"msg":"InternalError","dd_code":1007,"dd_msg":"src/caffe/util/im2col.cu:61 / Check failed (custom): (error) == (cudaSuccess)"}

From the watch script on the example page:
{"status":{"code":200,"msg":"OK"},"head":{"method":"/train","job":1,"status":"running","time":10147.0},"body":{"measure":{}}}
{"status":{"code":200,"msg":"OK"},"head":{"method":"/train","job":1,"status":"running","time":10167.0},"body":{"measure":{}}}
{"status":{"code":200,"msg":"OK"},"head":{"method":"/train","job":1,"status":"running","time":10187.0},"body":{"measure":{}}}
{"status":{"code":200,"msg":"OK"},"head":{"method":"/train","job":1,"status":"error"},"body":{}}

DD has stayed running, but the training aborts there.


beniz commented Mar 19, 2017

You should really start with the prediction tutorials using pre-trained models. cudaSuccess errors are usually due to a lack of memory on your GPU, and this is very likely happening at the testing phase. Try setting the test_initialization parameter to true, and the error will show up much sooner.
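A sketch of passing test_initialization through the solver parameters on the train call (assuming it is forwarded to Caffe's solver; the service name, iteration count, and data path are placeholders):

```shell
# With test_initialization:true, a test pass runs before training starts,
# so a GPU out-of-memory failure at test time surfaces immediately.
curl -X POST "http://localhost:8080/train" -d '{
  "service": "imageserv",
  "async": true,
  "parameters": {
    "mllib": {
      "gpu": true,
      "solver": {"test_initialization": true, "iterations": 1000}
    }
  },
  "data": ["path/to/train/data"]
}'
```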


RiC0MD commented Mar 20, 2017

Sorry for the delay. I just loaded the prebuilt clothing model from the examples, and it seems to be working with the blogspot example link, but a response takes around 2.7-3.1s. Would that be a typical response time? I also see the CPU spiking to around 50-70% when I run a query.


beniz commented Mar 20, 2017

you are certainly not running on the GPU


RiC0MD commented Mar 20, 2017

That was my fear; it does seem a bit sluggish. Anything I can check? I'd be happy to run any tests, or anything else you'd like.
Thanks!


beniz commented Mar 20, 2017

Can you post the full API calls and the server logs? There are basically three leads: the GPU is not activated, the build is incorrect, or there's a missing option in the calls.

I can't find docs on the Caffe and TF builds on TX2, which may mean it's not a problem. On the TX1 there's a script in the home directory that activates the GPU at full speed; look for the .sh files.


RiC0MD commented Mar 20, 2017

Here are the calls and their respective outputs. I've run jetson_clocks, which is the only .sh script I can find, but I have run a couple of the examples that ship with JetPack and they seem to work okay.
https://gist.github.com/RiC0MD/4a424b21d675c1d15340b402e4176d52

Let me know if you'd like to look at anything else. At this point I think I'm going to reflash it with the latest JetPack, since I've tinkered with it so much, just to make sure the environment is sane before I push too much further.


beniz commented Mar 20, 2017

Add gpu:true to all calls; look at the API. Also use multiple calls to measure the response time, since the first call loads the model and is always slower. If the Nvidia samples work and report that the GPU is working, then it's a build or parameter issue.
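For example, on a prediction call the GPU flag goes under parameters.mllib (a sketch; the service name and image URL are placeholders):

```shell
# Force the prediction onto the GPU; "best": 3 returns the top-3 classes.
curl -X POST "http://localhost:8080/predict" -d '{
  "service": "imageserv",
  "parameters": {
    "mllib": {"gpu": true},
    "output": {"best": 3}
  },
  "data": ["http://example.com/image.jpg"]
}'
```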

You don't need to reflash IMO.


RiC0MD commented Mar 20, 2017

Hmm, I'll play around with compiling Caffe outside of DD, see what I can dig up, and report back.

Thanks again!


beniz commented Mar 21, 2017

If you can temporarily share access to your device, you can join gitter and PM me for details so I may help directly.


RiC0MD commented Mar 21, 2017

I was actually about to suggest that myself. I did some other tests, gave myself a clean start again to make sure I hadn't broken anything with my tinkering, and rebuilt everything. Building Caffe directly, it seems to pass all the tests and I can see it using the CUDA cores, but rebuilding DD with non-static linking and so forth gave the same results. Let me work on getting the unit reachable from the outside world, and then I'll PM you the details. I have to be up in a few hours for a meeting and still haven't slept, so I'll be back as soon as I can :)
Thanks again for the help!


beniz commented Mar 22, 2017

Here's a build on the TX2 that is working fine for me (~58ms on single-image prediction with GoogLeNet and Caffe):

cmake .. -DUSE_CUDNN=ON -DCUDA_ARCH="-gencode arch=compute_62,code=sm_62" -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF


beniz commented Mar 29, 2017

It seems it was working for me, so closing for now.
