hardware resources recommendation #519

Cheng-Wang · 2014-06-19T08:38:05Z

Dear all,

I am going to run some experiments with Caffe, training on up to 5 million images. I have the chance to apply for access to hardware resources from our lab, and I have two options:

(1) A 1000-core compute cluster comprising 25 nodes, each with 40 cores and 1 TB RAM
(2) An HP server with 2 TB RAM, 64 cores, and an NVIDIA Tesla K20X

I can choose one of them. Could you give me some suggestions, from the perspective of both computing capability and the amount of configuration work? If I choose the first option, how can I distribute my training work across the nodes?

Thank you in advance!

Yangqing · 2014-06-23T18:19:11Z

It seems that the two options are not really comparable... I guess the answer is "it depends". Caffe runs on a single machine with a GPU (and hopefully with multiple GPUs in the future), so the second option fits this purpose better. However, the first option is apparently more powerful overall and may serve many other tasks: distributed computing, computation without a GPU, etc.

(Disclaimer: I am just giving my best guess and kindly don't hold me responsible for any decisions. You might want to consult with some domain experts in your field.)
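One rough way to put numbers on "it depends" is to time a short training run on each machine and extrapolate to the full dataset. The sketch below is only an illustration, not part of Caffe: the helper `estimate_epoch_hours` is a hypothetical name, and the throughput value is an arbitrary round number standing in for whatever you actually measure. Since Caffe trains on a single machine, the cluster figure to plug in would be the throughput of one node, not of all 25.

```python
def estimate_epoch_hours(num_images: int, images_per_second: float) -> float:
    """Return the wall-clock hours needed for one pass over the dataset.

    images_per_second should come from a short benchmark run on the
    machine being evaluated (hypothetical input, not a Caffe API).
    """
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    if images_per_second <= 0:
        raise ValueError("images_per_second must be positive")
    seconds = num_images / images_per_second
    return seconds / 3600.0


# Dataset size taken from the question above.
NUM_IMAGES = 5_000_000

# Placeholder throughput: an arbitrary round number for illustration,
# NOT a measurement of either machine. Replace with your own benchmark.
PLACEHOLDER_IMAGES_PER_SECOND = 100.0

hours = estimate_epoch_hours(NUM_IMAGES, PLACEHOLDER_IMAGES_PER_SECOND)
print(f"Estimated time per epoch: {hours:.1f} hours")
```

Running the same calculation with the measured throughput of each option gives a like-for-like comparison of time per epoch, which can then be weighed against the configuration work each option requires.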

Yangqing closed this as completed Jun 23, 2014

kloudkl mentioned this issue Aug 6, 2014

Try to extract Convolution code from cuda-convnet2 #830

Closed
