Wish to run omp_kmeans on 100G dataset #1
Hi there. The build does not require any BLAS or LAPACK libraries, so don't worry about those. Are you trying to use the CUDA version? If so, your dataset must fit in the RAM available to your GPU, which typically maxes out at 4 to 8 GB. A dataset of 100 GB is simply too large. I'd go so far as to say that CUDA won't bring you much benefit if you can't fit your dataset in GPU memory, because the time spent copying the data back and forth between CPU memory and GPU memory would cripple the application's performance. Let me know if I can be of any more help.

Serban

On Oct 20, 2012, at 8:24 AM, meloom notifications@github.com wrote:
Hi, I ran the following command:

./omp_main -i ~/feature.txt -n 50 -p 12 -o

and got a segmentation fault once total RAM usage exceeded 3.7 GB. The feature.txt file contains 87 GB of data; each vector has about 300,000 features. Thank you in advance.
Hi there, I think I found what caused the error. In file_io.c, the program mallocs one large contiguous block of memory for all the objects, and the error occurs when that contiguous allocation becomes too large. Instead, I allocated memory for each object separately. Thank you anyway for replying.
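For illustration, here is a minimal sketch of that change. The names (`numObjs`, `numCoords`, `objects`) and the single-malloc pattern shown in the comment are assumptions about what file_io.c does, not the exact code:

```c
#include <stdlib.h>

float **alloc_objects(size_t numObjs, size_t numCoords)
{
    /* Original pattern: one contiguous block for every coordinate of
     * every object, roughly
     *   objects[0] = malloc(numObjs * numCoords * sizeof(float));
     * which fails when the single request is too large (or the size
     * expression overflows). Allocating per object avoids the one
     * huge contiguous request: */
    float **objects = malloc(numObjs * sizeof(float *));
    if (objects == NULL)
        return NULL;

    for (size_t i = 0; i < numObjs; i++) {
        objects[i] = malloc(numCoords * sizeof(float));
        if (objects[i] == NULL) {      /* clean up everything on failure */
            while (i > 0)
                free(objects[--i]);
            free(objects);
            return NULL;
        }
    }
    return objects;
}
```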
Nvidia is ramping up their deep learning efforts, and you can now get up to 96 GB of graphics memory. It would be really cool if you could look into eliminating the 32-bit restriction in the CUDA code. For example, I noticed on a g2.2xlarge machine with the Nvidia CUDA AMI (https://aws.amazon.com/marketplace/pp/B01LZMLK1K) that the read call in cuda_io.cu (for the binary file) was limited to reading 2^31 bytes. It's a bit odd, because the machine supports 64-bit.
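One common workaround is to loop the read in chunks, since on Linux a single read() call transfers at most about 2^31 bytes regardless of the buffer size. A minimal sketch, assuming a plain POSIX file descriptor rather than the actual cuda_io.cu code:

```c
#include <unistd.h>   /* read(), ssize_t, size_t */

/* Read exactly `count` bytes (or up to EOF) by looping over chunks,
 * since one read() call cannot transfer more than ~2^31 bytes. */
ssize_t read_all(int fd, void *buf, size_t count)
{
    size_t total = 0;
    char  *p     = buf;

    while (total < count) {
        /* Cap each request at 1 GiB and keep looping. */
        size_t chunk = count - total;
        if (chunk > ((size_t)1 << 30))
            chunk = (size_t)1 << 30;

        ssize_t n = read(fd, p + total, chunk);
        if (n < 0)
            return -1;   /* I/O error */
        if (n == 0)
            break;       /* EOF before `count` bytes */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```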
FWIW, g2.2xlarge uses a single GPU with 4 GB of RAM.
Also, 64-bit performance would suffer heavily because it needs the double-precision unit. Most GPUs (except the newest/upcoming Teslas) have minuscule capability there, so a server CPU may easily outperform them. I agree the upcoming Pascals will be better suited to 64-bit, though (currently ~5 TFLOPS: http://www.nvidia.com/object/tesla-p100.html). EDIT: the hardware can in fact handle 64-bit values with multi-instruction sequences, but that again may decrease performance: https://developer.nvidia.com/cuda-faq EDIT2: the double-precision figure refers to the floating-point unit; for 64-bit integers you'd need to rely on multi-instruction sequences. EDIT3: well, double precision can be used to manipulate any integer up to 2^53 without loss of precision. It's more of a hack, though, and may not be well suited to memory addressing.
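The 2^53 point is easy to verify with a small standalone check (this assumes IEEE-754 doubles, which have a 53-bit significand):

```c
#include <stdio.h>

int main(void)
{
    double a = 9007199254740992.0;   /* 2^53: exactly representable    */
    double b = 9007199254740993.0;   /* 2^53 + 1: rounds back to 2^53  */

    printf("%.0f\n", a);             /* prints 9007199254740992        */
    printf("%.0f\n", b);             /* also prints 9007199254740992   */
    printf("a == b: %d\n", a == b);  /* prints 1: the literals collide */
    return 0;
}
```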
They also recently released p2 instances, although the rollout doesn't seem to have finished in practice: https://aws.amazon.com/ec2/instance-types/p2/
Sorry for the shameless promotion, but everyone stuck with the 4 GB memory limit should try https://github.com/src-d/kmcuda. It supports as much memory as your GPU has, runs on multiple GPUs in parallel, and can handle the data in float16 format with Kahan summation (hence doubled effective data size). Still, 100 GB is too much, of course. I would do the following: pick the "best" X GB out of the 100 GB, where X is the amount of memory your GPU has, cluster them, and then use the resulting centroids to assign the rest of the dataset.
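For the assignment step, a streaming nearest-centroid pass on the CPU is enough, since it only needs one slice of the data in memory at a time. A minimal sketch; the names (`numObjs`, `numCoords`, `numClusters`, `membership`) are assumptions for illustration, not kmcuda's API:

```c
#include <float.h>    /* FLT_MAX */
#include <stddef.h>   /* size_t  */

/* Squared Euclidean distance between one object and one centroid. */
static float dist2(const float *obj, const float *centroid, int numCoords)
{
    float sum = 0.0f;
    for (int d = 0; d < numCoords; d++) {
        float diff = obj[d] - centroid[d];
        sum += diff * diff;
    }
    return sum;
}

/* Label every object with its nearest centroid. The loop streams
 * through the objects, so the full 100 GB never has to be resident:
 * load the dataset in slices and call this once per slice. */
void assign_rest(const float *objects, size_t numObjs, int numCoords,
                 const float *centroids, int numClusters, int *membership)
{
    #pragma omp parallel for   /* same OpenMP style as omp_kmeans */
    for (size_t i = 0; i < numObjs; i++) {
        const float *obj = objects + i * (size_t)numCoords;
        int   best     = 0;
        float bestDist = FLT_MAX;

        for (int k = 0; k < numClusters; k++) {
            float d = dist2(obj, centroids + (size_t)k * numCoords, numCoords);
            if (d < bestDist) { bestDist = d; best = k; }
        }
        membership[i] = best;
    }
}
```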
Hi, I am planning to run this program on a dataset of almost 100 GB on my server (which has more than 200 GB of memory).
Could you please tell me how to do this? I constantly get a 'segmentation fault' error message when memory usage exceeds 4 GB.
I have checked that the BLAS and LAPACK libraries are all 64-bit versions.
omp_kmeans is also compiled with a 64-bit gcc compiler.
Thank you for your kindness.