Wish to run omp_kmeans on 100G dataset #1
Hi there. The build does not require any BLAS or LAPACK libraries, so don't worry about those. Are you trying to use the CUDA version? If so, your dataset must fit in the RAM available to your GPU, which typically maxes out at 4 to 8 GB. A dataset of 100 GB is simply too large. I'd go so far as to say that CUDA won't bring you much benefit if you can't fit your dataset in GPU memory, because the time spent copying the data back and forth between CPU memory and GPU memory would cripple the application's performance. Let me know if I can be of any more help.

Serban

On Oct 20, 2012, at 8:24 AM, meloom notifications@github.com wrote:
Hi, I ran the following command:

./omp_main -i ~/feature.txt -n 50 -p 12 -o

and got a segmentation fault once total RAM usage exceeded 3.7 GB. The feature.txt file contains 87 GB of data; each vector has about 300,000 features. Thank you in advance.
Hi there, I think I found what caused the error. In file_io.c, the program mallocs one large contiguous block of memory for all the objects, and the error occurs when that contiguous allocation becomes too large. Instead, I allocated memory for each object separately. Thank you anyway for replying.
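For illustration, here is a minimal sketch of that change. The names (`numObjs`, `numCoords`, `objects`) and the single-malloc pattern shown in the comment are assumptions about what file_io.c does, not the exact code:

```c
#include <stdlib.h>

float **alloc_objects(size_t numObjs, size_t numCoords)
{
    /* Original pattern: one contiguous block for every coordinate of
     * every object, roughly
     *   objects[0] = malloc(numObjs * numCoords * sizeof(float));
     * which fails when the single request is too large (or the size
     * expression overflows). Allocating per object avoids the one
     * huge contiguous request: */
    float **objects = malloc(numObjs * sizeof(float *));
    if (objects == NULL)
        return NULL;

    for (size_t i = 0; i < numObjs; i++) {
        objects[i] = malloc(numCoords * sizeof(float));
        if (objects[i] == NULL) {      /* clean up everything on failure */
            while (i > 0)
                free(objects[--i]);
            free(objects);
            return NULL;
        }
    }
    return objects;
}
```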
Nvidia is ramping up their deep learning efforts, and you can now get up to 96 GB of graphics memory. It would be really cool if you could look into eliminating the 32-bit restriction in the CUDA code. For example, I noticed on a g2.2xlarge machine with the Nvidia CUDA AMI (https://aws.amazon.com/marketplace/pp/B01LZMLK1K) that the read call in cuda_io.cu (for the binary file) was limited to reading 2^31 bytes. It's a bit odd, because the machine supports 64-bit.
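One common workaround is to loop the read in chunks, since on Linux a single read() call transfers at most about 2^31 bytes regardless of the buffer size. A minimal sketch, assuming a plain POSIX file descriptor rather than the actual cuda_io.cu code:

```c
#include <unistd.h>   /* read(), ssize_t, size_t */

/* Read exactly `count` bytes (or up to EOF) by looping over chunks,
 * since one read() call cannot transfer more than ~2^31 bytes. */
ssize_t read_all(int fd, void *buf, size_t count)
{
    size_t total = 0;
    char  *p     = buf;

    while (total < count) {
        /* Cap each request at 1 GiB and keep looping. */
        size_t chunk = count - total;
        if (chunk > ((size_t)1 << 30))
            chunk = (size_t)1 << 30;

        ssize_t n = read(fd, p + total, chunk);
        if (n < 0)
            return -1;   /* I/O error */
        if (n == 0)
            break;       /* EOF before `count` bytes */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```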
FWIW, g2.2xlarge uses a single GPU with 4 GB of RAM.
Also, 64-bit performance would suffer heavily because it needs the double-precision unit. Most GPUs (except the newest/upcoming Teslas) have minuscule capability there, so a server CPU may easily outperform them. I agree the upcoming Pascals will be better suited to 64-bit, though (currently ~5 TFLOPS: http://www.nvidia.com/object/tesla-p100.html). EDIT: the hardware can in fact handle 64-bit values with multi-instruction sequences, but that again may decrease performance: https://developer.nvidia.com/cuda-faq EDIT2: the double-precision figure refers to the floating-point unit; for 64-bit integers you'd need to rely on multi-instruction sequences. EDIT3: well, double precision can be used to manipulate any integer up to 2^53 without loss of precision. It's more of a hack, though, and may not be well suited to memory addressing.
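The 2^53 point is easy to verify with a small standalone check (this assumes IEEE-754 doubles, which have a 53-bit significand):

```c
#include <stdio.h>

int main(void)
{
    double a = 9007199254740992.0;   /* 2^53: exactly representable    */
    double b = 9007199254740993.0;   /* 2^53 + 1: rounds back to 2^53  */

    printf("%.0f\n", a);             /* prints 9007199254740992        */
    printf("%.0f\n", b);             /* also prints 9007199254740992   */
    printf("a == b: %d\n", a == b);  /* prints 1: the literals collide */
    return 0;
}
```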
They also recently released p2 instances, although the rollout doesn't seem to have finished in practice: https://aws.amazon.com/ec2/instance-types/p2/
Sorry for the shameless promotion, but everyone stuck with the 4 GB memory limit should try https://github.com/src-d/kmcuda. It supports as much memory as your GPU has, runs on multiple GPUs in parallel, and can handle the data in float16 format with Kahan summation (hence doubled effective data size). Still, 100 GB is too much, of course. I would do the following: pick the "best" X GB out of the 100 GB, where X is the amount of memory your GPU has, cluster them, and then use the resulting centroids to assign the rest of the dataset.
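For the assignment step, a streaming nearest-centroid pass on the CPU is enough, since it only needs one slice of the data in memory at a time. A minimal sketch; the names (`numObjs`, `numCoords`, `numClusters`, `membership`) are assumptions for illustration, not kmcuda's API:

```c
#include <float.h>    /* FLT_MAX */
#include <stddef.h>   /* size_t  */

/* Squared Euclidean distance between one object and one centroid. */
static float dist2(const float *obj, const float *centroid, int numCoords)
{
    float sum = 0.0f;
    for (int d = 0; d < numCoords; d++) {
        float diff = obj[d] - centroid[d];
        sum += diff * diff;
    }
    return sum;
}

/* Label every object with its nearest centroid. The loop streams
 * through the objects, so the full 100 GB never has to be resident:
 * load the dataset in slices and call this once per slice. */
void assign_rest(const float *objects, size_t numObjs, int numCoords,
                 const float *centroids, int numClusters, int *membership)
{
    #pragma omp parallel for   /* same OpenMP style as omp_kmeans */
    for (size_t i = 0; i < numObjs; i++) {
        const float *obj = objects + i * (size_t)numCoords;
        int   best     = 0;
        float bestDist = FLT_MAX;

        for (int k = 0; k < numClusters; k++) {
            float d = dist2(obj, centroids + (size_t)k * numCoords, numCoords);
            if (d < bestDist) { bestDist = d; best = k; }
        }
        membership[i] = best;
    }
}
```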
Hi, I am planning to run this program on a dataset of almost 100 GB on my server (which has more than 200 GB of memory).
Could you please tell me how to do this? I constantly get a 'segmentation fault' error message when memory usage exceeds 4 GB.
I have checked that the BLAS and LAPACK libraries are all 64-bit versions.
omp_kmeans is also compiled with a 64-bit gcc compiler.
Thank you for your kindness.