Memory allocation vs file load time: the cost of NUMA? #1665
The tl;dr is that if we do nothing but increase memory/table size by 10x, we slow down k-mer loading by a factor of 7x. See the comments at #1553 (comment) and a rundown of two possible solutions in #1553 (comment). Note that speeding up insertion will still not change the speed of retrieval, which should be subject to the same badness. The ultimate goal should probably be to use more cache-coherent data structures (see some comments in e.g. #1215 and #1198).
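For anyone skimming, here is a minimal sketch (not khmer's actual implementation) of why bigger Count-Min-style tables make insertion slower: every update touches one effectively random slot per table, so once the tables outgrow the CPU caches each insert pays several cache/TLB misses to main memory, and retrieval pays the same cost.

```python
# Toy Count-Min-style counter, for illustration only -- not khmer's code.
# Each update writes to n_tables pseudo-random slots; when table_size is
# large, almost every one of those writes is a cache miss.
import random

class ToyCountMin:
    def __init__(self, table_size, n_tables):
        self.table_size = table_size
        self.tables = [bytearray(table_size) for _ in range(n_tables)]
        # One independent hash seed per table.
        self.seeds = [random.randrange(2**61) for _ in range(n_tables)]

    def _slots(self, kmer):
        h = hash(kmer)
        return [(h ^ seed) % self.table_size for seed in self.seeds]

    def add(self, kmer):
        # n_tables scattered memory writes per k-mer -- the expensive part.
        for table, slot in zip(self.tables, self._slots(kmer)):
            if table[slot] < 255:
                table[slot] += 1

    def get(self, kmer):
        # Retrieval touches the same scattered slots, hence the same badness.
        return min(table[slot]
                   for table, slot in zip(self.tables, self._slots(kmer)))
```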
@rob-p has some cool stuff going on with efficient alternatives to (counting) Bloom filters; I wonder if he would be interested in solving our problems for us? Should he stumble across this comment, I would direct his attention to our newly factored out …
Ahh, I see @rob-p's cool stuff is available! https://github.com/splatlab/cqf thx @luizirber
This paper popped up on bioRxiv the other day, linking to this technical report describing the CQF in detail.
Some benchmarking of my own: https://gist.github.com/betatim/24886949ded51b90a3ca066c2a7f0439

I ran the benchmark with the process pinned to a CPU (and then varied which one it was pinned to). The thinking is that by pinning our (single-threaded) process to a particular CPU, we don't end up being rescheduled onto a different socket whose caches contain nothing useful for us. Pinning the process to a CPU is the best you can do, and it doesn't seem to change anything, so I am unconvinced by the NUMA explanation (or I am missing something).

The last plot shows the runtime after subtracting the time to run the same benchmark on a file containing only 20 k-mers. This is an attempt at measuring how long it takes to allocate and zero out the memory we request. For reasonable (read: large) amounts of memory this takes a linear amount of time, so some of the linear slowdown is simply the time needed to allocate the memory.
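For reference, a rough sketch of the pinning + "startup cost" measurement described above (Linux-only, stdlib; the buffer sizes and the use of a plain zeroed buffer instead of a real khmer table are assumptions for illustration):

```python
# Pin this process to one CPU, then time how long it takes just to allocate
# and fault in increasingly large buffers. Linux-only; sizes are illustrative.
import os
import time

def pin_to_cpu(cpu):
    # Keep the (single-threaded) process on one core / one NUMA node.
    os.sched_setaffinity(0, {cpu})

def time_allocation(nbytes):
    start = time.perf_counter()
    buf = bytearray(nbytes)        # allocate (zero-initialized)
    for i in range(0, nbytes, 4096):
        buf[i] = 1                 # touch every page so it is actually faulted in
    return time.perf_counter() - start

if __name__ == "__main__":
    pin_to_cpu(0)
    for mb in (64, 256, 1024, 4096):
        t = time_allocation(mb * 2**20)
        print(f"{mb:5d} MB: {t:.3f} s")
```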
Edited the gist to add a few more points for the "delta" plot. I'd now claim that up to ~1.5 GB it takes a constant amount of time to process the data once you subtract the 'startup' time.
well, more N means more compute, so this all makes straightforward sense to me...
Another point: https://fgiesen.wordpress.com/2017/04/11/memory-bandwidth/
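The point of that post (a single core rarely gets close to the machine's full memory bandwidth) can be sanity-checked with a quick streaming-copy microbenchmark; a sketch, assuming numpy is available:

```python
# Quick-and-dirty single-core bandwidth check, for illustration only.
import time
import numpy as np

nbytes = 1 << 30                      # 1 GiB working set
src = np.zeros(nbytes, dtype=np.uint8)
dst = np.empty_like(src)

start = time.perf_counter()
np.copyto(dst, src)                   # streaming copy: reads + writes ~2 GiB
elapsed = time.perf_counter() - start
print(f"effective bandwidth: {2 * nbytes / elapsed / 1e9:.1f} GB/s")
```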
In the vein of @betatim's I/O experiments in #1554, I did some benchmarking on my laptop of hash sizes vs. load times; results here:
https://gist.github.com/ctb/beb9a7eebf1a1282562ff0272efb4dd6
On my laptop, between 5 MB and 2 GB we get a nearly linear slowdown (m=0.7) in sequencefile => counttable loading.
This is likely due to some nasty combination of NUMA and cache (in)coherence in our Bloom filters/CountMin sketches.
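A sketch of how the slope quoted above could be estimated from (table size, load time) pairs; the numbers below are placeholders, not the gist's data:

```python
# Fit load time against table size on a log-log scale to estimate the slope m.
# The (size_mb, seconds) pairs are placeholders -- plug in the gist's measurements.
import numpy as np

measurements = [
    (5, 1.0),       # table size in MB, load time in seconds (placeholder values)
    (50, 5.0),
    (500, 25.0),
    (2000, 80.0),
]

sizes = np.array([s for s, _ in measurements], dtype=float)
times = np.array([t for _, t in measurements], dtype=float)

# time ~ size**m  =>  log(time) = m * log(size) + c
m, c = np.polyfit(np.log(sizes), np.log(times), 1)
print(f"estimated slope m = {m:.2f}")
```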