GitHub - saffroy/driller: Show how independent Unix processes can make their memory space readable by other processes.

saffroy / driller Public

Notifications You must be signed in to change notification settings
Fork 0
Star 0

Show how independent Unix processes can make their memory space readable by other processes.

GPL-2.0, LGPL-2.1 licenses found

Licenses found

0 stars 0 forks Branches Tags Activity

Star

Notifications

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 47 Commits
.hgignore		.hgignore
.hgtags		.hgtags
COPYING		COPYING
COPYING.LIB		COPYING.LIB
Makefile		Makefile
README		README
TODO		TODO
depend.sh		depend.sh
dlmalloc.c		dlmalloc.c
dlmalloc.h		dlmalloc.h
driller.c		driller.c
driller.h		driller.h
driller.txt		driller.txt
driller_internal.h		driller_internal.h
fdproxy.c		fdproxy.c
fdproxy.h		fdproxy.h
fdproxy_internal.h		fdproxy_internal.h
hsearch_r.c		hsearch_r.c
linux.c		linux.c
log.h		log.h
map_cache.c		map_cache.c
map_cache.h		map_cache.h
mmpi.c		mmpi.c
mmpi.h		mmpi.h
mmpi_internal.h		mmpi_internal.h
search.h		search.h
solaris.c		solaris.c
spinlock.h		spinlock.h
test_dlmalloc.sh		test_dlmalloc.sh
test_driller.c		test_driller.c
test_driller.sh		test_driller.sh
test_fdproxy.c		test_fdproxy.c
test_fdproxy.sh		test_fdproxy.sh
test_mmpi.c		test_mmpi.c
test_mmpi.sh		test_mmpi.sh
test_spinlock.c		test_spinlock.c
test_spinlock.sh		test_spinlock.sh
tunables.h		tunables.h

Repository files navigation

The Driller lib
---------------

The set of files in this tarball implement a way for a group of
cooperating processes to directly access memory segments of one
another. This can form the basis of a message passing system that
avoids costly data copy, as is demonstrated with mmpi ("mini-MPI"), a
very simple API loosely inspired from MPI (it only has send, recv and
barrier primitives).

The name "driller" comes from the idea that we could drill holes in
the closed container that forms the memory of a regular process.

The code has been designed for Linux and tested on 2.6 only, though
it might be portable to some Unix variants. It has small portions of
architecture specific code, but has been tested on x86 and x64, and
it should not be difficult to port to other architectures.

How to test it
--------------

 - run "make" to build all library files and tests, or
   "make DEBUG_FLAGS=-DDEBUG" for more verbose output during tests
 - run "./test_driller.sh" for a basic sanity test
 - run "./test_mmpi.sh" for a performance test with mmpi

This performance test measures the throughput between two processes
for various buffer sizes and a constant volume. Results are usually as
follows (on a dual core Athlon64 X2 3800+):

now time send/recv throughput (128 MB per iteration)...
throughput:  108.8 MB/s (1.18s) chunk      256
throughput:  305.3 MB/s (0.42s) chunk      512
throughput:  563.6 MB/s (0.23s) chunk     1024
throughput:  802.7 MB/s (0.16s) chunk     2048
throughput: 1604.5 MB/s (0.08s) chunk     4096
throughput: 2249.5 MB/s (0.06s) chunk     8192
throughput: 4254.0 MB/s (0.03s) chunk    16384
throughput: 5238.8 MB/s (0.02s) chunk    32768
throughput: 2686.8 MB/s (0.05s) chunk    65536
throughput: 2939.2 MB/s (0.04s) chunk   131072
throughput: 1587.3 MB/s (0.08s) chunk   262144
throughput:  988.8 MB/s (0.13s) chunk   524288
throughput: 1042.3 MB/s (0.12s) chunk  1048576
throughput: 1056.6 MB/s (0.12s) chunk  2097152
throughput: 1057.7 MB/s (0.12s) chunk  4194304
throughput: 1020.2 MB/s (0.13s) chunk  8388608

Performance is not very stable over successive runs, for now I suspect
that mmpi is too sensitive to other tasks being scheduled (tests are
performed on a workstation running quite a few interactive programs
under X11).

A similar test on the same host with MPICH2 1.0.6 gives this:

now time send/recv throughput (128 MB per iteration)...
throughput:  261.7 MB/s (0.49s) chunk      256
throughput:  354.6 MB/s (0.36s) chunk      512
throughput:  425.8 MB/s (0.30s) chunk     1024
throughput:  480.7 MB/s (0.27s) chunk     2048
throughput:  507.4 MB/s (0.25s) chunk     4096
throughput:  526.5 MB/s (0.24s) chunk     8192
throughput:  375.4 MB/s (0.34s) chunk    16384
throughput:  405.9 MB/s (0.32s) chunk    32768
throughput:  428.4 MB/s (0.30s) chunk    65536
throughput:  444.9 MB/s (0.29s) chunk   131072
throughput:  447.3 MB/s (0.29s) chunk   262144
throughput:  451.6 MB/s (0.28s) chunk   524288
throughput:  446.8 MB/s (0.29s) chunk  1048576
throughput:  456.6 MB/s (0.28s) chunk  2097152
throughput:  449.3 MB/s (0.28s) chunk  4194304
throughput:  448.4 MB/s (0.29s) chunk  8388608

The results with MPICH2 are more stable over successive runs, but much
lower for chunk sizes above 4KB.

How it works
------------

The code uses the following tricks to achieve its goals:

 - under Linux, a process can examine the layout of its own memory
 space by reading /proc/self/maps (on Linux 2.6, the stack and heap
 are clearly identified)

 - an existing memory segment can be atomically replaced by a call to
 mmap at the same address; thus, we can copy an existing segment to a
 file, and then map this file over the original segment, which gives
 us the same data, except it is now in a file

 - unix sockets can be used to pass file descriptors from one process
 to another; for example one created with the trick above, so another
 process can map the same segment in its own memory space

 - under Linux, memory is usually allocated by calling malloc (in the
 C library) or mmap, both of which can be intercepted (using malloc
 hooks and symbol overloading); this means that these memory segments
 can be memory-mapped files if we want them to

These tricks make it possible for a process to simply replace most of
its memory segments by memory-mapped files. The file descriptors for
all replaced regions can then be passed to cooperating process, which,
after an mmap(), can directly access the same memory. When the first
process modifies or destroys its mapping, this is notified to other
processes, which will drop any reference to the file and eventually
free associated ressources.

The code implementing these mechanisms is split in three parts:
 - fdproxy.c: this implements file descriptor passing; one of the
 participants forks a server process that will forward file
 descriptors to any client
 - driller.c: this replaces the memory segments of a process with
 memory-mapped files, and tracks calls to mmap or brk, which modify
 the process memory layout
 - map_cache.c: this provides a cache structure for memory segments
 that a process can have remapped from another process

Since the glibc implementation of malloc cannot be forced to use our
overloaded version of mmap, an alternate allocator had to be used, and
Doug Lea's malloc in dlmalloc.c is a convenient substitute.

The file mmpi.c implements a very simple message passing API that uses
the files above and requires at most one buffer copy to transfer a
message. Of course, the cost of a few system calls cannot always be
avoided, but in favorable cases, it can be spread over many messages.

Licensing
---------

The files dlmalloc.c and dlmalloc.h are released by Doug Lea to the
public domain.

All other files are Copyright 2007 Jean-Marc Saffroy.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License version
2.1 as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public
License along with this program.  If not, see
<http://www.gnu.org/licenses/>.

Contact
-------

Jean-Marc Saffroy <saffroy@gmail.com>

Acknowledgements
----------------

Thanks to Gaël Roualland for testing and debugging this code on IA32.