Burst Buffer mode broken #314

johnbent · 2013-09-18T21:34:16Z

All,

The IOStore code (while awesome) sadly broke burst buffer mode. IOStore currently rejects any and all paths that aren't predefined in the plfsrc. Burst buffer mode works by storing shadow paths in metalinks. So one node has /bb1 as a shadow and another node has /bb2 as a shadow. Now when the first node wants to read a file, it goes to the canonical container where it finds a metalink to /bb2. When it tries to actually do operations on /bb2, it fails because the first node's plfsrc doesn't list /bb2.
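A toy model of the failure described above, as a rough sketch only. This is not PLFS code: `resolve`, `UnknownBackendError`, the `/canonical` path, and the `hostdir.0` component are all made up for illustration.

```python
# Toy model of the failure; not PLFS code. All names here are hypothetical.

class UnknownBackendError(Exception):
    """Raised when a path does not fall under any backend listed in plfsrc."""


def resolve(path, configured_backends):
    """Return the configured backend that contains `path`, or raise."""
    for backend in configured_backends:
        root = backend.rstrip("/")
        if path == root or path.startswith(root + "/"):
            return backend
    raise UnknownBackendError(path)


# Each node lists the shared canonical backend plus only its own burst buffer.
node1_backends = ["/canonical", "/bb1"]
node2_backends = ["/canonical", "/bb2"]

# Node 2 wrote the data, so the metalink in the canonical container
# points into /bb2 (the trailing component is illustrative).
metalink_target = "/bb2/hostdir.0"
```

Resolving `metalink_target` against `node2_backends` succeeds, while the same call with `node1_backends` raises, which is the rejection node 1 hits when it follows the metalink.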

And we don't want the first node to list /bb2 in its plfsrc. The only place that it could put it would be in canonical_backends or shadow_backends. Canonical is no good since we don't (currently) want canonical containers stored in burst buffers. Shadow is no good because we want each node to use a particular shadow (or subset) so we need a different shadow_backends defined with only a subset of shadows on each plfsrc.

One solution is to modify IOStore to allow previously unknown paths, but this means that we'll have to put the IOStore type (glib, posix, etc.) into the metalink.

Another solution is to create a new plfsrc directive called read_only_shadow_backends where we can list the other burst buffers that aren't used for writing. Then the first node will have /bb2 as a read-only shadow and the second node will have /bb1 as a read-only shadow.
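To make the second proposal concrete, here is a hedged sketch of how the per-node backend lists could be derived. `read_only_shadow_backends` is the directive proposed above and does not exist in PLFS today; the dict layout is not plfsrc syntax, and `node_config`, `readable`, and `/canonical` are hypothetical.

```python
# Sketch only: "read_only_shadow_backends" is the proposed directive, and
# this dict layout is an illustration, not plfsrc syntax.

def node_config(own_bb, all_bbs, canonical):
    """Build one node's backend lists from the full set of burst buffers."""
    return {
        "canonical_backends": [canonical],
        # The node writes only to its own burst buffer.
        "shadow_backends": [own_bb],
        # Every other burst buffer is known to the node, but only for reads.
        "read_only_shadow_backends": [bb for bb in all_bbs if bb != own_bb],
    }


def readable(cfg):
    """Every backend the node may follow a metalink into."""
    return (cfg["canonical_backends"]
            + cfg["shadow_backends"]
            + cfg["read_only_shadow_backends"])


burst_buffers = ["/bb1", "/bb2"]
node1 = node_config("/bb1", burst_buffers, "/canonical")
node2 = node_config("/bb2", burst_buffers, "/canonical")
```

With this layout each node still writes to exactly one shadow, while a metalink into the other node's burst buffer lands on a path the node already knows about.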

chuckcranor · 2013-09-19T17:55:33Z

I looked at this a bit. If we've got:

node1: {canonical=/m/pana0,shadow=/bb_n1}
node2: {canonical=/m/pana0,shadow=/bb_n2}

if node1 does a read and gets a metalink in the canonical container in
/m/pana0 that points to /bb_n2, then what is the read supposed to do?

should the data in a node's burst buffer be available to the other
remote nodes prior to the async transfer completing?

notes on the current code path:

it does support putting IOStore type in the metalink, but for
posix mounts it optimizes the "posix:" out to save space and
maintain backwards compat with pre-IOStore plfs. if it put
a pvfs, hdfs, or iofs shadow metalink in, then it would have
the prefix.

the failure you are going to get would be:
- readMetalink() gets the metalink
- parses the metalink and calls
  plfs_phys_backlookup(cp, pmnt, backout, NULL) to look it up

that plfs_phys_backlookup() is going to fail because the backend
isn't listed in plfsrc.

basically it is trying to map the metalink back to a specific plfs_backend
in the given PlfsMount and not finding it.
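a rough sketch of the prefix handling and back-lookup described in these notes. this is a model, not the real code: the actual metalink encoding and the behavior of plfs_phys_backlookup() are not reproduced here, and `Backend`, `encode`, `decode`, and `backlookup` are hypothetical names. it assumes a bare absolute path means posix and anything else is written as "type:path".

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Backend(NamedTuple):
    store_type: str  # e.g. "posix" or "pvfs"
    root: str        # backend root as it would be listed for the mount


def encode(store_type: str, path: str) -> str:
    """Drop the "posix:" prefix to save space; keep it for other types."""
    return path if store_type == "posix" else f"{store_type}:{path}"


def decode(text: str) -> Tuple[str, str]:
    """Assumption: a bare absolute path is posix; otherwise "type:path"."""
    if text.startswith("/"):
        return "posix", text
    store_type, _, path = text.partition(":")
    return store_type, path


def backlookup(text: str, backends: Sequence[Backend]) -> Optional[Backend]:
    """Map a metalink target back to a backend of this mount, or None."""
    store_type, path = decode(text)
    for backend in backends:
        root = backend.root.rstrip("/")
        if backend.store_type == store_type and (
            path == root or path.startswith(root + "/")
        ):
            return backend
    return None


# node1's mount from the example above: canonical plus its own shadow.
node1_mount = [Backend("posix", "/m/pana0"), Backend("posix", "/bb_n1")]
```

in this model `backlookup("/bb_n2/x", node1_mount)` returns None, which corresponds to the lookup failing because /bb_n2 is not listed for node1.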

it is part of the code that lets plfs run multiple logical mount points
at the same time with non-POSIX filesystems (i.e. filesystems that require
you to "attach" to them before using them).

the code assumes that the Metalink is pointing to something that is in
the current PlfsMount, and thus its plfs_backend has already been allocated
and properly attached to (so there is no further init needed to perform
I/O). So even if we hit a Metalink with "pvfs://foo/bar/we/have/not/seen"
in it we wouldn't be able to do I/O to it because it wouldn't be attached
(pvfs client may not even be init'd).

i'm thinking it is not entirely a good idea to let PLFS do backend I/O
to filesystems not listed in plfsrc anyway, since you easily get into
a case where you've got a bad plfsrc and don't even know it. so the option
of listing a read-only shadow backend seems like the way to go to make
this work.

internally, the way it could work is that these backends would be
listed in PlfsMount->backends[] array, but not appear in either
PlfsMount->shadow_backends[] nor PlfsMount->canonical_backends[].
there are prob some sanity checks in insert_mount_point that would
have to get updated.
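a minimal sketch of that layout, assuming the read-only backends are simply whatever is in backends[] but in neither of the other two arrays. the `Mount` class and `check_mount` are hypothetical stand-ins, not PlfsMount or insert_mount_point.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mount:
    backends: List[str]            # every backend attached for this mount
    canonical_backends: List[str]  # subset used for canonical containers
    shadow_backends: List[str]     # subset this node writes shadows to

    def read_only_backends(self) -> List[str]:
        """Backends that are attached but in neither writable list."""
        writable = set(self.canonical_backends) | set(self.shadow_backends)
        return [b for b in self.backends if b not in writable]


def check_mount(mount: Mount) -> None:
    """Hypothetical sanity check: writable lists must be subsets of backends."""
    known = set(mount.backends)
    for name in mount.canonical_backends + mount.shadow_backends:
        if name not in known:
            raise ValueError(f"{name} is not in backends[]")


node1 = Mount(
    backends=["/m/pana0", "/bb_n1", "/bb_n2"],
    canonical_backends=["/m/pana0"],
    shadow_backends=["/bb_n1"],
)
check_mount(node1)
```

here /bb_n2 is attached on node1 (so a metalink to it can be looked up) but is never picked as a write target, since it is in neither writable list.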

chuck

brettkettering · 2013-10-16T14:18:05Z

I think we need to establish well-defined requirements for what burst buffer mode is in PLFS. Then, we need to design an implementation that makes it overt and supported. We don't want to rely on hiding it under the covers. We want the person who defines the PLFS mounts to be able to specify a mount with the supported components (posix, glibc, hdfs, burst buffer, etc.) that create the functionality needed for a given installation.
