Burst Buffer mode broken #314

Open
johnbent opened this issue Sep 18, 2013 · 2 comments

@johnbent
Member

All,

The IOStore code (while awesome) sadly broke burst buffer mode. IOStore currently rejects any path that isn't predefined in the plfsrc. Burst buffer mode works by storing shadow paths in metalinks. So one node has /bb1 as a shadow and another node has /bb2 as a shadow. Now when the first node wants to read a file, it goes to the canonical container, where it finds a metalink to /bb2. When it tries to actually do operations on /bb2, it fails because the plfsrc on the first node doesn't list /bb2.

And we don't want the first node to list /bb2 in its plfsrc. The only place that it could put it would be in canonical_backends or shadow_backends. Canonical is no good since we don't (currently) want canonical containers stored in burst buffers. Shadow is no good because we want each node to use a particular shadow (or subset), so each plfsrc needs its own shadow_backends listing only that node's subset of shadows.

One solution is to modify IOStore to allow previously unknown paths, but this means that we'll have to put the IOStore type (glib, posix, etc.) into the metalink.

Another solution is to create a new plfsrc directive called read_only_shadow_backends where we can list the other burst buffers that aren't used for writing. Then the first node will list /bb2 as a read-only shadow and the second node will list /bb1 as a read-only shadow.
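
To make the second proposal concrete, here is a minimal sketch of how a node could treat the three kinds of backends. It is not PLFS code: `NodeConfig`, `may_read`, and `may_write_shadow` are hypothetical names, and the struct fields only mirror the directive names discussed above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-node view of a plfsrc. The field names mirror the
// directives discussed above; this is not the real PLFS config parser.
struct NodeConfig {
    std::vector<std::string> canonical_backends;
    std::vector<std::string> shadow_backends;            // this node writes here
    std::vector<std::string> read_only_shadow_backends;  // proposed directive
};

static bool contains(const std::vector<std::string>& v, const std::string& p) {
    return std::find(v.begin(), v.end(), p) != v.end();
}

// A backend is usable for reads if it is listed anywhere in the config.
bool may_read(const NodeConfig& c, const std::string& path) {
    assert(!path.empty());
    return contains(c.canonical_backends, path) ||
           contains(c.shadow_backends, path) ||
           contains(c.read_only_shadow_backends, path);
}

// New shadow data only ever goes to the node's own shadow backends.
bool may_write_shadow(const NodeConfig& c, const std::string& path) {
    assert(!path.empty());
    return contains(c.shadow_backends, path);
}
```

With /bb1 in `shadow_backends` and /bb2 in `read_only_shadow_backends`, the first node could follow a metalink to /bb2 for reads but would still never place new shadow data there.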

@chuckcranor
Contributor

I looked at this a bit. If we've got:

node1: {canonical=/m/pana0,shadow=/bb_n1}
node2: {canonical=/m/pana0,shadow=/bb_n2}

if node1 does a read and gets a metalink in the canonical container in
/m/pana0 that points to /bb_n2, then what is the read supposed to do?

should the data in a node's burst buffer be available to the other
remote nodes prior to the async transfer completing?

notes on the current code path:

  1. it does support putting IOStore type in the metalink, but for
    posix mounts it optimizes the "posix:" out to save space and
    maintain backwards compat with pre-IOStore plfs. if it put
    a pvfs, hdfs, or iofs shadow metalink in, then it would have
    the prefix.

  2. the failure you are going to get would be:
    readMetalink() gets the metalink
    - parses the metalink and calls
    plfs_phys_backlookup(cp, pmnt, backout, NULL) to look it up

    that plfs_phys_backlookup() is going to fail because the backend
    isn't listed in plfsrc.

basically it is trying to map the metalink back to a specific plfs_backend
in the given PlfsMount and not finding it.
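
a rough sketch of the two notes above, to show where the lookup falls over. this is not the real readMetalink() or plfs_phys_backlookup() code: the types, the function names, and the "type:path" string format are assumptions made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are not the real PLFS types.
struct BackendSpec {
    std::string type;  // e.g. "posix", "pvfs", "hdfs"
    std::string path;
};

struct MountView {
    std::vector<BackendSpec> backends;  // everything listed in plfsrc
};

// Split an assumed "type:path" target. A target starting with '/' carries
// no prefix and is treated as posix, as described in note 1.
BackendSpec parse_metalink_target(const std::string& target) {
    assert(!target.empty());
    if (target[0] == '/') return {"posix", target};
    const std::size_t colon = target.find(':');
    if (colon == std::string::npos) return {"posix", target};
    return {target.substr(0, colon), target.substr(colon + 1)};
}

// Return the index of the listed backend that contains the target,
// or -1 when the mount does not list one (the failure in note 2).
int back_lookup(const MountView& m, const std::string& target) {
    const BackendSpec want = parse_metalink_target(target);
    for (std::size_t i = 0; i < m.backends.size(); ++i) {
        const BackendSpec& b = m.backends[i];
        if (b.type != want.type) continue;
        const std::size_t n = b.path.size();
        // Match only at a path component boundary, so that "/bb_n1"
        // does not accidentally match "/bb_n10/...".
        if (want.path.compare(0, n, b.path) == 0 &&
            (want.path.size() == n || want.path[n] == '/')) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

on node1, a target under /bb_n2 returns -1 here because only /m/pana0 and /bb_n1 are listed, which is the burst buffer failure in miniature.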

it is part of the code that lets plfs run multiple logical mount points
at the same time with non-POSIX filesystems (i.e. filesystems that require
you to "attach" to them before using them).

the code assumes that the Metalink is pointing to something that is in
the current PlfsMount, and thus its plfs_backend has already been allocated
and properly attached to (so there is no further init needed to perform
I/O). So even if we hit a Metalink with "pvfs://foo/bar/we/have/not/seen"
in it we wouldn't be able to do I/O to it because it wouldn't be attached
(pvfs client may not even be init'd).

i'm thinking it is not entirely a good idea to let PLFS do backend I/O
to filesystems not listed in plfsrc anyway, since you can easily get into
a case where you've got a bad plfsrc and don't even know it. so the option
of listing a read-only shadow backend seems like the way to go to make
this work.

internally, the way it could work is that these backends would be
listed in PlfsMount->backends[] array, but not appear in either
PlfsMount->shadow_backends[] nor PlfsMount->canonical_backends[].
there are prob some sanity checks in insert_mount_point that would
have to get updated.
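
a minimal sketch of that layout, assuming the role lists hold indices into the full backend array. `MountLayout`, `roles_are_valid`, and `read_only_backends` are hypothetical names; the real PlfsMount fields and the checks in insert_mount_point may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the proposed layout; not the real PlfsMount.
struct MountLayout {
    std::vector<std::string> backends;            // every attached backend
    std::vector<std::size_t> canonical_backends;  // indices into backends
    std::vector<std::size_t> shadow_backends;     // indices into backends
};

// Sanity check: every role entry must refer to a listed backend.
bool roles_are_valid(const MountLayout& m) {
    for (std::size_t idx : m.canonical_backends)
        if (idx >= m.backends.size()) return false;
    for (std::size_t idx : m.shadow_backends)
        if (idx >= m.backends.size()) return false;
    return true;
}

// Backends that appear in neither role list are the read-only ones:
// attached and usable for reads, but never chosen for new data.
std::vector<std::string> read_only_backends(const MountLayout& m) {
    assert(roles_are_valid(m));
    std::vector<bool> has_role(m.backends.size(), false);
    for (std::size_t idx : m.canonical_backends) has_role[idx] = true;
    for (std::size_t idx : m.shadow_backends) has_role[idx] = true;

    std::vector<std::string> result;
    for (std::size_t i = 0; i < m.backends.size(); ++i)
        if (!has_role[i]) result.push_back(m.backends[i]);
    return result;
}
```

the point of the sketch is that a backend with no role would no longer count as a config error; it would just be a read-only one.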

chuck


@brettkettering
Contributor

I think we need to establish well-defined requirements for what burst buffer mode is in PLFS. Then, we need to design an implementation that makes it overt and supported. We don't want to rely on hiding it under the covers. We want the person who defines the PLFS mounts to be able to specify a mount with the supported components (posix, glibc, hdfs, burst buffer, etc.) that create the functionality in a PLFS mount that is needed for a given installation.
