
As a Singly Sysadmin, I want to allocate storage for Lockers

othiym23 edited this page Aug 16, 2011 · 6 revisions

For a Singly-hosted locker, locker owners' personal data must be kept private, and storage must be robust and reliable. As systems managers, we also need that data to be distributed and available, so that it is not tied to any particular locker host. To enable this, we need to start building a secure, distributed storage cluster that satisfies the following criteria:

  • All of a locker owner's personal, mutable data (which should be confined to the Me/ subdirectory) must not be stored to disk in cleartext.
  • To avoid rewriting the entire storage strategy for lockers, the storage for Me/ should look like a POSIX filesystem.
  • Locker data thus stored must be accessible from multiple locker hosts.
  • Permissions, capabilities, and authorization remain open questions, both within and between lockers.
  • Managing all of the encryption keys for secured storage (whether via PKI or other means) remains an open question.

These requirements thus entail:

  • A distributed file system.
  • Some form of encryption that individually protects each locker's Me/ directory.

(First) Steps

  • Write simple scripts to create new eCryptfs-secured Me/ directories for each new locker.
  • Create a simple (HIGHLY INSECURE) system to store credentials used for accessing secure storage.
  • Set up a minimal GlusterFS cluster.
  • Write scripts to migrate legacy lockers into distributed secure storage.
  • Modify locker start scripts to load credentials into the kernel keyring before starting the locker.
  • Modify locker creation scripts to create new storage directories for each locker.
  • Modify Integral to pass necessary credential information to locker creation script.
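As a rough illustration of the mount step in the list above, the sketch below builds (but does not run) the commands a locker start script might issue once a key is already in the kernel keyring. Everything specific here is an assumption rather than part of the plan: the paths /mnt/lockers and /srv/lockers, the Me.encrypted directory name, the helper names, and the choice of AES with 16-byte keys. The mount options should be checked against the installed ecryptfs-utils before use.

```python
from pathlib import PurePosixPath

# Hypothetical layout: neither path appears in the plan above.
GLUSTER_MOUNT = PurePosixPath("/mnt/lockers")  # GlusterFS volume mount point
LOCKER_ROOT = PurePosixPath("/srv/lockers")    # where locker processes run


def storage_paths(locker_id):
    """Return (encrypted lower directory, cleartext Me/ mount point)."""
    lower = GLUSTER_MOUNT / locker_id / "Me.encrypted"
    upper = LOCKER_ROOT / locker_id / "Me"
    return lower, upper


def mount_commands(locker_id, key_sig):
    """Build the commands for mounting one locker's Me/ directory.

    key_sig is the signature of a key that has already been loaded into
    the kernel keyring (for example by ecryptfs-add-passphrase). Nothing
    is executed here; the caller decides how to run the commands.
    """
    lower, upper = storage_paths(locker_id)
    options = ",".join([
        "ecryptfs_sig=" + key_sig,   # which keyring key to use
        "ecryptfs_cipher=aes",       # assumed cipher
        "ecryptfs_key_bytes=16",     # assumed key length
    ])
    return [
        ["mkdir", "-p", str(lower), str(upper)],
        # -i skips the interactive mount.ecryptfs helper, so the kernel
        # options above are used as given.
        ["mount", "-i", "-t", "ecryptfs", "-o", options,
         str(lower), str(upper)],
    ]
```

Calling `mount_commands("locker42", sig)` returns two argument lists suitable for `subprocess.run`. Keeping the encrypted lower directory on the distributed volume and the cleartext mount point on the locker host means only ciphertext reaches shared disk, which is the first criterion above.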

Stakeholders

  • Simon
  • Chris
  • Temas

Dependencies

Potential issues

  • Key management is completely insecure.
  • How reliable is GlusterFS?
  • The current method doesn't escrow credentials at all -- losing the credentials means the locker's content is lost.
  • CPU overhead of decryption might affect the number of lockers deployed per locker host.
  • This will distribute the data, but doesn't count as backup -- a bad day at Amazon could wipe everything out.
  • This might result in some more scope creep into testing infrastructure.

Acceptance Criteria

  • The existing lockers will be migrated to distributed, encrypted lockers.
  • New lockers created through Integral will be encrypted and distributed by default.
  • All architecture and scripts are documented on the wiki's storage page.
  • TESTING CODE FOR THE SCRIPTS IS CHECKED IN WITH THE SCRIPTS.
  • (Optional) Stats will be available to Integral representing the size occupied by the lockers in toto, as well as per locker.
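
The optional stats criterion could be met with something as small as the sketch below, which sums file sizes per locker and in total. It assumes a layout with one directory per locker under a common storage root, which this page does not specify, and the function names are illustrative. If it is pointed at the encrypted lower directories, the figures will include eCryptfs's per-file overhead.

```python
import os


def tree_size(path):
    """Total size in bytes of regular files under path (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def locker_usage(storage_root):
    """Return per-locker sizes and the overall total for storage_root.

    Assumes each immediate subdirectory of storage_root is one locker.
    """
    per_locker = {}
    for entry in sorted(os.scandir(storage_root), key=lambda e: e.name):
        if entry.is_dir(follow_symlinks=False):
            per_locker[entry.name] = tree_size(entry.path)
    return {"lockers": per_locker, "total": sum(per_locker.values())}
```

The returned dictionary can be serialized as JSON for Integral to consume. Walking a large distributed volume can be slow, so a real deployment would probably run this periodically and cache the result.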