Data model inversion? #44

amalloy · 2010-09-28T22:13:49Z

It's possible I've got this all wrong, but I've thought about it for a while and I'm reasonably certain I understand Cassandra. In "WTF is a SuperColumn", the data model is described as Keyspace.SuperColumnFamily[Row][Super][Column] = value. Pandra does not have a Row type; instead each SCF has a keyID, which is the Row index (first key in brackets). This means that, in order to add rows to a SuperColumnFamily, we must create a NEW SuperColumnFamily object for every entry, with the same keyID (since Pandra uses that to mean the second member of the dotted pair) but a different name. This is backward: there should be a Row class or some such, which does what SCF does now (hold SuperColumns), and the SuperColumnFamily class should be repurposed to be solely a Map<String, Row>.
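To make the proposed shape concrete, here is a minimal sketch of a row container. The class names `Row` and `SuperColumnFamily`, and the `set()` and `row()` helpers, are hypothetical and chosen for illustration; they are not Pandra's current API. The only point is that one SuperColumnFamily object holds many rows, so that SCF[Row][Super][Column] = value maps directly onto the objects.

```php
<?php
// Hypothetical classes for illustration only; not Pandra's current API.

// A Row holds SuperColumns: superName => (columnName => value).
class Row {
    public $superColumns = array();

    function set($super, $column, $value) {
        $this->superColumns[$super][$column] = $value;
    }
}

// The SuperColumnFamily is then just a map of row key => Row.
class SuperColumnFamily {
    public $name;
    public $rows = array();

    function __construct($name) {
        $this->name = $name;
    }

    // Fetch the Row for $key, creating it on first use.
    function row($key) {
        if (!isset($this->rows[$key])) {
            $this->rows[$key] = new Row();
        }
        return $this->rows[$key];
    }
}

// Sample data is made up.
$scf = new SuperColumnFamily('Addresses');
$scf->row('alice')->set('home', 'city', 'Sydney');
$scf->row('bob')->set('work', 'city', 'Portland');

// Reads as SCF[Row][Super][Column] = value
echo $scf->rows['alice']->superColumns['home']['city'], "\n";
```

With this layout, adding a row is a call on the existing SuperColumnFamily object rather than the construction of a new one.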

More than just a naming issue, this implementation has technical implications. Specifically, in PandraSuperColumnFamily::save(), there is a comment /* @todo there must be a better way */, followed by looping over all of the SuperColumn children. There is a better way! The Thrift method batch_mutate takes a keyspace, and a map<string, map<string, list<Mutation>>>. Mutation, meanwhile, can describe a SuperColumn insertion, which itself is a list of Column insertions. Pandra is not making use of all of these levels of hierarchy: every save() call in Pandra's API could be implemented as a single Thrift call, with no need for multiple requests.

My rough sketch of an implementation would be:

class SCF {
  function save() {
    // batch_mutate wants: row key => column family name => list of mutations
    $mutationMap = array();
    foreach ($this->getRows() as $key => $superCol) {
      $mutationMap[$key] = array($this->name => array($superCol->getMutation())); // see below
    }
    // assumes $client is the connected Thrift client
    $client->batch_mutate($this->keyspace, $mutationMap); // one call saves the whole SCF
  }
}
class SuperColumn {
  function getMutation() {
    // ThriftColumn, ThriftSuperColumn and ThriftMutation stand in for the generated Thrift classes
    $cols = array();
    foreach ($this->getColumns() as $name => $value) {
      $cols[] = new ThriftColumn($name, $value);
    }
    return new ThriftMutation(INSERT, new ThriftSuperColumn($this->name, $cols));
  }
}

Obviously this glosses over quite a few details, like deletions, but I think the structure is right. I definitely sympathize with your erroneous (but see disclaimer at top!) implementation: even when you know exactly what to do it's hard to think about SuperColumnFamilies!
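For reference, the nesting that batch_mutate expects can be sketched without a Cassandra client at all. This is a minimal illustration, assuming the mutation map is keyed by row key on the outside and column family name on the inside, as in the Cassandra 0.6 Thrift interface. Plain arrays stand in for the Thrift Column, SuperColumn and Mutation structs, and `buildMutationMap` and the sample data are made up for the example.

```php
<?php
// Plain arrays stand in for the Thrift structs (Column, SuperColumn, Mutation),
// so this runs without a Cassandra client. buildMutationMap is a made-up name.

function buildMutationMap($cfName, array $rows) {
    $map = array();
    foreach ($rows as $key => $superColumns) {
        $mutations = array();
        foreach ($superColumns as $superName => $columns) {
            $cols = array();
            foreach ($columns as $name => $value) {
                $cols[] = array('name' => $name, 'value' => $value);
            }
            // One mutation per SuperColumn, carrying all of its Columns.
            $mutations[] = array(
                'super_column' => array('name' => $superName, 'columns' => $cols),
            );
        }
        // Row key on the outside, column family name on the inside.
        $map[$key] = array($cfName => $mutations);
    }
    return $map;
}

// Sample data: row key => super column name => (column name => value).
$rows = array(
    'alice' => array('home' => array('city' => 'Sydney', 'zip' => '2000')),
    'bob'   => array('work' => array('city' => 'Portland')),
);

$map = buildMutationMap('Addresses', $rows);

// $map is what a single batch_mutate call would receive for the whole SCF.
print_r(array_keys($map));
```

However many rows and SuperColumns there are, the result is one nested structure, which is why a single request is enough.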

mjpearson · 2010-10-07T00:00:49Z

Thanks for the deeper level of thought :) A row (or keyspace) container would be best; it would also be great to tie authentication and connection pooling into core. I'm fine to do this, but did you have any other specifics in mind?

amalloy · 2010-10-07T00:03:41Z

No specifics, particularly. I've just started to get into Pandra's internals, so I don't have enough insight to be helpful there; I really just noticed the weird API because it conflicted with what I knew about Cassandra. I'll let you know if I have any clever ideas while I'm working on 0.7.
