Data model inversion? #44

Open
amalloy opened this issue Sep 28, 2010 · 2 comments

Comments

@amalloy

amalloy commented Sep 28, 2010

It's possible I've got this all wrong, but I've thought about it for a while and I'm reasonably certain I understand Cassandra. In "WTF is a SuperColumn", the data model is described as Keyspace.SuperColumnFamily[Row][Super][Column] = value. Pandra does not have a Row type; instead each SCF has a keyID, which is the Row index (first key in brackets). This means that, in order to add rows to a SuperColumnFamily, we must create a NEW SuperColumnFamily object for every entry, with the same keyID (since Pandra uses that to mean the second member of the dotted pair) but a different name. This is backward: there should be a Row class or some such, which does what SCF does now (hold SuperColumns), and the SuperColumnFamily class should be repurposed to be solely a Map<String, Row>.
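
To make the proposed shape concrete, here is a minimal sketch of the layering described above. The class names `Row`, `SuperColumn`, and `SuperColumnFamily` and their methods are hypothetical illustrations, not Pandra's actual classes; the only thing taken from the text is the nesting SuperColumnFamily[Row][Super][Column] = value.

```php
<?php
// Hypothetical classes for illustration only; not Pandra's actual API.

// A SuperColumn holds named columns: column name => value.
class SuperColumn {
    public $name;
    public $columns = array();

    public function __construct($name) {
        $this->name = $name;
    }

    public function set($columnName, $value) {
        $this->columns[$columnName] = $value;
        return $this;
    }
}

// A Row holds SuperColumns (the job the SCF class is described as doing today).
class Row {
    public $key;
    public $superColumns = array();

    public function __construct($key) {
        $this->key = $key;
    }

    // Return the named SuperColumn, creating it on first access.
    public function superColumn($name) {
        if (!isset($this->superColumns[$name])) {
            $this->superColumns[$name] = new SuperColumn($name);
        }
        return $this->superColumns[$name];
    }
}

// The SuperColumnFamily is then solely a map of row key => Row.
class SuperColumnFamily {
    public $name;
    public $rows = array();

    public function __construct($name) {
        $this->name = $name;
    }

    // Return the Row for this key, creating it on first access.
    public function row($key) {
        if (!isset($this->rows[$key])) {
            $this->rows[$key] = new Row($key);
        }
        return $this->rows[$key];
    }
}

// SuperColumnFamily[Row][Super][Column] = value
$scf = new SuperColumnFamily('Users');
$scf->row('alice')->superColumn('address')->set('city', 'Portland');
$scf->row('bob')->superColumn('address')->set('city', 'Austin');
```

With this shape, adding a row is a matter of adding a key to one SuperColumnFamily object rather than creating another SuperColumnFamily.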

More than just a naming issue, this implementation has technical implications. Specifically, in PandraSuperColumnFamily::save(), there is a comment /* @todo there must be a better way */, followed by looping over all of the SuperColumn children. There is a better way! The Thrift method batch_mutate takes a keyspace, and a map<string, map<string, list<Mutation>>>. Mutation, meanwhile, can describe a SuperColumn insertion, which itself is a list of Column insertions. Pandra is not making use of all of these levels of hierarchy: every save() call in Pandra's API could be implemented as a single Thrift call, with no need for multiple requests.

My rough sketch of an implementation would be:

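class SCF {
  function save() {
    // batch_mutate expects map<rowKey, map<columnFamilyName, list<Mutation>>>,
    // so the row key is the outer key and this SCF's name is the inner key.
    $mutationMap = array();
    foreach ($this->getRows() as $key => $superCol) {
      $mutationMap[$key] = array($this->name => array($superCol->getMutation())); // see below
    }
    // One request for the whole SCF (batch_mutate also takes a consistency level, omitted here).
    $client->batch_mutate($this->keyspace, $mutationMap);
  }
}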
class SuperColumn {
  function getMutation() {
    $cols = array();
    foreach ($this->getColumns() as $name => $value) {
      $cols[] = new ThriftColumn($name, $value);
    }
    return new ThriftMutation(INSERT, new ThriftSuperColumn($this->name, $cols));
  }
}

Obviously this glosses over quite a few details, like deletions, but I think the structure is right. I definitely sympathize with your erroneous (but see disclaimer at top!) implementation: even when you know exactly what to do it's hard to think about SuperColumnFamilies!
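
As a rough illustration of how deletions might fit into the same single call, the sketch below builds the whole mutation map with plain PHP arrays standing in for the Thrift structs. The function names (`insertMutation`, `deleteMutation`, `buildMutationMap`) and the convention of passing `null` to mean "delete this SuperColumn" are assumptions made for the example. The field names follow my understanding of the Cassandra Thrift structs (`column_or_supercolumn`, `super_column`, `deletion`), and the map is keyed by row key first, then column family name; check both against the interface file for the version in use.

```php
<?php
// Plain arrays stand in for Thrift structs; all function names are hypothetical.

// Insertion: a Mutation wrapping a SuperColumn, which is a list of Columns.
function insertMutation($superName, array $columns, $timestamp) {
    $cols = array();
    foreach ($columns as $name => $value) {
        $cols[] = array('name' => $name, 'value' => $value, 'timestamp' => $timestamp);
    }
    return array('column_or_supercolumn' => array(
        'super_column' => array('name' => $superName, 'columns' => $cols),
    ));
}

// Deletion: a Mutation that removes a whole SuperColumn as of $timestamp.
function deleteMutation($superName, $timestamp) {
    return array('deletion' => array(
        'timestamp' => $timestamp,
        'super_column' => $superName,
    ));
}

// Build map<rowKey, map<columnFamilyName, list<Mutation>>> for one SCF.
// $rows maps rowKey => (superColumnName => columns array, or null to delete).
function buildMutationMap($cfName, array $rows, $timestamp) {
    $map = array();
    foreach ($rows as $rowKey => $supers) {
        $mutations = array();
        foreach ($supers as $superName => $columns) {
            $mutations[] = ($columns === null)
                ? deleteMutation($superName, $timestamp)
                : insertMutation($superName, $columns, $timestamp);
        }
        $map[$rowKey] = array($cfName => $mutations);
    }
    return $map;
}

$map = buildMutationMap('Users', array(
    'alice' => array('address' => array('city' => 'Portland'), 'old_prefs' => null),
    'bob'   => array('address' => array('city' => 'Austin')),
), 1286000000);
```

The resulting `$map` is what would be handed to one batch_mutate call, so inserts and deletes across every row of the SCF travel in a single request.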

@mjpearson
Owner

Thanks for the deeper level of thought :) A row (or keyspace) container would be best; it would also be great to tie authentication and connection pooling into core. I'm happy to do this, but did you have any other specifics in mind?

@amalloy
Author

amalloy commented Oct 7, 2010

No specifics, particularly. I've just started to get into Pandra's internals, so I don't have enough insight to be helpful there; I really just noticed the weird API because it conflicted with what I knew about Cassandra. I'll let you know if I have any clever ideas while I'm working on 0.7.
