Sync #350

Closed
dannylamb opened this issue Aug 26, 2016 · 27 comments

Comments

@dannylamb
Contributor

dannylamb commented Aug 26, 2016

We need to pull it out of Alpaca and give it a project of its own. I'll try and outline what I think I'll need to make this happen.

  1. Every node (e.g. Fedora and Drupal) must produce and consume JSON-LD RDF.
  2. Every node (e.g. Fedora and Drupal) must support conditional updates.
  3. Every node (e.g. Fedora and Drupal) must hold causality tracking information per resource.
  4. Every node (e.g. Fedora and Drupal) must atomically update causality tracking information with every update.
  5. Every node (e.g. Fedora and Drupal) must emit messages on write events.

I'm thinking of using Interval Tree Clocks to handle causality. That way there can be any number of nodes, which would encourage other distributions / integrations. If that fails, my backup is just to index two vector clocks (one for Fedora and one for Drupal) in metadata on both nodes. This would be considered system metadata. The end user doesn't really need to know anything about this.

Then each node will broadcast a sync event on write operations. This message must contain the entire representation as the body, and the body must correspond to the version/vclock identified in a custom header for causality tracking. Each node that is listening will:

  1. Read the message and GET the representation from itself, extracting an ETag for optimistic locking. The causality tracking metadata will need to be included in the representation as well.
  2. Compare the causality tracking metadata to that provided in the message to generate a pre-order
  3. If the message contains a newer version, state will be updated (along with the causality tracking metadata), providing the ETag for a conditional update. If the message is stale, ignore the message and exit.
  4. If 412 is returned, retry, with some sort of strategy (every so many seconds, and start backing off for longer periods)
  5. If failure persists for a specified amount of time/retries, then publish to a dead letter queue and notify the administrator (update db table for dashboard or send an email).
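
A minimal Python sketch of the listener steps above, to make the flow concrete. It assumes plain vector clocks (dicts of node name to counter) rather than Interval Tree Clocks, and uses an in-memory `Node` class as a stand-in for Fedora or Drupal. `Node`, `handle`, `PreconditionFailed` and the `"conflict"` result are hypothetical names; the outline does not say what to do with concurrent versions, so the sketch only reports them.

```python
import time
from dataclasses import dataclass, field


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 response."""


def compare(a, b):
    """Compare vector clocks a and b: 'before', 'after', 'equal' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


@dataclass
class Node:
    """In-memory stand-in for a node; the ETag is just a write counter."""
    body: str = ""
    clock: dict = field(default_factory=dict)
    etag: int = 0

    def get(self):
        return self.body, dict(self.clock), self.etag

    def put(self, body, clock, if_match):
        # Conditional update: refuse if someone else wrote since our GET.
        if if_match != self.etag:
            raise PreconditionFailed()
        self.body, self.clock = body, dict(clock)
        self.etag += 1


def handle(node, msg_body, msg_clock, dead_letters,
           max_retries=5, base_delay=0.01, sleep=time.sleep):
    for attempt in range(max_retries):
        _, local_clock, etag = node.get()          # step 1
        order = compare(msg_clock, local_clock)    # step 2
        if order in ("before", "equal"):
            return "ignored"                       # stale message
        if order == "concurrent":
            return "conflict"                      # assumption: not covered above
        try:
            node.put(msg_body, msg_clock, if_match=etag)  # step 3
            return "applied"
        except PreconditionFailed:
            sleep(base_delay * 2 ** attempt)       # step 4: exponential backoff
    dead_letters.append((msg_body, msg_clock))     # step 5
    return "dead-lettered"
```

Passing `sleep` as a parameter is only there so the backoff can be skipped in tests.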
@ruebot
Member

ruebot commented Aug 26, 2016

I'm happy to do the git-fu breaking apart portion of it, if that helps things along.

...is this what we're going to call Salmon?

@acoburn
Contributor

acoburn commented Aug 26, 2016

One thing about vector clocks is that they will grow in length over time. That is, they will become arbitrarily long. It is also possible to compact these vclocks under certain conditions, making their length considerably shorter. We should think about a strategy for handling this.
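
To illustrate the growth, here is a small Python sketch, assuming a vector clock is just a dict with one counter per writer. The writer names are made up. It also shows why dropping entries carelessly is risky: the pruned clock no longer appears to descend from the history it actually saw.

```python
def increment(clock, node_id):
    """Return a copy of the clock with node_id's counter bumped by one."""
    new = dict(clock)
    new[node_id] = new.get(node_id, 0) + 1
    return new


def merge(a, b):
    """Pointwise maximum of two clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def descends(a, b):
    """True if clock a has seen everything clock b has."""
    return all(a.get(n, 0) >= count for n, count in b.items())


# Every distinct writer adds an entry that never goes away on its own.
clock = {}
for writer in ["fedora", "drupal", "worker-1", "worker-2"]:
    clock = increment(clock, writer)

# Naive pruning: drop one writer's entry to shorten the clock.
pruned = {n: c for n, c in clock.items() if n != "worker-1"}

print(len(clock))               # one entry per writer
print(descends(pruned, clock))  # False, although the history is the same
```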

@acoburn
Contributor

acoburn commented Aug 26, 2016

I wonder if the PROV ontology would be helpful in describing the v-clocks (which would be a serialization of the poset used for handling conflicts)

@dannylamb
Contributor Author

@acoburn I'll have to give that a read. I thought I'd never find an ontology that I could use for causality tracking!

Also, it's my understanding that implementing this with interval tree clocks would eliminate the need for pruning, which apparently can have some drastic side effects if you're not careful.

@acoburn
Contributor

acoburn commented Aug 26, 2016

@dannylamb it is interesting to note that at least one of the PROV documents that made it to W3C recommendation status explicitly calls out lamport clocks and all the things we discussed yesterday on IRC.

@acoburn
Contributor

acoburn commented Aug 26, 2016

There are clearly practical and implementation considerations here, but PROV would at least give us a place to start for modeling this. And I'd note that I've done some work with both prov and premis w/r/t serialization of event-based data, and PROV is a way more flexible model.

@acoburn
Contributor

acoburn commented Aug 26, 2016

@dannylamb that's fabulous if we don't have to do pruning! So, for the interval tree clocks, are you planning to use the C, the Java or the Erlang implementation? :-) (my three favorite languages)

@dannylamb
Contributor Author

I don't know how well the three implementations interoperate, but I was thinking of using Java, and if we need it in Drupal land we could make a PHP extension using the C library.

If they can't understand each other's serializations, I'll have to go back to the drawing board.

@acoburn
Contributor

acoburn commented Aug 26, 2016

@dannylamb oh please don't suggest writing a custom PHP extension :-) Unless the C interface has gotten radically better in the last few years, the last time I did that, it was like donating a pound of flesh to the poorly-documented-API-gods.

@dannylamb
Contributor Author

dannylamb commented Aug 26, 2016

@acoburn That bad eh? Never tried, but sniffed around on the net and it didn't look like it was going to be easy.

I know that the comparison and merging of the clocks can be done in the middle. But could the creation of new clocks and their updates be done outside of Drupal and Fedora? I don't know how this would work without the clocks getting updated atomically with each resource update (and it living with each representation).

@dannylamb
Contributor Author

Then we could just use one implementation

@acoburn
Contributor

acoburn commented Aug 26, 2016

@dannylamb this sounds like a job for zookeeper! 🚀

@dannylamb
Contributor Author

@acoburn Yes, we should definitely stick to a proven tool for handling this sort of thing. I'm looking more into it, since Zookeeper opens up a lot of options.

My first instinct is to use it to store commands and have those get distributed out and consumed. But given the size limitations per znode, we'd have to find some way to enrich messages with large binaries for things like PUTs.

We could also use two-phase commit to do transactions together, I suppose. If we need that sort of thing.

A lot of the recipes seem to be built around barriers and locks, and I'm specifically trying to avoid blocking. So it's a lot to take in...

Anyone out there with experience? I know you've dealt with it some too, @DiegoPino

@ruebot ruebot added the Alpaca label Aug 29, 2016
@acoburn
Contributor

acoburn commented Aug 29, 2016

@dannylamb while my thoughts on this are not very well formed, I'd be really careful about putting any substantial content in zk (or similar distributed consensus engine). E.g. you could use the Fedora path as an ephemeral zk node (or a hash of the path so you don't have to worry about nested znodes) and for the znode content, I'd think that just the value of the interval clock would be sufficient. In any case, I wouldn't put binary objects into zk.
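
A rough Python sketch of the "hash of the path" idea, just to show the shape of it. The `/islandora/sync` root and the `znode_for` name are made up, and SHA-256 is only one possible choice of hash. Nothing here talks to ZooKeeper; it only derives a flat znode name.

```python
import hashlib


def znode_for(fedora_path, root="/islandora/sync"):
    """Derive a flat znode path from a (possibly deeply nested) Fedora path."""
    digest = hashlib.sha256(fedora_path.encode("utf-8")).hexdigest()
    # The hex digest contains no "/", so every resource is a direct child of root.
    return f"{root}/{digest}"


print(znode_for("/rest/collections/a/objects/b"))
```

The znode content would then hold only the serialized clock value, never the resource itself.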

@DiegoPino
Contributor

@acoburn @daniel-dgi @whikloj @ruebot @br2490 @bryjbrown I wonder what the issues would be if we directly used a key-value store that implements version vectors (or even more complex concurrent update logic) under the hood to manage our needs. I'm speaking about systems like Riak: http://basho.com/posts/technical/vector-clocks-revisited-part-2-dotted-version-vectors/

And excuse my ignorance (I'm really ignorant about this, just searching for the simplest way of having a working prototype here).
I was thinking: what if we just "put" key->value pairs into Riak (each source does this, Drupal and Fedora; key -> Fedora resource path, value -> you choose)? Internally Riak will apply version vectors for those updates, right? In case of conflict it will also keep history. So instead of handling sync, version vectors, pre-ordering, etc. in our system, we delegate to another one (which could even go unused if we don't care about bi-directional sync for very simple repos?) http://docs.basho.com/riak/kv/2.1.4/developing/usage/conflict-resolution/

Also, this could avoid putting version vector info in our resources. I'm kinda not convinced about putting that info into RDF.

Maybe I'm not understanding the problem, which is very possible.

@acoburn
Contributor

acoburn commented Aug 29, 2016

Just an FYI, I've been using Riak with our Fedora repo for three years (and in that time, the cluster has never been down despite numerous single-node failures). The only "hard" thing about Riak (though this applies to ZK, too) is that it really only makes sense to install it on a cluster (i.e. N >= 5). And so people installing a system may think: "what? I need five servers just to install a database?". OTOH, one may not need "full riak" for this and just go with riak-core and webmachine, but then you start talking about actually writing Erlang/OTP... (and if you think OSGi has a learning curve)

@dannylamb
Contributor Author

Hrm... I meant storing commands in zk and finding a way to coordinate their execution. It was never my intent to use it as a datastore. Or necessarily to introduce another.

Just storing the causality tracking info seems great, but I can't figure out how to make sure we update that info in zk and content in Fedora/Drupal without introducing race conditions.

Guess I got a little sidetracked with all you can do in zk.

And yeah, gonna be a hard sell for multiple machines. Even harder than a sell for erlang (which I would love. but that's crazy old me).

@dannylamb
Contributor Author

Hrm again... I'm guessing we could just lock on the znode while doing our updates. That sort of goes against my earlier intent to avoid locks. But it's an option.

Of course, so is saying 'screw it' and just flush to Fedora. But I'd like to keep feeling this out.

@dannylamb
Contributor Author

dannylamb commented Aug 30, 2016

Thoughts on zk and sync after sleeping on it:

Use zk as distributed lock service. One lock per resource in fedora/drupal. Store causality tracking information in the znode for the resource. Probably should use curator if possible.

We'd handle incoming writes for both Drupal and Fedora like so:

  • Using HTTP middlewares, intercept incoming write requests and call out to zk, obtaining the lock for the resource and acquiring the causality tracking information from the znode for the resource.
  • If the causality tracking information is contained in a custom request header (X-Islandora-Sync or whatevs), compare it to what was extracted from zk.
    • Ignore stale messages (not sure what return code to give this, 204?)
    • Return 409 on concurrent messages
    • Let new messages pass through to intended receiver
  • Update causality tracking in resource's znode.
  • No matter what, release the lock in zk.
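
Here is a rough Python sketch of that write path, under loud assumptions: `Coordinator` is an in-process stand-in for zk (one `threading.Lock` and one stored clock per resource), the clocks are plain vector clocks as dicts, and `forward` is whatever passes the request on to Fedora or Drupal. A request without the header is treated as a fresh local write that bumps the local node's counter, which is my reading rather than something spelled out above.

```python
import threading


class Coordinator:
    """In-process stand-in for zk: a lock and a stored clock per resource."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}
        self.clocks = {}

    def lock(self, resource):
        with self._guard:
            return self._locks.setdefault(resource, threading.Lock())


def classify(incoming, stored):
    """Return 'stale', 'new' or 'concurrent' for an incoming vector clock."""
    nodes = set(incoming) | set(stored)
    if all(incoming.get(n, 0) <= stored.get(n, 0) for n in nodes):
        return "stale"  # also covers an identical clock
    if all(stored.get(n, 0) <= incoming.get(n, 0) for n in nodes):
        return "new"
    return "concurrent"


def intercept(coord, resource, incoming_clock, forward, local_node="fedora"):
    """Middleware body: returns an HTTP-style status code."""
    with coord.lock(resource):  # released no matter what, even on exceptions
        stored = coord.clocks.get(resource, {})
        if incoming_clock is None:
            # No sync header (assumption): a local write, so bump our own counter.
            new_clock = dict(stored)
            new_clock[local_node] = new_clock.get(local_node, 0) + 1
        else:
            verdict = classify(incoming_clock, stored)
            if verdict == "stale":
                return 204
            if verdict == "concurrent":
                return 409
            new_clock = dict(incoming_clock)
        status = forward()                    # let the write through
        coord.clocks[resource] = new_clock    # update causality tracking
        return status
```

With a real zk client the `with` block would become acquiring and releasing a distributed lock, but the ordering of the steps would stay the same.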

Sync would work like this:

  • Write operations for both Drupal and Fedora will emit messages to individual queues.
  • Messages will contain the state of the resource and causality tracking information
  • One consumer will read from the queue Drupal publishes, and send updates to Fedora. One consumer will read from the queue Fedora publishes and send updates to Drupal. The general algorithm they'll follow is:
    • Call out to ZooKeeper, obtaining the lock for the resource and acquiring the causality tracking information from the znode for the resource.
    • Compare the causality tracking information in the message to what was extracted from zk.
      • Ignore stale messages
      • Resolve concurrent messages by synchronizing using a simple rule like Fedora wins or Drupal always wins
      • Let new messages pass through to intended receiver
  • Update causality tracking in resource's znode.
  • No matter what, release the lock in zk.
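
A small Python sketch of how a consumer could decide what to do with a message, with locking left out for brevity. It assumes vector clocks as dicts and a message shaped like `{"source": ..., "clock": ..., "state": ...}`; that shape, the function names and the choice to store the merged clock when the winner's write is applied are all assumptions for illustration.

```python
def classify(incoming, stored):
    """Return 'stale', 'new' or 'concurrent' for an incoming vector clock."""
    nodes = set(incoming) | set(stored)
    if all(incoming.get(n, 0) <= stored.get(n, 0) for n in nodes):
        return "stale"
    if all(stored.get(n, 0) <= incoming.get(n, 0) for n in nodes):
        return "new"
    return "concurrent"


def merge(a, b):
    """Pointwise maximum of two clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def consume(message, stored_clock, winner="fedora"):
    """Decide on an action; returns (action, clock to store in the znode)."""
    verdict = classify(message["clock"], stored_clock)
    if verdict == "stale":
        return ("ignore", stored_clock)
    if verdict == "new":
        return ("apply", message["clock"])
    # Concurrent writes: resolve with a fixed rule such as "Fedora wins".
    if message["source"] == winner:
        return ("apply", merge(message["clock"], stored_clock))
    # The losing side's write is dropped; its node still has to be brought
    # back in line with the winner's state by some other step.
    return ("ignore", stored_clock)
```

For example, a Drupal message with clock `{"drupal": 1}` arriving when zk holds `{"fedora": 1}` is concurrent, and under "Fedora wins" it is ignored.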

At first I was really resistant to pessimistic locks. But I don't know how else to pin things down and make sure updates to zk data and fedora/drupal data line up without some process pulling the rug out from under us.

@acoburn acoburn added the Salmon label Sep 3, 2016
@ruebot
Member

ruebot commented Sep 3, 2016

@acoburn
Contributor

acoburn commented Sep 4, 2016

@dannylamb first, a simple question with wide ranging implications for Sync: will Fedora Resources have a 1-1 correspondence to Drupal nodes? Or will individual Drupal nodes aggregate collections of Fedora resources?

That is: is the mapping from Fedora to Drupal bijective, surjective or something else entirely (i.e. there may be resources that are entirely excluded from Drupal)?

And a second question: would it be possible to spec out the Drupal side of the API? If this is documented clearly, it will make the middleware code significantly easier to write.

@DiegoPino
Contributor

@acoburn, @dannylamb may have a different perspective here, so I will just speak for my work.

Some time ago I proposed making a 1-1 mapping from Drupal entities to Fedora resources (diverging from the first Islandora 2.x demo) because "one to many and back" has many complications:

Some of them are:

  • reduce-expand idea works a lot better when it's just one way
  • RDF resources and binary resources in Fedora are really different beasts (even in terms of the LDP REST API), and the latter can't be mapped to a file upload field in Drupal; they need a more complex structure able to capture tech metadata, etc.
  • Managing versioning on a "node" would force you to always version in fedora at the same level
  • All the different inheritance operations like WebACL would end up being "reduce" operations when mapping from Fedora to Drupal (many to one) but would result in "expand by duplicating or assuming same values" when mapping from Drupal to Fedora, which means an uneven level of detail depending on which side is the one "updating". Not sure if I explained that well.

All the work i have been doing is based on a 1 to 1 mapping which has evolved to something concrete during this Sprint.
That said, on Friday I managed to share with Jared and Danny (before killing them with boredom with my coding questions) a "Matryoshka doll of entities" I built using multiple nested types of Drupal entities that syncs using a single node (the aggregation or top-level resource; for explanatory purposes let's name this "A"). The idea, very much summarized, is this:

  • "A" links to a Drupal custom entity that models a Fedora resource (RDF source == "B").

  • "B" links to Fedora non-RDF sources ("C") that make use of media entities (plugin-based entities, "D").

    All this can be managed (UI, form ingest) on the main, first node "A". The end user or admin has the power to build "solution packs" by managing how those are nested and defining cardinalities and other constraints using Drupal tools (not invented by me!), and each Fedora resource defined in Drupal ("B" and "C") implements, in its storage class, connections to Fedora via microservices.

I see many advantages in this approach: lots of code reuse, compatibility with all Drupal contributed modules, and high flexibility. It still needs more work of course and I am slow. I'm sure people would like me to open as many pull requests as possible, I'm just taking my time.
I'm mostly doing a lot of testing still, and I do have many questions about this and would love to hear what could be useful or what pros and cons you see in this approach.

That last image is a "mashup" from multiple screen captures to allow for the whole vertical view of this working idea. Sorry for the stupid field labels, was just a test:

screen1

@acoburn
Contributor

acoburn commented Sep 4, 2016

@DiegoPino I'm fine with whatever structure makes the most sense for Drupal. If the mapping is 1-1, then all we need is a simple bijective function, which is super easy. If the mapping is more complex, then Fedora -> Drupal would involve some kind of map(fn1).reduce(fn2) while the Drupal -> Fedora would involve a (possibly recursive) flatMap(fn3).map(fn4). The latter is more involved, but entirely do-able -- in fact, it's not actually all that complex (famous last words and all...).
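
To make the two shapes concrete, here is a toy Python sketch. The record layouts (`path`, `label`, `children`) are invented for illustration and have nothing to do with the real Fedora or Drupal data models; `fn1` to `fn4` just mirror the placeholders above.

```python
from functools import reduce


def fn1(resource):
    """Map one Fedora resource to a partial set of Drupal node fields."""
    return {"paths": [resource["path"]], "labels": [resource["label"]]}


def fn2(node, partial):
    """Fold a partial into the accumulated Drupal node."""
    return {
        "paths": node["paths"] + partial["paths"],
        "labels": node["labels"] + partial["labels"],
    }


def fedora_to_drupal(resources):
    # map(fn1).reduce(fn2): many Fedora resources collapse into one node.
    return reduce(fn2, map(fn1, resources), {"paths": [], "labels": []})


def fn3(entity):
    """Recursively flatten an entity and all of its nested children."""
    yield entity
    for child in entity.get("children", []):
        yield from fn3(child)


def fn4(entity):
    """Map one flattened Drupal entity to a Fedora resource."""
    return {"path": entity["path"], "label": entity["label"]}


def drupal_to_fedora(node):
    # flatMap(fn3).map(fn4): one nested node expands into many resources.
    return [fn4(entity) for entity in fn3(node)]


node = {
    "path": "/a", "label": "A",
    "children": [{"path": "/a/b", "label": "B",
                  "children": [{"path": "/a/b/c", "label": "C"}]}],
}
resources = drupal_to_fedora(node)
print([r["path"] for r in resources])
print(fedora_to_drupal(resources))
```

Note that the round trip loses the nesting in this toy version, which is the uneven level of detail that makes the many-to-one direction harder.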

@dannylamb
Contributor Author

@acoburn Ideally, there's a one to one correspondence between Fedora resources and Drupal entities. That's the simplest for the middleware, but means we'd be making entities for containers, proxies, fcr:metadata, etc... in addition to the files and aggregations.

I'm sure inevitably we'll run across something that will challenge that assumption. Limiting things to a particular domain is foreseeable and could be controlled with a predicate like 'indexable' controls things in fcrepo-camel-toolbox. Something could be labeled as 'syncable' to make sure it propagates, otherwise it just stays put.

And the ingest flow that @DiegoPino is showing would still create the individual entities. That top level node he's creating is pretty much the resource map, which we've discussed handling differently.

So the reality is that probably both Fedora and Drupal will each have a subset of data that can be bijectively mapped to the other.

@dannylamb
Contributor Author

I've been going over this a lot, and for MVP I think we really need to tone down sync. I think auto flushing to Fedora asynchronously while allowing for a 're-indexing' of Drupal from Fedora is going to be much more manageable. Still tough, and still requires most of what we've talked about (version vectors, idempotency, etc...). And we're gonna have to test it TO DEATH.

But let's face it, we're a really small team, and this gets into some pretty advanced stuff. I wouldn't want the rest of the project to suffer just because this seems really cool.

@ruebot
Member

ruebot commented Mar 9, 2017

Resolved with #552 & MVP.

@ruebot ruebot closed this as completed Mar 9, 2017
4 participants