Michael Uzquiano edited this page Jun 28, 2014 · 12 revisions

Design notes for the Gitana Bulk API

Usage

var Gitana = require("gitana");

var t = Gitana.transactions().create();

// bind the transaction to a context
// you can either use a reference string or an object
t.for("branch://<platformId>/<repositoryId>/<branchId>");
//t.for(branch);

// create an object
t.create({
  "title": "My first article",
  "_type": "custom:article"
});

// update an object
t.update({
  "title": "Custom page",
  "_type": "web:page"
});

// delete an object by GUID
t.del("GUID1");

// delete an object
t.del({
  "_doc": "GUID2"
});

// specify the retry count
t.retryCount(3);

// start the transaction commit
// specify the callback to fire once the commit completes
t.commit(function(results) {
   // ...
});

As a first pass, the created transaction should store all JSON objects in memory until commit() is called. When commit() is called, the transaction can run through all objects and send each one in its own request. For N objects, that means up to N requests to add items to the transaction.
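
A minimal sketch of this first pass might look like the following. The class name `InMemoryTransaction` and the `toRequests()` helper are hypothetical, and mapping both create() and update() to a "write" operation is an assumption based on the request shape in the HTTP/REST section below.

```javascript
// Hypothetical first-pass transaction: every operation is held in memory
// and turned into one request body per object at commit time.
class InMemoryTransaction {
  constructor() {
    this.reference = null;
    this.operations = [];
  }

  // Bind the transaction to a reference string such as "branch://...".
  for(reference) {
    this.reference = reference;
    return this;
  }

  // Assumption: create and update both map to the "write" operation.
  create(data) { return this._queue("write", data); }
  update(data) { return this._queue("write", data); }

  // Accepts either a GUID string or an object carrying "_doc".
  del(target) {
    const data = typeof target === "string" ? { _doc: target } : target;
    return this._queue("delete", data);
  }

  _queue(operation, data) {
    this.operations.push({ header: { type: "node", operation }, data });
    return this;
  }

  // One request body per queued object: N objects yield N requests.
  toRequests() {
    return this.operations.map((op) => ({ objects: [op] }));
  }
}
```

Nothing is sent here; an actual commit() would post each body returned by `toRequests()` to the add endpoint and then trigger the commit.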

As a second pass, the transaction could still store all JSON objects in memory until commit() is called. When commit() is called, however, the objects could be grouped and sent in batches of 10, 100 or more, depending on the total size of the JSON payload.
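
The grouping step could be sketched roughly as follows. The function name `chunkObjects` and the default limits are illustrative assumptions, not values defined by the API, and `Buffer` assumes a Node.js environment.

```javascript
// Group queued objects into batches limited by object count and by
// serialized JSON size. The default limits are illustrative only.
function chunkObjects(objects, maxCount = 100, maxBytes = 256 * 1024) {
  const batches = [];
  let current = [];
  let currentBytes = 0;

  for (const obj of objects) {
    const size = Buffer.byteLength(JSON.stringify(obj), "utf8");
    const full = current.length >= maxCount || currentBytes + size > maxBytes;
    // Close the current batch when adding this object would exceed a limit.
    // An object larger than maxBytes still gets a batch of its own.
    if (current.length > 0 && full) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(obj);
    currentBytes += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each returned batch would then become the "objects" array of a single add request.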

As a third pass, the transaction could start sending chunks as objects are added, which would remove the need to hold everything in memory. Local Storage could also be used to queue objects. This is not required for the moment, but it is an intended optimization down the road.

HTTP/REST

The REST API supports five methods: creating a transaction, adding objects to it, deleting (canceling) it, committing it, and retrieving its status.

Create a Transaction

Creates a transaction. The only required parameter is a reference against which the transaction will run. For a transaction containing nodes, the reference should point to the branch. All references are written using the reference syntax:

<type>://<platformId>/<datastoreId>[/objectId1][/objectId2]

The method is invoked like this:

POST /transactions?reference=branch://{platformId}/{repositoryId}/{branchId}

The response is:

{
   "_doc": "<transactionId>",
   "container-reference": "<reference>",
   "status": "ACCUMULATING"
}
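
As an illustration, the reference string and the request path could be assembled like this. Both helper names are hypothetical, the reference follows the single-slash form used in the POST example, and percent-encoding the query parameter is an assumption (the notes show it unencoded).

```javascript
// Build a reference string from a type and an ordered list of ids,
// e.g. buildReference("branch", platformId, repositoryId, branchId).
function buildReference(type, ...ids) {
  return `${type}://${ids.join("/")}`;
}

// Build the path for the create call.
// Assumption: the reference is percent-encoded as a query parameter.
function createTransactionPath(reference) {
  return `/transactions?reference=${encodeURIComponent(reference)}`;
}
```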

A transaction is always in one of three states:

  • ACCUMULATING - the transaction is being built up. While in this state, objects can be added to the transaction and the transaction can also be deleted.
  • COMMITTING - the transaction is executing. While in this state, objects cannot be added to the transaction and the transaction cannot be deleted.
  • FINISHED - the transaction has finished committing. While in this state, objects can no longer be added and the transaction can be deleted.
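
The rules in this list can be captured in a small lookup table, which a client might use to fail early instead of sending a request the server would reject. The names `TRANSACTION_RULES` and `isAllowed` are hypothetical.

```javascript
// What each state permits, transcribed from the list above.
const TRANSACTION_RULES = Object.freeze({
  ACCUMULATING: { canAdd: true, canDelete: true },
  COMMITTING: { canAdd: false, canDelete: false },
  FINISHED: { canAdd: false, canDelete: true }
});

// Returns true when the action ("canAdd" or "canDelete") is permitted
// in the given state; throws on a state that is not one of the three.
function isAllowed(status, action) {
  const rules = TRANSACTION_RULES[status];
  if (!rules) throw new Error(`Unknown transaction status: ${status}`);
  return rules[action] === true;
}
```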

Add objects to a Transaction

Adds one or more objects to an existing transaction.

POST /transactions/{transactionId}/add

The request should look like this:

{
   "objects": [{
      "header": {
         "type": "node",
         "operation": "write" (or "delete")
      },
      "data": {
         ... data fields ...
      }
   }, ... more items ...]
}

And the response looks like:

{
   "results": [{
      "_doc": "<objectId>",
      "transactionId": "<id of the transaction instance>",
      "operation": "write" (or "delete"),
      "type": "node"
   }, ... more results ...]
}
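
Before posting, a client could sanity-check the request body against the shape shown above. This is only a sketch: `validateAddRequest` is a hypothetical helper, and it checks only what these notes state (an "objects" array, a header with a type and a "write" or "delete" operation, and a data object).

```javascript
const ADD_OPERATIONS = ["write", "delete"];

// Returns a list of human-readable problems; an empty list means the
// body matches the shape described in these notes.
function validateAddRequest(body) {
  if (!body || !Array.isArray(body.objects)) {
    return ['"objects" must be an array'];
  }
  const problems = [];
  body.objects.forEach((obj, i) => {
    const header = obj && obj.header;
    if (!header || typeof header.type !== "string") {
      problems.push(`objects[${i}]: missing header.type`);
    }
    if (!header || !ADD_OPERATIONS.includes(header.operation)) {
      problems.push(`objects[${i}]: header.operation must be "write" or "delete"`);
    }
    if (!obj || typeof obj.data !== "object" || obj.data === null) {
      problems.push(`objects[${i}]: missing data`);
    }
  });
  return problems;
}
```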

Delete a Transaction

Deletes a transaction and removes any objects associated with it.

DELETE /transactions/{transactionId}

Commit a Transaction

Starts the commit of all of the objects within a transaction to the database.

This method returns right away with a simple status response indicating that the transaction commit job has been submitted to the distributed job queue. The transaction runs in the background and may run on a different node in the cluster.

The client code is then responsible for polling the server at a future point to retrieve the full transaction status and results.

POST /transactions/{transactionId}/commit
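
The polling that the client is responsible for could be sketched as below. `waitForCommit` is a hypothetical helper, and `fetchStatus` is a caller-supplied function assumed to perform GET /transactions/{transactionId}/status and resolve with the parsed JSON. The interval and attempt limit are arbitrary defaults.

```javascript
// Poll until the transaction reports FINISHED, then resolve with the
// status body. Rejects if the limit on attempts is reached first.
async function waitForCommit(fetchStatus, transactionId, options = {}) {
  const {
    intervalMs = 1000,
    maxAttempts = 60,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
  } = options;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const body = await fetchStatus(transactionId);
    if (body.status === "FINISHED") return body;
    await sleep(intervalMs);
  }
  throw new Error(`Transaction ${transactionId} did not finish after ${maxAttempts} polls`);
}
```

Passing `fetchStatus` and `sleep` in keeps the loop independent of any particular HTTP client and makes it easy to exercise without a server.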

Get transaction status

Retrieves the status of a transaction.

GET /transactions/{transactionId}/status

The results come back like this:

{
   "status": "ACCUMULATING|COMMITTING|FINISHED",
   "results": {
      "transactionId": "<id of the transaction instance>",
      "startTime": <transaction commit start time in ms>,
      "endTime": <transaction commit end time in ms>,
      "totalCount": <total number of objects committed>,
      "errorCount": <total number of object commit failures>,
      "successCount": <total number of object commit successes>,
      "results": {
         "<transactionObjectId>": {
            "ok": true,
            "startTime": <time when object commit started>,
            "endTime": <time when object commit ended>,
            "dataId": "<id of the object written or deleted>",
            "error": {
               "message": "[error message]"
            }
         }
      }
   }
}
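
Once a status body like this comes back, the failed objects can be pulled out of the nested "results" map. The helper below is a sketch; `failedObjects` is a hypothetical name, and it assumes that a failed object carries "ok": false with an "error" entry, which the template above does not state explicitly.

```javascript
// Collect the per-object entries that did not commit cleanly.
// Assumption: failures have ok !== true and may carry error.message.
function failedObjects(statusBody) {
  const perObject = (statusBody.results && statusBody.results.results) || {};
  return Object.entries(perObject)
    .filter(([, entry]) => entry.ok !== true)
    .map(([transactionObjectId, entry]) => ({
      transactionObjectId,
      dataId: entry.dataId,
      message: entry.error ? entry.error.message : undefined
    }));
}
```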

Aliases

Objects added to a transaction support an optional _alias field in their data that allows them to claim a temporary identifier for use in cross-referencing between objects in the commit. This is useful because the actual _doc for created objects is not known ahead of time and isn't assigned until the actual transaction is being committed server side.

Before committing, the transaction commit process finds any _alias text values within the JSON objects and resolves them to their actual _doc values.

Using aliases, you can create two objects that reference each other like this:

{
   "_alias": "temporary_123",
   "title": "My First Article",
   "related_to": "temporary_456"
}

{
   "_alias": "temporary_456",
   "title": "My Second Article",
   "related_to": "temporary_123"
}

The resulting objects might then look something like this:

{
   "_doc": "GUID1",
   "title": "My First Article",
   "related_to": "GUID2"
}

{
   "_doc": "GUID2",
   "title": "My Second Article",
   "related_to": "GUID1"
}
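
The substitution itself happens server side, and these notes do not describe how it is implemented. Purely as an illustration of the idea, a resolver could walk each object and swap alias strings for assigned ids. `resolveAliases` and the `aliasToId` map are hypothetical.

```javascript
// Replace alias strings with real ids, recursing through nested objects
// and arrays. aliasToId is a Map from alias to the assigned _doc value.
function resolveAliases(value, aliasToId) {
  if (typeof value === "string") {
    return aliasToId.has(value) ? aliasToId.get(value) : value;
  }
  if (Array.isArray(value)) {
    return value.map((item) => resolveAliases(item, aliasToId));
  }
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      if (key === "_alias" && aliasToId.has(child)) {
        // The object's own alias becomes its _doc.
        out._doc = aliasToId.get(child);
      } else {
        out[key] = resolveAliases(child, aliasToId);
      }
    }
    return out;
  }
  return value;
}
```

Applied to the two articles above with a map of temporary_123 to GUID1 and temporary_456 to GUID2, this yields the resolved objects shown.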

An alias value can be any text string. Make sure each alias is truly unique, or your commits may not turn out as you intend.
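
A client could guard against the simplest form of this problem, two objects claiming the same alias, before adding them to a transaction. `findDuplicateAliases` is a hypothetical helper; it does not detect an alias that happens to collide with ordinary text in another field.

```javascript
// Return every alias that is claimed by more than one object.
function findDuplicateAliases(objects) {
  const seen = new Set();
  const duplicates = new Set();
  for (const obj of objects) {
    const alias = obj._alias;
    if (alias === undefined) continue;
    if (seen.has(alias)) duplicates.add(alias);
    seen.add(alias);
  }
  return [...duplicates];
}
```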