Bulk API
Design notes for the Gitana Bulk API
var Gitana = require("gitana");
var t = Gitana.transactions().create();
// bind the transaction to a context
// you can either use a reference string or an object
t.for("branch://<platformId>/<repositoryId>/<branchId>");
//t.for(branch);
// create an object
t.create({
"title": "My first article",
"_type": "custom:article"
});
// update an object
t.update({
"title": "Custom page",
"_type": "web:page"
});
// delete an object by GUID
t.del("GUID1");
// delete an object
t.del({
"_doc": "GUID2"
});
// specify the retry count
t.retryCount(3);
// start the transaction commit
// specify the callback to fire once the commit completes
t.commit(function(results) {
...
});
As a first pass, the created transaction should store all JSON objects in memory until commit() is called. When commit() is called, the transaction can run through all objects and send one request per object. For N objects, that means N requests to add items to the transaction.
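A minimal sketch of this first pass, under stated assumptions: `createInMemoryTransaction` and the `sendOne` callback are hypothetical names (not part of the Gitana driver), and mapping both create and update onto the REST `"write"` operation is an assumption based on the two operations the add endpoint lists.

```javascript
// Hypothetical first-pass transaction: everything is queued in memory
// and nothing is sent until commit() is called.
function createInMemoryTransaction(reference) {
  var queue = [];
  return {
    reference: reference,
    // create and update are both queued as "write" (assumption)
    create: function (obj) {
      queue.push({ operation: "write", data: obj });
      return this;
    },
    update: function (obj) {
      queue.push({ operation: "write", data: obj });
      return this;
    },
    // accepts either a GUID string or an object carrying "_doc"
    del: function (objOrId) {
      var data = typeof objOrId === "string" ? { _doc: objOrId } : objOrId;
      queue.push({ operation: "delete", data: data });
      return this;
    },
    size: function () {
      return queue.length;
    },
    // one sendOne() call per queued object, so N objects mean N requests
    commit: function (sendOne) {
      var results = queue.map(function (item) {
        return sendOne(item);
      });
      queue = [];
      return results;
    }
  };
}
```

Passing `sendOne` in as an argument keeps the sketch independent of any HTTP client; a real implementation would issue the request there.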
As a second pass, the created transaction could still store all JSON objects in-memory until commit() is called. However, when commit() is called, the objects could be grouped and sent in packages of 10, 100 or more depending on the total size of the JSON payload.
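The grouping described in this second pass could look roughly like the following. This is a sketch only: `chunkObjects`, `maxCount` and `maxBytes` are hypothetical names, and measuring size as the UTF-8 byte length of each object's JSON is an assumption about how "total size of the JSON payload" would be computed.

```javascript
// Split queued objects into batches bounded by both an object count
// and an approximate payload size in bytes.
function chunkObjects(objects, maxCount, maxBytes) {
  var batches = [];
  var current = [];
  var currentBytes = 0;
  objects.forEach(function (obj) {
    var size = Buffer.byteLength(JSON.stringify(obj), "utf8");
    var full = current.length >= maxCount || currentBytes + size > maxBytes;
    // close the current batch before it would exceed either limit
    if (current.length > 0 && full) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    // an object larger than maxBytes still gets a batch of its own
    current.push(obj);
    currentBytes += size;
  });
  if (current.length > 0) {
    batches.push(current);
  }
  return batches;
}
```

Each returned batch would then become the body of a single add request instead of one request per object.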
As a third pass, the created transaction could start sending chunks as objects are being added. This would remove the requirement to hold everything in memory. Local Storage could also be used as a way to queue things. This is not required for now, but it is an intended optimization down the road.
The REST API supports five methods: creating, canceling and committing a transaction, populating the transaction with objects to commit, and retrieving the transaction's status.
Creates a transaction. The only required parameter is a reference against which the transaction will be run. In the case of a transaction containing nodes, the transaction should be referenced to the branch. All references are written out using the reference syntax:
<type>://<platformId>/<datastoreId>[/<objectId1>][/<objectId2>]
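As an illustration, reference strings can be assembled and taken apart with two small helpers. This sketch assumes the single-slash form used by the `branch://{platformId}/{repositoryId}/{branchId}` examples in this document; `buildReference` and `parseReference` are hypothetical helper names, not driver functions.

```javascript
// Build a reference such as "branch://p1/r1/b1" from its parts.
function buildReference(type, platformId, datastoreId, objectIds) {
  var parts = [platformId, datastoreId].concat(objectIds || []);
  return type + "://" + parts.join("/");
}

// Split a reference back into its type, platform, datastore and object ids.
function parseReference(reference) {
  var match = /^([a-zA-Z]+):\/\/(.+)$/.exec(reference);
  if (!match) {
    throw new Error("Not a reference: " + reference);
  }
  var parts = match[2].split("/");
  if (parts.length < 2) {
    throw new Error("Reference needs a platform id and a datastore id: " + reference);
  }
  return {
    type: match[1],
    platformId: parts[0],
    datastoreId: parts[1],
    objectIds: parts.slice(2)
  };
}
```

For a branch reference, the repository id is the datastore id and the branch id is the first object id.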
The method is invoked like this:
POST /transactions?reference=branch://{platformId}/{repositoryId}/{branchId}
The response is:
{
"_doc": "<transactionId>",
"container-reference": "<reference>",
"status": "ACCUMULATING"
}
A transaction is always in one of three states:
- ACCUMULATING - the transaction is being built up. While in this state, objects can be added to the transaction and the transaction can also be deleted.
- COMMITTING - the transaction is executing. While in this state, objects cannot be added to the transaction and the transaction cannot be deleted.
- FINISHED - the transaction has finished committing. While in this state, objects can no longer be added, but the transaction can be deleted.
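The rules for the three states can be captured in a small lookup table that a client checks before calling the add or delete endpoints. The names `TRANSACTION_STATES`, `canAddObjects` and `canDeleteTransaction` are hypothetical; only the state names and their permissions come from the list above.

```javascript
// What each transaction state permits, as described in the list above.
var TRANSACTION_STATES = {
  ACCUMULATING: { canAdd: true, canDelete: true },
  COMMITTING: { canAdd: false, canDelete: false },
  FINISHED: { canAdd: false, canDelete: true }
};

function lookupState(status) {
  if (!Object.prototype.hasOwnProperty.call(TRANSACTION_STATES, status)) {
    throw new Error("Unknown transaction status: " + status);
  }
  return TRANSACTION_STATES[status];
}

// True if objects may still be added to the transaction.
function canAddObjects(status) {
  return lookupState(status).canAdd;
}

// True if the transaction may be deleted.
function canDeleteTransaction(status) {
  return lookupState(status).canDelete;
}
```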
Adds one or more objects to an existing transaction.
POST /transactions/{transactionId}/add
The request should look like this:
{
"objects": [{
"header": {
"type": "node",
"operation": "write" (or "delete")
},
"data": {
... data fields ...
}
}, ... more items ...]
}
And the response looks like:
{
"results": [{
"_doc": "<objectId>",
"transactionId": "<id of the transaction instance>",
"operation": "write" (or "delete"),
"type": "node"
}, ... more results ...]
}
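A client could build the add request body from its queued operations along these lines. `buildAddRequest` and the shape of its input items are hypothetical; the output follows the request format shown above, with `"node"` as the default type.

```javascript
// Turn queued items of the form { operation, data, type? } into the
// request body expected by POST /transactions/{transactionId}/add.
function buildAddRequest(items) {
  return {
    objects: items.map(function (item) {
      if (item.operation !== "write" && item.operation !== "delete") {
        throw new Error("Unsupported operation: " + item.operation);
      }
      return {
        header: {
          type: item.type || "node",
          operation: item.operation
        },
        data: item.data
      };
    })
  };
}
```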
Deletes a transaction and removes any objects associated with it.
DELETE /transactions/{transactionId}
Starts the commit of all of the objects within a transaction to the database.
This method returns right away with a simple status response indicating that the transaction commit job has been submitted to the distributed job queue. The transaction runs in the background and may run on a different node in the cluster.
The client code is then responsible for polling the server at a future point to retrieve the full transaction status and results.
POST /transactions/{transactionId}/commit
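Because the commit call returns before the work is done, client code needs some form of polling loop. The sketch below is one possible shape, not the driver's implementation: `pollUntilFinished`, `intervalMs` and `maxAttempts` are hypothetical names, and `getStatus` stands in for whatever function fetches the transaction status and returns it (or a promise for it).

```javascript
// Poll getStatus() until the transaction reports FINISHED, or give up
// after maxAttempts polls. Resolves with the final status response.
function pollUntilFinished(getStatus, options) {
  var intervalMs = (options && options.intervalMs) || 1000;
  var maxAttempts = (options && options.maxAttempts) || 30;
  return new Promise(function (resolve, reject) {
    var attempts = 0;
    function check() {
      attempts += 1;
      // Promise.resolve().then(...) also captures synchronous throws
      Promise.resolve().then(getStatus).then(function (response) {
        if (response.status === "FINISHED") {
          resolve(response);
        } else if (attempts >= maxAttempts) {
          reject(new Error("Transaction not finished after " + attempts + " polls"));
        } else {
          setTimeout(check, intervalMs);
        }
      }, reject);
    }
    check();
  });
}
```

A fixed interval keeps the sketch short; a backoff schedule may be a better fit for long-running commits.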
Retrieves the status of a transaction.
GET /transactions/{transactionId}/status
The results come back like this:
{
"status": "ACCUMULATING|COMMITTING|FINISHED",
"results": {
"transactionId": "<id of the transaction instance>",
"startTime": <transaction commit start time in ms>,
"endTime": <transaction commit end time in ms>,
"totalCount": <total number of objects committed>,
"errorCount": <total number of object commit failures>,
"successCount": <total number of object commit successes>,
"results": {
"<transactionObjectId>": {
"ok": true,
"startTime": <time when object commit started>,
"endTime": <time when object commit ended>,
"dataId": "<id of the object written or deleted>",
"error": {
"message": "[error message]"
}
}
}
}
}
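Once a status response like the one above is available, a client will usually want to pull out the failures. The following sketch assumes that failed entries carry `"ok": false` and that `error` is only meaningful on failures; the template above does not state this explicitly. `summarizeStatus` is a hypothetical helper name.

```javascript
// Reduce a status response to a finished flag, an error count and a
// list of per-object failures.
function summarizeStatus(statusResponse) {
  var summary = statusResponse.results || {};
  var perObject = summary.results || {};
  var failures = Object.keys(perObject)
    .filter(function (id) {
      // assumption: anything not explicitly ok is treated as a failure
      return perObject[id].ok !== true;
    })
    .map(function (id) {
      var entry = perObject[id];
      return {
        transactionObjectId: id,
        dataId: entry.dataId,
        message: entry.error ? entry.error.message : null
      };
    });
  return {
    finished: statusResponse.status === "FINISHED",
    errorCount: summary.errorCount || 0,
    failures: failures
  };
}
```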
Objects added to a transaction support an optional _alias field in their data that allows them to claim a temporary identifier for use in cross-referencing between objects in the commit. This is useful because the actual _doc for created objects is not known ahead of time and isn't assigned until the actual transaction is being committed server side.
The transaction commit process will find any _alias text values found within the JSON objects being committed and resolve them to their actual _doc values ahead of actually committing them.
Using aliases, you can create two objects that reference each other like this:
{
"_alias": "temporary_123",
"title": "My First Article",
"related_to": "temporary_456"
}
{
"_alias": "temporary_456",
"title": "My Second Article",
"related_to": "temporary_123"
}
The resulting objects might then look something like this:
{
"_doc": "GUID1",
"title": "My First Article",
"related_to": "GUID2"
}
{
"_doc": "GUID2",
"title": "My Second Article",
"related_to": "GUID1"
}
An alias value can be any text string. Make sure each alias is truly unique, or your commits may not turn out as you intend.
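To make the resolution step concrete, here is an illustrative client-side sketch of the idea. It is not the server's implementation: `resolveAliases` and the `newId` callback are hypothetical, and it assumes that only string values exactly equal to an alias are replaced. It also rejects duplicate aliases, which is one way to catch the uniqueness problem early.

```javascript
// Assign an id to every aliased object, then replace each string value
// that exactly equals a known alias with the id assigned to it.
function resolveAliases(objects, newId) {
  var ids = {};
  var hasOwn = Object.prototype.hasOwnProperty;

  // first pass: one id per alias, duplicates are an error
  objects.forEach(function (obj) {
    if (typeof obj._alias !== "string") {
      return;
    }
    if (hasOwn.call(ids, obj._alias)) {
      throw new Error("Duplicate alias: " + obj._alias);
    }
    ids[obj._alias] = newId(obj);
  });

  // second pass: walk each value and swap aliases for ids
  function replace(value) {
    if (typeof value === "string") {
      return hasOwn.call(ids, value) ? ids[value] : value;
    }
    if (Array.isArray(value)) {
      return value.map(replace);
    }
    if (value && typeof value === "object") {
      var out = {};
      Object.keys(value).forEach(function (key) {
        out[key] = replace(value[key]);
      });
      return out;
    }
    return value;
  }

  return objects.map(function (obj) {
    var resolved = replace(obj);
    if (typeof obj._alias === "string") {
      delete resolved._alias;
      resolved._doc = ids[obj._alias];
    }
    return resolved;
  });
}
```

Because matching is by exact string value, an alias that collides with ordinary content (a title, for instance) would be rewritten too, which is why distinctive alias strings matter.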