Akvo Flow Sync API

Dan Lebrero edited this page Jun 19, 2020 · 9 revisions

The Akvo Flow Sync API (Sync API) provides a way to retrieve the changes that have happened in the system. This is useful for synchronizing data changes out of Akvo Flow to an external system in near real time.

What problem does it solve?

Before the Sync API, an Akvo Flow user who wanted to synchronize data out of the system had to read all of the data every time and then try to detect what had changed. This process is onerous on both ends: the server has to serve repeated API requests, and the client has to download repetitive data and implement the change-detection logic.

With the Sync API, a client can obtain "events" for the relevant entities that have changed in the system. It can even detect entities that have been deleted, which is not easy with the full-read approach.

Published events

The Sync API publishes change and delete events for:

  • Survey
  • Form
  • Form Instance
  • Data Point

The contract

The Sync API has the following properties that must be understood:

  • Eventual consistency
  • At-least-once delivery
  • Data authorization
  • Authentication and request headers

Eventual consistency

Because of the way the system records data change events, we can only guarantee that a Sync API client will eventually be "in sync" with the state of the database in Akvo Flow. That is, a client starts with some subset of the Akvo Flow data, consumes the published changes, and at some point reaches the same state as the Akvo Flow datastore. More info: https://en.wikipedia.org/wiki/Eventual_consistency

⚠️ There is a delay of at least 60 seconds between an event happening in the Flow system and the event showing up in the Sync API.

At-least-once delivery

Akvo Flow publishes the changes that happen in the system (a new survey was created, a new form instance was ingested, etc.), but a particular event can show up more than once in the list of published changes. The client must be prepared to process duplicate events.
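
One way to tolerate duplicates is to make the handler idempotent. The sketch below is an illustration, not part of the API: `apply_changed` and `store` are hypothetical names, and it assumes that a redelivered event has exactly the same payload as the copy that was already applied.

```python
from typing import Any, Dict, List

Entity = Dict[str, Any]


def apply_changed(store: Dict[str, Entity], events: List[Entity]) -> List[Entity]:
    """Upsert events by id and return only the events that changed the store.

    Assumption: a duplicated event repeats the same payload, so an event
    equal to the stored copy can be skipped without losing information.
    """
    applied = []
    for event in events:
        if store.get(event["id"]) == event:
            continue  # duplicate delivery: this exact payload is already stored
        store[event["id"]] = event
        applied.append(event)
    return applied
```

Returning the list of applied events lets the caller trigger downstream work (notifications, exports) only once per real change.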

Data authorization

The Sync API follows exactly the same authorization model as Akvo Flow. Users of the Sync API only see the changes they are authorized to see. There are two important points to highlight:

  • Because of data authorization, a client can receive an "empty batch" (the user is not allowed to "see" any of the changes in that batch). In this case the client should continue with the next batch.
  • Events related to object deletions (surveyDeleted, formDeleted, formInstanceDeleted, dataPointDeleted) can't be authorized, so the client can receive ids for entities it doesn't know. The client must ignore those ids.
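
Both points can be handled with two small helpers. This is a minimal sketch with hypothetical names (`is_empty_batch`, `apply_deleted`, `store`), and it assumes that each Deleted list contains plain id strings; the exact shape of those entries is not shown on this page.

```python
from typing import Any, Dict, Iterable, List


def is_empty_batch(changes: Dict[str, List[Any]]) -> bool:
    """Return True when no key of the `changes` object carries any event."""
    return not any(changes.values())


def apply_deleted(store: Dict[str, Any], deleted_ids: Iterable[str]) -> List[str]:
    """Remove known ids from `store` and skip ids the client never saw.

    Assumption: `deleted_ids` holds plain id strings.
    """
    removed = []
    for entity_id in deleted_ids:
        if entity_id in store:
            del store[entity_id]
            removed.append(entity_id)
        # Unknown ids are ignored: they belong to entities this user could not see.
    return removed
```

An empty batch still comes with a nextSyncUrl, so the client stores that URL and moves on as usual.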

Authentication and request headers

The process

The steps in more detail:

Step 1

A client must obtain and store a Sync URL by making an initial request:

https://api-auth0.akvo.org/flow/orgs/<org>/sync?initial=true

Example response:

{
  "nextSyncUrl": "https://api.akvo.org/flow/orgs/<org>/sync?next=true&cursor=6534"
}
  • <org> is the organization subdomain
  • nextSyncUrl is a unique, "use as is" URL that the client must store and use for consuming change events. NOTE: The client must not try to interpret it. The server is entitled to change its structure.
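
Step 1 could be sketched as follows, using only the Python standard library. The function names are hypothetical, the host is copied from the request URL above, and the request headers are supplied by the caller because this page does not list them.

```python
import json
import urllib.request
from typing import Dict

# Host and path copied from the Step 1 request URL.
BASE_URL = "https://api-auth0.akvo.org/flow/orgs"


def initial_sync_request(org: str, headers: Dict[str, str]) -> urllib.request.Request:
    """Build the GET request that opens a synchronization session.

    Assumption: `headers` already holds whatever authentication headers
    the API requires; they are not specified here.
    """
    return urllib.request.Request(f"{BASE_URL}/{org}/sync?initial=true", headers=headers)


def extract_next_sync_url(body: bytes) -> str:
    """Return nextSyncUrl exactly as received, without interpreting it."""
    return json.loads(body)["nextSyncUrl"]


def start_sync_session(org: str, headers: Dict[str, str]) -> str:
    """Perform the initial request and return the URL the client must store."""
    with urllib.request.urlopen(initial_sync_request(org, headers)) as response:
        return extract_next_sync_url(response.read())
```

The returned string is treated as opaque: it is stored and later requested verbatim, never parsed for its cursor.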

Step 2

Using the Akvo Flow REST API, the client reads and processes its current data.

Step 3

Once the client has read and processed its data from the Akvo Flow REST API, it is "ready" to start consuming change events. Using the nextSyncUrl from Step 1, the client makes a GET request to find out what has changed in the system.

Example response:

{
  "changes": {
    "dataPointDeleted": [],
    "dataPointChanged": [],
    "formChanged": [],
    "surveyChanged": [
      {
        "id": "203260001",
        "name": "Survey test - Sync API",
        "registrationFormId": "",
        "createdAt": "2020-02-10T20:55:42.645Z",
        "modifiedAt": "2020-02-10T20:56:04.037Z"
      }
    ],
    "formDeleted": [],
    "surveyDeleted": [],
    "formInstanceChanged": [],
    "formInstanceDeleted": []
  },
  "nextSyncUrl": "https://api-auth0.akvotest.org/flow/orgs/uat2/sync?next=true&cursor=6805"
}

The response contains 2 keys:

  • changes: An object with a predefined set of keys: surveyChanged, surveyDeleted, formChanged, formDeleted, dataPointChanged, dataPointDeleted, formInstanceChanged, formInstanceDeleted
    • Each entity uses the same representation as in the Akvo Flow REST API, except that it contains no "URL links", only data.
    • For Deleted events, only entity ids are provided.
    • Notice the "Changed" suffix in some keys. The Sync API makes no distinction between new and updated entities. The client can deal with this by using an upsert strategy.
  • nextSyncUrl: As in the initial request, a "use as is" URL for getting the next batch of changes.
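
Because the key names follow a fixed "<entity>Changed" / "<entity>Deleted" pattern, a client can group a batch by entity before applying it. The helper below is a sketch with a hypothetical name (`split_changes`); the entity prefixes come from the key list above.

```python
from typing import Any, Dict, List, Tuple

# Prefixes of the predefined keys in the `changes` object.
ENTITIES = ("survey", "form", "formInstance", "dataPoint")

Grouped = Dict[str, List[Any]]


def split_changes(changes: Dict[str, List[Any]]) -> Tuple[Grouped, Grouped]:
    """Group a `changes` object into (upserts, deletions), keyed by entity name.

    Missing keys are treated as empty lists so callers can iterate safely.
    """
    upserts = {entity: changes.get(f"{entity}Changed", []) for entity in ENTITIES}
    deletions = {entity: changes.get(f"{entity}Deleted", []) for entity in ENTITIES}
    return upserts, deletions
```

Each list in `upserts` can then be written with an insert-or-update operation, and each list in `deletions` can be removed from the matching local table.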

Step 4

The client continues reading the changes using nextSyncUrl until no changes are available. When a client reaches the end of the list of changes:

  • The server returns HTTP 204 (No Content)
  • The server returns a response header Cache-Control: max-age=<value>
  • The client must wait for the number of seconds given by max-age (e.g. 60 seconds) before requesting more changes with the last nextSyncUrl it received
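
The loop in Step 4 might look like the sketch below. The names `drain`, `fetch` and `handle` are hypothetical: `fetch` stands for any HTTP helper that returns the status code, the response headers and the decoded JSON body, and the 60-second fallback when max-age is missing is an assumption.

```python
import re
from typing import Any, Callable, Dict, Mapping, Optional, Tuple

# (status code, response headers, decoded JSON body or None)
Response = Tuple[int, Mapping[str, str], Optional[Dict[str, Any]]]


def max_age_seconds(cache_control: Optional[str], default: int = 60) -> int:
    """Extract max-age from a Cache-Control value; `default` is an assumed fallback."""
    match = re.search(r"max-age=(\d+)", cache_control or "")
    return int(match.group(1)) if match else default


def drain(
    next_sync_url: str,
    fetch: Callable[[str], Response],
    handle: Callable[[Dict[str, Any]], None],
) -> Tuple[str, int]:
    """Read batches until HTTP 204; return (URL to reuse, seconds to wait)."""
    while True:
        status, headers, body = fetch(next_sync_url)
        if status == 204:
            # End of the list: keep the same URL and wait before retrying.
            return next_sync_url, max_age_seconds(headers.get("Cache-Control"))
        if status != 200 or body is None:
            raise RuntimeError(f"unexpected response status {status}")
        handle(body["changes"])
        # Advance only after the batch was handled, so a crash replays it.
        next_sync_url = body["nextSyncUrl"]
```

A caller would sleep for the returned number of seconds and then call `drain` again with the returned URL. Advancing the URL only after `handle` succeeds means a failure causes a batch to be read again, which the at-least-once contract already requires the client to tolerate.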

FAQ

Why do we need that initial extra call with /sync?initial=true?

Fetching data using the Akvo Flow REST API takes time and is a forward-only process. Consider the following scenario:

  • The time spent fetching all the data is the interval between two timestamps (ts0 and ts1).
  • Your process starts at ts0 and has already processed more than half of the stored data.
  • One or more users submit data to a Survey that has already been processed.
  • In this scenario you miss some Form Instances because of the forward-only nature of the processing. The new data will not be available until the next full synchronization.

[Figure: Akvo Flow Sync API - 1]

With the call to /sync?initial=true, you start a "synchronization session": a mark in the full log of events. When your process finishes at ts1, you can request what has changed since ts0, which is to say, all change events after the mark.

[Figure: Akvo Flow Sync API - 2]
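
The idea can be illustrated with a toy in-memory model. This is only an analogy under stated assumptions, not how the server is implemented: `log` stands for the event log, the mark is simply its length at ts0, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, str]  # (entity id, new value)


def sync_with_mark(
    log: List[Event],
    read_snapshot: Callable[[], Dict[str, str]],
) -> Dict[str, str]:
    """Take the mark first, read everything, then replay events after the mark."""
    mark = len(log)          # ts0: remember the position in the event log
    state = read_snapshot()  # ts0..ts1: slow, forward-only full read
    for entity_id, value in log[mark:]:
        state[entity_id] = value  # catch up on what the full read missed
    return state
```

If the mark were taken after the full read instead, any event recorded between ts0 and ts1 would fall before the mark and would never be replayed.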

What happens if the client loses the nextSyncUrl?

To have a reliable synchronization process, the client must store the nextSyncUrl. If the client loses the value, it must start from scratch, beginning at Step 1 of the process described above.
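
A simple way to avoid losing the value is to persist it after every batch. The sketch below stores it in a local file and replaces the file atomically so that a crash cannot leave a half-written URL; the file-based storage and the function names are assumptions, and a database row would work equally well.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_next_sync_url(path: Path, url: str) -> None:
    """Write the URL to a temporary file, then atomically replace `path`."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent))
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(url)
    os.replace(tmp_name, path)  # atomic on the same filesystem


def load_next_sync_url(path: Path) -> Optional[str]:
    """Return the stored URL, or None if the client must restart at Step 1."""
    try:
        return path.read_text(encoding="utf-8").strip() or None
    except FileNotFoundError:
        return None
```

When `load_next_sync_url` returns None, the client has no usable position in the event log and begins again with the initial request.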