RetentionHistory
When running on a local datastore, Bigdata provides an option to retain historical data for a specified period.
The RWStore (for Read-Write) provides an updatable store that can efficiently recycle data allocations as it is updated.
By default no historical data is retained and long-lived read-only transactions are supported via an internal mechanism that protects against the immediate re-allocation of any storage accessible to them.
The option to retain historical data has two main uses:
1) To explicitly allow access to historical data for some analytical purpose.
2) To enable the management of overlapping read-only transactions used in continual load/query deployments that would otherwise be unable to recycle storage.
We will use the Bigdata Sail classes for these examples.
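import java.io.File;
import java.io.IOException;
import java.util.Properties;
import org.openrdf.sail.SailException;
import com.bigdata.journal.BufferMode;
import com.bigdata.rdf.sail.BigdataSail;
import com.bigdata.service.AbstractTransactionService;

// A simple initialisation method setting a retention period (in milliseconds)
public BigdataSail initializeSail(final long retention_ms) throws IOException, SailException {
    final Properties properties = new Properties();
    // create a temporary journal file for this application run
    final File journal = File.createTempFile("BIGDATA", ".jnl");
    properties.setProperty(BigdataSail.Options.FILE, journal.getAbsolutePath());
    // Select the RWStore
    properties.setProperty(BigdataSail.Options.BUFFER_MODE, BufferMode.DiskRW.toString());
    // Set retention with the minimum release age property (property values must be strings)
    properties.setProperty(AbstractTransactionService.Options.MIN_RELEASE_AGE, Long.toString(retention_ms));
    final BigdataSail sail = new BigdataSail(properties);
    sail.initialize();
    return sail;
}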
Using the above method we can easily create a Bigdata Sail with a specified retention period.
Note that we have also set the BufferMode to specify the RWStore.
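The retention period is passed as a string holding a number of milliseconds. The following is a minimal sketch, using only the JDK, of building that value from a java.time.Duration. The class name RetentionConfig is hypothetical, and the literal key is an assumption about the value of AbstractTransactionService.Options.MIN_RELEASE_AGE; when Bigdata is on the classpath, use the constant itself.

```java
import java.time.Duration;
import java.util.Properties;

// Hypothetical helper, not part of Bigdata.
final class RetentionConfig {

    // ASSUMPTION: the string value of
    // AbstractTransactionService.Options.MIN_RELEASE_AGE.
    // Prefer referencing the constant directly in real code.
    static final String MIN_RELEASE_AGE_KEY =
            "com.bigdata.service.AbstractTransactionService.minReleaseAge";

    // Returns properties carrying the retention period in milliseconds.
    static Properties withRetention(final Duration retention) {
        if (retention.isNegative()) {
            throw new IllegalArgumentException("retention must not be negative");
        }
        final Properties properties = new Properties();
        // Properties values are strings, so the long is converted explicitly.
        properties.setProperty(MIN_RELEASE_AGE_KEY,
                Long.toString(retention.toMillis()));
        return properties;
    }
}
```

For example, RetentionConfig.withRetention(Duration.ofHours(2)) stores the value "7200000" under that key.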
The historical retention points are accessible using a "state" value. This is approximately the system time returned by System.currentTimeMillis(), but you must not rely on using the system time to record retention points. Instead you should retrieve the commit time from the connection.
Here is a handy method:
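// Method to commit a connection and return the commit time
// (requires: import org.openrdf.repository.RepositoryException;)
public long commit(final BigdataSailRepositoryConnection cxn) throws RepositoryException {
    cxn.commit();
    // read the commit time back from the store rather than from the system clock
    return cxn.getRepository().getDatabase().getIndexManager().getLastCommitTime();
}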
Which approach to take depends on why you want to retain the history. If you retain a few days of history and only need to run queries as of a few hours earlier, without concern for a precise state, then using the system time is quite sensible.
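When a precise state does matter, one option is to record each commit time as it is returned and later resolve a wall-clock time to the nearest recorded commit point. The sketch below uses only the JDK; the CommitLog class and its method names are hypothetical and are not part of Bigdata.

```java
import java.util.TreeSet;

// Hypothetical application-side record of commit times.
final class CommitLog {

    // Sorted set of commit times returned by the store after each commit.
    private final TreeSet<Long> commitTimes = new TreeSet<>();

    // Remember a commit time, e.g. the value returned after committing.
    void record(final long commitTime) {
        commitTimes.add(commitTime);
    }

    // Latest recorded commit time that is <= timestamp, or -1 if none exists.
    long atOrBefore(final long timestamp) {
        final Long found = commitTimes.floor(timestamp);
        return found == null ? -1L : found;
    }
}
```

A caller would pass the result of atOrBefore(...) as the state for a read-only connection, after checking that it is not -1.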
The general approach is to maintain an "update" connection and to run queries against a read-only connection:
final long TWO_HOURS = 2L * 60 * 60 * 1000;
final BigdataSail sail = initializeSail(TWO_HOURS);
final BigdataSailRepository repo = new BigdataSailRepository(sail);
final BigdataSailRepositoryConnection cxn = repo.getConnection();
The cxn is used to update the repository by adding and removing statements explicitly or via queries.
The commit method defined above can then be used to commit the current set of updates and return the state, which can later be used to read from this commit point.
final long rememberedState = commit(cxn);
Generally, read-only connections are required for two reasons:
1) To access the currently committed state:
final BigdataSailRepositoryConnection ro_cxn1 = repo.getReadOnlyConnection(ITx.READ_COMMITTED);
or 2) To access a specified state:
final BigdataSailRepositoryConnection ro_cxn2 = repo.getReadOnlyConnection(rememberedState);
Remember to close those read-only connections when you are done with them; otherwise they may stay open and cause more history to be retained than you planned for.
final BigdataSailRepositoryConnection ro_cxn2 = repo.getReadOnlyConnection(rememberedState);
try {
// ..do something
} finally {
ro_cxn2.close();
}
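Before opening a read-only connection against a remembered state, an application may want to check that the state still falls inside the configured retention period. The sketch below is a plain arithmetic check using only the JDK; the RetentionWindow class is hypothetical. It is only an approximation, since the store decides what is actually released, and open read-only connections can keep older history alive for longer.

```java
// Hypothetical helper, not part of Bigdata.
final class RetentionWindow {

    // True if 'state' (a commit time in ms) is no older than 'retentionMs'
    // relative to 'now', and does not lie in the future.
    static boolean withinWindow(final long state, final long now,
            final long retentionMs) {
        if (state > now) {
            return false; // a commit time cannot be later than "now"
        }
        return now - state <= retentionMs;
    }
}
```

With a two-hour retention period, a state committed exactly two hours ago is still treated as inside the window, and one millisecond older is not.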