HBaseBackfillMerger does not appear to disable autoflush. I can't see anything in site-xml suggesting it can be disabled by default, so I wonder if there is a design reason for this. We get a 10x increase in insert rate when autoflush is disabled, and on a 6B-record cube (Google tiles at 23 zooms for all species) the current insert rate is crippling.
Before I propose a pull request, I thought I'd just ask.
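To make the cost of autoflush concrete, here is a rough sketch that only counts simulated round trips; it does not talk to HBase. `BufferingWriter`, `roundTripsFor`, and the buffer size of 100 are hypothetical names and values chosen for illustration, and a buffer size of 1 stands in for autoflush being enabled.

```java
import java.util.ArrayList;
import java.util.List;

public class WriteBufferSketch {
    /** Hypothetical stand-in for a table client; it only counts simulated round trips. */
    static final class BufferingWriter {
        private final int bufferSize;               // 1 behaves like autoflush enabled
        private final List<String> pending = new ArrayList<>();
        private int roundTrips = 0;

        BufferingWriter(int bufferSize) {
            this.bufferSize = bufferSize;
        }

        void put(String row) {
            pending.add(row);
            if (pending.size() >= bufferSize) {
                flush();
            }
        }

        void flush() {
            if (pending.isEmpty()) {
                return;                             // nothing buffered, no round trip
            }
            roundTrips++;                           // one simulated RPC per flush
            pending.clear();
        }

        int roundTrips() {
            return roundTrips;
        }
    }

    /** Writes the given number of rows and returns how many round trips were needed. */
    static int roundTripsFor(int puts, int bufferSize) {
        BufferingWriter writer = new BufferingWriter(bufferSize);
        for (int i = 0; i < puts; i++) {
            writer.put("row-" + i);
        }
        writer.flush();                             // push out any remainder
        return writer.roundTrips();
    }

    public static void main(String[] args) {
        System.out.println("flush per put:     " + roundTripsFor(1000, 1));
        System.out.println("flush per 100 puts: " + roundTripsFor(1000, 100));
    }
}
```

The real speedup depends on row size, write buffer size, and cluster latency, so the ratio here should not be read as a prediction of the 10x figure above.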
@timrobertson100, I think the issue here is that we don't want to delay writes by the HBaseBackfillMergeMapper. The way cubes are merged is by reading all three input cubes (recalculated, snapshot, and live cubes), determining the new value for the live cube based on those three inputs, then writing the live cube. There is an obvious race condition here: if the live cube value is updated between the read and write, then the update will be lost. By flushing the write immediately, we limit the time window where this race condition causes problems.
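The lost update can be shown with a small, deterministic sketch. This is not the mapper's code: the in-memory map stands in for the live cube, the cell name and values are made up, and the merge rule `live + (recalculated - snapshot)` is an assumption used only to have something to compute.

```java
import java.util.HashMap;
import java.util.Map;

public class LostUpdateSketch {
    /** Assumed merge rule for illustration: apply the recalculated correction to the live value. */
    static long merge(long recalculated, long snapshot, long live) {
        return live + (recalculated - snapshot);
    }

    /**
     * Replays one fixed interleaving: the merger reads the live cell, a concurrent
     * writer adds concurrentDelta, then the merger writes its (now stale) result.
     * Returns the final live value.
     */
    static long finalValueWithInterleavedUpdate(long concurrentDelta) {
        Map<String, Long> liveCube = new HashMap<>();
        liveCube.put("cell", 10L);
        long recalculated = 12L;
        long snapshot = 10L;

        long liveRead = liveCube.get("cell");                    // merger reads 10
        long merged = merge(recalculated, snapshot, liveRead);   // merger computes 12

        liveCube.merge("cell", concurrentDelta, Long::sum);      // live update lands in the window

        liveCube.put("cell", merged);                            // merger overwrites it with 12
        return liveCube.get("cell");
    }

    /** The value the cell would hold if the concurrent update had not been lost. */
    static long valueWithoutLoss(long concurrentDelta) {
        return merge(12L, 10L, 10L + concurrentDelta);
    }

    public static void main(String[] args) {
        System.out.println("final value:    " + finalValueWithInterleavedUpdate(5L));
        System.out.println("without a loss: " + valueWithoutLoss(5L));
    }
}
```

With buffered writes, the gap between the read and the write that actually reaches the table grows to however long the buffer takes to fill, so more live updates can fall into that window.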
There could be other opportunities for optimizing the HBaseBackfillMergeMapper. Have you experimented with increasing the number of concurrent map tasks? We might even consider running multiple threads inside a single map task. This would help if the bottleneck is round-trip HBase latency.
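As a rough sketch of the multi-threaded idea, assuming round-trip latency really is the bottleneck: each write below is still issued and completed on its own, but several are in flight at once. `writeCell` is a hypothetical placeholder that sleeps to simulate a round trip, and the latency, cell count, and thread count are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelWritesSketch {
    static final long SIMULATED_LATENCY_MS = 5;

    /** Hypothetical placeholder for read-merge-write of one cell; it only sleeps. */
    static void writeCell(int cell) throws InterruptedException {
        Thread.sleep(SIMULATED_LATENCY_MS);
    }

    /** Processes all cells on a fixed-size pool and returns how many completed. */
    static int writeAll(int cells, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < cells; i++) {
                final int cell = i;
                futures.add(pool.submit(() -> {
                    writeCell(cell);
                    return cell;
                }));
            }
            int completed = 0;
            for (Future<Integer> future : futures) {
                future.get();                       // wait for each write to finish
                completed++;
            }
            return completed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();     // preserve the interrupt flag
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = writeAll(40, 8);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " cells written in about " + elapsedMs + " ms");
    }
}
```

Because every cell is still flushed as soon as it is written, the race window per cell stays as short as it is today; only the number of concurrent round trips changes.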