Consider autoflush=false for HBaseBackfillMerger #27

Open
timrobertson100 opened this issue Jul 17, 2012 · 1 comment

Comments

@timrobertson100
Contributor

HBaseBackfillMerger does not appear to disable autoflush. I can't see anything in site-xml suggesting it can be disabled by default, so I wonder whether there is a design reason for this. We get a 10x increase in insert rate when autoflush is disabled, and on a 6B record cube (Google tiles at 23 zooms for all species) the slower rate is crippling.

Before I propose a pull request, I thought I'd just ask.
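
To make the insert-rate argument concrete, here is a minimal sketch of why client-side buffering cuts round trips. It does not use the HBase client at all: `FakeTable`, `bufferLimit`, and `roundTripsFor` are hypothetical names invented for illustration, and the class only counts simulated round trips. In the HBase client of this era the corresponding switch is, as far as I know, `HTable.setAutoFlush(false)`, which is what this issue proposes using.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: counts simulated round trips, never talks to HBase.
class AutoFlushSketch {

    // Hypothetical stand-in for a remote table with a client-side write buffer.
    static class FakeTable {
        private final boolean autoFlush;
        private final int bufferLimit;
        private final List<String> buffer = new ArrayList<>();
        int roundTrips = 0;
        int rowsStored = 0;

        FakeTable(boolean autoFlush, int bufferLimit) {
            this.autoFlush = autoFlush;
            this.bufferLimit = bufferLimit;
        }

        // With autoFlush on, every put is sent immediately; otherwise puts
        // accumulate until the buffer reaches bufferLimit rows.
        void put(String row) {
            buffer.add(row);
            if (autoFlush || buffer.size() >= bufferLimit) {
                flush();
            }
        }

        // One flush models one round trip, however many rows it carries.
        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            roundTrips++;
            rowsStored += buffer.size();
            buffer.clear();
        }
    }

    // Writes `rows` rows and returns the number of simulated round trips.
    static int roundTripsFor(int rows, boolean autoFlush, int bufferLimit) {
        FakeTable table = new FakeTable(autoFlush, bufferLimit);
        for (int i = 0; i < rows; i++) {
            table.put("row-" + i);
        }
        table.flush(); // send whatever is still buffered
        return table.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("autoflush on:  " + roundTripsFor(1000, true, 100));
        System.out.println("autoflush off: " + roundTripsFor(1000, false, 100));
    }
}
```

The model ignores payload size and server-side cost, so it says nothing about the actual 10x figure; it only shows where the saving comes from.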

@drevell
Contributor

drevell commented Jul 29, 2012

@timrobertson100, I think the issue here is that we don't want to delay writes by the HBaseBackfillMergeMapper. The way cubes are merged is by reading all three input cubes (recalculated, snapshot, and live cubes), determining the new value for the live cube based on those three inputs, then writing the live cube. There is an obvious race condition here: if the live cube value is updated between the read and write, then the update will be lost. By flushing the write immediately, we limit the time window where this race condition causes problems.
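
The race described above can be sketched as a deterministic, single-threaded simulation. Everything here is an assumption made for illustration: the `merge` rule (recalculated value plus whatever the live cube gained since the snapshot) is hypothetical and may not match the real merge logic, and the "concurrent" increments are simply applied between the read and the write. The point is only that every live update landing in that window is overwritten, so a longer window (for example, a write sitting in a client buffer) means more lost updates.

```java
// Deterministic illustration of the read-then-write window; no HBase involved.
class LostUpdateSketch {

    // Hypothetical merge rule, assumed for this sketch only.
    static long merge(long recalculated, long snapshot, long live) {
        return recalculated + (live - snapshot);
    }

    // Returns how many live increments are lost when they arrive after the
    // merger has read the live value but before its write lands.
    static long lostIncrements(int incrementsDuringWindow) {
        long recalculated = 10;
        long snapshot = 3;
        long live = 5;

        // Step 1: the merger reads all three inputs and computes the new value.
        long merged = merge(recalculated, snapshot, live);

        // Step 2: other writers bump the live cube while the merger's write
        // is still pending (immediately flushed or sitting in a buffer).
        for (int i = 0; i < incrementsDuringWindow; i++) {
            live++;
        }

        // What a race-free merge would have produced.
        long expected = merge(recalculated, snapshot, live);

        // Step 3: the merger's write lands and overwrites the newer value.
        live = merged;

        return expected - live;
    }

    public static void main(String[] args) {
        System.out.println("lost with empty window:  " + lostIncrements(0));
        System.out.println("lost with 3 late writes: " + lostIncrements(3));
    }
}
```

Flushing immediately does not remove the race; it only keeps the number of increments that can fall into the window small.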

There could be other opportunities for optimizing the HBaseBackfillMergeMapper. Have you experimented with increasing the number of concurrent map tasks? We might even consider running multiple threads inside a single map task. This would help if the bottleneck is round-trip HBase latency.
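
As a rough sketch of the "multiple threads inside a single map task" idea, the following uses a fixed thread pool so that several per-row round trips are in flight at once. The names `mergeOneRow` and `mergeAll` are hypothetical, and the `Thread.sleep` call is a stand-in for HBase latency; a real mapper would do its read, merge, and immediately flushed write there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Overlaps simulated round trips with a thread pool; no HBase involved.
class ThreadedMergeSketch {

    // Stand-in for one row's read, merge, and immediate write.
    static void mergeOneRow(AtomicInteger done) {
        try {
            Thread.sleep(5); // pretend round-trip latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        done.incrementAndGet();
    }

    // Merges `rows` rows using `threads` workers; returns rows completed.
    static int mergeAll(int rows, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger done = new AtomicInteger();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int row = 0; row < rows; row++) {
                pending.add(pool.submit(() -> mergeOneRow(done)));
            }
            // Wait for every row so the task does not finish early.
            for (Future<?> f : pending) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("rows merged: " + mergeAll(20, 4));
    }
}
```

Because each row is still written as soon as it is merged, the per-row race window stays the same as in the single-threaded case; only the waiting on latency is overlapped.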
