This repository has been archived by the owner on Jun 28, 2018. It is now read-only.
Putting events directly onto HDFS is not optimal, I think; HBase performs way better for such operations (scans). I'd be inclined to say no to the feature of storing events directly on HDFS.
I was thinking that since journaling writes a consistent stream (nothing random), it would be a good fit. In my mind it would function much like HBase's use of HDFS for its WAL, since they are essentially the same thing. But you'd probably end up writing a lot of code to make it efficient and preserve the semantics.
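The WAL analogy above can be sketched with a plain local file standing in for an HDFS file (real HDFS appends would go through Hadoop's `FileSystem.append`, but the access pattern is the same). This is a minimal illustration, not the plugin's actual journal code; all names here are made up:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Minimal WAL-style journal: events are only ever appended as a
// sequential stream, and replay is a single forward scan -- the same
// access pattern HBase uses for its own WAL files on HDFS.
public class SimpleJournal {
    private final Path logFile;

    public SimpleJournal(Path logFile) {
        this.logFile = logFile;
    }

    // Append one event at the end of the stream; never seeks,
    // never rewrites earlier bytes.
    public void append(String event) throws IOException {
        Files.write(logFile, (event + "\n").getBytes(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Replay = one sequential scan from the start of the stream.
    public List<String> replay() throws IOException {
        return Files.readAllLines(logFile);
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("journal", ".log");
        SimpleJournal journal = new SimpleJournal(p);
        journal.append("event-1");
        journal.append("event-2");
        System.out.println(journal.replay()); // prints [event-1, event-2]
    }
}
```

What this leaves out is exactly the hard part the comment alludes to: recovery from partial writes, fsync/hflush semantics, and efficient range reads, which HBase already implements on top of HDFS.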
Yeah, I am familiar with region hotspotting; I was more curious about using HDFS directly.
I think I'd end up reimplementing a lot of what HBase does when trying to implement this.
I always viewed HBase as "the best way to scan across HDFS".
If we naively just appended to a file in HDFS, I think we'd bump into hotspotting anyway - directly overloading one datanode more than the others.
I may be wrong (that happens from time to time :-)), please correct me (code very welcome) if that's the case here :-)
Snapshots can also be stored directly on HDFS, if they are really big or you need "easy takeout".