This repository has been archived by the owner on Jun 28, 2018. It is now read-only.

Separate HdfsSnapshotStore as akka-persistence-hdfs #8

Open
ktoso opened this issue Feb 4, 2014 · 4 comments

Comments

@ktoso
Owner

ktoso commented Feb 4, 2014

Snapshots can also be stored directly in HDFS if they are really big or you need "easy takeout".

@ghost ghost assigned ktoso Feb 4, 2014
@ktoso ktoso modified the milestones: 0.2.1, 0.2.5 Feb 6, 2014
@ktoso ktoso changed the title implement snapshotting support - HDFS Separate HdfsSnapshotStore as akka-persistence-hdfs Jul 9, 2014
@dispalt

dispalt commented May 22, 2015

@ktoso do you plan on making the accumulation of events work on HDFS too, or just snapshots?

@ktoso
Owner Author

ktoso commented May 22, 2015

Putting events onto HDFS is not optimal, I think; HBase performs way better for such operations (SCANs). I'd be inclined to say no to the feature of storing events directly on HDFS.

(see my talk about avoiding hot-spotting for more insight into why HBase is a better target for events: http://www.slideshare.net/ktoso/hbase-rowkey-design-for-akka-persistence )
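
To make the hot-spotting point concrete, here is a minimal sketch of row-key salting, the general technique for spreading sequential writes across a sorted key space. The key format, `PARTITION_COUNT`, `row_key`, `scan_prefixes`, and `sequence_nr_of` are all hypothetical names chosen for illustration; this is not necessarily the layout the plugin uses.

```python
PARTITION_COUNT = 4  # hypothetical number of key-space partitions


def row_key(persistence_id: str, sequence_nr: int,
            partitions: int = PARTITION_COUNT) -> str:
    """Build a salted key: <partition>-<persistence_id>-<zero-padded sequence_nr>.

    Consecutive sequence numbers get different prefixes, so they sort into
    different parts of the key space instead of piling onto one region.
    """
    partition = sequence_nr % partitions
    return f"{partition:02d}-{persistence_id}-{sequence_nr:020d}"


def scan_prefixes(persistence_id: str,
                  partitions: int = PARTITION_COUNT) -> list:
    """Prefixes a reader would have to scan to find every event of one id."""
    return [f"{p:02d}-{persistence_id}-" for p in range(partitions)]


def sequence_nr_of(key: str) -> int:
    """Recover the sequence number from the last dash-separated field."""
    return int(key.rsplit("-", 1)[1])


# A sorted store keeps keys in lexicographic order, so salted keys are
# grouped by partition prefix rather than by sequence number.
stored = sorted(row_key("user-1", n) for n in range(1, 9))

# Replay has to merge the per-partition scans back into sequence order.
replayed = sorted(stored, key=sequence_nr_of)
```

The trade-off is visible in the last two lines: writes are spread out, but a replay needs one scan per prefix plus a merge by sequence number.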

@dispalt

dispalt commented May 22, 2015

I was thinking that since journaling writes a consistent stream (nothing random), it would be a good fit. In my mind it would function similarly to HBase's use of HDFS for its WAL, since they are essentially the same thing. But you'd probably end up writing a lot of code to make it efficient and preserve the semantics.

Yeah, I am familiar with region hot-spotting; I was more curious about using HDFS directly.
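
As a rough illustration of the append-only idea, the sketch below frames events as length-prefixed records in a single log. An in-memory `io.BytesIO` stands in for an HDFS file, and the record layout and the names `append_event` and `replay` are assumptions made for this example, not anything from HBase's WAL or the plugin.

```python
import io
import struct

# Hypothetical record header: 8-byte sequence number, 4-byte payload length.
HEADER = struct.Struct(">QI")


def append_event(log, sequence_nr: int, payload: bytes) -> None:
    """Append one framed record at the end of the log."""
    log.seek(0, io.SEEK_END)
    log.write(HEADER.pack(sequence_nr, len(payload)))
    log.write(payload)


def replay(log, from_sequence_nr: int = 0):
    """Yield (sequence_nr, payload) pairs, scanning from the start of the log."""
    log.seek(0)
    while True:
        header = log.read(HEADER.size)
        if len(header) < HEADER.size:
            return  # end of log, or a truncated trailing header
        sequence_nr, length = HEADER.unpack(header)
        payload = log.read(length)
        if len(payload) < length:
            return  # truncated trailing record, e.g. an interrupted write
        if sequence_nr >= from_sequence_nr:
            yield sequence_nr, payload


log = io.BytesIO()  # stand-in for a file opened for append
for n, event in enumerate([b"created", b"renamed", b"deleted"], start=1):
    append_event(log, n, event)

recovered = list(replay(log, from_sequence_nr=2))
```

Even in this toy form, replaying from a given sequence number means reading the log from the beginning, and one log holds only one stream. An index, per-entity files, and compaction are the kind of extra code that "efficient" would require.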

@ktoso
Owner Author

ktoso commented May 22, 2015

I think I'd end up reimplementing a lot of what HBase does when trying to implement this.
I always viewed HBase as "the best way to scan across HDFS".
If we naively just appended to a file in HDFS, I think we'd bump into hot-spotting anyway, directly overloading one datanode more than the others.

I may be wrong (that happens from time to time :-)), please correct me (code very welcome) if that's the case here :-)


2 participants