Handling very large sstables #9
I am attempting to run Aegisthus against a Priam backup containing some very large SSTables (~200 GB in size). Because these files are Snappy-compressed by Priam (in addition to Cassandra's own Snappy compression) and we are using Cassandra 1.2, I'm currently using the code on the Coursera fork. I'm running into problems with this: handling these very large files takes a very long time (3+ hours). Aegisthus appears to expect SSTables that haven't been double-compressed by Priam, and it doesn't appear to split the files even when the -CompressionInfo.db files are available.

Questions:
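For context on the splitting point above: Cassandra's -CompressionInfo.db component records the compressed offset of every fixed-size chunk, and because each chunk is compressed independently, chunk boundaries are the only safe split points. Below is a minimal, hypothetical sketch of deriving split offsets from that file; it assumes the Cassandra 1.2-era layout (compressor class name, option map, chunk length, uncompressed data length, chunk count, then one long offset per chunk) and is not Aegisthus's actual implementation.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns the per-chunk offsets in a -CompressionInfo.db
// file into split boundaries at most targetSplitSize compressed bytes apart.
public class CompressionInfoSplitter {
    public static List<Long> splitOffsets(String path, long targetSplitSize) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            in.readUTF();                    // compressor class name
            int options = in.readInt();      // number of compression options
            for (int i = 0; i < options; i++) {
                in.readUTF();                // option key
                in.readUTF();                // option value
            }
            in.readInt();                    // chunk length (uncompressed bytes per chunk)
            in.readLong();                   // total uncompressed data length
            int chunks = in.readInt();       // number of chunk offsets that follow

            List<Long> splits = new ArrayList<>();
            long splitStart = 0;
            for (int i = 0; i < chunks; i++) {
                long offset = in.readLong(); // compressed offset where chunk i begins
                // Only chunk boundaries are valid split points, since each
                // chunk is compressed independently of its neighbors.
                if (offset - splitStart >= targetSplitSize) {
                    splits.add(offset);
                    splitStart = offset;
                }
            }
            return splits;
        }
    }
}
```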
Hi @bvanberg, I've been prototyping the code on the Coursera fork =) Unfortunately, the way Priam currently writes out the compressed SSTable makes it non-splittable, as far as I know (which is why Cassandra needed the CompressionInfo metadata in the first place). Maybe @jasobrown can comment on this. CompressionInfo.db files should help make the SSTable splittable, but I haven't really looked into it, as I'm blocked on the first issue. In theory, you could stage an initial Map phase that decompresses the Priam files into HDFS, and then run the splitter on that.
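Taking the staging suggestion above literally, here is a rough sketch of such a map-only phase. It assumes Priam's outer layer is the snappy-java stream format (org.xerial.snappy.SnappyInputStream), that each input record carries the HDFS path of one backup file, and that a stagingDir property names the output directory; all of those are illustrative assumptions, not actual Aegisthus or Coursera-fork code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.xerial.snappy.SnappyInputStream;

// Hypothetical staging mapper: each input record is the path of one
// Priam-compressed file; the mapper strips the outer snappy-java stream
// and writes the plain SSTable back to HDFS so a later job can split it.
// "stagingDir" is a made-up configuration key and must be set on the job.
public class DecompressStageMapper extends Mapper<Object, Text, NullWritable, Text> {
    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        FileSystem fs = FileSystem.get(conf);
        Path src = new Path(value.toString());
        Path dst = new Path(conf.get("stagingDir"), src.getName());

        try (InputStream in = new SnappyInputStream(fs.open(src));
             OutputStream out = fs.create(dst)) {
            IOUtils.copyBytes(in, out, conf, false); // stream-copy decompressed bytes
        }
        // Emit the decompressed path so the splitter job can consume the listing.
        context.write(NullWritable.get(), new Text(dst.toString()));
    }
}
```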
Btw, if you're reading compressed files you might want to be aware of #10
@bvanberg Basically, we bootstrap by taking the Priam snapshot files on day one, which is usually a pretty hefty job. Then, from day to day, we only consume the incremental SSTables as Priam archives them out. For the most part this means we have much smaller Snappy-compressed files to deal with, but it does mean that finding and moving all the correct files is much more work. To answer your second question: we could indeed split the compressed data, but I haven't actually had to consume the data, so the code was only a proof of concept (and apparently buggy).
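Since "finding all the correct files" is where the daily work goes, here is a sketch of the kind of key filtering involved. It assumes a Priam-style backup key layout of prefix/region/cluster/token/yyyyMMddHHmm/type/keyspace/cf/file with incrementals uploaded under type SST; the layout varies by Priam version, so treat this as illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter: given a listing of Priam backup keys, keep only the
// incremental SSTable uploads whose datestamp falls in [fromDate, toDate).
// Dates are yyyyMMddHHmm strings, so lexicographic comparison is safe.
public class IncrementalFileSelector {
    public static List<String> select(List<String> keys, String fromDate, String toDate) {
        List<String> selected = new ArrayList<>();
        for (String key : keys) {
            String[] parts = key.split("/");
            if (parts.length < 6) {
                continue; // not a backup data key in the assumed layout
            }
            String date = parts[4]; // assumes a single-segment prefix
            String type = parts[5];
            if ("SST".equals(type)
                    && date.compareTo(fromDate) >= 0
                    && date.compareTo(toDate) < 0) {
                selected.add(key);
            }
        }
        return selected;
    }
}
```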
@charsmith @danchia We unfortunately are not running incremental Priam backups as of yet, so our jobs are fairly intensive.
This would still be a nice enhancement. I am closing the issue and adding a reference to this in the Enhancements section of the README. |