Looks good to me. The docs need to be updated so that the build checks pass.
fb1d7ab to c5dc6d3
Added a few more minor changes, made the tests pass, and did some more testing in my local env. It all seems to be working fine as far as I can tell.
Would it be possible to do just 1 partition, or a range? It looks like this would take quite a long time to run sequentially on our data, so there might be utility in doing a partition at a time so we can stop/continue easily.
makes it possible to iterate over any single partition or an arbitrary range of partitions
@shanson7 definitely, added that.
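For illustration, selecting a single partition or a range could look roughly like the sketch below. It is not the code from this PR: `parsePartitions`, the `"2-4"` spec format, and the partition count of 8 are all hypothetical and only show the idea of running one slice of partitions at a time.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePartitions is a hypothetical helper: it turns "3" or "2-5" into the
// partition numbers to process, rejecting anything outside [0, numPartitions).
func parsePartitions(spec string, numPartitions int) ([]int, error) {
	first, last, isRange := strings.Cut(spec, "-")
	start, err := strconv.Atoi(strings.TrimSpace(first))
	if err != nil {
		return nil, fmt.Errorf("invalid partition %q: %w", first, err)
	}
	end := start // a single partition is a range of length one
	if isRange {
		end, err = strconv.Atoi(strings.TrimSpace(last))
		if err != nil {
			return nil, fmt.Errorf("invalid partition %q: %w", last, err)
		}
	}
	if start < 0 || end < start || end >= numPartitions {
		return nil, fmt.Errorf("partition range %d-%d outside 0-%d", start, end, numPartitions-1)
	}
	parts := make([]int, 0, end-start+1)
	for p := start; p <= end; p++ {
		parts = append(parts, p)
	}
	return parts, nil
}

func main() {
	// Assumed values: process partitions 2 through 4 of an 8-partition setup.
	parts, err := parsePartitions("2-4", 8)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range parts {
		fmt.Println("processing partition", p)
	}
}
```

Running one range per invocation means a long job can be stopped and later resumed at the next unprocessed partition.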
I didn't check this in depth (or think about it much, because I'm on vacation \o/ ), but running this as a tool seems harder to operate (it requires setting up extra jobs etc).
A background routine would be a really bad idea: for large instances, a huge amount of RAM is needed for the data from Cassandra. Doing the archiving out-of-band is definitely the most efficient approach and solves the problems we are trying to address. Perhaps we could make archiving at startup optional, but doing so is outside the scope of this PR.
- if writes to cassandra fail, just continue on to the next def
- keep count of defs successfully archived.
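The behaviour described in this commit message can be sketched as follows. This is an assumption-laden illustration rather than the PR's actual code: the `def` type, `archiveAll`, and the `failOn` fake writer are hypothetical stand-ins, and no real Cassandra client is involved.

```go
package main

import (
	"errors"
	"fmt"
)

// def is a stand-in for a metric definition; only the ID matters here.
type def struct{ ID string }

// archiveAll calls write for every def. A failed write is reported and skipped
// so one bad def does not abort the run; the return value counts successes.
func archiveAll(defs []def, write func(def) error) int {
	archived := 0
	for _, d := range defs {
		if err := write(d); err != nil {
			fmt.Printf("failed to archive %s: %v\n", d.ID, err)
			continue
		}
		archived++
	}
	return archived
}

// failOn returns a fake writer that fails for one ID, standing in for a
// Cassandra write that errors out.
func failOn(id string) func(def) error {
	return func(d def) error {
		if d.ID == id {
			return errors.New("write timeout")
		}
		return nil
	}
}

func main() {
	defs := []def{{ID: "a"}, {ID: "b"}, {ID: "c"}}
	n := archiveAll(defs, failOn("b"))
	fmt.Printf("archived %d of %d defs\n", n, len(defs))
}
```

Reporting the success count at the end makes it easy to see whether a run needs to be repeated for the defs that failed.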
313dbd4 to d2868c8
Why would it require any more RAM than MT is already using? MT already knows exactly which time series are still alive in the memory index. I will point out, though, that doing it within MT would be harder for me. We run 1 set of write instances which prune the index aggressively (a different set of index rules), so they wouldn't be able to do this. Our read instances are replicated, meaning they would duplicate the effort if they both tried to do the same partitions. Just my 2 cents.
only did some basic testing with this in a local setup, will do more testing
will also need to add docs for it
fixes #1069