Provide a tool to copy data between Cassandra clusters #909
Mt store cp
New features, clean up
maxBatchSize = flag.Int("max-batch-size", 10, "max number of queries per batch")

idxTable = flag.String("idx-table", "metric_idx", "idx table in cassandra")
partitions = flag.String("partitions", "*", "process ids for these partitions (comma separated list of partition numbers or '*' for all)")
Might want to add validation that this is not an empty string, or simply that it is a series of digits (each can be parsed with strconv.Atoi).
Never mind, I see now that you do this.
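For readers following the thread, the kind of validation discussed above could look roughly like the sketch below. `parsePartitions` is a hypothetical helper written for illustration, not the code in this PR; rejecting negative numbers and trimming whitespace are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePartitions is a hypothetical helper (not the PR's actual code).
// It accepts "*" (all partitions, returned as nil) or a comma separated
// list of partition numbers, and rejects empty or non-numeric input.
func parsePartitions(s string) ([]int, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return nil, fmt.Errorf("partitions must not be empty")
	}
	if s == "*" {
		return nil, nil
	}
	parts := strings.Split(s, ",")
	out := make([]int, 0, len(parts))
	for _, p := range parts {
		n, err := strconv.Atoi(strings.TrimSpace(p))
		if err != nil {
			return nil, fmt.Errorf("invalid partition %q: %w", p, err)
		}
		// Assumption: partition numbers are never negative.
		if n < 0 {
			return nil, fmt.Errorf("partition must be non-negative, got %d", n)
		}
		out = append(out, n)
	}
	return out, nil
}

func main() {
	for _, in := range []string{"*", "0,1,2", "", "1,x"} {
		got, err := parsePartitions(in)
		fmt.Printf("%q -> %v, err=%v\n", in, got, err)
	}
}
```

Returning nil for "*" lets the caller treat "no filter" and "explicit list" differently without a separate flag.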
}

func update(sourceSession, destSession *gocql.Session, tableIn, tableOut string) {
	// Get the list of ids that we carry about
care about?
Do you remember offhand what kind of improvement you got thanks to the unlogged batches? This tool looks fit for merging. Was it reliable in your experience?
Offhand, I don't know. While running it in k8s it was unfortunately unreliable. After some time, writes would stop being reflected (although gocql indicated they were succeeding). Not sure if that was our Cassandra setup or not though. Either way, it takes days to copy even a small time range :/
Without unlogged batches, it was far slower, but exhibited the same behavior of writes just stopping. Very strange.
Then I think I shall merge it, but mark it as experimental.
This is similar to mt-update-ttl but allows the destination cluster to differ from the source. Additionally, it uses unlogged batches for performance.
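To illustrate how a `max-batch-size` limit might drive batching, here is a minimal sketch that groups rows into chunks and hands each chunk to a flush callback. `flushInBatches` and the `[]string` row type are hypothetical and chosen for illustration; the sketch uses only the standard library and is not the tool's actual implementation. With gocql, the callback would presumably build a batch via `session.NewBatch(gocql.UnloggedBatch)`, add statements with `batch.Query(...)`, and send it with `session.ExecuteBatch(batch)`.

```go
package main

import "fmt"

// flushInBatches is a hypothetical helper: it splits rows into groups of at
// most maxBatchSize and passes each group to flush, stopping at the first error.
func flushInBatches(rows []string, maxBatchSize int, flush func(batch []string) error) error {
	if maxBatchSize < 1 {
		return fmt.Errorf("maxBatchSize must be at least 1, got %d", maxBatchSize)
	}
	for start := 0; start < len(rows); start += maxBatchSize {
		end := start + maxBatchSize
		if end > len(rows) {
			end = len(rows)
		}
		// In the real tool this is where an unlogged batch would be
		// executed against the destination session (assumption).
		if err := flush(rows[start:end]); err != nil {
			return fmt.Errorf("flushing rows %d-%d: %w", start, end-1, err)
		}
	}
	return nil
}

func main() {
	rows := make([]string, 25)
	for i := range rows {
		rows[i] = fmt.Sprintf("row-%d", i)
	}
	err := flushInBatches(rows, 10, func(batch []string) error {
		fmt.Printf("flushing %d rows, first=%s\n", len(batch), batch[0])
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

Keeping the flush step behind a callback makes the chunking logic testable without a Cassandra cluster, and returning on the first error avoids silently skipping failed batches.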