In CassandraStorage implement segments as clustering keys within the repair_run table #102
Conversation
Force-pushed from c05ff1c to d1267a5
This branch is looking better now.
@@ -8,6 +8,7 @@ repairRunThreadCount: 15
hangingRepairTimeoutMins: 1
storageType: cassandra
incrementalRepair: false
allowUnreachableNodes: true
Integration tests wouldn't work without turning this on. Still not sure if I broke something, if my environment changed, or if I fixed something…?
I doubt you fixed something here :)
allowUnreachableNodes permits Reaper to start processing a segment even if not all replicas are reachable through JMX. It was introduced to repair multi-region clusters with the JMX port closed between regions (which should be the case for security's sake).
This flag will be deprecated with the introduction of the ft-reaper branch that uses the Cassandra backend to exchange metrics between regions.
I'll try to replicate the flaky tests using this branch and see if I can make sense out of it.
a) Previously the integration tests were working for me even though two of the three nodes could not be reached via JMX.
b) With this patch the integration tests then started to fail.
c) The integration tests with this patch do work with allowUnreachableNodes=true, as one would expect.
I don't know, but I'm left wondering whether the problem was in (a) and not actually in this patch. It's worth checking, though.
segment_start_time timestamp,
segment_end_time timestamp,
fail_count int,
PRIMARY KEY (id, segment_id)
)
WITH compaction = {'class': 'LeveledCompactionStrategy'}
AND caching = {'rows_per_partition': 10};
I still think we should make no change that requires users to drop the database and then recreate it.
Since schema migration is performed, we need to create a new cql file with a new version number, create a single table in it that we would call repair_run_v2 for example, and move all queries from repair_run to repair_run_v2 in the code. Otherwise the app will just crash when people with existing installs try to upgrade.
We also need more caching here, since many runs could have thousands of segments. After running some tests on the old schema, setting 1000 rows_per_partition uses only a few KB of cache, so I guess we can use values ranging between 1000 and 5000 to maximize cache utilization.
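For illustration only, a hedged sketch of what such a versioned migration file could contain — repair_run_v2 is the name suggested above, the timeuuid keys follow the time-based UUID change described in the commit summary below, the column list is abbreviated to the columns visible in the diff, and the caching value picks the low end of the suggested 1000–5000 range; the actual migration in this PR may differ:

-- hypothetical repair_run_v2, created by a new versioned cql migration file
-- so that existing installs are not forced to drop and recreate the keyspace
CREATE TABLE IF NOT EXISTS repair_run_v2 (
  id                 timeuuid,   -- repair run (partition key)
  segment_id         timeuuid,   -- segment (clustering key)
  segment_start_time timestamp,
  segment_end_time   timestamp,
  fail_count         int,
  PRIMARY KEY (id, segment_id)
)
WITH compaction = {'class': 'LeveledCompactionStrategy'}
AND caching = {'rows_per_partition': 1000};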
will fix…
Force-pushed from 2a778bd to 25a3f9f
Force-pushed from d1267a5 to 900426f
This PR will get re-merged into 'mck/cassandra-improvements-94' after #101 gets merged.
I'll take it out. As it turns out, I was just having problems with local iptables occasionally blocking the JMX connections to the ccm addresses.
* Cassandra performance: replace sequence ids with time-based UUIDs. Makes the schema changes in a separate migration step, so that data in the repair_unit and repair_schedule tables can be migrated over. ref: #99, #94, #99 (comment)
* Simplify the creation of repair runs and their segments. Repair runs and their segments are conceptually one unit of work, and the persistence layer should be designed accordingly. Previously they were separated because the concern of sequence generation for IDs was exposed in the code. This is now encapsulated within storage implementations. This work allows CassandraStorage to implement segments as clustering keys within the repair_run table. ref: #94, #101
* In CassandraStorage implement segments as clustering keys within the repair_run table. A change is required in IStorage so that a segment is identified by both runId and segmentId. ref: #94, #102
* Fix the computation of the number of parallel repairs. Downgrade to Dropwizard 1.0.7 and Guava 19.0 to fix dependency issues. Make the repair manager schedule cycle configurable (was hardcoded to 30s). ref: #108
* In CassandraStorage replace the table scan on `repair_run` with an async break-down of per-cluster run-throughs of known run IDs. ref: #105
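As a hedged sketch of that last bullet — the repair_run_by_cluster table name and its columns are assumptions made for illustration, not the schema from this PR — the idea is to keep a small per-cluster index of run IDs so the full repair_run table never needs to be scanned:

-- hypothetical lookup table: known run ids per cluster
CREATE TABLE IF NOT EXISTS repair_run_by_cluster (
  cluster_name text,
  id           timeuuid,
  PRIMARY KEY (cluster_name, id)
);

-- fetch the run ids for one cluster, then read each run asynchronously
SELECT id FROM repair_run_by_cluster WHERE cluster_name = ?;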
Collapses three tables into just one, repair_run. Also, a change is required in IStorage so that a segment is identified by both runId and segmentId.
ref:
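As a hedged illustration of that lookup (column names taken from the schema diff above; the exact statements used in the code may differ), storing segments as clustering rows means a single segment is addressable by run id plus segment id within one partition:

-- read one segment of one run; both keys are part of the repair_run primary key
SELECT segment_start_time, segment_end_time, fail_count
FROM repair_run
WHERE id = ? AND segment_id = ?;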