[cassandra] Remove SASI index from dependencies table #793
Conversation
```sql
  PRIMARY KEY (bucket, ts)
) WITH CLUSTERING ORDER BY (ts DESC)
  AND compaction = {
    'min_threshold': '4',
    'max_threshold': '32',
    'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
```
actually delete SASI below
```diff
 ts timestamp,
-ts_index timestamp,
+date_bucket text
```
Shouldn't this be `int`?
I initially had buckets in a different format; this can be `int`, or `bigint` if we anticipate smaller buckets in the future.
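For context on the type choice: a day-granularity bucket is just epoch seconds divided by 86400, so today's values are around 20,000 and fit comfortably in a 32-bit `int`; `bigint` would only matter if buckets became much finer-grained. A minimal sketch (the helper name is illustrative, not from the PR):

```shell
#!/usr/bin/env bash
# Compute a day-granularity date bucket from an epoch-seconds timestamp.
# 86400 = seconds per day; integer division truncates to the day index.
date_bucket() {
  echo $(( $1 / 86400 ))
}
```

For example, `date_bucket 86400` prints `1`, and any timestamp within the same UTC day maps to the same bucket.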
```diff
 ts timestamp,
-ts_index timestamp,
+date_bucket text
```
it's customary to list PK fields at the top
```diff
-  PRIMARY KEY (ts)
+  PRIMARY KEY (bucket, ts)
 ) WITH compaction = {
```
s/bucket/data_bucket
- Use traditional indexes and a date bucket field instead
- This allows people to use older versions of cassandra

Signed-off-by: Prithvi Raj <p.r@uber.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. You might want to run shellcheck on the bash script.
```bash
#!/usr/bin/env bash

function usage {
  >&2 echo "Error: $1"
```
Any reason not to use a heredoc?
Good question, let me try it out
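For reference, the heredoc variant keeps a multi-line usage message readable without a chain of `echo` calls. A minimal sketch (message text and function body are illustrative, not the PR's actual script):

```shell
#!/usr/bin/env bash
# Emit a usage/error message to stderr via a heredoc instead of
# repeated ">&2 echo" lines.
usage() {
  cat >&2 <<EOF
Error: $1
Usage: $(basename "$0") <keyspace>
EOF
}
```

The unquoted `EOF` delimiter still allows `$1` and command substitution to expand inside the heredoc; quote it (`<<'EOF'`) to suppress expansion.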
```diff
@@ -93,6 +98,14 @@ func (s *DependencyStore) GetDependencies(endTs time.Time, lookback time.Duratio
 	return mDependency, nil
 }

+func getDepSelectString(startTs time.Time, endTs time.Time) string {
```
Cassandra has no date range support?
Cassandra does not support range queries on partition keys
OK that explains it.
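Because the partition key cannot be range-scanned, a `getDepSelectString`-style helper presumably has to enumerate every date bucket between the two timestamps and list them in an `IN` clause. A hedged sketch of that shape in shell (the table and column names come from the diff; the exact query text is an assumption):

```shell
#!/usr/bin/env bash
# Build a SELECT that lists every day bucket covered by
# [start_ts, end_ts] (epoch seconds) in an IN clause on the
# partition key, since Cassandra cannot range-scan partition keys.
dep_select_string() {
  local start_ts=$1 end_ts=$2
  local b=$(( start_ts / 86400 )) last=$(( end_ts / 86400 ))
  local in_list="$b"
  while (( ++b <= last )); do
    in_list+=", $b"
  done
  echo "SELECT * FROM dependencies WHERE date_bucket IN ($in_list)"
}
```

Range predicates on the clustering column `ts` remain allowed, so the real query can still append `AND ts > ? AND ts < ?` within those buckets.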
```bash
confirm() {
  read -r -p "${1:-Are you sure? [y/N]} " response
  case "$response" in
    [yY][eE][sS]|[yY])
```
You can probably just do `if [[ "$(echo $response | tr 'a-z' 'A-Z')" =~ "Y(ES)?" ]]`.
I don't think that would work
I'm not saying it's necessarily easier, but why wouldn't it work?
I don't know, I didn't spend time debugging it. Does it work for you?
```bash
bash-3.2$ response=Y && if [[ "$(echo $response | tr 'a-z' 'A-Z')" =~ "Y(ES)?" ]]; then echo "passed"; fi
bash-3.2$
```
Right. I think bash regex might just be awful.
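The failure above is a quoting detail rather than bash regex being broken: quoting the right-hand side of `=~` makes bash treat it as a literal string, so `"Y(ES)?"` never matches the single character `Y`. Left unquoted and anchored, the same pattern works. A minimal sketch (function name is illustrative):

```shell
#!/usr/bin/env bash
# A quoted RHS of =~ is matched literally; unquoted, it is a regex.
# The ^...$ anchors keep inputs like "NAY" from matching on a substring.
is_yes() {
  local upper
  upper=$(echo "$1" | tr 'a-z' 'A-Z')
  [[ $upper =~ ^Y(ES)?$ ]]
}
```

For example, `is_yes y && echo passed` prints `passed`, while `is_yes nope` returns a non-zero status.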
Changes look fine. What's our strategy for rolling this out? Are we going to wait until a major release? If not, we probably want this to be backwards compatible.

The strategy for rolling out is for people to stop the spark dependency job, which stops writes to the dependency table, run the migration script, and start a new version of the spark job. (I'm writing this up in a readme)

I had an offline discussion with @black-adder and decided that it makes sense to add a new Keyspace to accomplish some backward compatibility. The new dependencies table will live in the new keyspace, so that jaeger-query instances that are not updated can still access old dependencies at the older location.

I think I'd need more info. The main concern is data compatibility, not running an outdated jaeger-query; upgrading components is the easy part.

The idea is that we will create a new Keyspace as follows: This keyspace contains the new dependency table. The migration path looks like the following:

The main benefit offered here is that dependencies will be available throughout the update/migration process, because the old spark-dependencies job still writes in the old data format.
```bash
cqlsh -e "CREATE TABLE $keyspace.dependencies (
    ts timestamp,
    date_bucket bigint,
```
partition key first
Superseded by #1328
ref #790
Signed-off-by: Prithvi Raj p.r@uber.com