This repository was archived by the owner on Aug 23, 2023. It is now read-only.

fix duration vars processing and error handling in cass idx #1141

Merged
merged 13 commits into master from fix-durationvars-v2 on Dec 14, 2018

robert-milan
Contributor

@robert-milan robert-milan commented Nov 14, 2018

add err handling for 0 value durations in cass idx
add ability to implement custom validation for config values in cass idx
add config.go for cass idx configuration
update cass idx tests to use new config
update metrictank config and docs to show valid time units

update mt-index-cat to use new cass idx config
update mt-whisper-importer-writer to use new cass idx config

For earlier discussions on the previous PR to address this issue please see #1098

See also: #944, grafana/globalconf#3

Member

@woodsaj woodsaj left a comment

LGTM

@Dieterbe
Contributor

I think you forgot ConfigProcess in some of the tools?

@robert-milan
Contributor Author

cassandra.New(cassandra.CliConfig) will call cfg.Validate() which is what ConfigProcess is calling, so I think we should be fine in the tools.

I believe ConfigProcess only needs to be called when there is a chance that IdxConfig has been modified after it was created, since that is where the initial validation is performed.

disableInitialHostLookup bool
}

const timeUnits = "Valid time units are 'ns', 'us' (or 'µs'), 'ms', 's', 'm', 'h'"
Contributor

@Dieterbe Dieterbe Nov 20, 2018

in a file, please put first global consts and vars, then types followed first by their constructor, then by their methods

@Dieterbe
Contributor

> I believe ConfigProcess only needs to be called when there is a chance that IdxConfig has been modified after it was created, since that is where the initial validation is performed.

both mt-index-cat and mt-whisper-importer-writer call cassandra.ConfigSetup(), allowing the configuration to be altered. neither of them calls cassandra.ConfigProcess()

> cassandra.New(cassandra.CliConfig) will call cfg.Validate() which is what ConfigProcess is calling, so I think we should be fine in the tools.

seems like you're contradicting your other statement where you said ConfigProcess should be called if IdxConfig can be modified. or am I missing something?
Either way, we should validate the configuration before calling cassandra.New because by then we may have initialized other stuff. case in point mt-whisper-importer-writer initializes the store first. generally all config validation should happen before we start initializing stuff.

}
updateInterval32 = uint32(updateInterval.Nanoseconds() / int64(time.Second))
Contributor

why did we remove this here and move it to IdxConfig.Validate()
validation should validate the config. setting this property must not be forgotten, so it should not be in the Validate method (which is optional)

Contributor Author

Validate is not optional; it is called at the beginning of New. This way updateInterval is validated before it is used in this assignment.

Contributor

@Dieterbe Dieterbe Nov 21, 2018

ok but it smells like bad code hygiene and for someone reading the code, that flow is not obvious.
and the question remains: why is this line in Validate if it has nothing to do with validation, and why was it removed from New(), which seems like a better place for it?

Contributor Author

Fair enough, I suppose it really doesn't have anything to do with validation itself, other than relying on validation to run before it is assigned. I will move it back to New() and adjust the comments.

Contributor

On phone now but if that property is a derivative of the cfg that's set in New it should probably be a field of the index struct not of the config object

@robert-milan
Contributor Author

> > I believe ConfigProcess only needs to be called when there is a chance that IdxConfig has been modified after it was created, since that is where the initial validation is performed.
>
> both mt-index-cat and mt-whisper-importer-writer call cassandra.ConfigSetup(), allowing the configuration to be altered. neither of them calls cassandra.ConfigProcess()
>
> > cassandra.New(cassandra.CliConfig) will call cfg.Validate() which is what ConfigProcess is calling, so I think we should be fine in the tools.
>
> seems like you're contradicting your other statement where you said ConfigProcess should be called if IdxConfig can be modified. or am I missing something?
> Either way, we should validate the configuration before calling cassandra.New because by then we may have initialized other stuff. case in point mt-whisper-importer-writer initializes the store first. generally all config validation should happen before we start initializing stuff.

When cassandra.New is called it will Validate as ConfigProcess does and fail the same. It would be redundant to call it before cassandra.New, unless there is a specific index setting to validate that has been changed since it was initiated and that would affect the initialization of some other component. I was not very clear in my original reply. For the two tools affected by these changes we should be ok without calling ConfigProcess due to the ordering in their initialization sequences.

@Dieterbe
Contributor

AFAIK, everywhere in the codebase we follow the pattern that all configuration is validated before we start initializing stuff and executing business logic. and I think it's a useful convention that we should consistently apply. if it means validating more than once, so be it.
I don't think a tool should initialize the cassandra store for example, start creating the schemas and tables, and only then abort because there is a config error.

}
if cfg.timeout == 0 {
return errors.New("timeout must be greater than 0. " + timeUnits)
}
Contributor

all these errors are misleading when a user entered a non-zero value that didn't parse.
Perhaps reword as "timeout invalid. Must be greater than 0 and parseable with https://golang.org/pkg/time/#ParseDuration" or something

Contributor

@Dieterbe Dieterbe Nov 21, 2018

since we use flag.ExitOnError, that should work in theory, but in practice, it doesn't seem to
when I use update-interval = 4H I get:

metrictank_1_cced215491fa | 2018-11-21 05:58:48.740 [FATAL] cassandra-idx: Config validation error. updateInterval must be greater than 0. Valid time units are 'ns', 'us' (or 'µs'), 'ms', 's', 'm', 'h'

meaning we don't detect parse errors, and just pass the 0 value through. In this case we're lucky that we also do the != 0 check, but we don't do that everywhere.

unfortunately, we already knew this which is why it was brought up before: #944 (comment)
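
The failure mode described here can be reproduced with the standard library alone. In this sketch, `parseIgnoringError` and `validate` are hypothetical helpers: the first drops the error from `time.ParseDuration`, which is an assumption about what the config loading effectively did, and the second repeats the zero check from the diff. It does not use globalconf.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// parseIgnoringError drops the parse error, so an unparseable value
// silently becomes the zero duration.
func parseIgnoringError(s string) time.Duration {
	d, _ := time.ParseDuration(s)
	return d
}

// validate is the zero check from the diff under review.
func validate(updateInterval time.Duration) error {
	if updateInterval == 0 {
		return errors.New("updateInterval must be greater than 0")
	}
	return nil
}

func main() {
	// "4H" is not zero, yet the user is told the value must be greater than 0.
	fmt.Println(validate(parseIgnoringError("4H")))

	// Checking the error from time.ParseDuration reports the real problem.
	if _, err := time.ParseDuration("4H"); err != nil {
		fmt.Println(err)
	}
}
```

Settings without a zero check would accept the silently zeroed value, which is why the parse error itself needs to be surfaced.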

Member

why are we still using github.com/rakyll/globalconf? we have our own fixed version github.com/grafana/globalconf

func (cfg *IdxConfig) Validate() error {
if cfg.updateInterval == 0 {
return errors.New("updateInterval must be greater than 0. " + timeUnits)
}
Contributor

@Dieterbe Dieterbe Nov 21, 2018

We can't implement proper validation without stopping supporting a zero value? that seems rather sad.

if we're no longer allowing 0 for this value:

  1. the help message must be updated
  2. this is a breaking change and should be made much more explicit. please mention breaking changes clearly in the PR message.

Contributor Author

I just tried out github.com/grafana/globalconf. We can still support zero values with it.

@@ -182,11 +181,13 @@ func main() {

cluster.Init("mt-whisper-importer-writer", gitHash, time.Now(), "http", int(80))

cassandra.ConfigProcess()
Contributor

again. config validation (ConfigProcess()) should be before we start initializing stuff. the sooner we can abort the program the better.

prune-interval = 3h
# synchronize index changes to cassandra. not all your nodes need to do this.
update-cassandra-index = true
-#frequency at which we should update flush changes to cassandra. only relevant if update-cassandra-index is true.
+#frequency at which we should update flush changes to cassandra. only relevant if update-cassandra-index is true. valid time units are 'ns', 'us' (or 'µs'), 'ms', 's', 'm', 'h'. Setting to '0s' will cause instant updates.
Contributor

@Dieterbe Dieterbe Dec 3, 2018

kind of weird to tell users that they can use all these subsecond units but in New() round down to a round number of seconds. but it's fine i guess.
we could use https://github.com/raintank/dur instead which accepts units s/sec/secs/second/seconds, m/min/mins/minute/minutes, h/hour/hours, d/day/days, w/week/weeks, mon/month/months, y/year/years but those high units are more than what we need. a side benefit would be though that we don't need to convert to uint32 anymore.
anyway, seems like more hassle than it's worth.

Contributor Author

You are right, that is a little weird. Maybe we should specify that although you can use all of those time units, the value is rounded down to whole seconds. Also, a value of less than 1 second is treated like 0, thus causing instant updates.

Contributor

I think we don't have to do anything. our performance with value of 1s would be the same as it is with <1s since we don't accept messages more frequently than a second anyway.

@Dieterbe
Contributor

Dieterbe commented Dec 3, 2018

well, i'm glad we now finally meet the main objective, which is to bail out when a duration var cannot be parsed.
below are some tests i tried:

[cassandra-idx]
update-interval = 4H
...
metrictank_1_c9c35c4ad137 | failed to set update-interval
metrictank_1_c9c35c4ad137 | time: unknown unit H in duration 4H
docker-dev_metrictank_1_c9c35c4ad137 exited with code 2
[cassandra-idx]
prune-interval = die
...
metrictank_1_4f46ba0a7b69 | failed to set prune-interval
metrictank_1_4f46ba0a7b69 | time: invalid duration die
[cassandra-idx]
prune-interval = 3d
...
metrictank_1_d0ccd3ad938e | failed to set prune-interval
metrictank_1_d0ccd3ad938e | time: unknown unit d in duration 3d
[cassandra-idx]
timeout = 10d
...
metrictank_1_13dbbc0f7ce0 | failed to set timeout
metrictank_1_13dbbc0f7ce0 | time: unknown unit d in duration 10d
[bigtable-store]
# read timeout
read-timeout = 5S
...
metrictank_1_ff3b164d8a50 | failed to set read-timeout
metrictank_1_ff3b164d8a50 | time: unknown unit S in duration 5S
[bigtable-store]
write-timeout = foo
...
metrictank_1_555f79a0a8e3 | failed to set write-timeout
metrictank_1_555f79a0a8e3 | time: invalid duration foo
docker-dev_metrictank_1_555f79a0a8e3 exited with code 2
stats
+timeout = ye
...
metrictank_1_8b68411701d5 | failed to set timeout
metrictank_1_8b68411701d5 | time: invalid duration ye
docker-dev_metrictank_1_8b68411701d5 exited with code 2
[bigtable-idx]
update-interval = H
...
metrictank_1_4309892793b3 | failed to set update-interval
metrictank_1_4309892793b3 | time: invalid duration H
docker-dev_metrictank_1_4309892793b3 exited with code 2
[bigtable-idx]
prune-interval = a
...
metrictank_1_8579f921e24e | failed to set prune-interval
metrictank_1_8579f921e24e | time: invalid duration a
[cluster]
http-timeout = 1d
...
metrictank_1_6e399a769abe | failed to set http-timeout
metrictank_1_6e399a769abe | time: unknown unit d in duration 1d
[swim]
tcp-timeout = 1d
...
metrictank_1_6e769e4ab957 | failed to set tcp-timeout
metrictank_1_6e769e4ab957 | time: unknown unit d in duration 1d
[swim]
push-pull-interval = 30
...
metrictank_1_b4cad3349253 | failed to set push-pull-interval
metrictank_1_b4cad3349253 | time: missing unit in duration 30

@Dieterbe
Contributor

Dieterbe commented Dec 3, 2018

I think this PR now finally meets the objectives..
Can you just fix up grafana/globalconf#3 and get it merged, then we'll pull it in here and merge this pr.

@Dieterbe
Contributor

Dieterbe commented Dec 4, 2018

oh, and address my previous comment of course about ConfigProcess...

@Dieterbe
Contributor

beautiful..

metrictank_1   | failed to set cassandra-idx.update-interval: time: unknown unit H in duration 4H
docker-dev_metrictank_1 exited with code 2

robert-milan and others added 10 commits December 14, 2018 12:36
add ability to implement custom validation for config values in cass idx
add config.go for cass idx configuration
update cass idx tests to use new config
update metrictank config and docs to show valid time units

update mt-index-cat to use new cass idx config
update mt-whisper-importer-writer to use new cass idx config

See also: #944
update configuration files
update documentation
@Dieterbe
Contributor

force pushing rebase on master to resolve some conflicts..

@Dieterbe Dieterbe force-pushed the fix-durationvars-v2 branch from e83045f to 995d817 Compare December 14, 2018 11:44
@Dieterbe Dieterbe merged commit 9624aa9 into master Dec 14, 2018
@Dieterbe Dieterbe added this to the 0.11.0 milestone Dec 14, 2018
@Dieterbe Dieterbe deleted the fix-durationvars-v2 branch March 27, 2019 21:09
3 participants