Field type conflict blocks the output buffer #2245
Comments
I guess it would be too much against the design philosophy of Telegraf to do bookkeeping of the (type of) measurements as they come in? InfluxDB creates the measurement with a type when it first sees it. Telegraf could do the same, so it would be able to do type checking.
@wiebeytec As of the current version, Telegraf has no persistence, so upon restart it would have no way to know what schema already exists in InfluxDB unless it performs schema exploration.
Telegraf can't track the types of points as they flow through the system, as this doesn't provide any guarantees for data that comes from a separate source anyway. I'm not sure of the best way to handle this; maybe if InfluxDB fails the write then we should discard that batch (Influx does write the well-formed points even when it returns an error code).
@sparrc we need to test this further: in my repro case, once the buffer gets blocked due to the type mismatch, no new points make it into InfluxDB.
Actually, this is false. See influxdata/influxdb#4856 (comment)
@phemmer, you're right. I thought that InfluxDB handled mismatched types the same as malformed points (it doesn't). I created an issue, but it might be a dupe: influxdata/influxdb#7814
Unfortunately, I don't think there is anything Telegraf can do to fix this at the moment. I'd like to hear what other users think, but it might be best to simply drop batches when receiving a 400. This could lead to dropped metrics in the case of mismatched types, but the only alternative is to let the mismatched point get stuck in the buffers, which can only be recovered from by restarting Telegraf (and thus dropping even more points). As @phemmer mentioned, the only sure-fire workaround for now will be to use 1-metric batch sizes until influxdata/influxdb#7814 is fixed.
+1 on waiting for InfluxDB issue 7814 to be fixed and then dropping the points in a batch that receives a 400.
If we write a batch of points and get a "field type conflict" error message in return, we should drop the entire batch of points, because this indicates that one or more points have a type that doesn't match the database. These errors will never go away on their own, and InfluxDB will successfully write the points that don't have a conflict. closes #2245
I have a fix for this at #2311. One caveat is that this fix will only work in combination with InfluxDB version 1.2+.
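For readers following along, the drop-on-conflict idea from the commit message can be sketched roughly as below. This is a toy Python model, not the code from #2311 (Telegraf itself is written in Go). The names `flush_dropping_conflicts` and `fake_write` are hypothetical, and the error text is only illustrative of the "field type conflict" message quoted in the thread.

```python
from collections import deque


def flush_dropping_conflicts(buffer, write, batch_size):
    """Flush one batch; drop it if the server reports a field type conflict.

    `write` is a hypothetical callable returning (status_code, body_text).
    """
    batch = [buffer.popleft() for _ in range(min(batch_size, len(buffer)))]
    if not batch:
        return "empty"
    status, body = write(batch)
    if 200 <= status < 300:
        return "written"
    if "field type conflict" in body:
        # Retrying can never succeed, so the batch is not requeued.
        return "dropped"
    # Any other failure is treated as transient: requeue in the original order.
    buffer.extendleft(reversed(batch))
    return "retry"


def fake_write(batch):
    # Stand-in for an HTTP write (assumption): rejects batches containing an int value.
    if any(isinstance(value, int) for _, _, value in batch):
        return 400, "field type conflict (illustrative message)"
    return 204, ""


pending = deque([("cpu", "value", 2), ("cpu", "value", 3.0), ("cpu", "value", 4.0)])
first = flush_dropping_conflicts(pending, fake_write, batch_size=2)
second = flush_dropping_conflicts(pending, fake_write, batch_size=2)
print(first, second, len(pending))
```

In this toy model the float point that shares a batch with the conflicting one is lost as well. Per the commit message, InfluxDB 1.2+ writes the non-conflicting points before returning the error, which is why the real fix depends on that version.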
High level description: A type mismatch for a single point totally blocks the output buffer for outputs.influxdb.
Version tested: v1.1.2
input plugins: any
output plugins: outputs.influxdb in HTTP mode
Conditions:
A field key for a measurement already exists with a defined context type (e.g. float) in the backend InfluxDB
A metric arrives at Telegraf for that measurement & field with a different context type (e.g. int)
Issue:
The output queue gets stuck due to type mismatch. Telegraf indefinitely retries to write the mismatching point and does not flush the output buffer at all. Any following points are stacked up and are not written to InfluxDB.
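The blocking behaviour can be illustrated with a small toy model. This is a sketch under stated assumptions, not Telegraf or InfluxDB code: `FakeInfluxDB` and `flush_retry_forever` are hypothetical names, and the backend is assumed to reject a whole batch when any point in it conflicts, which matches the behaviour reported in the comments for this version.

```python
from collections import deque
from itertools import islice


class FakeInfluxDB:
    """Toy backend (assumption): fixes a field's type on first write and
    rejects the whole batch if any point conflicts with that type."""

    def __init__(self):
        self.schema = {}
        self.stored = []

    def write(self, batch):
        schema = dict(self.schema)
        for measurement, field, value in batch:
            expected = schema.setdefault((measurement, field), type(value))
            if expected is not type(value):
                return 400  # field type conflict: nothing from this batch is stored
        self.schema = schema
        self.stored.extend(batch)
        return 204


def flush_retry_forever(buffer, backend, batch_size):
    """A failed batch stays at the head of the buffer and is retried next time."""
    batch = list(islice(buffer, batch_size))
    if batch and backend.write(batch) == 204:
        for _ in batch:
            buffer.popleft()
        return True
    return False


db = FakeInfluxDB()
db.write([("cpu", "value", 1.0)])          # the field is created as float
buffer = deque([("cpu", "value", 2)])      # an int arrives for the same field
for n in range(3):
    buffer.append(("cpu", "value", float(n)))  # well-formed points queue up behind it
    flush_retry_forever(buffer, db, batch_size=10)
print(len(buffer), len(db.stored))  # prints: 4 1
```

Every flush includes the conflicting point at the head of the buffer, so nothing behind it is ever written and the buffer only grows.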
e.g., using the Telegraf inputs.http_listener:
Result: Output buffer fullness increases indefinitely.
Sample config used for the above test:
Proposal: add a max-attempts parameter for type conflicts to avoid indefinitely blocking the buffer.
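A minimal sketch of what such a limit might look like, assuming a hypothetical `max_attempts` setting and a hypothetical `BoundedRetryBuffer` class (neither exists in Telegraf; this is a Python illustration of the proposal only):

```python
from collections import deque
from itertools import islice


class BoundedRetryBuffer:
    """Toy buffer that gives up on the head batch after `max_attempts` failures."""

    def __init__(self, max_attempts):
        self.max_attempts = max_attempts
        self.points = deque()
        self.failed_attempts = 0  # consecutive failures while the same point is at the head
        self.dropped = 0

    def add(self, point):
        self.points.append(point)

    def flush(self, write, batch_size):
        batch = list(islice(self.points, batch_size))
        if not batch:
            return
        if write(batch):
            self.failed_attempts = 0
        else:
            self.failed_attempts += 1
            if self.failed_attempts < self.max_attempts:
                return  # keep the batch in the buffer and retry on the next flush
            self.dropped += len(batch)  # limit reached: give up on this batch
            self.failed_attempts = 0
        for _ in batch:
            self.points.popleft()


sent = []


def write_floats_only(batch):
    # Stand-in for the output plugin (assumption): only float values are accepted.
    if all(isinstance(value, float) for value in batch):
        sent.extend(batch)
        return True
    return False


buf = BoundedRetryBuffer(max_attempts=3)
buf.add(1)    # conflicting point
buf.add(2.0)  # well-formed point stuck behind it
for _ in range(4):
    buf.flush(write_floats_only, batch_size=1)
print(buf.dropped, sent, len(buf.points))
```

The conflicting point is discarded on its third failed attempt, after which the point behind it is written on the next flush. A real implementation would need to decide whether the limit applies to every error or only to type conflicts, as the proposal suggests.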