Select a tag key when the tag and field conflict #6519
Comments
There was another syntax proposed as well, which is modeled after Postgres's cast syntax. Solution 3
The first syntax may cause parsing trouble because '.' is allowed in field and tag names too.
Ah, I'll add that syntax too. I personally don't like it because we're not doing a cast and it has some of the same problems solution 1 has, but it deserves to be on there anyway.
I personally like syntax 3 ( e.g.
@benbjohnson I just added the syntax and pointed out that syntax should probably be used for forcing a specific type when types conflict. I think that syntax fits a lot better in your example than as a specifier for tag vs field.
I find the
Also, the field types don't overlap with tags. You'd never specify a e.g.
Hm, that could work. We would then not include a
I don't think the RPC concern is specific to the
The sigil syntax is much easier to parse than the others so it can be embedded within the string without requiring any kind of special parser. The
I prefer
Updated solution 3 to include the information from what @benbjohnson and I discussed.
There's still the problem of what to do about Maybe another way to make that work with the other syntaxes?
We could still use
Hm, I'm not really sure the best way to do that. It seems like we're putting too many features into one piece of syntax. It also seems like we're diverging from this being the cast operator. I dislike allowing Unless there's some precedent for using a cast operator with a wildcard, I don't think that makes sense and I'm not sure how useful the feature is. The rationale for allowing
Casting syntax is done with the PostgreSQL syntax `field1::float` to specify which type should be used when selecting a field. You can also do `tag1::tag` to specify that a tag should be selected. This makes it possible to select a tag when a field key and a tag key conflict with each other in a measurement. This changes the RPC wire protocol and is not backwards compatible. Fixes #6519.
Casting syntax is done with the PostgreSQL syntax `field1::float` to specify which type should be used when selecting a field. You can also do `tag1::tag` to specify that a tag should be selected. This makes it possible to select a tag when a field key and a tag key conflict with each other in a measurement. It also means it's possible to choose a field with a specific type if multiple shards disagree. If no types are given, the same ordering for how a type is chosen is used to determine which type to return. The FieldDimensions method has been updated to return the data type for the fields that get returned. The SeriesKeys function has also been removed since it is no longer needed. SeriesKeys was originally used for the fill iterator, but then expanded to be used by auxiliary iterators for determining the channel iterator types. The fill iterator doesn't need it anymore and the auxiliary types are better served by FieldDimensions implementing that functionality, so SeriesKeys is no longer needed. This changes the RPC wire protocol and is not backwards compatible. Fixes #6519.
Casting syntax is done with the PostgreSQL syntax `field1::float` to specify which type should be used when selecting a field. You can also do `field1::field` or `tag1::tag` to specify that a field or tag should be selected. This makes it possible to select a tag when a field key and a tag key conflict with each other in a measurement. It also means it's possible to choose a field with a specific type if multiple shards disagree. If no types are given, the same ordering for how a type is chosen is used to determine which type to return. The FieldDimensions method has been updated to return the data type for the fields that get returned. The SeriesKeys function has also been removed since it is no longer needed. SeriesKeys was originally used for the fill iterator, but then expanded to be used by auxiliary iterators for determining the channel iterator types. The fill iterator doesn't need it anymore and the auxiliary types are better served by FieldDimensions implementing that functionality, so SeriesKeys is no longer needed. This changes the RPC wire protocol and is not backwards compatible. InfluxDB will be able to process old messages, but it will not be able to communicate with old coordinators. Fixes #6519.
Casting syntax is done with the PostgreSQL syntax `field1::float` to specify which type should be used when selecting a field. You can also do `field1::field` or `tag1::tag` to specify that a field or tag should be selected. This makes it possible to select a tag when a field key and a tag key conflict with each other in a measurement. It also means it's possible to choose a field with a specific type if multiple shards disagree. If no types are given, the same ordering for how a type is chosen is used to determine which type to return. The FieldDimensions method has been updated to return the data type for the fields that get returned. The SeriesKeys function has also been removed since it is no longer needed. SeriesKeys was originally used for the fill iterator, but then expanded to be used by auxiliary iterators for determining the channel iterator types. The fill iterator doesn't need it anymore and the auxiliary types are better served by FieldDimensions implementing that functionality, so SeriesKeys is no longer needed. Fixes #6519.
Hi. For me, as a user, the idea of a Also it would be nice if selecting all fields or all tags, or both (grouped separately as fields and tags) were possible - and it would be consistent with the way the data is written. (Comment copied from the docs repo as advised by @beckettsean.)
Hello @jsternberg, has this feature been provided in InfluxDB 1.2? I tried it, but it threw a syntax error.
Problem
This is a single issue to discuss #4630.
When points are written to the same measurement and a tag key and a field key conflict with each other, it is impossible to query the tag key with InfluxQL. The following examples (written to different measurements to show that they are separate cases) leave the tag unqueryable.
Example 1
Example 2
The following queries become ambiguous and cannot be resolved correctly, so they default to selecting the field rather than the tag.
Background
There is a bit of a debate on exactly what a tag is. Before talking about what a tag is, here is a list of exactly what capabilities a tag has:
Since a tag is associated with the series and not the individual point, tags do not have a time associated with them. This makes it impossible to query for only a tag in a measurement, since a select works by finding the time associated with a field and returning the value at that time. Tags have no time of their own and so cannot be iterated over. To be returned by a select, a tag must be queried together with a field in the same query.
There is also no way to differentiate between a field and a tag in InfluxQL at the moment. The search path for how to treat a variable reference is:
Because of the first step, it becomes impossible to reference a tag once any series in a measurement uses that key as a field, even if previous data was written with that tag.
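To make the shadowing concrete, here is a minimal Go sketch of a field-first lookup. It is not the influxql implementation: `RefKind`, `resolve`, and the map-based key sets are hypothetical names chosen for illustration, and the only behavior taken from this issue is that the field lookup happens before the tag lookup.

```go
package main

import "fmt"

// RefKind says where a resolved key came from. The type and its values are
// illustrative only and are not taken from the influxql package.
type RefKind int

const (
	NotFound RefKind = iota
	Field
	Tag
)

func (k RefKind) String() string {
	switch k {
	case Field:
		return "field"
	case Tag:
		return "tag"
	}
	return "not found"
}

// resolve mimics a field-first search path: a key is treated as a field if
// any series in the measurement has written it as one, and only falls back
// to the tag keys otherwise.
func resolve(key string, fields, tags map[string]bool) RefKind {
	if fields[key] {
		return Field
	}
	if tags[key] {
		return Tag
	}
	return NotFound
}

func main() {
	tags := map[string]bool{"host": true}
	fields := map[string]bool{"value": true}

	// While "host" only exists as a tag, it resolves to the tag.
	fmt.Println(resolve("host", fields, tags))

	// Once a single series writes "host" as a field, the tag is shadowed.
	fields["host"] = true
	fmt.Println(resolve("host", fields, tags))
}
```

With this ordering, nothing the query author writes can reach the tag again, which is the situation both proposed solutions try to fix.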
Possible Solutions
Solution 1
Change the prioritization so it will read a tag over a field (if present) and restrict points from being written with a conflicting field and tag key as described and implemented in #6410.
This solution makes it so example 1 is possible and works correctly when querying so that the first series will return the value of the tag and the second will return the value in the field. It makes example 2 into an error so it becomes impossible to get into this situation to begin with.
Pros:
`SELECT value, host FROM cpu GROUP BY host` use a consistent value (currently it will use the tag for the `GROUP BY` and the field for the actual selection).
Cons:
`SELECT host FROM example1` you will get a weird result where it will only return the second point.
Solution 2
Introduce new InfluxQL syntax to specify whether you mean a tag or a field, as described and implemented in #6509 or as proposed in #4823.
This solution makes it so both examples are queryable and doesn't require any change to existing data or the point writer, but does require users to use new syntax to reference the correct value.
Pros:
Cons:
`host` as a tag and then write it once as a field to one series in the measurement, all of your previous queries will break. It is much more optimal to require that the new syntax is always used for tags, but then we're not backwards compatible anymore.
Proposed Syntax
Solution 1
Prepend the variable reference with `tag` or `field`.
The syntax for this looks pretty ugly in my personal opinion and I dislike that it adds more reserved keywords to a language that already has a lot of them. It becomes impossible to have a key with these names which makes this a backwards incompatible change. We then also might have to add code that strips the front of the variable reference from the ident as the query engine passes around auxiliary fields as a raw string. That means the identifier cannot just be stripped at parse time since we need to send it to the underlying engine which doesn't accept the AST structs as arguments. We can modify the iterator options struct to include references for the auxiliary fields, but this changes the wire protocol for RPC and might make things more complicated with protobuf.
Solution 2
Prepend the input with a special sigil to signify that a tag is being referenced. The syntax for this is common in programming languages like Perl and Ruby. Some programmers may be put off by the syntax, so it might not be desirable (Perl and Ruby's use of sigils is a holy war among programmers). The benefit is that it is one character (very easy to check and strip efficiently), so it is easier to pass around as a string without involving complex AST structures. This is also a backwards compatible change since the `@` symbol is not used like this anywhere else.
What sigil is used specifically is debatable. I would like to keep `$` reserved for future InfluxQL support for a Template node in the AST since I think this will make some things, like Chronograf or Influx Stress, easier to implement using the influxql package. I also avoided `#` since that's a comment character, even though I was very tempted to use it because tag and hashtag both reference tags (even if its real name is the pound sign).
Solution 3
This syntax alleviates one of the problems from solution 1 because it is less likely to conflict with a real measurement name, but it still may be difficult to strip off the end in an efficient manner when passing it through auxiliary fields. Likely, the best solution would be to change the function signature of IteratorOptions and just accept that the wire protocol will change. The syntax is meant to correspond to the casting syntax from PostgreSQL, although technically we're doing a selection rather than a cast. Even so, we can extend this to also allow specifying the field type of a field. If a field type is given, that will be considered a cast for a field and will override the default field type chosen. Tag would be treated as a separate type.
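As a rough illustration of what carrying the type alongside the key could look like, here is a small Go sketch that splits a `key::type` reference into a struct instead of leaving the suffix in a raw string. `VarRef`, `parseRef`, and `knownTypes` are hypothetical names and not the influxql API; the accepted suffixes are limited to the ones mentioned in this thread (`float`, `field`, `tag`), and quoting of identifiers is ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// VarRef is a hypothetical parsed reference: a key plus an optional type.
type VarRef struct {
	Name string
	Type string // "" when no "::type" suffix was given
}

// knownTypes lists only the suffixes mentioned in this discussion; a real
// implementation would presumably accept every field type.
var knownTypes = map[string]bool{"float": true, "field": true, "tag": true}

// parseRef splits an unquoted reference such as "host::tag" on the last "::".
// References without a suffix are returned with an empty Type so the caller
// can fall back to its default resolution order.
func parseRef(s string) (VarRef, error) {
	i := strings.LastIndex(s, "::")
	if i < 0 {
		return VarRef{Name: s}, nil
	}
	name, typ := s[:i], s[i+2:]
	if name == "" {
		return VarRef{}, fmt.Errorf("missing key in %q", s)
	}
	if !knownTypes[typ] {
		return VarRef{}, fmt.Errorf("unknown type %q in %q", typ, s)
	}
	return VarRef{Name: name, Type: typ}, nil
}

func main() {
	for _, s := range []string{"host::tag", "value::float", "host", "host::bogus"} {
		ref, err := parseRef(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", ref)
	}
}
```

Passing a struct like this through the iterator options, rather than re-parsing a string in the engine, is the kind of signature change the paragraph above accepts as the cost of this syntax.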