should not allow field and tag with same key #2615

beckettsean · 2015-05-19T21:57:37Z

We should raise an error if a field key has the same name as a tag key for a given measurement. Otherwise the behavior is unpredictable.

Inspiried by https://groups.google.com/d/msgid/influxdb/feca028b-7c62-439c-9d17-97b58edcee92%40googlegroups.com

corylanou · 2015-11-12T16:50:13Z

I need more context on how we want to fix this. There are several challenges for this depending on what the fix is.

First, let's examine the write endpoint to solve the problem.

If we check each point being written for duplicate tag/fields that will be fairly easy. While adding a little overhead, it should be fairly quick. Use of benchmarking could easily tell us what the trade off is.

However, it's important to understand that enforcing the write endpoint BEYOND the single point write is VERY costly, as you could have this scenario:

measurement,foo=bar value=12
measurement,value=bar foo=12

Because we can write any tag/field name on any insert, we would have to look up all the meta data for that set of keys/fields before accepting the write. Now, in the case where we are in the same shard (time index), this is less overhead, but in the case of truly making this correct, it would need to check EVERY shard that the measurement exists in across the cluster.

For the query endpoint, we have the same problem. To know if you have a duplicate field/tag value prior to issuing the query, we would have to check every shard where the time index of your query will hit. To be truly accurate, you would have to check every shard, otherwise sometimes the query may work, and sometimes not, as it may not run into a collision in one time range, but could in another.

So the only feasible fix imo is on the single point write, but beyond that the cost seems way to high.

/cc @pauldix @otoolep @beckettsean

otoolep · 2015-11-12T16:54:24Z

Agreed, this isn't really fixable for the reasons you outlined @corylanou -- there is no central store of tag and field metadata in our system.

pauldix · 2015-11-12T18:24:21Z

@corylanou agree, just disallow it if they're trying to do that in the same point.

beckettsean · 2015-11-12T18:57:25Z

@pauldix @corylanou @otoolep

Fixing it only for single point duplicates does very little to address the problem. It is still very easy for users to render their database unusable. If we can't prevent duplicate field and tag names, and that does seem to be a very hard challenge, then we must fix one or more of the following:

implement RENAME FIELD - #4156
OR
implement RENAME TAG - #4157

OR

make it possible to query tag and field of the same name - #4630
AND
make it possible to group by tag with field of the same name - #4217
AND
make it possible to use the tag or the field in the WHERE clause, explicitly declaring which is intended. (No issue filed yet)

So, either we make it possible to rename a tag or field, so that we can undo a bad situation, or we make it possible to explicitly declare in the query language which column to use when querying the duplicate names, so we can work around a bad situation.

Otherwise it is possible to make a single write to the database that causes huge chunks of data to become un-queryable forever, with no workaround or mitigation. That is not acceptable behavior for a database, regardless of the performance or architectural concerns involved.

otoolep · 2015-11-12T19:15:13Z

There is a third option -- allow users to specify up front what the schema is.

#3006

corylanou · 2015-11-17T16:55:11Z

I vote for what @beckettsean said about supporting RENAME FIELD and RENAME TAG. I don't think there is any real value in dup checking a single line for field/tag names. It adds unnecessary overhead to a very critical performance path and fixes only the furthest of edge case that we are really trying to solve for.

pauldix · 2015-11-18T01:26:15Z

rename isn't going to be tractable because those names existing in the underlying TSM files. It would necessitate rewriting every file across the entire database that has a reference to that name. We're unlikely to support renaming features for a long time.

Using INTO queries they could potentially query the old data and write into a new series, but that could be tricky too.

beckettsean · 2015-11-18T17:58:26Z

@pauldix I understand that none of the solutions are easy, but we have to do something that either prevents the issue from happening or allows for recovery after the fact. Otherwise it's a one way ticket to data loss.

I like @gunnaraasen's proposal in #4823, and that seems more tractable than either preventing duplicates or renaming them.

pauldix · 2015-11-18T22:21:03Z

@beckettsean yeah, I think that's the way to go. I'm going to close this out in favor of giving the user explicit controls at query time.

jsternberg · 2016-03-16T13:51:47Z

I should be able to modify the query engine to handle queries across fields and tags so that it looks correct, but the only requirement would be to prevent a single point from containing a duplicate. I can make this work:

cpu,host=server01 value=2
cpu host="server02",value=3

I don't think I can make this work:

cpu,host=server01 host="server01",value=2

Should I reopen this ticket to prevent points from doing the latter?

corylanou · 2016-03-16T15:16:42Z

Since you can't prevent it in all cases, I think the conclusion above is still the correct one, give them better query controls. It's a very poor design choice for the end user to have both tags and fields with the same name imo, but if we give them a way to query it then it's up to them.

pauldix · 2016-03-16T21:38:34Z

@corylanou yep agree. As long as it's queryable later, should be fine. Give users a length of rope and a knife to cut it ;)

shakefu · 2018-07-28T01:00:45Z

Is there any way to prevent this (perhaps a database configuration option) now on Influx 1.5.x or 1.6.x? We just had an issue with this that caused several hours of data loss.

Edit: The SHOW FIELD KEYS and SHOW TAG KEYS queries seem to indicate that there are now central metadata stores for the data points which could be used.

gunnaraasen · 2018-07-30T17:16:12Z

@shakefu InfluxDB will accept overlapping tag and field names for writes so this issue will not cause data loss. The tag and field name overlap only affects queries. You'll need to use the identifier::[tag|field] syntax described here to specifically select the tag or field you desire.

shakefu · 2018-08-01T21:55:11Z

@gunnaraasen Yes, I'm aware, but I want to prevent that - because if someone accidentally writes a field with the same name as a tag, it breaks all the queries that don't specify ::tag or ::field (also, Grafana, which we used extensively with InfluxDB, doesn't support ::tag or ::field identifiers with its InfluxDB plugin without manually editing the query.)

gunnaraasen · 2018-08-01T22:19:41Z

I recommend opening a feature request to add support for ::tag/::field in the Grafana query builder.

Preventing duplicate tag/field names would still require an unfeasible/inefficient check on every write even with the new index. We have no plans to add a setting to disallow these kinds of writes, but we are considering how this could be addressed with future API changes.

shakefu · 2018-08-03T22:51:04Z

@gunnaraasen I'm not aware of the internals of the new index - but if it's maintaining a list of the fields and tags, couldn't those be placed in a hash filter for an efficient check? Our issue goes beyond Grafana because we have tens of thousands of lines of code referencing field/tag names without qualifiers that we'd need to update.

Or perhaps a configurable whitelist/blacklist for fields or tags?

daliborfilus · 2022-03-29T00:27:23Z

I just stumbled upon this problem in one system I "inherited".
Some of our scripts insert tag keys and fields with the same key/name and I was in awe that it's even possible. (I always wondered why our queries returned "status,status_1" and the like.)

Anyway, I tried to fix it by removing the values. But with that fix, the queries weren't working correctly.
It was because that the tag values and field values existed with the same name, the inserts inserted now only to the tags (using line protocol), but the select query looks into fields first.

I'm sorry if it was already said somewhere, but... can I ask you why does it make more sense to filter by value first, when the same name exists as a tag? Searching via tags is faster, right?

select * from x where status = 'x';

takes the same time as

select * from x where status::field = 'x';

but

select * from x where status::tag = 'x';

returns instantly. That's how I discovered this behavior.

InfluxDB 1.8.9, x64, CentOS 7

beckettsean added this to the 0.9.0 milestone May 26, 2015

beckettsean added the area/queries label May 26, 2015

toddboom added the 1 - Ready label May 27, 2015

beckettsean mentioned this issue Jun 1, 2015

[feature request] make regex work on field keys and values #2715

Closed

beckettsean modified the milestones: 0.9.1, 0.9.0 Jun 16, 2015

beckettsean mentioned this issue Jun 24, 2015

command "select * from measurement" do not return all the columns #3045

Closed

toddboom modified the milestones: 0.9.1, 0.9.2 Jul 2, 2015

beckettsean modified the milestones: 0.9.4, 0.9.2 Jul 15, 2015

beckettsean mentioned this issue Sep 25, 2015

GROUP BY on tag does not work if there is a value with the same name. #4217

Closed

gunnaraasen mentioned this issue Oct 31, 2015

[0.9.4.2] A tag key that is the same as a field key cannot be queried directly #4630

Closed

beckettsean added the support label Nov 3, 2015

corylanou modified the milestones: 0.9.5, 0.9.4 Nov 16, 2015

corylanou added the blocking release label Nov 17, 2015

gunnaraasen mentioned this issue Nov 17, 2015

[feature request] Add InfluxQL syntax to explicitly declare an identifier as a tag or field key #4823

Closed

corylanou removed blocking release 1 - Ready labels Nov 18, 2015

pauldix closed this as completed Nov 18, 2015

jsternberg modified the milestones: 0.13.0, 0.9.5 Apr 14, 2016

gunnaraasen mentioned this issue Apr 18, 2016

Allow tags to be queried even when a field key exists with the same name #6410

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

should not allow field and tag with same key #2615

should not allow field and tag with same key #2615

beckettsean commented May 19, 2015

corylanou commented Nov 12, 2015

otoolep commented Nov 12, 2015

pauldix commented Nov 12, 2015

beckettsean commented Nov 12, 2015

otoolep commented Nov 12, 2015

corylanou commented Nov 17, 2015

pauldix commented Nov 18, 2015

beckettsean commented Nov 18, 2015

pauldix commented Nov 18, 2015

jsternberg commented Mar 16, 2016

corylanou commented Mar 16, 2016

pauldix commented Mar 16, 2016

shakefu commented Jul 28, 2018 •

edited

Loading

gunnaraasen commented Jul 30, 2018

shakefu commented Aug 1, 2018

gunnaraasen commented Aug 1, 2018

shakefu commented Aug 3, 2018

daliborfilus commented Mar 29, 2022

should not allow field and tag with same key #2615

should not allow field and tag with same key #2615

Comments

beckettsean commented May 19, 2015

corylanou commented Nov 12, 2015

otoolep commented Nov 12, 2015

pauldix commented Nov 12, 2015

beckettsean commented Nov 12, 2015

otoolep commented Nov 12, 2015

corylanou commented Nov 17, 2015

pauldix commented Nov 18, 2015

beckettsean commented Nov 18, 2015

pauldix commented Nov 18, 2015

jsternberg commented Mar 16, 2016

corylanou commented Mar 16, 2016

pauldix commented Mar 16, 2016

shakefu commented Jul 28, 2018 • edited Loading

gunnaraasen commented Jul 30, 2018

shakefu commented Aug 1, 2018

gunnaraasen commented Aug 1, 2018

shakefu commented Aug 3, 2018

daliborfilus commented Mar 29, 2022

shakefu commented Jul 28, 2018 •

edited

Loading