[Feature Request] Metadata of Tags #3445

Closed
KrishnaPG opened this issue Jul 23, 2015 · 9 comments

@KrishnaPG

The tags feature looks promising. However, as others indicated in this issue (#373), there are a few questions and concerns, and I would appreciate it if someone could address the ones below:

The requirement is: consider the cpu series where each host has additional info (apart from its traditional name) as below:

  • ipv4 address
  • ipv6 addresses
  • the data center address (city / county / country, and so on)

Now, it is not clear how to specify this additional info for the host tag.

  • Obviously this info needs to be set just once; we cannot (and perhaps will not) send it with every measurement.

Using all this dependent info as separate tags would not be the right way, since it is pure metadata about the tags.

We usually send only the hostname with every measurement (as shown in the example here), but need to retain the meta info about that hostname tag somewhere in the db so that it can be accessed (e.g. presented in the UI) when needed.

Right now, it looks like I have to use some other db (such as MySQL) to store this meta info about the tags, which is not really a good idea.

If a solution for this already exists within InfluxDB itself, please share it. Otherwise, it would be great if this situation were addressed.

@KrishnaPG
Author

Another scenario I forgot to mention above:

  • we store the measurements by, say, hostname, but want to be able to make queries such as "average response time by city / county / datacenter" etc.

where the city / county / datacenter is the metadata about the hostname tag.

@beckettsean
Contributor

> Now, it is not clear how to specify this additional info for the host tag.

You can apply multiple tags to a point, so insert the data in a pattern like:

cpu_series,host=hostname,ipv4=169.128.0.1,ipv6=0100::/64,city=london,country=uk load=12.2,temp=37.2 <timestamp>

> Using all this dependent info as separate tags would not be the right way, since it is pure metadata about the tags.

I don't understand why you think that's not correct. It's all metadata about the specific point being measured. Metadata goes into tags.
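To make the multi-tag pattern above concrete, here is a minimal Python sketch that assembles such a line from a dict of tags and a dict of fields. The function names (`to_line`, `_escape`) are made up for this illustration and are not part of any InfluxDB client. It assumes all field values are floats and escapes only commas, equals signs, and spaces in keys and tag values.

```python
def _escape(text: str, chars: str) -> str:
    """Backslash-escape each character of `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Build one line-protocol line (hypothetical helper; float fields only)."""
    # Tags are emitted sorted by key so the same tag set always yields the same string.
    tag_part = "".join(
        f",{_escape(k, ', =')}={_escape(str(v), ', =')}"
        for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(k, ', =')}={float(v)}" for k, v in fields.items()
    )
    line = f"{_escape(measurement, ', ')}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


line = to_line(
    "cpu_series",
    {
        "host": "hostname",
        "ipv4": "169.128.0.1",
        "ipv6": "0100::/64",
        "city": "london",
        "country": "uk",
    },
    {"load": 12.2, "temp": 37.2},
)
print(line)
```

Every tag is repeated on every line, which is exactly the overhead being debated in this thread.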

> We usually send only the hostname with every measurement (as shown in the example here), but need to retain the meta info about that hostname tag somewhere in the db so that it can be accessed (e.g. presented in the UI) when needed.

If you want the metadata to be available for a point, you must write it directly to the point. InfluxDB does not support metadata that is not applied directly to a point. As you point out, that is more of a RDBMS function, and InfluxDB is intentionally specialized for the time series data challenges. InfluxDB is not a replacement for all database functions, merely an optimization for time series data.

Submitting all metadata with every point isn't hard if you use Telegraf, which supports setting global tags, can easily run on each machine, and already reports many default devops metrics. Telegraf can also be extended with plugins to report metrics for most processes.
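The idea behind global tags can be sketched in a few lines of Python. This is not Telegraf's code or configuration, only an illustration of the concept: a fixed per-machine tag set is merged into every point before it is written. The names `GLOBAL_TAGS`, `merge_tags`, and `tag_points` are hypothetical, and letting per-point tags win on a key conflict is an assumption of this sketch.

```python
# Per-machine metadata, defined once (hypothetical values taken from the thread).
GLOBAL_TAGS = {
    "host": "host1",
    "ipv4": "169.128.0.1",
    "city": "london",
    "country": "uk",
}


def merge_tags(global_tags, point_tags):
    """Return global tags overlaid with the point's own tags (point tags win)."""
    merged = dict(global_tags)
    merged.update(point_tags)
    return merged


def tag_points(points, global_tags=GLOBAL_TAGS):
    """Attach the global tag set to every point without mutating the input."""
    return [
        {**p, "tags": merge_tags(global_tags, p.get("tags", {}))}
        for p in points
    ]


raw = [{"measurement": "cpu", "tags": {"cpu": "cpu0"}, "fields": {"load": 12.2}}]
tagged = tag_points(raw)
print(tagged[0]["tags"])
```

The sender only has to know the metric-specific tags; the machine-level metadata lives in one place on the sending side.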

@KrishnaPG changed the title from "Metadata vs Tags" to "Metadata of Tags" on Jul 24, 2015
@KrishnaPG
Author

Please do not close the issue. Here are the challenges with your approach:

  • Given that the primary interface of InfluxDB is a REST/HTTP API, I do not think always supplying all of a tag's meta-info is a wise idea. HTTP is already verbose enough for sensors without also sending host=host1,ipv4=169.128.0.1,ipv6=0100::/64,city=london,country=uk each time.
  • Even if it is, how would you address changes to the meta-info about tags?

For example, suppose host1 is moved from NY to CA. Are you suggesting one should then go back and change/touch all the time-series records? Conceptually that is wrong, since time-series data is supposed to be 'read-only'. (If you do not change the old records, then your query results will be wrong!)

  • And I do not think InfluxDB presently allows changing old tags either (which, BTW, is the right approach).

Tags are metadata about series. This issue is about "metadata of tags" (not "metadata of points"). The two cannot exist on the same plane.

The difference is that this metadata of tags can change independently of the underlying measurements / series.

I am not sure how Telegraf does this (is it transparently supplying all the parameters every time? And does it support changing the tags' meta-info?). If it has solved this problem, any docs/pointers in that direction would be of great help.

At one point or another, any time-series database such as InfluxDB will have to address this issue. Since the tags feature is still in its infancy, it would be great to consider this now rather than later.

Rewriting old records is the approach typical NoSQL databases take (including ES). It is not necessarily a valid concept for a time-series db, though.

@KrishnaPG changed the title from "Metadata of Tags" to "[Feature Request] Metadata of Tags" on Jul 24, 2015
@gunnaraasen
Contributor

FWIW, moving a sensor to a new location should create a new series tagged with the new location. A query like SELECT value FROM sensors WHERE host = 'host1' would return the combined points for both locations. In that case the best way to change tag metadata is to create a new series and query both series.
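A small in-memory Python model shows why this works: a new tag value creates a second series, while a filter on `host` alone still spans both. Nothing here talks to InfluxDB; `points`, `select`, and `series_keys` are hypothetical stand-ins for stored data and a `WHERE` clause.

```python
# host1 reports from NY first, then from CA after the move.
points = [
    {"measurement": "sensors", "tags": {"host": "host1", "city": "ny"}, "time": 1, "value": 0.5},
    {"measurement": "sensors", "tags": {"host": "host1", "city": "ny"}, "time": 2, "value": 0.7},
    {"measurement": "sensors", "tags": {"host": "host1", "city": "ca"}, "time": 3, "value": 0.6},
    {"measurement": "sensors", "tags": {"host": "host1", "city": "ca"}, "time": 4, "value": 0.9},
]


def series_keys(points):
    """A series is identified by its measurement plus its full tag set."""
    return sorted(
        {(p["measurement"], tuple(sorted(p["tags"].items()))) for p in points}
    )


def select(points, measurement, **where):
    """Return points of `measurement` whose tags match every key=value in `where`."""
    return [
        p
        for p in points
        if p["measurement"] == measurement
        and all(p["tags"].get(k) == v for k, v in where.items())
    ]


all_host1 = select(points, "sensors", host="host1")  # spans both series
only_ca = select(points, "sensors", host="host1", city="ca")
print(len(series_keys(points)), len(all_host1), len(only_ca))
```

The history is preserved under the old tag value, so grouping by `city` splits the host's data at the move.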

@KrishnaPG
Author

@gunnaraasen you are right if a sensor is being moved.

But if it is a machine that is being moved (a host going from one datacenter to another), you would not want to lose the history, nor rewrite the queries to include both locations (since the old location has no significance for the measurements).

@desa
Contributor

desa commented Jul 24, 2015

@KrishnaPG I've got an idea. Just send one point out of the batch that has extra fields as meta-data about your tags. For example:

insert meta,host=host1 value=1,host1_ca=1
insert meta,host=host1 value=10

Then when you query meta, host1_ca will be available under columns:

{  
   "results":[  
      {  
         "series":[  
            {  
               "name":"meta",
               "tags":{  
                  "host":"host1"
               },
               "columns":[  
                  "time",
                  "host1_ca",
                  "value"
               ],
               "values":[  
                  [  
                     "2015-07-24T02:16:14.795796762Z",
                     1,
                     1
                  ],
                  [  
                     "2015-07-24T02:20:34.131083576Z",
                     null,
                     10
                  ]
               ]
            }
         ]
      }
   ]
}

@desa
Contributor

desa commented Jul 24, 2015

You'd have to manage parsing the column name yourself and you can't update it, but it'll be there with every query and you won't have to pass the value along with each write.
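The client-side parsing could look roughly like the Python sketch below. It assumes the response shape shown earlier in this thread and the naming convention `<host>_<meta>` for the extra field; the function name `meta_flags` and the set of ignored columns are choices made for this example only.

```python
import json

# Abbreviated copy of the response posted earlier in this thread.
RESPONSE_JSON = """
{"results": [{"series": [{
    "name": "meta",
    "tags": {"host": "host1"},
    "columns": ["time", "host1_ca", "value"],
    "values": [
        ["2015-07-24T02:16:14.795796762Z", 1, 1],
        ["2015-07-24T02:20:34.131083576Z", null, 10]
    ]
}]}]}
"""


def meta_flags(response, ignore=("time", "value")):
    """Map each host to the metadata names encoded in its extra column names."""
    found = {}
    for result in response.get("results", []):
        for series in result.get("series", []):
            host = series.get("tags", {}).get("host")
            if host is None:
                continue
            prefix = host + "_"
            for idx, column in enumerate(series["columns"]):
                if column in ignore or not column.startswith(prefix):
                    continue
                # Only count the column if at least one row actually set it.
                if any(row[idx] is not None for row in series["values"]):
                    found.setdefault(host, []).append(column[len(prefix):])
    return {host: sorted(names) for host, names in found.items()}


flags = meta_flags(json.loads(RESPONSE_JSON))
print(flags)
```

The metadata rides along as a sparse field, so most rows carry `null` in that column and the client has to skip them.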

@KrishnaPG
Author

Thanks @mjdesa. Yes, that approach looks like an interesting option.

@beckettsean
Contributor

@KrishnaPG I can understand why you would want InfluxDB to manage all the metadata about all your samples, but the performance gains depend on not replicating every feature of a traditional RDBMS. Metadata about the metadata adds a whole new tabular structure and index to the database, as well as adding complexity and correctness checking during the write path. While those features might eventually make it into InfluxDB, it won't be for a long time.

Meanwhile, using a traditional RDBMS is the better way to solve this issue. The data from the two can be joined client-side to represent the full picture.
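As a rough illustration of such a client-side join, the Python sketch below keeps host metadata in an in-memory SQLite table (standing in for the RDBMS) and combines it with points as they might come back from a time-series query. The table name `hosts`, the `response_time` field, and the sample values are all invented for this example.

```python
import sqlite3
from collections import defaultdict


def load_host_metadata(conn):
    """Read the hostname -> city mapping from the relational side."""
    rows = conn.execute("SELECT hostname, city FROM hosts").fetchall()
    return {hostname: city for hostname, city in rows}


def average_by_city(points, host_to_city):
    """Group points by the city of their host and average response_time."""
    totals = defaultdict(lambda: [0.0, 0])  # city -> [sum, count]
    for p in points:
        city = host_to_city.get(p["host"], "unknown")
        totals[city][0] += p["response_time"]
        totals[city][1] += 1
    return {city: total / count for city, (total, count) in totals.items()}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (hostname TEXT PRIMARY KEY, city TEXT)")
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?)",
    [("host1", "london"), ("host2", "london"), ("host3", "paris")],
)

# Points tagged only with the hostname, as the issue author prefers.
points = [
    {"host": "host1", "response_time": 100.0},
    {"host": "host2", "response_time": 200.0},
    {"host": "host3", "response_time": 50.0},
    {"host": "host1", "response_time": 300.0},
]

before_move = average_by_city(points, load_host_metadata(conn))

# Moving host2 touches one metadata row; no time-series point is rewritten.
conn.execute("UPDATE hosts SET city = 'paris' WHERE hostname = 'host2'")
after_move = average_by_city(points, load_host_metadata(conn))
print(before_move, after_move)
```

Note the trade-off: after the update, all of host2's history is attributed to the new city, which is the behaviour the issue author asks for but differs from the new-series approach suggested earlier.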

> I am not sure how telegraf does this (is it transparently supplying all the parameters always? and does it support changing the tags metainfo?).

Telegraf sends all metadata with every point. The tags sent can be changed by altering the telegraf config file, but there is no historical modification of data. Telegraf is a wrapper for efficiently injecting points into InfluxDB. It does nothing but smooth the write path.

> And I do not think InfluxDB presently allows changing old tags either (which, BTW, is the right approach).

Correct. Old data can be overwritten, but this is assumed not to happen very often.

> Given that the primary interface of InfluxDB is a REST/HTTP API, I do not think always supplying all of a tag's meta-info is a wise idea.

It's not RESTful, but it is HTTP. A binary protocol will come with the 1.x family to improve performance, but not in 0.9. I'm not sure that adding another 30-50 bytes to each HTTP call will lead to significant issues. There's no change in the on-disk representation; it's just a bit more HTTP on the wire. The write performance of InfluxDB won't be significantly affected, and if you have sensors where sending an extra few bytes is painful, a general purpose TSDB optimized for ease of use might not be the right product. If you prefer, use UDP to eliminate the REQ and ACK traffic. It sounds like you really want a more tunable system, which InfluxDB will eventually become, but for now ease of use trumps edge case performance.
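For the UDP option, a sender can be as small as the Python sketch below: one datagram per line, no response to wait for. The function name `send_line_udp` is made up, and the default host and port are assumptions; use whatever address the UDP listener on your InfluxDB server is actually configured with.

```python
import socket


def send_line_udp(line: str, host: str = "127.0.0.1", port: int = 8089) -> int:
    """Send one line-protocol line as a single UDP datagram.

    Returns the number of bytes handed to the network stack. UDP is
    fire-and-forget: there is no acknowledgement, so a lost datagram
    goes unnoticed by the sender.
    """
    payload = line.encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


# Example call (host and port are placeholders for your own setup):
# send_line_udp("cpu_series,host=host1 load=12.2,temp=37.2")
```

The saving comes from dropping the HTTP request and response framing; the tag string itself is still sent with every point.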


4 participants