Cross-series regex queries are too strict #1039

Closed
ross-nordstrom opened this issue Oct 16, 2014 · 11 comments

Comments

@ross-nordstrom

My use case: I have many multi-column series, all named xxx_yyy_zzz, where xxx/yyy/zzz are references I use to index the series. The structure of these series may change over time as the metrics I store (columns) change or become obsolete. As a result, a given series may or may not have a given metric, e.g. MetricA.

What I want to do is find a summary of all of xxx's MetricA with a query like:

SELECT MEAN(MetricA) FROM /^xxx_.*?_.*?$/ WHERE time < t2 AND time > t1
==> Error! Field "MetricA" doesn't exist in series "xxx_y1y1y1_z1z1z1"

I think it would be reasonable to expect the query to simply return empty data for the series that don't have that field/column. An alternative, reasonable, behavior would be to "skip" those series and just return data for the series which have the desired field/column.

A workaround would be for me to change my data model and split it to be many more single-column series, each having the metric it's tracking in the series name (e.g. xxx_yyy_zzz_MetricA). This seems like a bit of a kludge to me though.
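
A rough sketch of that workaround in Python, using a plain in-memory dict as a stand-in for the stored series (nothing here is InfluxDB's API; `split_by_metric`, `mean_for_metric`, and the sample names are hypothetical). It flattens each (series, column) pair into its own single-column series and then matches on the metric suffix:

```python
import re

# Hypothetical in-memory stand-in for multi-column series; not InfluxDB's API.
multi_column = {
    "xxx_y1_z1": {"MetricA": [1.0, 3.0], "MetricB": [7.0]},
    "xxx_y2_z2": {"MetricB": [9.0]},  # this series never recorded MetricA
}

def split_by_metric(series):
    """Flatten each (series, column) pair into its own single-column series."""
    flat = {}
    for name, columns in series.items():
        for metric, values in columns.items():
            flat[f"{name}_{metric}"] = list(values)
    return flat

def mean_for_metric(flat, prefix, metric):
    """Per-series mean for every series named <prefix>_<...>_<...>_<metric>."""
    pattern = re.compile(rf"^{re.escape(prefix)}_.*?_.*?_{re.escape(metric)}$")
    return {
        name: sum(values) / len(values)
        for name, values in flat.items()
        if pattern.match(name) and values
    }

flat = split_by_metric(multi_column)
print(sorted(flat))
print(mean_for_metric(flat, "xxx", "MetricA"))
```

Because a series that never recorded MetricA simply has no `..._MetricA` entry, the regex never reaches it, which is why this layout sidesteps the error at the cost of many more series.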

Thoughts? Does this feature exist but I just didn't see it in the documentation? Is it on the roadmap? Is there a deeper concept of "metadata" in Influx beyond just the series name?

@jvshahid
Contributor

The way this query is handled is by iterating over all series names that match the regex. We can add a clause to the query language, e.g. select mean(foo) from /.*/ SKIP ERRORS... to tell InfluxDB to skip series level errors while executing the query.
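
To make the idea concrete, here is a minimal Python model of that loop. It is only a sketch under assumptions: the series live in a dict, and `FieldNotFound`, `mean_of`, `query_mean`, and the `skip_errors` flag are hypothetical names standing in for the proposed clause, not anything InfluxDB actually exposes.

```python
import re

class FieldNotFound(Exception):
    """Raised when a series has no such column (mirrors the error in this issue)."""

def mean_of(points, field):
    values = [p[field] for p in points if field in p]
    if not values:
        raise FieldNotFound(field)
    return sum(values) / len(values)

def query_mean(series, pattern, field, skip_errors=False):
    """Visit every series whose name matches `pattern`, one at a time.

    Without skip_errors the first failing series aborts the whole query;
    with it, the failure is recorded and the loop moves on.
    """
    results, skipped = {}, []
    for name in sorted(series):
        if not re.search(pattern, name):
            continue
        try:
            results[name] = mean_of(series[name], field)
        except FieldNotFound:
            if not skip_errors:
                raise
            skipped.append(name)  # could be logged instead of returned
    return results, skipped

series = {
    "xxx_y1_z1": [{"MetricB": 5.0}],                    # no MetricA column
    "xxx_y2_z2": [{"MetricA": 2.0}, {"MetricA": 4.0}],
    "other_y_z": [{"MetricA": 100.0}],                  # does not match the regex
}
print(query_mean(series, r"^xxx_.*?_.*?$", "MetricA", skip_errors=True))
```

Returning the skipped names alongside the results is one possible design; writing them to a log instead would keep the result shape unchanged.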

@codyhanson

If a series was skipped, would it not appear in the output at all? Or would there be some way to tell which series gave errors?

@jvshahid
Contributor

I'm guessing it wouldn't appear at all; maybe the fact that they were skipped would get logged.

@ross-nordstrom
Author

An example of what @codyhanson might be getting at.

Normal Query

SELECT * FROM /^demoSkip.*/

demoSkip_error

time  sequence_number  value
1     2598921390001    10

demoSkip_success

time  sequence_number  value
10    2598921410001    20
1     2598921400001    10

Query with Skip/Suppress Error

SELECT * FROM /^demoSkip.*/ WHERE time > 1 SKIP/SUPPRESS ERROR

demoSkip_error

time  error
null  No data for query

demoSkip_success

time  sequence_number  value
10    2598921410001    20

@jvshahid
Contributor

What's the use case for that kind of output?

@ross-nordstrom
Author

I want to get data for many series at a time, but the data may be sparse.

An example would be if I am recording response times for my servers by node, but the nodes may change or die at any given time, or simply not record any data during low-capacity periods of time.

In this example case, I might use a series_name syntax of app1_nodeUuid_responseTime and then query for the past hour across EVERYTHING that ever recorded responseTimes for app1, but I don't want the query to fail if some of the nodes haven't been active recently.

# Pretend [[1-hour-ago]] is a valid timestamp
SELECT * FROM /^app1_.*?_responseTime$/ WHERE time > [[1-hour-ago]]

Fundamentally, you have this awesome feature in the ability to query across series using a Regex, but it's totally hamstrung by the fact that the entire query fails if a single series isn't as expected. How depressing is it if you are querying across 100 series and the whole thing barfs because 1 of those didn't have data at the time you care about?

Edit:

Regarding having an Error message, that would be a way for me to know (in the response time example) whether I'm not getting a response for a given series because:

  1. There's no data for that WHERE clause
  2. The desired column doesn't exist in that series (Imagine if I stored responseTime as a column in a multi-column series instead of as a generic value column in a single-column series)
  3. Other error cases I'm not thinking of?
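
As a rough illustration of telling the first two cases apart, here is a small Python sketch over in-memory points (the `diagnose` helper and its return strings are hypothetical, and the sample data reuses the demoSkip series from my earlier comment):

```python
def diagnose(points, column, t1, t2):
    """Explain why one series contributes nothing to a query.

    points: list of dicts, each with a "time" key plus data columns.
    Returns "ok", "no_data_in_range", or "missing_column".
    """
    if not any(column in p for p in points):
        return "missing_column"      # case 2: the column was never written
    in_range = [p for p in points if t1 < p["time"] < t2 and column in p]
    if not in_range:
        return "no_data_in_range"    # case 1: the WHERE clause filters everything
    return "ok"

series = {
    "demoSkip_error": [{"time": 1, "value": 10}],
    "demoSkip_success": [{"time": 10, "value": 20}, {"time": 1, "value": 10}],
}

# Equivalent of WHERE time > 1, with no upper bound.
for name, points in series.items():
    print(name, diagnose(points, "value", 1, float("inf")))
```

A per-series status like this is the kind of information an error row in the output could carry.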

@leesjensen

I hit this same problem. It seems unexpected that no error is returned if a series doesn't exist in the queried timeframe, but it returns an error if a metric doesn't exist. I need to be able to add and remove columns as my system matures. It doesn't seem like an "error" that needs "suppressing". Just return none/null for the value just like is done when the series doesn't exist for the entire time frame.

I guess I could go to a single column per series as the OP proposed, but then I really begin to question what is the point of columns?

@leesjensen

This seems to be worse than I previously thought. It is not using the timeframe when checking if the column exists.

Here is what I have observed:

  1. All queried series must have at least one entry for an explicitly queried column or you get an error.
  2. It doesn't matter if you are querying a timeframe where only series with the column will return results. If any series doesn't have the column you get an error.
  3. If you use a * instead of an explicit column then it will return none values for the missing columns.

Because * works properly it seems like this is just a bug. If it can handle it when all columns are selected then it should be able to handle it when a specific column is selected.
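
Given observation 3, one possible client-side stopgap is to query with * and pick the column out afterwards. The Python sketch below assumes each series in the response arrives as a dict with "name", "columns", and "points" keys; that shape, the `pick_column` helper, and the sample series are assumptions for illustration, so adjust to whatever your client actually returns.

```python
def pick_column(star_results, column):
    """From a SELECT * style result, pull one column per series.

    Series lacking the column yield None per row instead of an error.
    """
    picked = {}
    for series in star_results:
        cols = series["columns"]
        idx = cols.index(column) if column in cols else None
        picked[series["name"]] = [
            row[idx] if idx is not None else None for row in series["points"]
        ]
    return picked

# Assumed result shape; the values are made up for the example.
star_results = [
    {"name": "app1_node1", "columns": ["time", "responseTime"],
     "points": [[10, 120], [1, 95]]},
    {"name": "app1_node2", "columns": ["time", "cpu"],
     "points": [[10, 0.4]]},
]
print(pick_column(star_results, "responseTime"))
```

This pulls every column over the wire, so it is only reasonable for small result sets.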

@inthecloud247

+1 agree with @leesjensen on this.

I'm definitely finding this error behavior to be unexpected and frustrating.

edit: I like the suggestion "Just return none/null for the value just like is done when the series doesn't exist for the entire time frame."

@inthecloud247

Hm I wonder if this is now fixed in the current 0.9.0 rc release:

#1813

@languitar

I also desperately need a solution for this. A constantly evolving system changes the fields contained in its series. Some way around this is very much needed for regex queries.
