[0.9.0-rc28] Issue with number values being inserted as strings. Select 'mean(value)' crashes the node. #2346

Closed
svscorp opened this issue Apr 20, 2015 · 12 comments


svscorp commented Apr 20, 2015

Hi,

I am facing a weird issue on a 3-server cluster setup (replication factor = 2), which might be related to #2272.

I have two measurements: memory and network_in (I have more, but let's just pick those two).

Query to the cluster:

curl -G http://influxdb:8086/query?pretty=true --data-urlencode "q=select mean(value) from memory where platform='p01'" --data-urlencode "db=stats"

answer:

{
    "results": [
        {
            "series": [
                {
                    "name": "memory",
                    "columns": [
                        "time",
                        "mean"
                    ],
                    "values": [
                        [
                            "1970-01-01T00:00:00Z",
                            546.1333333333333
                        ]
                    ]
                }
            ]
        }
    ]
}

Then I run the same query against network_in:

curl -G http://influxdb:8086/query?pretty=true --data-urlencode "q=select mean(value) from network_in where platform='p01'" --data-urlencode "db=stats"

The result, along with a crashed node, is:

<html><body><h1>502 Bad Gateway</h1>
The server returned an invalid or incomplete response.
</body></html>

Is there something I am missing?

Author

svscorp commented Apr 20, 2015

Update

Sometimes it returns this error (instead of Bad Gateway):

{
    "results": [
        {
            "error": "Post http://dbserver02:8086/data/run_mapper: EOF"
        }
    ]
}

In case you are wondering what the values of the 'network_in' measurement are, the same query without 'mean' gives:

{
    "results": [
        {
            "series": [
                {
                    "name": "network_in",
                    "columns": [
                        "time",
                        "value"
                    ],
                    "values": [
                        [
                            "2015-04-20T14:27:40Z",
                            "26517893"
                        ],
                        [
                            "2015-04-20T14:27:40Z",
                            "10174"
                        ],
                        [
                            "2015-04-20T14:27:40Z",
                            "76"
                        ],
                        [
                            "2015-04-20T14:27:40Z",
                            "10312"
                        ],
                        [
                            "2015-04-20T14:32:27Z",
                            "265208164"
                        ],
                        [
                            "2015-04-20T14:32:27Z",
                            "10378"
                        ],
                        [
                            "2015-04-20T14:32:27Z",
                            "228"
                        ],
                        [
                            "2015-04-20T14:32:27Z",
                            "10228"
                        ]
                    ]
                }
            ]
        }
    ]
}

Does it have something to do with the number type? (the assumption sounds wrong for that kind of "storage", but who knows)
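
One way to check that question is to look at the JSON types in the response itself: in the output above every `value` is quoted, so it decodes as a string rather than a number. Below is a minimal sketch, assuming the response shape shown above; the helper name `string_valued_columns` is hypothetical and not part of InfluxDB or any client library.

```python
import json


def string_valued_columns(response_text):
    """Return {series name: sorted column names} for columns holding strings.

    Assumes the /query response shape shown in this thread:
    results -> series -> (name, columns, values). The "time" column is
    skipped because timestamps are expected to be strings.
    """
    response = json.loads(response_text)
    found = {}
    for result in response.get("results", []):
        for series in result.get("series", []):
            columns = series["columns"]
            for row in series["values"]:
                for column, value in zip(columns, row):
                    if column != "time" and isinstance(value, str):
                        found.setdefault(series["name"], set()).add(column)
    return {name: sorted(cols) for name, cols in found.items()}
```

Feeding it the network_in response flags the `value` column, while the memory response (with an unquoted mean) flags nothing.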

Author

svscorp commented Apr 20, 2015

Update 2:

I tried storing some DISK values, and selecting them behaves the same way. Since those are also big numbers, I think it is related.

svscorp changed the title from "Select 'mean' query for a one series works, for another - crashes the node" to "[0.9.0-rc26] Issue with big values stored. Select 'mean(value)' query for a one series works, for another - crashes the node" on Apr 22, 2015

Author

svscorp commented Apr 22, 2015

Update 3: 0.9.0-RC26, issue is still there

@neonstalwart
Contributor

i'm guessing that calculating the mean is causing an overflow when calculating the sum of large numbers.

calculating the sum can be avoided as long as we have the current average and a count for each of the 2 means we are trying to merge.

average = average1 * (count1 / (count1 + count2)) + average2 * (count2 / (count1 + count2))

since count1 / (count1 + count2) and count2 / (count1 + count2) are each less than 1, the mean of 2 means should be able to be calculated without using numbers that are larger than the largest number in the 2 sample sets.

it's probably simple enough to update the mean calculations to use this method and avoid overflow. MapMean should return a count and an average using:

average += (value - average) / ++count

and then ReduceMean should merge these together using the first equation above.
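
The two formulas above can be sketched as follows. This is only an illustration of the idea, not InfluxDB's actual `MapMean`/`ReduceMean` code; the Python function names and the `(count, average)` tuple shape are assumptions made for the example.

```python
def map_mean(values):
    """Return (count, average) using the incremental update.

    No running sum is kept, so no intermediate value is built up
    by adding all samples together.
    """
    count = 0
    average = 0.0
    for value in values:
        count += 1
        average += (value - average) / count
    return count, average


def reduce_mean(partials):
    """Merge (count, average) pairs using the weighted-average formula."""
    total_count = 0
    average = 0.0
    for count, partial_average in partials:
        if count == 0:
            continue  # an empty partial contributes nothing
        merged = total_count + count
        average = (average * (total_count / merged)
                   + partial_average * (count / merged))
        total_count = merged
    return total_count, average
```

For example, `reduce_mean([map_mean(a), map_mean(b)])` gives the count and mean of `a + b` without ever summing the two sets directly.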

@neonstalwart
Contributor

@svscorp i can't reproduce your issue. do you have some curl commands to reproduce this from a clean db?

meanwhile, i may still go ahead and change the way mean is calculated but it doesn't seem to be the cause of your problem from what i can tell.

Author

svscorp commented Apr 22, 2015

I'll provide them once I'm at my laptop. It happens when I add a graph in Grafana. You can change it by editing the query, but that's the default query if you use the UI to add filters for a graph.

Author

svscorp commented Apr 23, 2015

@neonstalwart

I was trying to reproduce it from scratch and couldn't either. It is only reproducible when the data is being inserted by a scheduled script.

I'm now busy logging and putting together a valid case. Will get back once done.

Thanks for the reaction, though!

Author

svscorp commented Apr 23, 2015

@neonstalwart I got it again. To not overload this thread I've put it here: CURL sequence with queries

@neonstalwart
Contributor

ok, i should have seen this sooner but it's a very subtle issue. your values are strings - e.g. "10228", "0.083334". you need to make them numbers - i.e. 10228, 0.083334.

once i had your sample data, i could see the error for myself

panic: interface conversion: interface is string, not float64

goroutine 119258 [running]:
github.com/influxdb/influxdb/influxql.MapMean(0x7fdcf623c688, 0xc2082f3760, 0x0, 0x0)
        /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/influxql/functions.go:214 +0xbf
github.com/influxdb/influxdb.(*LocalMapper).NextInterval(0xc2082f3760, 0x0, 0x0, 0x0, 0x0)
        /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/tx.go:401 +0xd3
github.com/influxdb/influxdb/influxql.(*MapReduceJob).processAggregate(0xc2081c65b0, 0xc2085ec660, 0xac2db8, 0xc2083eeee0, 0x1, 0x1, 0x0, 0x0)
        /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/influxql/engine.go:626 +0x264
github.com/influxdb/influxdb/influxql.(*MapReduceJob).Execute(0xc2081c65b0, 0xc2085a1d40, 0xc208270200)
        /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/influxql/engine.go:167 +0x8bf
github.com/influxdb/influxdb/influxql.(*Executor).execute(0xc208355340, 0xc2085a1d40)
        /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/influxql/engine.go:774 +0xba
created by github.com/influxdb/influxdb/influxql.(*Executor).Execute
        /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/influxql/engine.go:753 +0x5a

this made me realize that your values were strings. the fix is easy - make them numbers. https://gist.github.com/neonstalwart/00d106d8de7c9a960696/5e0a181eb83a649bbb34965dbf6e49347e2a7e29 is a reduced example that demonstrates the problem and https://gist.github.com/neonstalwart/00d106d8de7c9a960696 is the same example with the values changed to numbers and it works.
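
On the client side, the fix amounts to converting numeric strings to real numbers before the point is serialized. Here is a minimal sketch of that step; the helper names are hypothetical, and it only covers the field values, not the rest of the 0.9.0-rc write payload.

```python
import json


def to_number(raw):
    """Convert a numeric string such as "10228" or "0.083334" to int or float.

    Values that are already int or float are returned unchanged.
    Raises ValueError for strings that are not numeric.
    """
    if isinstance(raw, bool):
        raise ValueError("booleans are not treated as numbers here")
    if isinstance(raw, (int, float)):
        return raw
    text = raw.strip()
    try:
        return int(text)
    except ValueError:
        return float(text)  # raises ValueError if text is not numeric


def coerce_fields(fields):
    """Return a copy of a field dict with every value converted to a number."""
    return {name: to_number(value) for name, value in fields.items()}


def encode_fields(fields):
    """Serialize the coerced fields; numbers come out unquoted in the JSON."""
    return json.dumps(coerce_fields(fields))
```

With this, `encode_fields({"value": "10228"})` produces `{"value": 10228}`, so the value reaches the server as a JSON number instead of a string.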

Contributor

otoolep commented Apr 23, 2015

This is a known issue, which we will address before the 0.9.0 release.

#2299

svscorp changed the title from "[0.9.0-rc26] Issue with big values stored. Select 'mean(value)' query for a one series works, for another - crashes the node" to "[0.9.0-rc28] Issue with big values stored. Select 'mean(value)' query for a one series works, for another - crashes the node" on Apr 29, 2015
svscorp changed the title from "[0.9.0-rc28] Issue with big values stored. Select 'mean(value)' query for a one series works, for another - crashes the node" to "[0.9.0-rc28] Issue with number values being inserted as strings. Select 'mean(value)' crashes the node." on Apr 29, 2015

Author

svscorp commented Apr 29, 2015

It works when sending values as floats

svscorp closed this as completed Apr 29, 2015

@abhiofdoon

I see InfluxDB 0.9 crash on a simple select query as well.

curl -G 'http://localhost:8086/query' --data-urlencode "db=graphite" --data-urlencode "q=select * from "xxx-com.cloud.gauges.raw_storage_usage"" | python -m json.tool

The data in influxdb looks like this:
$ curl -G 'http://localhost:8086/query' --data-urlencode "db=graphite" --data-urlencode "q=select * from "xxx-com.cloud.gauges.raw_storage_usage" where time > now() - 5m"
{"results":[{"series":[{"name":"xxx-com.cloud.gauges.raw_storage_usage","columns":["time","value"],"values":[["2015-07-29T21:06:07Z",152506]]}]}]}

Here are the influxdb logs.

[http] 2015/07/29 20:59:10 127.0.0.1 - - [29/Jul/2015:20:59:10 +0000] GET /query?q=select+value+from+%22xxx-com.cloud.gauges.num_sessions%22+where+time+%3E+1435611550s+and+time+%3C+1438203551s&p=root&u=root&db=graphite HTTP/1.1 200 3198 - python-requests/2.2.1 CPython/2.7.6 Linux/3.16.0-38-generic a930fa47-3634-11e5-801c-000000000000 7.753755ms
[http] 2015/07/29 20:59:12 127.0.0.1 - - [29/Jul/2015:20:59:12 +0000] GET /query?q=show+series+from+%2F%5Exxx-com%5C.cloud%5C.gauges%5C.num_sessions%2F&p=root&u=root&db=graphite HTTP/1.1 200 122 - python-requests/2.2.1 CPython/2.7.6 Linux/3.16.0-38-generic aa5ba27d-3634-11e5-801d-000000000000 8.497784ms
[http] 2015/07/29 20:59:12 127.0.0.1 - - [29/Jul/2015:20:59:12 +0000] GET /query?q=show+series+from+%2F%5Exxx-com%5C.cloud%5C.gauges%5C.num_sessions%2F&p=root&u=root&db=graphite HTTP/1.1 200 122 - python-requests/2.2.1 CPython/2.7.6 Linux/3.16.0-38-generic aa5d9916-3634-11e5-801e-000000000000 13.248976ms
[http] 2015/07/29 20:59:12 127.0.0.1 - - [29/Jul/2015:20:59:12 +0000] GET /query?q=select+value+from+%22xxx-com.cloud.gauges.num_sessions%22+where+time+%3E+1435611552s+and+time+%3C+1438203553s&p=root&u=root&db=graphite HTTP/1.1 200 3198 - python-requests/2.2.1 CPython/2.7.6 Linux/3.16.0-38-generic aa603c90-3634-11e5-801f-000000000000 14.286768ms
panic: runtime error: slice bounds out of range

goroutine 3633 [running]:
github.com/influxdb/influxdb/tsdb.scanTagValue(0xc2086c0a00, 0x80, 0x80, 0x81, 0x103d, 0x80, 0xc2086c0a5d, 0x23)
/root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/tsdb/points.go:436 +0x74
github.com/influxdb/influxdb/tsdb.(*point).Tags(0xc2086c0a80, 0xc2086c0a00)
/root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/tsdb/points.go:559 +0x1bc
