[0.9.5-nightly, cluster] Some aggregate functions work incorrectly in cluster mode #4816

mengjinglei · 2015-11-17T05:04:28Z

In cluster mode, when the requested data is not stored locally, the local node sends an RPC to fetch it from a remote node. In that case, a query with an aggregate function such as select max(value) from cpu where time > now() - 3h group by time(1h) returns all points with nil fields, which is incorrect. The issue also applies when the retention policy's replication factor is less than the number of nodes in the cluster.

Steps to reproduce

For convenience, assume three nodes: node1:8086, node2:8086, and node3:8086.

start node1: ./influxd
create database: curl -G http://node1:8086/query --data-urlencode "q=create database test"
insert one point: curl -i -XPOST 'http://node1:8086/write?db=test' --data-binary 'cpu value=1'
query node1: curl -G 'http://node1:8086/query?db=test&pretty=true' --data-urlencode "q=select max(value) from cpu where time > now() - 3h group by time(1h)"
response is:

{
    "results": [
        {
            "series": [
                {
                    "name": "cpu",
                    "columns": [
                        "time",
                        "max"
                    ],
                    "values": [
                        [
                            "2015-11-17T00:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T01:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T02:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T03:00:00Z",
                            1
                        ]
                    ]
                }
            ]
        }
    ]
}

start node2: ./influxd -join http://node1:8088
start node3: ./influxd -join http://node1:8088 http://node2:8088
query node2 with the same statement: curl -G 'http://node2:8086/query?db=test&pretty=true' --data-urlencode "q=select max(value) from cpu where time > now() - 3h group by time(1h)"
response is:

{
    "results": [
        {
            "series": [
                {
                    "name": "cpu",
                    "columns": [
                        "time",
                        "max"
                    ],
                    "values": [
                        [
                            "1970-01-01T00:00:00Z",
                            null
                        ]
                    ]
                }
            ]
        }
    ]
}

After applying PR #4815, the response is:

{
    "results": [
        {
            "series": [
                {
                    "name": "cpu",
                    "columns": [
                        "time",
                        "max"
                    ],
                    "values": [
                        [
                            "2015-11-17T00:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T01:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T02:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T03:00:00Z",
                            null
                        ]
                    ]
                }
            ]
        }
    ]
}

The point cpu value=1 is stored on node1. Querying node1 returns the correct data, but querying node2 with the same statement returns nil for every field in the result.

However, a query with the mean aggregate function returns the correct result:

{
    "results": [
        {
            "series": [
                {
                    "name": "cpu",
                    "columns": [
                        "time",
                        "mean"
                    ],
                    "values": [
                        [
                            "2015-11-17T00:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T01:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T02:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T03:00:00Z",
                            1
                        ]
                    ]
                }
            ]
        }
    ]
}

After investigation, I found that four functions work incorrectly: bottom, min, max, and top.
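The difference between the node1 and node2 responses above can be checked mechanically. The following is a minimal Go sketch, not part of InfluxDB: the `response` type and the `allNull` helper are hypothetical names, and the sketch assumes the first column of each row is `time`. It parses a `/query` response body and reports whether every aggregate value is null.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// response mirrors only the parts of the JSON shown in this issue.
type response struct {
	Results []struct {
		Series []struct {
			Name    string          `json:"name"`
			Columns []string        `json:"columns"`
			Values  [][]interface{} `json:"values"`
		} `json:"series"`
	} `json:"results"`
}

// allNull reports whether every non-time column in every row is null.
// Assumption: column 0 is "time". A body with no rows also yields true.
func allNull(body []byte) (bool, error) {
	var r response
	if err := json.Unmarshal(body, &r); err != nil {
		return false, err
	}
	for _, res := range r.Results {
		for _, s := range res.Series {
			for _, row := range s.Values {
				for i := 1; i < len(row); i++ {
					if row[i] != nil {
						return false, nil
					}
				}
			}
		}
	}
	return true, nil
}

func main() {
	// Abbreviated versions of the node1 and node2 responses from this issue.
	node1 := `{"results":[{"series":[{"name":"cpu","columns":["time","max"],
		"values":[["2015-11-17T02:00:00Z",null],["2015-11-17T03:00:00Z",1]]}]}]}`
	node2 := `{"results":[{"series":[{"name":"cpu","columns":["time","max"],
		"values":[["1970-01-01T00:00:00Z",null]]}]}]}`

	for _, body := range []string{node1, node2} {
		ok, err := allNull([]byte(body))
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Println(ok) // false for node1, true for node2
	}
}
```

Running the same check against each node after every step makes it easy to see which functions are affected.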

beckettsean · 2015-11-18T00:00:00Z

@mengjinglei in your example, you write the points to node 1 before nodes 2 and 3 are launched. If this is a new cluster, then node 1 had no concept of nodes 2 and 3 and therefore would not write data to them when they join the new cluster.

Can you confirm whether the cluster already existed? If not, this is expected behavior from the current clustering implementation. Prior points are not copied to new nodes in a cluster.

CrazyJvm · 2015-11-18T00:48:52Z

@beckettsean although node1 won't send data to node2 and node3, a query executed on node2 or node3 (such as the one @mengjinglei described) will pull data from node1. Moreover, functions like mean work fine. I think @mengjinglei is right: there is no max or min in https://github.com/influxdb/influxdb/blob/0.9.5/tsdb%2Ffunctions.go#L184-L245

beckettsean · 2015-11-18T01:06:05Z

@CrazyJvm that's a good point. The queries should be distributed to each node, and node 1 should be able to respond with the data it has.
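The expected behavior can be sketched as a merge over per-node partial results. This is a simplified illustration, not the InfluxDB implementation; `nodeMax` and `mergeMax` are hypothetical names. Each node reports its local max for a time bucket, or reports that it holds no points, and the coordinating node keeps the largest value it receives.

```go
package main

import "fmt"

// nodeMax is a hypothetical per-node result for one time bucket.
// ok is false when the node holds no points in that bucket.
type nodeMax struct {
	val float64
	ok  bool
}

// mergeMax combines per-node results. The second return value is false
// only when no node had any data for the bucket.
func mergeMax(parts []nodeMax) (float64, bool) {
	var best float64
	found := false
	for _, p := range parts {
		if !p.ok {
			continue
		}
		if !found || p.val > best {
			best, found = p.val, true
		}
	}
	return best, found
}

func main() {
	// node1 holds cpu value=1; node2 and node3 joined later and hold nothing.
	parts := []nodeMax{{val: 1, ok: true}, {}, {}}
	v, ok := mergeMax(parts)
	fmt.Println(v, ok) // 1 true
}
```

Under this model, a query sent to node2 should return the same max as a query sent to node1, because node1's partial result is included in the merge either way.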

beckettsean · 2015-11-18T01:16:47Z

@mjdesa can you try to repro? This might be an 0.9.5 blocker.

mengjinglei · 2015-11-18T01:40:41Z

PR #4817 fixes min and max.

fix issue #4816 some aggregate function work incorrectly in cluster mode

jsternberg · 2016-05-18T02:24:39Z

Clustering is no longer supported in the open source version. Thank you.

beckettsean added area/functions category/clustering labels Nov 17, 2015

beckettsean added the blocking release label Nov 18, 2015

otoolep added a commit that referenced this issue Nov 21, 2015

Merge pull request #4817 from mengjinglei/fix-MinMax

3ae624e

fix issue #4816 some aggregate function work incorrectly in cluster mode

jsternberg closed this as completed May 18, 2016
