
[0.9.5-nightly, cluster] Some aggregate functions work incorrectly in cluster mode #4816

Closed
mengjinglei opened this issue Nov 17, 2015 · 6 comments


@mengjinglei
Contributor

In cluster mode, when the requested data is not stored locally, the local node sends an RPC to fetch it from a remote node. However, a query with an aggregate function such as select max(value) from cpu where time > now() - 3h group by time(1h) returns all points with nil fields, which is incorrect. The issue also occurs when the retention policy's replication factor is less than the number of nodes in the cluster.

Steps to reproduce

For convenience, assume we have three nodes: node1:8086, node2:8086 and node3:8086.

  1. start node1: ./influxd
  2. create database: curl -G http://node1:8086/query --data-urlencode "q=create database test"
  3. insert one point: curl -i -XPOST 'http://node1:8086/write?db=test' --data-binary 'cpu value=1'
  4. query node1: curl -G 'http://node1:8086/query?db=test&pretty=true' --data-urlencode "q=select max(value) from cpu where time > now() - 3h group by time(1h)"
    response is:
{
    "results": [
        {
            "series": [
                {
                    "name": "cpu",
                    "columns": [
                        "time",
                        "max"
                    ],
                    "values": [
                        [
                            "2015-11-17T00:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T01:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T02:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T03:00:00Z",
                            1
                        ]
                    ]
                }
            ]
        }
    ]
}
  5. start node2: ./influxd -join http://node1:8088
  6. start node3: ./influxd -join http://node1:8088 http://node2:8088
  7. query node2 with the same query: curl -G 'http://node2:8086/query?db=test&pretty=true' --data-urlencode "q=select max(value) from cpu where time > now() - 3h group by time(1h)"
    response is:
{
    "results": [
        {
            "series": [
                {
                    "name": "cpu",
                    "columns": [
                        "time",
                        "max"
                    ],
                    "values": [
                        [
                            "1970-01-01T00:00:00Z",
                            null
                        ]
                    ]
                }
            ]
        }
    ]
}

After applying PR #4815, the response is:

{
    "results": [
        {
            "series": [
                {
                    "name": "cpu",
                    "columns": [
                        "time",
                        "max"
                    ],
                    "values": [
                        [
                            "2015-11-17T00:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T01:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T02:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T03:00:00Z",
                            null
                        ]
                    ]
                }
            ]
        }
    ]
}

The point cpu value=1 is stored on node1. Querying node1 returns the correct data, but running the same query against node2 returns nil for every value.

However, the same query using the mean aggregate function returns the correct result:

{
    "results": [
        {
            "series": [
                {
                    "name": "cpu",
                    "columns": [
                        "time",
                        "mean"
                    ],
                    "values": [
                        [
                            "2015-11-17T00:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T01:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T02:00:00Z",
                            null
                        ],
                        [
                            "2015-11-17T03:00:00Z",
                            1
                        ]
                    ]
                }
            ]
        }
    ]
}

After investigating, I found that four functions work incorrectly: bottom, min, max and top.

@beckettsean
Contributor

@mengjinglei in your example, you write the points to node 1 before nodes 2 and 3 are launched. If this is a new cluster, then node 1 had no concept of nodes 2 and 3 and therefore would not write data to them when they join the new cluster.

Can you confirm whether the cluster already existed? If not, this is expected behavior from the current clustering implementation. Prior points are not copied to new nodes in a cluster.

@CrazyJvm
Contributor

@beckettsean Although node1 won't send data to node2 and node3, a query executed on node2 or node3 (such as the one @mengjinglei described) will pull data from node1. Moreover, functions like mean work correctly. I think @mengjinglei is right: max and min are missing from https://github.com/influxdb/influxdb/blob/0.9.5/tsdb%2Ffunctions.go#L184-L245

@beckettsean
Contributor

@CrazyJvm that's a good point. The queries should be distributed to each node, and node 1 should be able to respond with the data it has.

@beckettsean
Contributor

@mjdesa can you try to repro? This might be an 0.9.5 blocker.

@mengjinglei
Contributor Author

PR #4817 fixes min and max.
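A minimal Go sketch of the general shape such a fix could take, assuming remote mapper output travels as JSON and is decoded per aggregate function: give min and max a dedicated decoder so the remote value comes back as the concrete type the reducer expects. This is not the actual change in PR #4817; minMaxOutput, unmarshalFunc and unmarshallerFor are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// minMaxOutput is a HYPOTHETICAL stand-in for a min()/max() mapper's output.
type minMaxOutput struct {
	Time int64   `json:"time"`
	Val  float64 `json:"val"`
}

// unmarshalFunc decodes one remote mapper result (HYPOTHETICAL signature).
type unmarshalFunc func([]byte) (interface{}, error)

// unmarshallerFor returns a decoder for the named aggregate. min and max get
// a typed decoder; every other name falls back to generic JSON decoding.
func unmarshallerFor(name string) unmarshalFunc {
	switch name {
	case "min", "max":
		return func(b []byte) (interface{}, error) {
			var out *minMaxOutput
			if err := json.Unmarshal(b, &out); err != nil {
				return nil, err
			}
			if out == nil {
				// JSON null: return an untyped nil so callers can test v == nil.
				return nil, nil
			}
			return out, nil
		}
	default:
		return func(b []byte) (interface{}, error) {
			var v interface{}
			err := json.Unmarshal(b, &v)
			return v, err
		}
	}
}

func main() {
	b, _ := json.Marshal(&minMaxOutput{Time: 0, Val: 1})

	typed, _ := unmarshallerFor("max")(b)
	generic, _ := unmarshallerFor("some_other_fn")(b)

	fmt.Printf("max decodes to %T\n", typed)
	fmt.Printf("fallback decodes to %T\n", generic)
}
```

This prints "max decodes to *main.minMaxOutput" and "fallback decodes to map[string]interface {}": with a typed decoder in place, a reducer that type-asserts on the struct can use values fetched from other nodes.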

otoolep added a commit that referenced this issue Nov 21, 2015
fix issue #4816 some aggregate function work incorrectly in cluster mode
@jsternberg
Contributor

Clustering is no longer supported in the open source version. Thank you.
