-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
union(): Inconsistent results between runs #3053
Comments
I've extracted the data for that measure with the data explorer as csv, if that might help reproducing it. |
Thank you for reporting this bug. I'll be taking a look at it. This might be related to a bug in the planner. I thought it was fixed but I'll double check. It seems like this could happen if the transformation received the signals in different orders which is very plausible.
The table is supposed to get sent when the final finish is received regardless of the order, but it seems like it determines which parents have finished by the dataset id. If there's a conflict where both of the dataset ids are the same, then we might end up in a situation where the first finish marks the current transformation as finished. These dataset ids are intended to be uuids. I expect this to be the case. But, looking at the code, those dataset ids being in conflict is the easiest explanation so I'm going to check that assumption about the dataset ids. If this is the case, this may only happen in cloud when a pushdown happens. That may mean this won't occur when attempting to replicate it with |
I think the above is likely correct but I need to figure out why the plan node ids are the same since I thought we previously fixed that. This link shows where the UUIDs are generated: Line 183 in f8c6fa0
This link shows how those UUIDs are generated: Line 65 in f8c6fa0
Based on the documentation, UUID v5 doesn't seem to have any random component. Testing that: https://play.golang.org/p/24I88SIF5y5 While the plan node ids are supposed to be different, it seems a problem that we are using UUID v5. My proposal to fix this is:
@nathanielc @wolffcm does that sound like a good idea? I am not aware if there is a specific reason why UUID v5 was chosen or if it was just a misunderstanding of what that function did. |
@SGStino looking into this the commit for fixing this was introduced on November 2. If you have time, can you retest this? I suspect it will work. @nathanielc @wolffcm I still think we should consider using UUID v4 instead of v5, but I believe this bug shouldn't exist anymore. |
@jsternberg I think I’m missing some context on that issue. Is it that the two inputs to the union have the same dataset ID because they both had the same plan node ID? The plan node IDs should definitely be different, it could cause a lot of problems otherwise. That should be enforced reliably. That could be put somewhere around here where we validate the finished plan in I see that this PR was merged but it looks like it only fixes one instance of the problem: I don't really think changing to use a random UUID for the dataset ID is needed if uniqueness of plan nodes is enforced. I think it's nice that the mapping from plan node ID to dataset ID is deterministic, but I don't feel that strongly about it. |
I think we then need to enforce that plan nodes must be unique then. I'm concerned though that we'll introduce another piece of code that accidentally doesn't generate a unique plan node and then it will error with an opaque error message then since plan nodes aren't something that's exposed to the end user. But, at least it's better than potentially breaking behavior. |
We could keep a hash set of IDs in the |
I think we could do that. I'll dive into it and see what I can manage. |
I'm not sure I'm familiar enough with the code to identify where the proper place for that would be. It seems like the node names are kept with the plan node itself rather than as part of the plan spec and it is hidden behind an interface. The plan spec would be the one to know whether node names conflicted with each other, but it doesn't really seem like it has a mechanism to rename nodes. I'm going to close this issue and open a new one since the bug itself is fixed and we are just talking about potentially refactoring the code or adding an error message. |
Fixed and will be followed up on in #3438. |
Every time i execute the query with a fixed timeRange, i get a random result:
or
or
This is on the currently live version of Influx Cloud
Executing the individual from() statements results in one value every single time
The text was updated successfully, but these errors were encountered: