
union(): Inconsistent results between runs #3053

Closed
SGStino opened this issue Jul 30, 2020 · 10 comments
Labels: kind/bug (Something isn't working), team/query

@SGStino commented Jul 30, 2020

measure = "Tags.PLC_252.TANK1.STAPNR"
windowPeriod = 1s

// Last point before the start of the dashboard's time range.
previous = from(bucket: "Bucket")
  |> range(start: 0, stop: v.timeRangeStart)
  |> filter(fn: (r) => r["_measurement"] == measure)
  |> filter(fn: (r) => r["_field"] == "value")
  |> last()

// First point after the end of the dashboard's time range.
next = from(bucket: "Bucket")
  |> range(start: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == measure)
  |> filter(fn: (r) => r["_field"] == "value")
  |> first()

// Combine both single-row streams; the result should always contain both points.
union(tables: [previous, next])
  |> yield()

Every time I execute the query with a fixed time range, I get one of several different results:

#group,false,false,true,true,false,false,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,long,string,string
#default,_result,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement
,,0,1970-01-01T00:00:00Z,2020-07-29T04:53:55Z,2020-07-29T04:52:52.044Z,10,value,Tags.PLC_252.TANK1.STAPNR

or

#group,false,false,true,true,false,false,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,long,string,string
#default,_result,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement
,,0,1970-01-01T00:00:00Z,2020-07-29T04:53:55Z,2020-07-29T04:52:52.044Z,10,value,Tags.PLC_252.TANK1.STAPNR
,,1,2020-07-30T10:53:55.983Z,2020-07-30T11:04:28.372395611Z,2020-07-30T10:54:00.779Z,31,value,Tags.PLC_252.TANK1.STAPNR

or

#group,false,false,true,true,false,false,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,long,string,string
#default,_result,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement
,,0,2020-07-30T10:53:55.983Z,2020-07-30T11:05:45.208092895Z,2020-07-30T10:54:00.779Z,31,value,Tags.PLC_252.TANK1.STAPNR

This is on the currently live version of Influx Cloud.

Executing the individual from() queries on their own returns exactly one value every single time.

@SGStino SGStino changed the title Inconsistent results between runs union(): Inconsistent results between runs Jul 30, 2020
@SGStino (Author) commented Aug 1, 2020

I've extracted the data for that measure as CSV with the data explorer, in case it helps with reproducing the issue.
2020-08-01-19-25_chronograf_data.zip

@wolffcm wolffcm added team/query kind/bug Something isn't working labels Jan 8, 2021
@nathanielc nathanielc added this to the Sprint 21-Q1-1 milestone Jan 11, 2021
@jsternberg jsternberg self-assigned this Jan 13, 2021
@jsternberg (Contributor) commented

Thank you for reporting this bug. I'll be taking a look at it.

This might be related to a bug in the planner. I thought it was fixed, but I'll double-check. It seems like this could happen if the transformation received the signals in different orders, which is very plausible:

Process -> Process -> Finish -> Finish
Process -> Finish -> Process -> Finish

The table is supposed to be sent when the final finish signal is received, regardless of the order, but it seems like the transformation determines which parents have finished by their dataset IDs. If there's a conflict where both dataset IDs are the same, then we can end up in a situation where the first finish signal marks the current transformation as finished.
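
To make that concrete, here is a minimal sketch of that failure mode in Go, with invented names rather than the actual flux execute types: if the transformation tracks its parents in a map keyed by dataset ID, two parents sharing an ID collapse into a single entry, and the first finish signal looks like the last one.

package main

import "fmt"

type DatasetID string

// transformation tracks which parents still have to finish, keyed by dataset ID.
type transformation struct {
	pending map[DatasetID]bool
}

func newTransformation(parents []DatasetID) *transformation {
	t := &transformation{pending: make(map[DatasetID]bool)}
	for _, id := range parents {
		t.pending[id] = true // duplicate IDs silently overwrite each other here
	}
	return t
}

// Finish marks one parent as done and reports whether all parents are done,
// which is when the merged table would be sent downstream.
func (t *transformation) Finish(id DatasetID) bool {
	delete(t.pending, id)
	return len(t.pending) == 0
}

func main() {
	// Two parents that (incorrectly) share a dataset ID: the map holds one
	// entry, so the union fires after only one of its inputs has finished.
	t := newTransformation([]DatasetID{"dup", "dup"})
	fmt.Println(t.Finish("dup")) // prints true after a single finish signal
}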

These dataset IDs are intended to be UUIDs, and I expect that to be the case. But looking at the code, conflicting dataset IDs are the easiest explanation, so I'm going to check that assumption.

If this is the case, it may only happen in cloud when a pushdown happens. That may mean this won't occur when attempting to replicate it with csv.from().

@jsternberg (Contributor) commented

I think the above is likely correct, but I need to figure out why the plan node IDs are the same, since I thought we had previously fixed that.

This link shows where the UUIDs are generated:

ec.parents[i] = DatasetIDFromNodeID(pred.ID())

This link shows how those UUIDs are generated:

return DatasetID(uuid.NewV5(uuid.UUID{}, string(id)))

Based on the documentation, UUID v5 doesn't seem to have any random component. Testing that: https://play.golang.org/p/24I88SIF5y5
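
The same check inline, assuming the gofrs/uuid package (which has this NewV5 signature); the node ID string here is made up:

package main

import (
	"fmt"

	"github.com/gofrs/uuid"
)

func main() {
	// A v5 UUID is a SHA-1 hash of namespace + name, so identical inputs
	// produce identical UUIDs on every run: there is no random component.
	a := uuid.NewV5(uuid.UUID{}, "merged_ReadRange_last")
	b := uuid.NewV5(uuid.UUID{}, "merged_ReadRange_last")
	fmt.Println(a == b) // always true
}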

While the plan node IDs are supposed to be different in any case, it seems like a problem that we are using UUID v5 at all.

My proposal to fix this is:

  1. Fix the node ids to be different instead of the same.
  2. Stop using UUID v5 and switch to something like UUID v4 so the UUIDs are actually random, as is expected (see the sketch after this list).
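
For the second point, a rough sketch of the change, assuming the gofrs/uuid package; the function name and placement are hypothetical:

package execute // hypothetical placement

import "github.com/gofrs/uuid"

type DatasetID uuid.UUID

// RandomDatasetID would replace DatasetIDFromNodeID: it ignores the plan
// node ID and draws a fresh random (v4) UUID, so two datasets can never
// share an ID even if their plan node IDs collide.
func RandomDatasetID() DatasetID {
	return DatasetID(uuid.Must(uuid.NewV4()))
}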

@nathanielc @wolffcm does that sound like a good idea? I'm not aware of a specific reason why UUID v5 was chosen; it may just have been a misunderstanding of what that function does.

@jsternberg (Contributor) commented

@SGStino looking into this, the commit that fixes it landed on November 2. If you have time, can you retest this? I suspect it will work now.

@nathanielc @wolffcm I still think we should consider using UUID v4 instead of v5, but I believe this bug shouldn't exist anymore.

@wolffcm commented Jan 14, 2021

@jsternberg I think I’m missing some context on that issue. Is it that the two inputs to the union have the same dataset ID because they both had the same plan node ID?

The plan node IDs should definitely be different; it could cause a lot of problems otherwise, so that should be enforced reliably. It could be put somewhere around here, where we validate the finished plan in validatePhysicalPlan:
https://github.com/influxdata/flux/blob/master/plan/physical.go#L87
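
Something like this, say; a minimal sketch with hypothetical stand-ins for the plan package types, not the real API:

package plan

import "fmt"

type NodeID string

// Node is a stand-in for the plan node interface; only its ID matters here.
type Node interface {
	ID() NodeID
}

// validateUniqueIDs would run as part of validating the finished plan and
// reject any plan whose nodes share an ID, instead of letting the collision
// surface later as duplicate dataset IDs.
func validateUniqueIDs(nodes []Node) error {
	seen := make(map[NodeID]struct{}, len(nodes))
	for _, n := range nodes {
		if _, dup := seen[n.ID()]; dup {
			return fmt.Errorf("duplicate plan node id %q", n.ID())
		}
		seen[n.ID()] = struct{}{}
	}
	return nil
}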

I see that this PR was merged, but it looks like it only fixes one instance of the problem:
#3297

I don't really think changing to a random UUID for the dataset ID is needed if the uniqueness of plan node IDs is enforced. I think it's nice that the mapping from plan node ID to dataset ID is deterministic, but I don't feel that strongly about it.

@jsternberg (Contributor) commented

I think we then need to enforce that plan node IDs are unique. I'm concerned, though, that we'll introduce another piece of code that accidentally generates a non-unique plan node, and it will then fail with an opaque error message, since plan nodes aren't something that's exposed to the end user. But at least that's better than potentially breaking behavior.

@wolffcm commented Jan 14, 2021

We could keep a hash set of IDs in the Plan itself and have a helper function to generate unique IDs. It's been a while since I've been in that code, but it seems feasible?
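
Something along these lines, with invented names (the real Plan type may look quite different):

package plan

import "fmt"

type NodeID string

// Plan remembers every node ID it has handed out so far.
type Plan struct {
	usedIDs map[NodeID]bool
}

// UniqueID returns base if it is unused, otherwise base with a numeric
// suffix appended, and records the result so later calls can't reuse it.
func (p *Plan) UniqueID(base NodeID) NodeID {
	if p.usedIDs == nil {
		p.usedIDs = make(map[NodeID]bool)
	}
	id := base
	for i := 1; p.usedIDs[id]; i++ {
		id = NodeID(fmt.Sprintf("%s_%d", base, i))
	}
	p.usedIDs[id] = true
	return id
}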

@jsternberg (Contributor) commented

I think we could do that. I'll dive into it and see what I can manage.

@jsternberg (Contributor) commented

I'm not sure I'm familiar enough with the code to identify the proper place for that. It seems like the node names are kept with the plan node itself rather than as part of the plan spec, and they're hidden behind an interface. The plan spec would be the one to know whether node names conflict with each other, but it doesn't really seem to have a mechanism for renaming nodes.

I'm going to close this issue and open a new one since the bug itself is fixed and we are just talking about potentially refactoring the code or adding an error message.

@jsternberg (Contributor) commented

Fixed and will be followed up on in #3438.
