Address code confusion in the lang package regarding node names #5021

jsternberg · 2022-07-25T19:22:48Z

In our code for the planner, we have some interactions that are less than ideal and can cause a great amount of confusion.

Some background is required as the problem only arises when you see the interaction between different components. There isn't a single component that is "the problem", but more how the components interact with each other.

Dataset ids are used in the code to identify the source that data comes from. Multi-parent transformations such as join use them to tell which parent each input came from so the data can be joined properly. Single-parent transformations generally disregard this information because it isn't important to them. When we produce a plan, we include the dataset ids because they can help when debugging the plan.

To produce dataset ids, we take the plan node name and produce a hash. We do this so the dataset ids are consistent and different plans can be compared, which makes debugging a bit easier. We considered changing this to a random dataset id, but this was ultimately rejected. The PR is here but discussion about the implications was on private channels and I don't really know where that discussion is anyway.
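To illustrate why hashing the node name gives comparable ids, here is a minimal sketch. It is not the actual flux implementation: the function name `datasetIDFromName`, the choice of SHA-256, the truncation to 8 bytes, and the example node names are all assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// datasetIDFromName is a hypothetical helper (not a flux API). It derives a
// deterministic id from a plan node name by hashing the name. SHA-256 and the
// 8-byte truncation are illustrative choices only.
func datasetIDFromName(name string) string {
	sum := sha256.Sum256([]byte(name))
	return hex.EncodeToString(sum[:8])
}

func main() {
	// The same node name always maps to the same id, so two plans that
	// contain a node with this name can be compared by id.
	a := datasetIDFromName("merged_range_filter")
	b := datasetIDFromName("merged_range_filter")
	fmt.Println(a == b)

	// The flip side: two distinct nodes that share a name also share an id.
	fmt.Println(a, b)
}
```

The same property that makes ids comparable across plans is what makes duplicate node names a problem: identical names necessarily produce identical ids.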

This ultimately led to a problem when it came to planner rules. A planner rule would occasionally need to create a new node and it would need to choose a name. We had this happen and it generated two different plan nodes with identical names so the dataset ids were not unique. In order to have a valid plan, the node names also need to be unique. To address this, we added CreateUniquePhysicalNode to keep some state that would add a sequential number to the plan node name to keep things unique.
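A rough sketch of the idea of keeping state that appends a sequential number to a node name is shown below. The type `nodeNamer`, its method `uniqueName`, and the `base_N` naming format are hypothetical and only stand in for whatever state CreateUniquePhysicalNode actually keeps.

```go
package main

import "fmt"

// nodeNamer is a hypothetical stand-in for the state behind
// CreateUniquePhysicalNode: a counter shared by every rule that creates nodes.
type nodeNamer struct {
	next int
}

// uniqueName appends the next sequence number to base. Because the counter
// never repeats, two calls never return the same name, even for the same base.
// The "_" separator and the numbering scheme are assumptions for this sketch.
func (n *nodeNamer) uniqueName(base string) string {
	name := fmt.Sprintf("%s_%d", base, n.next)
	n.next++
	return name
}

func main() {
	namer := &nodeNamer{}
	// Two rules that both want to call their new node "merged" now get
	// distinct names, and therefore distinct hashed dataset ids.
	fmt.Println(namer.uniqueName("merged"))
	fmt.Println(namer.uniqueName("merged"))
}
```

The important detail is that the counter must be shared by everything that creates nodes within one planning run; a fresh counter per rule would reintroduce the collisions.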

Unfortunately, this code path also stores this state in the context and requires that state to be injected into the context. Recently, we discovered that the repl was not properly injecting this state so CreateUniquePhysicalNode just didn't work in the repl. This is because it's injected in AstProgram which is an admittedly strange place to insert it.

This might just be a simple refactor to move where this happens. Or it might mean refactoring this section of code so that it does not use the context to store such important state, which probably shouldn't be optional. But I also believe this brings up something that is more generally confusing: it can be unclear exactly where the entry point for invoking a flux program is. We may also want to explore that and create new issues for refactoring this package in general. The package's logic is very confusing to follow, and it has mostly existed the way it is for a long time without any modifications.

github-actions · 2023-03-29T01:54:15Z

This issue has had no recent activity and will be closed soon.

scbrickley mentioned this issue Jul 25, 2022

fix(repl): repl uses unique planner node ids #5022

Merged

github-actions bot added the no-issue-activity label Mar 29, 2023

github-actions bot closed this as completed Apr 6, 2023
