Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature / Add columnar data support in main model #712

Merged
merged 12 commits into from
Sep 10, 2024

Conversation

Jerry-Jinfeng-Guo
Copy link
Contributor

@Jerry-Jinfeng-Guo Jerry-Jinfeng-Guo commented Sep 5, 2024

  • input
  • output
  • update
  • integration test
    data id field inference

mgovers and others added 2 commits September 5, 2024 13:30
Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>
Signed-off-by: Jerry Guo <Jerry.Jinfeng.Guo@alliander.com>
@Jerry-Jinfeng-Guo Jerry-Jinfeng-Guo self-assigned this Sep 5, 2024
@Jerry-Jinfeng-Guo Jerry-Jinfeng-Guo added the feature New feature or request label Sep 5, 2024
@Jerry-Jinfeng-Guo Jerry-Jinfeng-Guo marked this pull request as draft September 5, 2024 13:55
@TonyXiang8787
Copy link
Member

@Jerry-Jinfeng-Guo @mgovers @figueroa1395, from the user's perspective, the user would definitely like to provide a columnar batch dataset in a way that the id is not provided for a certain component. In that case, it should be inferred that the elements where attributes are to be updated via columnar buffer are in the exact same sequence of the input data. This is a realistic use-case and will be appreciated by the user, to save the additional step to just assign the exactly the same id as in the input data. The following Python code should work:

model = PowerGridModel(input_data=input_data)
result = model.calculate_power_flow(
    update_data={
        "sym_load": {"p_specified": np.random.randn(n_step, n_sym_load)}
    }
)

In the main core, we need to have special treatment in is_update_independent to make this work:

  1. is_update_independent should be per component instead of the whole dataset. So we can allow individual sequence for each component.
  2. For a certain component, if the buffer is row-based, we use the same logic as is now.
  3. For a certain component, if the buffer is columnar, we do the following:
    1. If id attribute buffer is provided, we look at id to judge if the component is independent or not. We do not need to create proxy stuff which is waste of time. Just directly look at id buffer.
    2. If id attribute buffer is not provided:
      1. If the buffer is not uniform, or the buffer is uniform but elements_per_scenario is not the same as the number of elements in the input data (in the model). An error should be raised.
      2. If the above check passes, we assume the component buffer is independent. And we generate a sequence from 0 to n_comp for this component. This will be consumed by the update function so the update function does not do id lookup.

Signed-off-by: Jerry Guo <Jerry.Jinfeng.Guo@alliander.com>
Signed-off-by: Jerry Guo <Jerry.Jinfeng.Guo@alliander.com>
@mgovers
Copy link
Member

mgovers commented Sep 5, 2024

@Jerry-Jinfeng-Guo @mgovers @figueroa1395, from the user's perspective, the user would definitely like to provide a columnar batch dataset in a way that the id is not provided for a certain component. In that case, it should be inferred that the elements where attributes are to be updated via columnar buffer are in the exact same sequence of the input data. This is a realistic use-case and will be appreciated by the user, to save the additional step to just assign the exactly the same id as in the input data. The following Python code should work:

model = PowerGridModel(input_data=input_data)
result = model.calculate_power_flow(
    update_data={
        "sym_load": {"p_specified": np.random.randn(n_step, n_sym_load)}
    }
)

In the main core, we need to have special treatment in is_update_independent to make this work:

  1. is_update_independent should be per component instead of the whole dataset. So we can allow individual sequence for each component.

  2. For a certain component, if the buffer is row-based, we use the same logic as is now.

  3. For a certain component, if the buffer is columnar, we do the following:

    1. If id attribute buffer is provided, we look at id to judge if the component is independent or not. We do not need to create proxy stuff which is waste of time. Just directly look at id buffer.

    2. If id attribute buffer is not provided:

      1. If the buffer is not uniform, or the buffer is uniform but elements_per_scenario is not the same as the number of elements in the input data (in the model). An error should be raised.
      2. If the above check passes, we assume the component buffer is independent. And we generate a sequence from 0 to n_comp for this component. This will be consumed by the update function so the update function does not do id lookup.

I do think it's very valuable, but I also believe that it's a separate feature request and definitely out of scope of this PR

Signed-off-by: Jerry Guo <Jerry.Jinfeng.Guo@alliander.com>
@TonyXiang8787
Copy link
Member

@Jerry-Jinfeng-Guo @mgovers @figueroa1395, from the user's perspective, the user would definitely like to provide a columnar batch dataset in a way that the id is not provided for a certain component. In that case,
...

I do think it's very valuable, but I also believe that it's a separate feature request and definitely out of scope of this PR

I agree to put this as a separate PR. But this feature needs to be part of the release 1.10.0, including its documentation and examples. I have now edited in #548 as step 7 of the task.

@Jerry-Jinfeng-Guo Jerry-Jinfeng-Guo changed the title Add columnar data support in main model Feature / Add columnar data support in main model Sep 6, 2024
Jerry-Jinfeng-Guo and others added 4 commits September 6, 2024 10:40
Signed-off-by: Jerry Guo <Jerry.Jinfeng.Guo@alliander.com>
…lumnar-data

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>
Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>
@mgovers mgovers marked this pull request as ready for review September 9, 2024 13:26
Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>
@mgovers mgovers enabled auto-merge September 10, 2024 06:02
Copy link
Contributor

@figueroa1395 figueroa1395 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but I have a final question to get the full picture better:

Was the introduction of all iterator_like stuff necessary because the columnar stuff was implemented using boost iterator stuff? And since this boost iterator stuff doesn't fully support all the std::iterator concepts, then you had to mock it? And that is the reason why extending the use of columnar in main model was not trivial?

Copy link
Contributor Author

@Jerry-Jinfeng-Guo Jerry-Jinfeng-Guo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the subsequent changes make sense to me. I can not approve this PR since I created it. Could someone else give the approval if it looks good to you as well? @TonyXiang8787 @figueroa1395 @nitbharambe

@mgovers
Copy link
Member

mgovers commented Sep 10, 2024

Looks good to me, but I have a final question to get the full picture better:

Was the introduction of all iterator_like stuff necessary because the columnar stuff was implemented using boost iterator stuff? And since this boost iterator stuff doesn't fully support all the std::iterator concepts, then you had to mock it? And that is the reason why extending the use of columnar in main model was not trivial?

yes indeed. The fundamental reason is that std::iterator requires you to return an exact reference, but a view only returns something that acts like a reference but is not necessarily one.

@Jerry-Jinfeng-Guo
Copy link
Contributor Author

Was the introduction of all iterator_like stuff necessary because the columnar stuff was implemented using boost iterator stuff?

Since @mgovers made that change, I could only answer from my understanding. Making it typename ForwardIterator is technically working but provides much less control for us than ideal (the real forward iterator). Having one temporary one in place made it easier to switch once in the future said functionality is fully supported by boost.

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>
@figueroa1395
Copy link
Contributor

figueroa1395 commented Sep 10, 2024

Since @mgovers made that change, I could only answer from my understanding. Making it typename ForwardIterator is technically working but provides much less control for us than ideal (the real forward iterator). Having one temporary one in place made it easier to switch once in the future said functionality is fully supported by boost.

But would it work though? I think not, because the boost iterator stuff doesn't yet provide range access as required in the newly introduced concepts. Specifically this:

yes indeed. The fundamental reason is that std::iterator requires you to return an exact reference, but a view only returns something that acts like a reference but is not necessarily one.

Hence triggering the changes from std::convertible_to to detail::assignable_to

But indeed this change would be easy to change once boost does.

I think the subsequent changes make sense to me. I can not approve this PR since I created it. Could someone else give the approval if it looks good to you as well? @TonyXiang8787 @figueroa1395 @nitbharambe

I will give approval. Feel free to resolve remaining open conversations.

@mgovers
Copy link
Member

mgovers commented Sep 10, 2024

But would it work though? I think not, because the boost iterator stuff doesn't yet provide range access as required in the newly introduced concepts.

std::views::iota coming to the rescue for the IdxCount issue

Copy link

@mgovers mgovers added this pull request to the merge queue Sep 10, 2024
Merged via the queue into main with commit a05b88a Sep 10, 2024
26 checks passed
@mgovers mgovers deleted the feature/main-model-columnar-data branch September 10, 2024 09:59
@mgovers mgovers mentioned this pull request Nov 5, 2024
27 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants