more efficient columnar deserialization #716
Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>
test script:
// SPDX-FileCopyrightText: Contributors to the Power Grid Model project <powergridmodel@lfenergy.org>
//
// SPDX-License-Identifier: MPL-2.0

#include <power_grid_model/auxiliary/input.hpp>
#include <power_grid_model/auxiliary/meta_data_gen.hpp>
#include <power_grid_model/auxiliary/serialization/deserializer.hpp>
#include <power_grid_model/auxiliary/update.hpp>

#include <cstddef>
#include <filesystem>
#include <fstream>
#include <vector>

namespace {
using namespace power_grid_model;
using namespace power_grid_model::meta_data;
} // namespace

int main() {
    // Read the whole msgpack file into memory.
    std::vector<char> serialized_data = [] {
        constexpr auto file_path = "<msgpack_data>";
        std::vector<char> result(std::filesystem::file_size(file_path));
        std::ifstream f{file_path, std::ios::binary};
        f.read(result.data(), static_cast<std::streamsize>(result.size()));
        return result;
    }();

    auto deserializer = Deserializer{from_msgpack, serialized_data, meta_data_gen::meta_data};
    auto& dataset = deserializer.get_dataset_info();
    auto const& info = dataset.get_description();

    // The buffers must outlive deserializer.parse().
    std::vector<std::vector<std::byte>> row_buffers{};
    std::vector<std::vector<std::vector<std::byte>>> column_buffers{};

    auto const n_components = meta_data_gen::meta_data.get_dataset("update").n_components();
    for (Idx idx = 0; idx < n_components; ++idx) {
        auto const& meta_component = *info.component_info[idx].component;
        auto const buffer_size = info.component_info[idx].total_elements * meta_component.size;
        if (idx < n_components / 4) {
            // First quarter of the components: row-based buffers.
            row_buffers.push_back(std::vector<std::byte>(buffer_size));
            dataset.set_buffer(meta_component.name, nullptr, row_buffers.back().data());
        } else if (idx < n_components / 2) {
            // Second quarter: columnar buffers, one per attribute.
            auto& buffer = column_buffers.emplace_back();
            for (auto const& meta_attribute : meta_component.attributes) {
                buffer.push_back(std::vector<std::byte>(buffer_size));
                dataset.add_attribute_buffer(meta_component.name, meta_attribute.name, buffer.back().data());
            }
        }
        // Remaining components: no buffer is set.
    }

    deserializer.parse();
    return 0;
}
This is a nice improvement. But I still need to understand why |
cfr. offline discussions + benchmarking, |
Makes deserialization more efficient compared to #708 as follows:
the WritableDataset component buffer is neither row_based(*) nor columnar(*, with_attribute_buffers=true), and therefore there's nothing to do
This provides a more sustainable solution than the one proposed in #714.
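The "nothing to do" condition can be illustrated with a minimal sketch. The types `AttributeSlot` and `ComponentSlot` and the helper `has_work` are hypothetical stand-ins invented for this example, and the definitions of `is_row_based` and `is_columnar` are assumptions about what the checks mean, not the library's actual implementation.

```cpp
#include <vector>

// Hypothetical stand-ins for illustration only; not the library's actual types.
struct AttributeSlot {
    void* data{nullptr};
};
struct ComponentSlot {
    void* data{nullptr};                   // row-based storage, if the user set one
    std::vector<AttributeSlot> attributes; // columnar storage, if the user added any
};

// Assumption: a buffer is row-based when a row data pointer was provided.
inline bool is_row_based(ComponentSlot const& buffer) { return buffer.data != nullptr; }

// Assumption: a buffer is columnar when it is not row-based; with
// with_attribute_buffers == true it must also hold at least one attribute buffer.
inline bool is_columnar(ComponentSlot const& buffer, bool with_attribute_buffers) {
    return !is_row_based(buffer) && (!with_attribute_buffers || !buffer.attributes.empty());
}

// A component only needs to be parsed when there is somewhere to write the values.
inline bool has_work(ComponentSlot const& buffer) {
    return is_row_based(buffer) || is_columnar(buffer, true);
}
```

Under these assumptions, a component for which the user set neither a row buffer nor any attribute buffer can be skipped without looking at its attribute values.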
NOTE: the check "if none of the provided attribute buffers are present in the header" is done when reordering the attribute buffers, and is reduced to a simple check of whether the reordered result is empty. This is the best we can do, because there is no more efficient way to read only a subset of the msgpack array while skipping the rest.
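One way to picture the reordering step and the emptiness check is the sketch below. `ProvidedAttribute` and `reorder_by_header` are hypothetical names chosen for this example; the actual deserializer may structure this differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a user-provided attribute buffer; not the library's type.
struct ProvidedAttribute {
    std::string name;
    void* data{nullptr};
};

// For each attribute in the serialized header (in header order), look up the matching
// provided buffer; entries without a match stay nullptr so their values can be skipped.
// If nothing matches at all, return an empty vector, so that "none of the provided
// attribute buffers are present in the header" becomes a plain emptiness check.
inline std::vector<ProvidedAttribute const*>
reorder_by_header(std::vector<ProvidedAttribute> const& provided,
                  std::vector<std::string_view> const& header_attributes) {
    std::vector<ProvidedAttribute const*> reordered(header_attributes.size(), nullptr);
    bool any_match = false;
    for (std::size_t i = 0; i < header_attributes.size(); ++i) {
        auto const found =
            std::find_if(provided.begin(), provided.end(),
                         [&](ProvidedAttribute const& p) { return p.name == header_attributes[i]; });
        if (found != provided.end()) {
            reordered[i] = &*found;
            any_match = true;
        }
    }
    if (!any_match) {
        reordered.clear();
    }
    return reordered;
}
```

With this shape, the per-element loop can index the reordered list by position in the msgpack array, and a caller can test `reordered.empty()` once per component instead of re-checking every attribute name.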