Can we deserialize automagically with simdjson::ondemand #29

Open
lemire opened this issue Jul 9, 2024 · 11 comments
@lemire
Member

lemire commented Jul 9, 2024

The simdjson::ondemand API is begging for reflection-based deserialization.

We have this ugly approach:

https://github.com/simdjson/simdjson/blob/master/doc/basics.md#adding-support-for-custom-types

It works, but it requires too much work from our users. It should be automagical. :-)

@FranciscoThiesen
Member

Will start investigating this a bit. Will keep updates posted here (@lemire)

@FranciscoThiesen
Member

Will open a branch on https://github.com/simdjson/simdjson for it

@lemire lemire mentioned this issue Jul 16, 2024
@FranciscoThiesen
Member

Let's keep this open. Once we have the official C++26 spec and compiler support, we can add it to simdjson behind a version check, as @lemire recently did when leveraging concepts for the serialization of vectors.
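
A minimal sketch of what such a version check could look like. It assumes that reflection-capable compilers define the `__cpp_impl_reflection` feature-test macro proposed in P2996; treat that as an assumption until the standard is final. `MYLIB_HAS_CONCEPTS`, `MYLIB_HAS_STATIC_REFLECTION`, `has_concepts()` and `has_static_reflection()` are hypothetical names for illustration, not simdjson's actual macros or functions.

```cpp
// Hypothetical configuration macros; simdjson's real macro names may differ.
#if defined(__cpp_concepts) && __cpp_concepts >= 201907L
#define MYLIB_HAS_CONCEPTS 1
#else
#define MYLIB_HAS_CONCEPTS 0
#endif

// Assumption: compilers with static reflection define __cpp_impl_reflection (per P2996).
#if defined(__cpp_impl_reflection)
#define MYLIB_HAS_STATIC_REFLECTION 1
#else
#define MYLIB_HAS_STATIC_REFLECTION 0
#endif

// True when the concept-based customization hooks can be compiled.
constexpr bool has_concepts() { return MYLIB_HAS_CONCEPTS != 0; }

// Reflection-based deserialization would sit on top of the concept-based hooks,
// so it is only reported as available when both features are present.
constexpr bool has_static_reflection() {
  return MYLIB_HAS_CONCEPTS != 0 && MYLIB_HAS_STATIC_REFLECTION != 0;
}
```

Code that depends on reflection would then be wrapped in `#if MYLIB_HAS_STATIC_REFLECTION`, so older compilers simply never see it.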

@lemire
Member Author

lemire commented Sep 21, 2024

The extractor syntax should help....

simdjson/simdjson#2247

It will be interesting to find out how it interacts with C++ reflection.

@the-moisrex


the-moisrex commented Sep 21, 2024

> The extractor syntax should help....
>
> simdjson/simdjson#2247
>
> It will be interesting to find out how it interacts with C++ reflection.

@lemire

Extract should work without modification.

In C++26, instead of blowing up in the user's face with static_assert(false, ...) in ::get(T& out), we could inspect each of the type's members: if a tag_invoke exists for all of them, we can set each one.

So we simply need a tag_invoke for the most general case possible, restricted so that the type's non-static data members MUST be convertible as well.

Alternatively, or in addition:

We could use a small helper function with extract to relax even that "MUST" and simply ignore the non-static data members that cannot be converted; for example:

struct users {
  int id;
  std::string username;
  std::ifstream profile_image; // not convertible from JSON; gets ignored
} user;

struct admins {
   int id;
   std::string username;
} admin;
obj.extract(
  to{"user", ignore_unknown_fields(user)},
  to{"admin", admin} // no need to do anything, the global `tag_invoke` should be able to handle this
);

So I don't think the extract idea needs much work, but we can certainly provide a general tag_invoke, or change the implementation of the document/value::get member functions, so that any type whose fields are all convertible is itself convertible.

Note: extract already skips fields that the JSON string doesn't provide. By ignore_unknown_fields I mean ignoring the user's fields that we cannot convert to at all, even if the JSON string provides values for them.
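
The rule "a type is convertible when all of its fields are convertible" can be sketched in plain C++17, without reflection. This is only an illustration: the hand-written `members_of` hook stands in for what `nonstatic_data_members_of` would provide, and `is_leaf`, `has_members`, `is_deserializable`, `admin_record` and `user_record` are hypothetical names, not part of simdjson.

```cpp
#include <fstream>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Types the (hypothetical) parser can read directly.
template <typename T>
struct is_leaf
    : std::bool_constant<std::is_arithmetic_v<T> || std::is_same_v<T, std::string>> {};

// Detects a hand-written members_of(t) hook, found by argument-dependent lookup.
// Real reflection would make this hook unnecessary.
template <typename T, typename = void>
struct has_members : std::false_type {};
template <typename T>
struct has_members<T, std::void_t<decltype(members_of(std::declval<T&>()))>>
    : std::true_type {};

template <typename T>
struct is_deserializable;

// True when every element of a std::tuple<Ts&...> is deserializable.
template <typename Tuple>
struct all_deserializable;
template <typename... Ts>
struct all_deserializable<std::tuple<Ts&...>>
    : std::conjunction<is_deserializable<Ts>...> {};

// An aggregate is acceptable only if it exposes members and all of them are acceptable.
template <typename T, bool = has_members<T>::value>
struct aggregate_ok : std::false_type {};
template <typename T>
struct aggregate_ok<T, true>
    : all_deserializable<decltype(members_of(std::declval<T&>()))> {};

// Either a leaf, or an aggregate whose members are all deserializable (recursive).
template <typename T>
struct is_deserializable : std::disjunction<is_leaf<T>, aggregate_ok<T>> {};

struct admin_record {
  int id;
  std::string username;
};
inline auto members_of(admin_record& a) { return std::tie(a.id, a.username); }

struct user_record {
  int id;
  std::string username;
  std::ifstream profile_image;  // no way to read this from JSON
};
inline auto members_of(user_record& u) {
  return std::tie(u.id, u.username, u.profile_image);
}

static_assert(is_deserializable<admin_record>::value);
static_assert(!is_deserializable<user_record>::value);
```

Under the strict rule, `user_record` is rejected because of its `std::ifstream` member. A helper in the spirit of `ignore_unknown_fields` would instead filter out the members for which `is_deserializable` is false.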

@the-moisrex

the-moisrex commented Sep 21, 2024

@lemire, @FranciscoThiesen Here's a mock-up of what I'm thinking:

namespace simdjson {

// Mock code: `deserializable` is a hypothetical consteval predicate.
template <typename T>
consteval auto is_convertible_type() {
  for (std::meta::info field : nonstatic_data_members_of(^T)) {
    // reject the type as soon as one field is NOT deserializable
    if (!deserializable(type_of(field))) return false;
  }
  return true;
}

template <typename T>
  requires (is_convertible_type<T>()) // all of its fields MUST be deserializable as well
constexpr error_code tag_invoke(deserialize_tag, auto& value, T& out) {
  // fetch the object once, before iterating over the members
  simdjson::ondemand::object obj;
  if (auto error = value.get_object().get(obj); error) {
    return error;
  }
  template for (constexpr auto e : std::meta::nonstatic_data_members_of(^T)) {
    // it might be better if we could somehow call .extract once
    obj.extract(
        to{name_of(e), out.[:e:]} // bind the JSON key to the matching member of `out`
    );
  }
  return SUCCESS;
}

}

It's my first time writing C++26 reflection code, so I'm not sure the reflection parts of the code above are right.

Honestly, this could be laughably easy to implement if C++26's reflection is what I think it is.

@Yaraslaut
Contributor

I looked at the simdjson PR and it looks amazing. The issue with deserialization and reflection is the need to recurse into members that are not trivial, so you sometimes need to call to and sometimes sub, in terms of the simdjson PR.

Here is the idea I used to make a simple deserializer for yaml-cpp, which provides a map-like structure via Node:

template <typename T> void from_node(auto const &node, T &t) {
  util::for_range<0, util::number_of_members<T>()>([&]<auto I>() { // loop over members
    constexpr auto mem = util::member_info<T>(I);
    auto name = std::string{util::name_of(mem)};
    if constexpr (std::is_constructible_v<std::formatter<[:type_of(mem):]>>) { // check if type is trivial like double, int, char ...
      t.[:mem:] = node[name].template as<[:type_of(mem):]>();
    } else {
      from_node(node[name], t.[:mem:]); // recurse when the member is itself a structure
    }
  });
}

Also, when you use reflection to deserialize, it is not clear how to handle custom cases like this one:

      to{"year", [&car](auto val) {
        car.year = val;
      }});
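
The leaf-versus-nested dispatch described in this comment can be tried out today with a C++17 sketch that replaces reflection with a hand-written member list. Everything here is a toy for illustration: `node` is not yaml-cpp's `Node` nor a simdjson type, and `named_ref`, `field`, `fields_of`, `make_leaf`, `engine_spec` and `car` are hypothetical names.

```cpp
#include <map>
#include <memory>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Toy tree standing in for a parsed document.
struct node {
  std::string scalar;                                     // payload when the node is a leaf
  std::map<std::string, std::shared_ptr<node>> children;  // payload when it is an object
  const node& operator[](const std::string& key) const { return *children.at(key); }
};

inline std::shared_ptr<node> make_leaf(std::string text) {
  auto n = std::make_shared<node>();
  n->scalar = std::move(text);
  return n;
}

// Pairs a key with a reference to the member it should fill.
template <typename T>
struct named_ref {
  const char* name;
  T& ref;
};
template <typename T>
named_ref<T> field(const char* name, T& ref) { return {name, ref}; }

template <typename T>
void from_node(const node& n, T& out) {
  if constexpr (std::is_arithmetic_v<T>) {
    out = static_cast<T>(std::stod(n.scalar));  // leaf: numeric value
  } else if constexpr (std::is_same_v<T, std::string>) {
    out = n.scalar;                             // leaf: string value
  } else {
    // Nested structure: recurse into every listed member.
    std::apply([&](const auto&... f) { (from_node(n[f.name], f.ref), ...); },
               fields_of(out));
  }
}

struct engine_spec {
  int cylinders = 0;
};
inline auto fields_of(engine_spec& e) {
  return std::make_tuple(field("cylinders", e.cylinders));
}

struct car {
  std::string make;
  int year = 0;
  engine_spec engine;
};
inline auto fields_of(car& c) {
  return std::make_tuple(field("make", c.make), field("year", c.year),
                         field("engine", c.engine));
}
```

With reflection, `fields_of` would be generated from the member list, and the `if constexpr` chain would decide between the leaf case (`to`) and the nested case (`sub`).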

@the-moisrex

We could even have another helper function or class for extract that ignores specific fields:

obj.extract(
  to{"user", ignore_fields(user, "password", "ip_address")}
);

Wait, that might be a good idea for now too, in case we have a tag_invoke for user but don't want some specific fields to be filled in even if they are provided in the JSON string.

This could be cool, but it wouldn't be the most performant way for the user to implement the tag_invoke for user. One way to do it is, instead of giving simdjson_result<value> to the user's tag_invoke, to wrap it in something like ignore_keys<3> that wraps simdjson_result<value>.

But that requires implementers of tag_invoke overloads not to explicitly specify simdjson_result<value>& or simdjson_result<document>& as the argument, and to use auto& instead.

It might be a good idea to have a concept for the types (currently document and value) that we can extract an object from, so that future helper functions can wrap them.
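
A rough sketch of that idea, written as a C++17 detection trait so that it compiles without concepts support (with C++20 the trait would naturally become a concept). All names are hypothetical toys: `toy_value` and `toy_object` stand in for simdjson's value and object types, and `is_object_source`, `ignore_keys` and `deserialize` are illustrative, not simdjson APIs. Unlike the `ignore_keys<3>` mentioned above, this toy wrapper stores its keys in a runtime set.

```cpp
#include <map>
#include <set>
#include <string>
#include <type_traits>
#include <utility>

// Toy stand-in for a parsed JSON object: key -> raw text.
using toy_object = std::map<std::string, std::string>;

// Toy stand-in for document/value.
struct toy_value {
  toy_object fields;
  toy_object get_object() const { return fields; }
};

// Detection trait: anything exposing get_object() counts as an object source.
template <typename T, typename = void>
struct is_object_source : std::false_type {};
template <typename T>
struct is_object_source<T, std::void_t<decltype(std::declval<const T&>().get_object())>>
    : std::true_type {};

// Wrapper that hides some keys from whoever reads the object.
template <typename Source>
struct ignore_keys {
  const Source& source;
  std::set<std::string> hidden;
  toy_object get_object() const {
    toy_object obj = source.get_object();
    for (const auto& key : hidden) obj.erase(key);
    return obj;
  }
};

struct user {
  int id = 0;
  std::string username;
  std::string password;
};

// Generic over the source type, so wrapped and unwrapped values both work.
template <typename Source>
std::enable_if_t<is_object_source<Source>::value, bool>
deserialize(const Source& src, user& out) {
  const toy_object obj = src.get_object();
  if (auto it = obj.find("id"); it != obj.end()) out.id = std::stoi(it->second);
  if (auto it = obj.find("username"); it != obj.end()) out.username = it->second;
  if (auto it = obj.find("password"); it != obj.end()) out.password = it->second;
  return true;
}

static_assert(is_object_source<toy_value>::value);
static_assert(is_object_source<ignore_keys<toy_value>>::value);
static_assert(!is_object_source<int>::value);
```

Because `deserialize` is constrained on the trait rather than on a concrete type, the same overload accepts a plain `toy_value` and an `ignore_keys<toy_value>`, which is the property the comment asks tag_invoke implementers to preserve by taking `auto&`.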

@the-moisrex

> I looked at the simdjson PR and it looks amazing. The issue with deserialization and reflection is the need to recurse into members that are not trivial, so you sometimes need to call to and sometimes sub, in terms of the simdjson PR.

@Yaraslaut The whole idea of tag_invoke is that it should handle sub-objects on its own.

Meaning, this is possible:

obj.extract(
  to{"user", sub{
    to{"id", user.id},
    to{"username", user.username}
  }}
);

But it is more general to provide a tag_invoke for the type:

struct users {
  friend error_code tag_invoke ... {
    // ...
    obj.extract(
      to{"id", out.id},
      to{"username", out.username}
    );
  }
};


// now, anywhere in your code, you simply pass `user`:
users user;
std::vector<users> admins;
obj.extract(
  to{"user", user},
  to{"admins", admins}
);

We should probably provide tag_invoke overloads for this case as well. I'd call that General Madness, since it would allow things like:

std::vector<
  unique_ptr<shared_ptr<atomic<unique_ptr<users>>>> // and as deep as you'd like to go
> admins;
obj.extract(
  to{"admins", admins}
);
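
To illustrate how such nesting can fall out of a handful of generic overloads, here is a small C++17 sketch. It is not simdjson code: `toy_json` is a toy value type and `read_into` is a hypothetical stand-in for the tag_invoke overloads. It leaves out `atomic`, which the standard library does not allow around `unique_ptr`.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy value: either a scalar or a list of items.
struct toy_json {
  std::string scalar;
  std::vector<toy_json> items;
};

// Leaf overloads.
inline bool read_into(const toy_json& v, int& out) {
  out = std::stoi(v.scalar);
  return true;
}
inline bool read_into(const toy_json& v, std::string& out) {
  out = v.scalar;
  return true;
}

// Forward declarations so the wrappers can nest in any order.
template <typename T> bool read_into(const toy_json& v, std::unique_ptr<T>& out);
template <typename T> bool read_into(const toy_json& v, std::shared_ptr<T>& out);
template <typename T> bool read_into(const toy_json& v, std::vector<T>& out);

// Smart pointers: allocate the pointee, then read into it.
template <typename T>
bool read_into(const toy_json& v, std::unique_ptr<T>& out) {
  out = std::make_unique<T>();
  return read_into(v, *out);
}
template <typename T>
bool read_into(const toy_json& v, std::shared_ptr<T>& out) {
  out = std::make_shared<T>();
  return read_into(v, *out);
}

// Sequences: read each item with whatever overload matches the element type.
template <typename T>
bool read_into(const toy_json& v, std::vector<T>& out) {
  out.clear();
  for (const toy_json& item : v.items) {
    T element{};
    if (!read_into(item, element)) return false;
    out.push_back(std::move(element));
  }
  return true;
}
```

Each wrapper only knows how to peel off one layer and delegate, so `std::vector<std::unique_ptr<std::shared_ptr<int>>>` works without any overload written specifically for that combination.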

@lemire
Member Author

lemire commented Sep 21, 2024

@the-moisrex Ideally, we'd call extract once, as it makes optimizations easier.

@lemire
Member Author

lemire commented Sep 26, 2024

@the-moisrex Turns out that calling extract exactly once should be 'easy'.

#44
