Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

access by (const) reference #91

Closed
ropuls opened this issue Jun 15, 2015 · 23 comments
Closed

access by (const) reference #91

ropuls opened this issue Jun 15, 2015 · 23 comments

Comments

@ropuls
Copy link

ropuls commented Jun 15, 2015

Hi Niels,

yet another question. I try to avoid copying whereever possible (during serialization). So when:

json j = "foo";
if (j.is_string) {
  const std::string &s = j.get<const std::string &s>();
  // do something with s
}

the compiler tells us:

json.hpp:1010: error: forming pointer to reference type 'const std::basic_string<char>&'
         return get_impl(static_cast<T*>(nullptr));
                        ^

with a static cast:

  const std::string &s = static_cast<const std::string &>(j);

the compiler is happy, but apparently returns a reference to a temporary, so s is dangling.

Have you got any hints on how to access underlying memory (for string, int, ...) so we can avoid copying the values?

would

template<typename T> T* get_ptr();
template<typename T> const T* get_ptr<T>() const;

be acceptable?

Thanks and cheers,
Roman

@nlohmann
Copy link
Owner

Hi Roman,

I actually haven't thought about this yet. I'll see what I can find out.

Cheers,
Niels

@igadmg
Copy link

igadmg commented Jun 15, 2015

Roman, temporary object return should be handled by RVO, which is in C++ since 199x, so I think you are overoptimizing and there is no need in storing values in const references here.

On the other hand char * can return pointer to a string stored in a json parsing buffer - which definitely would be faster than building a std::string, but that way buffer should be changed and \0 placed at the end of the tokens.

@ropuls
Copy link
Author

ropuls commented Jun 16, 2015

Hi Igor, Nils,

RVO is not the point. If you're i.e. serializing a json struct into something else, you currently have to take copies of all keys and values inside the json structure for no good reason. Instead, it would be desirable to just reference keys and values without actually copying them. This is completely unrelated to RVO.

Examples where you want to have that behaviour is when

  • writing a json structure to the wire or file via msgpack, or
  • you want to asynchroneously write data, i.e. via asio::async_write, or
  • you want to bind a json object into a prepared statement in i.e. mysql.

So instead of taking copies everywhere, you keep the source json alive, and refer to it's memory regions during i/o operations.

Thanks and cheers,
Roman

@andreihu
Copy link

Hi Nils,

Here is a suggestion for the functionality:

In the json class, the following public interface could be supported:

    /// get a ptr to the value
    template<typename T>
    inline T* get_ptr() noexcept {
        return get_ptr_impl(static_cast<T*>(nullptr));
    }

    template<typename T>
    inline const T* get_ptr() const noexcept {
        return get_ptr_impl(static_cast<const T*>(nullptr));
    }

The corresponding private implementation:

    /// get_ptr implementations
    inline const bool* get_ptr_impl(const bool*) const noexcept{
        return m_type == value_t::boolean ? &m_value.boolean : nullptr;
    }

    inline bool* get_ptr_impl(bool *) noexcept {
        return m_type == value_t::boolean ? &m_value.boolean : nullptr;
    }

    inline const number_integer_t* get_ptr_impl(const number_integer_t*) const noexcept {
        return m_type == value_t::number_integer ? &m_value.number_integer : nullptr;
    }

    inline number_integer_t* get_ptr_impl(number_integer_t*) noexcept {
        return m_type == value_t::number_integer ? &m_value.number_integer : nullptr;
    }

    inline const number_float_t* get_ptr_impl(const number_float_t*) const noexcept {
        return m_type == value_t::number_float ? &m_value.number_float : nullptr;    
    }

    inline number_float_t* get_ptr_impl(number_float_t*) noexcept {
        return m_type == value_t::number_float ? &m_value.number_float : nullptr;
    }

    inline const string_t* get_ptr_impl(const string_t*) const noexcept {
        return m_type == value_t::string ? m_value.string : nullptr;
    }

    inline string_t* get_ptr_impl(string_t*) noexcept {
        return m_type == value_t::string ? m_value.string : nullptr;
    }

    inline const array_t* get_ptr_impl(const array_t*) const noexcept {
        return m_type == value_t::array ? m_value.array : nullptr;
    }

    inline array_t* get_ptr_impl(array_t*) noexcept {
        return m_type == value_t::array ? m_value.array : nullptr;
    }

    inline const object_t* get_ptr_impl(const object_t*) const noexcept {
        return m_type == value_t::object ? m_value.object: nullptr;
    }

    inline object_t* get_ptr_impl(object_t*) noexcept {
        return m_type == value_t::object ? m_value.object: nullptr;
    }

plus an accessor for object keys in the iterator classes:

        inline const typename object_t::key_type* key_ptr() const noexcept {
            m_object->m_type  == basic_json::value_t::object ? &(m_it.object_iterator->first) : nullptr;
        }

Some remarks:

  • Pointers are returned instead of references because the method can be noexcept, plus a returned reference can be part of an expression which accidentally ends up in an unwanted copy.
  • No implementation for the nullptr type. That would require a little change in the json_value class.
  • This implementation is less intelligent than the get flavour of methods, since it's not performing any conversion, and is only supported for the stored types (otherwise it would result in a conversion/copy).

Thanks
Andras

@nlohmann
Copy link
Owner

Hi all,

I thought about the issue and I think I have a solution:

With code like

json j = "This is a string stored in a JSON value.";

// explicit
const json::string_t& t = j.get<const json::string_t&>();

// implicit
const json::string_t& s = j;

you would get a const reference to the types of basic_json.

For json::string_t, this function would be implemented as follows.

const string_t& get_impl_ref(string_t*) const
{
    switch (m_type)
    {
        case (value_t::string):
        {
            return reinterpret_cast<const string_t&>(* m_value.string);
        }
        default:
        {
            throw std::domain_error("cannot cast " + type_name() + " to " + typeid(string_t).name());
        }
    }
}

Note that string_t* is only used to choose the correct version of get_impl_ref.

What do you think? Is this what you want? If so, I can add this for the value types (object_t, array_t, string_t, boolean_t, number_integer_t, and number_float_t).

Cheers,
Niels

@nlohmann
Copy link
Owner

Hi @andreihu,

it seems our posts overlap ;-)

I like your approach, because it's much simpler than my way with const references. However, I have a bad feeling about returning non-const pointers to the member variables. Furthermore, I'd rather raise an exception in case of a type error than to silently return a null pointer - but that may be personal taste.

Anyway - now that there are two proposals, I would be glad to hear opinions :-)

Cheers,
Niels

@ropuls
Copy link
Author

ropuls commented Jun 16, 2015

Hi Niels, Andras,

@nlohmann: sorry mistyping your name all the time, fixed now!

both version have their advantages. As usual, it all comes with some cost, which is responsibility here if one is dealing with raw pointers.

Pointer Access

  • exception freeness enables you to write exception-free code (assuming -fnoexcept)
  • non-const access to underlying pointers may enable some in-place processing.
  • greater flexibility (assuming you know what you're doing)
  • might even serve as foundation for internal handling

const reference access

  • easier access, good enough for most things
  • limited in case you really need a pointer to the underlying memory.
  • if somebody really wants to break things, he can cast away the reference both in pointers and references

My choice: both, and eventually mask out the pointer semantics via preprocessor via

#ifndef _NLOHMANN_JSON_POINTER_SEMANTICS_

Besides, I'd re-write your internal accessor to that one:

const string_t& get_impl_ref(string_t*) const
{
    return (m_type == value_t::string) ? *m_value.string : throw std::domain_error("not a string");
}

which is shorter and will run faster. I currently see no need for the reinterpret_cast, plus delegate the creation of the error message into a delegate which knows if RTTI is available or not (i.e. -fno-rtti and __GXX_RTTI under g++/llvm)

Thanks and cheers,
Roman

@nlohmann
Copy link
Owner

Hi Roman,

good point for having them both, and good point with RTTI - I shall remove all the typeid stuff.

One question: What do you mean with "might even serve as foundation for internal handling"?

Cheers
Niels

@ropuls
Copy link
Author

ropuls commented Jun 16, 2015

Hi Niels,

I saw a lot of switch statements inside the code and asked myself if pointer-based members could somehow unify access and spare code - not sure about that. Will come back in another ticket, if so.

Cheers,
Roman

@andreihu
Copy link

Hi Niels, Roman,

My 2 cents on the ptr/ref based accessors:

  • C++ has the long standing legacy of providing zero overhead abstractions, if possible. In my understanding my get_ptr proposal is that one. It's more or less analog to the std::vector's operator[] and at(). The first is the fast, simple, non-throwing version, the second one is the safer throwing.
  • Usage:

The good point with returning a nullptr is ease of usage. An idiomatic use of get_ptr is:

json j;
if (auto p = j.get_ptr<std::string>()) {
  // use p
}

Whilst the reference based approach forces me to use a separate check, or (even worse) a catch statement.

json j;
if (j.is_string()) {
  const std::string& r = j.get_ref<std::string&>();
  // use r
}
  • Type inference support
json j;
auto p = j.get_ptr<std::string>(); // this is completely fine, deduced type for std::string*

auto r = j.get_ref<std::string>(); // if i am not wrong, deduced type for r will be std::string (a copy)
const std::string& r = j.get_ref<std::string>(); // should be used, but this is ugly and redundant
const auto& r = j.get_ref<std::string>(); // not redundant, but easy to miss that & and get a copy
  • Unified access

For further study: Since get_ptr is the lowest level of the accessors, it's also has reusability potential. E.g. get/get_impl could be refactored in a way that a front end function checks convertibility (+ handling returned nullptr by throwing an exception) and get_ptr does the rest of the job. I think this would be a good move, but not the topic of this discussion.

  • Safety

Niels's concerns about safety are valid and should be addressed:

  • it should be documented that get_ptr returns a nullptr on type mismatch
  • also it should be documented either for get_ref and get_ptr that later assignments to the json object can invalidate returned pointers and references. Thus using the pointer after assignment is undefined behaviour by design.
  • i think it's not dangerous to return a pointer in a sense, that one cannot really break the json object via the pointer (except for UB in the previous case). We're protecting against Murphy, not Machiavelli ;).
  • also i think the preferred way of accessing members in the json object is still via get<>, because it has more rich semantics and convenience. In my understanding we're not compromising the library here, just exposing some low-level functionality for some narrow usage.

Regards
Andras

@ropuls
Copy link
Author

ropuls commented Jun 17, 2015

@andreihu: agree

@nlohmann: in case you'd agree, would you need help? If so, please advise (I guess some enable_if is needed to make overloads work on the public API for the get-family).

Thanks and cheers,
Roman

@nlohmann
Copy link
Owner

Hi all, thanks a lot for the lively discussion. I really appreciate this! I will have time on the weekend to integrate the ideas.

Until then, I would like to clarify some questions from my side:

  1. We agree that T get() remains as is to explicitly ask for copies of the stored values, including conversions where possible and exceptions in case of incompatibles.
  2. To allow for implicit type conversion, the operator T() will just be calling T get().
  3. There should be a function to return a const reference to the stored values. This function should only be applicable to the basic types. In case of incompatibilities, an exception is thrown.
  4. There should be a function to return a pointer to the stored values. This function should only be applicable to the basic types. In case of incompatibilities, a null pointer is returned. Though the pointer is non-const, anything but reading values leaves the basic_json object in an undefined state.

Open questions:

  1. Should the function from 3. really be called get_ref, or should it be integrated into the get function. That is, should we prefer auto r = j.get<const string_t&>() over auto r = j.get_ref<string_t>()?
  2. Same question as above: Should we get pointers via a dedicated get_ptr function (auto ptr = j.get_ptr<string_t>()) or rather via auto ptr = j.get<string_t*>()?
  3. Should operator T() be extended as well?
  4. What about non-const references?

From my point of view, I would like to keep the number of functions low and rather increase the use cases in which the existing functions work. Hence, I would answer the first three questions with yes.

What do you think?

@ropuls
Copy link
Author

ropuls commented Jun 18, 2015

Good day,

my naive approach when seeing a

template <typename T>
T get();

is that you get exactly that T that you specify, or a compiler error if that T is not available. If somebody says:

auto s = j.get<const std::string>()

then IMO he shall be served with a const copy of the string, even if it most likely makes no sense.

The same applies to question 4) non-const references, i.e:

void trim(std::string &s) {
  // removes trailing and leading whitespaces
}

trim(j.get<std::string &>())

would be perfectly fine for me, as it does not take copies, but allows in-place modification / transformation etc.

For the pointers, the get_ptr is a bit more expressive, but the templated get is absolutely fine for me, too.

My vote therefore is: extend the get<> to support [const,non-const]x[pointers,references,copies].

The only thing that we'd need to be aware of is that we're not accidentally doing an implicit conversion and return a temporary (reference or pointer), which is currently the case with the cast operator.

i.e.

class mystring;
json j = std::string("test");
  1. const mystring &s = j.get<const &mystring>(); // shall not compile
  2. const mystring &s = static_cast<mystring &s>(j); // shall not compile
  3. mystring s = static_cast<mystring>(j); // may/should work if mystring is convertible from json::string_type
  4. mystring *p = j.get<mystring *>(); // shall not compile

Thus, the getters most likely need some enable_if decoration and just be enabled for the exact type used by json.

The same would be true with cast operators, of course, plus maybe a check if that implicit conversion is not allowed when returning references/pointers, but valid when casting to values.

Thanks and cheers,
Roman

@andreihu
Copy link

Hi Roman, Niels,

My thoughts (i am following Niels's bulletpoints).

  1. Agreed. This will keep backwards compatibility, and covers majority of use cases.
  2. Agreed.
  3. I am a bit ambivalent about the get_ref(). Since the majority of the use cases are covered by get(). The remaining <1% (mostly performance-minded) use cases are covered by get_ptr() in an adequate non-throwing way. I feel this would be over-engineering of the interface to add get_ref().
  4. Using the pointer returned by get_ptr() in a non-const way is not an undefined behaviour. This is completely valid and will never break the class (assuming json is a string): *(j.get_ptr<std::string>()) = "hello world".

Open questions:
Whilst integrating everything under a big umbrella (get()) seems attracting at first, i would not recommend it. Overloading/templating can be very counterintuitive (and carries the risk of introducing bugs on refactoring), if the underlying behaviour is significantly different. E.g. the reference/pointer related overloads doesn't support conversion, whilst value based overloads support these things.

overload conversions exceptions
value yes yes
reference no yes
pointer no no

I think we're missing "conceptual sinergies" here to implement them under the same name.

Regards
Andras

nlohmann added a commit that referenced this issue Jun 24, 2015
- implemented `get_ptr` function to return pointer to value member
- overworked `get` function to support pointer types
- added test cases
- added documentation (see
http://nlohmann.github.io/json/classnlohmann_1_1basic__json.html) with
examples
@nlohmann
Copy link
Owner

Hi there,

I implemented a get_ptr function which a returns a pointer to the value member (or the null pointer, if the desired pointer type does not match the value type). I also extended the get function to be call with pointer types.

I would be glad for feedback on 40312fb. You may also want to check out the documentation at http://nlohmann.github.io/json/classnlohmann_1_1basic__json_a2efda2b6d931ee72d9757418042e74e1.html#a2efda2b6d931ee72d9757418042e74e1.

Cheers
Niels

@nlohmann
Copy link
Owner

Hi @andreihu, @ropuls, @yatagarasu25,

it would be great to get feedback on 40312fb. I implemented a way to get pointers to the value members. If the interface is OK, I would like to close this issue. Or do we need additional calls for references?

All the best,
Niels

@andreihu
Copy link

Hi Niels,

Thx for the modifications, i think it's ok. I don't think that we need reference based overloads. However i suggest you to reconsider adding pointer based overloads to get<> (see my previous comment on reasons).

Regards
Andras

@nlohmann
Copy link
Owner

Hi Andras,

so you mean get<json::string_t*>() should be disallowed?

Regards
Niels

@andreihu
Copy link

Hi Niels,

Yes, the reason is that the same name yields fundamentally different behaviour (implicit conversion and error reporting) for different types and this is might be misleading. That's why i would vote for different names (get<>, get_ptr<>) to clarify the distinction.

Since get_ptr<> implicitly tells that we're getting a pointer, i would prefer to use it as:

std::string* sptr = j.get_ptr<std::string>();

instead of:

std::string* sptr = j.get_ptr<std::string*>();

But this is only a minor thing...

Regards
Andras

@nlohmann
Copy link
Owner

@ropuls, @yatagarasu25 what do you think?

@ropuls
Copy link
Author

ropuls commented Jun 29, 2015

Hi Niels,

thanks a lot for the recent additions!

Regarding get_ptr at line 1929: IMO, there shall be

PointerType get_ptr() noexcept {...}

plus

const PointerType get_ptr() const noexcept {...}

If the object itself is const, it shall not internally cast to non-const (as in line 1939), and shall not expose a non-const pointer (let the user cast the const'ness away, then he's responsible).

Besides, everything is fine for me, and I appreciate the latest changes. Also, I am happy with all accessor versions, and IMO you cannot protect an API user from himself if he's completely unwilling to read the documentation / header comments about pre- and post-conditions.

Thanks a lot, and case closed for me (except you'd want to have the const/non-const overloads for get_ptr).

Cheers,
Roman

PS: Niels: did you get my email about cpp1x.com on gmail?

@nlohmann
Copy link
Owner

Hi @ropuls, I made some changes. It's more code in the end, but it feels better without a const_cast in it.

(About the mail: I got it and answer if I find the time.)

@andreihu
Copy link

Hi Niels,

Thanks for the effort, i think this is a useful addition.

Regards
Andras

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants