Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected behavior in generate with return by reference #807

Closed
vbeffara opened this issue Mar 10, 2018 · 5 comments · Fixed by #911
Closed

Unexpected behavior in generate with return by reference #807

vbeffara opened this issue Mar 10, 2018 · 5 comments · Fixed by #911
Labels

Comments

@vbeffara
Copy link

The following code gives a weird output:

#include <iostream>
#include <range/v3/all.hpp>
#include <vector>

const std::vector<int> p4{0, 1, 2, 3};

auto perms() {
    return ranges::view::generate([t = p4]() -> const std::vector<int> & {
        std::cout << "Returning vector of size " << t.size() << '\n';
        return t;
    });
}

int main(int /*argc*/, char ** /*argv*/) {
    auto ps = perms();
    for (const auto & p : ps | ranges::view::take(2)) std::cout << " -> Obtained vector of size " << p.size() << '\n';
    std::cout << "==========\n";
    for (const auto & p : perms() | ranges::view::take(2)) std::cout << " -> Obtained vector of size " << p.size() << '\n';
}

On both clang (OSX, trunk) and gcc (linux, 7.4) the output I obtain is this:

Returning vector of size 4
 -> Obtained vector of size 4
Returning vector of size 4
 -> Obtained vector of size 4
Returning vector of size 4
==========
Returning vector of size 4
 -> Obtained vector of size 0
Returning vector of size 4
 -> Obtained vector of size 4
Returning vector of size 4

Notice the 0 as the size of the first returned vector for the second version. I would not expect a difference between the two options here, is there something I am missing?

Related question, I would have expected the lambda passed to generate to be called 2 rather than 3 times with that code, are 3 calls the expected behavior?

@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 10, 2018

Notice the 0 as the size of the first returned vector for the second version. I would not expect a difference between the two options here, is there something I am missing?

This looks like a memory bug to me. I get the same output that you get on debug mode, but if I run it with O3 I get:

Returning vector of size 4
 -> Obtained vector of size 4
Returning vector of size 4
 -> Obtained vector of size 4
Returning vector of size 4
==========
Returning vector of size 4
 -> Obtained vector of size 18446708891870198896
Returning vector of size 4
 -> Obtained vector of size 4
Returning vector of size 4

Notice that on my machine the code produces 0 with -O0 and 18446708891870198896 with -O3. This smells like undefined behavior somewhere.

Related question, I would have expected the lambda passed to generate to be called 2 rather than 3 times with that code, are 3 calls the expected behavior?

The lambda is called once on the construction of the view, and the result is stored within the view. When an iterator is incremented, a new result is stored in the view. When you write:

int main(int /*argc*/, char ** /*argv*/) {
    auto ps = perms();
    std::cout << " -> Obtained vector of size " << ranges::begin(ps)->size() << '\n';
    std::cout << " -> Obtained vector of size " << ranges::begin(ps)->size() << '\n';
}

which prints:

Returning vector of size 4
 -> Obtained vector of size 4
 -> Obtained vector of size 4

dereferencing the iterators does not need to check whether there is a value in the view already, and if there isn't, create one, because the view always contains one value. Also that example creates two iterators into the view, and both return the same value (the one that is cached inside the view). If one increments one of the iterators, that advances the range, so that dereferencing the other would also return the new value.

When you do a ps | view::take(2) you are basically iterating over the range [0, 2), with the begin iterator pointing to zero, and the end iterator pointing to 2. The range is empty when begin == end, so begin must be incremented 2 times to reach the end. A new value is generated every time the iterator is incremented independently of whether the result is read or not, because incrementing an iterator advances the range. So because one needs to iterate to one past the end to be able to do the begin == end comparison, the lambda is called a total of three times.

So I would say this is expected. Whether it's the right call, I don't know. This is an InputRange, so every time you dereference it the range is allowed to return a different value. A way to get the lambda to only be called twice in your case would be to store the cached value in the iterators and to generate a next value on dereference only.

That would mean, however, that while the code I posted above would call the lambda two times, dereferencing the two independent iterators to the front of the range would return two different values. And that code like this:

int main(int /*argc*/, char ** /*argv*/) {
    auto ps = perms();
    auto it = ranges::begin(ps);
    std::cout << " -> Obtained vector of size " << it>size() << '\n';
    std::cout << " -> Obtained vector of size " << it->size() << '\n';
}

would also call the lambda only twice, but dereferencing the same iterator twice would return a different value each time...

Whether dereferencing an input iterator is allowed to advance the range or not, I don't know. Dereferencing the same iterator is allowed to return two different values, but that does not mean that the dereference operation is allowed to advance the range.

It might be possible to have your cake and eat it too but every approach I can think of turns overly complicated quickly.

@vbeffara
Copy link
Author

OK, thanks a lot for the reply, I believe I see the point. I would have expected take(2) to advance only once but indeed the way it is now is closer to what take_while would do, and it makes sense. It does mean that generate_n(f,2) and generate(f) | take(2) have different semantics (the first one calls f twice, the second one 3 times), which is maybe surprising and could lead to issues if f has side effects, but I can live with that.

In any case, the undefined behavior remains.

@CaseyCarter
Copy link
Collaborator

CaseyCarter commented Apr 20, 2018

Closing as a duplicate of #819, despite this issue being older, because #819 has a clearer title.

EDIT: Nope - I was a bit hasty here.

@vbeffara
Copy link
Author

Wait, the initial bug I reported was about the undefined behavior, not about the number of times the function was called (that was incidental). As far as I am aware the main issue remains, and is not a duplicate of #819 - though it might well be fixed at the same time

@CaseyCarter CaseyCarter reopened this Apr 20, 2018
@ericniebler
Copy link
Owner

This is a tricksy bug. The problem happens when the generator function returns an internal reference. generate_view calls the generator on construction, which stores a reference in the view's cache. Then the view gets moved into the take view, leaving the reference (which is still pointing into the moved-from generator) intact. That stale reference then gets returned as the first element of the new generate. Stack use after free.

The fix isn't hard: don't call the generate function eagerly on construction. Defer until begin is called. Then, don't propagate cached values on moves and copies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants