Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference stability requirements? #101

Open
brevzin opened this issue Jul 15, 2023 · 5 comments
Open

Reference stability requirements? #101

brevzin opened this issue Jul 15, 2023 · 5 comments
Labels
question Further information is requested

Comments

@brevzin
Copy link
Contributor

brevzin commented Jul 15, 2023

If I have a sequence whose element type is T&, are there any rules around what the lifetime of that reference needs to be? Specifically, is this necessary valid:

auto cursor = flux::first(seq);
T& r = flux::read_at(seq, cursor);
flux::inc(seq, cursor);
use(r); // cursor has advanced, can I still use r?

One motivation for this question is how to do algorithms. Like min is currently

struct min_op {
template <sequence Seq, strict_weak_order_for<Seq> Cmp = std::ranges::less>
[[nodiscard]]
constexpr auto operator()(Seq&& seq, Cmp cmp = Cmp{}) const
-> flux::optional<value_t<Seq>>
{
return flux::fold_first(FLUX_FWD(seq), [&](auto min, auto&& elem) -> value_t<Seq> {
if (std::invoke(cmp, elem, min)) {
return value_t<Seq>(FLUX_FWD(elem));
} else {
return min;
}
});
}
};

where fold_first is

struct fold_first_op {
template <sequence Seq, typename Func, typename V = value_t<Seq>>
requires std::invocable<Func&, V, element_t<Seq>> &&
std::assignable_from<value_t<Seq>&, std::invoke_result_t<Func&, V&&, element_t<Seq>>>
[[nodiscard]]
constexpr auto operator()(Seq&& seq, Func func) const -> flux::optional<V>
{
auto cur = flux::first(seq);
if (flux::is_last(seq, cur)) {
return std::nullopt;
}
V init(flux::read_at(seq, cur));
flux::inc(seq, cur);
while (!flux::is_last(seq, cur)) {
init = std::invoke(func, std::move(init), flux::read_at(seq, cur));
flux::inc(seq, cur);
}
return flux::optional<V>(std::in_place, std::move(init));
}
};

What this means is that if I have a sequence where value=string and element=string const&, and I'm finding the minimum string, the worst-case scenario (the sequence is perfectly sorted, backwards) is that I'm doing N string copies. But in this case, if references were guaranteed/required stable, you could do a much more efficient approach via this:

template <sequence Seq, strict_weak_order_for<Seq> Cmp = std::ranges::less> 
[[nodiscard]] 
constexpr auto min_ref(Seq&& seq, Cmp cmp = Cmp{}) const 
    -> flux::optional<element_t<Seq>> // <== NB
{
    auto cursor = flux::first(seq);
    if (flux::is_last(seq, cursor)) {
        return nullopt;
    }
    
    value_t<Seq> const* best = &flux::read_at(seq, cursor);
    flux::inc(seq, cursor);
    while (not flux::is_last(seq, cursor)) {
        value_t<Seq> const* elem = &flux::read_at(seq, cursor);
        if (std::invoke(cmp, *elem, *best)) {
            best = elem;
        }
        flux::inc(seq, cursor);
    }
    return *best;
}

This does zero string copies. Basically, this is min_element instead of min. This version returns an optional reference, but an equivalent that returns an optional value would still be better since it does exactly 1 string copy.

So I guess the question is:

If a sequence's element type is an lvalue reference, does advancing the cursor invalidate that reference?

If NO, then we probably want a version of fold_first that returns optional<element> instead of optional<value>. Possibly up to even having fold_first itself just only ever return optional<element> (which is what it does in Rust 1) or possibly have the callable required to either return value or element and return accordingly.

If YES, then... I guess no further work necessary, but probably should be noted somewhere regardless.

Footnotes

  1. Rust originally called it fold_first, which I thought was a good name, so that's what I named our version of the algorithm for C++23, then they rename it to reduce. Jerks.

@tcbrindle
Copy link
Owner

I haven't documented this anywhere, but I definitely should, so here we go...

For single-pass sequences, there is no reference stability: that is, if you're holding on to a reference to an element returned by read_at(), it may be invalidated by the next call to inc() (via any cursor, but of course you should only have one when dealing with single-pass sequences).

For multipass_sequences, element references should not be invalidated, and your min_ref above should be valid.

I've previously thought there should be alternative versions of min/max returning optional<element_t> as you suggest: I don't know whether it would be better to just make these overloads with the same names, or call them something different as with ranges::min and ranges::min_element.

A related question that I haven't quite answered yet is about cursor invalidation, specifically, if I have a valid cursor cur for a copyable multipass sequence seq, then does cur need to be valid for a copy of seq? This seems like a desirable property and is true for integer indices, but I'd worry that it might limit implementations?

(I think we do need to require that given auto seq2 = std::move(seq), then cur is valid for seq2, but I think that's an easier thing to make work.)

@brevzin
Copy link
Contributor Author

brevzin commented Jul 17, 2023

(I think we do need to require that given auto seq2 = std::move(seq), then cur is valid for seq2, but I think that's an easier thing to make work.)

I don't think you can do that in general? Especially if you're providing an adapter from a C++ range to a sequence, since there's definitely no guarantee that an iterator is still valid through a move.

@tcbrindle
Copy link
Owner

The particular example I had in mind was a rotate adaptor:

template <multipass_sequence Seq>
auto rotate(Seq seq, cursor_t<Seq> mid) -> multipass_sequence auto;

This is a relatively easy adaptor to write: we iterate from mid until the end of seq, and then from the start of seq up to mid. But since we need to store (a move-initialised version of) mid in the adaptor object, we have to make sure that it's still valid for the move-initialised version of seq that the adaptor also stores.

@tcbrindle tcbrindle added the question Further information is requested label Jul 25, 2023
@tcbrindle
Copy link
Owner

Having pondered it some more, I think that having fold_first() (and min() and friends) returning optional<element_t> is too close to the kind of unsafe interface I'd rather avoid -- it would make it too easy to pass an rvalue sequence in and get a dangling optional<T&> out:

auto elem = flux::min(get_vector()); // assume this returns optional<element_t>

if (elem.has_value()) { // engaged if source vec was non-empty
    do_stuff(*elem); // uh-oh
}

Instead I think it would be better to have a more direct equivalent of std::min_element, that is, an algorithm which returns a cursor to the minimum element rather than the element itself. This is not only safer, but also potentially more useful -- I can definitely imagine situations where I'd want to know the position of the minimum/maximum element. The implementation could be something like:

template <multipass_sequence Seq, typename Cmp = std::ranges::less>
auto find_min(Seq&& seq, Cmp cmp = {}) -> cursor_t<Seq>
{
    auto min = first(seq);
    if (is_last(seq, min)) { 
        return min;
    }

    for (auto cur = next(seq, min); !is_last(seq, cur); inc(seq, cur)) {
        if (invoke(cmp, read_at(seq, cur), read_at(seq, min)) {
           min = cur;
        } 
    }

    return min;
}

We could also potentially optimise min() (and maybe fold_first()?) to avoid copies in the case where we're dealing with a multipass sequence whose element type is an lvalue reference, but I think the external interface (returning optional<value_t>) is probably best left as-is.

@brevzin
Copy link
Contributor Author

brevzin commented Aug 5, 2023

Downside of min_element, outside of just rarely being the interface you want, is that you also lose efficiency in the case where read_at actually has to do work (e.g. you're looking up the minimum of a .map(f), you're going to have to call f lots of times, and then once more to read out the value).

Certainly true that returning a cursor is safer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants