Iterators efficiency and instance tracking #376

Open
@aldanor

Description

The current iterator implementation can be very inefficient: in some cases its overhead is an order of magnitude larger than the cost of the iterator logic itself.

Iterators with internal reference

Consider a hypothetical example (this would be quite typical for stream-like / input ranges):

struct point { int x; int y; };

struct range {
    int n;
    range(int n) : n(n) {}

    struct iterator {
        int i = 0;
        point p {};
        iterator() = default;
        iterator(int n) : i(n), p{n, n} {}
        const point& operator*() const { return p; }
        iterator& operator++() { --i; p.x = i; p.y = i; return *this; }
        bool operator==(const iterator& it) const { return i == it.i; }
    };

    iterator begin() { return {n}; }
    iterator end() { return {}; }
};
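
To make the "internal reference" property concrete, here is a small standalone check in plain C++ with no pybind11 involved. It repeats the definitions above and walks the range the way a wrapper holding a begin/end pair would. The helper `distinct_addresses` is a hypothetical name introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

struct point { int x; int y; };

struct range {
    int n;
    range(int n) : n(n) {}

    struct iterator {
        int i = 0;
        point p {};
        iterator() = default;
        iterator(int n) : i(n), p{n, n} {}
        const point& operator*() const { return p; }
        iterator& operator++() { --i; p.x = i; p.y = i; return *this; }
        bool operator==(const iterator& it) const { return i == it.i; }
    };

    iterator begin() { return {n}; }
    iterator end() { return {}; }
};

// Hypothetical helper: counts how many distinct addresses operator* hands
// out while a single stored iterator is advanced from begin() to end().
std::size_t distinct_addresses(int n) {
    assert(n >= 0);  // the iterator counts down to 0
    range r(n);
    std::set<const point*> seen;
    range::iterator it = r.begin(), end = r.end();
    for (; !(it == end); ++it) {
        seen.insert(&*it);  // always the address of it.p
    }
    return seen.size();
}
```

For any non-empty range the count is 1: every dereference refers to the `point` stored inside the iterator, which is why the instance lookup described below always succeeds.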

Assuming point has a corresponding py::class_ binding, wrapping this range with py::make_iterator() under the default return value policy of reference_internal and iterating over it in Python does roughly the following at each iteration step:

  • advance C++ iterator, check if iteration is over, dereference the C++ iterator
  • do the cast to Python point instance:
    • do a runtime check in the map of registered types to get type_info (why isn't this cached per py::cpp_function?)
    • do a runtime check in the multimap of registered instances to find the instance
    • the instance will always be found
    • incref it and return

Note that this will essentially yield the same Python object on every iteration, but it will still perform two map lookups every time.
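
The per-step cost listed above can be modeled without pybind11. The sketch below is only a toy model: `py_object`, `type_record`, `registry` and `lookups_for` are hypothetical stand-ins rather than pybind11's real internals, and the two containers merely play the role of the registered-types map and the registered-instances multimap.

```cpp
#include <cassert>
#include <cstddef>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

struct point { int x; int y; };

// Hypothetical stand-ins for a Python wrapper object and a type record.
struct py_object { int refcount = 1; const void* value = nullptr; };
struct type_record { const char* name; };

struct registry {
    std::unordered_map<std::type_index, type_record> types;
    std::unordered_multimap<const void*, py_object*> instances;
    std::size_t lookups = 0;

    // Models the cast for an already-registered instance.
    py_object* cast_reference(const point& p) {
        ++lookups;  // lookup 1: registered types
        auto t = types.find(std::type_index(typeid(point)));
        if (t == types.end()) return nullptr;
        ++lookups;  // lookup 2: registered instances
        auto hit = instances.equal_range(&p);
        if (hit.first == hit.second) return nullptr;  // allocation path omitted
        py_object* obj = hit.first->second;
        ++obj->refcount;  // "incref it and return"
        return obj;
    }
};

// Runs `steps` iteration steps over one internally referenced point and
// reports how many map lookups the casts performed in total.
std::size_t lookups_for(int steps) {
    registry reg;
    reg.types.emplace(std::type_index(typeid(point)), type_record{"point"});
    point p{0, 0};
    py_object wrapper;
    wrapper.value = &p;
    reg.instances.emplace(&p, &wrapper);
    for (int i = 0; i < steps; ++i) {
        p.x = p.y = i;  // the iterator mutates the same point in place
        py_object* obj = reg.cast_reference(p);
        assert(obj == &wrapper);  // same wrapper on every step
        --obj->refcount;          // the loop body drops its reference
    }
    return reg.lookups;
}
```

In this model ten steps cost twenty lookups, although every one of them resolves to the same wrapper.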

In this example, this could be completely avoided if py::iterator_state cached the resulting py::object (whose ->value always points to the same C++ instance in this example, so it doesn't even need to be reallocated) and not just the current state of begin/end iterators. Instead of doing the cast, it could just incref the object and return the handle.
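
A minimal sketch of that caching idea, again as a pybind11-free model. `caching_state`, `countdown` and `py_object` are hypothetical names; a real implementation would hold a py::object and would have to decide what happens when the dereferenced address changes. Here the model simply falls back to a full cast in that case.

```cpp
#include <cassert>
#include <cstddef>

struct point { int x; int y; };

// Hypothetical stand-in for a Python wrapper object.
struct py_object { int refcount = 1; const void* value = nullptr; };

// Countdown iterator with an internal point, mirroring range::iterator.
struct countdown {
    int i = 0;
    point p {};
    countdown() = default;
    explicit countdown(int n) : i(n), p{n, n} {}
    const point& operator*() const { return p; }
    countdown& operator++() { --i; p.x = i; p.y = i; return *this; }
    bool operator==(const countdown& o) const { return i == o.i; }
};

// Iterator state that also remembers the wrapper it handed out last.
struct caching_state {
    countdown it, end;
    py_object cached;
    bool has_cached = false;
    bool started = false;
    std::size_t full_casts = 0;

    explicit caching_state(int n) : it(n), end() {}

    // Returns the next wrapper, or nullptr once the iteration is over.
    py_object* next() {
        if (started) ++it; else started = true;
        if (it == end) return nullptr;
        const point& p = *it;
        if (!has_cached || cached.value != &p) {
            ++full_casts;  // the two map lookups would happen here
            cached.value = &p;
            has_cached = true;
        }
        ++cached.refcount;  // cheap path: incref and return
        return &cached;
    }
};

// Number of full casts needed to iterate over n elements.
std::size_t full_casts_for(int n) {
    assert(n >= 0);
    caching_state s(n);
    while (py_object* obj = s.next()) {
        --obj->refcount;  // the consumer drops its reference
    }
    return s.full_casts;
}
```

With the address check in place, only the first step pays for a full cast, and every later step reduces to a pointer comparison plus an incref.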

No internal reference or copy r/v policy

If the C++ iterator returns an object with a different address on each dereference, or if we specify the copy return value policy, things get even worse.

The sequence of steps is now:

  • advance C++ iterator, check if iteration is over, dereference the C++ iterator
  • do the cast to Python point instance:
    • do a runtime check in the map of registered types to get type_info
    • do a runtime check in the multimap of registered instances to find the instance
    • the instance will never be found
    • allocate new Python instance
    • (if copy r/v policy is specified) call C++ object's copy constructor
    • bind the Python instance to the C++ object
    • record the new instance in the registered instances multimap
    • (at some point in future) remove instance from registered multimap
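
The steps above can be sketched as a toy model of the tracked copy path. `tracker`, `py_instance` and `map_ops_for` are hypothetical names and the multimap only stands in for the registered-instances container; the type lookup is left out so that the instance bookkeeping is easier to see.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

struct point { int x; int y; };

// Hypothetical stand-in for a Python instance that owns a copied point.
struct py_instance { point* value; };

struct tracker {
    std::unordered_multimap<const void*, py_instance*> instances;
    std::size_t map_ops = 0;

    // Models a cast with the copy policy for a value not seen before.
    py_instance* cast_copy(const point& src) {
        ++map_ops;  // lookup by source address; never hits in this sketch
        bool found = instances.find(&src) != instances.end();
        assert(!found);
        py_instance* inst = new py_instance{new point(src)};  // allocate + copy
        ++map_ops;  // record the new instance
        instances.emplace(inst->value, inst);
        return inst;
    }

    // Models deallocation once the Python object is no longer referenced.
    void dealloc(py_instance* inst) {
        ++map_ops;  // remove the instance from the multimap
        const void* key = inst->value;
        instances.erase(key);
        delete inst->value;
        delete inst;
    }
};

// Total number of multimap operations for `steps` yielded values that the
// consumer discards right away.
std::size_t map_ops_for(int steps) {
    assert(steps >= 0);
    tracker reg;
    for (int i = 0; i < steps; ++i) {
        point src{i, i};  // value produced by dereferencing
        py_instance* inst = reg.cast_copy(src);
        reg.dealloc(inst);  // the loop body kept no reference
    }
    assert(reg.instances.empty());
    return reg.map_ops;
}
```

In this model each yielded value costs three multimap operations (lookup, insert, erase) on top of the allocation and the copy.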

If the downstream Python code doesn't care about yielded values outside of one iteration cycle and doesn't pass them as arguments to other functions, which is quite often the case, i.e. if it's something like this:

sum(p.x * p.x + p.y * p.y for p in points_iterable)

or this:

for p in points_iterable:
    # do some computation, don't save p or pass it as argument anywhere

then registering and unregistering instances adds overhead that is completely unnecessary. Moreover, between garbage collection cycles the registered instances multimap will grow quite fast, which will slow things down even further. Here, the sequence of steps might as well just be:

  • advance and dereference C++ iterator
  • allocate new Python instance
  • bind the C++ object to it

Would it make sense to have a return value policy copy_untracked (cast doesn't do a registered instance lookup; dealloc calls the dtor and doesn't try to unregister)? Or reference_untracked (cast doesn't do a registered instance lookup; dealloc doesn't call the dtor and doesn't try to unregister), or something like that? The only catch is that some sort of flag must be stored in the instance itself, so that the deallocator knows not to try to erase it from the registered instances multimap when the time comes.
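
To illustrate what such a flag could look like, here is a hedged sketch in the same toy-model style. Nothing in it is existing pybind11 API: the `policy` enum, the `owned` and `tracked` fields and the `tracker` type are all hypothetical, and `copy_untracked` / `reference_untracked` are just the names proposed above.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

struct point { int x; int y; };

// Hypothetical policies: one tracked baseline plus the two proposed ones.
enum class policy { copy, copy_untracked, reference_untracked };

// Hypothetical instance layout carrying the per-instance flags.
struct py_instance {
    point* value;
    bool owned;    // dealloc must destroy *value
    bool tracked;  // instance is present in the multimap
};

struct tracker {
    std::unordered_multimap<const void*, py_instance*> instances;
    std::size_t map_ops = 0;

    py_instance* cast(const point& src, policy pol) {
        bool tracked = (pol == policy::copy);
        bool owned = (pol != policy::reference_untracked);
        if (tracked) {
            ++map_ops;  // lookup; result unused, it never hits in this sketch
            (void)instances.find(&src);
        }
        point* value = owned ? new point(src) : const_cast<point*>(&src);
        py_instance* inst = new py_instance{value, owned, tracked};
        if (tracked) {
            ++map_ops;  // register
            instances.emplace(value, inst);
        }
        return inst;
    }

    void dealloc(py_instance* inst) {
        if (inst->tracked) {  // the flag tells dealloc whether to unregister
            ++map_ops;
            const void* key = inst->value;
            instances.erase(key);
        }
        if (inst->owned) delete inst->value;
        delete inst;
    }
};

// Multimap operations needed for `steps` values that are discarded at once.
std::size_t map_ops_for(policy pol, int steps) {
    assert(steps >= 0);
    tracker reg;
    for (int i = 0; i < steps; ++i) {
        point src{i, i};
        py_instance* inst = reg.cast(src, pol);
        reg.dealloc(inst);  // src is still alive here, so a reference is safe
    }
    assert(reg.instances.empty());
    return reg.map_ops;
}
```

In this model the untracked policies never touch the multimap. The price is that two casts of the same C++ object would produce two unrelated Python objects, and an untracked reference is only valid while the referenced C++ object is alive.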
