Iterators efficiency and instance tracking

The current iterator implementation can be very inefficient: in some cases the casting overhead is an order of magnitude larger than the cost of the iterator logic itself.

#### Iterators with internal reference

Consider a hypothetical example (this would be quite typical for stream-like / input ranges):

``` cpp
struct point { int x; int y; };

struct range {
    int n;
    range(int n) : n(n) {}

    struct iterator {
        int i = 0;
        point p {};
        iterator() = default;
        iterator(int n) : i(n), p{n, n} {}
        const point& operator*() const { return p; }
        iterator& operator++() { --i; p.x = i; p.y = i; return *this; }
        bool operator==(const iterator& it) const { return i == it.i; }
    };

    iterator begin() { return {n}; }
    iterator end() { return {}; }
};
```

Assuming `point` has a corresponding `py::class_` binding, wrapping this range in `py::make_iterator()` with the default return value policy of `reference_internal` and iterating over it in Python will do roughly the following on each iteration step:
- advance C++ iterator, check if iteration is over, dereference the C++ iterator
- do the cast to Python `point` instance:
  - do a runtime check in the map of registered types to get `type_info` (why isn't this cached per each `py::cpp_function`?)
  - do a runtime check in the multimap of registered instances to find the instance
  - the instance will **always** be found
  - incref it and return

Note that this will essentially yield the same Python object on every iteration, but it will still perform two map lookups every time.

In this example, this could be completely avoided if `py::iterator_state` cached the resulting `py::object` (whose `->value` always points to the same C++ instance in this example, so it doesn't even need to be reallocated) and not just the current state of begin/end iterators. Instead of doing the cast, it could just incref the object and return the handle.
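
The caching idea can be sketched with a toy model in plain C++. Everything here is a stand-in and not pybind11's actual internals: `py_object`, `registry` and `caching_state` are hypothetical names, a `std::shared_ptr` copy plays the role of an incref, and only the instance lookup is modeled (the `type_info` lookup is left out).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>

struct point { int x; int y; };

// Hypothetical stand-in for a Python wrapper object.
struct py_object {
    const point* value;  // non-owning pointer to the wrapped C++ object
};

// Hypothetical stand-in for the registered instances multimap.
struct registry {
    std::unordered_multimap<const void*, std::shared_ptr<py_object>> instances;
    std::size_t lookups = 0;  // counts lookups so the two paths can be compared

    // Models the generic cast: look up the address, create a wrapper on a miss.
    std::shared_ptr<py_object> cast(const point& p) {
        ++lookups;
        auto it = instances.find(&p);
        if (it != instances.end()) return it->second;
        auto obj = std::make_shared<py_object>(py_object{&p});
        instances.emplace(&p, obj);
        return obj;
    }
};

// Hypothetical iterator state that remembers the last wrapper it returned.
struct caching_state {
    std::shared_ptr<py_object> cached;

    std::shared_ptr<py_object> next(registry& reg, const point& p) {
        // Same address as on the previous step: reuse the wrapper, skip the lookup.
        if (cached && cached->value == &p) return cached;
        cached = reg.cast(p);
        return cached;
    }
};

// Runs `steps` iterations over one internal point, as in the range example.
inline std::size_t lookups_for(std::size_t steps, bool use_cache) {
    registry reg;
    caching_state state;
    point p{0, 0};
    for (std::size_t i = 0; i < steps; ++i) {
        p.x = p.y = static_cast<int>(i);  // the iterator mutates its internal point
        auto obj = use_cache ? state.next(reg, p) : reg.cast(p);
        assert(obj->value == &p);  // both paths hand back a wrapper for the same object
    }
    return reg.lookups;
}
```

In this model, `lookups_for(1000, false)` performs one registry lookup per step, while `lookups_for(1000, true)` performs a single lookup on the first step and reuses the cached wrapper afterwards.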

#### No internal reference or copy r/v policy

If the C++ iterator returns an object with a different address on each dereference, or if we specify the copy return value policy, things get even worse.

The sequence of steps is now:
- advance C++ iterator, check if iteration is over, dereference the C++ iterator
- do the cast to Python `point` instance:
  - do a runtime check in the map of registered types to get `type_info`
  - do a runtime check in the multimap of registered instances to find the instance
  - the instance will **never** be found
  - allocate new Python instance
  - (if copy r/v policy is specified) call C++ object's copy constructor
  - bind the Python instance to the C++ object
  - record the new instance in the registered instances multimap
  - (at some point in future) remove instance from registered multimap

If the downstream Python code doesn't use the yielded values outside of a single iteration step and doesn't pass them as arguments to other functions (which is quite often the case), for example:

``` python
sum(p.x * p.x + p.y * p.y for p in points_iterable)
```

or this:

``` python
for p in points_iterable:
    # do some computation, don't save p or pass it as argument anywhere
```

then registering and unregistering instances adds overhead that is entirely unnecessary. In addition, between garbage collection cycles the registered instances multimap will grow quickly, which slows things down even further. In this case, the sequence of steps might as well be:
- advance and dereference the C++ iterator
- allocate new Python instance
- bind the C++ object to it

Would it make sense to have a return value policy `copy_untracked` (the cast doesn't do a registered instance lookup; dealloc calls the dtor and doesn't try to unregister)? Or `reference_untracked` (the cast doesn't do a registered instance lookup; dealloc doesn't call the dtor and doesn't try to unregister), or something like that? The only catch is that some sort of flag must be stored in the instance itself, so that the deallocator knows not to try to erase it from the registered instances multimap when the time comes.
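
To make the proposal concrete, here is a minimal sketch of the per-instance flag, again as a toy model in plain C++ and not pybind11's real instance layout. The `instance` struct, its `owned` and `tracked` fields, the `policy` enum and both functions are assumptions made for illustration. For brevity the tracked `copy` path omits the initial registry lookup, which would always miss here anyway.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>

struct point { int x; int y; };

// Hypothetical wrapper layout with the flag the proposal calls for.
struct instance {
    point* value = nullptr;
    bool owned = false;    // dealloc must destroy `value`
    bool tracked = false;  // instance was recorded in the registry
};

using instance_registry = std::unordered_multimap<const void*, instance*>;

// Hypothetical policies: `copy` is tracked, the other two are the proposed ones.
enum class policy { copy, copy_untracked, reference_untracked };

inline instance* make_instance(instance_registry& reg, point& src, policy pol) {
    auto* inst = new instance;
    switch (pol) {
    case policy::copy:  // copy the object and register the wrapper
        inst->value = new point(src);
        inst->owned = true;
        inst->tracked = true;
        reg.emplace(inst->value, inst);
        break;
    case policy::copy_untracked:  // copy the object, never touch the registry
        inst->value = new point(src);
        inst->owned = true;
        break;
    case policy::reference_untracked:  // borrow the object, never touch the registry
        inst->value = &src;
        break;
    }
    return inst;
}

inline void dealloc(instance_registry& reg, instance* inst) {
    if (inst->tracked) {  // untracked instances skip the registry entirely
        auto range = reg.equal_range(inst->value);
        for (auto it = range.first; it != range.second; ++it) {
            if (it->second == inst) { reg.erase(it); break; }
        }
    }
    if (inst->owned) delete inst->value;  // only copies own their C++ object
    delete inst;
}

// Creates and immediately drops one wrapper per step; returns the largest
// registry size that was observed along the way.
inline std::size_t peak_registry_size(policy pol, int steps) {
    instance_registry reg;
    point src{1, 2};
    std::size_t peak = 0;
    for (int i = 0; i < steps; ++i) {
        instance* inst = make_instance(reg, src, pol);
        assert(inst->value->x == 1 && inst->value->y == 2);
        peak = std::max(peak, reg.size());
        dealloc(reg, inst);
    }
    assert(reg.empty());  // nothing may be left behind under any policy
    return peak;
}
```

With the tracked `copy` policy every step inserts into and erases from the registry, whereas the two untracked policies never touch it. The cost is one extra flag per instance, which is the catch mentioned above.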

