-
Notifications
You must be signed in to change notification settings - Fork 13
Reactive collections
I got the idea to create this API after reading Eric Lippert's article on iterators, and Jon Skeet's implementation of LINQ on these types, so I thought I could create my own, in part as a form of practice, but also to generalize certain methods and concepts in WinApi.
I am aware of Reactive Extensions, but I have no experience with it and I have taken no inspiration from it.
To obtain elements from a collection, there are two "objects" (may not be actual .NET objects) that communicate the elements over - the producer and the consumer.
In the "pull" approach (standard IEnumerable), the producer is the IEnumerator which can be queried for more elements, and tell if there are any more elements. Another example of such a producer is the ArgIterator value type. The consumer is the actual piece of code that queries the enumerator for more elements. The producer is passive, because it produces no elements on its own, they must be actively "pulled" from it. In unmanaged code, this model is used mainly in COM interop.
The "push" model is the opposite. Here, the producing object or code creates the elements sequentially on its own, "pushes" them to the consumer object (delegate), and optionally, the consumer object may return whether no more elements are expected, effectively terminating the iteration. This type of iterator is not represented much in .NET for reasons I will explain later, but the IObservable interface is a generalisation of this concept. The ForEach method is a partial realisation of this technique, but lacks many features. In WinApi, functions like EnumWindows are an example too.
Both models are equally valid and usable, but each shares its own set of issues. Usually, the "pull" model is easier to use for the consumer, because it only needs to call a method to get the next element, but it is not as simple for the producer to create a new class, define its state, and return specific elements based on it. Before the introduction of iterator blocks in C#, it was not trivial to implement enumerable objects, but with foreach
on one side and yield
on the other, it is fairly easy to use this model for both sides.
The "push" model is really simple for the producer, needs no syntactic sugar or state machines, and only requires a single function to consume the elements. However, the consumer almost always needs to pass a state together with the function, and if the framework or the producer doesn't support this, ugly tricks must be performed to iterate the collection (fortunately, .NET delegates contain both a function and its state, and lambdas are an easy way to create them). The downsides of this model that cannot be so easily overcome are the general inability to control the flow of elements, pause it or resume it and store it for later (although this is beyond the scope of simple foreach
).
Reactive collection in this library are represented with the IIterable interface with a single method, Iterate. The method takes an IIterator interface and calls its appropriate methods during the iteration. OnNext is called when a new element is produced, and OnCompleted is called when the collection is at its end (analogous to yield return
and yield break
). The OnNext method returns a bool signaling whether it should be given more elements, or the iteration should be stopped. That's all.
The EnumIterable class can be used as an example of the usage:
public class EnumIterable<T> : IIterable<T>
{
readonly IEnumerable<T> collection;
public EnumIterable(IEnumerable<T> collection)
{
this.collection = collection;
}
public void Iterate(IIterator<T> iterator)
{
foreach(T value in collection)
{
if(!iterator.OnNext(value)) break;
}
iterator.OnCompleted();
}
}
This class turns a standard collection in the "pull" model to the "push" model. It pushes all elements in the collection to the iterator, and marks the end with OnCompleted. Notice that generally, the iterable collection shoudn't be modified by calling Iterate, which should keep its state to itself (like it's with IEnumerator).
The opposite conversion, i.e. "push" to "pull" is non-trivial, because it needs interrupting the Iterate method to execute code outside of it. That is also a big reason for using the "pull" model by default, because the opposite conversion is much harder. The solution in this project, IterEnumerable, is using fibers (either implemented by threads, or supported by WinApi).
Now that we have the necessary basis for iterable collections (from now on, "iterable" will refer to "push"-based collections, while "enumerable" will refer to "pull"-based collection), there is still no way to modify, filter, or otherly process the elements in a generic way. Such a class must be both IIterable, because we need to iterate it, of course, but also IIterator in a way, because it must be used by its underlying iterable collection.
I have split this class into two concepts. The first is IIteratorLink which is something that can be provided with elements, but can be also passively observed by other iterators. Subscribe adds a "listener" to the object which will be given all elements that are pushed to the iterator. However, we also need a way to actually combine the link with an iterable to produce a new iterable collection.
private class LinkIterable<TSource, TResult> : IIterable<TResult>
{
readonly IIterable<TSource> iterable;
readonly IIteratorLink<TSource, TResult> link;
public LinkIterable(IIterable<TSource> iterable, IIteratorLink<TSource, TResult> link)
{
this.iterable = iterable;
this.link = link;
}
public void Iterate(IIterator<TResult> iterator)
{
using(var handle = link.Subscribe(iterator))
{
iterable.Iterate(link);
}
}
}
The consumer's iterator is passed to the link for the duration of the iteration, and the link is responsible with passing the elements to the provided iterator.
Now I had two options, either provide the minimum implementation of IIteratorLink and modify the code of LinkIterable for different LINQ functions, or make a powerful system of links that can be used to create any LINQ function. I chose the latter. BasicIteratorLink contains all the necessary implementation and methods to allow deriving new LINQ iterators in a simple way:
public abstract class BasicIteratorLink<TSource, TIntermediate, TResult> : IIteratorLink<TSource, TResult>
{ /* lots of code */ }
private class SelectIterator<TSource, TIntermediate, TResult> : BasicIteratorLink<TSource, TIntermediate, TResult>
{
readonly Func<TIntermediate, TResult> selector;
public SelectIterator(IIteratorLink<TSource, TIntermediate> source, Func<TIntermediate, TResult> selector) : base(source)
{
this.selector = selector;
}
protected override bool OnNextInternal(TIntermediate value)
{
return OnNextFinal(selector(value));
}
}
Notice there are three generic parameters now. TSource is what the source (or producer) iterator link expects to obtain from an iterable collection, TIntermediate is the type of elements the iterator link should produce (could be the same as TSource), and TResult is the final type of the transformed elements (produced by this iterator link). This allows for easy chaining of iterator links without having to produce a new iterable collection each time. And because an iterator link can be attached with any observer iterator, getting side values is easy too.
The methods provided by BasicIteratorLink are OnNext (provided wtih an element from the source collection), OnNextInternal (receiving the element processed by the inner link), and OnNextFinal, whose task is to push the element to all subscribed iterators. The respective Completed versions of these methods are also provided. Usually, it is necessary to override only the Internal one.