summation functions should be lazy, more general than openArray
#288
Comments
Maybe in a separate module?
In general I agree, but we need to watch out -- lazy APIs can become super annoying to use. ("Yes this really only requires an
I disagree, from long experience with D. D uses duck typing, not
I don't see the problem:

```nim
proc sum(a: iterable): ElementType(a) =
  # I'm debugging, please tell me its length
  when overloadExists(a.len): debugEcho a.len
  # I'm debugging, please tell me all of its contents
  when overloadExists(a.copy): debugEcho a.copy
    # aka in D: a forward range, which has a `save` property, to avoid messing with original
    # (eg: not available for things like stdin)
  # do the work with what `a` offers
  # conditionally use `len` if algorithm can exploit this (eg preallocate some seq)
  for ai in a: result += ai

echo sum(@[1,2]) # ok
echo sum(stdin.lazyMapIt(it.float)) # works too with stdin stream, requires O(1) memory instead of O(n)
```

How this works:
```nim
proc sum(a: auto): ElementType(a) {.enableif: iterable(a).}
# => with sugar:
proc sum(a: iterable): ElementType(a)
proc sum1(a: iterable[string]): string = ...
proc sum2(a: iterable): ElementType(a) = ...
proc foo(a: iterable, b: iterable[ElementType(a)]): seq[ElementType(ElementType(a))]
# 1-off arbitrary custom conditions:
proc foo(a, b: auto): foo(a,b) {.enableif: a.type.size == b.type.size .} = ...
```

Note also that, unlike concepts, enableif doesn't subvert the type and is a transparent abstraction once the enableif condition is evaluated:

```nim
# with concepts:
type Foo = concept x: x.len
proc main1(x: Foo): string = $x.type
echo main1(@[1,2]) # Foo instead of seq[int]
# with enableif + not-yet-implemented concept-like syntax sugar:
template Foo(x): bool = overloadExists(x.len)
proc main2(x: Foo): string = $x.type
echo main2(@[1,2]) # seq[int]
```
It allows specializing algorithms depending on what the input offers, e.g.:
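The example that followed here is not preserved. As a hedged sketch of the same idea in current Nim, the template below uses the real `compiles` builtin as a stand-in for the hypothetical `overloadExists`, and reserves capacity only when the input exposes `len` (the "preallocate some seq" case from the code comment above). `collectFloats` is a made-up name.

```nim
# Sketch: specialize on what the input offers. `compiles` replaces the
# hypothetical `overloadExists`; `collectFloats` is a made-up name.
template collectFloats(a: untyped): seq[float] =
  block:
    var res: seq[float]
    when compiles(len(a)):
      # Sized input (seq, array, slice): reserve capacity up front.
      # `a` is expanded twice on this path, so avoid side-effecting inputs.
      res = newSeqOfCap[float](len(a))
    for x in a:
      res.add float(x)
    res

# A container that has `len` ...
doAssert collectFloats(@[1, 2, 3]) == @[1.0, 2.0, 3.0]
# ... and a bare iterator call, which has no `len`.
doAssert collectFloats(countup(1, 3)) == @[1.0, 2.0, 3.0]
```

The same call site works for both kinds of input; only the capacity hint differs.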
Stop lecturing me on these things please, I know how it works in D-land quite well. I also know how it works in C#-land and I've no desire to bring even more "duck typing" into Nim.
While it may be a more general API, "online by default" may not be all you want it to be in non-API dimensions like speed, accuracy, and memory. I haven't checked lately, but for many years the GNU Scientific Library foisted this 2+ orders of magnitude slower online mean implementation in 80-bit

Everything we add generates "why is it not best in class?" kinds of questions. Solving a problem badly or making it the default can be worse than not solving it at all. Not always, I know, perfect enemy of the good and all that. But this is Nim. The stdlib does not need to solve all problems. We already have
I think we have iterators already and we just need to make them slightly more flexible. Agree with @c-blake, sum of
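As a rough illustration of what existing iterators already allow, here is a one-pass sum over a stream that never builds a seq, in the spirit of the stdin example earlier in the thread. It assumes one number per line; a `StringStream` stands in for stdin so the sketch is self-contained, and `sumLines` is a hypothetical name.

```nim
import std/[streams, strutils]

proc sumLines(s: Stream): float =
  ## Sums one float per line; only the current line is held in memory.
  for line in s.lines:
    result += parseFloat(line.strip)

let input = newStringStream("1.5\n2.5\n3")
doAssert sumLines(input) == 7.0
```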
I don't follow your logic.
So you never need

```nim
proc sumNaive1[T](a: iterable[T]): T =
  for ai in a: result += ai
proc sumNaive2[T](a: openArray[T]): T =
  for ai in a: result += ai
```

Likewise with the more accurate sum variants in std/sums:

```nim
proc sumShewchuck*[T: SomeFloat](a: iterable[T]): T =
  var state = initShewchuckState[T]()
  for ai in a: add(state, ai)
  peekFinal(state)
```

And likewise with many more algorithms.
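To make the state-based shape concrete, below is a hedged sketch of a Shewchuk-style accumulator using the "partials" scheme (the idea behind Python's `math.fsum`). `ShewchukState`, `initShewchukState`, `add` and `peekFinal` are hypothetical names modeled on the snippet above, not the std/sums API. The final step here simply adds up the partials, so unlike a full implementation it does not guarantee a correctly rounded result.

```nim
# Hypothetical online Shewchuk-style summation state (not std/sums' API).
type ShewchukState[T: SomeFloat] = object
  partials: seq[T]  # non-overlapping partial sums, increasing magnitude

proc initShewchukState[T: SomeFloat](): ShewchukState[T] =
  ShewchukState[T](partials: @[])

proc add[T: SomeFloat](s: var ShewchukState[T], value: T) =
  ## Folds `value` into the partials using error-free two-sum steps.
  var x = value
  var i = 0
  for j in 0 ..< s.partials.len:
    var y = s.partials[j]
    if abs(x) < abs(y): swap(x, y)
    let hi = x + y
    let lo = y - (hi - x)  # exact rounding error of hi, since |x| >= |y|
    if lo != T(0):
      s.partials[i] = lo
      inc i
    x = hi
  s.partials.setLen(i)
  s.partials.add x

proc peekFinal[T: SomeFloat](s: ShewchukState[T]): T =
  ## Simplified final step: plain sum of the partials.
  for p in s.partials:
    result += p

var state = initShewchukState[float]()
for x in [1.0, 1e100, 1.0, -1e100]:
  state.add x
doAssert state.peekFinal == 2.0  # a naive left-to-right sum gives 0.0
```

Memory grows with the number of non-overlapping partials rather than with the number of inputs.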
nim-lang/Nim#11992 solves this

As written now,

the way sums.sumsPairwise is implemented via recursion is inefficient, and it's easy to write it as an online algorithm that requires

And it will give exactly the same result as the current
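The online rewrite suggested here is not spelled out in this copy of the thread. A hedged sketch of one stack-based way to do it with O(log n) state: partial sums covering equal power-of-two block sizes are merged like carries in a binary counter. `PairwiseState`, `add` and `peekFinal` are hypothetical names. For power-of-two input lengths this performs the same additions as a balanced recursive split, but it is not claimed to reproduce the current std/sums results bit for bit.

```nim
type PairwiseState[T] = object
  # Hypothetical online pairwise accumulator. Each entry of `parts` is a
  # partial sum and the number of inputs it covers; counts are powers of
  # two and strictly decrease from the bottom of the stack to the top.
  parts: seq[tuple[sum: T, count: int]]

proc add[T](s: var PairwiseState[T], x: T) =
  s.parts.add((sum: x, count: 1))
  # Merge equal-sized neighbours, like carries in a binary counter.
  while s.parts.len >= 2 and s.parts[^1].count == s.parts[^2].count:
    let right = s.parts.pop()
    s.parts[^1].sum = s.parts[^1].sum + right.sum
    s.parts[^1].count *= 2

proc peekFinal[T](s: PairwiseState[T]): T =
  # Combine the remaining blocks from the smallest (top) to the largest.
  for i in countdown(s.parts.high, 0):
    result = s.parts[i].sum + result

var st: PairwiseState[float]
for i in 1..10:
  st.add float(i)
doAssert st.parts.len == 2  # 10 = 8 + 2, so two blocks remain
doAssert st.peekFinal == 55.0
```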
I disagree; so far every algorithm in std/sums can indeed be written in online fashion, with less memory usage, no worse and likely better performance for large

Maybe you can come up with some summation algorithm that can't be written in online fashion (I'm curious which one?), then obviously you'll need
Yes, I held back linking to the Wikipedia article in the doc comment giving the stack-based algo, which sounds like what @timotheecour (TC) is also thinking of, and I already said it would be much faster. The wiki pseudocode for the stack way does not unpack the last 7-ish recursion levels like the 128 in the current code, which is also important for performance but reduces accuracy.

TC's evaluation of good ideas here seems to be based on what he, personally, has seen "so far" being "onlinable" and maybe ulterior motivations. My point here was that the "idea" here of robust summing is not, in fact, "intrinsically online". That follows from the uncontentious non-associativity of FP arithmetic alone. APIs that track ideas work best.

Another way in which robust summing is not online is that the way to achieve the best error bound/parameters of said algo depends upon how many intermediate partial sums are expected a priori. Unless the sum in question is truly exact, it should really provide an error bound to the caller. Such bounds also probably need-ish the number of floats. No error bound for the caller is a far bigger API problem than online-ness. O(log(n)) is hopelessly imprecise if the set is so ill-conditioned that the inexact answer is still garbage. But there are

Per TC's curiosity, there is a large space of algos, some of which require pre-sorting arrays, like Kahan’s cascaded-compensated summation (ref in that last chapter link). I think we can agree sorting would want the

That last link also argues for

So, look: robust summation is a very specialized area that is A) definitely not intrinsically online { order, error bounds/scale dependence of method, parallelism }, B) not currently used by anything in the stdlib, C) or even by any package in the nimbleverse that I could tell from a

Personally, I'd say this all argues for lifting
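A small illustration of why summation order matters: IEEE 754 addition is commutative but not associative, so the same values summed left to right in two different arrival orders can give different answers, and a one-pass online API is committed to whatever order the data arrives in. `naiveSum` is a throwaway helper for this sketch.

```nim
proc naiveSum(xs: openArray[float]): float =
  ## Left-to-right accumulation, i.e. what a naive online sum does.
  for x in xs:
    result += x

# Same three values, two arrival orders.
doAssert naiveSum([1e100, -1e100, 1.0]) == 1.0
doAssert naiveSum([1e100, 1.0, -1e100]) == 0.0  # the 1.0 is absorbed
# A single addition is still commutative.
doAssert 1e100 + 1.0 == 1.0 + 1e100
```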
Spoon-feeding a sum algorithm with numbers because then you might be able to avoid a temporary seq allocation is not even "premature optimization", it can seriously harm correctness.

There are contexts (

To me, though, this only argues for additional online APIs, not eliminating the (according to TC) "entirely redundant"

Total elimination seems like frivolous backward incompatibility (which, as mentioned, I looked into and may impact zero people, but who knows what invisible code may be out there). If you want to go that way, then I say go all the way and lift it into a package. If you do not want to go that far, then

Not yet, but soon.

Now that we have
Typical summation functions (in std/sums) use each input element only once, so they should be usable in an online mode. However, the APIs in std/sums take an openArray input, preventing that. Instead, a better API would maintain a state and allow updating it as each new element is consumed.
example
instead of this API (nim-lang/Nim#16004)
use this:
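The code that originally followed "instead of this API" and "use this:" is not preserved here. As a hedged sketch of the stateful shape being proposed, the block below implements Kahan-Babuška-Neumaier compensation (the algorithm behind std/sums' `sumKbn`) as an accumulator. `KbnState`, `add` and `peekFinal` are hypothetical names, the last two borrowed from the snippets in the comments above.

```nim
# Hypothetical stateful accumulator (names are illustrative, not std/sums' API).
type KbnState[T: SomeFloat] = object
  sum: T   # running (rounded) sum
  comp: T  # accumulated rounding error

proc add[T: SomeFloat](s: var KbnState[T], x: T) =
  let t = s.sum + x
  if abs(s.sum) >= abs(x):
    s.comp += (s.sum - t) + x  # low-order bits of x were lost
  else:
    s.comp += (x - t) + s.sum  # low-order bits of s.sum were lost
  s.sum = t

proc peekFinal[T: SomeFloat](s: KbnState[T]): T =
  ## Current compensated total; the state can keep accepting values.
  s.sum + s.comp

# Values can come from any source, one at a time; no seq is built.
iterator readings(): float =
  for x in [1.0, 1e100, 1.0, -1e100]:
    yield x

var st: KbnState[float]
for x in readings():
  st.add x
doAssert st.peekFinal == 2.0  # a naive left-to-right sum gives 0.0 here
```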
This would be analogous to other procs that can be used online, e.g.:
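The procs originally listed here are not preserved. One existing online API in Nim's standard library, offered only as a hedged example of the pattern, is std/stats' `RunningStat`: it is updated with `push` one value at a time and can be queried at any point.

```nim
import std/stats

# Online statistics: the state is updated one value at a time.
var rs: RunningStat
for x in [1.0, 2.0, 3.0, 4.0]:
  rs.push(x)
doAssert rs.n == 4
doAssert rs.mean == 2.5  # can be queried mid-stream; more values may follow
```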
Note: a high-level wrapper can still be defined for convenience, e.g.:
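A hedged sketch of such a wrapper: a classic Kahan (compensated) accumulator, plus a convenience proc taking openArray that only builds the state and feeds it. `KahanState`, `add`, `peekFinal` and `sumKahan` are hypothetical names.

```nim
type KahanState = object
  # Hypothetical classic Kahan accumulator for float64.
  sum, c: float  # running sum and running compensation

proc add(s: var KahanState, x: float) =
  let y = x - s.c
  let t = s.sum + y
  s.c = (t - s.sum) - y  # what was lost when adding y
  s.sum = t

proc peekFinal(s: KahanState): float = s.sum

proc sumKahan(a: openArray[float]): float =
  ## Convenience wrapper: the online state does all the work.
  var s: KahanState
  for x in a:
    s.add x
  s.peekFinal

doAssert sumKahan([1.0, 2.0, 3.0]) == 6.0
```

Callers that already hold an array keep a one-call API, while streaming callers use the state directly.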
Ditto for the other summation procs.
note
Even better, it can also be generic with respect to the summation algorithm:
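A hedged sketch of making the wrapper generic over the algorithm: any state type providing `add` and `peekFinal` can be plugged in. `NaiveState`, `KbnState` and `sumWith` are hypothetical names; a real design might constrain `S` with a concept rather than rely on unconstrained generic instantiation.

```nim
type
  NaiveState = object
    total: float
  KbnState = object
    sum, comp: float  # running sum and accumulated rounding error

proc add(s: var NaiveState, x: float) = s.total += x
proc peekFinal(s: NaiveState): float = s.total

proc add(s: var KbnState, x: float) =
  let t = s.sum + x
  if abs(s.sum) >= abs(x): s.comp += (s.sum - t) + x
  else: s.comp += (x - t) + s.sum
  s.sum = t
proc peekFinal(s: KbnState): float = s.sum + s.comp

proc sumWith[S](a: openArray[float]): float =
  ## Generic over the summation algorithm `S`.
  var state: S  # zero-initialized accumulator
  for x in a:
    state.add x
  state.peekFinal

let data = [1.0, 1e100, 1.0, -1e100]
doAssert sumWith[NaiveState](data) == 0.0
doAssert sumWith[KbnState](data) == 2.0
```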
note2
More generally, outside of std/sums, many algorithms should be usable online, and openArray isn't the best API for them. For these, an Iterable[T] (for now, untyped) is the better API.
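A hedged sketch of the "(for now, untyped)" approach: a template whose parameter is `untyped` accepts anything a `for` loop can iterate (an iterator call, a slice, an array or seq), so one definition serves containers and lazy sources alike. `feedAll` and `NaiveSum` are hypothetical names.

```nim
type NaiveSum[T] = object
  # Hypothetical running-sum state.
  total: T

proc add[T](s: var NaiveSum[T], x: T) = s.total += x
proc peekFinal[T](s: NaiveSum[T]): T = s.total

template feedAll(state, iter: untyped) =
  ## Pushes every element `iter` yields into `state`, one at a time.
  for x in iter:
    state.add x

var s: NaiveSum[int]
feedAll(s, countup(1, 4))  # lazy iterator: nothing is materialized
feedAll(s, [5, 6])         # an array works too
doAssert s.peekFinal == 21
```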