Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IntMath.sum(int[]), and likewise #309

Open
gissuebot opened this issue Oct 31, 2014 · 42 comments
Open

IntMath.sum(int[]), and likewise #309

gissuebot opened this issue Oct 31, 2014 · 42 comments

Comments

@gissuebot
Copy link

Original issue created by hein.meling on 2010-01-05 at 11:15 PM


Suggest to provide:

sum()
average()

for Ints, Longs, Floats etc.

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2010-01-06 at 01:00 AM


Agreed.

Varargs?

The best way to implement Longs.average() is a somewhat interesting puzzle.


Status: Accepted

@gissuebot
Copy link
Author

Original comment posted by gpampara on 2010-01-06 at 05:21 AM


Could you please elaborate at what the complexities are?

I think I'm missing something.

@gissuebot
Copy link
Author

Original comment posted by hein.meling on 2010-01-06 at 08:12 AM


About varargs; currently I have this:

int sum(Collection<Integer> c)

in my project. Not sure if such a collection can be transformed into an array without copying; if so varargs are
fine, otherwise I would like to have both.

Another issue that come to mind when I started to think about test cases for this was: what to return if the sum
exceeds the limit of its primitive type. (maybe this was what you were referring to as a puzzle with
Longs.average()?)

We have to avoid any silent overflows for this. Maybe the return type should be Number to allow
BigInteger/Decimal to be returned for those cases.

@gissuebot
Copy link
Author

Original comment posted by ogregoire on 2010-01-06 at 03:03 PM


for the Longs.average() wouldn't it be interesting to use BigInteger as internal
mechanics?

Or use an integer to keep track of the number of times the overflow (or underflow) is
reached.

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2010-01-06 at 04:54 PM


Yup, BigInteger is the simple solution, but I'm quite curious to benchmark simpler
alternatives.

As for what to do in the case of overflow, I think ArithmeticException seems
appropriate.

@gissuebot
Copy link
Author

Original comment posted by fry@google.com on 2011-01-26 at 10:16 PM


(No comment entered for this change.)


Labels: Type-Enhancement

@gissuebot
Copy link
Author

gissuebot commented Oct 31, 2014

Original comment posted by j...@nwsnet.de on 2011-01-27 at 09:07 AM


I'd like to have each method to accept iterable, too.

While looking at my custom math utilities: What about rate and percentage calculation? Like this:

  public static double calculateRate(double count, double total) {
    // Compare against 0.0 to prevent infinity as result.
    return total == 0.0 ? 0.0 : count / total;
  }

This might be useful to implement for multiple primitives as one might be dealing with doubles or floats etc. depending on the application.

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2011-03-15 at 05:24 PM


I'm up for writing this, including some attempts I've thought of for "simpler alternatives."

@gissuebot
Copy link
Author

Original comment posted by raymondofrish on 2011-03-16 at 02:01 AM


How about something like the following? Not thorough benchmarked or optimized, but surely better than BigInteger, no? I hear there's a nice microbenchmarking tool available for Java...

public static long average(Iterable<Long> longs) {
    long count = Iterables.size(longs);

    long integerSum = 0;
    long remainderSum = 0;

    for (long num : longs) {
        integerSum += num / count;
        remainderSum += num % count;
    }

    return integerSum + remainderSum / count;
}

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2011-07-13 at 06:18 PM


(No comment entered for this change.)


Status: Triaged

@gissuebot
Copy link
Author

Original comment posted by cpovirk@google.com on 2011-07-13 at 08:40 PM


(No comment entered for this change.)

@gissuebot
Copy link
Author

Original comment posted by blank101 on 2011-07-19 at 07:06 PM


Recollecting back to my numerical methods class, there are also some interesting issues for the floating point primitives when calculating sum/average values beyond (over|under)flow.

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2011-10-18 at 11:26 PM


Revisiting...

I'm thinking we might just provide sum(), which has a less ambiguous return type.

In particular:

long IntMath.sum(int... ints)
BigInteger LongMath.sum(long... longs)

since we can safely assume that the sum of an array of ints will fit into a long. (The result will fit in 96 bits, so we can do intermediate accumulation with smaller types, too...)

@gissuebot
Copy link
Author

gissuebot commented Oct 31, 2014

Original comment posted by j...@nwsnet.de on 2011-10-19 at 07:53 AM


So iterables are not supported as arguments? I've got a lot of them floating around when creating reports.

@gissuebot
Copy link
Author

Original comment posted by dancerjohn on 2011-10-19 at 10:35 AM


You can use Longs.toArray(Lists.newArrayList(myIterable)) to convert to an array and pass to a varargs.

@gissuebot
Copy link
Author

gissuebot commented Oct 31, 2014

Original comment posted by j...@nwsnet.de on 2011-10-19 at 12:31 PM


That does two copies. Seems a little unnecessary, considering that the "for each" loop introduced in Java 5 can iterate both arrays and iterables.

As implementing the loop twice (for each number type) would be somewhat unattractive, one might convert the other way round, i.e. from array to iterable. Iterators.forArray exists, and it might be the time to add it to Iterables, too. Then, that (or Arrays.asList(array).iterator(), as mentioned in a code comment) might more efficient.

For the record, a very common use case in my projects is retrieving a total field's value from some iterable of objects using transform and then summing up those numbers.

@gissuebot
Copy link
Author

Original comment posted by ogregoire on 2011-10-19 at 12:43 PM


Can't you have Collections instead of Iterables ? You speak about the transform case, but to be honest, I never had the case where I wanted to perform only one action on my transformed list, so it was always "persisted" in a collection in order to be used more than once without having to re-transform it. Don't forget that Collection2 contains a transform(Collection,Function) method as well...

@gissuebot
Copy link
Author

gissuebot commented Oct 31, 2014

Original comment posted by j...@nwsnet.de on 2011-10-19 at 07:23 PM


Well, as long as I can get away with passing just iterables around, I try to do it. Maybe I'm overestimating the memory efficiency here (compared to, say, generators in Python).

OTOH, I had the impression that one tried to avoid arrays in favor of collections, especially in Guava. Also, what is a common use case that supports the idea of accepting varargs? I'd expect to either have very few numbers which I then could just easily sum up using the classic + operator, or a whole bunch of numbers that are in some kind of container (usually a collection, not an array [see my expectation above], but I don't know how popular number crunching in Java with Guava is) anyway.

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2011-10-20 at 06:17 AM


"avoid arrays in favor of collections" -- definitely, for Object arrays, but primitive arrays do make some sense.

@gissuebot
Copy link
Author

Original comment posted by fry@google.com on 2011-12-10 at 03:42 PM


(No comment entered for this change.)


Labels: Package-Primitives

@gissuebot
Copy link
Author

gissuebot commented Oct 31, 2014

Original comment posted by j...@nwsnet.de on 2011-12-15 at 09:32 AM


I revisited the use of my custom sum methods and it turns out that they are called with an Iterable in most cases because Iterables.transform is often used beforehand (on values that mostly already are in Iterables, too) to extract an attribute. The few cases that pass a Collection are those that call Map.values().

@gissuebot
Copy link
Author

Original comment posted by fry@google.com on 2012-02-16 at 07:17 PM


(No comment entered for this change.)


Status: Acknowledged

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2012-05-25 at 06:47 PM


I still think the simplest most obvious win here is just to add IntMath.sum(int[]) and friends.

I'd like to save the topic of the mean/average for when we look at statistics calculation in general.

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-05-25 at 08:30 PM


We need more detail, though:

  1. What is the return type of sum(int[])? It always fits in a long, but...
  2. What is the return type of sum(long[])? BigInteger, or long?
  3. Do we deal with floats and doubles? Do we take steps to reduce rounding error? (There are steps that can be taken.)

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2012-05-30 at 07:43 PM


(No comment entered for this change.)


Labels: -Type-Enhancement, Type-Addition

@gissuebot
Copy link
Author

Original comment posted by kevinb@google.com on 2012-06-22 at 06:16 PM


(No comment entered for this change.)


Status: Research

@gissuebot
Copy link
Author

gissuebot commented Nov 1, 2014

Original comment posted by emoro...@griddynamics.com on 2012-06-26 at 01:13 PM


private final <T> long sum(long bias, Iterable<T> ts, Function<T, Integer> transformer) {
        long sum = bias;
        for(T t : ts) {
            sum += transformer.apply(t);
        }
        return sum;
    }

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-06-26 at 01:26 PM


Hrrrrm. I'd like this method to do very limited things, because if you need it to do smarter things, you can just write the implementation yourself, given that it's only a few lines at the most.

At the moment, I'm leaning towards a very dumb, straightforward implementation. IntMath.sum(int...) returns an int. If you want something more sophisticated, whip it up yourself; this is just a convenience method for the most common case.

@gissuebot
Copy link
Author

Original comment posted by orionllmain on 2013-05-14 at 01:36 PM


int IntMath.sum(int[] array)
int IntMath.average(int[] array)
long LongMath.sum(int[] array)
long LongMath.sum(long[] array)
long LongMath.average(long[] array)
double DoubleMath.sum(int[] array)
double DoubleMath.sum(long[] array)
double DoubleMath.sum(float[] array)
double DoubleMath.sum(double[] array)
double DoubleMath.average(int[] array)
double DoubleMath.average(long[] array)
double DoubleMath.average(double[] array)
BigInteger BigIntegerMath.sum(int[] array)
BigInteger BigIntegerMath.sum(long[] array)
BigInteger BigIntegerMath.sum(BigInteger[] array)
BigInteger BigIntegerMath.average(int[] array)
BigInteger BigIntegerMath.average(long[] array)
BigInteger BigIntegerMath.average(BigInteger[] array)

@orionll
Copy link

orionll commented Jul 21, 2015

Any progress on this issue?

@ghost
Copy link

ghost commented Oct 29, 2015

Should this never be done in guava as Java 8 has exactly the duplicated API?

https://docs.oracle.com/javase/8/docs/api/java/util/stream/IntStream.html

@jcogilvie
Copy link

Many people use guava as a backfill of Java 8 functionality for Java 7-and-earlier projects (like Android).

@ronshapiro
Copy link
Contributor

@lowasser shall we bring this to API review?

@Maaartinus
Copy link

Even with Java 8, there seem to be some methods missing:
...
long LongMath.sum(int[])
BigInteger BigIntegerMath.sum(long[])
...
I'm not sure about choosing the classes according to the result type, but there are already precedents in Guava (e.g., factorial(int)). Both methods are rather simple, but getting them wrong isn't too hard either.

Using IntStream::sum sounds like a bad idea in case it might overflow one day and you might spend the next one chasing the bug.

Varargs or int/long streams arguments may be better.

@jbduncan
Copy link
Contributor

jbduncan commented Oct 4, 2017

@Maaartinus With regards to LongMath.sum(int[]), I believe one can emulate it for now with both streams and Guava's help, like so. (I've not run this code example through a compiler, so it may need some tweaking, but I hope the idea comes across well.)

int[] ints = ...;
long sum = Arrays.stream(ints).mapToLong(i -> i).reduce(0, LongMath::checkedAdd);

@jbduncan
Copy link
Contributor

jbduncan commented Oct 4, 2017

Likewise, for BigIntegerMath.sum(long[]), something like this should work:

long[] longs = ...;
BigInteger sum = Arrays.stream(longs).mapToObj(BigInteger::valueOf).reduce(BigInteger.ZERO, (a, b) -> a.add(b));

@Maaartinus
Copy link

@jbduncan Agreed. I guess, your sum(int[]) is as fast as it can get (note that you need no checkedAdd), but your sum(long[]) is probably much slower than slicing the long into two ints, accumulating each into a long and using BigInteger for the final result only.

@jbduncan
Copy link
Contributor

jbduncan commented Oct 6, 2017

@Maaartinus

Agreed. I guess, your sum(int[]) is as fast as it can get (note that you need no checkedAdd)...

Am I right to think that it's because a single long is large enough to hold the sum of an int[Integer.MAX_VALUE] where all elements are equal to either Integer.MAX_VALUE or Integer.MIN_VALUE?

Regardless, I think that LongMath::checkedAdd would almost certainly be needed when dealing with Streams with more than Integer.MAX_VALUE elements (which aren't common at all, I imagine). But then again, in such a situation, I imagine mapping and reducing to a BigInteger would be more sensible than reducing to a long anyway.

...but your sum(long[]) is probably much slower than slicing the long into two ints, accumulating each into a long and using BigInteger for the final result only.

I'm a bit lost but very curious by what you mean here. Can you give me a code example?

@Maaartinus
Copy link

Am I right to think

Yes, the minimum value is > Integer.MAX_VALUE * Integer.MIN_VALUE > -2**62, which fits, the maximum value is slightly smaller in magnitude.


a code example

long MASK = 0xFFFFFFFFL;
long high = 0, low = 0;
for (long x : longs) {
     low += x & MASK;
     high += x >> 32;
}
return BigInteger.valueOf(high).shiftLeft(32).add(BigInteger.valueOf(low));

There may be tons of errors.

@netdpb
Copy link
Member

netdpb commented Aug 5, 2019

Is this covered by Stats.of(1, 2, 3, ...).sum(), etc.?

@netdpb netdpb added the P3 no SLO label Aug 5, 2019
@kevinb9n
Copy link
Contributor

kevinb9n commented Aug 5, 2019

Only technically, but I would not expect Stats to be a good choice in cases where the Stats instance is immediately thrown away. It does too much extra work.

@orionll
Copy link

orionll commented Aug 6, 2019

Stats is indeed very slow. I've written a simple benchmark which compares direct array summation and Stats.sum. Stats is 7-10x time slower:

Benchmark             (size)  Mode  Cnt    Score     Error  Units
Statistics.sum            10  avgt    4    0,010 ?   0,001  us/op
Statistics.sum          1000  avgt    4    0,861 ?   0,011  us/op
Statistics.sum        100000  avgt    4   86,060 ?   1,232  us/op
Statistics.sumStats       10  avgt    4    0,077 ?   0,017  us/op
Statistics.sumStats     1000  avgt    4    8,608 ?   0,133  us/op
Statistics.sumStats   100000  avgt    4  863,270 ?  31,961  us/op

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants