POC: test non-boxed, primitive collection storage with reflection #1484

Closed
wants to merge 4 commits into from

@l0rinc l0rinc commented Aug 12, 2016

I created a POC (not to be merged), demonstrating non-boxed primitive storage for Vector.

That is, if a Vector is created with a primitive wrapper type (e.g. Vector<Integer>), the internal array will be of the corresponding primitive type (e.g. int[]) instead of the boxed type (e.g. Integer[]).
All operations on the Vector's internal array are implemented using reflection or via specialized checks.
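
A minimal sketch of what reflective access to such an array can look like, using only `java.lang.reflect.Array` from the JDK. The class name `ReflectiveStorage` and its method shapes are assumptions for illustration and are not taken from the PR's code:

```java
import java.lang.reflect.Array;

/** Hypothetical helper (not the PR's code): all array access goes through java.lang.reflect.Array. */
final class ReflectiveStorage {
    private ReflectiveStorage() {}

    /** Allocates e.g. an int[] for int.class or a String[] for String.class. */
    static Object newInstance(Class<?> componentType, int size) {
        return Array.newInstance(componentType, size);
    }

    /** Reads one element; primitive values are wrapped (boxed) by Array.get. */
    static Object getAt(Object array, int index) {
        return Array.get(array, index);
    }

    /** Writes one element; Array.set unwraps the value if the array is primitive. */
    static void setAt(Object array, int index, Object value) {
        Array.set(array, index, value);
    }

    /** Length of any array, primitive or not. */
    static int getLength(Object array) {
        return Array.getLength(array);
    }
}
```

Because the array is only ever seen as `Object`, the same algorithm can run over an `int[]` or an `Integer[]`; the price is a reflective call per element access.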

The memory usage for unboxed primitive storage is very promising, i.e. uses up to ~4x less memory than ArrayList or boxed Vector:

for 1024 elements
`java.util.ArrayList` uses `18.6 KB` (`0 bytes` overhead, `0.0` bytes overhead per element)
`javaslang.collection.BoxedVector` uses `19.3 KB` (`664 bytes` overhead, `0.6` bytes overhead per element)
`javaslang.collection.Vector` uses `4.7 KB` (`-14240 bytes` overhead, `-13.9` bytes overhead per element)

Boxed, no reflection (original implementation):

Operation   Impl                Params            Score  ±  Error  Unit
Create      slang_persistent      1024      550,181.653  ±  6.17% ops/s
Head        slang_persistent      1024  273,522,239.620  ± 15.05% ops/s
Get         slang_persistent      1024      253,750.035  ±  1.97% ops/s

Boxed, with reflection - without primitive conversion:

Operation   Impl                Params            Score  ± Error   Unit
Create      slang_persistent      1024      235,575.521  ± 2.90%  ops/s
Head        slang_persistent      1024   21,594,009.129  ± 0.77%  ops/s
Get         slang_persistent      1024       11,254.908  ± 1.70%  ops/s

Primitive, with reflection - note: uses ~4x less memory:

Operation   Impl                Params            Score  ± Error   Unit
Create      slang_persistent      1024       23,179.890  ± 1.36%  ops/s
Head        slang_persistent      1024   14,401,390.967  ± 2.22%  ops/s
Get         slang_persistent      1024        8,707.254  ± 1.84%  ops/s

i.e. ~10-50 times slower.

However, simple specialization (e.g. if Vector were an interface with e.g. IntVector/CharVector children) reveals (note: uses ~4x less memory):

Operation Impl                  Params            Score  ± Error   Unit
Create    slang_persistent        1024      279,322.949  ± 3.44%  ops/s
Create    slang_persistent_int    1024      655,092.717  ± 1.46%  ops/s

Head      slang_persistent        1024  168,508,482.722  ± 0.83%  ops/s
Head      slang_persistent_int    1024  269,753,246.425  ± 0.91%  ops/s

Get       slang_persistent        1024       78,920.869  ± 0.87%  ops/s
Get       slang_persistent_int    1024      252,564.494  ± 2.06%  ops/s

which is only ~60% of the original speed with boxed end result (i.e. a get that returns an Integer).
If we provide specialized methods (which could be accessed via casting to IntVector), it would have the same speed as the original (using a lot less memory).
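
A rough sketch of the specialization idea, assuming a heavily simplified interface. `SimpleVector` and the `getInt` accessor below are hypothetical stand-ins and not the actual Javaslang API:

```java
/** Hypothetical, heavily simplified stand-in for the Vector interface. */
interface SimpleVector<T> {
    T get(int index);
    int length();
}

/** Hypothetical int specialization: stores an int[] and adds a non-boxing accessor. */
final class IntVector implements SimpleVector<Integer> {
    private final int[] elements;

    IntVector(int... elements) {
        this.elements = elements.clone(); // defensive copy keeps the vector immutable
    }

    /** Generic accessor: the int is boxed to an Integer on every call. */
    @Override
    public Integer get(int index) {
        return elements[index];
    }

    @Override
    public int length() {
        return elements.length;
    }

    /** Specialized accessor: returns the primitive directly, without boxing. */
    int getInt(int index) {
        return elements[index];
    }
}
```

A caller holding a `SimpleVector<Integer>` could then opt into the unboxed path with `((IntVector) vector).getInt(i)`.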


EDIT:
specialized for most primitives and the numbers are pretty good! (26cbb13#diff-c6c1876105d111251f38e0916d4d99a4R75)
Also, I found a way of avoiding boxing at the end (making the whole solution viable), take a look :)

Operation  Impl                 Params            Score  ± Error   Unit
Create     slang_persistent       1024      353,200.887  ± 7.15%  ops/s
Create     slang_persistent_int   1024      711,118.523  ± 1.79%  ops/s

Head       slang_persistent       1024  144,142,491.461  ± 1.36%  ops/s
Head       slang_persistent_int   1024  207,905,574.556  ± 0.95%  ops/s

Get        slang_persistent       1024      106,121.968  ± 1.50%  ops/s
Get        slang_persistent_int   1024      214,445.932  ± 1.49%  ops/s

Iterate    slang_persistent       1024      116,656.843  ± 1.49%  ops/s
Iterate    slang_persistent_int   1024    2,163,161.600  ± 1.42%  ops/s

l0rinc commented Aug 12, 2016

related to #1449

@l0rinc l0rinc changed the title from "POC: test primitive storage with reflection" to "POC: test non-boxed, primitive collection storage with reflection" Aug 12, 2016
private static final Class<?>[] WRAPPERS = {Boolean.class, Byte.class, Character.class, Double.class, Float.class, Integer.class, Long.class, Short.class, Void.class};
private static final Class<?>[] PRIMITIVES = {boolean.class, byte.class, char.class, double.class, float.class, int.class, long.class, short.class, void.class};

public static Class<?> toPrimitive(Class<?> wrapper) {

change this to

public static Class<?> toPrimitive(Class<?> wrapper) {
    return wrapper;
}

to disable primitive conversion
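
The body of `toPrimitive` is not shown in this thread. One plausible shape, assuming a linear scan over the two parallel arrays quoted above (the enclosing class name `PrimitiveTypes` is made up for this sketch):

```java
/** Hypothetical container class; only the two arrays are quoted from the reviewed snippet. */
final class PrimitiveTypes {
    // Parallel arrays: WRAPPERS[i] corresponds to PRIMITIVES[i].
    private static final Class<?>[] WRAPPERS = {Boolean.class, Byte.class, Character.class, Double.class,
            Float.class, Integer.class, Long.class, Short.class, Void.class};
    private static final Class<?>[] PRIMITIVES = {boolean.class, byte.class, char.class, double.class,
            float.class, int.class, long.class, short.class, void.class};

    private PrimitiveTypes() {}

    /** Assumed lookup: returns the primitive for a wrapper, any other type unchanged. */
    static Class<?> toPrimitive(Class<?> wrapper) {
        for (int i = 0; i < WRAPPERS.length; i++) {
            if (WRAPPERS[i] == wrapper) {
                return PRIMITIVES[i];
            }
        }
        return wrapper; // not a wrapper type, so keep boxed/reference storage
    }
}
```

Returning the argument unchanged for every input, as suggested in the comment above, is then the degenerate case that keeps all storage boxed.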

l0rinc commented Aug 12, 2016

return newTrailing;
}

static Object newInstance(Class<?> type, int size) {

separated all the reflective calls

@danieldietrich
Contributor

Hi Lorinc,

thank you for the investigation and the resulting insights. Javaslang will not provide specializations for generic types (at least in the core module). My vision is to leverage http://openjdk.java.net/jeps/218 to get around the current restrictions of generics.

We could think about an additional module (javaslang-specialized?) that contains special versions of our collections but I think it is not worth the effort - it will lead to a maintenance hell because all changes to the core have to be duped to that module. Who will maintain it over time?

From the very beginning of Javaslang I decided to live with the shortcomings of the language (null, Objects vs. primitives, ...). That's the reason there are no specializations.

DD

@@ -319,363 +111,34 @@ public int slang_persistent() {
 public int slang_persistent_int() {
     int aggregate = 0;
     for (int i : RANDOMIZED_INDICES) {
-        aggregate ^= slangPersistent.getInt(i);
+        int[] leafUnsafe = (int[]) slangPersistent.getLeafUnsafe(i);

@l0rinc l0rinc Aug 13, 2016

getting the array that contains the given index enables the user to avoid boxing :)
package private for now, could be used internally in CharSeq or BitSet
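
To illustrate the leaf-access idea, here is a small sketch built on a hypothetical two-level structure. `ChunkedInts`, the 32-element leaf size, and `leafOffset` are assumptions for this example; only the method name `getLeafUnsafe` comes from the diff above:

```java
import java.util.Arrays;

/** Hypothetical two-level stand-in for the Vector trie; 32-element leaves are an assumption. */
final class ChunkedInts {
    private static final int LEAF_BITS = 5;                     // 2^5 = 32 elements per leaf
    private static final int LEAF_MASK = (1 << LEAF_BITS) - 1;  // low 5 bits = offset within a leaf

    private final int[][] leaves;
    private final int length;

    ChunkedInts(int[] source) {
        length = source.length;
        int leafCount = (length + LEAF_MASK) >>> LEAF_BITS;     // ceil(length / 32)
        leaves = new int[leafCount][];
        for (int l = 0; l < leafCount; l++) {
            int from = l << LEAF_BITS;
            int to = Math.min(from + LEAF_MASK + 1, length);
            leaves[l] = Arrays.copyOfRange(source, from, to);
        }
    }

    int length() {
        return length;
    }

    /** Returns the leaf holding the index, typed as Object as in a generic container. Callers must not mutate it. */
    Object getLeafUnsafe(int index) {
        return leaves[index >>> LEAF_BITS];
    }

    /** Position of the index inside its leaf. */
    static int leafOffset(int index) {
        return index & LEAF_MASK;
    }
}
```

A caller that knows the element type casts once, `int[] leaf = (int[]) chunked.getLeafUnsafe(i);`, and then reads `leaf[ChunkedInts.leafOffset(i)]` as a plain `int`, so no `Integer` is created on the read path. Exposing the internal array is what makes the method "unsafe".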

Pap Lőrinc added 4 commits August 13, 2016 18:18
The memory storage for unboxed primitive storage is very promising, i.e.:
    for 32768 elements
    `java.util.ArrayList` uses `638.4 KB` (`0 bytes` overhead, `0.0` bytes overhead per element)
    `javaslang.collection.Vector` uses `148.7 KB` (`-501392 bytes` overhead, `-15.3` bytes overhead per element)

    for 1024 elements
    `java.util.ArrayList` uses `18.6 KB` (`0 bytes` overhead, `0.0` bytes overhead per element)
    `javaslang.collection.Vector` uses `4.7 KB` (`-14240 bytes` overhead, `-13.9` bytes overhead per element)

    for 32 elements
    `java.util.ArrayList` uses `504 bytes` (`0 bytes` overhead, `0.0` bytes overhead per element)
    `javaslang.collection.Vector` uses `240 bytes` (`-264 bytes` overhead, `-8.2` bytes overhead per element)

But the cost of reflection makes it 10-100x slower :(

Boxed, no reflection:
    Target           Operation   Impl                  Params  Count            Score  ±     Error    Unit  slang_persistent
    VectorBenchmark  Create      slang_persistent          32      6   45,151,443.097  ±    16.79%   ops/s
    VectorBenchmark  Create      slang_persistent        1024      6      550,181.653  ±     6.17%   ops/s
    VectorBenchmark  Create      slang_persistent       32768      6       15,204.714  ±    14.61%   ops/s
    VectorBenchmark  Head        slang_persistent          32      6  281,538,398.064  ±     0.90%   ops/s
    VectorBenchmark  Head        slang_persistent        1024      6  273,522,239.620  ±    15.05%   ops/s
    VectorBenchmark  Head        slang_persistent       32768      6  281,954,132.384  ±     1.19%   ops/s
    VectorBenchmark  Get         slang_persistent          32      6   17,018,951.664  ±     8.07%   ops/s
    VectorBenchmark  Get         slang_persistent        1024      6      253,750.035  ±     1.97%   ops/s
    VectorBenchmark  Get         slang_persistent       32768      6        2,845.576  ±     2.53%   ops/s

Boxed with reflection:
    Target           Operation   Impl                  Params  Count            Score  ±     Error    Unit  slang_persistent
    VectorBenchmark  Create      slang_persistent          32      6   14,356,024.977  ±     1.47%   ops/s
    VectorBenchmark  Create      slang_persistent        1024      6      236,487.447  ±     3.82%   ops/s
    VectorBenchmark  Create      slang_persistent       32768      6        7,086.487  ±     0.87%   ops/s
    VectorBenchmark  Head        slang_persistent          32      6   22,301,704.436  ±     0.85%   ops/s
    VectorBenchmark  Head        slang_persistent        1024      6   21,896,434.896  ±     1.25%   ops/s
    VectorBenchmark  Head        slang_persistent       32768      6   21,572,719.717  ±     0.29%   ops/s
    VectorBenchmark  Get         slang_persistent          32      6      691,636.638  ±     1.36%   ops/s
    VectorBenchmark  Get         slang_persistent        1024      6       11,311.820  ±     0.85%   ops/s
    VectorBenchmark  Get         slang_persistent       32768      6          225.748  ±     1.31%   ops/s

Primitive with reflection:
    Target           Operation   Impl                  Params  Count            Score  ±     Error    Unit  slang_persistent
    VectorBenchmark  Create      slang_persistent          32      6      690,013.657  ±     1.16%   ops/s
    VectorBenchmark  Create      slang_persistent        1024      6       22,438.292  ±     2.39%   ops/s
    VectorBenchmark  Create      slang_persistent       32768      6          699.917  ±     1.76%   ops/s
    VectorBenchmark  Head        slang_persistent          32      6   14,328,490.550  ±     1.44%   ops/s
    VectorBenchmark  Head        slang_persistent        1024      6   14,192,942.897  ±     2.13%   ops/s
    VectorBenchmark  Head        slang_persistent       32768      6   12,364,363.839  ±     1.96%   ops/s
    VectorBenchmark  Get         slang_persistent          32      6      451,149.351  ±     1.98%   ops/s
    VectorBenchmark  Get         slang_persistent        1024      6        8,785.965  ±     0.83%   ops/s
    VectorBenchmark  Get         slang_persistent       32768      6          188.556  ±     1.81%   ops/s
@@ -89,7 +89,7 @@ public Object slang_persistent() {

 @Benchmark
 public int slang_persistent_int() {
-    final int head = slangPersistent.intHead();
+    final int head = ((int[]) slangPersistent.getLeafUnsafe(0))[0];

non-boxed head access

l0rinc commented Aug 13, 2016

@danieldietrich, it seems I managed to work around the problem that specialization can have, i.e. I don't have to repeat any algorithms; the type-specific logic is contained in general getter/setter/newInstance/getLength methods, as seen in the code (no need for specialized subclasses).

Also, not everybody can wait another 4 years for automatic specialization to appear, especially since this POC proved to be a viable alternative :).
Will try to apply it to the whole Vector impl to see what benefits/drawbacks it has.
Please bear with me :).

l0rinc commented Aug 14, 2016

My first full primitivization attempt for Vector: 212242a#diff-5f4a536088f7f19a51ebbef20cf7b3ef

The code looks basically the same, the only significant change in the algorithm is the way we access the array, i.e. instead of a[b] = c we have setAt(a, b, c).
Will investigate the reason why append and prepend are slower now, but otherwise the primitive internals are a lot faster for many methods (e.g. create and iterate are 3-5x faster, using 4x less total memory) :)
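
A sketch of how such general accessors might look when implemented with type checks instead of reflection. The class name `ArrayAccess` and the selection of handled types are assumptions; the linked commit may do this differently:

```java
/** Hypothetical accessors: one place that knows about primitive arrays, so algorithms stay generic. */
final class ArrayAccess {
    private ArrayAccess() {}

    /** Equivalent of array[index], boxing only at this boundary. */
    static Object getAt(Object array, int index) {
        if (array instanceof int[]) return ((int[]) array)[index];
        if (array instanceof char[]) return ((char[]) array)[index];
        if (array instanceof double[]) return ((double[]) array)[index];
        return ((Object[]) array)[index]; // reference arrays; other primitive types omitted for brevity
    }

    /** Equivalent of array[index] = value, unboxing when the storage is primitive. */
    static void setAt(Object array, int index, Object value) {
        if (array instanceof int[]) ((int[]) array)[index] = (Integer) value;
        else if (array instanceof char[]) ((char[]) array)[index] = (Character) value;
        else if (array instanceof double[]) ((double[]) array)[index] = (Double) value;
        else ((Object[]) array)[index] = value;
    }
}
```

With helpers of this kind an algorithm written as `setAt(a, b, c)` works unchanged whether `a` is an `int[]` or an `Object[]`, which is why no per-type copy of the algorithm is needed.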

@danieldietrich, @ruslansennov, opinions :)?

@l0rinc l0rinc deleted the specialized branch August 15, 2016 08:55
@danieldietrich danieldietrich added this to the 2.1.0 milestone Aug 18, 2016
@danieldietrich danieldietrich removed this from the vavr-0.9.0 milestone Oct 23, 2017