[SPARK-18016][SQL][CATALYST] Code Generation: Constant Pool Limit - State Compaction #19518

Closed
wants to merge 2 commits into from

bdrillard

What changes were proposed in this pull request?

This PR is the part two followup to #18075, meant to address SPARK-18016, Constant Pool limit exceptions. Part 1 implemented NestedClass code splitting, in which excess code was split off into nested private sub-classes of the OuterClass. In Part 2 we address excess mutable state, in which the number of inlined variables declared at the top of the OuterClass can also exceed the constant pool limit.

Here, we modify the addMutableState function in the CodeGenerator to check if the declared state can be easily compacted into an array and initialized in loops, rather than inlined and initialized with its own line of code. We identify four types of state that can be compacted:

  • Primitive state (ints, booleans, etc)
  • Object state of like-type without any initial assignment
  • Object state of like-type initialized to null
  • Object state of like-type initialized to the type's base (no-argument) constructor

With mutable state compaction, at the top of the class we generate array declarations like:

private Object[] references;
private UnsafeRow result;
private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
  ...
private boolean[] mutableStateArray1 = new boolean[12507];
private InternalRow[] mutableStateArray4 = new InternalRow[5268];
private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter[] mutableStateArray5 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter[7663];
private java.lang.String[] mutableStateArray2 = new java.lang.String[12477];
private int[] mutableStateArray = new int[42509];
private java.lang.Object[] mutableStateArray6 = new java.lang.Object[30];
private boolean[] mutableStateArray3 = new boolean[10536];

and these arrays are initialized in loops as:

private void init_3485() {
    for (int i = 0; i < mutableStateArray3.length; i++) {
        mutableStateArray3[i] = false;
    }
}

For compacted mutable state, addMutableState returns an array accessor value, which is then referenced in the subsequent generated code.
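
To illustrate the idea of returning an array accessor, here is a minimal sketch, not Spark's actual CodegenContext. The class name StateCompactor, the one-array-per-type layout, and the numbering of the array names are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one compacted array per Java type.
final class StateCompactor {
    // Java type -> array name, in first-seen order.
    private final Map<String, String> arrayNames = new LinkedHashMap<>();
    // Java type -> number of slots handed out so far.
    private final Map<String, Integer> slotCounts = new LinkedHashMap<>();

    /** Reserves one slot for state of the given type and returns its accessor. */
    String addMutableState(String javaType) {
        String array = arrayNames.get(javaType);
        if (array == null) {
            array = "mutableStateArray" + arrayNames.size();
            arrayNames.put(javaType, array);
        }
        // merge returns the updated count, so the first slot is index 0.
        int slot = slotCounts.merge(javaType, 1, Integer::sum) - 1;
        return array + "[" + slot + "]";
    }

    /** Emits one array declaration per type, sized to the slots reserved. */
    List<String> declarations() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : arrayNames.entrySet()) {
            String type = e.getKey();
            out.add("private " + type + "[] " + e.getValue()
                + " = new " + type + "[" + slotCounts.get(type) + "];");
        }
        return out;
    }
}
```

Callers would splice the returned accessor string into generated code wherever the variable name would otherwise have appeared.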

Note: some state cannot be easily compacted (except perhaps with deeper changes to the generating code), as some state value names are taken for granted at the global level during code generation (see CatalystToExternalMap in Objects as an example). For this state, we provide an inline hint to the function call, which indicates that the state should be inlined to the OuterClass. Still, the state we can easily compact manages to reduce the Constant Pool to a tractable size for the wide/deeply nested schemas I was able to test against.

How was this patch tested?

Tested against several complex schema types, also added a test case generating 40,000 string columns and creating the UnsafeProjection.

val exprs = transformFunctions(functions.map(name =>
s"$name(${arguments.map(_._2).mkString(", ")})"))

splitExpressions(exprs, funcName, arguments)
Author

Changes I made here to splitExpressions were to handle instances where the split code method references were still over 64kb. It would seem this problem is addressed by @mgaido91 in #19480, and that implementation is much more thorough, so if that PR gets merged, I'd prefer to rebase against that.

@kiszk
Member

kiszk commented Oct 19, 2017

Thank you for creating a PR for the latest Spark.

I think that it is great to reduce # of constant pool entries. I have one high level comment.
IIUC, this PR always performs mutable state compaction. In other words, mutable states are always placed in arrays.
I am concerned about possible performance degradation from the increased access cost of putting states in arrays.

What do you think about putting mutable states into arrays (i.e. performing mutable state compaction) only when there are many mutable states or only for certain mutable states that are rarely accessed?
Or, can we say there is no performance degradation due to mutable state compaction?

What do you think?

@bdrillard
Author

@kiszk You are correct that the current implementation compacts all mutable state (where the state does not have to be explicitly inlined).

To your last question, I'd attempted some analysis of the JVM bytecode of array versus inlined state initialized either through method calls or in loops. I'd posted the experiment and results: https://github.com/bdrillard/bytecode-poc

If Spark has its own benchmarking tools, I'd be happy to use those to compare Catalyst-generated classes further.

To the general question of when we compact state, I think some kind of threshold still does make sense. It would be best to ensure that the typical code path (for typical Dataset schemas) remains un-impacted by the changes (as was the aim when generating nested classes in #18075).

I've found trying to set a global threshold for when to compact mutable state can be hard. Some state has to be inlined (state that uses parameterized constructors that can't be easily initialized with loops, like the BufferHolder and UnsafeRowWriter). I've found situations where, due to code generator flow, we began by inlining an amount of state that could have been compacted, then started compacting state after a set threshold, but then began inlining state again that could not be compacted, forcing us over the constant pool limit.

It's difficult to tell when a certain piece of state will be referenced frequently or infrequently. For example, we do know some pieces of primitive mutable state, like global booleans that are part of conditional checks, are initialized globally, assigned once in one method, and then referenced only once in a separate caller method. These are excellent candidates for compaction, since they proliferate very quickly and are, in a sense, "only used once" (declared, initialized, re-assigned in a method, accessed in another method, never used again).

Other pieces of state, like row objects, and JavaBean objects, will be accessed a number of times relative to how many fields they have, which isn't necessarily easy info to retrieve during code generation (we'd have to reflect or do inspection of the initialization code to know how many fields such an object has). But these items are probably still good candidates for compaction in general because of how many of a given type there could be.

I'm inclined to use a threshold against the name/types of the state, rather than a global threshold. Since freshName is always monotonically increasing from 1 for a given variable prefix, we could know when a threshold for state of that type was reached, and when we could begin compacting that type of state, independently/concurrently with the other types of state. Such a scheme would allow us to ensure the usual flow of code-generation remains as it is now, with no state-compaction for typical operations, and then with state-compaction in the more extreme cases that would threaten to blow the Constant Pool limit.
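
A per-prefix threshold of this kind could be sketched roughly as follows. This is only an illustration under assumptions: the class PerPrefixThreshold, the method next, and the "Array" suffix for compacted names are hypothetical and not part of Spark.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: inline the first `threshold` variables of each
// prefix, then hand out array slots for any further ones.
final class PerPrefixThreshold {
    private final int threshold;
    // Prefix -> how many variables of that prefix have been requested.
    private final Map<String, Integer> seen = new HashMap<>();

    PerPrefixThreshold(int threshold) {
        this.threshold = threshold;
    }

    /** Returns the name generated code should use for the next variable. */
    String next(String prefix) {
        int n = seen.merge(prefix, 1, Integer::sum); // 1-based, like freshName
        if (n <= threshold) {
            return prefix + n; // inlined field, e.g. isNull1
        }
        // Compacted slot; indices restart at 0 past the threshold.
        return prefix + "Array[" + (n - threshold - 1) + "]";
    }
}
```

Because each prefix is counted independently, typical plans with few variables per prefix would never reach the compacted branch.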

@kiszk
Member

kiszk commented Oct 19, 2017

@bdrillard I remember that we had a similar discussion about benchmarking. Could you take a look at this discussion?

@bdrillard
Author

@kiszk Ah, thanks for the link back to that discussion. I'll make modifications to the trials for better data.

@mgaido91
Contributor

@bdrillard since my PR and others have now been merged, there are some conflicts. Could you please fix them? Thanks.

@@ -801,12 +801,12 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
private[this] def castToByteCode(from: DataType, ctx: CodegenContext): CastFunction = from match {
case StringType =>
val wrapper = ctx.freshName("wrapper")
-ctx.addMutableState("UTF8String.IntWrapper", wrapper,
+val wrapperAccessor = ctx.addMutableState("UTF8String.IntWrapper", wrapper,
Contributor

I'd like to have something like

val wrapper = ctx.addMutableState("UTF8String.IntWrapper", v => s"$v = new UTF8String.IntWrapper();")

@cloud-fan
Contributor

ping @bdrillard

@cloud-fan
Contributor

@kiszk @maropu any of you wanna take this over? This patch becomes important as we now split codes more aggressively.

@kiszk
Member

kiszk commented Nov 22, 2017

@cloud-fan I want to take this over if possible
cc @maropu

@maropu
Member

maropu commented Nov 22, 2017

yea, ok @kiszk I'll review your work.

@kiszk
Member

kiszk commented Nov 22, 2017

@cloud-fan Is it better to use this PR? Or, create a new PR?

@kiszk
Member

kiszk commented Nov 22, 2017

ping @cloud-fan

@cloud-fan
Contributor

let's create a new PR

@kiszk
Member

kiszk commented Nov 22, 2017

OK, I will create a new PR

@bdrillard
Author

Thanks for giving this the attention to shepherd it on through. I haven't had the time to do the additional coding work necessary to properly benchmark it in the last few weeks. @kiszk, if there are any questions regarding my earlier implementation as you make/review the second PR, I'm happy to make clarifications and would be able to respond to those in writing quickly.

@cloud-fan
Contributor

@bdrillard thanks!

@kiszk
Member

kiszk commented Nov 23, 2017

@bdrillard @cloud-fan @maropu
I created and ran a synthetic benchmark program. I think that using an array for compaction (as shown in Array) is slower than using scalar instance variables (as shown in Vars). In my case below, the array version is about 20% slower in the best time.

Thus, I would like to take an approach that creates inner classes so that state stays in scalar instance variables.
WDYT? Any comments are much appreciated.

Here are Test.java and myInstance.py that I used.

$ cat /proc/cpuinfo | grep "model name" | uniq
model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
$ python myInstance.py > MyInstance.java && javac Test.java && java Test


Result(us): Array
   0: 145333.227
   1: 144288.262
   2: 144233.871
   3: 144536.350
   4: 144503.269
   5: 144836.117
   6: 144448.053
   7: 144744.725
   8: 144688.652
   9: 144727.823
  10: 144447.789
  11: 144500.638
  12: 144641.592
  13: 144464.106
  14: 144518.914
  15: 144844.639
  16: 144780.464
  17: 144617.363
  18: 144463.271
  19: 144508.170
  20: 144929.451
  21: 144529.697
  22: 144273.167
  23: 144362.926
  24: 144296.854
  25: 144398.665
  26: 144490.813
  27: 144435.732
  28: 144675.997
  29: 144483.581
BEST: 144233.871000, AVG: 144566.806

Result(us): Vars
   0: 120375.384
   1: 119800.238
   2: 119822.842
   3: 119830.761
   4: 119836.781
   5: 120185.751
   6: 120208.140
   7: 120274.925
   8: 120112.109
   9: 120082.120
  10: 120063.456
  11: 120112.493
  12: 120144.937
  13: 119964.356
  14: 119941.633
  15: 119825.758
  16: 119677.506
  17: 119833.236
  18: 119749.781
  19: 119723.932
  20: 120197.394
  21: 120052.820
  22: 120006.650
  23: 119939.335
  24: 119857.469
  25: 120176.229
  26: 120153.605
  27: 120345.581
  28: 120163.129
  29: 120038.673
BEST: 119677.506, AVG: 120016.567

Small MyInstance.java (N = 16, M = 4)

class MyInstance {
  final int N = 16;
  int[] instance = new int[N];
  void accessArrays00000() {
    instance[8] = instance[0];
    instance[9] = instance[1];
    instance[10] = instance[2];
    instance[11] = instance[3];
  }
  void accessArrays00001() {
    instance[12] = instance[4];
    instance[13] = instance[5];
    instance[14] = instance[6];
    instance[15] = instance[7];
  }
  void accessArrays00002() {
    instance[0] = instance[8];
    instance[1] = instance[9];
    instance[2] = instance[10];
    instance[3] = instance[11];
  }
  void accessArrays00003() {
    instance[4] = instance[12];
    instance[5] = instance[13];
    instance[6] = instance[14];
    instance[7] = instance[15];
  }
  void accessArray() {
    accessArrays00000();
    accessArrays00001();
    accessArrays00002();
    accessArrays00003();
  }

  int instance00000;
  int instance00001;
  int instance00002;
  int instance00003;
  int instance00004;
  int instance00005;
  int instance00006;
  int instance00007;
  int instance00008;
  int instance00009;
  int instance00010;
  int instance00011;
  int instance00012;
  int instance00013;
  int instance00014;
  int instance00015;
  void accessVars00000() {
    instance00008 = instance00000;
    instance00009 = instance00001;
    instance00010 = instance00002;
    instance00011 = instance00003;
  }
  void accessVars00001() {
    instance00012 = instance00004;
    instance00013 = instance00005;
    instance00014 = instance00006;
    instance00015 = instance00007;
  }
  void accessVars00002() {
    instance00000 = instance00008;
    instance00001 = instance00009;
    instance00002 = instance00010;
    instance00003 = instance00011;
  }
  void accessVars00003() {
    instance00004 = instance00012;
    instance00005 = instance00013;
    instance00006 = instance00014;
    instance00007 = instance00015;
  }
  void accessVars() {
    accessVars00000();
    accessVars00001();
    accessVars00002();
    accessVars00003();
  }
}

@viirya
Member

viirya commented Nov 23, 2017

I'd prefer inner class approach.

@cloud-fan
Contributor

You are comparing array vs member variables, can we compare array vs inner class member variable? And too many classes will have overhead on the classloader, we should test some extreme cases like 1 million variables.

@kiszk
Member

kiszk commented Nov 23, 2017

@cloud-fan you are right, I am updating the benchmark program and results.
I realized that we still hit the limit on constant pool entries in extreme cases with both approaches.

When we use an array approach, a global variable will be accessed as this.globalVar[55555]. Here is a bytecode sequence. Each access to an array element whose index is greater than 32767 requires one constant pool entry (sipush covers 0 to 32767 and does not use a constant pool entry).
So while we save one constant pool entry for the global variable, we require one constant pool entry for the index.

When we use an inner class approach, we still require a constant pool entry for accessing instance variables (e.g. this.inner001.globalVar55555) in one class.

@bdrillard how did your implementation (probably around here) avoid this issue?

This is because it is not necessary to split methods to access instance variables if the method size is not large. As a result, these usages of constant pool entries may cause the 64K constant pool entries problem again.

WDYT? cc @viirya @maropu

aload 0                        // load this
getfield [constant pool index] // load this.globalVar
ldc [constant pool index]      // load 55555 from constant pool and push it
iaload

Using an array

class Foo {
  int[] globalVars = new int[1024000];

  void apply0(InternalRow i) {
    globalVars[32768] = 1;
    globalVars[32769] = 1;
    ...
  }
  void apply1(InternalRow i) {
    globalVars[65535] = 1;
    globalVars[65536] = 1;
    ...
  }
  void apply2(InternalRow i) {
    ...
    globalVars[100000] = 1;  // 100000 - 32768 > 65535, i.e. may cause a constant pool entry overflow
    ...
  }
  void apply(InternalRow i) {
    globalVars[0] = 1;
    globalVars[1] = 1;
    ...
    apply0(i);
    apply1(i);
    apply2(i);
  }
}

Using global variables

class Foo {
  int globalVars0;
  int globalVars1;
  ...
  int globalVars32768;
  int globalVars32769;
  ...

  void apply0(InternalRow i) {
    globalVars32768 = 1;  // 32768 * 2 > 65535, i.e. may cause a constant pool entry overflow
    globalVars32769 = 1;
    ...
  }
  void apply(InternalRow i) {
    globalVars0 = 1;
    globalVars1 = 1;
    ...
    apply0(i);
  }
}

@cloud-fan
Contributor

Each access to an array element requires one constant pool entry.

what do you mean by this? using array can't reduce constant pool size at all?

@kiszk
Member

kiszk commented Nov 23, 2017

what do you mean by this? using array can't reduce constant pool size at all?

Not at all. If the array index for an access is less than 32768 (e.g. a[30000]), we can reduce constant pool size since no constant pool entry is required (the iconst_?, bipush, or sipush java bytecode is used).
However, when the array index for an access is greater than 32767 (e.g. a[40000]), we cannot reduce constant pool size since the access requires a constant pool entry. This is because an integer constant of 32768 or greater uses the ldc java bytecode instruction [ref][ref].

In summary, the following are the # of constant pool entries used for accessing an entry.

  • a[0] = 0...a[32767] = 0 : 0
  • a[32768] = 0... : 1
  • this.globalVar = 0: 3 (a Fieldref entry, a NameAndType entry, and a Utf8 entry for the field name)
    Omitted entries for class and type since they are common among global variables

Using an array delays reaching the limit (i.e. 65535), but the limit would eventually be hit in the extreme case.
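
The counts above can be turned into a rough estimate. The sketch below is only arithmetic over those per-access numbers; the class and method names are hypothetical, and it assumes every index from 0 to n-1 is accessed at least once and ignores entries shared by all variables.

```java
// Hypothetical estimate based on the per-access counts listed above.
final class ConstantPoolEstimate {
    static final int MAX_SIPUSH = 32767; // largest index sipush can push

    /** One Integer entry per distinct index above the sipush range. */
    static long arrayIndexEntries(long n) {
        return Math.max(0L, n - (MAX_SIPUSH + 1L));
    }

    /** Three entries per distinct field. */
    static long fieldEntries(long n) {
        return 3L * n;
    }
}
```

For 100000 variables this gives 67232 index entries for a single array, which already exceeds 65535, versus 300000 entries for flat fields.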

@mgaido91
Contributor

@kiszk thanks for your great analysis. May I have just a couple of additional questions?
1 - In all your tests, which compiler are you using? Because I see that you are linking to the Oracle docs and maybe you are using javac for your tests, but in my tests (made for other cases) I realized that janinoc works a bit differently and what is true for javac may not be for janinoc.
2 - If the problem with array occurs when we go beyond 32767, what about creating many arrays with max size 32767? I see that this is not a definitive solution and still we have some limitations, but dividing the number of constant pool entries by 32767 looks a very good achievement to me.
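
A minimal sketch of the multiple-arrays idea, under assumptions: the chunk size of 32768 (so the highest offset is 32767) and the names ChunkedSlots and mutableStateArrayN are chosen for illustration and are not taken from any implementation.

```java
// Hypothetical mapping from a global slot number to (array, offset) so
// that every offset stays within the range sipush can push.
final class ChunkedSlots {
    static final int CHUNK = 32768; // offsets 0..32767

    static int arrayNumber(int slot) {
        return slot / CHUNK;
    }

    static int offset(int slot) {
        return slot % CHUNK;
    }

    /** Accessor text that generated code would use for this slot. */
    static String accessor(int slot) {
        return "mutableStateArray" + arrayNumber(slot) + "[" + offset(slot) + "]";
    }
}
```

Each additional array still costs its own field entries in the constant pool, so this reduces the entry count rather than eliminating it.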

@viirya
Member

viirya commented Nov 24, 2017

The hybrid approach sounds reasonable to me. Any special strategy to use to decide which fields are global variables and which are in array?

@viirya
Member

viirya commented Nov 24, 2017

Btw, can we config the maximum number of global variables?

@viirya
Member

viirya commented Nov 24, 2017

When we use an inner class approach, we still require constant pool entry for accessing instance variables (e.g. this.inner001.globalVar55555) in one class.

@kiszk But we still can save the name/type and field name for global variable?

@mgaido91
Contributor

@viirya if you take a look at the example I posted, you can see that we are not saving either NameAndType or Fieldref, thus I think the only solution to save constant pool entries we have found so far is to use arrays.

What may be interesting IMHO, is to evaluate where we are using a variable. Since when we have a lot of instance variables we are very likely to have also several inner classes (for splitting the methods), I think it would be great if we were able to declare variables which are used only in an inner class in that inner class. Unfortunately, I think also that this is not trivial to achieve at all. @kiszk what do you think?

@viirya
Member

viirya commented Nov 24, 2017

@mgaido91 Thanks. I looked at the constant pool you posted. It's clear.

Any benefit to declare the variables in the inner classes? Looks like they still occupy constant pool entries?

@mgaido91
Contributor

@viirya in general there is no benefit. There will be a benefit if we manage to declare them where they are used, but I am not sure this is feasible. In this way, they do not add any entry to the constant pool.

For instance, if we have an inner class InnerClass1 and we use isNull_11111 only there, if we define isNull_11111 as a variable of InnerClass1 instead of a variable of the outer class we have no entry about it in the outer class.
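
As a small illustration of that layout (a hand-written sketch, not generated code; OuterClassSketch, eval, and apply are hypothetical names), the field lives in the inner class and the outer class never names it:

```java
// Hypothetical layout: isNull_11111 is declared in the only class that
// uses it, so the outer class contains no reference to that field.
final class OuterClassSketch {
    private final InnerClass1 inner1 = new InnerClass1();

    private final class InnerClass1 {
        private boolean isNull_11111; // declared where it is used

        boolean eval(int v) {
            isNull_11111 = (v < 0);
            return isNull_11111;
        }
    }

    /** The outer class only calls the inner method. */
    boolean apply(int v) {
        return inner1.eval(v);
    }
}
```

The outer class still pays for the reference to inner1 and for the call to eval, so the saving applies only to variables that are truly local to one inner class.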

@cloud-fan
Contributor

For the strategy, I'd like to give priority to primitive values to be in flat global variables. We also need to decide the priority between primitive types, according to which type has the largest performance difference between flat global variable and array, and which type is used more frequently (maybe boolean).

@kiszk
Member

kiszk commented Nov 24, 2017

First of all, I have to share sad news with you.
janino does not use sipush for values from 128 to 32767. Current janino 3.0.7 uses iconst..., bipush, or ldc. javac uses sipush for values from 128 to 32767. In other words, if the index is greater than 127, one constant pool entry is used by bytecode compiled by janino. It should be fixed.

Based on the analysis in the next comment, the array approach still uses fewer constant pool entries than the other approaches.

public class Array {
  int[] a = new int[1000000];

  void access() {
    a[5] = 0;
    a[6] = 0;
    a[127] = 0;
    a[128] = 0;
    a[1023] = 0;
    a[16383] = 0;
    a[32767] = 0;
    a[32768] = 0;
  }

  static public void main(String[] argv) {
    Array a = new Array();
    a.access();
  }
}
  void access();
    descriptor: ()V
    flags:
    Code:
      stack=3, locals=1, args_size=1
         0: aload_0
         1: getfield      #12                 // Field a:[I
         4: iconst_5
         5: iconst_0
         6: iastore
         7: aload_0
         8: getfield      #12                 // Field a:[I
        11: bipush        6
        13: iconst_0
        14: iastore
        15: aload_0
        16: getfield      #12                 // Field a:[I
        19: bipush        127
        21: iconst_0
        22: iastore
        23: aload_0
        24: getfield      #12                 // Field a:[I
        27: ldc           #13                 // int 128
        29: iconst_0
        30: iastore
        31: aload_0
        32: getfield      #12                 // Field a:[I
        35: ldc           #14                 // int 1023
        37: iconst_0
        38: iastore
        39: aload_0
        40: getfield      #12                 // Field a:[I
        43: ldc           #15                 // int 16383
        45: iconst_0
        46: iastore
        47: aload_0
        48: getfield      #12                 // Field a:[I
        51: ldc           #16                 // int 32767
        53: iconst_0
        54: iastore
        55: aload_0
        56: getfield      #12                 // Field a:[I
        59: ldc           #17                 // int 32768
        61: iconst_0
        62: iastore
        63: return

Constant pool:
   #1 = Utf8               Array
   #2 = Class              #1             // Array
   #9 = Utf8               a
  #10 = Utf8               [I
  #11 = NameAndType        #9:#10         // a:[I
  #12 = Fieldref           #2.#11         // Array.a:[I

  #13 = Integer            128
  #14 = Integer            1023
  #15 = Integer            16383
  #16 = Integer            32767
  #17 = Integer            32768

@kiszk
Member

kiszk commented Nov 24, 2017

Next, I analyzed usage of constant pool entries and java bytecode ops using janinoc. The summary is as follows:

array[4]     : 6 + 0 * n entries, 6-8 java bytecode ops / read access
outerInstance: 3 + 3 * n entries, 5 java bytecode ops / read access
innerInstance: 9 + 3 * n entries, 6 java bytecode ops / read access
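
Read as "fixed + perVariable * n", the summary formulas give a rough upper bound on how many variables fit before the 65535-entry limit. The sketch below only does that arithmetic; PoolBudget and its methods are hypothetical, and every other entry a real generated class needs is ignored.

```java
// Hypothetical arithmetic over the summary formulas above.
final class PoolBudget {
    static final int LIMIT = 65535; // constant pool entry limit

    /** Entries used by n variables under a "fixed + perVar * n" model. */
    static long entries(int fixed, int perVar, long n) {
        return fixed + (long) perVar * n;
    }

    /** Largest n that stays within LIMIT; requires perVar > 0. */
    static long maxVars(int fixed, int perVar) {
        return (LIMIT - fixed) / perVar;
    }
}
```

Under this model, flat outer variables top out at 21844 and inner-class variables at 21842, while the array form stays at 6 entries as long as the indices need no constants of their own.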

Source program

public class CP {
  int[] a = new int[1000000];
  int globalVar0;
  int globalVar1;
  private Inner inner = new Inner();

  private class Inner {
    int nestedVar0;
    int nestedVar1;
  }

  void access() {
    a[4] = 0;
    a[5] = 0;

    globalVar0 = 0;
    globalVar1 = 0;

    inner.nestedVar0 = 0;
    inner.nestedVar1 = 0;
  }

  static public void main(String[] argv) {
    CP cp = new CP();
    cp.access();
  }
}

Java bytecode

  void access();
    descriptor: ()V
    Code:
      stack=3, locals=1, args_size=1
         0: aload_0
         1: getfield      #12                 // Field a:[I
         4: iconst_4
         5: iconst_0
         6: iastore
         7: aload_0
         8: getfield      #12                 // Field a:[I
        11: iconst_5
        12: iconst_0
        13: iastore

        14: aload_0
        15: iconst_0
        16: putfield      #16                 // Field globalVar0:I
        19: aload_0
        20: iconst_0
        21: putfield      #19                 // Field globalVar1:I

        24: aload_0
        25: getfield      #23                 // Field inner:LCP$Inner;
        28: iconst_0
        29: putfield      #28                 // Field CP$Inner.nestedVar0:I
        32: aload_0
        33: getfield      #23                 // Field inner:LCP$Inner;
        36: iconst_0
        37: putfield      #31                 // Field CP$Inner.nestedVar1:I
        40: return

Constant pool

   #1 = Utf8               CP
   #2 = Class              #1             // CP
   #9 = Utf8               a
  #10 = Utf8               [I
  #11 = NameAndType        #9:#10         // a:[I
  #12 = Fieldref           #2.#11         // CP.a:[I

  #13 = Utf8               globalVar0
  #14 = Utf8               I
  #15 = NameAndType        #13:#14        // globalVar0:I
  #16 = Fieldref           #2.#15         // CP.globalVar0:I

  #17 = Utf8               globalVar1
  #18 = NameAndType        #17:#14        // globalVar1:I
  #19 = Fieldref           #2.#18         // CP.globalVar1:I

  #20 = Utf8               inner
  #21 = Utf8               LCP$Inner;
  #22 = NameAndType        #20:#21        // inner:LCP$Inner;
  #23 = Fieldref           #2.#22         // CP.inner:LCP$Inner;

  #24 = Utf8               CP$Inner
  #25 = Class              #24            // CP$Inner
  #26 = Utf8               nestedVar0
  #27 = NameAndType        #26:#14        // nestedVar0:I
  #28 = Fieldref           #25.#27        // CP$Inner.nestedVar0:I
  
  #29 = Utf8               nestedVar1
  #30 = NameAndType        #29:#14        // nestedVar1:I
  #31 = Fieldref           #25.#30        // CP$Inner.nestedVar1:I

  #32 = Utf8               LineNumberTable
  #33 = Utf8               Code
  #34 = Utf8               main
  #35 = Utf8               ([Ljava/lang/String;)V
  #36 = Utf8               <init>
  #37 = NameAndType        #36:#8         // "<init>":()V
  #38 = Methodref          #2.#37         // CP."<init>":()V
  #39 = NameAndType        #7:#8          // access:()V
  #40 = Methodref          #2.#39         // CP.access:()V
  #41 = Methodref          #4.#37         // java/lang/Object."<init>":()V
  #42 = Integer            1000000
  #43 = Utf8               (LCP;)V
  #44 = NameAndType        #36:#43        // "<init>":(LCP;)V
  #45 = Methodref          #25.#44        // CP$Inner."<init>":(LCP;)V
  #46 = Utf8               Inner
  #47 = Utf8               InnerClasses

@kiszk
Member

kiszk commented Nov 24, 2017

Here is a PR for janino to fix a problem regarding sipush.

@viirya
Member

viirya commented Nov 24, 2017

The PR looks good.

Although it can relieve the constant pool pressure, I have a question. Does it mean that every time a constant in the short range is used in code, the bytecode grows by 1 byte, because sipush is followed by a two-byte value? I'm afraid it may hurt some cases, such as a java program with many short constants.

For our usage, will it also increase the bytecode size of methods and possibly break the limit?

@kiszk
Member

kiszk commented Nov 24, 2017

In the int case, the following are the java bytecode lengths needed to push a value

1 byte: -1, 0, 1, 2, 3, 4, 5 (iconst_?)
2 bytes: -128 ~ -2, 6 ~ 127 (bipush)
3 bytes: -32768 ~ -129, 128 ~ 32767 (sipush)
2 or 3 bytes: others (ldc or ldc_w)
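
The selection rule in this table can be sketched as a small function. This is an illustration of the javac-style choice described in the thread, not code from javac or janino; IntPush and its methods are hypothetical names.

```java
// Hypothetical sketch of choosing the instruction that pushes an int.
final class IntPush {
    /** Instruction family used to push v, following the table above. */
    static String opcode(int v) {
        if (v >= -1 && v <= 5) {
            return "iconst"; // iconst_m1 .. iconst_5
        }
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) {
            return "bipush";
        }
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) {
            return "sipush";
        }
        return "ldc"; // ldc_w when the constant pool index exceeds 255
    }

    /** Encoded length in bytes; ldc is reported as 2 (ldc_w would be 3). */
    static int length(int v) {
        switch (opcode(v)) {
            case "iconst": return 1;
            case "bipush": return 2;
            case "sipush": return 3;
            default: return 2;
        }
    }

    /** Only the ldc family needs an Integer entry in the constant pool. */
    static boolean needsConstantPoolEntry(int v) {
        return opcode(v).equals("ldc");
    }
}
```

So sipush costs one more byte of code than ldc but avoids the constant pool entry, which is the trade-off raised in the question above.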

@kiszk
Member

kiszk commented Nov 24, 2017

I created and ran another synthetic benchmark program for comparing flat global variables, inner global variables, and array using janinoc for target Java file.
The performance is not much different from the previous one. In summary, the following are the relative performance results (smaller is better).

  • 1: array
  • 0.90: inner global variables
  • 0.73: flat global variables
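
These ratios are consistent with dividing each AVG value reported in the results by the Array AVG. A minimal sketch of that normalization, with a hypothetical helper name:

```java
// Hypothetical helper: time relative to a baseline, rounded to 2 decimals.
final class Normalize {
    static double relative(double time, double baseline) {
        return Math.round(time / baseline * 100.0) / 100.0;
    }
}
```

For example, relative(435975.250, 483532.530) gives 0.90 for InnerVars and relative(354362.887, 483532.530) gives 0.73 for Vars; the BEST values lead to the same rounded ratios.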

WDYT? Any comments are much appreciated.

Here are Test.java and myInstance.py that I used.

$ cat /proc/cpuinfo | grep "model name" | uniq
model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
$ python myInstance.py > MyInstance.java && janinoc MyInstance.java  && javac Test.java && java -Xmx16g Test

Result(us): Array
   0: 484251.446
   1: 483374.255
   2: 483956.692
   3: 482498.241
   4: 483602.261
   5: 482654.567
   6: 482896.671
   7: 483458.625
   8: 483194.317
   9: 483387.234
  10: 484103.729
  11: 483536.493
  12: 483790.828
  13: 483590.991
  14: 483993.488
  15: 483455.164
  16: 484040.009
  17: 483225.837
  18: 483126.520
  19: 484105.989
  20: 484988.935
  21: 483766.245
  22: 483667.930
  23: 483271.499
  24: 483071.606
  25: 483174.438
  26: 483602.474
  27: 483210.405
  28: 483907.061
  29: 483071.964
BEST: 482498.241000, AVG: 483532.530

Result(us): InnerVars
   0: 437016.533
   1: 436125.481
   2: 436360.534
   3: 435857.758
   4: 436166.243
   5: 437089.913
   6: 436168.359
   7: 435570.397
   8: 435550.848
   9: 435256.088
  10: 435252.679
  11: 435765.156
  12: 435646.739
  13: 437303.993
  14: 435315.530
  15: 435752.545
  16: 434857.606
  17: 436776.190
  18: 435444.877
  19: 435657.649
  20: 436248.147
  21: 436322.998
  22: 437214.262
  23: 435907.223
  24: 435431.025
  25: 435274.317
  26: 435412.202
  27: 435670.321
  28: 436494.045
  29: 436347.838
BEST: 434857.606, AVG: 435975.250

Result(us): Vars
   0: 353983.048
   1: 354067.690
   2: 353138.178
   3: 354093.115
   4: 354067.180
   5: 352750.571
   6: 353672.510
   7: 355179.115
   8: 353296.750
   9: 354522.113
  10: 355221.301
  11: 355178.172
  12: 353859.319
  13: 353539.817
  14: 352703.352
  15: 353923.981
  16: 354442.744
  17: 355523.145
  18: 354849.122
  19: 354082.888
  20: 354673.504
  21: 355526.218
  22: 355264.029
  23: 355455.492
  24: 355520.322
  25: 353923.520
  26: 353796.600
  27: 355021.849
  28: 355800.387
  29: 353810.567
BEST: 352703.352, AVG: 354362.887

@kiszk
Member

kiszk commented Nov 24, 2017

In #19811, first, I will take a hybrid approach of outer (flat) global variable and arrays. The threshold for # of global variables would be configurable.

I will give high priority to primitive variables to place them at the outer class due to performance.

I think it would be great if we were able to declare variables which are used only in an inner class in that inner class. Unfortunately, I think also that this is not trivial to achieve at all.
It would be great if we could do this. For now, #19811 will not address this.

@viirya
Member

viirya commented Nov 25, 2017

4 or 5 bytes: others (ldc or ldc_w)

I think ldc is 2 bytes and ldc_w is 3 bytes?

@kiszk
Member

kiszk commented Nov 25, 2017

I think ldc is 2 bytes and ldc_w is 3 bytes?

You are right, thanks, updated.

@kiszk
Member

kiszk commented Nov 26, 2017

The good news is that the issue in janino has been quickly fixed. The bad news is that there is no official release date for the next version yet.

ghost pushed a commit to dbtsai/spark that referenced this pull request Dec 7, 2017
## What changes were proposed in this pull request?

This PR upgrades Janino to version 3.0.8. [Janino 3.0.8](https://janino-compiler.github.io/janino/changelog.html) includes an important fix that reduces the number of constant pool entries by using the `sipush` Java bytecode instruction.

* SIPUSH bytecode is not used for short integer constant [apache#33](janino-compiler/janino#33).

For details, see [this discussion thread](apache#19518 (comment)).

## How was this patch tested?

Existing tests

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes apache#19890 from kiszk/SPARK-22688.
asfgit pushed a commit that referenced this pull request Dec 7, 2017
This PR upgrades Janino to version 3.0.8. [Janino 3.0.8](https://janino-compiler.github.io/janino/changelog.html) includes an important fix that reduces the number of constant pool entries by using the `sipush` Java bytecode instruction.

* SIPUSH bytecode is not used for short integer constant [#33](janino-compiler/janino#33).

For details, see [this discussion thread](#19518 (comment)).

Existing tests

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes #19890 from kiszk/SPARK-22688.

(cherry picked from commit 8ae004b)
Signed-off-by: Sean Owen <sowen@cloudera.com>
asfgit pushed a commit that referenced this pull request Dec 8, 2017
@AmplabJenkins

Can one of the admins verify this patch?

ghost pushed a commit to dbtsai/spark that referenced this pull request Dec 19, 2017
…ies for mutable state

## What changes were proposed in this pull request?

This PR is a follow-on to apache#19518. It tries to reduce the number of constant pool entries used for accessing mutable state.
There are two directions:
1. Primitive-type variables should be allocated in the outer class for better performance. Otherwise, this PR allocates an array.
2. The length of each allocated array is at most 32768, so that an element access (e.g. `mutableStateArray[32767]`) does not need a constant pool entry.
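
The second direction can be sketched as follows. This is a simplified illustration, not the code in Spark's `CodeGenerator`: the class `CompactedStateAllocator`, its methods, and the `mutableStateArray_N` naming scheme are assumptions made for this example. The idea is that every element index stays at or below 32767, which `sipush` can encode without a constant pool entry, so a new array is started once the current one for a given type is full.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: hands out array slots per Java type and starts a new
// array whenever the current one reaches the size limit.
final class CompactedStateAllocator {
    static final int ARRAY_SIZE_LIMIT = 32768; // highest index is 32767

    private static final class TypedArray {
        final String javaType;
        final String name;
        int used = 0;

        TypedArray(String javaType, String name) {
            this.javaType = javaType;
            this.name = name;
        }
    }

    private final Map<String, TypedArray> currentByType = new HashMap<>();
    private final List<TypedArray> allArrays = new ArrayList<>();

    // Returns the access expression for a newly allocated slot, e.g. "mutableStateArray_0[3]".
    String allocate(String javaType) {
        TypedArray current = currentByType.get(javaType);
        if (current == null || current.used == ARRAY_SIZE_LIMIT) {
            current = new TypedArray(javaType, "mutableStateArray_" + allArrays.size());
            currentByType.put(javaType, current);
            allArrays.add(current);
        }
        return current.name + "[" + (current.used++) + "]";
    }

    // Returns one field declaration per array, sized to the slots actually used.
    List<String> declarations() {
        List<String> out = new ArrayList<>();
        for (TypedArray a : allArrays) {
            out.add("private " + a.javaType + "[] " + a.name
                + " = new " + a.javaType + "[" + a.used + "];");
        }
        return out;
    }
}
```

Allocating 32769 slots of the same type with this sketch yields two arrays: the first holds indices 0 through 32767, and the extra slot becomes index 0 of the second.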

The following discussions led to these directions:
1. [[1]](apache#19518 (comment)), [[2]](apache#19518 (comment)), [[3]](apache#19518 (comment)), [[4]](apache#19518 (comment)), [[5]](apache#19518 (comment))
2. [[6]](apache#19518 (comment)), [[7]](apache#19518 (comment)), [[8]](apache#19518 (comment))

This PR modifies the `addMutableState` function in the `CodeGenerator` to check whether the declared state can be compacted into an array. We identify three types of state that cannot be compacted:

- Primitive-type state (ints, booleans, etc.), as long as the number of such variables does not exceed the threshold
- Multi-dimensional array types
- State added with `inline = true`
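
The three rules above amount to a small placement decision per declared variable. The following is a rough sketch of that decision, not the actual `addMutableState` implementation: the class `StatePlacement`, the `Placement` enum, and the constructor parameter are hypothetical, and counting only inlined primitives against the threshold is an assumption taken from the wording of the first rule.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the inline-versus-array decision described above.
final class StatePlacement {
    enum Placement { INLINE_FIELD, ARRAY_SLOT }

    private static final Set<String> PRIMITIVES = new HashSet<>(Arrays.asList(
        "boolean", "byte", "short", "char", "int", "long", "float", "double"));

    private final int primitiveThreshold; // assumed to be configurable
    private int inlinedPrimitives = 0;

    StatePlacement(int primitiveThreshold) {
        this.primitiveThreshold = primitiveThreshold;
    }

    Placement place(String javaType, boolean forceInline) {
        // Explicitly inlined state and multi-dimensional arrays are never compacted.
        if (forceInline || javaType.endsWith("[][]")) {
            return Placement.INLINE_FIELD;
        }
        // Primitives stay as outer-class fields until the threshold is reached.
        if (PRIMITIVES.contains(javaType) && inlinedPrimitives < primitiveThreshold) {
            inlinedPrimitives++;
            return Placement.INLINE_FIELD;
        }
        // Everything else is compacted into an array of its type.
        return Placement.ARRAY_SLOT;
    }
}
```

With a threshold of 2, for example, the first two `int` or `boolean` variables become fields and the third goes into an array, while an object type such as `UTF8String` is compacted from the start unless it is forced inline.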

When `useFreshName = false`, the given name is used.

Much of the code was ported from apache#19518, and a lot of effort went into that work. I think this PR should be credited to bdrillard.

With this PR, the following code is generated:
```
/* 005 */ class SpecificMutableProjection extends org.apache.spark.sql.catalyst.expressions.codegen.BaseMutableProjection {
/* 006 */
/* 007 */   private Object[] references;
/* 008 */   private InternalRow mutableRow;
/* 009 */   private boolean isNull_0;
/* 010 */   private boolean isNull_1;
/* 011 */   private boolean isNull_2;
/* 012 */   private int value_2;
/* 013 */   private boolean isNull_3;
...
/* 10006 */   private int value_4999;
/* 10007 */   private boolean isNull_5000;
/* 10008 */   private int value_5000;
/* 10009 */   private InternalRow[] mutableStateArray = new InternalRow[2];
/* 10010 */   private boolean[] mutableStateArray1 = new boolean[7001];
/* 10011 */   private int[] mutableStateArray2 = new int[1001];
/* 10012 */   private UTF8String[] mutableStateArray3 = new UTF8String[6000];
/* 10013 */
...
/* 107956 */     private void init_176() {
/* 107957 */       isNull_4986 = true;
/* 107958 */       value_4986 = -1;
...
/* 108004 */     }
...
```

## How was this patch tested?

Added a new test case to `GeneratedProjectionSuite`

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes apache#19811 from kiszk/SPARK-18016.
@maropu
Member

maropu commented Dec 20, 2017

@bdrillard can you close this pr?

@bdrillard
Author

This PR was addressed by #19811, closing this one.

@bdrillard bdrillard closed this Dec 20, 2017
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
7 participants