-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-18016][SQL][CATALYST] Code Generation: Constant Pool Limit - State Compaction #19518
Conversation
val exprs = transformFunctions(functions.map(name => | ||
s"$name(${arguments.map(_._2).mkString(", ")})")) | ||
|
||
splitExpressions(exprs, funcName, arguments) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for creating a PR for the latest Spark. I think that it is great to reduce # of constant pool entries. I have one high level comment. What do you think about putting mutable states into arrays (i.e. performing mutable state compaction) only when there are many mutable states or only for certain mutable states that are rarely accessed? What do you think? |
@kiszk You are correct that the current implementation compacts all mutable state (where the state does not have to be explicitly inlined). To your last question, I'd attempted some analysis of the JVM bytecode of array versus inlined state initialized either through method calls or in loops. I'd posted the experiment and results: https://github.com/bdrillard/bytecode-poc If Spark has its own benchmarking tools, I'd be happy to use those to compare Catalyst-generated classes further. To the general question of when we compact state, I think some kind of threshold still does makes sense. It would be best to ensure that the typical code path (for typical Dataset schemas) remains un-impacted by the changes (as was the aim when generating nested classes in #18075). I've found trying to set a global threshold for when to compact mutable state can be hard. Some state has to be inlined (state that uses parameterized constructors that can't be easily initialized with loops, like the It's difficult to tell when a certain piece of state will be referenced frequently or infrequently. For example, we do know some pieces of primitive mutable state, like global booleans that are part of conditional checks, are initialized globally, assigned once in one method, and then referenced only once in a separate caller method. These are excellent candidates for compaction, since they proliferate very quickly and are, in a sense, "only used once" (declared, initialized, re-assigned in a method, accessed in another method, never used again). Other pieces of state, like row objects, and JavaBean objects, will be accessed a number of times relative to how many fields they have, which isn't necessarily easy info to retrieve during code generation (we'd have to reflect or do inspection of the initialization code to know how many fields such an object has). But these items are probably still good candidates for compaction in general because of how many of a given type there could be. I'm inclined to use a threshold against the name/types of the state, rather than a global threshold. Since |
@bdrillard I remember that we had the similar discussion about benchmarking. Could you see this discussion? |
@kiszk Ah, thanks for the link back to that discussion. I'll make modifications to the trials for better data. |
@bdrillard since my PR and other get merged now there are some conflicts, may you please fix them? Thanks. |
@@ -801,12 +801,12 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String | |||
private[this] def castToByteCode(from: DataType, ctx: CodegenContext): CastFunction = from match { | |||
case StringType => | |||
val wrapper = ctx.freshName("wrapper") | |||
ctx.addMutableState("UTF8String.IntWrapper", wrapper, | |||
val wrapperAccessor = ctx.addMutableState("UTF8String.IntWrapper", wrapper, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to have something like
val wrapper = ctx.addMutableState("UTF8String.IntWrapper", v => s"$v = new UTF8String.IntWrapper();")
ping @bdrillard |
@cloud-fan I want to take this over if possible |
yea, ok @kiszk I'll review your work. |
@cloud-fan Is it better to use this PR? Or, create a new PR? |
ping @cloud-fan |
let's create a new PR |
OK, I will create a new PR |
Thanks for giving this the attention to shepard it on through. I haven't had the time to do the additional coding work necessary to properly benchmark it in the last few weeks. @kiszk, if there are any questions in regards to my earlier implementation as you make/review the second PR, I'm happy to make clarifications and would be able to respond to those in writing quickly. |
@bdrillard thanks! |
@bdrillard @cloud-fan @maropu Thus, I would like to use an approach to create inner classes to keep in scalar instance variables. Here are Test.java and myInsntance.py that I used.
Small MyInstance.java (N = 16, M = 4)
|
I'd prefer inner class approach. |
You are comparing array vs member variables, can we compare array vs inner class member variable? And too many classes will have overhead on the classloader, we should test some extreme cases like 1 million variables. |
@cloud-fan you are right, I am updating benchmark program and results. When we use an array approach, a global variable will be accessed by~ When we use an inner class approach, we still require constant pool entry for accessing instance variables (e.g. `this.inner001.globalVar55555) in one class. @bdrillard how did your implementation (probably around here) avoid this issue? This is because it is not necessary to split methods to access instance variables if method size is not large. As a result, these usages of constant pool entries may cause 64K constant pool entries problem again.
Using an array
Using global variables
|
what do you mean by this? using array can't reduce constant pool size at all? |
Not at all. If array index for an access is less than 32768 (e.g. In summary, the followings are # of constant pool entries to be used for accessing an entry.
To use an array can make it slow to reach the limit (i.e. 65535), but it would eventually occur in the extreme case. |
@kiszk thanks for your great analysis. May I have just a couple of additional questions? |
The hybrid approach sounds reasonable to me. Any special strategy to use to decide which fields are global variables and which are in array? |
Btw, can we config the maximum number of global variables? |
@kiszk But we still can save the name/type and field name for global variable? |
@viirya if you take a look at the example I posted, you can see that we are not saving either What may be interesting IMHO, is to evaluate where we are using a variable. Since when we have a lot of instance variables we are very likely to have also several inner classes (for splitting the methods), I think it would be great if we were able to declare variables which are used only in an inner class in that inner class. Unfortunately, I think also that this is not trivial to achieve at all. @kiszk what do you think? |
@mgaido91 Thanks. I looked at the constant pool you posted. It's clear. Any benefit to declare the variables in the inner classes? Looks like they still occupy constant pool entries? |
@viirya in general there is no benefit. There will be a benefit if we manage to declare them where they are used, but I am not sure this is feasible. In this way, they do not add any entry to the constant pool. For instance, if we have a inner class |
For the strategy, I'd like to give priority to primitive values to be in flat global variables. We also need to decide the priority between primitive types, according to which type has largest performance difference between flat global variable and array, and which type is used more frequently(may be boolean). |
First of all, I have to share sad news with you. Based on the analysis in the next comment, the array approach still uses less constant pool entry than other approach.
|
Next, I analyzed usage of constant pool entries and java bytecode ops using
Source program
Java bytecode
Constant pool
|
Here is a PR for |
The PR looks good. Although it can solve the constant pool pressure, however, I have a question. Does it mean every time a constant falling in short range is used in codes, we increase 1 byte in bytecodes because sipush is followed by two byte value? I'm afraid it may be negative to some cases like many short constants in a java program. For our usage, will it increase the bytecode size of methods too and break possible limit? |
In int case, the followings are the number of java byte code length for a value 1 byte: -1, 0, 1, 2, 3, 4, 5 (iconst_?) |
I created and ran another synthetic benchmark program for comparing flat global variables, inner global variables, and array using janinoc for target Java file.
WDYT? Any comments are very appreciated. Here are Test.java and myInsntance.py that I used.
|
In #19811, first, I will take a hybrid approach of outer (flat) global variable and arrays. The threshold for # of global variables would be configurable. I will give high priority to primitive variables to place them at the outer class due to performance.
|
I think ldc is 2 bytes and ldc_w is 3 bytes? |
You are right, thanks, updated. |
Good new is the issue in janino has been quickly fixed. Bad new is no official date to release the next version now. |
## What changes were proposed in this pull request? This PR upgrade Janino version to 3.0.8. [Janino 3.0.8](https://janino-compiler.github.io/janino/changelog.html) includes an important fix to reduce the number of constant pool entries by using 'sipush' java bytecode. * SIPUSH bytecode is not used for short integer constant [apache#33](janino-compiler/janino#33). Please see detail in [this discussion thread](apache#19518 (comment)). ## How was this patch tested? Existing tests Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes apache#19890 from kiszk/SPARK-22688.
This PR upgrade Janino version to 3.0.8. [Janino 3.0.8](https://janino-compiler.github.io/janino/changelog.html) includes an important fix to reduce the number of constant pool entries by using 'sipush' java bytecode. * SIPUSH bytecode is not used for short integer constant [#33](janino-compiler/janino#33). Please see detail in [this discussion thread](#19518 (comment)). Existing tests Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #19890 from kiszk/SPARK-22688. (cherry picked from commit 8ae004b) Signed-off-by: Sean Owen <sowen@cloudera.com>
This PR upgrade Janino version to 3.0.8. [Janino 3.0.8](https://janino-compiler.github.io/janino/changelog.html) includes an important fix to reduce the number of constant pool entries by using 'sipush' java bytecode. * SIPUSH bytecode is not used for short integer constant [#33](janino-compiler/janino#33). Please see detail in [this discussion thread](#19518 (comment)). Existing tests Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #19890 from kiszk/SPARK-22688. (cherry picked from commit 8ae004b) Signed-off-by: Sean Owen <sowen@cloudera.com>
Can one of the admins verify this patch? |
…ies for mutable state ## What changes were proposed in this pull request? This PR is follow-on of apache#19518. This PR tries to reduce the number of constant pool entries used for accessing mutable state. There are two directions: 1. Primitive type variables should be allocated at the outer class due to better performance. Otherwise, this PR allocates an array. 2. The length of allocated array is up to 32768 due to avoiding usage of constant pool entry at access (e.g. `mutableStateArray[32767]`). Here are some discussions to determine these directions. 1. [[1]](apache#19518 (comment)), [[2]](apache#19518 (comment)), [[3]](apache#19518 (comment)), [[4]](apache#19518 (comment)), [[5]](apache#19518 (comment)) 2. [[6]](apache#19518 (comment)), [[7]](apache#19518 (comment)), [[8]](apache#19518 (comment)) This PR modifies `addMutableState` function in the `CodeGenerator` to check if the declared state can be easily initialized compacted into an array. We identify three types of states that cannot compacted: - Primitive type state (ints, booleans, etc) if the number of them does not exceed threshold - Multiple-dimensional array type - `inline = true` When `useFreshName = false`, the given name is used. Many codes were ported from apache#19518. Many efforts were put here. I think this PR should credit to bdrillard With this PR, the following code is generated: ``` /* 005 */ class SpecificMutableProjection extends org.apache.spark.sql.catalyst.expressions.codegen.BaseMutableProjection { /* 006 */ /* 007 */ private Object[] references; /* 008 */ private InternalRow mutableRow; /* 009 */ private boolean isNull_0; /* 010 */ private boolean isNull_1; /* 011 */ private boolean isNull_2; /* 012 */ private int value_2; /* 013 */ private boolean isNull_3; ... /* 10006 */ private int value_4999; /* 10007 */ private boolean isNull_5000; /* 10008 */ private int value_5000; /* 10009 */ private InternalRow[] mutableStateArray = new InternalRow[2]; /* 10010 */ private boolean[] mutableStateArray1 = new boolean[7001]; /* 10011 */ private int[] mutableStateArray2 = new int[1001]; /* 10012 */ private UTF8String[] mutableStateArray3 = new UTF8String[6000]; /* 10013 */ ... /* 107956 */ private void init_176() { /* 107957 */ isNull_4986 = true; /* 107958 */ value_4986 = -1; ... /* 108004 */ } ... ``` ## How was this patch tested? Added a new test case to `GeneratedProjectionSuite` Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes apache#19811 from kiszk/SPARK-18016.
@bdrillard can you close this pr? |
This PR was addressed by #19811, closing this one. |
This PR upgrade Janino version to 3.0.8. [Janino 3.0.8](https://janino-compiler.github.io/janino/changelog.html) includes an important fix to reduce the number of constant pool entries by using 'sipush' java bytecode. * SIPUSH bytecode is not used for short integer constant [apache#33](janino-compiler/janino#33). Please see detail in [this discussion thread](apache#19518 (comment)). Existing tests Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes apache#19890 from kiszk/SPARK-22688. (cherry picked from commit 8ae004b) Signed-off-by: Sean Owen <sowen@cloudera.com>
What changes were proposed in this pull request?
This PR is the part two followup to #18075, meant to address SPARK-18016, Constant Pool limit exceptions. Part 1 implemented
NestedClass
code splitting, in which excess code was split off into nested private sub-classes of theOuterClass
. In Part 2 we address excess mutable state, in which the number of inlined variables declared at the top of theOuterClass
can also exceed the constant pool limit.Here, we modify the
addMutableState
function in theCodeGenerator
to check if the declared state can be easily initialized compacted into an array and initialized in loops rather than inlined and initialized with its own line of code. We identify four types of state that can compacted:null
With mutable state compaction, at the top of the class we generate array declarations like:
and these arrays are initialized in loops as:
For compacted mutable state,
addMutableState
returns an array accessor value, which is then referenced in the subsequent generated code.Note: some state cannot be easily compacted (except without perhaps deeper changes to generating code), as some state value names are taken for granted at the global level during code generation (see
CatalystToExternalMap
inObjects
as an example). For this state, we provide aninline
hint to the function call, which indicates that the state should be inlined to theOuterClass
. Still, the state we can easily compact manages to reduce the Constant Pool to an tractable size for the wide/deeply nested schemas I was able to test against.How was this patch tested?
Tested against several complex schema types, also added a test case generating 40,000 string columns and creating the
UnsafeProjection
.