-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce hash table switch strategy #895
Conversation
double minDensity = 1 / Math.sqrt(strategy.getAverageEffort() * tableSwitchBonus); | ||
|
||
Optional<Hasher> hasher = Optional.empty(); | ||
if (supportsHashTableSwitch()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a supportsHashTableSwitch()
seems odd. Could we instead add a method that returns the hasher and the default implementation just returns Optional.empty()
?
Yes, that's ok. Especially since this is not a feature but an optimization. Some things are only implemented for AMD64. We just need to find a good way to abstract it (see my comment). |
@@ -88,6 +90,8 @@ | |||
|
|||
public static class Options { | |||
// @formatter:off | |||
@Option(help = "", type = OptionType.Expert) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New options must not have an empty help
value. That said, it would be best to avoid adding this option altogether. Can you not tune the heuristic to choose the hashing strategy?
compiler/src/org.graalvm.compiler.lir/src/org/graalvm/compiler/lir/hashing/HashFunction.java
Outdated
Show resolved
Hide resolved
compiler/src/org.graalvm.compiler.lir/src/org/graalvm/compiler/lir/hashing/HashFunction.java
Outdated
Show resolved
Hide resolved
compiler/src/org.graalvm.compiler.lir/src/org/graalvm/compiler/lir/hashing/Hasher.java
Show resolved
Hide resolved
masm.jmp(scratchReg); | ||
|
||
// Inserting padding so that jump table address is 4-byte aligned | ||
if ((masm.position() & 0x3) != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be masm.align(4). I guess this is a copy paste from the existing TableSwitchOp, so maybe fix it too.
masm.leaq(scratchReg, new AMD64Address(AMD64.rip, 0)); | ||
final int afterLea = masm.position(); | ||
|
||
// Jump to default target if index doesn't contain the expected value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments explaining the table structure would be helpful and maybe merge the defaultTarget != null ifs.
|
||
// Jump to default target if index doesn't contain the expected value | ||
if (defaultTarget != null) { | ||
masm.movslq(valueScratchReg, new AMD64Address(scratchReg, idxScratchReg, Scale.Times8, 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might consider loading 8 bytes here instead of issuing two loads. If you put key is in the low 32 bits you can still use cmpl and then just perform a shift instead of the second load. You'd want to align the table at 8 bytes instead which might be a good idea in any case when you have both key and offset in the table. Having a single load would also allow you to eliminate the move of indexReg to idxScratchReg since you'd only have a single use of it.
} | ||
} | ||
if (candidates.isEmpty()) { | ||
return Optional.empty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea how often this happens in practice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It happens often since minDensity
is normally too low and the hasher stops the evaluation based on it. It's necessary to review the heuristic that makes the decision between the density of the table switch x the cost of other strategies. I'll remove the bonus parameter for now and follow up with a separate PR.
In your benchmark numbers what kind of switch implementation is being used in the "without hash switch" case? Presumably it's not a tableswitch. how does that perform? |
I hope it will fix the following performance regression: #412 |
@plokhotnyuk It doesn't seem it will. The values are too sparse and the hasher can only find a function at arity 846, which would make the table too large. public static void main(String[] args) {
int[] keys = {3355, -1193609971, 3373707, -43264386, 1901043637, -1724546052, 116079, -2102114367, -608539730, -460163995, -1666926107, -1974943731, 1369680106, 226316666, 1699658638,
36848094, -1085743021, -1994383672, -1824078800, 3314158, 647263482, 583435343, 1690715422, 1592986856, -1189803120, 594447228, 1021718665, -174080651, -782008927, -690137550,
1512257332, -1412739479, 486184128, -1774134745, -101767272, 284519336, -312724053, 545330631, 765915793, 585125142, 1272354024, -475640257};
JavaConstant[] c = new JavaConstant[keys.length];
for (int i = 0; i < keys.length; i++)
c[i] = JavaConstant.forInt(keys[i]);
Optional<Hasher> forKeys = forKeys(c, 0.00001);
System.out.println(forKeys); // Optional[Hasher[function=rotateRight(x, p) ^ x, effort=5, cardinality=864]]
} |
@Use protected Value value; | ||
@Temp({REG, HINT}) protected Value hash; | ||
@Temp protected Value scratch; | ||
@Temp({REG, HINT}) protected Value entryScratch; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if these annotations are correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop the HINT on this one and add REG to scratch. It can be tricky to get optimal register usage for these complex operations since we don't have a completely flexible way of specifying register interference to the allocator. The use of Alive combined with Temp is the safe way to go.
@christhalinger @tkrodriguez @dougxc thank you for the reviews! I've just addressed your suggestions. @tkrodriguez I changed the code to force specific strategies to collect the performance numbers:
I'm puzzled that the hash table switch performs much better than the table switch. I did a bunch of manual testing and it seems to function properly, but I wonder if there's something wrong. |
okay, I just found the bug. The register annotations were wrong. The hash table switch has performance similar to the regular table switch after the fix:
|
|
||
import jdk.vm.ci.meta.JavaConstant; | ||
|
||
class Switch03TestGenerator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created this class to automatically generate source code with switches that use all available hash functions. It'll be useful if someone adds a new hash function, but let me know if you'd prefer not to have it in the repository.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine to leave it in. I would suggest making it a static nested class named Generator
at the bottom of Switch03.java
to make the connection as clear as possible.
I've introduced tests and addressed the initial review comments. The PR is ready for review. |
efd3ed2
to
7e74561
Compare
* Applies the hash function. | ||
* | ||
* @param value the value to be hashed | ||
* @param min the lowest key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@param min {@code value} is guaranteed to be greater or equal to this minimum
You should add a @returns
tag that explains any guarantees (if any) on the return value based on value
and min
.
public abstract int apply(long value, long min); | ||
|
||
/** | ||
* Applies the hash function to a lir value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "apply" here I think you mean "generates LIR that implements the hash function in terms of value
and min
". In that case, please update the documentation and rename the method to generate
.
public abstract Value apply(Value value, Value min, ArithmeticLIRGeneratorTool gen); | ||
|
||
/** | ||
* Returns an estimate of the hashing effort. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume "effort" means execution cost (in time) of the hashing function. Please update doc to clarify this.
static { | ||
//@formatter:off | ||
|
||
// `x` is the value and `s` is the min (lowest key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use the term "key" here since you describe x as the value. For consistency with the methods above, please use val
and min
instead of x
and s
.
import jdk.vm.ci.meta.Value; | ||
|
||
/** | ||
* This class holds a hash function at a specific cardinality and min value (lowest key). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain cardinality a bit more in this doc.
@dougxc I've just pushed changes to address your comments and add a new test that will fail if the set of hash functions changes and the test is not updated. |
Why did the performance drop so much after your changes? Was it executing incorrectly so it seemed faster than it really was? |
65c35d5
to
f176c28
Compare
@dougxc @tkrodriguez @christhalinger thank you for the reviews! I believe all review comments were addressed, please let me know if there's anything pending to merge this change. |
any updates on this? I'd like to create the second PR with a better heuristic to choose the switch strategy |
@dougxc @tkrodriguez since I've planned opening multiple PR for #893 and github doesn't support dependent pull requests, I'm currently blocked. Could you clarify how the merge process work so I can plan accordingly? I imagine it's not as simple as just hitting the merge button here on github. |
@tkrodriguez can you please integrate this PR. |
Sorry I hadn't responded earlier. It has to be run through out internal gate which does a much more extensive set of tests. I think there's also a question of whether this is a win overall even though it looks good on a microbenchmark. I'll run our benchmarks and see how it looks as well. So it requires a bit more attention from me and I've been juggling a few other things. Sorry for not communicating that. You can always create a new PR that includes these changes and then merge and rebase it away once it lands. Git makes that very easy. |
@tkrodriguez thanks for the update! Please keep in mind that this change is part of a larger plan detailed on #893, so unless the benchmarks have switches with sparse values, it probably won't improve performance. The hash strategy will be important to compile type switches using addresses, which are sparse values. |
Also, given the current heuristic to decide which compilation strategy to use, the hash switch is very rarely chosen. |
@@ -44,6 +44,8 @@ | |||
*/ | |||
public interface ArithmeticLIRGeneratorTool { | |||
|
|||
LIRGenerator getLIRGen(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this override? It sort of defeats the purpose of the interface to expose this here. I think replacing ArithmeticLIRGeneratorTool with ArithmeticLIRGenerator in Hasher and HashFunction resolve the problem without requiring this override.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point! it does work, I've removed the override
The main concern is regressions caused by it. Improvements would be a welcome bonus at this early point. Can you rebase onto latest master? |
cfc55ee
to
31b045d
Compare
@tkrodriguez branch rebased |
* Tests optimization of hash table switches. | ||
* Code generated by `Switch03.TestGenerator.main` | ||
*/ | ||
public class Switch03 extends JTTTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use a more descriptive name since this is specifically about the hash switch, like HashSwitchTest. Also is there any reason this is a JTTest? These are special variants of regular compiler tests that are exercised with their parameters bounds as constants in addition to a normal compile to test high level transforms. I don't think you care about that here. I think this should probably be over in org.graalvm.compiler.core.test. Or if you wanted it to be a more specific test of the LIR you could make it a LIRTest and check for the existence of a HashTableSwitchOp in the final LIR. Anyway, renaming is enough for now. You can make any improvements in a follow on PR.
keys[idx] = keyConstants[i]; | ||
targets[idx] = keyTargets[i]; | ||
} | ||
emitHashTableSwitch(hasher.get(), keys, defaultTarget, targets, value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasher.get is just h isn't it?
Set<Integer> seen = new HashSet<>(keys.length); | ||
for (JavaConstant key : keys) { | ||
int hash = function.apply(key.asLong(), min) & (cardinality - 1); | ||
if (seen.contains(hash)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be simplified with the else removed?
if (seen.contains(hash)) { | |
if (!seen.add(hash)) { |
The benchmark results look ok. It doesn't trigger very often from what I can see. Should we add an option to force it's use? That could be in a follow on PR if you like. |
|
||
private static Value rotateRight(ArithmeticLIRGenerator gen, Value i, Value distance) { | ||
// (i >>> distance) | (i << -distance) | ||
return gen.emitOr(gen.emitUShr(i, distance), gen.emitShl(i, gen.emitNegate(distance))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AMD64ArithmeticLIRGenerator provides emitRor which would be a lot more efficient. Maybe provide a platform independent definition like above and use emitRor for AMD64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tkrodriguez maybe I could add emitRor
to ArithmeticLIRGenerator
with the slow implementation as the default and then AMD64ArithmeticLIRGenerator
overrides it with a better impl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's what I was thinking.
@tkrodriguez I've just addressed your last comments.
Yes, I have a branch with a new heuristic that will make the hash table switch more competitive. I'll create the PR once this change is merged. The binary strategy has a large code footprint, so the hash table strategy will be chosen in some cases where the binary one is chosen at the moment. |
Ok. I'm going to do some copyright data adjustments in a new commit and push it. |
} | ||
|
||
private Value toDWORD(Value v, ArithmeticLIRGenerator t) { | ||
if (v.getPlatformKind() != AMD64Kind.DWORD) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This use of AMD64Kind in a shared file should be fixed in the next PR. I would in general expect the values to come into this function with the right sizing of either int or long with the caller ensuring that it was at least those types already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wait for the next PR? Best to fix this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me fix it. Should I create a new PR with the fix or are you going to reopen this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create a new PR.
This is the first change related to #893.
Problem
The current switch strategies don't work well when the keys are sparse values since the density of the table switch is too low and Graal has to choose less efficient strategies.
Solution
Introduce a new switch strategy based on cheap hash functions. The
Hasher
evaluates the available functions and tries to find one that guarantees no conflicts for the provided keys. The approach uses imperfect hash functions based on the paper "Improving Switch Statement Performance with Hashing Optimized at Compile Time".HashTableSwitchOp
emits the assembly code for the hash table switch. If there's a default target, it also emits the original keys as part of the table and verifies the value to handle possible conflicts with unknown keys.I've done some benchmarking and, in some cases, the hash table strategy has similar performance to a regular table switch but with better compaction:
Even though the hash table strategy can achieve better compaction than the regular table switch for some cases, the current heuristic used to decide between table switches and other strategies isn't very robust and rarely chooses the hash switch. This issue will be addressed in another PR as described in #893
Notes
defaultTarget
is not null. I'm also not sure if the padding is ok. Suggestions are appreciated :)