GH-94108: Specialize for int & operation #94109


Closed
wants to merge 2 commits

Conversation

kumaraditya303
Contributor

@kumaraditya303 kumaraditya303 commented Jun 22, 2022

Benchmark:

./python -m pyperf timeit -s "a, b= 100, 200" "a & b" 

Result:

Benchmark    base       patch
-----------  ---------  ----------------------
int &        18.2 ns    16.8 ns: 1.08x faster
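For readers unfamiliar with the specializing interpreter, the proposed instruction roughly follows the usual guard-then-fast-path pattern. This is a minimal Python sketch of that idea, not the actual C implementation in `ceval.c`; the function names here are hypothetical.

```python
import operator

def generic_binary_op(left, right, op):
    # Generic fallback: dispatch through the operator protocol,
    # as the unspecialized BINARY_OP would.
    return {"&": operator.and_, "+": operator.add}[op](left, right)

def binary_op_and_specialized(left, right):
    # Guard: both operands must be exactly `int` (no subclasses).
    if type(left) is int and type(right) is int:
        return left & right                    # fast path
    # Guard failed: deoptimize to the generic handler.
    return generic_binary_op(left, right, "&")
```

The speedup comes from skipping the generic type-dispatch machinery whenever the guard holds, at the cost of one extra instruction in the interpreter.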

@markshannon
Member

Is int & int common enough to justify the extra instruction?

The number of left op right combinations is huge; we can't specialize for all of them.
Even if we tried to specialize for the common builtin classes, we would miss out on potential specializations for third-party classes, such as NumPy arrays.

We have discussed having some sort of double-dispatching mechanism and cache, to allow fast dispatching of binary operators without an explosion of instructions, but I can't find the original discussion. Maybe @brandtbucher can find it?

The general idea is that we register a C function as an implementation of a binary operation for a specific pair of classes and operator. For example, we could register add_float_int(PyFloatObject *f, PyLongObject *i) for (float, int, "+") and have the VM dispatch to it efficiently.

We'd probably keep the specific specializations for the most common operations, like int + int, but other less common operations would use the more general instruction.
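The registration idea described above can be sketched in Python, keyed on the pair of operand types plus the operator. This is only an illustration of the proposed mechanism, not CPython's actual API; the names `register` and `dispatch_binary` are hypothetical.

```python
import operator

# Dispatch table keyed by (left type, right type, operator symbol).
BINARY_DISPATCH = {}

def register(left_type, right_type, op):
    """Register a function as the implementation of `op` for a type pair."""
    def decorator(func):
        BINARY_DISPATCH[(left_type, right_type, op)] = func
        return func
    return decorator

@register(float, int, "+")
def add_float_int(f, i):
    # Specialized implementation for float + int, mirroring the
    # add_float_int example from the comment above.
    return f + i

def dispatch_binary(left, right, op):
    func = BINARY_DISPATCH.get((type(left), type(right), op))
    if func is not None:
        return func(left, right)   # fast specialized path
    # Fall back to generic dispatch for unregistered combinations.
    return {"+": operator.add, "&": operator.and_}[op](left, right)
```

A single generic instruction backed by a table like this can serve many type pairs, which is why it scales better than minting a new opcode per combination.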

@markshannon
Member

Looking at the stats shows that int & int is the most common specialization failure.
The question is, is it worth its own instruction, or is it worth waiting for a more general mechanism?

@kumaraditya303
Contributor Author

Is int & int common enough to justify the extra instruction?

I created this PR based on the stats, and 33% (the largest failure) seemed significant to me. But if you don't think this is worth it, I have no objection if you reject this :).

@markshannon
Member

I'm not rejecting this. I'm just not sure whether it is a good idea.
I think we should go with the more general mechanism long term, but that doesn't mean that we can't add this specialization until then.

I'm going to benchmark it first.

@markshannon
Member

Pyperformance shows a small speedup, but close to noise.

@brandtbucher
Member

We have discussed having some sort of double-dispatching mechanism and cache, to allow fast dispatching of binary operators without an explosion of instructions, but I can't find the original discussion. Maybe @brandtbucher can find it?

Here you go: faster-cpython/ideas#162

@brandtbucher
Member

Pyperformance shows a small speedup, but close to noise.

Yeah, I made a pass at this a few months ago. Even adding in shift operations, I wasn't able to get above "1.00x faster".

I'm skeptical of adding lots of additional code here for negligible gain.

@markshannon
Member

I'm inclined to reject this in favor of a table-based approach: faster-cpython/ideas#162.

The table-based approach won't give as much of a speedup per specialization, but it will allow many more specializations for far fewer new opcodes.


4 participants