Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow intrinsics to use #[fixed_stack_segment], and use it for numeric intrinsics #5975

Closed
wants to merge 3 commits into from

Conversation

huonw
Copy link
Member

@huonw huonw commented Apr 20, 2013

This implements the fixed_stack_segment for items with the rust-intrinsic abi, and then uses it to make f32 and f64 use intrinsics where appropriate, but without overflowing stacks and killing canaries (cf. #5686 and #5697). Hopefully.

@pcwalton, the fixed_stack_segment implementation involved mirroring its implementation in base.rs in trans_closure, but without adding the set_no_inline (reasoning: that would defeat the purpose of intrinsics), which is possibly incorrect.

I'm a little hazy about how the underlying structure works, so I've annotated the 4 that have caused problems so far, but there's no guarantee that the other intrinsics are entirely well-behaved.

Anyway, it has good results (the following are just summing the result of each function for 1 up to 100 million):

$ ./intrinsics-perf.sh f32
func   new   old   speedup
sin    0.80  2.75  3.44
cos    0.80  2.76  3.45
sqrt   0.56  2.73  4.88
ln     1.01  2.94  2.91
log10  0.97  2.90  2.99
log2   1.01  2.95  2.92
exp    0.90  2.85  3.17
exp2   0.92  2.87  3.12
pow    6.95  8.57  1.23

   geometric mean: 2.97

$ ./intrinsics-perf.sh f64
func   new   old   speedup
sin    12.08  14.06  1.16
cos    12.04  13.67  1.14
sqrt   0.49  2.73  5.57
ln     4.11  5.59  1.36
log10  5.09  6.54  1.28
log2   2.78  5.10  1.83
exp    2.00  3.97  1.99
exp2   1.71  3.71  2.17
pow    5.90  7.51  1.27

   geometric mean: 1.72

So about 3x faster on average for f32, and 1.7x for f64. This isn't exactly apples to apples though, since this patch also adds #[inline(always)] to all the function definitions too, which possibly gives a speedup.

(fwiw, GitHub is showing 93c0888 after d9c54f8 (since I cherry-picked the latter from #5697), but git's order is the other way.)

Achieves at least 5x speed up for some functions!

Also, reorganise the delegation code so that the delegated function wrappers
have the #[inline(always)] annotation, and reduce the repetition of
delegate!(..).
@huonw
Copy link
Member Author

huonw commented Apr 20, 2013

@JensNockert and @brson were involved in the original discussion and are possibly interested in this.

@auroranockert
Copy link
Contributor

Cool!

@pcwalton
Copy link
Contributor

The set_no_inline should not be necessary for any functions, actually. It was a relic of an earlier patch; the only reason it's in is that I haven't fully tested the consequences of removing it and I wanted to land the patch in its current state.

bors added a commit that referenced this pull request Apr 20, 2013
…lton

This implements the fixed_stack_segment for items with the rust-intrinsic abi, and then uses it to make f32 and f64 use intrinsics where appropriate, but without overflowing stacks and killing canaries (cf. #5686 and #5697). Hopefully.

@pcwalton, the fixed_stack_segment implementation involved mirroring its implementation in `base.rs` in `trans_closure`, but without adding the `set_no_inline` (reasoning: that would defeat the purpose of intrinsics), which is possibly incorrect.

I'm a little hazy about how the underlying structure works, so I've annotated the 4 that have caused problems so far, but there's no guarantee that the other intrinsics are entirely well-behaved.

Anyway, it has good results (the following are just summing the result of each function for 1 up to 100 million):

```
$ ./intrinsics-perf.sh f32
func   new   old   speedup
sin    0.80  2.75  3.44
cos    0.80  2.76  3.45
sqrt   0.56  2.73  4.88
ln     1.01  2.94  2.91
log10  0.97  2.90  2.99
log2   1.01  2.95  2.92
exp    0.90  2.85  3.17
exp2   0.92  2.87  3.12
pow    6.95  8.57  1.23

   geometric mean: 2.97

$ ./intrinsics-perf.sh f64
func   new   old   speedup
sin    12.08  14.06  1.16
cos    12.04  13.67  1.14
sqrt   0.49  2.73  5.57
ln     4.11  5.59  1.36
log10  5.09  6.54  1.28
log2   2.78  5.10  1.83
exp    2.00  3.97  1.99
exp2   1.71  3.71  2.17
pow    5.90  7.51  1.27

   geometric mean: 1.72
```

So about 3x faster on average for f32, and 1.7x for f64. This isn't exactly apples to apples though, since this patch also adds #[inline(always)] to all the function definitions too, which possibly gives a speedup.

(fwiw, GitHub is showing 93c0888 after d9c54f8 (since I cherry-picked the latter from #5697), but git's order is the other way.)
@bors bors closed this Apr 20, 2013
flip1995 pushed a commit to flip1995/rust that referenced this pull request Sep 10, 2020
default_trait_access: Fix wrong suggestion

rust-lang/rust-clippy#5975 (comment)
> I think the underlying problem is clippy suggests code with complete parameters, not clippy triggers this lint even for complex types. AFAIK, If code compiles with `Default::default`, it doesn't need to specify any parameters, as type inference is working. (So, in this case, `default_trait_access` should suggest `RefCell::default`.)

Fixes rust-lang#5975 Fixes rust-lang#5990

changelog: `default_trait_access`: fixed wrong suggestion
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants