Pow function and SLEEF #1650
Hi @bnprks, thank you for reaching out. This sounds very interesting and we are very happy to collaborate. NumPy is also interested in accurate and robust implementations. I'd suggest favoring both high precision and correct handling of edge cases. For pow, their relative cost would be low, and we can provide some compile-time template argument or If scalar is easier to translate for you, sure, why not. Do you see any potential concerns or tradeoffs there? (I know the many suffixes in their vector code are hard to read.) There is currently no active effort to extend our math functions. As mentioned, it may be interesting for NumPy in future. Happy to discuss further. FYI I am out of office next week.
Thanks for the reply! I've been doing some exploratory work looking into the Pow function and the potential for a larger automated translation of the SLEEF math functions. What I've done so far is:
Translation seems tricky but possible, and I am excited about the prospect of having Sleef's high quality math implementations available in Highway.
I've been looking both at the simd and scalar sources some, and I think this is a key discussion point for adapting the sleef algorithms, especially if an automated translation might be desired. Here's my take:
After more review, I think I will try to translate starting from the simd source, due to the presence of simd-specific efficiency tricks. My approach will be to make semantic translations from sleef's simd abstraction layer to highway's functions, which will capture some but not all of the simd optimizations in sleef.
Assuming this approach continues to make sense to you, I'll start by trying to develop a semi-automated translation system to translate the Pow function, then see from there if it seems possible to expand to the rest of the sleef math functions.
Great to hear your progress. This sounds very exciting indeed! Interesting that SLEEF does not check/fix the return value of _mm512_cvttps_epi32. I'd consider its 0x80..00 to be less useful than INT*_MAX. The cost of potential bugs/misbehavior seems much higher to me than a few extra instructions in already expensive math routines. Please don't hesitate to let us know if you'd like to use additional intrinsics. We are happy to add ops wherever something reasonable can be done for other platforms.
Sounds great!
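To illustrate the conversion concern raised above, here is a scalar sketch of one lane. It assumes the documented x86 behaviour that truncating float-to-int32 conversions return 0x80000000 for NaN and out-of-range inputs. `TruncateX86Style` and `TruncateSaturating` are hypothetical names, and the fix-up shown is only one possible choice, not necessarily what Highway or SLEEF does.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Models one lane of an x86 truncating conversion: NaN and out-of-range
// inputs yield 0x80000000, which is INT32_MIN.
inline int32_t TruncateX86Style(float x) {
  // 2^31 is exactly representable as float; the valid range is [-2^31, 2^31).
  if (std::isnan(x) || x >= 2147483648.0f || x < -2147483648.0f) {
    return std::numeric_limits<int32_t>::min();
  }
  return static_cast<int32_t>(x);  // In range, so the cast is well defined.
}

// Hypothetical fix-up: positive overflow saturates to INT32_MAX instead.
// NaN keeps the x86-style result in this sketch.
inline int32_t TruncateSaturating(float x) {
  if (x >= 2147483648.0f) return std::numeric_limits<int32_t>::max();
  return TruncateX86Style(x);
}

// Usage example: the two variants differ only for large positive inputs.
inline void DemoTruncation() {
  assert(TruncateX86Style(3.9f) == 3);
  assert(TruncateX86Style(1e10f) == std::numeric_limits<int32_t>::min());
  assert(TruncateSaturating(1e10f) == std::numeric_limits<int32_t>::max());
}
```

The fix-up costs one extra comparison and select per lane, which is the "few extra instructions" tradeoff mentioned above.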
Hi @bnprks, I am curious how it's going with the Pow function?
I haven't done much with this lately, so I'm still at the stage of having a viable-seeming strategy but definitely not something working. I solved my immediate use-case by disabling SIMD for Pow (it wasn't that critical), but I might take a second look at this since I think the sleef translation is an interesting and useful problem. Judging by my slow progress to date, I wouldn't expect much to come of this soon. If you're interested in cleaning up outstanding issues, we can close this and I'll reopen if I get around to porting a full Pow implementation. If you or someone else wants to tackle this instead, definitely don't wait for me. I'd be happy to share notes + the code snippets I was working on for parsing + transforming the sleef source code if desired.
Thanks for the update. I agree this would be useful and interesting. No worries/rush, we can keep this open :)
Hi @jan-wassenberg, I've finally gotten time to take a proper look at this over the past few weeks, and have some good initial results! I've started off by making a separate repository to handle the translation code, results, and testing, which is available at bnprks/highway-sleef. Most of the details are in the README there, but for a quick summary: Current status:
Next steps:
Feedback areas from you
I've included a lot of accuracy + performance results in the repo, but my high-level summary would be that the SLEEF functions tend to provide more accurate options than Hwy at the cost of slower execution speed. When similar-precision options are available, SLEEF and Hwy are within ~25% speed of each other, but high-precision SLEEF options can become much slower (3x baseline, but some outliers at 10-30x slower).
Wow, looks like the automation has paid off in that you've been able to port a long list of functions. Nice work!
Sounds good. We can use AdjustedReps() to reduce the number of iterations in debug builds, then it should be fine.
Testing on partial vectors can expose some bugs. I'm not sure we have to run for all powers of two, but
I'd trust your benchmarks and suggest the general principle that we replace any existing functions with something more exact, and no more than 10% slower. For example Exp(). For the higher-precision ones, which some users have indeed requested regardless of cost, we can put them perhaps in another file and with a name indicating the tradeoff. On licensing, are you in contact with Prof. Shibata? If they could re-license to BSD3 or Apache2, that would be very helpful.
Thanks! Aside from special cases like SinCos returning two values, the incremental work is basically 1 line of config per Sleef function that needs translating (including SIMD ops and intermediate functions). I'll try getting a faster testing setup established, though personally I'll probably only be able to test Intel AVX2 and below and ARM NEON with the computers I have access to. I might look into qemu testing later.
I have not been in contact with Prof. Shibata. Do you think a github issue on sleef or an email to Prof. Shibata would be best? It looks like the files used for translation have ~10 distinct contributors, though most besides Prof. Shibata do not touch the actual function implementations. Prof. Shibata has the vast majority of contributions, and the most active current maintainer on github appears to be Pierre Blanchard, working at ARM in Manchester, UK. EDIT: To add one question -- what would your goals/constraints be on licensing? Even if Sleef added e.g. BSD3 licensing, that would still presumably result in the derived math functions requiring special licensing treatment compared to the rest of Highway being dual-licensed.
Nice and efficient :)
Sure, that's fine.
I figure email is better if you have it. On licensing, it would be nice if the ported file at least matches one of the existing Highway licenses, so that users of that file would just choose that one. Does that make sense?
Since it's been a week, I thought I'd post a quick update -- I'm doing a bit more work on expanding the translations, particularly starting to tackle double-precision functions and checking non-AVX2 instruction sets. A couple of new hiccups needed to be addressed, but so far nothing that seems like a real blocker. When I'm satisfied my translations are correct and cover most of Sleef's functions, we can discuss more concretely how to upstream functionality (and which parts). I've copied a few more licensing thoughts below, but there is probably nothing too useful to discuss right now unless you happen to have knowledge of Google's legal policies regarding the Boost license used by Sleef (namely what code counts as a "derivative work" of another author's code).
Licensing thoughts
I had one additional idea, which would be just copying the full Boost license into the translated header file. Boost doesn't require providing attribution with compiled binaries, so I think as long as downstream users don't delete the license/copyright comment from the source code there wouldn't be additional obligations on users of the highway library. I will admit that I'm not sure what would qualify something as becoming a "derivative work" of Boost-licensed software. If "derivative work" just covers the translated code isolated to a single file, then this solution would seem quite plausible. If "derivative work" would eventually cover large chunks of the highway library, that could be problematic. (I am not a copyright lawyer, and am only familiar with US law.) For the feasibility of getting a version of Sleef under BSD or Apache licensing, my main worry is the logistical difficulty of getting permission from all the contributors to relicense, which might be a large ask. With permission from just Prof. Shibata, I am not sure from a legal perspective which code could be used without requiring input from others, given that Sleef is licensed under Boost but with copyright maintained by all the original contributors. Would it be code where each line has been reverted to the most recent commit from Prof. Shibata, or would it be a 2017 copy of Sleef before the first other contributors show up in the commit history?
Thanks for sharing.
Hi, I'm considering implementing a Pow function pull request for highway and wanted to check scope/goals before jumping in too much further.
My rough plan would be:
I know a mechanical translation of SLEEF's functions has been discussed in this numpy issue, but I would be proposing a manual translation here, though sticking to the high-level outline of the SLEEF code.
Questions:
3^-Inf = 0, 0.4^Inf = 0, and -2^3.5 = NaN. Should all those be handled as listed, or should some return garbage in order to eliminate a few checks?
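For reference, the three edge cases listed above match what the scalar C++ `std::pow` returns on IEEE 754 platforms, so the standard library can serve as a baseline when testing a SIMD Pow. A small check, assuming the third case means a negative base of -2 raised to 3.5 (`CheckPowEdgeCases` is a hypothetical helper name):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Verifies the three edge cases against the scalar standard library.
inline void CheckPowEdgeCases() {
  const double inf = std::numeric_limits<double>::infinity();
  assert(std::pow(3.0, -inf) == 0.0);       // |x| > 1, y = -Inf gives +0.
  assert(std::pow(0.4, inf) == 0.0);        // |x| < 1, y = +Inf gives +0.
  assert(std::isnan(std::pow(-2.0, 3.5)));  // Negative base, non-integer y.
}
```

A vectorized implementation that skips some of these checks would diverge from this baseline only on such special inputs, which is the tradeoff the question asks about.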