Add broadcasting to kernel generator #1797

Merged
merged 20 commits into from
Apr 2, 2020

Conversation

Contributor

@t4c1 t4c1 commented Mar 24, 2020

Summary

Adds broadcasting to the kernel generator. An expression that is broadcast in a dimension can be used as if it had any size in that dimension when it appears in an operation that accepts more than one argument. The size is determined from the other arguments of the same expression.

Also contains a bugfix for determining the view of the result.
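
The size rule described above can be sketched in plain C++. This is only an illustration of the idea, not the code from this PR: the sentinel `kBroadcast` and the helper `resolve_dim` are hypothetical names, and the kernel generator's actual size handling may differ.

```cpp
#include <initializer_list>
#include <stdexcept>

// Hypothetical sentinel meaning "this operand is broadcast in this dimension".
constexpr int kBroadcast = -1;

// Resolve one dimension (rows or columns) of a multi-argument operation.
// Broadcast operands adopt the size of the other operands; all operands
// that are not broadcast must agree on the size.
inline int resolve_dim(std::initializer_list<int> sizes) {
  int result = kBroadcast;
  for (int s : sizes) {
    if (s == kBroadcast) {
      continue;  // a broadcast operand places no constraint on the size
    }
    if (result == kBroadcast) {
      result = s;  // first concrete size seen
    } else if (result != s) {
      throw std::invalid_argument("size mismatch between operands");
    }
  }
  return result;  // still kBroadcast if every operand was broadcast
}
```

For example, `resolve_dim({3, kBroadcast})` yields 3, while `resolve_dim({3, 4})` throws because two concrete sizes disagree.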

Tests

Added tests for broadcasting and for the bugfix.

Side Effects

None.

Release notes

Added broadcasting to the kernel generator.

Checklist

  • Math issue Implement OpenCL kernel generator #1342

  • Copyright holder: Tadej Ciglarič

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 4.93 | 4.84 | 1.02 | 1.86% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.96 | -3.92% slower |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 1.03 | 2.78% faster |
| gp_regr/gp_regr.stan | 0.22 | 0.22 | 1.01 | 1.46% faster |
| irt_2pl/irt_2pl.stan | 6.47 | 6.46 | 1.0 | 0.17% faster |
| performance.compilation | 88.6 | 86.77 | 1.02 | 2.07% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.52 | 7.53 | 1.0 | -0.08% slower |
| pkpd/one_comp_mm_elim_abs.stan | 20.14 | 20.24 | 0.99 | -0.51% slower |
| sir/sir.stan | 91.96 | 91.29 | 1.01 | 0.73% faster |
| gp_regr/gen_gp_data.stan | 0.05 | 0.05 | 1.0 | -0.13% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.95 | 2.96 | 1.0 | -0.33% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.31 | 0.31 | 0.98 | -2.54% slower |
| arK/arK.stan | 1.77 | 1.74 | 1.02 | 1.72% faster |
| arma/arma.stan | 0.67 | 0.67 | 0.99 | -0.53% slower |
| garch/garch.stan | 0.52 | 0.51 | 1.01 | 0.6% faster |

Mean result: 1.00253158827

Jenkins Console Log
Blue Ocean
Commit hash: 9d46311


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@rok-cesnovar rok-cesnovar self-requested a review March 30, 2020 11:29
Collaborator

@SteveBronder SteveBronder left a comment


Nice! Couple lil' comments and one API question

Comment on lines +68 to +69
inline kernel_parts generate(const std::string& i, const std::string& j,
const std::string& var_name_arg) const {
Collaborator


[separate PR]

Reading this over wrt modify_argument_indices, it's kind of confusing to read i and j here as the names of the indices but in modify_argument_indices as the actual index values. It may be good to change the argument names here to something like i_name and j_name, or maybe row_idx_name and col_idx_name.
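
To illustrate the distinction raised in this comment, here is a minimal sketch of a code generator in which `i` and `j` are the names of index variables in the generated kernel source rather than numeric values. The function `generate_load`, its parameters, and the column-major indexing are assumptions made for this example; it is not the signature or behaviour of `generate` or `modify_argument_indices` in this PR.

```cpp
#include <string>

// Hypothetical helper: emit the kernel source that reads one element of a
// column-major buffer. `i` and `j` hold the *names* of the row and column
// index variables used in the generated code, not index values.
inline std::string generate_load(const std::string& buffer,
                                 const std::string& rows_name,
                                 std::string i, std::string j,
                                 bool single_row, bool single_col) {
  // An operand with a single row that is broadcast over rows always reads
  // row 0, so the row index name is replaced by the literal "0".
  if (single_row) {
    i = "0";
  }
  // Same idea for an operand with a single column.
  if (single_col) {
    j = "0";
  }
  return buffer + "[" + i + " + " + rows_name + " * " + j + "]";
}
```

With this sketch, `generate_load("a", "a_rows", "i", "j", false, false)` produces the string `a[i + a_rows * j]`, and passing `single_row = true` produces `a[0 + a_rows * j]`. Names such as `row_idx_name` make it clearer that the arguments are pieces of generated source text.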

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 4.86 | 4.87 | 1.0 | -0.15% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.98 | -1.78% slower |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 1.01 | 1.21% faster |
| gp_regr/gp_regr.stan | 0.22 | 0.22 | 0.99 | -0.87% slower |
| irt_2pl/irt_2pl.stan | 6.48 | 6.45 | 1.01 | 0.59% faster |
| performance.compilation | 88.23 | 87.14 | 1.01 | 1.23% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.52 | 7.51 | 1.0 | 0.06% faster |
| pkpd/one_comp_mm_elim_abs.stan | 21.48 | 20.04 | 1.07 | 6.71% faster |
| sir/sir.stan | 92.94 | 93.88 | 0.99 | -1.02% slower |
| gp_regr/gen_gp_data.stan | 0.05 | 0.05 | 0.98 | -2.14% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.95 | 2.96 | 1.0 | -0.17% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.31 | 0.31 | 1.01 | 1.41% faster |
| arK/arK.stan | 1.89 | 1.74 | 1.09 | 8.03% faster |
| arma/arma.stan | 0.66 | 0.65 | 1.01 | 1.42% faster |
| garch/garch.stan | 0.51 | 0.51 | 1.0 | -0.37% slower |

Mean result: 1.01035490013

Jenkins Console Log
Blue Ocean
Commit hash: 78852c4



SteveBronder previously approved these changes Apr 1, 2020
Collaborator

@SteveBronder SteveBronder left a comment


One doc thing that would be good to change since I figure we'll forget later but I'll approve now. If you make the change I'll re-approve rq

}

/**
* Brodcast an expression in colwise dimmension. The argument must have single
Collaborator


Can you just find/replace Brodcast with Broadcast?

Collaborator

@SteveBronder SteveBronder left a comment


lgtm!

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 4.83 | 4.85 | 0.99 | -0.52% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.01 | 1.28% faster |
| eight_schools/eight_schools.stan | 0.09 | 0.1 | 0.93 | -7.58% slower |
| gp_regr/gp_regr.stan | 0.22 | 0.22 | 0.99 | -1.31% slower |
| irt_2pl/irt_2pl.stan | 6.45 | 6.45 | 1.0 | 0.1% faster |
| performance.compilation | 89.37 | 87.09 | 1.03 | 2.54% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.52 | 7.57 | 0.99 | -0.59% slower |
| pkpd/one_comp_mm_elim_abs.stan | 20.46 | 20.65 | 0.99 | -0.93% slower |
| sir/sir.stan | 90.95 | 93.57 | 0.97 | -2.89% slower |
| gp_regr/gen_gp_data.stan | 0.05 | 0.05 | 1.01 | 0.68% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.95 | 2.96 | 1.0 | -0.49% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.31 | 0.34 | 0.92 | -8.99% slower |
| arK/arK.stan | 1.73 | 1.73 | 1.0 | -0.01% slower |
| arma/arma.stan | 0.66 | 0.66 | 0.99 | -0.62% slower |
| garch/garch.stan | 0.52 | 0.51 | 1.02 | 1.75% faster |

Mean result: 0.989293267294

Jenkins Console Log
Blue Ocean
Commit hash: 4154ff4



@t4c1 t4c1 merged commit b225daf into stan-dev:develop Apr 2, 2020
@t4c1 t4c1 deleted the cl_kernel_generator_broadcast branch July 10, 2020 07:09
4 participants