Runtime OpenCL device selection, allow seed=0 and fix allow-undefined user_header bug #954

Merged: 11 commits into develop from feature/825-runtime-opencl-device-selection, Dec 8, 2020

rok-cesnovar
Member

Summary:

Fixes one part of #825
Fixes #941
Fixes #953

The main part of this PR is allowing runtime selection of OpenCL devices.

./examples/bernoulli/bernoulli sample data file=examples/bernoulli/bernoulli.data.json opencl platform=0 device=0

Specifying only the platform ID or only the device ID will print

Please set both device and platform OpenCL IDs.

Using opencl args when the model was not compiled with STAN_OPENCL will print

opencl is either mistyped or misplaced.
Re-compile the model with STAN_OPENCL to use OpenCL CmdStan arguments.
Failed to parse arguments, terminating Stan
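
The both-or-neither rule for the two IDs can be sketched as follows. This is a minimal illustration and not the code from this PR: the function name `opencl_ids_requested` and the sentinel `kUnsetId = -1` for "not given on the command line" are assumptions made for the example.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical sentinel: the ID was not supplied on the command line.
constexpr int kUnsetId = -1;

// Returns true when both IDs were supplied (runtime selection requested),
// false when neither was supplied (fall back to the compile-time choice).
// Throws when exactly one of the two IDs was supplied.
bool opencl_ids_requested(int platform, int device) {
  const bool has_platform = platform != kUnsetId;
  const bool has_device = device != kUnsetId;
  if (has_platform != has_device) {
    throw std::invalid_argument(
        "Please set both device and platform OpenCL IDs.");
  }
  return has_platform;
}
```

With this sketch, `opencl_ids_requested(0, 0)` reports a runtime selection, while passing only one ID raises the error with the message quoted above.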

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Rok Češnovar

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@@ -29,7 +29,7 @@ CXXFLAGS_PROGRAM += -include-pch $(STAN)src/stan/model/model_header$(STAN_FLAGS)
$(STAN_TARGETS) examples/bernoulli/bernoulli$(EXE) $(patsubst %.stan,%$(EXE),$(wildcard src/test/test-models/*.stan)) : %$(EXE) : $(STAN)src/stan/model/model_header$(STAN_FLAGS).hpp.gch
endif

-ifneq ($(findstring allow_undefined,$(STANCFLAGS)),)
+ifneq ($(findstring allow_undefined,$(STANCFLAGS))$(findstring allow-undefined,$(STANCFLAGS)),)
Member Author

This is a minor fix for #953
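
The new Makefile condition concatenates two `findstring` results, so it is non-empty when either spelling of the flag appears in STANCFLAGS. The same either-spelling substring check, written as a C++ sketch (the helper name `has_allow_undefined` is hypothetical and not part of CmdStan):

```cpp
#include <string>

// Mirrors the Makefile test: true if either spelling of the flag occurs
// anywhere in the flags string (substring match, like $(findstring ...)).
bool has_allow_undefined(const std::string& stancflags) {
  return stancflags.find("allow_undefined") != std::string::npos ||
         stancflags.find("allow-undefined") != std::string::npos;
}
```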

@@ -25,7 +25,7 @@ class arg_seed : public int_argument {
.total_milliseconds();
}

-bool is_valid(int value) { return value > 0 || value == _default_value; }
+bool is_valid(int value) { return value >= 0 || value == _default_value; }
Member Author

The changes to this file are a fix for #941
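
The effect of the one-character change can be seen by placing the old and new predicates side by side. This is a standalone sketch rather than the `arg_seed` class itself: the free functions and the constant `kDefaultSeed = -1` (standing in for `_default_value`) are assumptions made for the example.

```cpp
// Hypothetical stand-in for _default_value; assumed here to be -1.
constexpr int kDefaultSeed = -1;

// Old rule: strictly positive seeds, or the default.
bool is_valid_old(int value) { return value > 0 || value == kDefaultSeed; }

// New rule: zero is also accepted.
bool is_valid_new(int value) { return value >= 0 || value == kDefaultSeed; }
```

Under these assumptions only `seed=0` changes status: it is rejected by the old predicate and accepted by the new one, while other negative values remain invalid.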

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 3.45 | 3.58 | 0.96 | -3.68% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.0 | -0.22% slower |
| eight_schools/eight_schools.stan | 0.12 | 0.12 | 1.0 | -0.25% slower |
| gp_regr/gp_regr.stan | 0.17 | 0.16 | 1.02 | 1.69% faster |
| irt_2pl/irt_2pl.stan | 5.79 | 5.78 | 1.0 | 0.13% faster |
| performance.compilation | 88.51 | 85.87 | 1.03 | 2.98% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.4 | 8.45 | 0.99 | -0.52% slower |
| pkpd/one_comp_mm_elim_abs.stan | 30.24 | 29.08 | 1.04 | 3.85% faster |
| sir/sir.stan | 131.93 | 133.87 | 0.99 | -1.47% slower |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 0.99 | -0.53% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.93 | 2.97 | 0.99 | -1.27% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.38 | 0.4 | 0.96 | -3.97% slower |
| arK/arK.stan | 1.77 | 1.77 | 1.0 | -0.26% slower |
| arma/arma.stan | 0.74 | 0.59 | 1.25 | 19.68% faster |
| garch/garch.stan | 0.61 | 0.59 | 1.03 | 3.01% faster |

Mean result: 1.01646923088

Commit hash: 9b291e7


Machine information
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 3.44 | 3.47 | 0.99 | -0.87% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.98 | -2.11% slower |
| eight_schools/eight_schools.stan | 0.12 | 0.12 | 0.99 | -0.66% slower |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 1.0 | -0.38% slower |
| irt_2pl/irt_2pl.stan | 5.8 | 5.82 | 1.0 | -0.35% slower |
| performance.compilation | 88.18 | 85.9 | 1.03 | 2.59% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.43 | 8.56 | 0.99 | -1.52% slower |
| pkpd/one_comp_mm_elim_abs.stan | 29.83 | 31.28 | 0.95 | -4.84% slower |
| sir/sir.stan | 131.88 | 129.41 | 1.02 | 1.88% faster |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 1.0 | 0.21% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.94 | 2.94 | 1.0 | -0.16% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.38 | 0.39 | 0.98 | -2.46% slower |
| arK/arK.stan | 1.79 | 1.75 | 1.02 | 1.95% faster |
| arma/arma.stan | 0.75 | 0.59 | 1.27 | 21.49% faster |
| garch/garch.stan | 0.61 | 0.59 | 1.03 | 3.1% faster |

Mean result: 1.01622961698

Commit hash: 54ae958



Contributor

@SteveBronder SteveBronder left a comment

lgtm!

@rok-cesnovar rok-cesnovar merged commit 164fc7e into develop Dec 8, 2020
@rok-cesnovar rok-cesnovar deleted the feature/825-runtime-opencl-device-selection branch December 8, 2020 05:47
Development

Successfully merging this pull request may close these issues.

detect if model has no parameters and run fixed param sampler automatically
Why is seed = 0 not allowed