
Feature/1180 pathfinder single path csvs #1184

Merged
merged 20 commits into develop from feature/1180-pathfinder-single-path-csvs
Aug 21, 2023

Conversation

mitzimorris
Member

Submission Checklist

  • Run tests: ./runCmdStanTests.py src/test
  • Declare copyright holder and open-source license: see below

Summary:

Added flag save_single_paths to method pathfinder, per discussion in #1180

Intended Effect:

Better user experience.

How to Verify:

Unit tests

Side Effects:

N/A

Documentation:

see PR stan-dev/docs#651

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Columbia University

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@mitzimorris requested a review from WardBrian on August 17, 2023 20:28
@WardBrian
Member

Do we feel like this should interact with the diagnostic files at all? I know that at the moment there are no multi-path-pathfinder-specific diagnostics, so it might be odd to require the save_single_paths argument to get any diagnostics out.

@mitzimorris
Member Author

it might be odd to require the save_single_paths argument to get any diagnostics out

but this isn't required - save_single_paths outputs CSV draws from the individual pathfinders.
the diagnostics files report on the Pathfinder trajectory via LBFGS.

@WardBrian
Member

I understand the current behavior; I'm asking whether we think it should have an impact.

Imagine one day we actually use the argument for multi-pathfinder to have diagnostics (independent of the single path diagnostics). Then it would seem natural that this argument also controlled whether the single-path diagnostics were saved.

@mitzimorris
Member Author

I see - so if you want the diagnostics file then you'd probably want the single-path pathfinders as well?
which means that we don't need "save_single_paths" - diagnostic_file arg controls everything?
and then you'd have both .json and .csv files which have the same basename?

very elegant!

@WardBrian
Member

I was thinking of more or less the flip of that - what should this argument do if there are multiple kinds of diagnostics (like there are multiple kinds of output CSVs at the moment)?

@mitzimorris
Member Author

add more arguments to the pathfinder method that are fine-grained enough to handle this, and forget about the output arg diagnostic_file?

@WardBrian
Member

I think I was unclear.

This PR was motivated because the current “output” argument controls both the output of the PSIS draws and the individual paths. My point is just that one day, if we actually use the diagnostic file argument to the multi-path algorithm, we will have the same thing with the “diagnostic file” argument - some for the PSIS, some for single paths. Whether we care is another question, and the answer can be “no”

@mitzimorris
Member Author

mitzimorris commented Aug 18, 2023

if I understand correctly, the proposal is this:

  • diagnostic_file - is an output sub-argument which is reserved for multi-path Pathfinder.
  • "save_single_paths" - for method pathfinder is a boolean arg which, if true, will create both the CSV file of draws and the json file of ELBO evaluations.

and for the user docs, we should explain when save_single_paths is useful - I think this is intended as a tool to debug Pathfinder behavior on challenging or ill-specified models.
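
For concreteness, that would look something like this (the model and data file names are just placeholders):

# multi-path run; the output sub-arg diagnostic_file stays reserved for multi-path Pathfinder
./mymodel pathfinder data file=mymodel.data.json \
    output file=output.csv diagnostic_file=diag.json

# save_single_paths=1 additionally writes per-path draws (CSV) and ELBO evaluations (JSON)
./mymodel pathfinder save_single_paths=1 data file=mymodel.data.json \
    output file=output.csv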

@WardBrian
Member

Currently, this PR implements:

|                     | output=foo                                                                      | diagnostic_file=bar                                                                    |
|---------------------|---------------------------------------------------------------------------------|----------------------------------------------------------------------------------------|
| save_single_paths=0 | One file, foo.csv, containing PSIS draws                                        | Several bar_N.json files, one for each path, containing single-pathfinder diagnostics  |
| save_single_paths=1 | The same foo.csv file, plus individual foo_path_N.csv files, one for each path  | Several bar_N.json files, one for each path, containing single-pathfinder diagnostics  |

My question was, should the matrix look instead like:

|                     | output=foo                                                                      | diagnostic_file=bar                                                                                              |
|---------------------|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
| save_single_paths=0 | One file, foo.csv, containing PSIS draws                                        | One file, bar.json, containing the diagnostics for multi-path pathfinder (currently, this is completely unused)   |
| save_single_paths=1 | The same foo.csv file, plus individual foo_path_N.csv files, one for each path  | The same bar.json file, plus individual bar_path_N.json files containing the diagnostics for each single path     |

As noted, there are currently no diagnostics for multi-path besides each individual path's diagnostics, but if we ever think there would be then maybe the second table makes more sense. Of course, if someone is asking for diagnostics, maybe it's fine just to give them more than they asked for. It's a much less common use case than output, at any rate

@mitzimorris
Member Author

two things:

  • creating an empty multi-path pathfinder diagnostics file is confusing
  • having to specify a diagnostic file basename could be skipped in favor of adding distinguishing tags to the output file basename.

@WardBrian mentioned this pull request Aug 18, 2023
@mitzimorris
Member Author

mitzimorris commented Aug 19, 2023

per discussion w/ @WardBrian and @SteveBronder:

  • output sub-arg diagnostic_file will be used for outputs from multi-path Pathfinder. Currently, a dummy JSON writer is passed in to the service method call.
  • pathfinder boolean arg save_single_paths, when true, will save both the single-path sample and the single-path diagnostics, which report on the ELBO evaluations. These file names will be constructed from the output_file basename, plus _path plus the path id.

Question: what should the single-path pathfinder output file be named if num_paths=1? What about num_paths=1 and save_single_paths=1? Choices for output filenames:

  • a. basename_path_1.csv and basename_path_1.json
  • b. basename.csv and basename.json

really, no preference here - option a) is perfectly reasonable and it's OK if non-default options result in complicated names. opinions? @bob-carpenter ?

@WardBrian
Member

I think if num paths is 1 it should just be “output.csv”, and the diagnostic should probably come from the diagnostic file argument

@mitzimorris
Member Author

how about this:

if num_paths=1 and save_single_paths=1 and arg diagnostic_file is unspecified, then single-path diagnostics are in file output.json.

if num_paths=1 and diagnostic_file="foo.json", then single-path diagnostics are in file foo.json.
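
In other words (model / data file names are placeholders):

# num_paths=1, save_single_paths=1, no diagnostic_file:
#   draws in output.csv, single-path ELBO diagnostics in output.json
./mymodel pathfinder num_paths=1 save_single_paths=1 data file=mymodel.data.json \
    output file=output.csv

# num_paths=1 with an explicit diagnostic_file:
#   draws in output.csv, single-path diagnostics in foo.json
./mymodel pathfinder num_paths=1 data file=mymodel.data.json \
    output file=output.csv diagnostic_file=foo.json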

@mitzimorris
Member Author

mitzimorris commented Aug 20, 2023

this is ready for re-review.

these are the relevant combinations of arguments / output files (a full example invocation follows the list):

  • mymodel pathfinder - single output file output.csv which contains PSIS draws over 4 single-path pathfinders

  • mymodel pathfinder num_paths=1 - single output file output.csv contains single-path pathfinder draws

  • mymodel pathfinder save_single_paths=1 - 9 output files:
    - output.csv, as above
    - output_path_1.csv ... output_path_4.csv - single-path pathfinder draws
    - output_path_1.json ... output_path_4.json - single-path ELBO iterations

  • mymodel pathfinder num_paths=1 save_single_paths=1 - 2 output files: output.csv, which contains single-path pathfinder draws, and output.json, which contains info from each ELBO iteration.

  • mymodel pathfinder num_paths=1 save_single_paths=1 output diagnostic_file="diag" - 2 output files: output.csv and diag.json.
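
For example, the default multi-path run with save_single_paths=1 would be invoked roughly as (model / data file names are placeholders):

./mymodel pathfinder save_single_paths=1 data file=mymodel.data.json \
    output file=output.csv
# -> output.csv (PSIS draws), output_path_1.csv ... output_path_4.csv (per-path draws),
#    output_path_1.json ... output_path_4.json (per-path ELBO iterations)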

@mitzimorris
Member Author

CI failed on the upstream/(downstream?) performance tests - one of the benchmark gold files is missing?
@WardBrian @serban-nicusor-toptal


Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: gp_exp_quad_cov: sigma is 0, but must be positive! (in '../stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan', line 14, column 4 to line 15, column 59)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Iteration:  100 / 2000 [  5%]  (Warmup)
Iteration:  200 / 2000 [ 10%]  (Warmup)
Iteration:  300 / 2000 [ 15%]  (Warmup)
Iteration:  400 / 2000 [ 20%]  (Warmup)
Iteration:  500 / 2000 [ 25%]  (Warmup)
Iteration:  600 / 2000 [ 30%]  (Warmup)
Iteration:  700 / 2000 [ 35%]  (Warmup)
Iteration:  800 / 2000 [ 40%]  (Warmup)
Iteration:  900 / 2000 [ 45%]  (Warmup)
Iteration: 1000 / 2000 [ 50%]  (Warmup)
Iteration: 1001 / 2000 [ 50%]  (Sampling)
Iteration: 1100 / 2000 [ 55%]  (Sampling)
Iteration: 1200 / 2000 [ 60%]  (Sampling)
Iteration: 1300 / 2000 [ 65%]  (Sampling)
Iteration: 1400 / 2000 [ 70%]  (Sampling)
Iteration: 1500 / 2000 [ 75%]  (Sampling)
Iteration: 1600 / 2000 [ 80%]  (Sampling)
Iteration: 1700 / 2000 [ 85%]  (Sampling)
Iteration: 1800 / 2000 [ 90%]  (Sampling)
Iteration: 1900 / 2000 [ 95%]  (Sampling)
Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 1.795 seconds (Warm-up)
               2.345 seconds (Sampling)
               4.14 seconds (Total)

Traceback (most recent call last):
  File "./runPerformanceTests.py", line 424, in <module>
    results = list(results)
  File "./runPerformanceTests.py", line 345, in process_test_wrapper
    time_, (fails, errors) = run(exe, data, overwrite, check_golds,
  File "./runPerformanceTests.py", line 275, in run
    summary = csv_summary(tmp)
  File "./runPerformanceTests.py", line 163, in csv_summary
    with open(csv_file, 'r', encoding = 'utf-8') as raw:
FileNotFoundError: [Errno 2] No such file or directory: 'golds/stat_comp_benchmarks_benchmarks_gp_pois_regr_gp_pois_regr.gold.tmp'
script returned exit code 1

@WardBrian
Member

It’s because the performance tests request output files that end in .tmp:
https://github.com/stan-dev/performance-tests-cmdstan/blob/ffff830fbd6d5e09dbee67dad336e957e9e19b2a/runPerformanceTests.py#L268

But the latest changes here (to make_filenames specifically) always overwrite that to be .csv
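
Roughly, the failing sequence is something like this (invocation reconstructed from the traceback above, so treat the details as approximate):

# the harness runs the benchmark and requests a gold output file ending in .tmp
# (the data file name here is a guess; the output path is taken from the traceback)
./gp_pois_regr sample data file=gp_pois_regr.data.R \
    output file=golds/stat_comp_benchmarks_benchmarks_gp_pois_regr_gp_pois_regr.gold.tmp
# make_filenames now rewrites the extension to .csv, so the requested .gold.tmp file
# is never written and csv_summary() fails with the FileNotFoundError above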

@WardBrian merged commit 763beb9 into develop Aug 21, 2023
@WardBrian deleted the feature/1180-pathfinder-single-path-csvs branch August 21, 2023 18:39