Output behavior when only the module name is asked for #179

gaow · 2019-04-15T03:00:07Z

The test data-set is on: /project/mstephens/SuSiE/susie_z/output. The query is:

dscout = dscquery('r_compare_data_signal', targets = 'sim_gaussian.pve score_susie')

The expected behavior is to see score_susie.output.file column in the output. But we see

> head(dscout)
  DSC sim_gaussian.pve
1   1             0.05
2   1             0.05
3   1             0.05
4   1             0.05
5   1             0.05
6   1             0.05

Notice running dsc-query the output seems fine:

> dsc-query r_compare_data_signal -o test.csv --target "sim_gaussian.pve score_susie" --force
> head test.csv

DSC,sim_gaussian.pve,score_susie.output.file
1,0.05,score_susie/small_data_151_sim_gaussian_25_get_sumstats_2_susie_bhat_3_score_susie_2
1,0.05,score_susie/small_data_152_sim_gaussian_25_get_sumstats_2_susie_bhat_3_score_susie_2
1,0.05,score_susie/small_data_153_sim_gaussian_25_get_sumstats_2_susie_bhat_3_score_susie_2
1,0.05,score_susie/small_data_154_sim_gaussian_25_get_sumstats_2_susie_bhat_3_score_susie_2
...

so we might have done something problematic filtering it downstream.

The text was updated successfully, but these errors were encountered:

gaow · 2019-04-15T03:03:10Z

@pcarbo the above ticket was from Yuxin's slack DM. It seems we should set omit.filenames = FALSE by default, or at least give some warning here because filenames appear to be specifically asked for here? Not sure if there is a good motivation to set it to TRUE as of now -- one argument is with output in list format we should never need to access file names. But there are certainly cases when accessing file names is desirable.

pcarbo · 2019-04-15T12:07:51Z

@gaow What happens when you set omit.filenames = FALSE? The default is omit.filenames = TRUE.

gaow · 2019-04-15T13:00:03Z

@pcarbo it works with omit.filenames = FALSE. I'm arguing we should have it by default.

pcarbo · 2019-04-15T14:54:09Z

@gaow What do you mean by this:

Filenames appear to be specifically asked for here?

What makes you think Yuxin is requesting the file names?

gaow · 2019-04-15T14:59:15Z

What makes you think Yuxin is requesting the file names?

This is what I told her when she asked me in DM how one can access file names directly.

I said "filenames appear to be specifically asked for" because susie_score is not a module group. In this case filename is the only information we'll provide when no variable is specified.

However now I am wondering since we have the list-based output, should we just load everything into the output unless the output method is data.frame, in which case we'll return output file names instead?

pcarbo · 2019-04-15T15:03:52Z

However now I am wondering since we have the list-based output, should we just load everything into the output unless the output method is data.frame, in which case we'll return output file names instead?

@gaow This is the current behaviour.

gaow · 2019-04-15T15:12:41Z

This is the current behaviour.

Really? Then the example above would suggest a bug because the expected behavior would be to load everything from score_susie and return a list. So I do not think it is the current behavior. I'm talking about the case when no specific variable is asked for -- should we just load everything from that module into the output? We definitely have to rename omit.filenames to something else, eg something like omit.unspecified, and expect the following behavior:

omit.unspecified = F, return.type = 'list' will load everything in susie_score and return a list with an element called susie_score.output.data (please ignore the naming convention for now we can discuss that later) containing everything in that module
omit.unspecified = F, return.type = 'data.frame' will return a dataframe with a column called susie_score.output.file that contains file names.
omit.unspecified = T will ignore susie_score column regardless of return.type

Does it make sense?

pcarbo · 2019-04-15T15:23:47Z

Then the example above would suggest a bug because the expected behavior would be to load
everything from score_susie and return a list.

This would be the behaviour for return.type = "list".

But the default is return.type = "auto".

pcarbo · 2019-04-15T15:24:52Z

I'm guess I'm not sure what you mean by "everything".

pcarbo · 2019-04-15T15:27:52Z

@gaow Can you explain what would be your proposed default behaviour for these two examples?

out <- dscquery("mydsc",targets = "score")
out <- dscquery("mydsc",targets = "score.delta")

(Assuming "score" is a module group.)

gaow · 2019-04-15T15:33:36Z

For out <- dscquery("mydsc",targets = "score.delta") the behavior is the same for all my 1-3 above:

score score.delta
score_name  <some_number>

For out <- dscquery("mydsc",targets = "score") it will become something like:

score  score.output.data
score_name <something>

where <something> is

the result of readRDS("something.rds") when return.type=list or return.type=auto
just the corresponding file name when return type is data.frame
the 2nd column will not exist if omit.unspecified=T

1-3 in this post correspond to my proposal 1-3 above.

The point is to allow loading the entire data object when return.type is set to auto or list, just in case the users want to access something they did not specifically returned earlier as DSC module output but now want to extract them. Currently we do not provide a way to do it. And since this can potentially cause large objected loaded, we possibly want to set the default back to omit = T

pcarbo · 2019-04-15T15:36:04Z

@gaow I think that is potentially useful behaviour. On the other hand, I see some disadvantages (e.g., could use up a huge amount of memory if the outputs are large). I think we would need some more time to study usage before changing dscquery to accommodate this. In any case, this doesn't seem like high priority at this stage.

gaow · 2019-04-15T15:38:10Z

Sure let's keep the ticket open. I think an apparent way around the disadvantage is to set omit.unspecified to T again by default (I was editing my comment above added this point but was not quick enough to have you see it earlier :)

gaow changed the title ~~Output filename in DSC query when only the module name is asked for~~ Output behavior when only the module name is asked for Apr 15, 2019

gaow mentioned this issue Apr 17, 2019

Output only requested targets in dscquery and dsc-query #180

Closed

gaow closed this as completed Jun 10, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Output behavior when only the module name is asked for #179

Output behavior when only the module name is asked for #179

gaow commented Apr 15, 2019

gaow commented Apr 15, 2019 •

edited

Loading

pcarbo commented Apr 15, 2019

gaow commented Apr 15, 2019

pcarbo commented Apr 15, 2019

gaow commented Apr 15, 2019

pcarbo commented Apr 15, 2019

gaow commented Apr 15, 2019

pcarbo commented Apr 15, 2019

pcarbo commented Apr 15, 2019

pcarbo commented Apr 15, 2019 •

edited

Loading

gaow commented Apr 15, 2019 •

edited

Loading

pcarbo commented Apr 15, 2019

gaow commented Apr 15, 2019

Output behavior when only the module name is asked for #179

Output behavior when only the module name is asked for #179

Comments

gaow commented Apr 15, 2019

gaow commented Apr 15, 2019 • edited Loading

pcarbo commented Apr 15, 2019

gaow commented Apr 15, 2019

pcarbo commented Apr 15, 2019

gaow commented Apr 15, 2019

pcarbo commented Apr 15, 2019

gaow commented Apr 15, 2019

pcarbo commented Apr 15, 2019

pcarbo commented Apr 15, 2019

pcarbo commented Apr 15, 2019 • edited Loading

gaow commented Apr 15, 2019 • edited Loading

pcarbo commented Apr 15, 2019

gaow commented Apr 15, 2019

gaow commented Apr 15, 2019 •

edited

Loading

pcarbo commented Apr 15, 2019 •

edited

Loading

gaow commented Apr 15, 2019 •

edited

Loading