Feature Request: Use `dplyr::left_join(relationship)` argument #2247

ddsjoberg · 2023-11-21T20:06:46Z

Feature Idea

Our merging functions are (very) fancy joins with many useful options. dplyr 1.1.1 introduced the `relationship` argument of dplyr::left_join(), which allows users to specify the expected type of merge that will be performed.

For example, if you specify relationship = "one-to-one" the function will error if there is more than one match on the by variables.
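
A minimal sketch of that behaviour, assuming dplyr 1.1.1 or later is installed. The datasets `adsl` and `dm_dup` are made up for illustration and do not come from admiral.

```r
library(dplyr)

# Illustrative data: `dm_dup` holds two records for subject P01
adsl <- tibble(USUBJID = c("P01", "P02"), AGE = c(34, 51))
dm_dup <- tibble(USUBJID = c("P01", "P01", "P02"), SEX = c("F", "F", "M"))

# Succeeds: after removing the duplicate row, keys are unique on both sides
ok <- left_join(
  adsl, distinct(dm_dup),
  by = "USUBJID", relationship = "one-to-one"
)

# Fails: P01 matches two rows in `dm_dup`, so the declared relationship
# is violated; the error message is captured instead of aborting
res <- tryCatch(
  left_join(adsl, dm_dup, by = "USUBJID", relationship = "one-to-one"),
  error = function(e) conditionMessage(e)
)
```

Here `ok` has one row per subject, while `res` holds the text of the error raised by dplyr.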

I think it can be useful to our users to add this argument to our functions.

Relevant Input

No response

Relevant Output

No response

Reproducible Example/Pseudo Code

No response


bundfussr · 2023-11-22T15:39:17Z

Our functions expect "many-to-one" but do not use the relationship argument. Our own check function is used to ensure "many-to-one". It provides a more helpful error message than left_join().

ddsjoberg · 2023-11-22T15:42:09Z

> Our functions expect "many-to-one" but do not use the relationship argument. Our own check function is used to ensure "many-to-one". It provides a more helpful error message than left_join().

But not every join that users perform will be many-to-one. This would allow them to specify, for example, one-to-one when that is the situation they are in.

bundfussr · 2023-11-22T16:21:08Z

> > Our functions expect "many-to-one" but do not use the relationship argument. Our own check function is used to ensure "many-to-one". It provides a more helpful error message than left_join().
>
> But not every join users do will be many to one. This would allow them to specify, for example, one-to-one when that is the situation they are in.

Yes, we could add an expect_one_to_one argument and call assert_one_to_one() if it is true. The assertion also provides the records which are violating the condition.
What should be the default?
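
The idea of checking the relationship up front and reporting the offending records could look roughly like the sketch below. Both `find_duplicate_keys()` and `checked_left_join()` are hypothetical helpers written for this illustration; they are not admiral functions and do not reproduce how `assert_one_to_one()` works.

```r
library(dplyr)

# Hypothetical helper: returns the rows of `dataset` whose `by` key
# occurs more than once
find_duplicate_keys <- function(dataset, by) {
  dataset %>%
    group_by(across(all_of(by))) %>%
    filter(n() > 1) %>%
    ungroup()
}

# Hypothetical wrapper: joins only if `dataset_add` is unique by key,
# otherwise stops and lists the violating records
checked_left_join <- function(dataset, dataset_add, by) {
  dups <- find_duplicate_keys(dataset_add, by)
  if (nrow(dups) > 0) {
    stop(
      "`dataset_add` is not unique with respect to ",
      paste(by, collapse = ", "), ". Offending records:\n",
      paste(capture.output(print(as.data.frame(dups))), collapse = "\n"),
      call. = FALSE
    )
  }
  left_join(dataset, dataset_add, by = by, relationship = "many-to-one")
}

# Illustrative data: the additional dataset has two records for P01
advs <- tibble(USUBJID = c("P01", "P01", "P02"), AVAL = c(1, 2, 3))
adsl_dup <- tibble(USUBJID = c("P01", "P01", "P02"), SEX = c("F", "F", "M"))

msg <- tryCatch(
  checked_left_join(advs, adsl_dup, by = "USUBJID"),
  error = function(e) conditionMessage(e)
)
```

The pre-check costs an extra pass over the additional dataset, but it lets the message name the records that break the expectation.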

ddsjoberg · 2023-11-22T16:22:35Z

I think the default should be NULL, allowing users to specify it or not (as dplyr::left_join() does).

ddsjoberg · 2023-11-22T16:28:53Z

The argument should be relationship = "many-to-one" or relationship = "one-to-one". This makes it very clear what is being done for any user that is already familiar with dplyr.
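
As a rough sketch of this proposal, a wrapper could accept `relationship = NULL` by default, validate it, and pass it through to dplyr. The function `merge_sketch()` and its argument names are assumptions made for illustration, not the admiral implementation; the restriction to the two values follows the ones named in this comment.

```r
library(dplyr)

# Hypothetical wrapper: not an admiral function
merge_sketch <- function(dataset, dataset_add, by_vars, relationship = NULL) {
  allowed <- c("one-to-one", "many-to-one")
  valid <- is.null(relationship) ||
    (length(relationship) == 1 && relationship %in% allowed)
  if (!valid) {
    stop(
      "`relationship` must be NULL or one of: ",
      paste0('"', allowed, '"', collapse = ", "),
      call. = FALSE
    )
  }
  # NULL means dplyr applies no relationship constraint
  left_join(dataset, dataset_add, by = by_vars, relationship = relationship)
}

# Illustrative data
advs <- tibble(USUBJID = c("P01", "P01", "P02"), AVAL = c(1, 2, 3))
adsl <- tibble(USUBJID = c("P01", "P02"), SEX = c("F", "M"))

out <- merge_sketch(advs, adsl, by_vars = "USUBJID", relationship = "many-to-one")

bad <- tryCatch(
  merge_sketch(advs, adsl, by_vars = "USUBJID", relationship = "many-to-many"),
  error = function(e) conditionMessage(e)
)
```

Reusing the dplyr argument name and values means users who already know `left_join()` do not have to learn a second vocabulary.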

sophie-gem · 2024-03-26T08:45:08Z

Functions which use dplyr::*_join():

create_single_dose_dataset.R; row 578; create_single_dose_dataset()
derived_join.R; row 811; get_joined_data()
derive_locf_records.R; row 189; derive_locf_records()
derive_merged.R; row 423; derive_vars_merged()
derive_param_computed.R; row 320; derive_param_computed()
derive_param_computed.R; row 529; get_hori_data()
~~derive_param_tte.R; row 433; derive_param_tte()~~ - NOT NEEDED
~~derive_param_wbc_abs.R; row 142; derive_param_wbc_abs()~~ - NOT NEEDED
~~derive_vars_query.R; row 192; derive_vars_query()~~ - NOT NEEDED
derive_vars_transposed.R; row 99; derive_vars_transposed()

I believe these are all the functions that use one of dplyr::left_join() or dplyr::inner_join() or dplyr::full_join(). These are the ones I will look at updating the function call for. If there are any others, or if there are any that I don't need to do - please shout. :)

bms63 · 2024-03-26T18:57:30Z

Rock on @sophie-gem !!

sophie-gem · 2024-04-16T07:57:12Z

Hi @bms63, @ddsjoberg - if you get the chance, would you be able to review this change I've made in the above commit to the first function in the list? (Thinking if I ensure this is ok before I continue it will save some review time at the end...). Also, two questions:

Do we need to add an example to each function showing this in use?
Should a test be provided for this new argument?

bundfussr · 2024-04-16T09:15:30Z

> Hi @bms63, @ddsjoberg - if you get the chance, would you be able to review this change I've made in the above commit to the first function in the list? (Thinking if I ensure this is ok before I continue it will save some review time at the end...). Also, two questions:
>
> Do we need to add an example to each function showing this in use?
>
> Should a test be provided for this new argument?

I think we don't need the relationship argument for create_single_dose_dataset() because the only valid value is "many-to-one". In the left_join() call we could set relationship = "many-to-one". However, this would result in an error message that is hard for users to understand. Thus we would either need to catch the error and provide a more user-friendly error message, or check beforehand whether there are duplicates in lookup_table.

At the moment duplicates produce incorrect results. Consider for example:

```r
library(admiral)
library(tibble)
library(lubridate)

# Lookup table with a duplicated "Q30MIN" record
custom_lookup <- tribble(
  ~Value,   ~DOSE_COUNT, ~DOSE_WINDOW, ~CONVERSION_FACTOR,
  "Q30MIN", (1 / 30),    "MINUTE",                      1,
  "Q30MIN", (1 / 30),    "MINUTE",                      1,
  "Q90MIN", (1 / 90),    "MINUTE",                      1
)

data <- tribble(
  ~USUBJID, ~EXDOSFRQ, ~ASTDT, ~ASTDTM, ~AENDT, ~AENDTM,
  "P01", "Q30MIN", ymd("2021-01-01"), ymd_hms("2021-01-01T06:00:00"),
  ymd("2021-01-01"), ymd_hms("2021-01-01T07:00:00"),
  "P02", "Q90MIN", ymd("2021-01-01"), ymd_hms("2021-01-01T06:00:00"),
  ymd("2021-01-01"), ymd_hms("2021-01-01T09:00:00")
)

# The single doses of P01 are created twice (rows 1-3 and 4-6)
create_single_dose_dataset(data,
  lookup_table = custom_lookup,
  lookup_column = Value,
  start_datetime = ASTDTM,
  end_datetime = AENDTM
)
#> # A tibble: 9 × 6
#>   USUBJID EXDOSFRQ ASTDT      ASTDTM              AENDT      AENDTM
#>   <chr>   <chr>    <date>     <dttm>              <date>     <dttm>
#> 1 P01     ONCE     2021-01-01 2021-01-01 06:00:00 2021-01-01 2021-01-01 06:00:00
#> 2 P01     ONCE     2021-01-01 2021-01-01 06:30:00 2021-01-01 2021-01-01 06:30:00
#> 3 P01     ONCE     2021-01-01 2021-01-01 07:00:00 2021-01-01 2021-01-01 07:00:00
#> 4 P01     ONCE     2021-01-01 2021-01-01 06:00:00 2021-01-01 2021-01-01 06:00:00
#> 5 P01     ONCE     2021-01-01 2021-01-01 06:30:00 2021-01-01 2021-01-01 06:30:00
#> 6 P01     ONCE     2021-01-01 2021-01-01 07:00:00 2021-01-01 2021-01-01 07:00:00
#> 7 P02     ONCE     2021-01-01 2021-01-01 06:00:00 2021-01-01 2021-01-01 06:00:00
#> 8 P02     ONCE     2021-01-01 2021-01-01 07:30:00 2021-01-01 2021-01-01 07:30:00
#> 9 P02     ONCE     2021-01-01 2021-01-01 09:00:00 2021-01-01 2021-01-01 09:00:00
```
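
The first option mentioned above, catching the dplyr error and replacing it with a friendlier message, might be sketched as follows. `join_lookup()` is a hypothetical wrapper written for this illustration and is not how `create_single_dose_dataset()` is implemented; the data is a reduced version of the example above.

```r
library(dplyr)

# Hypothetical wrapper: joins a lookup table and rewords the error that
# dplyr raises when the many-to-one expectation is violated
join_lookup <- function(dataset, lookup_table, by) {
  tryCatch(
    left_join(dataset, lookup_table, by = by, relationship = "many-to-one"),
    error = function(cnd) {
      stop(
        "`lookup_table` contains duplicate records for the join key; ",
        "each value must appear only once.\nOriginal error: ",
        conditionMessage(cnd),
        call. = FALSE
      )
    }
  )
}

# Lookup table with a duplicated "Q30MIN" record
lookup <- tribble(
  ~Value,   ~DOSE_COUNT,
  "Q30MIN", 1 / 30,
  "Q30MIN", 1 / 30,
  "Q90MIN", 1 / 90
)

data <- tribble(
  ~USUBJID, ~EXDOSFRQ,
  "P01",    "Q30MIN",
  "P02",    "Q90MIN"
)

msg <- tryCatch(
  join_lookup(data, lookup, by = c(EXDOSFRQ = "Value")),
  error = function(e) conditionMessage(e)
)
```

This sketch catches every error from the join; a stricter version could react only to the relationship violation and let other errors pass through unchanged.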

sophie-gem · 2024-05-08T07:37:18Z

Note to self: look at how to 'catch' error for create_single_dose_dataset()

bms63 · 2024-05-16T14:17:33Z

Hi @sophie-gem - we have a release on June 3rd. Do you think you can complete updates before then?

sophie-gem · 2024-05-16T14:19:34Z

I am planning on it! Will aim to have it complete by end of next week - is that ok?

bms63 · 2024-05-16T14:21:03Z

yes!

sophie-gem · 2024-05-19T14:46:06Z

Hi all,

In the function documentation for get_hori_data() should analysis_value instead say set_values_to? Otherwise I'm not quite sure what analysis_value is referring to...

bundfussr · 2024-05-21T08:31:47Z

> Hi all,
>
> In the function documentation for get_hori_data() should analysis_value instead say set_values_to? Otherwise I'm not quite sure what analysis_value is referring to...

Yes, that's a left-over from replacing analysis_value with set_values_to.

* #2247 - added `relationship` argument to `create_single_dose_dataset()`.
* #2247 - Update `derive_vars_merged()` function with relationship argument.
* #2247 - Update `create_single_dose_dataset()` according to feedback. Set `relationship = many-to-one`.
* #2247 - add in assertion to check new `relationship` argument contains only the options allowed.
* #2247 - Update `create_single_dose_dataset()` to error if there are duplicates in the `lookup_table` and create associated test.
* #2247 - Update `derive_vars_transposed()` with relationship argument.
* #2247 - Update NEWS.md, running `styler::style_pkg()`, `lintr::lint_package()`
* #2247 - Update documentation to ensure URLs are inserted as links as opposed to plain text.
* #2247 - Update to use `signal_duplicate_records()` as recommended by feedback.
* #2247 - Revert unintended changes to `test-derive_var_atoxgr.R`
* #2247 - Update `derive_vars_merged()` and `derive_vars_transposed()` to only allow one-to-one and many-to-one relationship values according to feedback.
* #2247 - Update according to review comment. Catch the dplyr relationship error in `derive_merged.R`, `derive_vars_transposed.R`.
* #2247 - Update documentation for `get_hori_data()` due to misaligned text.
* #2247 - Running final devtools checks.
* Update R/derive_merged.R

  Apply requested change.

  Co-authored-by: Stefan Bundfuss <80953585+bundfussr@users.noreply.github.com>
* #2247 - Update functions to take dplyr error and replace argument names with parent function argument names.
* #2274 - Forgot to remove some dummy testing code! Have now removed.
* #2247 - Re-run devtools checks.
* docs: #2247 clarify arguments; tests: snapshot of my life
* chore: #2247 lintr
* tests: #2247 new snapshot
* chore: #2247 that lint life

---------

Co-authored-by: Ben Straub <ben.x.straub@gsk.com>
Co-authored-by: Stefan Bundfuss <80953585+bundfussr@users.noreply.github.com>

ddsjoberg added the enhancement (New feature or request) and programming labels Nov 21, 2023

github-project-automation bot added this to admiral (sdtm/adam, dev, ci, template, core) Nov 21, 2023

bms63 moved this to Backlog in admiral (sdtm/adam, dev, ci, template, core) Nov 21, 2023

manciniedoardo changed the title ~~Feature Request: Use dplyr::left_join(relationship) argumnet~~ Feature Request: Use dplyr::left_join(relationship) argument Nov 23, 2023

manciniedoardo added the future (Issue to be implemented after release) label Nov 23, 2023

bms63 removed the future (Issue to be implemented after release) label Mar 14, 2024

manciniedoardo assigned sophie-gem Mar 20, 2024

sophie-gem added a commit that referenced this issue Apr 16, 2024

#2247 - added relationship argument to create_single_dose_dataset().

9c00840

sophie-gem added a commit that referenced this issue May 8, 2024

#2247 - Update derive_vars_merged() function with relationship argument.

a2745d3

sophie-gem added a commit that referenced this issue May 8, 2024

#2247 - Update create_single_dose_dataset() according to feedback. Set `relationship = many-to-one`.

6e81ac8

sophie-gem added a commit that referenced this issue May 19, 2024

#2247 - add in assertion to check new relationship argument contains only the options allowed.

3f8fac9

sophie-gem added a commit that referenced this issue May 19, 2024

#2247 - Update create_single_dose_dataset() to error if there are duplicates in the `lookup_table` and create associated test.

d2a0383

sophie-gem added a commit that referenced this issue May 19, 2024

#2247 - Update derive_vars_transposed() with relationship argument.

9197e43

sophie-gem linked a pull request May 19, 2024 that will close this issue

Closes #2247 left join relationship #2433

Merged

15 tasks

sophie-gem added a commit that referenced this issue May 19, 2024

#2247 - Update NEWS.md, running styler::style_pkg(), `lintr::lint_package()`

58d4724

sophie-gem added a commit that referenced this issue May 20, 2024

#2247 - Update documentation to ensure URLs are inserted as links as opposed to plain text.

40ca945

sophie-gem added a commit that referenced this issue May 26, 2024

#2247 - Update to use signal_duplicate_records() as recommended by feedback.

910cee1

sophie-gem added a commit that referenced this issue May 26, 2024

#2247 - Revert unintended changes to test-derive_var_atoxgr.R

fa263ae

sophie-gem added a commit that referenced this issue May 26, 2024

#2247 - Update derive_vars_merged() and derive_vars_transposed() to only allow one-to-one and many-to-one relationship values according to feedback.

8fd426e

sophie-gem added a commit that referenced this issue May 26, 2024

#2247 - merge with main to resolve conflicts.

3aed898

Merge branch 'main' into 2247_left_join_relationship # Conflicts: # NEWS.md

sophie-gem added a commit that referenced this issue May 27, 2024

#2247 - Update according to review comment. Catch the dplyr relationship error in `derive_merged.R`, `derive_vars_transposed.R`.

8a63303

sophie-gem added a commit that referenced this issue May 27, 2024

#2247 - Update documentation for get_hori_data() due to misaligned text.

c89a1c7

sophie-gem added a commit that referenced this issue May 27, 2024

#2247 - Running final devtools checks.

826b373

sophie-gem added a commit that referenced this issue May 31, 2024

#2247 - Update functions to take dplyr error and replace argument names with parent function argument names.

935978b

sophie-gem added a commit that referenced this issue May 31, 2024

#2247 - Re-run devtools checks.

1074922

bms63 added a commit that referenced this issue May 31, 2024

docs: #2247 clarify arguments; tests: snapshot of my life

4873b9b

bms63 added a commit that referenced this issue Jun 3, 2024

chore: #2247 lintr

57c524f

bms63 added a commit that referenced this issue Jun 3, 2024

tests: #2247 new snapshot

cc52d51

bms63 added a commit that referenced this issue Jun 3, 2024

chore: #2247 that lint life

5cb86d9

bms63 closed this as completed in #2433 Jun 3, 2024

github-project-automation bot moved this from Backlog to Archive in admiral (sdtm/adam, dev, ci, template, core) Jun 3, 2024
