Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Capr Issue #99: Potential improvements to Capr #105

Open
wants to merge 155 commits into
base: develop
Choose a base branch
from

Conversation

guuswilmink
Copy link

This PR is linked to #99.

Suggested functions:
isStandard.R: Given a path to CSV files with at least fields "sourceCode" and "conceptId", along with a database connection with a CDM and vocabulary schema, return a tibble of non-standard concepts. Allows for export of results for all concepts included (including standard). This function can help in quickly identifying non-standard concepts from tables of concept ids.

isStandardDB.R: Checks whether concepts that exist in a database are standard/non-standard and returns the non-standard concepts. The full table of standard and non-standard concepts can be saved as well. This is probably the most useful out of the 3 isStandard functions.

isStandardCS.R: Similar return as isStandard.R and isStandardDB, but performs the check given a Capr ConceptSet class object rather than CSV files. Does not capture source value (as these are not relevant for concept sets).

countOccurrences.R: Given a vector of concept IDs and a connection to a CDM instance, count the number of occurrences of: 1) persons with concept 'x'; 2) records with concept 'x'; 3) persons with concept 'x' or descendants of 'x'; 4) records with concept 'x' or descendants of 'x'
To demonstrate and further clarify this function I have attached a PDF with example usage. Is this already possible within Capr?

@mdlavallee92 I have incorporated your feedback provided in issue #99; please let me know if this is to your expectations.

Copy link
Collaborator

@mdlavallee92 mdlavallee92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @guuswilmink! Many many many apologies for being entirely too late on this review. This is a really nice add, thank you! Would you be interested in other maintenance?

My main two hold ups are the on the signatures of the functions and consolidating export files.

  1. Change them to camelCase as that is part of the HADES style gudie...ie cdmDatabaseSchema, vocabDatabaseSchema. Make v conceptIds. v is way to generic.
  2. consolidate the isStandard functions to 1 file and dont export the links function. That is a stable internal. See example of the same idea here.

I need a bit more examples of how to use the isStandard functions. I believe their utility but not totally seeing its use in the day to day of the package. Do you mind doing a vignette or pointing me to some examples of how you are using this? I love the countOccurrences function. Think that is a handy tool!

#' )
#'
#' @export
countOccurrences <- function(v, tables, links, db_connection, cdm_schema, vocab_schema, save_path = NULL) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes on the signature...

v is to generic of a signature...just use conceptIds as the input name if that is what you want the user to input.

signatures should be in camelCase ie conceptIds, dbConnection, cdmDatabaseSchema. this is part of the HADES style guide

Does the links table ever change? If not just make this a non-exported function called within the function.

for (table in tables) {
concept_id_field <- links[[table]][1]

# Combined SQL query for direct and descendant counts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer to have the sql sourced as an inst file instead of within the code. it ends up being a bit neater. No change required

#' db_connection = db_connection, cdm_schema = "public", vocab_schema = "vocabulary"
#' )
#'
#' @export
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice I like this function a lot!

#' @importFrom readr read_csv write_csv
#' @importFrom dplyr mutate across filter select inner_join
#' @importFrom DatabaseConnector connect disconnect querySql
#' @export
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here change to camelCase.

}

for (table_path in tables) {
table_name <- basename(table_path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No change required. Suggestion is to leverage internal function. it makes the code easier to read and reference. For example, the for loop in L56 seems to be a constant pattern. Functionalize this as like prepAndReadTable function that you leverage here.

#' @importFrom readr read_csv write_csv
#' @importFrom dplyr mutate filter rename bind_rows
#' @importFrom DatabaseConnector querySql
#' @importFrom SqlRender render translate
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all of these isStandard examples, show me a couple of use cases. I believe you that they are needed but I havent encountered them often


for (table in tables) {
# Read concept table from SQL database
concept_table_query <- SqlRender::render(sprintf(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious...why sprintf? I usually use glue. Despite adding a dependency i think its a real good package for interpreting string literals

#' print(concept_field)
#'
#' @export
links <- list(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a non-exported function. It seems stable since its pegged to the CDM schematic.

@@ -0,0 +1,127 @@
source("R/isStandard.R")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consolidate into 1 isStandard file all of these functions and test. ow looks good

@@ -3,7 +3,8 @@ Title: Cohort Definition Application Programming
Version: 2.0.8
Authors@R: c(
person("Martin", "Lavallee", , "martin.lavallee@boehringer-ingelheim.com", role = c("aut", "cre")),
person("Adam", "Black", , "black@ohdsi.org", role = c("aut"))
person("Adam", "Black", , "black@ohdsi.org", role = c("aut")),
person("Guus", "Wilmink", , "guus@thehyve.nl", role = c("aut"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you like to do more maintenance on Capr? As you can see the package is escaping me given how slow I was to respond to this PR

…ub/workflows/actions/download-artifact-4.1.7

Bump actions/download-artifact from 2 to 4.1.7 in /.github/workflows
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants