Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removes library calls from get_rcode_libraries in favor of manual specification #1015

Closed
wants to merge 4 commits into from

Conversation

averissimo
Copy link
Contributor

Pull Request

Fixes #950

Changes description

  • Removes library calls if they are already manually specified by the app developer
    • Ignore cases when library(...) is after a comment character (#)
    • Ignore cases when library(...) is written inside a string var1 <- "a weird string with library(ggplot2)"
  • Show error when loadedNamespaces package is not installed in the user
    • Being discussed in issue

Copy link
Contributor

@chlebowa chlebowa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So with this PR we use get_rcode_libraries to list all loaded namespaces and put them in library calls but then we just remove the ones that were attached previously by calling library within teal_data. Am I correct? The following assumes that I am.

I don't think this is the right way to go. get_rcode_libraries was never a good solution to begin with. What is attached by the app dev* should be attached, and what is only loaded should be only loaded. I believe we should not add library calls to the code.

(*) Attaching packages in modules is poor practice and should be discouraged but there is no way to prevent it so this extends to the module dev. Problems may ensue if a module attaches a package that masks a function used in prepro.

Comment on lines 288 to 297
# list of code
dataset_rcode <- get_datasets_code(datanames, datasets, hashes)
loaded_libs <- get_rcode_libraries(dataset_rcode)

code <- c(
get_rcode_str_install(),
get_rcode_libraries(),
get_datasets_code(datanames, datasets, hashes)
loaded_libs,
"",
dataset_rcode
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes it a little less readable, please follow the previous version of c(<call1>, <call2>, <call3>).

Why the ""?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a small UI tweak to create space between automated librarys and teal_data code

I can revert or find a less-tweaky way of achieving this

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The teal_data constructor does code <- paste(code, collapse = "\n"). Ensuring that the library calls string ends with a line break should be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted in 8b6be45

@averissimo
Copy link
Contributor Author

So with this PR we use get_rcode_libraries to list all loaded namespaces and put them in library calls, but then we just remove the ones that were attached previously by calling library within teal_data. Am I correct?

This PR still keeps the previous behavior, it just filters out the library calls. Using namespaces will produce too many library calls, which I don't think we want ("Too many" (easily) means more than 50)

I don't think this is the right way to go. get_rcode_libraries was never a good solution to begin with. What is attached by the app dev* should be attached, and what is only loaded should be only loaded. I believe we should not add library calls to the code.

(*) Attaching packages in modules is poor practice and should be discouraged but there is no way to prevent it so this extends to the module dev. Problems may ensue if a module attaches a package that masks a function used in prepro.

I agree that it's poor practice. A good solution seems unnatainable.

From my POV, either:

  1. We add library calls with attached (as it is currently) or loadedNamespaces packages (which will include a lot of unnecessary packages for the code to be reproducible)
  2. We relax our standards for the reproducibility code generation and don't include any libraries, leaving to:
    • App developer to add those explicit calls
    • User to install the required packages as the code fails

From my research so far we cannot isolate the namespaces in any way. (i.e. prevent the qenv environment from using packages attached elsewhere)

Copy link
Contributor

@chlebowa chlebowa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid library("<package>") is just as valid as library(<package>) and parsing fails with the quotemarks. Maybe adding gsub("\"", "", user_libraries) will be enough to fix it.

R/get_rcode_utils.R Outdated Show resolved Hide resolved
R/get_rcode_utils.R Outdated Show resolved Hide resolved
# Keep only library name
parsed_libraries <- gsub(
# library(...) must be located at beginning of line, or have a valid character before
"(^l|.*<-|.*[ ;=\\({]l)ibrary\\(([a-z][a-zA-Z0-9.]*)\\)$", "\\2",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can [a-z][a-zA-Z0-9.] be replaced with [:alnum:]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, [[:alnum:].] due to dot

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I missed the full stop when retyping 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I think I misunderstood your reply to my comment. What I meant to ask is whether you can replace

"(^l|.*<-|.*[ ;=\\({]l)ibrary\\(([a-z][a-zA-Z0-9.]*)\\)$"

with

"(^l|.*<-|.*[ ;=\\({]l)ibrary\\(([:alnum:]*)\\)$"

I don't quite get why you use <any lowercase letter><any number of letters/digits/anychar>.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I see now that things like [:alnum:] are not entirely reliable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood the documentation about the first letter.

A valid package name needs to be <letter><any number, letter and dot>

@pawelru
Copy link
Contributor

pawelru commented Jan 5, 2024

I very much dislike operating on the code as a text using regular expressions and so on. It would be relatively easy to obey the patterns / the amount of work to make it robust is huge due to number of possible options.

Few examples

library("foo") # I believe this one is covered
library("foo", character.only = T)
library(foo)
library      (foo)
require(foo)
do.call(library, list("foo"))
pkgs <- c("foo"); for(i in pkgs) library(i, character.only = T)
pkgs <- c("foo"); lapply(pkgs, library, character.only = T)
get("library")(foo)
eval(parse(text = paste0("lib", "rary", "(", "foo", ")")))
foo <- library; foo(bar)
...

I haven't gone deep enough to the problem to suggest an alternative - it's just a warning for you that this approach is a rabbit hole.

@chlebowa
Copy link
Contributor

chlebowa commented Jan 5, 2024

The three functions used here and teal.slice::get_filter_expr called by get_datasets_code (code from main) return character.
Would you have them all modified to work on language objects?

  code <- c(
    get_rcode_str_install(),
    get_rcode_libraries(),
    get_datasets_code(datanames, datasets, hashes)
  )

@pawelru
Copy link
Contributor

pawelru commented Jan 8, 2024

The three functions used here and teal.slice::get_filter_expr called by get_datasets_code (code from main) return character. Would you have them all modified to work on language objects?

  code <- c(
    get_rcode_str_install(),
    get_rcode_libraries(),
    get_datasets_code(datanames, datasets, hashes)
  )

Eventually yes. Combine also into the language object and cast to character (and style it then) at the very end. This feels more correct to me. This obviously is beyond the scope of the task but I would like to avoid strengthening this questionable character-based approach.

@chlebowa
Copy link
Contributor

chlebowa commented Jan 8, 2024

Eventually yes.

Will this do? #1027

@pawelru
Copy link
Contributor

pawelru commented Jan 8, 2024

Eventually yes.

Will this do? #1027

Yes. Much appreciated!

@averissimo
Copy link
Contributor Author

averissimo commented Jan 30, 2024

We can restrict the rabbit hole and only comply with common/simple patterns. I see this feature as a "nice to have" to avoid duplicated code and not as a complete solution.

Otherwise, it might risk becoming a new code parser if we enter the rabbit hole 😅 (even when dealing with call/expressions)

We might have an alternative where we detect changes in the session during eval_code / within and store that as metadata along the way. (a setdiff would then be enough to remove duplicates)

@averissimo averissimo closed this Mar 6, 2024
@averissimo averissimo deleted the 950-library_calls branch March 6, 2024 13:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove library calls from get_rcode_header in favour of manual specification
3 participants