Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

94 save objects in qs #99

Open
wants to merge 10 commits into
base: dev
Choose a base branch
from
Open

94 save objects in qs #99

wants to merge 10 commits into from

Conversation

bkutlu
Copy link
Collaborator

@bkutlu bkutlu commented Oct 14, 2024

Ability to write and read data objects in qs format will improve the speed with large daaps. There is a new type parameter in dp_write, which is set to 'rds' by default. I had to create some helper functions to accommodate the convention of saving RDS objects under "output_files/RDS_format" directory. Please take a look closely and test it on your environment.

benchmark

@camorosi
Copy link
Collaborator

camorosi commented Oct 17, 2024

This throws a wrench in things, but looks like qs has a planned deprecation. qsbase/qs#103 Can we test with qs2 instead?

@bkutlu
Copy link
Collaborator Author

bkutlu commented Oct 17, 2024

Great catch! We can definitely implement qs2 once it is supported by pins-r

@bkutlu bkutlu reopened this Nov 20, 2024
@bkutlu
Copy link
Collaborator Author

bkutlu commented Nov 20, 2024

The changes introduced will be backward compatible with existing daaps

Copy link
Collaborator

@camorosi camorosi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested with local and s3 data products and everything is working smoothly.

Note that with the current implementation daaps with multiple types (ie a daap started with rds and switched to qs) may have unexpected behavior with get_data_object_path but I think this can be handled via documentation in a future qs vignette.

@@ -58,3 +55,22 @@ dp_commit <- function(project_path = fs::path_wd(),

return(repo)
}

get_data_object_path <- function(project_path) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to take into account the case where a data product may have been started in rds format and switched to qs format? If so, this will always return the RDS path preferentially because of the order of the list

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be no issue if the data product was initially generated in one format and later switched to another, provided that the repository is freshly cloned or synchronized, therefore the contents of the output_files directory are ignored. However, it could be problematic if a user continues to develop the data product within an existing repository and decides to switch the format type. While this scenario is unlikely, it would be prudent to include a warning in the vignette explaining how the new type parameter functions.

R/dp_write.R Outdated Show resolved Hide resolved
R/dp_write.R Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants