Draft specification for labels #45

Open: 3 commits to merge into base: scoping-doc

DougManuel
Contributor

This PR drafts specifications for refactoring labels, with the aim of moving label handling out of rec_with_table(). We currently make little use of labels and metadata, but this is a priority feature.

The specification aims for a clearer and more consistent approach to variable labels. I propose that we extend the 'labelled' library; there are a few ways to do so.

@DougManuel DougManuel requested a review from yulric November 18, 2024 19:45
@DougManuel DougManuel added the enhancement New feature or request label Nov 18, 2024
@yulric yulric marked this pull request as ready for review November 19, 2024 14:56
Collaborator

@yulric yulric left a comment

The document is pretty good but missing more text about what's in scope for the library.


## Scope and specifications for labels and metadata

Adding support for labels in recodeflow will enhance data clarity and usability by providing descriptive metadata for variables, ensuring consistent interpretation across analyses. This feature will require implementing a standardized system for label management, including functions for adding, modifying, and retrieving labels within the recodeflow framework. There are two primary uses of variable and value labels:

Collaborator

Adding, modifying, and retrieving labels where? The variable and variable details sheets? The data?


Other uses include:

1. Facilitating data sharing and collaboration by providing clear documentation of variable meanings and valid values.

Collaborator

Valid values: this isn't solved by labels, right? It's solved by other metadata such as mins, maxes, units, etc.


1. Enabling automated report generation with human-readable variable descriptions.
1. Supporting data validation by documenting expected values and ranges.

Collaborator

See comment above in point 1

```{mermaid}
graph LR
A[Raw Data] --> B[Labeled Data]
B --> C[Factor]
F --> G[Export]
```

Collaborator

I don't understand the fork from B to factor and character. Do you mean categorical variables can be treated as a factor or a character variable?

Another significant challenge is maintaining label consistency across different representations of the same data. For example, a variable might need to be represented as both a labeled numeric vector and a factor, depending on the analysis context. Each transformation between these representations requires careful handling of both the values and their associated labels to ensure that the semantic meaning of the data is preserved. Additional challenges include handling missing values and their labels consistently, supporting multiple languages or label variants for the same variable, maintaining label integrity during data reshaping operations, ensuring labels remain synchronized when subsetting or filtering data, and managing the memory overhead of storing extensive label metadata.

Collaborator

Do you mind providing a concrete example of your generic example here?
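
A concrete version of this example might look like the sketch below. It is written in Python purely as a language-neutral illustration and does not call recodeflow or the 'labelled' package; the `LabelledVector` type, the helper functions, and the smoking-status variable with its codes are all hypothetical. It shows why converting a labeled numeric vector to a factor needs care: the label text survives, but the original codes (1, 3, 6) are replaced by positional level indices (1, 2, 3) unless the mapping is kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledVector:
    """Numeric codes plus a variable label and value labels (hypothetical type)."""
    values: tuple
    var_label: str
    val_labels: dict  # code -> label text


def to_factor(vec):
    """Replace each code with its label text; levels follow sorted code order."""
    levels = [vec.val_labels[code] for code in sorted(vec.val_labels)]
    data = [vec.val_labels.get(v) for v in vec.values]  # None for unlabeled codes
    return {"var_label": vec.var_label, "levels": levels, "data": data}


def factor_codes(factor):
    """Positional level indices (1-based), which is all a factor stores."""
    index = {label: i + 1 for i, label in enumerate(factor["levels"])}
    return [index.get(x) for x in factor["data"]]


# Hypothetical variable: the codes 1, 3, 6 are illustrative only.
smoke = LabelledVector(
    values=(1, 3, 6, 1),
    var_label="Smoking status",
    val_labels={1: "Daily", 3: "Never", 6: "Not applicable"},
)
smoke_factor = to_factor(smoke)

print(smoke_factor["data"])        # label text for each observation
print(factor_codes(smoke_factor))  # level indices, not the original codes
```

In this sketch the variable label is carried across explicitly, while the original codes can only be recovered if the value-label mapping is stored alongside the factor.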

Collaborator

The additional challenges sentence can be removed; it's in point form below.


## Label specifications

- Follow the labelled library approach. All labelled functions should work with recodeflow.

Collaborator

I would add a sub-section here about which challenges mentioned above recodeflow will tackle and which ones are out of scope.
