Resources on data validation in the full data science pipelines #10

Open
chendaniely opened this issue Dec 11, 2023 · 0 comments

We've used unit tests to help validate things in the course, but those were mainly single `assert` statements or `testthat` cases, either in a script or as part of a package's unit-testing workflow.

We should also use tools built specifically for data validation to test the assumptions we make about our data (e.g., no missing values, values follow a specific distribution).
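
To make the idea concrete, here is a rough sketch of what checking data assumptions (rather than code behavior) could look like. It is hand-rolled with only the Python standard library, so it is not any particular validation tool's API. The function name `check_column`, its parameters, and the sample `heights_cm` data are all made up for illustration, and the mean check is only a crude stand-in for a real distribution test.

```python
import math
import statistics


def check_column(values, expected_mean, tolerance):
    """Return a list of problems found in a numeric column (empty if none)."""
    problems = []

    # Assumption 1: no missing values (None or NaN).
    missing_rows = [
        i for i, v in enumerate(values)
        if v is None or (isinstance(v, float) and math.isnan(v))
    ]
    if missing_rows:
        problems.append(f"missing values at rows {missing_rows}")

    # Assumption 2 (a crude stand-in for a distribution check):
    # the mean of the observed values is within `tolerance` of `expected_mean`.
    skip = set(missing_rows)
    observed = [v for i, v in enumerate(values) if i not in skip]
    if observed:
        mean = statistics.fmean(observed)
        if abs(mean - expected_mean) > tolerance:
            problems.append(
                f"mean {mean:.2f} is not within {tolerance} of {expected_mean}"
            )

    return problems


# Hypothetical data: one missing height, mean otherwise near 170 cm.
heights_cm = [170.2, 165.0, None, 180.4]
for problem in check_column(heights_cm, expected_mean=170.0, tolerance=5.0):
    print(problem)
```

Unlike a single `assert`, this collects every violated assumption and reports them together, which is closer to how dedicated validation tools tend to behave. A real pipeline would presumably swap these checks for one of those tools.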

posit::conf this past year had two workshops, in R and Python, that might be nice reference material as well:
