Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/allow converting #64

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mspierenburg
Copy link
Contributor

Description

Allow converting input data according to the pandera schema so data types only need to be define in the schema and not also als load parameter.

Development notes

  • Added reading of global and dataset parameter (convert)
  • Added to return validated dataset when parameter is set
  • Enabled validation + conversion also for output datasets
  • Added unittest to test conversion example

Checklist

  • Read the contributing guidelines
  • Open this PR as a 'Draft Pull Request' if it is work-in-progress
  • Update the documentation to reflect the code changes
  • Add a description of this change and add your name to the list of supporting contributions in the CHANGELOG.md file. Please respect Keep a Changelog guidelines.
  • Add tests to cover your changes

Notice

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":

  • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.

  • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.

  • I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

@mspierenburg
Copy link
Contributor Author

will cover #63

Copy link

codecov bot commented May 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (56d9cd1) to head (02eb584).

Additional details and impacted files
@@             Coverage Diff             @@
##             main       #64      +/-   ##
===========================================
+ Coverage   94.05%   100.00%   +5.94%     
===========================================
  Files           5         5              
  Lines         101       126      +25     
===========================================
+ Hits           95       126      +31     
+ Misses          6         0       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Galileo-Galilei
Copy link
Owner

Hi, thank you very much for taking the stab at it.
I think we can separate the 2 features in different PR:

  • regarding the validation of outputs, it should be almost ready to merge as is. I just need to check that it does not "double validate" intermediary datasets which are inputs and outputs because I remember their are several outputs attributes ; we also need to check that pipeline.outputs returns all outputs even if they are not memory datasets.
    Can you open a different PR for that? I'll merge and release it quickly.

  • regarding the automatic conversion, I think there may have issues with the catalog not being modified in place by the hook, I am not sure it will work as is ; even if it does, it may be the opportunity to introduce a pandera.yaml config file instead of relying on a pandera key in the parameters.yml. I'll have a look and give more precise suggestions likely next week.

@mspierenburg
Copy link
Contributor Author

@Galileo-Galilei

Thank you for your comments.
I created a new PR for validation of output datasets: #66

If this one is merged I will create a new PR for the automatic conversion where we can discuss the issues you mentioned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants