DIG-1045: First pass at MoH model #13
This is working for me. Yay! I think we need some more documentation, particularly on how to use the new template (explaining the formatting, specifying the index field, etc.). See the related comment on the ETL_code PR.
I do have one concern about the way we're doing this now: by default, CSVConvert will try to find values for all of the template fields, even if a mapping hasn't been explicitly specified. Is this a good idea? If the user hasn't specified a mapping, it might be very hard to figure out which field is causing an error.
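To make the concern concrete, here is a minimal sketch of that kind of fallback. It is not the actual CSVConvert code: `resolve_fields`, the field names, and the "use the last path segment as the column name" rule are all assumptions made up for illustration. Keeping a list of the fields that were resolved implicitly is one way an error could be traced back to a field the user never mapped.

```python
def resolve_fields(template_fields, explicit_mappings, row):
    """Resolve every template field for one input row.

    Hypothetical helper, not part of CSVConvert. Returns the resolved
    values plus the list of fields that had no explicit mapping.
    """
    values = {}
    implicit_fields = []
    for field in template_fields:
        if field in explicit_mappings:
            # The user told us which input column to read.
            values[field] = row.get(explicit_mappings[field])
        else:
            # Assumed fallback: guess a column named after the last
            # segment of the field path, and remember that we guessed.
            values[field] = row.get(field.split(".")[-1])
            implicit_fields.append(field)
    return values, implicit_fields


# Illustrative field and column names only.
template_fields = ["DONOR.submitter_donor_id", "DONOR.date_of_birth"]
explicit_mappings = {"DONOR.submitter_donor_id": "Subject"}
row = {"Subject": "DONOR_1", "date_of_birth": "1970-01"}

values, implicit_fields = resolve_fields(template_fields, explicit_mappings, row)
print(implicit_fields)
```

Logging `implicit_fields` once per run would at least narrow down which fields came from a guess rather than from the template.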
Also: should ingest_redcap_data just run as part of CSVConvert, or is it better to keep it as a separate script?
Looking at the new template and code (and the example in ETL_data), here are some questions that we should probably address in the documentation:
I think separate: each electronic data capture system will need its own custom massage script. We should specify the required input format for CSVConvert, though.
We could specify the massage script as part of the manifest, though. I like being able to connect all of the scripts that were run in a single file package, so that the provenance of how we got our ingest data is very clear.
Also, I think we should move all of the documentation that is currently in the data repo README to this one, since there's no guarantee that users will have access to that private repo.
In general, I think this is the behaviour we want. In order to make it less confusing / easier to debug, we probably want a combination of the following:
I would like to dispense with the In addition, is the
I am going to put on my documentation hat and take a stab at more updates to the README before approving.
I think more confusing than helpful
I like this suggestion.
Pushed a first pass at updates to the README (note that I am pulling the mapping function documentation out into a separate file).
Using the DonorWithClinicalData schema from the katsu OpenAPI schema, we can now generate a template file, update it with mappings, and run CSVConvert on it.
From this repo, you can only really test template generation:
Detailed testing can be done in the subsequent clinical_ETL_data pull request.
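For readers unfamiliar with the idea of generating a template from an OpenAPI schema, here is a rough sketch of the general approach. It is not the generator in this PR: the `walk` and `resolve` helpers, the toy schema, the `INDEX` placeholder for array items, and the dotted field-path format are all assumptions for illustration. It only follows local `$ref` pointers and does not guard against circular references.

```python
def resolve(node, components):
    """Follow a local '#/components/schemas/Name' reference, if there is one."""
    ref = node.get("$ref")
    if ref is None:
        return node
    return components[ref.rsplit("/", 1)[-1]]


def walk(node, components, path=""):
    """Yield one dotted path per leaf field under `node` (hypothetical format)."""
    node = resolve(node, components)
    if node.get("type") == "array":
        # Assumed convention: mark repeated items with an INDEX segment.
        yield from walk(node.get("items", {}), components, path + ".INDEX")
    elif "properties" in node:
        for name, child in node["properties"].items():
            yield from walk(child, components, f"{path}.{name}" if path else name)
    else:
        yield path


# Toy stand-in for the schema; property names are illustrative only.
components = {
    "DonorWithClinicalData": {
        "type": "object",
        "properties": {
            "submitter_donor_id": {"type": "string"},
            "primary_diagnoses": {
                "type": "array",
                "items": {"$ref": "#/components/schemas/PrimaryDiagnosis"},
            },
        },
    },
    "PrimaryDiagnosis": {
        "type": "object",
        "properties": {"submitter_primary_diagnosis_id": {"type": "string"}},
    },
}

root = {"$ref": "#/components/schemas/DonorWithClinicalData"}
fields = list(walk(root, components))
for field in fields:
    print(f"{field},")  # empty second column, to be filled in with a mapping
```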