A question about #75

rjiang9 · 2024-08-29T20:10:41Z

Hi folks,

In the REDCap sample inputs folder, sample_inputs/redcap_example/manifest.yml has only the schema line (it does not designate the schema_class):

schema: https://raw.githubusercontent.com/CanDIG/katsu/develop/chord_metadata_service/mohpackets/docs/schema.yml

In the generic folder, manifest.yml has:

schema: https://raw.githubusercontent.com/CanDIG/katsu/develop/chord_metadata_service/mohpackets/docs/schema.yml
# class of schema for validation:
schema_class: MoHSchemaV3

In the ETL_code root, there are moh_v3_template.csv and moh_v2_template.csv, generated using different schemas.

I am working on a REDCap data mapping with the most recent CanDIG and the develop branch of ETL_code. Which schema should I use? I am also using the template from the redcap folder. What do I need to pay attention to?

Thanks a lot,
Ray


rjiang9 · 2024-08-29T20:38:30Z

Primary_site is under Donor in the sample redcap template, but in the v3 template it is under primary_diagnoses.

rjiang9 · 2024-08-29T21:22:48Z

When I use schema_class: MoHSchemaV2, I get:

(candigetl) ➜  CANDIG python clinical_ETL_code/src/clinical_etl/CSVConvert.py --input carolyn-mappings/Singleton.csv --manifest carolyn-mappings/manifest.yml
Starting conversion...


 ==== Print module and schema_class and schema ...
<module 'clinical_etl.mohschemav2' from '/Users/ray.jiang/miniforge3/envs/candigetl/lib/python3.12/site-packages/clinical_etl/mohschemav2.py'>

MoHSchemaV2

https://raw.githubusercontent.com/CanDIG/katsu/develop/chord_metadata_service/mohpackets/docs/schema.yml
 ==== Print end.

Traceback (most recent call last):
  File "/Users/ray.jiang/Documents/CANDIG/clinical_ETL_code/src/clinical_etl/CSVConvert.py", line 827, in <module>
    main()
  File "/Users/ray.jiang/Documents/CANDIG/clinical_ETL_code/src/clinical_etl/CSVConvert.py", line 816, in main
    packets, errors = csv_convert(input_path, manifest_file, minify=args.minify, index_output=args.index,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ray.jiang/Documents/CANDIG/clinical_ETL_code/src/clinical_etl/CSVConvert.py", line 655, in csv_convert
    manifest = load_manifest(manifest_file)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ray.jiang/Documents/CANDIG/clinical_ETL_code/src/clinical_etl/CSVConvert.py", line 610, in load_manifest
    schema = getattr(schema_mod, schema_class)(manifest["schema"])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ray.jiang/miniforge3/envs/candigetl/lib/python3.12/site-packages/clinical_etl/schema.py", line 115, in __init__
    self.template = self.add_default_mappings(raw_template)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ray.jiang/miniforge3/envs/candigetl/lib/python3.12/site-packages/clinical_etl/schema.py", line 246, in add_default_mappings
    index_value = self.validation_schema[temp]["id"]
                  ~~~~~~~~~~~~~~~~~~~~~~^^^^^^
KeyError: 'systemic_therapies'

But systemic_therapies looks new to schema V3.

mshadbolt · 2024-08-29T22:18:23Z

Hi Ray! The MoHCCN recently transitioned to the v3 model, which is now available on their website: https://www.marathonofhopecancercentres.ca/researcher-hub/policies-and-guidelines

As you have noticed, this included a few major changes such as the addition of systemic therapies, removal of chemotherapy/immunotherapy/hormone therapy objects, and moving primary site to primary diagnosis.
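The changes listed above also give a rough way to tell which model version a mapping template was written for. The following is only a sketch: `guess_model_version` is a hypothetical helper, not part of clinical_etl, and the v2 markers (`chemotherapies`, `immunotherapies`, `hormone_therapies`) are assumed spellings of the removed objects, so check them against your own template.

```python
# Hypothetical helper, not part of clinical_etl.
V3_ONLY = ("systemic_therapies",)
# Assumed field-name spellings for the objects removed in v3.
V2_ONLY = ("chemotherapies", "immunotherapies", "hormone_therapies")


def guess_model_version(field_names):
    """Return "v3", "v2", "mixed", or "unknown" for a list of template field names."""
    names = [name.lower() for name in field_names]
    has_v3 = any(marker in name for name in names for marker in V3_ONLY)
    has_v2 = any(marker in name for name in names for marker in V2_ONLY)
    if has_v3 and has_v2:
        return "mixed"
    if has_v3:
        return "v3"
    if has_v2:
        return "v2"
    return "unknown"
```

A "mixed" result would suggest a template that was partly migrated, which is worth fixing before running the conversion.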

The sample redcap template is from a v2 model export; we have yet to do a v3 model export. Sorry for the confusion there.

If you are running the latest develop stack, you would need to have a clinical ingest json that is valid against the v3 data model schema. Was the redcap data you are working with curated to the v2 or v3 version of the data model? Happy to take a look at your csv template and manifest file to see if I can spot anything that might be causing issues.

We are planning on making a stable release including these latest data model updates in about a month or so. There are also a few more minor changes coming in data model v3.1, then we are hoping the data model stays stable for a while...

rjiang9 · 2024-08-29T22:32:40Z

Hi Marion,

First of all, thank you so much for getting back to me. I appreciate it.

> If you are running the latest develop stack, you would need to have a clinical ingest json that is valid against the v3 data model schema.

We are running CanDIG v4.1.0.

> Was the redcap data you are working with curated to the v2 or v3 version of the data model?

I took the template from sample_inputs/redcap_example/redcap2moh.csv and did the mappings with customized mapping functions in redcap.py.

> Happy to take a look at your csv template and manifest file to see if I can spot anything that might be causing issues.

I will attach the template and manifest file to this thread. I appreciate your help.

> We are planning on making a stable release including these latest data model updates in about a month or so. There are also a few more minor changes coming in data model v3.1, then we are hoping the data model stays stable for a while...

Thank you and the team for all this work.

redcap2moh.csv

rjiang9 · 2024-08-29T22:37:48Z

Here is the content of manifest.yml:

description:  The mappings of REDCap data to MoHpackets format for katsu
mapping: redcap2moh.csv
identifier: submitter_donor_id
schema: https://raw.githubusercontent.com/CanDIG/katsu/develop/chord_metadata_service/mohpackets/docs/schema.yml
schema_class: MoHSchemaV2
reference_date: earliest_date(Singleton.date_of_diagnosis)
date_format: YMD
functions:
    - redcap

rjiang9 · 2024-08-29T22:39:19Z

redcap.txt

mshadbolt · 2024-08-30T16:44:11Z

Hi Ray, thanks for sharing the files.

If you want to ingest the data into the stack running v4.1.0, the data will need to be compatible with data model v2. So the schema in the manifest will need to be the one on the stable branch of katsu. Can you try adjusting your manifest to:

description:  The mappings of REDCap data to MoHpackets format for katsu
mapping: redcap2moh.csv
identifier: submitter_donor_id
schema: https://raw.githubusercontent.com/CanDIG/katsu/stable/chord_metadata_service/mohpackets/docs/schema.yml
schema_class: MoHSchemaV2
reference_date: earliest_date(Singleton.date_of_diagnosis)
date_format: YMD
functions:
    - redcap

Let me know if this works!
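The pairing described here (MoHSchemaV2 with the katsu stable schema, MoHSchemaV3 with the develop schema, as in the generic sample manifest) can be checked before running the conversion. This is a minimal sketch under stated assumptions: `check_manifest` is a hypothetical helper that takes the manifest as a plain dict, and the branch pairings reflect this thread only and may change with later releases.

```python
# Pairings taken from this thread; they may change with later releases.
EXPECTED_BRANCH = {"MoHSchemaV2": "stable", "MoHSchemaV3": "develop"}


def check_manifest(manifest):
    """Return a list of warnings for a manifest given as a plain dict."""
    warnings = []
    schema_url = manifest.get("schema", "")
    schema_class = manifest.get("schema_class")
    if schema_class is None:
        warnings.append("schema_class is not set")
        return warnings
    branch = EXPECTED_BRANCH.get(schema_class)
    if branch is None:
        warnings.append(f"unrecognised schema_class: {schema_class}")
    elif f"/katsu/{branch}/" not in schema_url:
        warnings.append(f"{schema_class} expects the katsu '{branch}' branch schema")
    return warnings
```

Run against the first manifest in this thread (MoHSchemaV2 with the develop URL), this returns one warning. With the stable URL it returns an empty list.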

rjiang9 · 2024-08-30T16:49:25Z

Hi Marion, I will try that out and report back.

Thank you very much,
Ray

rjiang9 · 2024-09-04T18:26:53Z

Hi Marion,

I am trying out sample_inputs/redcap_example and have a couple of questions:

1. In the [redcap2moh.csv](https://github.com/CanDIG/clinical_ETL_code/blob/develop/sample_inputs/redcap_example/redcap2moh.csv), what is the Singleton in front of the SOURCE field in the right column? Is it actually the redcap_repeat_instrument row value in raw_redcap.csv? Please see the redcap2moh.csv screenshot for what I mean.
2. For the REDCap exported file, I just need to put it in a folder, and when I run the CSVConvert command I give only the directory name instead of the full path and file name. Is that right? (csvs/ is where I put the data file in my case below.)

python clinical_ETL_code/src/clinical_etl/CSVConvert.py --input csvs --manifest manifest.yml

Thanks a lot,
Ray

rjiang9 · 2024-09-04T19:47:48Z

PS: when I was trying to run the CSVConvert on example mappings, I got

I assume those names were just typos and all should be raw_redcap. Is this correct?

mshadbolt · 2024-09-04T20:46:08Z

Hi Ray,

1. The 'Singleton' would refer to a csv with that filename in the source csvs. You would need to edit all these source csv names in the redcap2moh.csv to match which csv that field is found in your data. Do you have multiple csvs in your csvs directory?
2. For the --input, the path would be relative to where you are running the script I believe, so this would work if you are running the script from the same location as where your csvs directory and manifest.yml file are.
3. The P.S.: If you are getting this error it would indicate that you don't have the csvs listed within your csvs input directory. Is that the case?
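One way to confirm which source csvs are missing is to compare the names used in the mapping against the files in the input directory. The sketch below is illustrative only: `referenced_sheets` and `missing_csvs` are hypothetical helpers, and the regular expression assumes sources are written as Sheet.field inside a function call, as in `earliest_date(Singleton.date_of_diagnosis)` from the manifest above.

```python
import re
from pathlib import Path

# Assumes sources are written as Sheet.field inside a function call,
# e.g. earliest_date(Singleton.date_of_diagnosis).
SHEET_REF = re.compile(r"[(,]\s*([A-Za-z_]\w*)\.\w+")


def referenced_sheets(mapping_expressions):
    """Collect the sheet names used in a list of mapping expressions."""
    sheets = set()
    for expression in mapping_expressions:
        sheets.update(SHEET_REF.findall(expression))
    return sheets


def missing_csvs(mapping_expressions, input_dir):
    """Return referenced sheet names with no matching <name>.csv in input_dir."""
    present = {path.stem for path in Path(input_dir).glob("*.csv")}
    return referenced_sheets(mapping_expressions) - present
```

An empty result means every sheet named in the mapping has a csv with the same name in the input directory.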

When we worked with a redcap export, we needed to do some preprocessing of the redcap csv to split it up into the different csvs that correspond to the various schemas before running it through clinical_etl. Perhaps this is a missing step for your data currently?
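For illustration, the kind of preprocessing described here could look like the sketch below. It is not the script used by the CanDIG team. It assumes the export has a redcap_repeat_instrument column, that rows with an empty value there belong in a csv named Singleton, and that columns with no values in a group can be dropped. All three are assumptions to verify against your own export.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_rows(rows, instrument_column="redcap_repeat_instrument", default_name="Singleton"):
    """Group row dicts by instrument; rows with an empty instrument go to default_name."""
    groups = defaultdict(list)
    for row in rows:
        name = (row.get(instrument_column) or "").strip() or default_name
        groups[name].append(row)
    return dict(groups)


def write_groups(groups, output_dir):
    """Write one <name>.csv per group, keeping only columns that have a value in that group."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for name, rows in groups.items():
        columns = [c for c in rows[0] if any(r.get(c) for r in rows)]
        with open(output_dir / f"{name}.csv", "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows)


def split_export(export_path, output_dir):
    """Split a single export csv into per-instrument csvs and return the group names."""
    with open(export_path, newline="") as handle:
        groups = split_rows(csv.DictReader(handle))
    write_groups(groups, output_dir)
    return sorted(groups)
```

The resulting directory can then be passed to CSVConvert.py with --input, provided the csv names match the source names used in the mapping file.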

rjiang9 · 2024-09-04T21:00:19Z

Thank you for clearing them up for me, Marion.

> When we worked with a redcap export, we needed to do some preprocessing of the redcap csv to split it up into the different csvs that correspond to the various schemas before running it through clinical_etl. Perhaps this is a missing step for your data currently?

This is what I missed. I thought I could just run the redcap sample mappings against the raw_redcap.csv (the single large exported csv file) included in the repo. I did not realize that the file needs to be split by schema.

Do you happen to have the split csv files produced by preprocessing that raw_redcap.csv file, so I can take a look?

> 1. ... Do you have multiple csvs in your csvs directory?

At this moment, I just have one single exported csv file. As you mentioned, I need to split it up into different csv files to correspond to the various schemas.

> 2. For the --input, the path would be relative to where you are running the script I believe, so this would work if you are running the script from the same location as where your csvs directory and manifest.yml file are.

Got it.

> 3. the P.S.: If you are getting this error it would indicate that you don't have the csvs listed within your csvs input directory. Is that the case?

Here I was trying to run the sample. I don't have those csvs - just that raw_redcap.csv in the repo.

mshadbolt · 2024-09-04T22:29:38Z

Hi Ray,

Ok! I think I understand. I can work on sharing the python script that will split the file into csvs in the same folder.

These files are a bit out of date since the redcap export format we were working with changed. I am not sure whether or not it will be relevant for you when you export from your own redcap database.

For now, are you just trying to see how things work using this as an example or are you trying to use the same methods to convert your own real data? Does your own data follow a similar format to the raw_redcap.csv or is it different?

We provided these files as something that worked for us previously but I am not sure how much they need to be customised for different redcap databases and what options there are when exporting out of redcap that would affect how they run so would be great to understand your experience so far.

rjiang9 · 2024-09-05T03:05:45Z

Hi Marion,

The REDCap data file I have is very similar in format to raw_redcap.csv. If you can share the split csv files and the Python script for doing that, it will be very helpful for me to see how things work and how you do the split.

All the best,
Ray

mshadbolt · 2024-09-05T03:31:22Z

Hi Ray, I have made a PR that adds the splitting script. It will be on the develop branch when it gets approved and merged but in the meantime you can also grab the script from here:

https://github.com/CanDIG/clinical_ETL_code/blob/mshadbolt/add-redcap-csv-split-script/sample_inputs/redcap_example/split_redcap_data.py

Hope it works for your export!

rjiang9 · 2024-09-05T15:07:28Z

This is great. Thank you so much Marion. Big help!

mshadbolt mentioned this issue Sep 5, 2024

Add RedCap export splitting script #78

Merged

rjiang9 closed this as completed Sep 5, 2024
