Code and associated data for "TCGA-Reports: A Machine-Readable Pathology Report Resource for Benchmarking Text-Based AI Models" (Kefeli et al.).
The pathology reports are provided in TCGA_Reports.csv.zip in this repository.
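The sketch below reads the reports straight from the archive using only Python's standard library. It is a minimal example that assumes `TCGA_Reports.csv.zip` holds a single UTF-8 CSV file with a header row. The helper name `load_reports` is hypothetical, and no column names are assumed; they are taken from the file's own header.

```python
import csv
import io
import zipfile


def load_reports(zip_path="TCGA_Reports.csv.zip"):
    """Return the rows of the single CSV inside the archive as a list of dicts.

    Hypothetical helper: assumes the archive contains exactly one CSV file
    (ignoring macOS "__MACOSX/" metadata entries) with a header row, encoded
    as UTF-8. Column names come from that header.
    """
    with zipfile.ZipFile(zip_path) as archive:
        csv_names = [
            name
            for name in archive.namelist()
            if name.lower().endswith(".csv") and not name.startswith("__MACOSX/")
        ]
        if len(csv_names) != 1:
            raise ValueError(f"expected one CSV in {zip_path}, found {csv_names}")
        with archive.open(csv_names[0]) as raw:
            # Decode bytes to text; newline="" lets the csv module handle
            # quoted fields that contain line breaks (report text often does).
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))
```

For example, `reports = load_reports()` followed by `reports[0].keys()` shows which columns the file actually contains.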
Because of their large size, imgs_for_aws (the images used as Textract input) and aws_response (the Textract output files) are hosted separately at: http://tatonettilab-resources.s3-website-us-west-1.amazonaws.com/?p=tcga-path-reports/
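If the files in aws_response are saved JSON responses from Amazon Textract's `DetectDocumentText` operation (an assumption; check the downloaded files first), the recognized text can be recovered from the response's `Blocks` list by keeping blocks whose `BlockType` is `LINE` and reading their `Text` field. The helpers `textract_lines` and `textract_file_to_text` are hypothetical names, and the one-JSON-object-per-file layout is also assumed. Lines are kept in the order Textract returned them, with no re-sorting.

```python
import json


def textract_lines(response):
    """Return the text of LINE blocks from a Textract-style response dict.

    Assumes a "Blocks" list whose items carry "BlockType" and, for LINE
    blocks, "Text", as in Textract's DetectDocumentText output.
    """
    return [
        block.get("Text", "")
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def textract_file_to_text(path):
    """Load one saved JSON response (assumed file layout) and join its lines."""
    with open(path, encoding="utf-8") as fh:
        return "\n".join(textract_lines(json.load(fh)))
```

Filtering on `LINE` rather than `WORD` blocks avoids having to reassemble words into lines yourself; `PAGE` blocks carry no text and are skipped.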