lengthen sample stable id field #9404

Merged: 1 commit, Mar 24, 2022


sheridancbio (Contributor)

Describe changes proposed in this pull request:

  • sample stable IDs can now be up to 63 characters long
  • add a check for the maximum sample stable ID length to the validator
  • update the migration script (schema 2.12.13)
  • remove the wildcard from the search for cgds.sql during unit tests
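A minimal sketch of the validator check listed above, written as a standalone function. The constant MAX_SAMPLE_STABLE_ID_LENGTH and the 'column_number'/'cause' keys appear in the diff excerpt reviewed below; the function check_sample_id_length, its 'message' key and its return shape are hypothetical and are not the validator's real code.

```python
# Limit introduced by this PR: sample stable IDs may be up to 63 characters.
MAX_SAMPLE_STABLE_ID_LENGTH = 63


def check_sample_id_length(value, col_index):
    """Return an error dict if the sample ID is too long, else None.

    Hypothetical helper: it mirrors the condition shown in the PR diff,
    len(value.strip()) > MAX_SAMPLE_STABLE_ID_LENGTH, but is not the
    validator's actual implementation.
    """
    if len(value.strip()) > MAX_SAMPLE_STABLE_ID_LENGTH:
        return {
            'message': 'SAMPLE_ID too long (%d character maximum)'
                       % MAX_SAMPLE_STABLE_ID_LENGTH,
            'column_number': col_index + 1,  # columns are reported 1-based
            'cause': value,
        }
    return None


if __name__ == '__main__':
    print(check_sample_id_length('x' * 63, 1))
    print(check_sample_id_length('x' * 64, 1))
```

Surrounding whitespace is stripped before measuring, so padding around an otherwise valid ID does not trigger the error.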

Checks

  • Runs on Heroku
  • Has tests or has a separate issue that describes the types of test that should be created. If no test is included, the PR should explicitly explain why there is no test.
  • The commit log is comprehensible. It follows the 7 rules of great commit messages. For most PRs a single commit should suffice; in some cases multiple topical commits can be useful. During review it is OK to see tiny commits (e.g. "Fix reviewer comments"), but right before the code gets merged to master or the rc branch, any such commits should be squashed, since they are useless to the other developers. Definitely avoid merge commits; use rebase instead.
  • Is this PR adding logic based on one or more clinical attributes? If yes, please make sure validation for this attribute is also present in the data validation / data loading layers (in the backend repo) and documented in the File-Formats Clinical data section!

Any screenshots or GIFs?

If this is a new visual feature, please add a before/after screenshot or GIF here with, e.g., Giphy CAPTURE or Peek.

Notify reviewers

Read our Pull request merging policy. It can help to figure out who worked on the file before you. Please use git blame <filename> to determine that, and notify them either through Slack or by assigning them as a reviewer on the PR.

@Luke-Sikina (Member) left a comment


Looks good to me

@@ -2667,6 +2670,14 @@ def checkLine(self, data):
'column_number': col_index + 1,
'cause': value})
continue
if len(value.strip()) > MAX_SAMPLE_STABLE_ID_LENGTH:

Thank you! We've run into this a few times.

@dippindots (Member) left a comment


LGTM, thanks!

@inodb (Member) left a comment


There seems to be a test failing: https://github.com/cBioPortal/cbioportal/runs/5648963463?check_suite_focus=true. Looks like we missed updating the db.version in db-scripts/src/main/resources/cgds.sql?

Making sure all db versions are the same in cgds.sql, pom.xml and migration.sql
pom.xml db version is 2.12.13
db-scripts/src/main/resources/cgds.sql db version is 2.12.12
db-scripts/src/main/resources/migration.sql db version is 2.12.13
db versions mismatch
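The failing check above boils down to comparing one version string per file. A minimal sketch, assuming the versions have already been extracted from cgds.sql, pom.xml and migration.sql (how the real test parses those files is not shown in this thread); db_versions_match and report_db_versions are hypothetical names.

```python
def db_versions_match(versions):
    """Return True if every file reports the same db version.

    Hypothetical helper: `versions` maps a file name to the version
    string already extracted from that file.
    """
    return len(set(versions.values())) <= 1


def report_db_versions(versions):
    """Print one line per file, in the same shape as the test output above."""
    for name, version in versions.items():
        print('%s db version is %s' % (name, version))
    if not db_versions_match(versions):
        print('db versions mismatch')


if __name__ == '__main__':
    # Values taken from the failing run quoted above.
    report_db_versions({
        'pom.xml': '2.12.13',
        'db-scripts/src/main/resources/cgds.sql': '2.12.12',
        'db-scripts/src/main/resources/migration.sql': '2.12.13',
    })
```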

@inodb (Member) commented Mar 22, 2022

@sheridancbio I just fixed my comment ☝️

@sheridancbio (Contributor, Author)

Excellent, I'm so glad that test was added! Thanks @inodb.
I've fixed cgds.sql and want to do a little testing of the validator code, maybe adding a unit test for the validator itself, so let's hold off on merging for a bit. I'll post a comment here when testing is complete.

sonarcloud bot commented Mar 22, 2022

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 0 (rating A)

No coverage information. No duplication information.

@sheridancbio (Contributor, Author)

@inodb I have completed my system testing of the study validator. I used the study luad_tcga_pub from datahub. I changed one of the sample IDs to a 63-character string throughout the data files, and the study validated successfully. Then I made that sample ID 64 characters long, and validation failed with this message in the logs:

DEBUG: data_clinical_sample.txt: Starting validation of file
ERROR: data_clinical_sample.txt: line 65: column 2: SAMPLE_ID too long (63 character maximum); value encountered: 'abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij1234'
INFO: data_clinical_sample.txt: Validation of file complete
INFO: data_clinical_sample.txt: Read 235 lines. Lines with warning: 0. Lines with error: 1

I was unable to get the unit tests for the validator script to run smoothly in my environment, so I did not add a unit test for this case to, for example, core/src/test/scripts/unit_tests_validate_data.py or core/src/test/scripts/system_tests_validate_data.py.

But I think we can move forward with this now without the validator unit tests (or, if anyone can give guidance on running the unit tests, I'd be happy to add one).
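A boundary test for the new limit could look like the sketch below, reusing the 63- and 64-character IDs from the manual test above. It is self-contained: check_sample_id_length is a hypothetical stand-in for the validator's check, not a function from unit_tests_validate_data.py, and a real test would exercise the validator's own classes instead.

```python
import unittest

MAX_SAMPLE_STABLE_ID_LENGTH = 63  # limit introduced by this PR


def check_sample_id_length(value):
    """Hypothetical stand-in: True if the sample ID is within the limit."""
    return len(value.strip()) <= MAX_SAMPLE_STABLE_ID_LENGTH


class SampleIdLengthTest(unittest.TestCase):

    def test_63_characters_is_accepted(self):
        # 60 characters + '123' = 63 characters: should validate.
        self.assertTrue(check_sample_id_length('abcdefghij' * 6 + '123'))

    def test_64_characters_is_rejected(self):
        # The 64-character value from the log above: should fail.
        self.assertFalse(check_sample_id_length('abcdefghij' * 6 + '1234'))


if __name__ == '__main__':
    # exit=False keeps the interpreter running after the tests finish.
    unittest.main(exit=False)
```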

@dippindots (Member)

@sheridancbio If this is the error you get when you run /unit_tests_validate_data.py:

ModuleNotFoundError: No module named 'importer'

Ramya gave me this suggestion before, and it worked great for me:
You can create a copy of the importer folder in the cbioportal/core/src/test/scripts/ folder so that the module is on the same path as the unit tests. Then you should be able to run the tests locally.
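A variation on that workaround, swapping copying for a sys.path change: instead of duplicating the importer folder, put the directory that contains it on sys.path before importing. This is a minimal sketch; add_to_sys_path is a hypothetical helper, and the location core/src/main/scripts for the importer package is an assumption to adjust to your checkout.

```python
import importlib
import sys
from pathlib import Path


def add_to_sys_path(directory):
    """Prepend `directory` to sys.path (once) so packages inside it import.

    Hypothetical helper: an alternative to copying the importer folder
    next to the test scripts.
    """
    path = str(Path(directory).resolve())
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


if __name__ == '__main__':
    # ASSUMPTION: directory that contains the `importer` package.
    scripts_dir = Path('core/src/main/scripts')
    add_to_sys_path(scripts_dir)
    try:
        importer = importlib.import_module('importer')
        print('importer found at', importer.__file__)
    except ModuleNotFoundError as exc:
        print('still not importable:', exc)
```

Setting the PYTHONPATH environment variable to the same directory before running the tests has the same effect without touching any code.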

@inodb inodb merged commit b3b7bf3 into cBioPortal:master Mar 24, 2022
@sheridancbio sheridancbio deleted the expand_sample_stable_id_width branch March 25, 2022 18:42
@inodb inodb removed the enhancement label Mar 29, 2022