inject study_type in EBI and improvements to current automatic processing pipeline #3023

Merged
merged 2 commits into from
Aug 4, 2020

Conversation

antgonza
Member

@antgonza antgonza commented Aug 3, 2020

@antgonza antgonza requested a review from ElDeveloper August 3, 2020 21:39

codecov-commenter commented Aug 3, 2020

Codecov Report

Merging #3023 into dev will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##              dev    #3023   +/-   ##
=======================================
  Coverage   94.94%   94.94%           
=======================================
  Files          74       74           
  Lines       14263    14263           
=======================================
  Hits        13542    13542           
  Misses        721      721           

Powered by Codecov. Last update d9275b7...bf32a46. Read the comment docs.

Contributor

@ElDeveloper ElDeveloper left a comment

Changes look good, overall. Are the modifications to the automatic processing pipeline a result of failed runs, or what motivated those changes?

@@ -356,6 +356,11 @@ def generate_study_xml(self):
study_title = ET.SubElement(descriptor, 'STUDY_TITLE')
study_title.text = escape(clean_whitespace(self.study_title))

# study type is depricated and not displayed anywhere on EBI-ENA;
Contributor

Suggested change
# study type is depricated and not displayed anywhere on EBI-ENA;
# study type is deprecated and not displayed anywhere on EBI-ENA;
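
For context, a minimal sketch of what injecting a study type into the study XML could look like with `xml.etree.ElementTree`. The `build_descriptor` helper, the `'Other'` value, and the `existing_study_type` attribute name are assumptions made for illustration; the actual lines added by this PR are not shown in the excerpt above.

```python
import xml.etree.ElementTree as ET


def build_descriptor(study_title, study_type='Other'):
    """Build a DESCRIPTOR element holding a title and a study type.

    Hypothetical helper: the attribute name and default value are
    assumptions, not taken from the PR diff.
    """
    descriptor = ET.Element('DESCRIPTOR')

    study_title_el = ET.SubElement(descriptor, 'STUDY_TITLE')
    # collapse runs of whitespace, similar in spirit to clean_whitespace
    study_title_el.text = ' '.join(study_title.split())

    # the study type is carried as an attribute on an otherwise empty element
    ET.SubElement(descriptor, 'STUDY_TYPE',
                  {'existing_study_type': study_type})
    return descriptor


descriptor = build_descriptor('My   example study')
print(ET.tostring(descriptor, encoding='unicode'))
```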

# getting all jobs, including hidden ones, in case the job failed
jobs = a.jobs(cmd=cmd['command'], show_hidden=True)
params = [j.parameters.values for j in jobs]
params = [{k: str(v) for k, v in j.parameters.values.items()
Contributor

Why is the string conversion needed here?

Member Author

Excellent question. For some reason, the parameters of old jobs are not stored as strings, while the parameters here are defined as strings; the easiest fix is to make sure that everything is a string.
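
A small sketch of the mismatch described above, using made-up parameter names and values (none of them come from the PR): a job that stored native types never matches a string-only definition until both sides are normalized.

```python
def stringify_values(values):
    """Return a copy of `values` with every value converted to str."""
    return {k: str(v) for k, v in values.items()}


# Hypothetical parameters: an old job that stored native Python types ...
old_job_params = {'reference': 1, 'threads': 4, 'trim': True}
# ... while the expected parameters are defined purely as strings.
expected_params = {'reference': '1', 'threads': '4', 'trim': 'True'}

existing = [old_job_params]

# Without normalization the lookup fails, because 1 != '1'.
found_raw = expected_params in existing

# After converting every value to str, the same lookup succeeds.
found_normalized = expected_params in [stringify_values(p) for p in existing]
```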

Comment on lines 242 to 244
k: str(v)
for k, v in cpp.values.items()
if k not in cmd['ignore_parameters']}
Contributor

This comprehension pattern is repeated a couple of times; any chance it can be put in a single function, or something like that?
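
One way the repeated pattern could be factored out, as a sketch only: `comparable_params` and the sample dictionaries below are hypothetical names, not code from the PR.

```python
def comparable_params(values, ignore=()):
    """Stringify parameter values and drop the keys listed in `ignore`.

    Hypothetical helper mirroring the comprehension quoted above.
    """
    ignored = set(ignore)
    return {k: str(v) for k, v in values.items() if k not in ignored}


# Made-up values standing in for a job's parameters and a default set.
job_values = {'input_data': 42, 'threads': 8, 'reference': 1}
default_values = {'threads': '4', 'reference': '1'}
ignore_parameters = ['input_data', 'threads']

# Both sides go through the same helper before being compared.
same = (comparable_params(job_values, ignore_parameters) ==
        comparable_params(default_values, ignore_parameters))
```

Routing both the job parameters and the default parameters through one helper keeps the string conversion and the ignore list in a single place, so the two sides cannot drift apart.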

Member Author

@antgonza antgonza left a comment

Good question! The changes are basically because (1) we needed to reprocess WGS with the new pipeline (we had to change versions), while the first time we only did target gene (thus, we were missing some parameters for WGS); (2) we realized that we can have the same command with multiple core parameters (so we need the ignore parameters, and some of the first versions did not convert everything to string); and (3) some general cleanup.

@ElDeveloper
Contributor

Thanks @antgonza

@ElDeveloper ElDeveloper merged commit 68ffc5a into qiita-spots:dev Aug 4, 2020
antgonza added a commit that referenced this pull request Sep 16, 2020
* inject study_type in EBI and improvements to current automatic processing pipeline (#3023)

* inject study_type in ebi and improvements to current automatic processing pipeline

* addressing @ElDeveloper comments

* some general fixes/additions for next release (#3026)

* some general fixes/additions for next release

* adding test for not None job.release_validator_job

* fix #2839

* fix #2868 (#3028)

* fix #2868

* 2nd round

* fix errors

* more changes

* fix errors

* fix ProcessingJobTest

* fix PY_PATCH

* add missing TRN.add

* encapsulated_query -> perform_as_transaction

* fix #3022 (#3030)

* fix #3022

* adding tests

* fix #2320 (#3031)

* fix #2320

* adding prints to debug

* children -> 1

* APIArtifactHandlerTest -> APIArtifactHandlerTests

* configure_biom

* qdb.util.activate_or_update_plugins

* improving code

* almost there

* add values.template

* fix filepaths

* filepaths -> files

* fixing errors

* add prep.artifact insertion

* addressing @ElDeveloper comments

* fix artifact_definition active command

* != -> ==

* Added three tutorial sections to the Qiita documentation (#3032)

* Added three tutorial sections to the Qiita documentation: 'Retrieving Public Data for Own Analysis' and 'Processing public data retrieved with redbiom' to the redbiom tab, and 'Statistical Analysis to Justify Clinical Trial Sample Size Tutorial' to the analyzing samples tab.

* Update redbiom.rst

* Update redbiom.rst

* Update redbiom.rst

* Further updates to redbiom.rst and the Stats tutorial.

* update redbiom.rst

* Finished proof-reading

* Placed all three tutorials/sections together under Introduction to the download and analysis of public Qiita data

* added a new introduction, with links to the three sections

* Added figures to stats tutorial and contexts explanation

* Added figures to stats tutorial and contexts explanation

* Apply suggestions from code review [skip ci]

Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>

Co-authored-by: Antonio Gonzalez <antgonza@gmail.com>
Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>

* 092020 (#3033)

* 092020

* connect artifact with job

* rm INSERT qiita.artifact_processing_job

* Apply suggestions from code review [skip ci]

Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>

Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>

Co-authored-by: Daniel McDonald <danielmcdonald@ucsd.edu>
Co-authored-by: Mirte Kuijpers <67341505+mcmk3@users.noreply.github.com>
Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>
ElDeveloper added a commit that referenced this pull request Nov 9, 2020
* Version 092020 (#3034)

* fix #3036

Co-authored-by: Daniel McDonald <danielmcdonald@ucsd.edu>
Co-authored-by: Mirte Kuijpers <67341505+mcmk3@users.noreply.github.com>
Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>
ElDeveloper added a commit that referenced this pull request Nov 9, 2020
…3040)

* Version 092020 (#3034)

* adding job submit ENVIRONMENT and minor improvement to runWorkflow

* fix test

Co-authored-by: Daniel McDonald <danielmcdonald@ucsd.edu>
Co-authored-by: Mirte Kuijpers <67341505+mcmk3@users.noreply.github.com>
Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>
ElDeveloper added a commit that referenced this pull request Nov 9, 2020
* Version 092020 (#3034)

* fix #2920

* fix test

Co-authored-by: Daniel McDonald <danielmcdonald@ucsd.edu>
Co-authored-by: Mirte Kuijpers <67341505+mcmk3@users.noreply.github.com>
Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>
ElDeveloper added a commit that referenced this pull request Nov 10, 2020
* Version 092020 (#3034)

* rm create_qiime_mapping_file

* fixing some tests

* fixing more tests

* fix even more tests

* rm npt.assert_warns

* qiime-map -> sample-file

* update not_merged_samples.txt

* adding @ElDeveloper changes

Co-authored-by: Daniel McDonald <danielmcdonald@ucsd.edu>
Co-authored-by: Mirte Kuijpers <67341505+mcmk3@users.noreply.github.com>
Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>
ElDeveloper added a commit that referenced this pull request Nov 17, 2020
* Version 092020 (#3034)

* adding qiita.study autoloaded column

* cleaning tests

* fix redbiom test

* avoiding clog submissions

* Apply suggestions from code review

Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>

Co-authored-by: Daniel McDonald <danielmcdonald@ucsd.edu>
Co-authored-by: Mirte Kuijpers <67341505+mcmk3@users.noreply.github.com>
Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>