[REVIEW]: A Framework to Quality Control Oceanographic Data #2063

whedon · 2020-02-03T16:19:33Z

Submitting author: @castelao (Guilherme Castelao)
Repository: https://github.com/castelao/CoTeDe
Version: v0.21.3
Editor: @kthyng
Reviewer: @jessicaaustin, @evanleeturner
Archive: 10.5281/zenodo.3733959

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/87191fa86135614f71cc945c36b5f532"><img src="https://joss.theoj.org/papers/87191fa86135614f71cc945c36b5f532/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/87191fa86135614f71cc945c36b5f532/status.svg)](https://joss.theoj.org/papers/87191fa86135614f71cc945c36b5f532)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@jessicaaustin & @evanleeturner, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

Make sure you're logged in to your GitHub account
Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @kthyng know.

✨ Please try and complete your review in the next two weeks ✨

Review checklist for @jessicaaustin

Conflict of interest

I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

I confirm that I read and will adhere to the JOSS code of conduct.

General checks

Repository: Is the source code for this software available at the repository url?
License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
Contribution and authorship: Has the submitting author (@castelao) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

Installation: Does installation proceed as outlined in the documentation?
Functionality: Have the functional claims of the software been confirmed?
Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
State of the field: Do the authors describe how this software compares to other commonly-used packages?
Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @evanleeturner

Conflict of interest

I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

I confirm that I read and will adhere to the JOSS code of conduct.

General checks

Repository: Is the source code for this software available at the repository url?
License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
Contribution and authorship: Has the submitting author (@castelao) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

Installation: Does installation proceed as outlined in the documentation?
Functionality: Have the functional claims of the software been confirmed?
Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
State of the field: Do the authors describe how this software compares to other commonly-used packages?
Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?


whedon · 2020-02-03T16:19:36Z

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @jessicaaustin, @evanleeturner it looks like you're currently assigned to review this paper 🎉.

⭐ Important ⭐

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that, with GitHub's default behaviour, you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' at https://github.com/openjournals/joss-reviews.

You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

whedon · 2020-02-03T16:19:57Z

Reference check summary:

OK DOIs

- 10.1016/j.mio.2014.09.001 is OK

MISSING DOIs

- None

INVALID DOIs

- http://dx.doi.org/10.13155/33951 is INVALID because of 'https://doi.org/' prefix
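The check objects to the resolver URL stored in the DOI field; the later acceptance run reports the bare `10.13155/33951` as OK. A minimal sketch of normalizing such entries, assuming the DOIs are available as plain strings (the `normalize_doi` helper is illustrative, not part of whedon or any BibTeX tool):

```python
import re

# Resolver prefixes such as "https://doi.org/" or "http://dx.doi.org/"
_RESOLVER_PREFIX = re.compile(r"^(?:https?://)?(?:dx\.)?doi\.org/", re.IGNORECASE)


def normalize_doi(value: str) -> str:
    """Return the bare DOI (starting with '10.') from a DOI or a resolver URL."""
    doi = _RESOLVER_PREFIX.sub("", value.strip())
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {value!r}")
    return doi


print(normalize_doi("http://dx.doi.org/10.13155/33951"))  # 10.13155/33951
```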

whedon · 2020-02-03T16:20:03Z

👉 Check article proof 📄 👈

castelao · 2020-02-24T16:28:09Z

Hi everyone. It's three weeks since the start of the review. Would you have any update for me? Thanks.

kthyng · 2020-02-25T20:32:04Z

I'll just do a specific ping to reviewers @jessicaaustin and @evanleeturner. When do you think you'll be able to work on your reviews? Thanks.

evanleeturner · 2020-02-25T20:33:31Z

I am still working on this review, thank you.

jessicaaustin · 2020-02-25T21:03:25Z

Thanks for the reminder. I have read all the instructions and made some progress on my review before getting pulled into other tasks. I will pick it up again this week and submit by the end of the week.


jessicaaustin · 2020-02-29T04:46:27Z

Conflict of interest

I do not believe this is a conflict of interest as outlined in the JOSS guidelines, but I do want to state that I am a maintainer of a similar open-source QC library, https://github.com/ioos/ioos_qc .

Overall thoughts:

I agree 100% with the author's philosophy that QC procedures should be organized and distributed as code packages in order to be widely and consistently used. I also agree with the goal of creating a framework that can run tests from multiple different QC test schemes (GTSPP, ARGO, QARTOD, etc.) since none of those schemes alone will completely satisfy testing requirements in all situations. I think the integration with oceansdb to run Climatology tests and Location at Sea tests is particularly exciting. I hope publication in JOSS will bring this library to the attention of more groups and encourage further use and collaboration to flesh this out.

The library takes in data as a numpy array, so in theory the data could come from any source or format. However, you must create a "data object" to wrap your data. Given that there are other mature and well-used libraries with data representations (xarray, pandas, netcdf, etc.), why not leverage one of those? Or at least make it easy to use CoTeDe if your data is already in one of those common formats. I created a notebook to try to understand how to run QC tests using the library (castelao/CoTeDe#42), and I was able to fairly easily load a netcdf file using xarray and run tests, but I got an error when using a pandas dataframe and it was not clear how to debug it. I think a little documentation could go a long way here, to help people get started.

Review

Items marked BLOCKER below should be resolved before acceptance. Everything else is a recommendation.

Paper:

Overall this paper is clear and easy to read, and does a good job of describing the motivation and overall functionality of the software.
Highly recommended: It would be great to list similar automated QC packages and describe how CoTeDe compares. For example, CoTeDe is more generic than ioos_qc, which depends on pandas/xarray and assumes netCDF integration, and CoTeDe allows integration with a wide range of QC test suites, whereas ioos_qc (currently) only focuses on QARTOD.
- I do understand that this takes time, so I don't want it to block acceptance here. But this is the first question I had when I looked at this library, and would love to see it in the README someday.
Highly recommended, if possible: Include a list of groups actively using this library, to give us an idea of how widely used it is. (this could also be added to the code documentation)
Recommended: Describe how to pronounce CoTeDe (though I suppose leaving some amount of mystery is the python way :-) )

Documentation:

BLOCKER: The important classes and most of the test methods have docstrings, which is great. But I was not able to find any generated API docs under https://cotede.readthedocs.io/ . Add API documentation castelao/CoTeDe#41

Installation

I was able to install and use the software using the instructions in the repo
Nice to have: Add conda-forge packages for CoTeDe and oceansdb castelao/CoTeDe#39

Contributing

I was able to install dependencies and run tests locally using the instructions in the repo
Recommended: WIP: contributing updates castelao/CoTeDe#38
Recommended: Resolve flake8 failures and/or add flake8-ignore castelao/CoTeDe#40

Performance

I did not find any mention of performance testing or guarantees in the code or documentation
Highly recommended you add performance tests. During development of https://github.com/ioos/ioos_qc we found that small tweaks to our testing algorithms could lead to severely degraded performance on large datasets. Recommend you add these tests to the travis-ci build and add it to the checklist of items to review if someone submits a PR.
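A minimal timing sketch of the kind of regression check suggested above. The `spike` function is a generic spike feature in the form used by GTSPP and Argo (|V2 - (V3 + V1)/2| - |(V3 - V1)/2|), written here for illustration rather than taken from CoTeDe; the 1e6-point size and the time budget are arbitrary assumptions.

```python
import timeit

import numpy as np


def spike(x):
    """GTSPP-style spike feature; NaN at the first and last levels."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    out[1:-1] = np.abs(x[1:-1] - (x[2:] + x[:-2]) / 2.0) - np.abs((x[2:] - x[:-2]) / 2.0)
    return out


def seconds_per_call(func, n_points, repeat=3, number=5):
    """Best-of-`repeat` wall time for one call of func on n_points synthetic values."""
    data = np.random.default_rng(0).normal(15.0, 2.0, n_points)
    times = timeit.repeat(lambda: func(data), repeat=repeat, number=number)
    return min(times) / number


# Hypothetical regression guard for CI: 1e6 points must stay under a generous budget.
BUDGET_SECONDS = 0.5
elapsed = seconds_per_call(spike, 1_000_000)
assert elapsed < BUDGET_SECONDS, f"spike took {elapsed:.3f} s per call"
```

Wall-clock budgets are noisy on shared CI runners, so a generous limit (or a comparison against a stored baseline with some tolerance) is likely more robust than a tight one.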

castelao · 2020-03-02T15:44:43Z

@jessicaaustin , thank you for your review. I'll respond to it ASAP.

evanleeturner · 2020-03-03T16:53:00Z

Conflict of interest

I do not have a conflict of interest in this review. I do maintain an internal (non-public) code repository, @wdft-coastal, that contains some functions similar to those presented in the current code, but they are specific to my dataset and organization.

Overall thoughts:

I am extremely pleased with the premise of this package for organizing QC procedures and being able to run multiple tests. This is a high priority item for my organization and my current data sets, as our own QAQC methods often can conflict with other agencies that follow different QAQC procedures. In the last few months I have been preparing code that can mimic QARTOD style tests on our own data internally, but I wanted the ability to apply alternative test schemes and to skip tests that are not appropriate for our data. Having a maintained package that accomplishes this task is not only helpful but it also greatly increases your data confidence as now the QC methods are maintained by other groups in a transparent mechanism. I fully expect to be implementing this package into our own website and data products at TWDB.

I share reviewer one’s sentiment about the philosophy of leveraging other data libraries such as netCDF or pandas. Although our own internal datasets are in SQL, we do make use of pandas where possible for data manipulation, and in the future we will likely be forced to move to netCDF to process very large datasets for QAQC. The lack of this functionality is not a show-stopper for the package, but it is a severe limitation for its acceptance in the community.

Review

Items marked BLOCKER below should be resolved before acceptance. Everything else is a recommendation.

Paper

The paper itself was succinct, well-written, and very appropriate to the code.
Recommended: Please consider adding the functionality of swapping methods for calculating seawater properties: add alternative methods for calculating salinity castelao/CoTeDe#43
Recommended: like reviewer one, I don’t know how to pronounce CoTeDe! 😊

Documentation:

BLOCKER: Please also complete the issue that was brought by reviewer one concerning the API docs as this would also be extremely helpful for me to implement this code: https://cotede.readthedocs.io/ . Add API documentation castelao/CoTeDe#41

Installation

I was able to get the package running.

Contributing

I was able to run the tests.

Performance

There were no performance tests or statements regarding performance.
I did not have any performance issues in my own tests, but I also have very small datasets at the moment. Additionally, if I were to run this package in a production environment, the performance hit would be negated because our data would be checked as it is inserted into our database, which would be a heftier cost of CPU cycles. If I were going to run these tests on the entire database, I would do so offline or during non-peak hours on the production systems in any case.

castelao · 2020-03-04T03:37:07Z

@evanleeturner, thanks for the review. I already started to work on it.

Since you mentioned a potential interest in using it yourself, you might appreciate knowing how you can customize CoTeDe's configuration. In summary, you can create a setup of tests to apply and save it to use later. I do that myself a lot, so that different regions or sensors have their own sequence of procedures to apply.
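To make the "save a setup of tests and reuse it" idea concrete, here is a minimal sketch. The JSON layout (variable, then test name, then parameters), the test registry, and the thresholds are all illustrative assumptions and not necessarily CoTeDe's configuration format.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def global_range(x, minval, maxval):
    """Pass where minval <= x <= maxval."""
    return (x >= minval) & (x <= maxval)


def spike(x, threshold):
    """Pass where the GTSPP-style spike feature is <= threshold (end levels pass)."""
    feature = np.zeros_like(x)
    feature[1:-1] = np.abs(x[1:-1] - (x[2:] + x[:-2]) / 2) - np.abs((x[2:] - x[:-2]) / 2)
    return feature <= threshold


# Hypothetical registry mapping names used in the saved setup to test functions.
TESTS = {"global_range": global_range, "spike": spike}

# Illustrative regional setup: which tests to run per variable, and their parameters.
regional_cfg = {
    "sea_water_temperature": {
        "global_range": {"minval": -2.5, "maxval": 40.0},
        "spike": {"threshold": 2.0},
    }
}


def save_cfg(cfg, path):
    Path(path).write_text(json.dumps(cfg, indent=2))


def load_cfg(path):
    return json.loads(Path(path).read_text())


def run_qc(data, cfg):
    """Run every configured test; return {variable: {test_name: pass_mask}}."""
    results = {}
    for var, tests in cfg.items():
        x = np.asarray(data[var], dtype=float)
        results[var] = {name: TESTS[name](x, **params) for name, params in tests.items()}
    return results


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "regional_ctd.json"
    save_cfg(regional_cfg, cfg_path)
    qc = run_qc({"sea_water_temperature": [24.1, 24.0, 31.5, 23.8, 23.6, 45.0]}, load_cfg(cfg_path))
```

Keeping the setup as data rather than code means a regional or sensor-specific sequence of tests can be versioned and shared without touching the test implementations.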

castelao · 2020-03-04T03:57:35Z

@whedon generate pdf

whedon · 2020-03-04T03:58:03Z

👉 Check article proof 📄 👈

castelao · 2020-03-06T03:15:44Z

Hi @evanleeturner , you might be able to edit the top post, and if so, could you click on the review items that you agree with, please? That would help me to write my response. Thanks!

castelao · 2020-03-13T15:00:02Z

Response to @jessicaaustin

Conflict of interest

I do not believe this is a conflict of interest as outlined in the JOSS guidelines, but I do want to state that I am a maintainer of a similar open-source QC library, https://github.com/ioos/ioos_qc .

Overall thoughts:

I agree 100% with the author's philosophy that QC procedures should be organized and distributed as code packages in order to be widely and consistently used. I also agree with the goal of creating a framework that can run tests from multiple different QC test schemes (GTSPP, ARGO, QARTOD, etc.) since none of those schemes alone will completely satisfy testing requirements in all situations. I think the integration with oceansdb to run Climatology tests and Location at Sea tests is particularly exciting. I hope publication in JOSS will bring this library to the attention of more groups and encourage further use and collaboration to flesh this out.

Thanks! As a side note, the package OceansDB was originally together with CoTeDe, as a single package (since the first generation, ~ 2006), but as CoTeDe grew larger and I realized that some people could use OceansDB outside CoTeDe’s scope, I moved it outside as an independent package (~ 2015).

The library takes in data as a numpy array, so in theory the data could come from any source or format. However, you must create a "data object" to wrap your data. Given that there are other mature and well-used libraries with data representations (xarray, pandas, netcdf, etc.), why not leverage one of those? Or at least make it easy to use CoTeDe if your data is already in one of those common formats. I created a notebook to try to understand how to run QC tests using the library (castelao/CoTeDe#42), and I was able to fairly easily load a netcdf file using xarray and run tests, but I got an error when using a pandas dataframe and it was not clear how to debug it. I think a little documentation could go a long way here, to help people get started.

Thanks for sharing your opinion on using xarray or pandas. It’s now clear to me that I need to clarify this in the documentation. This will be addressed at https://cotede.readthedocs.io/en/dev/data_model.html

In summary, I believe that the data model adopted by CoTeDe does not limit the users, but the other way around: it gives more freedom. The package xarray, which was initially based on the netCDF structure, satisfies the data model expected by CoTeDe, so most of CoTeDe’s functionality can be used without any change (see the example). Since xarray can read netCDF directly and pandas objects can be converted into xarray objects, all those inputs can be used by CoTeDe, plus any other format that satisfies CoTeDe’s minimalist data model. CoTeDe has been used with SQL databases and MATLAB-based pipelines, just two examples where fewer requirements make a difference.
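A minimal sketch of the duck-typed data model described above. It assumes, as a simplification of the data model page, that a QC test only needs `data[varname]` to return an array-like sequence and that metadata such as position live in a `data.attrs` mapping; `DictProfile` and `global_range_flags` are hypothetical names, not CoTeDe's API. An `xarray.Dataset` exposes the same `ds[name]` / `ds.attrs` access pattern.

```python
import numpy as np


class DictProfile:
    """Hypothetical minimal container: variables by name, metadata in .attrs."""

    def __init__(self, variables, attrs=None):
        self._vars = {name: np.asarray(values, dtype=float) for name, values in variables.items()}
        self.attrs = dict(attrs or {})

    def __getitem__(self, name):
        return self._vars[name]

    def keys(self):
        return self._vars.keys()


def global_range_flags(data, varname, minval, maxval):
    """Toy test that only uses data[varname]; flags 1 = good, 4 = bad, 9 = missing."""
    x = np.asarray(data[varname], dtype=float)
    in_range = (x >= minval) & (x <= maxval)
    return np.where(np.isnan(x), 9, np.where(in_range, 1, 4))


profile = DictProfile(
    {"TEMP": [25.3, 24.9, np.nan, 60.0]},
    attrs={"latitude": 27.5, "longitude": -95.0},
)
print(global_range_flags(profile, "TEMP", -2.5, 40.0))  # [1 1 9 4]
# Any mapping with the same access pattern works too, e.g. a plain dict.
print(global_range_flags({"TEMP": [10.0]}, "TEMP", -2.5, 40.0))  # [1]
```

Because the test never asks for a specific container type, the same function accepts a dict, a small wrapper like this, or a library object that supports indexing by variable name.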

Review

Items marked BLOCKER below should be resolved before acceptance. Everything else is a recommendation.

Paper:

* Overall this paper is clear and easy to read, and does a good job of describing the motivation and overall functionality of the software.

* **Highly recommended:** It would be great to list similar automated QC packages and describe how CoTeDe compares. For example, CoTeDe is more generic than ioos_qc, which depends on pandas/xarray and assumes netCDF integration, and CoTeDe allows integration with a wide range of QC test suites, whereas ioos_qc (currently) only focuses on QARTOD.
  
  * I do understand that this takes time, so I don't want it to block acceptance here. But this is the first question I had when I looked at this library, and would love to see it in the README someday.

I see your point and I will consider that. The brief comparison that you suggested above could be a good start, but what should be said about the similarities between the two packages? For instance, what should we say about QC configuration, where ioos_qc seems to follow an approach quite similar to the one proposed by CoTeDe? Would you agree? On a side note, since this is a highly recommended suggestion, I was expecting that ioos_qc would adopt the same policy, but I could not find any reference to CoTeDe in the ioos_qc documentation.

* Highly recommended, if possible: Include a list of groups actively using this library, to give us an idea of how widely used it is. (this could also be added to the code documentation)

Thanks, that’s a good idea. However, I believe that the choice to be listed belongs to the users. I’ll create a designated space for that on CoTeDe’s website and encourage the users to add themselves to the list.

* Recommended: Describe how to pronounce CoTeDe (though I suppose leaving some amount of mystery is the python way :-) )

At some point I removed that from the documentation, but I’ll place it back.

Documentation:

* **BLOCKER:** The important classes and most of the test methods have docstrings, which is great. But I was not able to find any generated API docs under https://cotede.readthedocs.io/ . [castelao/CoTeDe#41](https://github.com/castelao/CoTeDe/issues/41)

I have expanded the documentation (https://cotede.readthedocs.io/) to include the API reference for the public resources, i.e. everything that a new user would need to know to use CoTeDe. I’m also expanding the notebooks with practical examples to better illustrate how to use the package. I intend to extend the API reference with internal resources that might be useful for developers (https://cotede.readthedocs.io/en/latest/api.html).

Installation

* I was able to install and use the software using the instructions in the repo

* Nice to have: [castelao/CoTeDe#39](https://github.com/castelao/CoTeDe/issues/39)

Thanks for the suggestion. I understand the benefits of conda-forge and I’ll definitely add CoTeDe to it in the future, but it is not the priority at the moment. I noticed that ioos_qc uses pip in its notebooks, which is yet another example of how convenient and functional pip is.

Contributing

* I was able to install dependencies and run tests locally using the instructions in the repo

* Recommended: [castelao/CoTeDe#38](https://github.com/castelao/CoTeDe/pull/38)

* Recommended: [castelao/CoTeDe#40](https://github.com/castelao/CoTeDe/issues/40)

I removed the recommendation on flake8. I do not want to turn away any potential contribution just because of formatting. I’ve been cleaning, refactoring, and formatting the code, and I will continue doing so.

Performance

* I did not find any mention of performance testing or guarantees in the code or documentation

* Highly recommended you add performance tests. During development of https://github.com/ioos/ioos_qc we found that small tweaks to our testing algorithms could lead to severely degraded performance on large datasets. Recommend you add these tests to the travis-ci build and add it to the checklist of items to review if someone submits a PR.

Thank you, that is a very good idea. I've always assumed that by using Python it was implied that speed was not a priority, but there is nothing to lose in having an idea of the relative cost of the different tests. I’ll implement that at the first opportunity. Note that one of the consistency tests guarantees that the ProfileQC object is serializable, so that it can be transported by multiprocessing. On large-scale databases I have been running CoTeDe in parallel, a functionality that I would like to move to the public repository as soon as possible.
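The serializability check mentioned above can be sketched as a pickle round trip, since pickling is how `multiprocessing` sends arguments and results between processes. `ProfileResult` is a hypothetical stand-in for a QC'd profile, not CoTeDe's `ProfileQC` class.

```python
import pickle
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ProfileResult:
    """Hypothetical stand-in for a QC'd profile: measurements, metadata and flags."""

    data: dict
    attrs: dict
    flags: dict = field(default_factory=dict)


def assert_round_trips(obj):
    """Consistency check: obj must survive pickling, which multiprocessing uses
    to send arguments and results between processes."""
    clone = pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
    assert clone.attrs == obj.attrs
    for store in ("data", "flags"):
        original, copied = getattr(obj, store), getattr(clone, store)
        assert original.keys() == copied.keys()
        for key in original:
            # NaNs in the same positions compare as equal here
            np.testing.assert_array_equal(original[key], copied[key])
    return clone


result = ProfileResult(
    data={"TEMP": np.array([25.3, 24.9, np.nan])},
    attrs={"latitude": 27.5, "longitude": -95.0},
    flags={"TEMP_global_range": np.array([1, 1, 9])},
)
clone = assert_round_trips(result)
```

Attributes such as open file handles, database connections, or lambdas are typical reasons an object stops being picklable, which is what a check like this would catch in a PR.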

A note: when I read "performance" for a QC procedure, the first thing that comes to my mind is not speed but the skill of identifying bad measurements without mistakes. For that, I’ve been adopting consistency tests since the early stages of CoTeDe.
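One way to quantify that notion of performance, sketched here as a generic binary-classification comparison rather than CoTeDe's own consistency tests, is to score automatic flags against a reference set flagged by an expert. The function name and the example flags are hypothetical.

```python
import numpy as np


def detection_skill(auto_bad, expert_bad):
    """Compare automatic 'bad' flags with an expert-QC reference (boolean arrays).

    Returns (true positive rate, false positive rate): the fraction of truly bad
    values caught, and the fraction of good values wrongly rejected.
    """
    auto_bad = np.asarray(auto_bad, dtype=bool)
    expert_bad = np.asarray(expert_bad, dtype=bool)
    tp = np.sum(auto_bad & expert_bad)
    fp = np.sum(auto_bad & ~expert_bad)
    n_bad = np.sum(expert_bad)
    n_good = expert_bad.size - n_bad
    tpr = tp / n_bad if n_bad else np.nan
    fpr = fp / n_good if n_good else np.nan
    return float(tpr), float(fpr)


expert_bad = [False, False, True, False, True, False]
auto_bad = [False, True, True, False, False, False]
tpr, fpr = detection_skill(auto_bad, expert_bad)
print(tpr, fpr)  # 0.5 0.25
```

Tracking both rates matters because tightening a threshold usually raises one while lowering the other.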

castelao · 2020-03-13T15:08:23Z

Response to @evanleeturner

Conflict of interest

I do not have a conflict of interest in this review. I do maintain an internal (non-public) code repository, @wdft-coastal, that contains some functions similar to those presented in the current code, but they are specific to my dataset and organization.

Overall thoughts:

I am extremely pleased with the premise of this package for organizing QC procedures and being able to run multiple tests. This is a high priority item for my organization and my current data sets, as our own QAQC methods often can conflict with other agencies that follow different QAQC procedures. In the last few months I have been preparing code that can mimic QARTOD style tests on our own data internally, but I wanted the ability to apply alternative test schemes and to skip tests that are not appropriate for our data. Having a maintained package that accomplishes this task is not only helpful but it also greatly increases your data confidence as now the QC methods are maintained by other groups in a transparent mechanism. I fully expect to be implementing this package into our own website and data products at TWDB.

Thanks, I’m glad to hear that we have the same vision on shared public QC procedures and the importance of giving the operator freedom to decide which tests to apply, including the tuning of each test. Please don’t hesitate to open issues with requests or discussions that could help you implement CoTeDe in your project. My vision for the QC configuration was for cases exactly like yours. I’ll improve the documentation on that subject.

I share reviewer one’s sentiment about the philosophy of leveraging other data libraries such as netCDF or pandas. Although our own internal datasets are in SQL, we do make use of pandas where possible for data manipulation, and in the future we will likely be forced to move to netCDF to process very large datasets for QAQC. The lack of this functionality is not a show-stopper for the package, but it is a severe limitation for its acceptance in the community.

Thanks for sharing your opinion on using netCDF or pandas. It’s now clear to me that I need to clarify this in the documentation. This will be addressed at https://cotede.readthedocs.io/en/dev/data_model.html

In summary, I believe that the data model adopted by CoTeDe does not limit the users, but the other way around: it gives more freedom. The package xarray, which was initially based on the netCDF structure, satisfies the data model expected by CoTeDe, so most of CoTeDe’s functionality can be used without any change (see the example). Since xarray can read netCDF directly and pandas objects can be converted into xarray objects, all those inputs can be used by CoTeDe as it is now, plus any other format that satisfies CoTeDe’s minimalist data model. I also use CoTeDe on a PostgreSQL DB with more than a TB of data, and instead of pandas.read_sql I use psycopg2 directly. I know that CoTeDe has also been used in a MATLAB data pipeline, another example where fewer dependencies facilitate its use, especially for MATLAB users without Python experience. I believe that this is a case where less is more, and for CoTeDe there is no need to force dependencies on more packages.
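Reading a profile without pandas can be sketched with any PEP 249 (DB-API 2.0) driver. The sketch below uses the standard-library `sqlite3` module as a stand-in for psycopg2 so that it runs anywhere; with psycopg2 the placeholder would be `%s` instead of `?`. The table layout, column names, and `fetch_profile` helper are hypothetical.

```python
import sqlite3

import numpy as np


def fetch_profile(conn, profile_id, columns=("pressure", "temperature", "salinity")):
    """Read one profile with a plain DB-API cursor and return {column: ndarray}."""
    cur = conn.cursor()
    # Column names come from a fixed, trusted tuple; values are passed as parameters.
    cur.execute(
        f"SELECT {', '.join(columns)} FROM measurements WHERE profile_id = ? ORDER BY pressure",
        (profile_id,),
    )
    rows = cur.fetchall()
    cur.close()
    if not rows:
        return {name: np.empty(0) for name in columns}
    # Transpose rows into columns; SQL NULL (None) becomes NaN with dtype=float.
    return {name: np.array(col, dtype=float) for name, col in zip(columns, zip(*rows))}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements "
    "(profile_id INTEGER, pressure REAL, temperature REAL, salinity REAL)"
)
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?, ?)",
    [(1, 10.0, 24.8, 36.1), (1, 0.0, 25.1, 36.0), (1, 20.0, None, 36.2), (2, 0.0, 20.0, 35.0)],
)
profile = fetch_profile(conn, 1)
```

The result is a plain mapping of variable names to arrays, which is enough for tests that only index data by variable name.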

Review

Items marked BLOCKER below should be resolved before acceptance. Everything else is a recommendation.

Paper

* The paper itself was succinct, well-written, and very appropriate to the code.

* **Recommended:** Please consider adding the functionality of swapping methods for calculating seawater properties: [castelao/CoTeDe#43](https://github.com/castelao/CoTeDe/issues/43)

Issue #43 raises a very good point. We will certainly find derived variables (salinity, density, sound speed, etc.) estimated with different standards (especially on historical data), which may affect inter-comparisons. That is indeed important but is beyond CoTeDe’s goal, which is intended to evaluate the quality of a dataset. Currently, the only place that I use GSW is to estimate sea water density when density is not directly available to evaluate inversions in the water column. For that purpose, I argue that we should use the best and most recent technique. I do see your point that in some situations it might be useful, if not necessary, to employ the old relations for a correct comparison, but I think that this should be done outside CoTeDe, by the operator, and then used as input for CoTeDe.
Note that I’m strongly considering adopting your suggestion in another package PySeabird, which is meant to parse and estimate derived variables from CTDs.
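A minimal sketch of the split described above, where the operator computes the derived variable and the QC only consumes it. `linear_eos` is a hypothetical linear equation of state used only so the example is self-contained (it is not TEOS-10); an operator needing TEOS-10 could instead compute density with the GSW package (for example `gsw.rho` after `gsw.SA_from_SP` and `gsw.CT_from_t`), or use an older standard, and pass the result as `rho`. The 0.03 kg m-3 threshold is an assumed default for illustration.

```python
import numpy as np


def linear_eos(temperature, salinity, pressure=None, rho0=1027.0, t0=10.0, s0=35.0,
               alpha=2e-4, beta=7.6e-4):
    """Hypothetical stand-in: linear equation of state (NOT TEOS-10); pressure ignored."""
    t = np.asarray(temperature, dtype=float)
    s = np.asarray(salinity, dtype=float)
    return rho0 * (1.0 - alpha * (t - t0) + beta * (s - s0))


def density_inversion(rho, threshold=0.03):
    """Flag level i when it is lighter than level i-1 by more than `threshold` (kg m-3).

    Assumes levels are ordered by increasing pressure.
    """
    rho = np.asarray(rho, dtype=float)
    flags = np.zeros(rho.shape, dtype=bool)
    flags[1:] = (rho[:-1] - rho[1:]) > threshold
    return flags


def check_profile(temperature, salinity, pressure, eos=linear_eos, rho=None):
    """Use operator-supplied density if given; otherwise compute it with `eos`."""
    if rho is None:
        rho = eos(temperature, salinity, pressure)
    return density_inversion(rho)


p = [0.0, 10.0, 20.0, 30.0]
t = [25.0, 24.0, 26.0, 20.0]
s = [36.0, 36.0, 36.0, 36.0]
print(check_profile(t, s, p).tolist())  # [False, False, True, False]
```

Accepting a precomputed `rho` keeps the choice of seawater standard with the operator, while the inversion test itself stays the same.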

* **Recommended:** like reviewer one, I don’t know how to pronounce CoTeDe! 😊

At some point, I removed that from the documentation, but I’ll put it back.

Documentation:

* **BLOCKER**: Please also complete the issue that was brought by reviewer one concerning the API docs as this would also be extremely helpful for me to implement this code: https://cotede.readthedocs.io/ . [castelao/CoTeDe#41](https://github.com/castelao/CoTeDe/issues/41)

I have expanded the documentation (https://cotede.readthedocs.io/) to include the API reference for the public resources, i.e. everything that a new user would need to know to use CoTeDe. I’m also expanding the notebooks with practical examples to better illustrate how to use the package. I intend to extend the API reference with internal resources, which might be useful for developers (https://cotede.readthedocs.io/en/dev/api.html).

Installation

* I was able to get the package running.

Contributing

* I was able to run the tests.

Performance

* There were no performance tests or statements regarding performance.

* I did not have any performance issues in my own tests, but I also have very small datasets at the moment. Additionally, if I were to run this package in a production environment, the performance hit would be negated because our data would be checked as it is inserted into our database, which would be a heftier cost of CPU cycles. If I were going to run these tests on the entire database, I would do so offline or during non-peak hours on the production systems in any case.

Thanks for raising that point. I’m adding a section in the documentation about the performance. (https://cotede.readthedocs.io/en/dev/performance.html).
I agree with you: running the QC in the insertion phase of the pipeline could cause a bottleneck and propagate serious issues. Something I have done before was to implement a parallel process that would run the QC, on demand, on the data already inserted in the DB, triggered by new inserts. The system that I developed for AOML, back in 2006, loaded the Python QC tests as a PostgreSQL procedural language, so it was an internal procedure with an internal trigger, per profile (again, easier when fewer external packages are required). That probably results in the best speed for a SQL DB, but it is not necessarily the easiest approach to maintain.
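The "QC triggered by new inserts but run outside the insert path" pattern can be sketched with a trigger that only enqueues work and a separate worker that drains the queue. This is not the in-database procedural-language approach described above; SQLite stands in for PostgreSQL, and the schema, `toy_flag`, and `process_queue` are hypothetical. The sketch assumes a single worker; concurrent PostgreSQL workers would also need row locking (for example `SELECT ... FOR UPDATE SKIP LOCKED`).

```python
import sqlite3

SCHEMA = """
CREATE TABLE measurements (profile_id INTEGER, pressure REAL, temperature REAL);
CREATE TABLE qc_queue (profile_id INTEGER PRIMARY KEY);
CREATE TABLE qc_flags (profile_id INTEGER, pressure REAL, flag INTEGER);
-- Each insert only enqueues its profile; the QC itself runs outside the insert path.
CREATE TRIGGER enqueue_qc AFTER INSERT ON measurements
BEGIN
    INSERT OR IGNORE INTO qc_queue (profile_id) VALUES (NEW.profile_id);
END;
"""


def toy_flag(temperature, minval=-2.5, maxval=40.0):
    """Hypothetical global-range flag: 1 good, 4 bad, 9 missing."""
    if temperature is None:
        return 9
    return 1 if minval <= temperature <= maxval else 4


def process_queue(conn):
    """Drain the queue, writing flags and dequeuing each profile in one transaction."""
    pending = [row[0] for row in conn.execute("SELECT profile_id FROM qc_queue")]
    for pid in pending:
        rows = conn.execute(
            "SELECT pressure, temperature FROM measurements WHERE profile_id = ?", (pid,)
        ).fetchall()
        with conn:  # commits on success, rolls back if anything fails
            conn.execute("DELETE FROM qc_flags WHERE profile_id = ?", (pid,))
            conn.executemany(
                "INSERT INTO qc_flags VALUES (?, ?, ?)",
                [(pid, p, toy_flag(t)) for p, t in rows],
            )
            conn.execute("DELETE FROM qc_queue WHERE profile_id = ?", (pid,))
    return len(pending)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?, ?)",
        [(1, 0.0, 25.1), (1, 10.0, 45.0), (2, 0.0, None)],
    )
n_processed = process_queue(conn)
```

Inserts stay cheap because the trigger only writes a queue row, and the QC cost moves to whenever the worker runs, which could be continuously or during off-peak hours.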

castelao · 2020-03-13T15:10:28Z

@kthyng , could you confirm the next step, please? Is it correct that the reviewers need to checkmark each box in the very first post before we move to the next step? Thanks!

ooo · 2020-03-13T15:10:30Z

👋 Hey @castelao...

Letting you know, @kthyng is currently OOO until Sunday, March 15th 2020. ❤️

evanleeturner · 2020-03-13T15:46:39Z

So I can't seem to edit the first post. When I click on the ... my only options are "copy link" and "quote reply"

danielskatz · 2020-03-13T15:50:30Z

@whedon re-invite @evanleeturner as reviewer

whedon · 2020-03-13T15:50:33Z

The reviewer already has a pending invite.

@evanleeturner please accept the invite by clicking this link: https://github.com/openjournals/joss-reviews/invitations

evanleeturner · 2020-03-13T15:51:09Z

I just did. It said the invitation has expired.

kthyng · 2020-04-07T13:07:48Z

@whedon add 10.5281/zenodo.583710 as archive

whedon · 2020-04-07T13:07:51Z

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@whedon commands

kthyng · 2020-04-07T13:08:14Z

@whedon set 10.5281/zenodo.583710 as archive

whedon · 2020-04-07T13:08:20Z

OK. 10.5281/zenodo.583710 is the archive.

kthyng · 2020-04-07T13:10:02Z

@castelao Can you edit the metadata at your Zenodo archive so that the title and authors there exactly match those on your paper?

Also, what is the proper version number for your repository currently?

castelao · 2020-04-07T15:11:15Z

@kthyng I just updated the DOI record itself, thanks for noticing that.

The current version is 0.21.3

kthyng · 2020-04-07T18:28:15Z

@whedon set 10.5281/zenodo.3733959 as archive

whedon · 2020-04-07T18:28:19Z

OK. 10.5281/zenodo.3733959 is the archive.

kthyng · 2020-04-07T18:28:35Z

Looks like the Zenodo archive changed numbers too, so I updated that. Let me know if this is incorrect.

kthyng · 2020-04-07T18:29:05Z

@whedon set v0.21.3 as version

whedon · 2020-04-07T18:29:09Z

OK. v0.21.3 is the version.

kthyng · 2020-04-07T18:39:16Z

@whedon accept

whedon · 2020-04-07T18:39:19Z

Attempting dry run of processing paper acceptance...

whedon · 2020-04-07T18:39:42Z

Reference check summary:

OK DOIs

- 10.13155/33951 is OK
- 10.1016/j.mio.2014.09.001 is OK

MISSING DOIs

- None

INVALID DOIs

- None

whedon · 2020-04-07T18:39:56Z

👋 @openjournals/joss-eics, this paper is ready to be accepted and published.

Check final proof 👉 openjournals/joss-papers#1410

If the paper PDF and Crossref deposit XML look good in openjournals/joss-papers#1410, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true

kthyng · 2020-04-07T18:40:37Z

@whedon accept deposit=true

whedon · 2020-04-07T18:40:41Z

Doing it live! Attempting automated processing of paper acceptance...

whedon · 2020-04-07T18:41:35Z

🐦🐦🐦 👉 Tweet for this paper 👈 🐦🐦🐦

whedon · 2020-04-07T18:41:35Z

🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨

Here's what you must now do:

Check final PDF and Crossref metadata that was deposited 👉 Creating pull request for 10.21105.joss.02063 joss-papers#1411
Wait a couple of minutes to verify that the paper DOI resolves https://doi.org/10.21105/joss.02063
If everything looks good, then close this review issue.
Party like you just published a paper! 🎉🌈🦄💃👻🤘

Any issues? notify your editorial technical team...

kthyng · 2020-04-07T19:28:35Z

Congratulations to @castelao on your new publication! Many thanks to reviewers @jessicaaustin and @evanleeturner — we have relied on your time and expertise!

whedon · 2020-04-07T19:28:39Z

🎉🎉🎉 Congratulations on your paper acceptance! 🎉🎉🎉

If you would like to include a link to your paper from your README use the following code snippets:

Markdown:
[![DOI](https://joss.theoj.org/papers/10.21105/joss.02063/status.svg)](https://doi.org/10.21105/joss.02063)

HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.02063">
  <img src="https://joss.theoj.org/papers/10.21105/joss.02063/status.svg" alt="DOI badge" >
</a>

reStructuredText:
.. image:: https://joss.theoj.org/papers/10.21105/joss.02063/status.svg
   :target: https://doi.org/10.21105/joss.02063

This is how it will look in your documentation:

We need your help!

Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us please consider doing either one (or both) of the following:

Volunteering to review for us sometime in the future. You can add your name to the reviewer list here: https://joss.theoj.org/reviewer-signup.html
Making a small donation to support our running costs here: https://numfocus.org/donate-to-joss

whedon added the review label Feb 3, 2020

whedon assigned kthyng Feb 3, 2020

whedon mentioned this issue Feb 3, 2020

[PRE REVIEW]: A Framework to Quality Control Oceanographic Data #1985

Closed

whedon assigned jessicaaustin Feb 5, 2020

whedon assigned evanleeturner Feb 26, 2020

jessicaaustin pushed a commit to jessicaaustin/CoTeDe that referenced this issue Feb 28, 2020

contributing updates

40b11bd

openjournals/joss-reviews#2063

This was referenced Feb 28, 2020

WIP: contributing updates castelao/CoTeDe#38

Closed

WIP: notebook demo: xarray/netcdf/pandas castelao/CoTeDe#42

Closed

whedon added the recommend-accept Papers recommended for acceptance in JOSS. label Apr 7, 2020

whedon added accepted published Papers published in JOSS labels Apr 7, 2020

kthyng closed this as completed Apr 7, 2020



