
[REVIEW]: FieldCompare: A Python package for regression testing simulation results #4905

Closed
editorialbot opened this issue Nov 2, 2022 · 82 comments
Labels: accepted, published (Papers published in JOSS), Python, recommend-accept (Papers recommended for acceptance in JOSS), review, TeX, Track: 7 (CSISM) Computer science, Information Science, and Mathematics

Comments

@editorialbot

editorialbot commented Nov 2, 2022

Submitting author: @dglaeser (Dennis Gläser)
Repository: https://gitlab.com/dglaeser/fieldcompare
Branch with paper.md (empty if default branch): feature/paper
Version: 0.1.3
Editor: @danielskatz
Reviewers: @idoby, @WilliamJamieson
Archive: 10.5281/zenodo.7588449

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/aed2111437d48a44a0d6e13bc6da43b2"><img src="https://joss.theoj.org/papers/aed2111437d48a44a0d6e13bc6da43b2/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/aed2111437d48a44a0d6e13bc6da43b2/status.svg)](https://joss.theoj.org/papers/aed2111437d48a44a0d6e13bc6da43b2)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@idoby & @WilliamJamieson, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review.
First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @danielskatz know.

Please start on your review when you are able, and be sure to complete it within the next six weeks, at the very latest.

Checklists

📝 Checklist for @idoby

📝 Checklist for @WilliamJamieson

@editorialbot added the Python, review, TeX, and Track: 7 (CSISM) Computer science, Information Science, and Mathematics labels on Nov 2, 2022
@editorialbot

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

@editorialbot

Software report:

github.com/AlDanial/cloc v 1.88  T=0.14 s (450.6 files/s, 62472.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          46           1154            213           4151
SVG                              2              2              2           1504
XML                              9             48              9            882
Markdown                         2             65              0            224
TeX                              1             19              0            167
YAML                             1             10              1             85
TOML                             1              8              0             52
-------------------------------------------------------------------------------
SUM:                            62           1306            225           7065
-------------------------------------------------------------------------------


gitinspector failed to run statistical information for the repository

@editorialbot

Wordcount for paper.md is 1592

@editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1016/j.advwatres.2011.03.007 is OK
- 10.5281/zenodo.1183562 is OK
- 10.11588/ans.2017.1.27447 is OK
- 10.1007/s00607-008-0004-9 is OK
- 10.1515/jnma-2021-0081 is OK
- 10.1016/j.camwa.2011.04.012 is OK
- 10.25495/7GXK-RD71 is OK
- 10.5281/zenodo.5603255 is OK
- 10.1016/j.cpc.2021.108249 is OK

MISSING DOIs

- None

INVALID DOIs

- None

@editorialbot

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@danielskatz

@idoby and @WilliamJamieson - Thanks for agreeing to review this submission.
This is the review thread for the paper. All of our communications will happen here from now on.

As you can see above, you each should use the command @editorialbot generate my checklist to create your review checklist. @editorialbot commands need to be the first thing in a new comment.

As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#4905 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if either of you require some more time. We can also use editorialbot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@danielskatz) if you have any questions/concerns.

Reminder from pre-review issue: @WilliamJamieson won't be available to work on this until the week of 14 Nov

@idoby

idoby commented Nov 4, 2022

Review checklist for @idoby

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at https://gitlab.com/dglaeser/fieldcompare?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@dglaeser) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
  • Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
  • Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
  • Human and animal research: If the paper contains original data research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

@WilliamJamieson

WilliamJamieson commented Nov 15, 2022

Review checklist for @WilliamJamieson

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at https://gitlab.com/dglaeser/fieldcompare?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@dglaeser) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
  • Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
  • Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
  • Human and animal research: If the paper contains original data research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

@danielskatz

@WilliamJamieson - Can you comment on the items in your review that are not checked off? What is needed for you to check them off? As stated above:

reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#4905 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread.

@danielskatz

👋 @idoby - how is your review coming along? Is anything blocking you from continuing?

@WilliamJamieson

@WilliamJamieson - Can you comment on the items in your review that are not checked off? What is needed for you to check them off? As stated above:

reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#4905 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread.

Sorry, I did make a couple of issues:

@danielskatz

Thanks - I'll add the tag to those.

@danielskatz

I guess this isn't going to work well, since these are in gitlab, not github - oh well...

@danielskatz

@dglaeser - Can you take a look at the issues @WilliamJamieson created?

@danielskatz

👋 @idoby - how is your review coming along? Is anything blocking you from continuing?

@idoby

idoby commented Dec 3, 2022

Hi, sorry, life got in the way of this.

I'm posting my concerns here because I think they're too broad for GH/GL issues. In short, I have concerns about the design of this package. While I'm not sure this falls strictly under one or more checklist items, I feel the design severely impairs the usability, maintainability and future extensibility of this package.

My main concerns are:

  1. Much of the code consists of either wrapping existing libraries (e.g. meshio, numpy) or reinventing wheels, e.g. logging. Python already has a perfectly good logging system that serves as the de facto standard for logging in the language. This will not play nice with projects that customize their logging or use other logging frameworks (e.g. loguru); as a general rule, libraries should not use logging internally at all, or at the very least, accept a standard logger passed by the user.
  2. Despite the fact that the authors declare many interfaces for extensibility, I suspect the package is difficult to extend or adapt. For example, since the authors wrap meshio rigidly as part of their API, they had to submit a PR to meshio to be able to load their own VirtualFluids format (I couldn't find that PR, by the way). I don't feel this is a reasonable design choice. Some projects invent their own formats for valid reasons, and being unable to load in their data unless they patch meshio isn't reasonable in my eyes. Furthermore, some projects (like my own) store data in multiple files and recombine the data upon loading using multiple meshio calls. Or, more generally,
  3. The authors correctly identify that there is nothing coupling mesh I/O to the comparison logic itself, which is the reason this package exists, but their API couples the two very tightly, because sometimes the fields coming from a mesh file must be sorted and transformed before executing comparisons. On the other hand, they acknowledge that it can be useful to run comparisons on arbitrary tabular data with no mesh origin. Overall, this means one cannot mix fields from meshes with fields from tabular data, but this can be useful. For example, one might want to compare values associated with dofs on a mesh while also comparing a global series of values computed by evaluations of functionals, such as lift and drag for aerodynamics.
  4. By wrapping Python's CSV loader, the user is unable to load a CSV with pandas, for example, and use existing pandas functionality to clean/transform the data.

Therefore, I think the authors should consider the following points:

  1. Redesign the library into a pipeline-like API, where I/O is accepted either as a meshio data structure or a pandas dataframe, provide functions to extract/transform fields from a meshio object, then feed into the comparison function. In summary, I'd design two main pipelines:
    • Accept a meshio mesh object from user -> sort, transform, etc -> extract into pandas dataframe -> return dataframe to user -> let the user feed DF into comparator or apply further transformations before doing that, etc
    • Accept an arbitrary dataframe or multiple dataframes from user -> compare -> report
      Much of the above functionality doesn't need to be provided by this package, since it already exists, and the user can do anything they want between stages, such as calls to other libraries. In other words, keep the user in control of the code in their own program.
  2. In addition to the above, provide the current all-in-one API as a facade over the flexible pipeline API for users who don't want to drive manually,
  3. Avoid wrapping other libraries, such as numpy, meshio and others,
  4. Avoid reinventing the wheel; utility functions for numpy etc. are OK, but they should be namespaced separately.

I realize the above represents a major rewrite, but I sincerely believe this will improve your package and raise the chances that people (like me) will use it.
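
To make the pipeline suggestion in point 1 concrete, here is a rough sketch of the shape I have in mind. All helper names, tolerances and file names below are made up for illustration; this is not existing fieldcompare API, just one possible layering on top of meshio, numpy and pandas:

import meshio
import numpy as np
import pandas as pd

# Hypothetical building blocks of a pipeline-style design (names invented for
# illustration; not fieldcompare's actual API).

def sorted_point_data(mesh: meshio.Mesh) -> pd.DataFrame:
    """Scalar point data as a DataFrame, with rows ordered by point coordinates."""
    order = np.lexsort(mesh.points.T[::-1])  # sort by x, then y, then z
    return pd.DataFrame({
        name: np.asarray(values)[order]
        for name, values in mesh.point_data.items()
        if np.asarray(values).ndim == 1      # scalar fields only, for brevity
    })

def compare_frames(result: pd.DataFrame, reference: pd.DataFrame,
                   rel_tol: float = 1e-8, abs_tol: float = 1e-12) -> bool:
    """Fuzzy, column-by-column comparison of two equally shaped tables."""
    return all(np.allclose(result[col], reference[col], rtol=rel_tol, atol=abs_tol)
               for col in reference.columns)

# Pipeline 1: the user drives the I/O (meshio here); the library only transforms and compares.
passed = compare_frames(sorted_point_data(meshio.read("result.vtu")),
                        sorted_point_data(meshio.read("reference.vtu")))

# Pipeline 2: arbitrary tabular data with no mesh origin.
passed_csv = compare_frames(pd.read_csv("lift_drag.csv"), pd.read_csv("lift_drag_reference.csv"))

The point being: the user stays in control of the I/O and any intermediate steps, while the library contributes the sorting and the fuzzy comparison.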

@danielskatz

👋 @WilliamJamieson and @dglaeser - please let us know your thoughts on these ☝️ comments

@dglaeser

dglaeser commented Dec 5, 2022

Hi @WilliamJamieson, @idoby and @danielskatz, thanks a lot for your thorough review and the helpful comments! I think we'll need some time to address them, but I'll add a few initial thoughts/responses on the comments by @idoby here (I responded to @WilliamJamieson in some of the open issues already, thanks at this point also for your suggestions!).

First of all, thanks @idoby for the comments, they are highly appreciated! I think they can help us converge on a solution that is best suited for users who want to interact with the library in different ways (since our own use so far has been mainly via the CLI). However, I think it would be good if we could agree on what the main contributions of this package should be. So far, my feeling has been that the main benefits it provides are for rather "high-level" use, but I'd be more than happy to improve usability for different use cases!

  1. "as a general rule, libraries should not use logging internally at all, or at the very least, accept a standard logger passed by the user."

Good point, thanks! I am tempted to remove the logging facilities from the public API; the main purpose was for us to have a configurable way to control the output from the CLI to the terminal or to the test reports. If we keep this exposed to the user, it would definitely be better to accept standard loggers. Thanks, I'll address this.

  1. "Avoid wrapping other libraries, such as numpy, meshio and others"

Wrapping of numpy was mainly done for internal use, such that the rest of the code doesn't make any direct calls to numpy; this lets us keep track of which things we actually need/use while retaining the option to later move to a different package if required (although very unlikely) by changing only a single file. Only the factory functions to create an instance of an Array are exposed in the public API; however, users who work with numpy arrays don't have to use them. But I agree, with numpy being basically standard, we can remove the factories from the public API and require users to pass numpy arrays directly. It is unlikely that we will ever change to anything other than numpy.

I am not sure meshio is really "wrapped". We currently expose a simple reader function that takes a file and returns a container of fields. Meshio is used under the hood, but users should not notice or know about this? We simply use it under the hood because it provides reading functionality for many file formats of interest to us. But users never interact with meshio types; we always return our own container types. I think what's missing for your case is direct support for a meshio Mesh; currently we would require that you create a FieldContainer from that Mesh yourself.

  1. "since the authors wrap meshio rigidly as part of their API, they had to submit a PR..." -> I am not sure I agree here. We could have simply written our own reader for the respective file format, since meshio is not exposed anywhere to the user, it's an implementation detail that we can change as we like without affecting the API as far as I see it. VirtualFluids does not use its own file format. It uses standard VTK, but the cell type (voxel) was not supported by meshio (vtkVoxel support nschloe/meshio#1342). As said, we could have implemented our own reader for this case, but we thought it is better to try to incorporate it in meshio such that the rest of their users benefit from this change. Unfortunately, it seems that PRs are not really addressed there anymore, which is why VirtualFluids still uses their patched version. I agree that it would be nicer if that was not the case, and we are actually starting to consider handling VTK support ourselves (also because VTP is missing, another VTK format we use and for which another pending pull request, since quite some time, exists: vtp support attempt nschloe/meshio#1181).

  2. Regarding your point 3: In our view, the mesh sorting part is one of the main contributions of this package, avoiding "false negative" results for meshes with changing order but otherwise the same field data (a small numpy illustration of this idea follows after this list). I agree that it is not nice to have to differentiate between mesh fields and non-mesh fields, for which sorting is not enabled. We thought about generalizing this, enabling sorting (e.g. by one of the fields) also for any other data, but there are quite some peculiarities for meshes: some data is associated with points, some with cells, and sorting either of them affects only a subset of the fields; for non-conforming mesh output a simple sorting by coordinates is not unique and has to be treated in a special way; ... We couldn't yet come up with an abstraction that unifies all of this, but I'll give it some more thought. For our main use case, the CLI, we simply exposed this via the respective flags. But I agree that programmatically this distinction is not so nice. On the other hand, as a user of the API you only have to "know" about this in case you want to deactivate sorting. If you deactivate sorting (or have a "sorted" mesh already), you should also be able to compare a mesh file and a tabular data file; I don't see why this shouldn't work.

  3. Regarding your point 4: Again, I think this only applies to the CLI. Since reading is a separate concern from the comparison, if you use fieldcompare programmatically, you can read in and preprocess the data as you like. But yes, you first have to create Fields; you cannot directly pass in pandas data frames, since we currently don't have pandas as a dependency at all. But I am not sure that our package yields much benefit in this use case - the use case in which one is reading in and sorting with pandas - because I believe fuzzy comparison of dataframes can be done directly in pandas? I think so far we had considered only rather high-level programmatic use, providing users with a simple function to read in fields from files, iterate over them, and run field comparisons.
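
As announced in point 2 above, here is a small, self-contained numpy illustration of the point-sorting idea - just a toy example, not how fieldcompare actually implements it. The same triangle is written with two different point numberings; sorting the points by their coordinates, permuting the point data and remapping the connectivity accordingly makes the two orderings comparable:

import numpy as np

# The same mesh with two different point numberings; a naive index-wise
# comparison of the fields would report a false negative.
points_a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
tri_a = np.array([[0, 1, 2]])
field_a = np.array([10.0, 20.0, 30.0])  # one value per point

points_b = points_a[[2, 0, 1]]  # same points, different order
tri_b = np.array([[1, 2, 0]])   # connectivity rewritten for the new numbering
field_b = field_a[[2, 0, 1]]

def sort_points(points, point_field, cells):
    order = np.lexsort(points.T[::-1])      # sort by x, then y
    inverse = np.empty_like(order)
    inverse[order] = np.arange(len(order))  # old point index -> new point index
    return points[order], point_field[order], inverse[cells]

pa, fa, ca = sort_points(points_a, field_a, tri_a)
pb, fb, cb = sort_points(points_b, field_b, tri_b)
assert np.allclose(pa, pb) and np.allclose(fa, fb) and (ca == cb).all()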

As said, I think your comments really help to improve the programmatic usage scenarios, and it would be good if we could converge on what the main contributions of this package should be (e.g. for your use case). Or, asked differently: do you think there is a valuable contribution to your use case at all?

As far as I see it, we currently have

a. High-level I/O mechanisms that yield our own container type interface for fields.
b. Comparison classes that accept two such containers and compare all fields in them while allowing for configuration of different aspects of the behaviour.
c. Predicates for comparison of two Arrays (i.e. ndarrays).

Point (a) & (b) make up the CLI basically, which so far has been our own main use case. When skipping (b) and doing array comparisons (c) manually, one can also directly use numpy. So I think the main contributions so far are on the higher levels (and the CLI).

I noticed that our mesh-field transformations are currently not exposed in the public API. They are wrappers around MeshFields that themselves expose a MeshFieldInterface, while exposing the underlying fields in a transformed way. We already chain them (sorted points, sorted cells, etc.), and they do the transformations lazily upon request. So I guess if we make sure the transformations are user-friendly and expose them in the public API, we would get closer to the pipeline design you suggested. I like the idea and will try to make it work. The way I see it, this would break up point (a) mentioned above in a more obvious way.

@danielskatz, is there a time limit for addressing the points? I feel that this will involve quite some rewriting. But improving the API as suggested by @idoby and adding a proper documentation as suggested by @WilliamJamieson will definitely improve this package a lot.

@idoby

idoby commented Dec 5, 2022

Thanks for replying to my comments. My thoughts:

Good point, thanks! I am tempted to remove the logging facilities from the public API; the main purpose was for us to have a configurable way to control the output from the CLI to the terminal or to the test reports. If we keep this exposed to the user, it would definitely be better to accept standard loggers. Thanks, I'll address this.

In my opinion, the library should either not use logging at all, or it should let the user pass in a standard logger. I'd prefer the library not to log anything unless a debug mode is enabled explicitly. I don't really see why a library like this should throw lines into my output. Instead, I'd prefer a function that nicely formats the data structures returned by the lib, which I can call whenever I want, send to my server, etc.
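
For reference, the standard-library pattern I mean looks roughly like this - a generic sketch with an invented function name, not fieldcompare's actual code:

import logging
from typing import Optional

# A module-level logger with a NullHandler keeps the library silent unless the
# host application configures logging itself.
_logger = logging.getLogger("fieldcompare")  # logger name chosen for illustration
_logger.addHandler(logging.NullHandler())

def compare_files(result: str, reference: str,
                  logger: Optional[logging.Logger] = None) -> bool:
    """Hypothetical entry point that accepts a user-supplied standard logger."""
    log = logger or _logger
    log.debug("comparing %s against %s", result, reference)
    # ... comparison logic would go here ...
    return True

With this pattern the library never writes to the user's output on its own; the caller decides whether and where log records go.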

Wrapping of numpy was mainly done for internal use, such that the rest of the code doesn't make any direct calls to numpy; this lets us keep track of which things we actually need/use while retaining the option to later move to a different package if required (although very unlikely) by changing only a single file. Only the factory functions to create an instance of an Array are exposed in the public API; however, users who work with numpy arrays don't have to use them. But I agree, with numpy being basically standard, we can remove the factories from the public API and require users to pass numpy arrays directly. It is unlikely that we will ever change to anything other than numpy.

Wrapping numpy is not a big issue, but I'd prefer public helper functions to be under fieldcompare.utils.numpy or something, to make their purpose clear. I don't think there's a need to hedge against numpy getting replaced, but I don't mind if you do. It's not a big issue. My main point was about meshio.

I am not sure meshio is really "wrapped". We currently expose a simple reader function that takes a file and returns a container of fields. Meshio is used under the hood, but users should not notice or know about this? We simply use it under the hood because it provides reading functionality for many file formats of interest to us. But users never interact with meshio types; we always return our own container types. I think what's missing for your case is direct support for a meshio Mesh; currently we would require that you create a FieldContainer from that Mesh yourself.

I understand. My issue with this was that you were assuming my data pipeline begins by reading a single file with meshio, so you might as well let fieldcompare drive meshio. But this assumption makes many use cases I care about impossible. In any case, the difference between (pseudocode)

mesh = meshio.load(...)
fieldcompare.sort(mesh).compare()

and

fieldcompare.load(filepath).sort().compare()

is a minimal burden to your user, but in return, this enables fieldcompare to integrate into existing data pipelines.
My use case is that I'm loading meshes from several files and then combining them to form a final mesh with all of my fields present. Sometimes I load only some of the fields, etc., depending on the need. This is partially because I have a massive amount of data and many timesteps, and partially because saving the data this way during the simulation is faster.

  1. "since the authors wrap meshio rigidly as part of their API, they had to submit a PR..." -> I am not sure I agree here. We could have simply written our own reader for the respective file format, since meshio is not exposed anywhere to the user, it's an implementation detail that we can change as we like without affecting the API as far as I see it. VirtualFluids does not use its own file format. It uses standard VTK, but the cell type (voxel) was not supported by meshio (vtkVoxel support nschloe/meshio#1342). As said, we could have implemented our own reader for this case, but we thought it is better to try to incorporate it in meshio such that the rest of their users benefit from this change. Unfortunately, it seems that PRs are not really addressed there anymore, which is why VirtualFluids still uses their patched version. I agree that it would be nicer if that was not the case, and we are actually starting to consider handling VTK support ourselves (also because VTP is missing, another VTK format we use and for which another pending pull request, since quite some time, exists: vtp support attempt nschloe/meshio#1181).

I see. In that case I recommend adding a few words about this in the paper, but it's not as big of a deal as I thought, since you're importing meshio from Python and not your own baked-in version. That only requires patching meshio in site-packages, which you'd have to do anyway to add a new cell type. So it's not that bad. I still think if you avoid wrapping meshio then that extra code can live outside of a meshio fork, in the user's own project, and also allow the use of other mesh loaders, like trimesh, but it's not as bad as I originally thought.

In summary, I think avoiding driving the I/O (and specifically meshio) makes fieldcompare much more useful as a library than it currently is, although I agree it doesn't matter for CLI usage. However, I don't think the CLI is the primary contribution here, even if considering CI/CD use cases, because again, your user is likely to have their own data pipeline, use additional libraries, and they might not map data to files 1:1.

  1. Regarding your point 3: In our view, the mesh sorting part is one of the main contributions of this package, avoiding "false negative" results for meshes with changing order but otherwise the same field data. I agree that it is not nice to have to differentiate between mesh fields and non-mesh fields, for which sorting is not enabled. We thought about generalizing this, enabling sorting (e.g. by one of the fields) also for any other data, but there are quite some peculiarities for meshes: some data is associated with points, some with cells, and sorting either of them affects only a subset of the fields; for non-conforming mesh output a simple sorting by coordinates is not unique and has to be treated in a special way; ... We couldn't yet come up with an abstraction that unifies all of this, but I'll give it some more thought. For our main use case, the CLI, we simply exposed this via the respective flags. But I agree that programmatically this distinction is not so nice. On the other hand, as a user of the API you only have to "know" about this in case you want to deactivate sorting. If you deactivate sorting (or have a "sorted" mesh already), you should also be able to compare a mesh file and a tabular data file; I don't see why this shouldn't work.

I agree that mesh sorting is a great contribution, and one of the two main contributions in your package. I even think I might want to use it without the comparison logic, e.g., to sort the mesh for use in loss functions. So I think fieldcompare could benefit from more flexibility across the board.
I think the unifying abstraction is the pipeline-like design. Therefore, I'd like to see the ability to use FC as needed in three increasing levels of abstraction:

  1. pipeline building blocks,
  2. easy sort-and-compare facade and
  3. CLI.

In the pipeline-like API, the user "deactivates" sorting simply by not calling it :)
I also like the self-describing comparison operators, by the way. Pretty cool.

  1. Regarding your point 4: Again, I think this only applies to the CLI. Since reading is a separate concern from the comparison, if you use fieldcompare programmatically, you can read in and preprocess the data as you like. But yes, you first have to create Fields; you cannot directly pass in pandas data frames, since we currently don't have pandas as a dependency at all. But I am not sure that our package yields much benefit in this use case - the use case in which one is reading in and sorting with pandas - because I believe fuzzy comparison of dataframes can be done directly in pandas? I think so far we had considered only rather high-level programmatic use, providing users with a simple function to read in fields from files, iterate over them, and run field comparisons.

Good point. This can probably be done in pandas alone and FC isn't needed in that case, though it could be nice if it were possible to extract a Fields object from a pandas DF to use with FC, if my code already uses the FC logic for the meshes. It would probably be a good idea to hide a pandas dependency behind a fieldcompare[all] install flag in pip or something. You're right that FC shouldn't pull pandas just for that one function. Anyway, not a big deal and should not be a part of this review.
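
For illustration, a common way to keep such a dependency optional is a guarded import plus an install extra (e.g. fieldcompare[all]) declared in the package metadata - a sketch with a hypothetical helper name, nothing that exists today:

try:
    import pandas as pd
except ImportError:  # pandas remains an optional extra
    pd = None

def fields_from_dataframe(frame) -> dict:
    """Hypothetical helper: turn a DataFrame into a plain name -> array mapping."""
    if pd is None:
        raise ImportError("pandas is required for this feature; "
                          "install it e.g. via 'pip install fieldcompare[all]'")
    return {name: frame[name].to_numpy() for name in frame.columns}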

As said, I think your comments really help to improve the programmatic usage scenarios, and it would be good if we could converge on what the main contributions of this package should be (e.g. for your use case). Or, asked differently: do you think there is a valuable contribution to your use case at all?

For me, the main contributions are 1) mesh field handling/sorting/transformation and 2) comparison and reporting logic. Both will probably be useful for my use case, but I don't think one interface should be favored over the others. Please consider making all three interfaces I described above equally usable.

I think this also addresses the rest of your comments? I just don't see the CLI as being more or less important than the programmatic interface, and believe the contribution here is greater than just a neat CLI tool.

Either way, I will not block the review on these grounds; if you consider the CLI to be the main use case and the library itself not worth investing in, I respect that decision.

@danielskatz Please consider adding something about design and extensibility to the checklist. The JOSS docs do mention these words, but it would be nice to have them explicitly in the checklist IMO, for anyone who wants to conduct reviews with these considerations in mind.

@danielskatz

@idoby

@danielskatz Please consider adding something about design and extensibility to the checklist. The JOSS docs do mention these words, but it would be nice to have them explicitly in the checklist IMO, for anyone who wants to conduct reviews with these considerations in mind.

If you want to suggest changes for JOSS, opening an issue in https://github.com/openjournals/joss/issues is probably the best way to do so. If you suggest specific language you would like, that will give us a starting point for discussion.

@dglaeser

dglaeser commented Dec 5, 2022

@idoby, thanks a lot again for your detailed message; it helped me understand some of your points much better. I am happy you bring up usage scenarios that we did not have in mind, and I really appreciate the valuable comments/ideas on how to make them possible.

Either way, I will not block the review on these grounds; if you consider the CLI to be the main use case and the library itself not worth investing in, I respect that decision.

As said, our focus was - above all - the CLI, but I would be happy to invest time and effort to improve the API by addressing the points you raised. @danielskatz, do we have a limit on when we have to have this finished?

@danielskatz

There's not a strict limit - how long do you think this will take?

@dglaeser

dglaeser commented Dec 5, 2022

It is going to be very difficult to address this before Christmas, I think. I'll try, but as usual many things pile up towards the end of the year... January is usually a bit more relaxed, so I am pretty confident that I can propose a version with the points addressed by the end of January (maybe earlier). Would that be too late?

@danielskatz

I think that's ok - let's try it and see

@danielskatz

@editorialbot recommend-accept

@editorialbot

Attempting dry run of processing paper acceptance...

@danielskatz

danielskatz commented Jan 31, 2023

@dglaeser - please check the proof version that is being generated now, to make sure nothing was incorrectly changed in my minor edits, and let me know

@editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1016/j.advwatres.2011.03.007 is OK
- 10.5281/zenodo.1183562 is OK
- 10.11588/ans.2017.1.27447 is OK
- 10.1016/j.camwa.2020.02.012 is OK
- 10.1007/s00607-008-0004-9 is OK
- 10.1016/j.camwa.2020.06.007 is OK
- 10.1515/jnma-2022-0054 is OK
- 10.1016/j.camwa.2011.04.012 is OK
- 10.25495/7GXK-RD71 is OK
- 10.5281/zenodo.5603255 is OK
- 10.1016/j.cpc.2021.108249 is OK
- 10.24355/dbbs.084-202210121528-0 is OK
- 10.5281/zenodo.7469695 is OK
- 10.1109/IEEESTD.1985.82928 is OK
- 10.1007/978-3-642-23099-8 is OK

MISSING DOIs

- None

INVALID DOIs

- None

@editorialbot

👋 @openjournals/csism-eics, this paper is ready to be accepted and published.

Check final proof 👉📄 Download article

If the paper PDF and the deposit XML files look good in openjournals/joss-papers#3926, then you can now move forward with accepting the submission by compiling again with the command @editorialbot accept

@dglaeser

@danielskatz, I stumbled across three things, which I think were introduced in one of your merge requests. I edited them here: https://gitlab.com/dglaeser/fieldcompare/-/merge_requests/195; would you mind double-checking?

I noticed after my modifications in !195 that you had just introduced the changes related to putting the punctuation inside the parentheses. I wasn't aware that this is the rule, but I also couldn't verify that it is with a (quick) google search. The top matches of my search suggest the opposite: https://blog.apastyle.org/apastyle/2013/03/punctuation-junction-periods-and-parentheses.html

But if this is the correct way to go, I can simply kick out the two commits again. I guess the commit fixing the newline should be fine.

@danielskatz

The punctuation change I made is correct for most US English, which I think is what is being used here. I don't think the newline matters either way, as both should generate the same PDF, I think.

@danielskatz

sorry, you are right - the punctuation change I made was wrong.

@dglaeser

dglaeser commented Jan 31, 2023

OK, so should I merge !195? Everything else looked fine to me.

I noticed the newline while reading the proof version of the PDF; this is what I get:

[Screenshot of the PDF proof, 2023-01-31 at 16:10:25]

@danielskatz

I guess so - I'm not sure why that is happening, but let's try to fix it

@dglaeser

I merged it; the PDF looks fine to me now.

@danielskatz

@editorialbot recommend-accept

Let's try an official preview again. Can you check this again after it completes, and verify that it looks good to you?

@editorialbot

Attempting dry run of processing paper acceptance...

@editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1016/j.advwatres.2011.03.007 is OK
- 10.5281/zenodo.1183562 is OK
- 10.11588/ans.2017.1.27447 is OK
- 10.1016/j.camwa.2020.02.012 is OK
- 10.1007/s00607-008-0004-9 is OK
- 10.1016/j.camwa.2020.06.007 is OK
- 10.1515/jnma-2022-0054 is OK
- 10.1016/j.camwa.2011.04.012 is OK
- 10.25495/7GXK-RD71 is OK
- 10.5281/zenodo.5603255 is OK
- 10.1016/j.cpc.2021.108249 is OK
- 10.24355/dbbs.084-202210121528-0 is OK
- 10.5281/zenodo.7469695 is OK
- 10.1109/IEEESTD.1985.82928 is OK
- 10.1007/978-3-642-23099-8 is OK

MISSING DOIs

- None

INVALID DOIs

- None

@editorialbot

👋 @openjournals/csism-eics, this paper is ready to be accepted and published.

Check final proof 👉📄 Download article

If the paper PDF and the deposit XML files look good in openjournals/joss-papers#3927, then you can now move forward with accepting the submission by compiling again with the command @editorialbot accept

@dglaeser

@danielskatz looks good to me, thanks a lot!

@danielskatz

@editorialbot accept

@editorialbot

Doing it live! Attempting automated processing of paper acceptance...

@editorialbot

🐦🐦🐦 👉 Tweet for this paper 👈 🐦🐦🐦

@editorialbot

🐘🐘🐘 👉 Toot for this paper 👈 🐘🐘🐘

@editorialbot

🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨

Here's what you must now do:

  1. Check final PDF and Crossref metadata that was deposited 👉 Creating pull request for 10.21105.joss.04905 joss-papers#3928
  2. Wait a couple of minutes, then verify that the paper DOI resolves https://doi.org/10.21105/joss.04905
  3. If everything looks good, then close this review issue.
  4. Party like you just published a paper! 🎉🌈🦄💃👻🤘

Any issues? Notify your editorial technical team...

@editorialbot added the accepted and published (Papers published in JOSS) labels on Jan 31, 2023
@dglaeser

Thanks a lot for your help, @danielskatz, @idoby, @WilliamJamieson!

@idoby

idoby commented Jan 31, 2023

@dglaeser Sure thing, congrats! 🥳
Can't wait to use this

@danielskatz

Congratulations to @dglaeser (Dennis Gläser) and co-authors!!

And thanks to @idoby and @WilliamJamieson for reviewing!
We couldn't do this without you

@editorialbot

🎉🎉🎉 Congratulations on your paper acceptance! 🎉🎉🎉

If you would like to include a link to your paper from your README use the following code snippets:

Markdown:
[![DOI](https://joss.theoj.org/papers/10.21105/joss.04905/status.svg)](https://doi.org/10.21105/joss.04905)

HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.04905">
  <img src="https://joss.theoj.org/papers/10.21105/joss.04905/status.svg" alt="DOI badge" >
</a>

reStructuredText:
.. image:: https://joss.theoj.org/papers/10.21105/joss.04905/status.svg
   :target: https://doi.org/10.21105/joss.04905

This is how it will look in your documentation:

[DOI badge]

We need your help!

The Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us, please consider doing either one (or both) of the following:
