Keeping QCSchema in sync with QCElemental #68

mattwelborn · 2020-01-23T15:15:08Z

QCSchema lags behind what's actually implemented AND DOCUMENTED in QCElemental. The QCSchema docs are where people look for info, so this creates a misleading impression about what QCSchema is/does in practice.

Some possible solutions:

QCSchema/docs is always what you get from running .schema()/autodoc on the myriad models in QCElemental
QCElemental models represent schema in development, and from time to time we concretize its state into release versions of QCSchema.
QCElemental models are not allowed to change until QCSchema changes.

My favorite option is 1. I'm okay with 2. I think 3 is a bad idea.

@dgasmith @bennybp

The text was updated successfully, but these errors were encountered:

loriab · 2020-01-23T18:45:50Z

+1 option 1

extras field is effectively the "scratch space"/"in development" role

ghutchis · 2020-01-23T22:33:45Z

As an external user, I would strongly push for choice 1.

Promoting option 1, IMHO sends a strong signal that this schema is only appropriate for QCElemental, and it's no longer intended as a cross-program interchange.

Option 3 sends the signal that the community input is driving the schema, but I can understand that's frustrating for QCelemental.

So I would strongly push for choice 2.

loriab · 2020-01-23T23:45:25Z

Promoting option 1, IMHO sends a strong signal that this schema is only appropriate for QCElemental, and it's no longer intended as a cross-program interchange.

That's certainly not what I'd want signaled. Rather, I like option 1 for its single-source-of-truth property. Do you think calling qcel .schema() the docs/source with qcel as a reference implementation and doing the part of option 2 of periodically concretizing and publishing to QCSc versions would assuage that impression?

PRs suggesting changes to the data layout of qcel.models could be better marked with a "Schema" label and held open for comment to make sure changes don't sneak up on the community.

ghutchis · 2020-01-24T01:11:11Z

I feel pretty strongly about this.

This is an independent repo with a set of existing issues, PR, etc. for the schema. IMHO, QCElemental should be a client of this repo (i.e., it depends on the schema). The single source of truth should be here.

Frankly, I'm unlikely to use QCElement because every project already has similar functionality (cclib, Open Babel, Avogadro, etc.). Until I saw this issue, I frankly didn't know it existed.

This repo was created for the express purpose (and declared by its description) of being the home of the QC schema.

Consider also that https://molssi-qc-schema.readthedocs.io/en/latest/auto_topology.html is not a subset of the QCElemental docs.

Please don't make the schema subservient to another project. If you want to use QCElemental to work on development versions, that's fine. But the standard version should be declared here and IMHO, all issues, discussion relevant to the schema should be in this repo, not via some tag in a different repo.

dgasmith · 2020-01-24T14:09:14Z

I think a clear distinction here is that developing in QCElemental with option 1/2 doesn't necessitate using QCElemental as we will be exporting JSON-Schema to a repo (likely here). QCElemental is used to compose the schema, provide a Python-based structure for those who want it, or generate JSON-Schema like we do here. In the above scenarios I would much prefer 2) over 1) to for the comments listed above.

In general, we are finding QCElemental as a driving force for cross-program interchange because there is an implementation behind it that does something useful. A concrete schema implementation that has the bells and whistles needed forms a cornerstone of an ecosystem. We can now compute a dozen programs using QCSchema with QCEngine and I know of ~40M results computed with QCFractal using the Schema. There are now active learning programs, force field fitting toolkits, visualization tools, etc all built on top of the Schema and data repos like Fractal.

For example it would be great if Avogadro picked up QCEngine (which uses CCLib) instead of writing their own quantum chemistry compute routines. You probably need a few more programs with orbitals built in, but thats easily done. Come talk to us.

cryos · 2020-01-24T14:23:59Z

I think if we are to drive adoption then option 3 is the best, with 2 being a more pragmatic second best. I don't think we are going to drop directly using QM codes in Avogadro, but adding something like QCEngine as another option would work assuming it offers the basic capabilities we need. I see the most value in codes adopting QC JSON natively, but there is a lot of value in something like QCEngine doing the translation.

As you know Chemical JSON was developed using something akin to option 1 from the start. I think it has utility, but it is something different to what we set out to achieve with QC JSON. It would be a shame to see the goal of forming a community schema recede, but we knew from the start it was tough to do. MolSSI is the organization with the funding and time to put in to efforts like this.

ghutchis · 2020-01-24T16:01:01Z

@dgasmith - while I agree that you don't have to use QCElemental under option one, that choice implies that the 'ground truth' for the schema falls to that project, not to this repo. I'm repeatedly telling you that sends the wrong message to the community.

If you want to get more community feedback, I'm happy to ping others from the meeting to come here and comment.

Doing something useful with the schema is incredibly important, as is continual development.

I have a pretty big pool of calculations myself (~20M) and Harvard Clean Energy is.. 150M? That shouldn't factor into where and how the schema is standardized.

IMHO option 2 sounds like something you'll accept, and I think that's the pragmatic way forward.

loriab · 2020-01-24T16:30:52Z

My primary goals are to:

Not write schema docs & types twice, where one gets autotranslated into proper JSON and basic sanity checked with pydantic and the other sits here with perhaps my "number"/"float"-type mistakes uncaught.
Collect issues and discussion on schema where the opinions are. That seemed to have dwindled in this repo, but if people are here, so let the discussion be also.

So option 2 is fine by me.

dgasmith · 2020-01-24T18:32:03Z

I see the most value in codes adopting QC JSON natively, but there is a lot of value in something like QCEngine doing the translation.

I completely agree, but this is a very slow process as noted. Just to echo: QCEngine exists to help move the process along and provide demonstrable use cases.

I'm repeatedly telling you that sends the wrong message to the community.

I get that, but I am confused of why this is the case. A move to Elemental seems more like "MolSSI is building tools for a community built schema, let us build out a Python ecosystem for communication that can also be used in any language." Which seems like a reasonable message otherwise we are saying "here is a schema that has no uses or community".

I have a pretty big pool of calculations myself (~20M) and Harvard Clean Energy is.. 150M? That shouldn't factor into where and how the schema is standardized.

Definitely, which is why the number was a pretty minor part of a paragraph. I was attempting to paint a picture that schema is beginning to be accepted by some of the community by having useful libraries that support it.

Collect issues and discussion on schema where the opinions are. That seemed to have dwindled in this repo, but if people are here, so let the discussion be also.

I echo this quite strongly, Elemental has a strong user base that we just haven't been able to build here. If we can kick this repo along let us do it, but we so far haven't been successful at that.

I should note here that we have largely been taking strategy 3 so far where we add first to this repo and then to QCElemental. I think the only place where this isn't the case and we take strategy 2 is a geometry optimization input/output schema where we currently acknowledge that this repo is the single source of truth and we may have to change.

mattwelborn · 2020-01-24T18:58:38Z

Below is a proposal that tries to strike a balance between @loriab's points about correctness and de-duplication, and @ghutchis's points about community engagement, a single source of truth, and the QCSchema existing separate from QCElemental. It sort of falls between options 2 and 3.

Community input

1: The MolSSI/QCSchema repository MUST continue to exist.
1.1: Input from the community about QCSchema MAY be filed as issues on MolSSI/QCSchema.
1.1.1: These issues MUST be linked as issues in MolSSI/QCElemental.
1.1.2: The maintainers of MolSSI/QCSchema MAY encourage discussion to be moved to MolSSI/QCElemental if it is more model-related.
1.2: Input from the community about QCSchema MAY be filed as issues on MolSSI/QCElemental.
1.2.1: These issues MUST be linked as issues in MolSSI/QCSchema if they affect data layout/schema.
1.2.2: These issues MAY be linked as issues in MolSSI/QCSchema if they affect implementation details.
1.2.3: The maintainers of MolSSI/QCElemental MAY encourage discussion to be moved to MolSSI/QCSchema if it is more schema-related.

Releases

2: MolSSI/QCSchema MUST have versioned releases.

QCElemental models

3: Models in MolSSI/QCElemental MUST be marked as to whether they are part of QCSchema.
3.1: Models marked as part of QCSchema implemented in MolSSI/QCElemental/master are the single source of truth. The actual JSONSchema files in MolSSI/QCSchema/master MUST be generated from these models.
3.1.1: Accepted changes to the QCSchema (see section 1) MUST be implemented in MolSSI/QCElemental, and MolSSI/QCSchema MUST be subsequently updated.
3.1.2: PRs that change MolSSI/QCSchema MUST originate in MolSSI/QCElemental.
3.1.2.1: These PRs MUST be cross-linked as issues in MolSSI/QCSchema.

Documentation

3.1.2: The documentation of QCSchema in MolSSI/QCSchema MUST be auto-generated from the QCSchema models in MolSSI/QCElemental.
4: The default documentation on RTD MUST correspond to the latest released version of MolSSI/QCSchema.
4.1: The RTD corresponding to MolSSI/QCSchema/master MAY also be made available.

Items 3.1 and 3.1.1 are intended to establish a single source of truth while keeping MolSSI/QCSchema and MolSSI/QCElemental up to date and synchronized. Ideally, MolSSI/QCSchema would be the single source of truth, but models are a superset of the schema, and the thing that actually gets tested. As a result, we are likely to end up with correct models and incorrect schema.

loriab · 2020-01-25T19:24:19Z

So now we do line-item voting, Security Council veto rules?

Just kidding. Matt's proposal sounds good to me. Only a couple items I want to make sure we're on the same page about:

1.2.1 QCEl MUST cross link to QCSc on when QCEl issues/PRs arise affecting data layout and MAY cross link on those affecting implementation details. So I'd expect Result --> SingleResult QCElemental#155 to need an issue opened here but probably not Sparse Molecules QCElemental#169. For PRs, I can probably put in a linter that checks if PRs will affect the rendered schema, so a cross post can be made. Does this sound right or is it wanted that implementation issues/PRs cross-link, too?
Difference between schema stages.
- preliminary schema, those being pre-implementation like Charges (AKA populations) #67, whose discussion and any PRs (about which we can be more relaxed about strict JSON formatting) belong here in QCSchema
- operational schema, those that are adopted by QCSc and implemented in QCEl, can have issues in either repo (section 1 cross-link guidelines apply) but PRs should be solely in QCEl.
- qcel schema, those like the OptimizationResult that are useful for QCEl and dependent projects but won't or haven't yet been broached to QCSc.

ghutchis · 2020-01-25T20:01:32Z

+1 I think this seems like a pragmatic approach.

ghutchis · 2020-01-25T20:02:27Z

@loriab - If you can add a linter or Github action to check if PRs affect the schema, that would be a great solution.

cryos · 2020-01-25T20:09:24Z

Sounds like a reasonable and pragmatic approach to me too. I personally never thought this would be easy, but I am passionate about seeing it move forward and gain wider adoption as features are added.

cryos · 2020-01-25T20:09:54Z

Apologies for accidentally closing this - mouse slippage...

bennybp · 2020-01-25T20:14:02Z

I am not ignoring this on purpose. I do want to weigh in on this soon, however I am currently traveling and teaching (and there is a lot here to absorb). I will try to come up with some coherent thoughts tonight.

awvwgk · 2020-01-27T20:45:32Z

Thanks for this issue! Similar to @ghutchis I didn't got the connection between QCElemental and QCSchema in the first place. It would be really helpful to have the list from above, or whatever the final form will be, as a contributing guideline for this repository or in some visable place. I for one didn't consider contributing to QCSchema so far because I didn't really knew how.

mattwelborn · 2020-01-28T15:27:58Z

So now we do line-item voting, Security Council veto rules?

Just kidding. Matt's proposal sounds good to me. Only a couple items I want to make sure we're on the same page about:

1.2.1 QCEl MUST cross link to QCSc on when QCEl issues/PRs arise affecting data layout and MAY cross link on those affecting implementation details. So I'd expect MolSSI/QCElemental#155 to need an issue opened here but probably not MolSSI/QCElemental#169. For PRs, I can probably put in a linter that checks if PRs will affect the rendered schema, so a cross post can be made. Does this sound right or is it wanted that implementation issues/PRs cross-link, too?

Difference between schema stages.

preliminary schema, those being pre-implementation like Charges (AKA populations) #67, whose discussion and any PRs (about which we can be more relaxed about strict JSON formatting) belong here in QCSchema

operational schema, those that are adopted by QCSc and implemented in QCEl, can have issues in either repo (section 1 cross-link guidelines apply) but PRs should be solely in QCEl.

qcel schema, those like the OptimizationResult that are useful for QCEl and dependent projects but won't or haven't yet been broached to QCSc.

Yes to all of this. I have updated my earlier comment accordingly.

We should look into making a bot that creates cross-linked issues. For example, if an issue is made on MolSSI/QCSchema titled "Foo", then a corresponding issue gets made on MolSSI/QCElemental titled "QCSchema issue #123: Foo". This issue's body would then just be a link to MolSSI/QCSchema#123. On the QCEl side, we would do the same thing, but only for issues that have the "Schema" tag.

loriab · 2020-01-28T18:32:25Z

We should look into making a bot that creates cross-linked issues. For example, if an issue is made on MolSSI/QCSchema titled "Foo", then a corresponding issue gets made on MolSSI/QCElemental titled "QCSchema issue #123: Foo". This issue's body would then just be a link to MolSSI/QCSchema#123. On the QCEl side, we would do the same thing, but only for issues that have the "Schema" tag.

I wasn't going to bring it up until it was proof-of-principle, but since Matt mentioned it, I'm working on MolSSI/QCElemental#204.

The notion is that there's a GHA that runs on every PR of qcel and generates the json schema files from a list of models (provenance, molecule, etc.). Then it runs a diff against qcsk (QCSchema) master and if there's a change, creates a branch (working up to this point) and PRs that to qcsk and returns red X to qcel. The PR here can be a place for discussion, any changes can be added to the qcel PR, and when auto PR is merged here, then qcel will show green check. This is my first foray into GHA, so I or GHA may still fail, but it's going easily so far. How does the plan sound?

A side issue is that the QCSchema repo is presently a python module. This is for historical reasons of testing and documentation. Since the former is handled at qcel and the latter can be reworked, what does everyone think of the qcsk repo being JSON files instead, like https://github.com/loriab/QCSchema/tree/py2json/qcschema_json ?

coltonbh · 2022-06-10T16:20:48Z

Adding here a link to a PR I created a while back that seeks to standardize the QCSchema models to a common interface. I'm in favor of implementing the models only once, and exporting that json schema from the pydantic models. What I propose in this (draft) PR would be a suggestion for what v2 of the schema could look like: MolSSI/QCElemental#287

Open to comments/suggestions. Thanks!

cryos closed this as completed Jan 25, 2020

cryos reopened this Jan 25, 2020

This was referenced Mar 3, 2020

GitHub Action btwn QCEl and QCSk MolSSI/QCElemental#218

Closed

python module to json files #69

Open

This was referenced Sep 3, 2020

replace schema with autogen from qcelemental #77

Open

QCSchema coordination, testing, and export MolSSI/QCElemental#237

Merged

bgpeyton mentioned this issue Sep 24, 2020

Additional tensorial properties: pair with QCEl#241 #78

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Keeping QCSchema in sync with QCElemental #68

Keeping QCSchema in sync with QCElemental #68

mattwelborn commented Jan 23, 2020 •

edited

Loading

loriab commented Jan 23, 2020

ghutchis commented Jan 23, 2020 •

edited

Loading

loriab commented Jan 23, 2020

ghutchis commented Jan 24, 2020

dgasmith commented Jan 24, 2020

cryos commented Jan 24, 2020

ghutchis commented Jan 24, 2020

loriab commented Jan 24, 2020

dgasmith commented Jan 24, 2020

mattwelborn commented Jan 24, 2020 •

edited

Loading

loriab commented Jan 25, 2020

ghutchis commented Jan 25, 2020

ghutchis commented Jan 25, 2020

cryos commented Jan 25, 2020

cryos commented Jan 25, 2020 •

edited

Loading

bennybp commented Jan 25, 2020

awvwgk commented Jan 27, 2020

mattwelborn commented Jan 28, 2020

loriab commented Jan 28, 2020

coltonbh commented Jun 10, 2022

Keeping QCSchema in sync with QCElemental #68

Keeping QCSchema in sync with QCElemental #68

Comments

mattwelborn commented Jan 23, 2020 • edited Loading

loriab commented Jan 23, 2020

ghutchis commented Jan 23, 2020 • edited Loading

loriab commented Jan 23, 2020

ghutchis commented Jan 24, 2020

dgasmith commented Jan 24, 2020

cryos commented Jan 24, 2020

ghutchis commented Jan 24, 2020

loriab commented Jan 24, 2020

dgasmith commented Jan 24, 2020

mattwelborn commented Jan 24, 2020 • edited Loading

Community input

Releases

QCElemental models

Documentation

loriab commented Jan 25, 2020

ghutchis commented Jan 25, 2020

ghutchis commented Jan 25, 2020

cryos commented Jan 25, 2020

cryos commented Jan 25, 2020 • edited Loading

bennybp commented Jan 25, 2020

awvwgk commented Jan 27, 2020

mattwelborn commented Jan 28, 2020

loriab commented Jan 28, 2020

coltonbh commented Jun 10, 2022

mattwelborn commented Jan 23, 2020 •

edited

Loading

ghutchis commented Jan 23, 2020 •

edited

Loading

mattwelborn commented Jan 24, 2020 •

edited

Loading

cryos commented Jan 25, 2020 •

edited

Loading