Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolving xref issue 115-nyaml2nxdl-xref-edge-cases #116

Merged
merged 11 commits into from
Nov 10, 2023

Conversation

RubelMozumder
Copy link

@RubelMozumder RubelMozumder commented Nov 10, 2023

Changes:

  1. term will be accept anything in-between a space and standard
  2. Any kind of URL will be accepted.

Fixes #115

@RubelMozumder RubelMozumder linked an issue Nov 10, 2023 that may be closed by this pull request
@lukaspie
Copy link
Collaborator

Thanks for fixing the original issue that I raised this morning.

In the meantime, I also tested some more edges case and I am not sure that the behavior is the one we intended. For example, the following three xrefs would convert properly:

NXarchive(NXobject):
  (NXentry)entry:
    \@index:
    title:
    experiment_identifier(NX_CHAR):
      doc:
      - |
       Test doc.
      - |                                       # Example0
       xref:   
        spec: spec0
        term: term0
        url: url0"
      - |                                       # Example1
        "    xref:
          spec: spec1
        term: term1
            url: url1"
      - |                                       # Example2
        spec: spec2
        xref:
        term: term2
        url: url2"

Especially the last one seems weird since spec comes before xref. On the other hand, the one that I do expect to work does not convert with AttributeError: 'NoneType' object has no attribute 'group'.

- |                                       # Example3
  xref:
    spec: spec3
    term: term3
    url: url3

I am therefore a bit confused as to what the rules are for using the xref feature. I thought the plan was to have it as described in example 3, i.e., you first check if xref is there and then if the other three keywords are present and properly indented. Can I just write the xref anyway I like?

@RubelMozumder
Copy link
Author

RubelMozumder commented Nov 10, 2023

Thanks for fixing the original issue that I raised this morning.

In the meantime, I also tested some more edges case and I am not sure that the behavior is the one we intended. For example, the following three xrefs would convert properly:

NXarchive(NXobject):
  (NXentry)entry:
    \@index:
    title:
    experiment_identifier(NX_CHAR):
      doc:
      - |
       Test doc.
      - |                                       # Example0
       xref:   
        spec: spec0
        term: term0
        url: url0"
      - |                                       # Example1
        "    xref:
          spec: spec1
        term: term1
            url: url1"
      - |                                       # Example2
        spec: spec2
        xref:
        term: term2
        url: url2"

Especially the last one seems weird since spec comes before xref. On the other hand, the one that I do expect to work does not convert with AttributeError: 'NoneType' object has no attribute 'group'.

- |                                       # Example3
  xref:
    spec: spec3
    term: term3
    url: url3

I am therefore a bit confused as to what the rules are for using the xref feature. I thought the plan was to have it as described in example 3, i.e., you first check if xref is there and then if the other three keywords are present and properly indented. Can I just write the xref anyway I like?

Thank you for creating such a variety of examples. here is the docs

But the last one does not make any sense because xref should be in the upper hierarchy or administrative location. xref refers that it has some amount of reference information which are spec, term and URL probably more other info may be added later. But reshuffling in spec, term and URL makes sense.

@lukaspie
Copy link
Collaborator

But the last one does not make any sense because xref should be in the upper hierarchy or administrative location. xref refers that it has some amount of reference information which are spec, term and URL probably more other info may be added later. But reshuffling in spec, term and URL makes sense.

I agree that example 2 above does not make sense, but even this one converts to

    This concept is related to term `term2`_ of the spec2 standard.
.. _term2: url2

Wouldn't we want to catch that and prevent the user from writing the nyaml like this?

@RubelMozumder
Copy link
Author

But the last one does not make any sense because xref should be in the upper hierarchy or administrative location. xref refers that it has some amount of reference information which are spec, term and URL probably more other info may be added later. But reshuffling in spec, term and URL makes sense.

I agree that example 2 above does not make sense, but even this one converts to

    This concept is related to term `term2`_ of the spec2 standard.
.. _term2: url2

Wouldn't we want to catch that and prevent the user from writing the nyaml like this?

your are absolutely right. In that case we should raise and error (this is your proposal, right). Thanks for bringing it.

@lukaspie
Copy link
Collaborator

But the last one does not make any sense because xref should be in the upper hierarchy or administrative location. xref refers that it has some amount of reference information which are spec, term and URL probably more other info may be added later. But reshuffling in spec, term and URL makes sense.

I agree that example 2 above does not make sense, but even this one converts to

    This concept is related to term `term2`_ of the spec2 standard.
.. _term2: url2

Wouldn't we want to catch that and prevent the user from writing the nyaml like this?

your are absolutely right. In that case we should raise and error (this is your proposal, right). Thanks for bringing it.

I am thinking that maybe it is a good idea to enforce a certain structure:

  • First, one should check for xref as the first string of the doc part. If the doc part does not start with xref, we treat it as a regular text doc. So, xref really is taken as a keyword to indicate that this doc part is about a reference.
  • If xref is there, it should be checked that the next three lines start with one of spec, term, and url (there can be whitespace). Each of them should be present once and only once.
  • If the order is wrong (i.e., spec before xref), we should throw an error.

@domna
Copy link

domna commented Nov 10, 2023

@RubelMozumder @lukaspie I made a suggestion for a new xref parsing function. Can you check and test it? Probably we need to add error catching when yaml cannot be parsed.

Edit: I added an additional test where we can parametrise different inputs. Maybe we should have one for cases where we wan't it to fail, too.

@lukaspie
Copy link
Collaborator

Thanks @domna. Exactly, what I wanted to say is that the xref part should be a valid yaml! This function works as expected for all of the examples outlined above. The tests also look good.

My proposal for the suggested use in the documentation is to use 2-space indentation. Like this, you don't need to use " in the docstrings (which you need if you start the string at the level of - |.

doc:
  - |
    Test doc.
  - | 
    xref:   
      spec: spec0
      term: term0
      url: url0

@domna
Copy link

domna commented Nov 10, 2023

Thanks @domna. Exactly, what I wanted to say is that the xref part should be a valid yaml! This function works as expected for all of the examples outlined above. The tests also look good.

My proposal for the suggested use in the documentation is to use 2-space indentation. Like this, you don't need to use " in the docstrings (which you need if you start the string at the level of - |.

I agree, maybe we should even disallow the usage of " entirely, because the xref property is inherently multiline. When we add a " it actually means that this is contained in the yaml text as is. We can simply remove this by removing the strip('"') for the clean_txt. We just have to update the docs accordingly.

2-space indentation should work. As long as all the keys are lined up any indentation should work.

However, we still have the problem if there are duplicate keys. While yaml says keys should be unique pyyaml just takes the last value. It's a known bug/feature yaml/pyyaml#165 (the community seems to not have agreed yet whether this is a bug or a feature). But I think we could add a simple check (split to lines and check if len(flatten(xref_dict)) is the same - that way we also avoid nested keys).

Copy link
Collaborator

@lukaspie lukaspie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@domna
Copy link

domna commented Nov 10, 2023

However, we still have the problem if there are duplicate keys. While yaml says keys should be unique pyyaml just takes the last value. It's a known bug/feature yaml/pyyaml#165 (the community seems to not have agreed yet whether this is a bug or a feature). But I think we could add a simple check (split to lines and check if len(flatten(xref_dict)) is the same - that way we also avoid nested keys).

I added some checks for edge cases. Should be all fine now (except that removing of " lets all the other tests fail)
Do we want to fail if the xref docstring starts with " or do we want to allow both (current status)?

Edit: I removed the support for it, because it also introduces artifact's for other docstrings. Starting and ending " are carried over even though they should be kept according to the yaml rules.

@RubelMozumder
Copy link
Author

However, we still have the problem if there are duplicate keys. While yaml says keys should be unique pyyaml just takes the last value. It's a known bug/feature yaml/pyyaml#165 (the community seems to not have agreed yet whether this is a bug or a feature). But I think we could add a simple check (split to lines and check if len(flatten(xref_dict)) is the same - that way we also avoid nested keys).

I added some checks for edge cases. Should be all fine now (except that removing of " lets all the other tests fail) Do we want to fail if the xref docstring starts with " or do we want to allow both (current status)?

Edit: I removed the support for it, because it also introduces artifact's for other docstrings. Starting and ending " are carried over even though they should be kept according to the yaml rules.

I see there are a few tests for indivisual correct and wrong one, thanks @domna.
I added the artifact " around the text so that any de-indentation does not affect yaml rules, as each line contains a :, and therefore yaml user should not bother of it.

@RubelMozumder
Copy link
Author

@RubelMozumder @lukaspie I made a suggestion for a new xref parsing function. Can you check and test it? Probably we need to add error catching when yaml cannot be parsed.

Edit: I added an additional test where we can parametrise different inputs. Maybe we should have one for cases where we wan't it to fail, too.

Sorry, before getting there you finished it.

@domna
Copy link

domna commented Nov 10, 2023

I see there are a few tests for indivisual correct and wrong one, thanks @domna. I added the artifact " around the text so that any de-indentation does not affect yaml rules, as each line contains a :, and therefore yaml user should not bother of it.

Doesn't matter for yaml as long as it's inside doc: |. Then basically everything is allowed.

@domna domna removed their request for review November 10, 2023 16:26
@domna domna merged commit 15624c0 into fairmat Nov 10, 2023
6 checks passed
@lukaspie lukaspie deleted the 115-nyaml2nxdl-xref-edge-cases branch June 4, 2024 10:08
lukaspie pushed a commit that referenced this pull request Sep 24, 2024
* Resolving xref issue 115-nyaml2nxdl-xref-edge-cases

* Suggestion for new xref function

* Remove unused re

* Adds test for xref handling

* Use strip short notation

* Adds tests for invalid xref

* update nyaml2nxdl readme

* Add edge cases for xref

* Re-introduce removal of starting "

* Remove unused import

* Remove support for " in xref docstring

---------

Co-authored-by: domna <florian.dobener@physik.hu-berlin.de>
Co-authored-by: Lukas Pielsticker <50139597+lukaspie@users.noreply.github.com>
# Conflicts:
#	dev_tools/nyaml2nxdl/README.md
#	dev_tools/nyaml2nxdl/nyaml2nxdl_backward_tools.py
#	dev_tools/nyaml2nxdl/nyaml2nxdl_forward_tools.py
#	dev_tools/tests/test_nyaml2nxdl.py
lukaspie pushed a commit that referenced this pull request Oct 4, 2024
* Resolving xref issue 115-nyaml2nxdl-xref-edge-cases

* Suggestion for new xref function

* Remove unused re

* Adds test for xref handling

* Use strip short notation

* Adds tests for invalid xref

* update nyaml2nxdl readme

* Add edge cases for xref

* Re-introduce removal of starting "

* Remove unused import

* Remove support for " in xref docstring

---------

Co-authored-by: domna <florian.dobener@physik.hu-berlin.de>
Co-authored-by: Lukas Pielsticker <50139597+lukaspie@users.noreply.github.com>
# Conflicts:
#	dev_tools/nyaml2nxdl/README.md
#	dev_tools/nyaml2nxdl/nyaml2nxdl_backward_tools.py
#	dev_tools/nyaml2nxdl/nyaml2nxdl_forward_tools.py
#	dev_tools/tests/test_nyaml2nxdl.py
lukaspie pushed a commit that referenced this pull request Oct 4, 2024
* Resolving xref issue 115-nyaml2nxdl-xref-edge-cases

* Suggestion for new xref function

* Remove unused re

* Adds test for xref handling

* Use strip short notation

* Adds tests for invalid xref

* update nyaml2nxdl readme

* Add edge cases for xref

* Re-introduce removal of starting "

* Remove unused import

* Remove support for " in xref docstring

---------

Co-authored-by: domna <florian.dobener@physik.hu-berlin.de>
Co-authored-by: Lukas Pielsticker <50139597+lukaspie@users.noreply.github.com>
# Conflicts:
#	dev_tools/nyaml2nxdl/README.md
#	dev_tools/nyaml2nxdl/nyaml2nxdl_backward_tools.py
#	dev_tools/nyaml2nxdl/nyaml2nxdl_forward_tools.py
#	dev_tools/tests/test_nyaml2nxdl.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nyaml2nxdl xref edge cases
3 participants