Resolving xref issue 115-nyaml2nxdl-xref-edge-cases #116

RubelMozumder · 2023-11-10T11:10:13Z

Changes:

term will be accept anything in-between a space and standard
Any kind of URL will be accepted.

Fixes #115

lukaspie · 2023-11-10T13:17:04Z

Thanks for fixing the original issue that I raised this morning.

In the meantime, I also tested some more edges case and I am not sure that the behavior is the one we intended. For example, the following three xrefs would convert properly:

NXarchive(NXobject):
  (NXentry)entry:
    \@index:
    title:
    experiment_identifier(NX_CHAR):
      doc:
      - |
       Test doc.
      - |                                       # Example0
       xref:   
        spec: spec0
        term: term0
        url: url0"
      - |                                       # Example1
        "    xref:
          spec: spec1
        term: term1
            url: url1"
      - |                                       # Example2
        spec: spec2
        xref:
        term: term2
        url: url2"

Especially the last one seems weird since spec comes before xref. On the other hand, the one that I do expect to work does not convert with AttributeError: 'NoneType' object has no attribute 'group'.

- |                                       # Example3
  xref:
    spec: spec3
    term: term3
    url: url3

I am therefore a bit confused as to what the rules are for using the xref feature. I thought the plan was to have it as described in example 3, i.e., you first check if xref is there and then if the other three keywords are present and properly indented. Can I just write the xref anyway I like?

RubelMozumder · 2023-11-10T13:29:50Z

Thanks for fixing the original issue that I raised this morning.

In the meantime, I also tested some more edges case and I am not sure that the behavior is the one we intended. For example, the following three xrefs would convert properly:
NXarchive(NXobject):
  (NXentry)entry:
    \@index:
    title:
    experiment_identifier(NX_CHAR):
      doc:
      - |
       Test doc.
      - |                                       # Example0
       xref:   
        spec: spec0
        term: term0
        url: url0"
      - |                                       # Example1
        "    xref:
          spec: spec1
        term: term1
            url: url1"
      - |                                       # Example2
        spec: spec2
        xref:
        term: term2
        url: url2"
Especially the last one seems weird since spec comes before xref. On the other hand, the one that I do expect to work does not convert with AttributeError: 'NoneType' object has no attribute 'group'.
- |                                       # Example3
  xref:
    spec: spec3
    term: term3
    url: url3
I am therefore a bit confused as to what the rules are for using the xref feature. I thought the plan was to have it as described in example 3, i.e., you first check if xref is there and then if the other three keywords are present and properly indented. Can I just write the xref anyway I like?

Thank you for creating such a variety of examples. here is the docs

But the last one does not make any sense because xref should be in the upper hierarchy or administrative location. xref refers that it has some amount of reference information which are spec, term and URL probably more other info may be added later. But reshuffling in spec, term and URL makes sense.

lukaspie · 2023-11-10T13:34:23Z

But the last one does not make any sense because xref should be in the upper hierarchy or administrative location. xref refers that it has some amount of reference information which are spec, term and URL probably more other info may be added later. But reshuffling in spec, term and URL makes sense.

I agree that example 2 above does not make sense, but even this one converts to

    This concept is related to term `term2`_ of the spec2 standard.
.. _term2: url2

Wouldn't we want to catch that and prevent the user from writing the nyaml like this?

RubelMozumder · 2023-11-10T13:59:07Z

But the last one does not make any sense because xref should be in the upper hierarchy or administrative location. xref refers that it has some amount of reference information which are spec, term and URL probably more other info may be added later. But reshuffling in spec, term and URL makes sense.

I agree that example 2 above does not make sense, but even this one converts to
    This concept is related to term `term2`_ of the spec2 standard.
.. _term2: url2
Wouldn't we want to catch that and prevent the user from writing the nyaml like this?

your are absolutely right. In that case we should raise and error (this is your proposal, right). Thanks for bringing it.

lukaspie · 2023-11-10T14:09:23Z

But the last one does not make any sense because xref should be in the upper hierarchy or administrative location. xref refers that it has some amount of reference information which are spec, term and URL probably more other info may be added later. But reshuffling in spec, term and URL makes sense.

I agree that example 2 above does not make sense, but even this one converts to
    This concept is related to term `term2`_ of the spec2 standard.
.. _term2: url2
Wouldn't we want to catch that and prevent the user from writing the nyaml like this?
your are absolutely right. In that case we should raise and error (this is your proposal, right). Thanks for bringing it.

I am thinking that maybe it is a good idea to enforce a certain structure:

First, one should check for xref as the first string of the doc part. If the doc part does not start with xref, we treat it as a regular text doc. So, xref really is taken as a keyword to indicate that this doc part is about a reference.
If xref is there, it should be checked that the next three lines start with one of spec, term, and url (there can be whitespace). Each of them should be present once and only once.
If the order is wrong (i.e., spec before xref), we should throw an error.

domna · 2023-11-10T14:10:57Z

@RubelMozumder @lukaspie I made a suggestion for a new xref parsing function. Can you check and test it? Probably we need to add error catching when yaml cannot be parsed.

Edit: I added an additional test where we can parametrise different inputs. Maybe we should have one for cases where we wan't it to fail, too.

lukaspie · 2023-11-10T14:50:49Z

Thanks @domna. Exactly, what I wanted to say is that the xref part should be a valid yaml! This function works as expected for all of the examples outlined above. The tests also look good.

My proposal for the suggested use in the documentation is to use 2-space indentation. Like this, you don't need to use " in the docstrings (which you need if you start the string at the level of - |.

doc:
  - |
    Test doc.
  - | 
    xref:   
      spec: spec0
      term: term0
      url: url0

domna · 2023-11-10T15:10:57Z

Thanks @domna. Exactly, what I wanted to say is that the xref part should be a valid yaml! This function works as expected for all of the examples outlined above. The tests also look good.

My proposal for the suggested use in the documentation is to use 2-space indentation. Like this, you don't need to use " in the docstrings (which you need if you start the string at the level of - |.

I agree, maybe we should even disallow the usage of " entirely, because the xref property is inherently multiline. When we add a " it actually means that this is contained in the yaml text as is. We can simply remove this by removing the strip('"') for the clean_txt. We just have to update the docs accordingly.

2-space indentation should work. As long as all the keys are lined up any indentation should work.

However, we still have the problem if there are duplicate keys. While yaml says keys should be unique pyyaml just takes the last value. It's a known bug/feature yaml/pyyaml#165 (the community seems to not have agreed yet whether this is a bug or a feature). But I think we could add a simple check (split to lines and check if len(flatten(xref_dict)) is the same - that way we also avoid nested keys).

lukaspie

lgtm

domna · 2023-11-10T15:47:21Z

However, we still have the problem if there are duplicate keys. While yaml says keys should be unique pyyaml just takes the last value. It's a known bug/feature yaml/pyyaml#165 (the community seems to not have agreed yet whether this is a bug or a feature). But I think we could add a simple check (split to lines and check if len(flatten(xref_dict)) is the same - that way we also avoid nested keys).

I added some checks for edge cases. Should be all fine now (except that removing of " lets all the other tests fail)
Do we want to fail if the xref docstring starts with " or do we want to allow both (current status)?

Edit: I removed the support for it, because it also introduces artifact's for other docstrings. Starting and ending " are carried over even though they should be kept according to the yaml rules.

RubelMozumder · 2023-11-10T16:07:29Z

However, we still have the problem if there are duplicate keys. While yaml says keys should be unique pyyaml just takes the last value. It's a known bug/feature yaml/pyyaml#165 (the community seems to not have agreed yet whether this is a bug or a feature). But I think we could add a simple check (split to lines and check if len(flatten(xref_dict)) is the same - that way we also avoid nested keys).

I added some checks for edge cases. Should be all fine now (except that removing of " lets all the other tests fail) Do we want to fail if the xref docstring starts with " or do we want to allow both (current status)?

Edit: I removed the support for it, because it also introduces artifact's for other docstrings. Starting and ending " are carried over even though they should be kept according to the yaml rules.

I see there are a few tests for indivisual correct and wrong one, thanks @domna.
I added the artifact " around the text so that any de-indentation does not affect yaml rules, as each line contains a :, and therefore yaml user should not bother of it.

RubelMozumder · 2023-11-10T16:08:52Z

@RubelMozumder @lukaspie I made a suggestion for a new xref parsing function. Can you check and test it? Probably we need to add error catching when yaml cannot be parsed.

Edit: I added an additional test where we can parametrise different inputs. Maybe we should have one for cases where we wan't it to fail, too.

Sorry, before getting there you finished it.

domna · 2023-11-10T16:10:36Z

I see there are a few tests for indivisual correct and wrong one, thanks @domna. I added the artifact " around the text so that any de-indentation does not affect yaml rules, as each line contains a :, and therefore yaml user should not bother of it.

Doesn't matter for yaml as long as it's inside doc: |. Then basically everything is allowed.

* Resolving xref issue 115-nyaml2nxdl-xref-edge-cases * Suggestion for new xref function * Remove unused re * Adds test for xref handling * Use strip short notation * Adds tests for invalid xref * update nyaml2nxdl readme * Add edge cases for xref * Re-introduce removal of starting " * Remove unused import * Remove support for " in xref docstring --------- Co-authored-by: domna <florian.dobener@physik.hu-berlin.de> Co-authored-by: Lukas Pielsticker <50139597+lukaspie@users.noreply.github.com> # Conflicts: # dev_tools/nyaml2nxdl/README.md # dev_tools/nyaml2nxdl/nyaml2nxdl_backward_tools.py # dev_tools/nyaml2nxdl/nyaml2nxdl_forward_tools.py # dev_tools/tests/test_nyaml2nxdl.py

Resolving xref issue 115-nyaml2nxdl-xref-edge-cases

fbaee6e

RubelMozumder linked an issue Nov 10, 2023 that may be closed by this pull request

nyaml2nxdl xref edge cases #115

Closed

RubelMozumder requested review from domna and lukaspie November 10, 2023 11:10

Suggestion for new xref function

01c7b20

domna added 4 commits November 10, 2023 15:11

Remove unused re

f70741c

Adds test for xref handling

783618f

Use strip short notation

29531ac

Adds tests for invalid xref

bbb0578

lukaspie and others added 2 commits November 10, 2023 16:36

update nyaml2nxdl readme

165dcbb

Add edge cases for xref

71214e7

lukaspie approved these changes Nov 10, 2023

View reviewed changes

Re-introduce removal of starting "

b75592e

domna added 2 commits November 10, 2023 16:48

Remove unused import

1d259b7

Remove support for " in xref docstring

c22a370

domna removed their request for review November 10, 2023 16:26

domna merged commit 15624c0 into fairmat Nov 10, 2023
6 checks passed

lukaspie deleted the 115-nyaml2nxdl-xref-edge-cases branch June 4, 2024 10:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Resolving xref issue 115-nyaml2nxdl-xref-edge-cases #116

Resolving xref issue 115-nyaml2nxdl-xref-edge-cases #116

RubelMozumder commented Nov 10, 2023 •

edited by domna

Loading

lukaspie commented Nov 10, 2023

RubelMozumder commented Nov 10, 2023 •

edited

Loading

lukaspie commented Nov 10, 2023

RubelMozumder commented Nov 10, 2023

lukaspie commented Nov 10, 2023

domna commented Nov 10, 2023 •

edited

Loading

lukaspie commented Nov 10, 2023

domna commented Nov 10, 2023 •

edited

Loading

lukaspie left a comment

domna commented Nov 10, 2023 •

edited

Loading

RubelMozumder commented Nov 10, 2023

RubelMozumder commented Nov 10, 2023

domna commented Nov 10, 2023

Resolving xref issue 115-nyaml2nxdl-xref-edge-cases #116

Resolving xref issue 115-nyaml2nxdl-xref-edge-cases #116

Conversation

RubelMozumder commented Nov 10, 2023 • edited by domna Loading

lukaspie commented Nov 10, 2023

RubelMozumder commented Nov 10, 2023 • edited Loading

lukaspie commented Nov 10, 2023

RubelMozumder commented Nov 10, 2023

lukaspie commented Nov 10, 2023

domna commented Nov 10, 2023 • edited Loading

lukaspie commented Nov 10, 2023

domna commented Nov 10, 2023 • edited Loading

lukaspie left a comment

Choose a reason for hiding this comment

domna commented Nov 10, 2023 • edited Loading

RubelMozumder commented Nov 10, 2023

RubelMozumder commented Nov 10, 2023

domna commented Nov 10, 2023

RubelMozumder commented Nov 10, 2023 •

edited by domna

Loading

RubelMozumder commented Nov 10, 2023 •

edited

Loading

domna commented Nov 10, 2023 •

edited

Loading

domna commented Nov 10, 2023 •

edited

Loading

domna commented Nov 10, 2023 •

edited

Loading