Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fails to parse other map keys after a sequence if the sequence items are not indented #28

Closed
leoetlino opened this issue Jan 22, 2020 · 8 comments
Labels
bug Something isn't working

Comments

@leoetlino
Copy link
Contributor

First, let me thank you for making this library! I am currently in the process of switching from PyYAML to ryml for performance reasons, and so far the initial results are very encouraging. ryml is typically 20-25x faster for emitting and one file that took 2s to emit with PyYAML or yaml-cpp only takes ~90ms with ryml.

However, parsing existing documents that were emitted by PyYAML (or libyaml) is proving to be a bit more problematic.

Currently any documents that look like this fail to parse with ryml:

enemy: #L1
- actors: #L2
  - {name: Enemy_Bokoblin_Junior, value: 4.0} #L3
  - {name: Enemy_Bokoblin_Middle, value: 16.0} #L4
  - {name: Enemy_Bokoblin_Senior, value: 32.0} #L5
  - {name: Enemy_Bokoblin_Dark, value: 48.0} #L6
  species: BokoblinSeries #L7
# ...
/home/leo/projects/oead/src/lib/rapidyaml/src/c4/yml/parse.cpp:597: ERROR parsing yml: parse error
line 7: '  species: BokoblinSeries' (sz=25)
           ^~~~~~~~~~~~~~~~~~~~~~~  (cols 3-26)
top state: RSEQ|RNXT

If the actors array is indented as follows:

enemy:
- actors:
    - {name: Enemy_Bokoblin_Junior, value: 4.0}
    - {name: Enemy_Bokoblin_Middle, value: 16.0}
    - {name: Enemy_Bokoblin_Senior, value: 32.0}
    - {name: Enemy_Bokoblin_Dark, value: 48.0}
  species: BokoblinSeries

then the file is parsed correctly. It seems that the lack of extra indentation for the array is throwing ryml off and causing it to miss the end of the sequence?

@cburgard
Copy link
Contributor

Possibly not even related to map keys. I get the same message here:

meas:
  createParameterList:
    - Lumi
      value: 1
      relErr: 0.1
    - mu
    - alpha_syst1
    - mu
      value: 1
      low: 0
      high: 3
      const: 0
ERROR parsing yml: parse error - indentation should not increase at this point
line 4: '      value: 1' (sz=14)
         ^~~~~~~~~~~~~~  (cols 1-15)

@leoetlino
Copy link
Contributor Author

Hmm, well I'm not sure that document is well formed? Is each sequence/array item supposed to be a map? If so, you'd need to make "Lumi" and the last "mu" either keys or values. For what it's worth, PyYAML doesn't like it either.

@cburgard
Copy link
Contributor

Well, the document was written with rapidyaml. So irrespective of whether it's valid YAML or not, I'd hope it would at least be able to parse back what it emitted :-)

@leoetlino
Copy link
Contributor Author

Heh, indeed, though that really does sound more like a bug with the emitter as the YAML is invalid.

@biojppm
Copy link
Owner

biojppm commented Jan 25, 2020

@leoetlino Thanks for writing up.

Your YAML is valid indeed, so this is a parser bug. The parser code in ryml has trouble with the first non-indented sequence item on line 2. From there on, all bets are off. So you are right that indenting the sequence will allow ryml to parse it correctly.

@biojppm
Copy link
Owner

biojppm commented Jan 25, 2020

@cburgard - I've filed a separate bug #31 to track that problem; thanks for writing up.

@leoetlino
Copy link
Contributor Author

So I did a small test, and it looks like changing Parser::_handle_seq_impl (in the RNXT state) to fall back to assuming the sequence was unindented and ending it lets ryml parse the example document correctly:

else if(rem.begins_with("---"))
{
_c4dbgp("got document start '---'");
_start_new_doc(rem);
return true;
}
else
{
_c4err("parse error");
}

-            _c4err("parse error");
+            _c4dbgp("end the indentless sequence");
+            _pop_level();
+            return true;
enemy:
- actors: 1
  test_array:
    - a
  test: x
test: 2

parses correctly into this:

enemy:
  -
    actors: 1
    test_array:
      - a
    test: x
test: 2

That is most likely just a hacky workaround though. I'm not sure how to fix this correctly.

@biojppm biojppm added the bug Something isn't working label Jan 27, 2020
@biojppm
Copy link
Owner

biojppm commented Jan 27, 2020

@leoetlino thanks for the note - it did help. All the unit tests pass, so I'm going to go with that.

This is in the dev branch, and I'm going to look next at #32 in the same branch. When it's fixed, I'll merge dev to master, and only then will I close.

biojppm added a commit that referenced this issue Jan 29, 2020
Fixes #28 #32 . Prepare emit refactor.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants