Ruby 2.1/2.2/2.3 yaml parsing anomaly for comma separated integers #273

guidoiaquinti · 2016-02-25T11:51:40Z

A string of digits separated by commas (without quotes) is evaluated as an integer. Is this intentional?

irb(main):001:0> Psych.load("key: 123,456")
=> {"key"=>123456}

irb(main):002:0> Psych.load("key: 123456,7890")
=> {"key"=>1234567890}

I understand that the first example can be read as a number in American notation, but the second example does not follow any notation or standard that I'm aware of. I expected both values to be evaluated as strings.

With the quotes it works as expected:

irb(main):003:0> Psych.load("key: \"123,456\"")
=> {"key"=>"123,456"}

irb(main):004:0> Psych.load("key: \"123456,7890\"")
=> {"key"=>"123456,7890"}

Thanks


tenderlove · 2017-03-07T16:47:49Z

Apparently Japan and China will separate at four digits, but to be honest Psych internals don't care, they just remove all commas and cast to a number. The only reason for this is legacy support for Syck documents. I agree that both values should evaluate to strings, but I'm not sure how to do that and support old formats.
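The comma-stripping behavior described above can be imitated in a few lines. This is only a sketch of the idea, not Psych's actual code; `legacy_style_integer` is a hypothetical helper name introduced for illustration.

```ruby
# Hypothetical helper (not part of Psych): mimics the "remove all commas,
# then cast to a number" behavior described in the comment above.
def legacy_style_integer(text)
  # Drop every comma, wherever it sits, then convert the remaining digits.
  Integer(text.delete(","))
end

legacy_style_integer("123,456")     # => 123456
legacy_style_integer("123456,7890") # => 1234567890
```

Because the commas are removed before conversion, their position never matters, which is why both examples from the original report come back as integers.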

- Otherwise multiline values break like crazy
- Update specs that would fail otherwise, to show that you must quote numbers containing commas
- See ruby/psych#273 for why

Jell · 2020-11-04T19:31:36Z

I've hit this issue today after trying to parse some YAML output by another tool (kustomize build in my case, but not that relevant). I think as YAML becomes more and more ubiquitous it might be a good idea to provide an option for "strict" parsing according to the standard?

It seems to me that these regexps decide whether or not a Scalar is parsed as a Float or an Integer:

psych/lib/psych/scalar_scanner.rb

Lines 11 to 20 in 9f8c365

    
# Taken from http://yaml.org/type/float.html
FLOAT = /^(?:[-+]?([0-9][0-9_,]*)?\.[0-9]*([eE][-+][0-9]+)?(?# base 10)
          |[-+]?\.(inf|Inf|INF)(?# infinity)
          |\.(nan|NaN|NAN)(?# not a number))$/x

# Taken from http://yaml.org/type/int.html
INTEGER = /^(?:[-+]?0b[0-1_,]+          (?# base 2)
              |[-+]?0[0-7_,]+           (?# base 8)
              |[-+]?(?:0|[1-9][0-9_,]*) (?# base 10)
              |[-+]?0x[0-9a-fA-F_,]+    (?# base 16))$/x

I did this patch on my side as a workaround, which solved the issue I had:

Psych::ScalarScanner.send(:remove_const, "INTEGER")
Psych::ScalarScanner::INTEGER = /^(?:[-+]?0b[0-1_]+          (?# base 2)
                                    |[-+]?0[0-7_]+           (?# base 8)
                                    |[-+]?(?:0|[1-9][0-9_]*) (?# base 10)
                                    |[-+]?0x[0-9a-fA-F_]+    (?# base 16))$/x

(basically replacing the regexp with a subset of the standard one from http://yaml.org/type/int.html)

Would it be possible to have say two constants:

INTEGER_ENRICHED=/.../
INTEGER_STANDARD=/.../

And choose one or the other based on a flag, to keep backward compatibility while offering a stricter, more standard alternative? I can offer to work on a patch if that helps.
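A minimal, standalone sketch of that two-constant idea might look like the following. The constant names come from the proposal above; the `integer_like?` helper and its `strict:` keyword are hypothetical and are not Psych's API. The patterns reuse the `^`/`$` anchors of the quoted source.

```ruby
# Legacy-style pattern: commas and underscores are accepted inside the digits.
INTEGER_ENRICHED = /^(?:[-+]?0b[0-1_,]+
                      |[-+]?0[0-7_,]+
                      |[-+]?(?:0|[1-9][0-9_,]*)
                      |[-+]?0x[0-9a-fA-F_,]+)$/x

# Stricter pattern: same alternatives, but commas are no longer accepted.
INTEGER_STANDARD = /^(?:[-+]?0b[0-1_]+
                      |[-+]?0[0-7_]+
                      |[-+]?(?:0|[1-9][0-9_]*)
                      |[-+]?0x[0-9a-fA-F_]+)$/x

# Hypothetical helper: picks one pattern based on a flag.
# The default keeps the legacy behavior for backward compatibility.
def integer_like?(text, strict: false)
  pattern = strict ? INTEGER_STANDARD : INTEGER_ENRICHED
  pattern.match?(text)
end

integer_like?("123,456")               # => true  (legacy pattern)
integer_like?("123,456", strict: true) # => false (would stay a string)
integer_like?("123456", strict: true)  # => true
```

Defaulting the flag to the legacy pattern means existing callers see no change unless they opt in.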

sethboyles · 2021-04-13T21:07:42Z

We would love to see this fixed as well. The current parsing logic does not conform to the YAML specification, so we believe something like @Jell's proposed solution should be considered. Would the maintainers (@tenderlove) of Psych be amenable to a PR implementing the above solution or similar?

sethboyles · 2021-08-17T17:28:25Z

@tenderlove Do you have a stance on whether or not you would accept a PR that adds a flag to change the behavior to something stricter? We are willing to work on this, but don't want to get started if you don't think it would be worthwhile.

tenderlove · 2021-08-19T18:39:47Z

> @tenderlove Do you have a stance on whether or not you would accept a PR that adds a flag to change the behavior to something stricter? We are willing to work on this, but don't want to get started if you don't think it would be worthwhile.

Yes, I'm definitely open to it. There was a PR submitted here for strict hash keys. I made a commit here that demonstrates how to accomplish it.

Maybe this commit should just be extended to be "strict parsing" or something (rather than just specific to hash keys).

sethboyles · 2022-01-14T20:13:31Z

@tenderlove I created a PR that implements this (#537). Let me know what you think!

hsbt · 2022-10-20T06:01:59Z

#537 fixed this issue.
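For readers landing here, a hedged sketch of how the option could be used. It assumes a Psych release that includes #537 and that the keyword is spelled `strict_integer`, as the PR title suggests; `load_with_strict_integer` is a hypothetical wrapper added so the snippet also runs on older versions.

```ruby
require "psych"

# Hypothetical wrapper: on Psych versions without the option, the unknown
# keyword raises ArgumentError, which this sketch reports as nil.
def load_with_strict_integer(yaml)
  Psych.load(yaml, strict_integer: true)
rescue ArgumentError
  nil
end

# Default behavior is unchanged: the comma is dropped and an Integer returned.
p Psych.load("key: 123,456")

# With the option enabled, the value is expected to stay the string "123,456".
p load_with_strict_integer("key: 123,456")
```

Quoting the value in the YAML source remains the portable fix for documents that must also load on older Psych versions.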

JustinAiken mentioned this issue Oct 15, 2019

Revert to yaml style loading.. usertesting/biscuit#7

Merged

sethboyles mentioned this issue Apr 13, 2021

numbers with commas in space manifests get parsed as an integer, not a string cloudfoundry/cloud_controller_ng#2193

Closed

sergiogomez mentioned this issue Jun 14, 2021

Add quotes to multiple postcodes. OpenCageData/address-formatting#75

Merged

sethboyles mentioned this issue Jan 14, 2022

Add strict_integer option to parse numbers with commas as strings #537

Merged

sethboyles mentioned this issue May 12, 2022

release Psych 4.0.4? #561

Closed

hsbt closed this as completed Oct 20, 2022
