Question about a "curly quotes" safe check #800

Closed
giuliohome opened this issue May 20, 2024 · 3 comments

@giuliohome

Is it possible to raise an exception when parsing a YAML file that uses left double quotation marks (“) instead of straight double quotation marks (") as string delimiters? (also somewhat related)

Thanks

giuliohome changed the title from "Question about a "left double quotation mark" safe check" to "Question about a "curly quotes" safe check" on May 20, 2024
@giuliohome
Author

Note that a rudimentary way to implement this check on Windows is to open the file without specifying UTF-8 encoding, which triggers the exception mentioned here.
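A minimal sketch of why this trick works, and where it falls short. It assumes the Windows default encoding is cp1252 (typical for Western-European locales without Python's UTF-8 mode) and decodes with it explicitly so the behavior is the same on any platform; `decodes_as_cp1252` is a hypothetical helper name. In UTF-8, ” is the byte sequence `E2 80 9D`, and `0x9D` is undefined in cp1252, so decoding fails. The opening quote “ (`E2 80 9C`) decodes without error as mojibake, so a file with only opening curly quotes slips through.

```python
# Hypothetical helper: make the "open without UTF-8" trick explicit by
# decoding with cp1252 (assumed Windows default) instead of the platform default.
def decodes_as_cp1252(raw: bytes) -> bool:
    """Return True if raw decodes as cp1252 without raising."""
    try:
        raw.decode("cp1252")
    except UnicodeDecodeError:
        return False
    return True

# UTF-8 encodings: “ is E2 80 9C, ” is E2 80 9D
both_quotes = 'leftright: “mar”\n'.encode("utf-8")
opening_only = 'left: “mar\n'.encode("utf-8")

print(decodes_as_cp1252(both_quotes))   # False: byte 0x9D is undefined in cp1252
print(decodes_as_cp1252(opening_only))  # True: decodes silently as 'â€œ'
```

In other words, the exception is a side effect of one undefined byte rather than a real check for curly quotes.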

@nitzmahone
Member

nitzmahone commented May 20, 2024

While it seems unlikely we'd ever add the ability to fail on valid documents for arbitrary reasons (that's arguably the job of a linter), it's not hard to implement almost any rule like that yourself by pre-inspecting or intercepting the token stream:

```python
import re
import yaml

from yaml.scanner import ScannerError
from yaml.tokens import ScalarToken

doc = """
double: "mar"
single: 'mar'
none: mar
leftright: “mar”
"""

# inspect pre-scan: curly quotes are not YAML quote indicators, so “mar”
# is scanned as a plain scalar whose value still contains the quotes
curly_quoted_scalars: list[ScalarToken] = [
    tok for tok in yaml.scan(doc)
    if isinstance(tok, ScalarToken) and re.search('[“”]', tok.value)
]

for tok in curly_quoted_scalars:
    print(f'found curly-quote scalar at {tok.start_mark}')

# bake it into a custom loader
class NoCurlyQuotesSafeLoader(yaml.SafeLoader):
    def get_token(self):
        tok = super().get_token()

        # reject any scalar whose value contains a curly double quote
        if isinstance(tok, ScalarToken) and re.search('[“”]', tok.value):
            raise ScannerError(problem="this loader does not allow curly-quoted scalars", problem_mark=tok.start_mark)

        return tok

# raises ScannerError pointing at the "leftright" value
print(yaml.load(doc, Loader=NoCurlyQuotesSafeLoader))
```
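Taking the linter remark literally, the same rule can also run as a crude pre-check on the raw text, with no YAML parsing at all. This is a minimal stdlib-only sketch, not part of PyYAML; `find_curly_quotes` is a hypothetical helper name. Unlike the token-based approach it also flags curly quotes inside comments and typographic apostrophes (the ’ in "don’t"), and it adds the single curly quotes ‘ ’ implied by the "curly quotes" title.

```python
import re

# Hypothetical linter-style check: double (“ ”) and single (‘ ’) curly quotes
CURLY = re.compile("[“”‘’]")

def find_curly_quotes(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, char) for every curly quote, both 1-based."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in CURLY.finditer(line):
            hits.append((lineno, m.start() + 1, m.group()))
    return hits

doc = """double: "mar"
single: 'mar'
none: mar
leftright: “mar”
"""

for lineno, col, ch in find_curly_quotes(doc):
    print(f"line {lineno}, column {col}: found {ch!r}")
```

Because it never tokenizes, this version cannot tell a scalar from a comment; the custom loader is the better fit when only parsed values matter.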

@giuliohome
Copy link
Author

@nitzmahone Excellent! Thank you very much!!!
