Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Switch to JSON #3

Open
digama0 opened this issue Jan 1, 2024 · 5 comments
Open

[RFC] Switch to JSON #3

digama0 opened this issue Jan 1, 2024 · 5 comments

Comments

@digama0
Copy link
Collaborator

digama0 commented Jan 1, 2024

The concrete syntax of the export format is troublesome to parse, being a homegrown textual format with a heavy reliance on newline separators and with lots of "clever" sequence encodings. Moreover, the encoding of names is unquoted, which is just plain wrong in the presence of names with escaped characters, because these names can include arbitrary characters, including newlines, keywords and everything else - a classic SQL injection attack.

I propose we drop this ad hoc encoding entirely and switch to a JSON-based format. This is much easier to get right, and libraries for doing the parsing are numerous (but it's also feasible to write the parser directly).

@ammkrn
Copy link

ammkrn commented Jan 1, 2024

Fwiw I'm not opposed to the idea; while the current format is really easy to write a parser for, JSON would certainly be more accessible for people/projects that don't want to write a parser in the first place. I haven't looked deeply into the name escaping issue yet, but I'll take your word for it that it's a potential source of problems.

I'm interested to see what you come up with.

@adomasbaliuka
Copy link

adomasbaliuka commented Aug 23, 2024

JSON has a quite complicated spec with lots of extensions which aren't implemented exactly the same (or strictly according to the spec) by different libraries.

Perhaps something slightly more limited and more standardized would be better? (Ideally one would also use a formally verified parser, which after searching for a bit I'm still not quite sure about the status of for JSON)

Perhaps a subset of JSON would work? (Which one would specify separately for the purpose of lean4export.)

@digama0
Copy link
Collaborator Author

digama0 commented Aug 24, 2024

I'm not sure what you are referring to, JSON is an exceedingly simple format. The only part I'm aware of tricky bits in json is for representation of large integers and floating point, but I don't think these are likely to be issues since we don't need floats at all and it is unlikely for numbers to get that large except when representing lean bignums, and string encoding these solves the problem completely.

@adomasbaliuka
Copy link

All I'm saying is that there isn't an ironclad standard for how to parse JSON and many (also "standard") parser implementations differ. Some examples: here.

I thought maybe it's important for this export format to have an ironclad standard. Am I wrong? I guess it's also reasonable to say something like "let the parser fail on exported valid lean proofs in extremely rare cases (e.g. if they contain too big numbers), let it crash on malformed input, let it output whatever it wants, the kernel will check everything in the end and that's all that matters".

@digama0
Copy link
Collaborator Author

digama0 commented Aug 24, 2024

All I'm saying is that there isn't an ironclad standard for how to parse JSON and many (also "standard") parser implementations differ. Some examples: here.

This is not true. What that page shows is that there exist multiple documents that specify JSON, and parsers sometimes accept more or less than the spec due to implementation limits or "extensions". I don't see why any of this matters, I think you should be more specific.

Regarding having an ironclad standard, I'm not seeing anything here which prevents having such. But this is a proof format, which means that it actually doesn't matter if there are edge cases which are interpreted oddly, because proof checkers are allowed to spuriously fail for implementation limits reasons or even just not liking the shape of the proof. That's a quality-of-implementation issue, not a correctness issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants