Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.Text.Json parser should optionally tolerate malformed json #32291

Open
eiriktsarpalis opened this issue Feb 14, 2020 · 19 comments
Open

System.Text.Json parser should optionally tolerate malformed json #32291

eiriktsarpalis opened this issue Feb 14, 2020 · 19 comments
Labels
area-System.Text.Json wishlist Issue we would like to prioritize, but we can't commit we will get to it yet
Milestone

Comments

@eiriktsarpalis
Copy link
Member

Forwarding a report on twitter about System.Text.Json failures when subjected to property-based testing.

It seems that the STJ parser is choking on scenaria that the Newtonsoft.Json implementation is perfectly capable of handling. I've transcribed the original report into a standalone project.

cc @Horusiath

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Feb 14, 2020
@GrabYourPitchforks
Copy link
Member

What is the actual test string it's failing on? The JSON deserializer mandates strict well-formedness by default (unlike Newtonsoft.Json), but there are toggles so that the caller can opt in to more relaxed behavior if necessary.

@eiriktsarpalis
Copy link
Member Author

What is the actual test string it's failing on?

For instance "\"\u0018\"", or any string with improperly escaped characters.

there are toggles so that the caller can opt in to more relaxed behavior if necessary.

Could you point me to that config? I'm looking at JsonDocumentOptions and can't find something relevant.

@GrabYourPitchforks
Copy link
Member

You found the right doc, but it looks like there's no switch for opting in to malformed string handling. There's a switch malformed array handling and a few other cases. If malformed string handling is an important scenario the team should consider an opt in switch for it. It would not be enabled by default, however.

@eiriktsarpalis
Copy link
Member Author

Being able to handle malformed json is important, since it can often come from sources you do not control. I can attest from prior experience that Newtonsoft is excellent at dealing with this.

@GrabYourPitchforks
Copy link
Member

We made a stance that we allow only well-formed payloads by default, both for security and correctness reasons. Newtonsoft is very liberal in what it allows by default (malformed payload processing, cyclic references, invalid UTF-16 generation, comment fudging). When I was doing red team work one of my favorite vectors was using these non-compliant-on-by-default behaviors to exploit applications. :)

When I'm back on a proper desktop I'll modify this item to track adding a switch to relax the string processing behavior.

@eiriktsarpalis eiriktsarpalis changed the title System.Text.Json unable to handle improperly escaped json string literals. System.Text.Json parser should optionally tolerate malformed json Feb 14, 2020
@eiriktsarpalis
Copy link
Member Author

eiriktsarpalis commented Feb 14, 2020

A few other examples of invalid json that occur too frequently in the wild:

@GrabYourPitchforks
Copy link
Member

Looks like you beat me to changing the title! Thanks :)

@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Feb 21, 2020
@layomia layomia added this to the 5.0 milestone Feb 21, 2020
@layomia layomia modified the milestones: 5.0.0, Future Jun 20, 2020
@ghost
Copy link

ghost commented Oct 19, 2021

Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.

This process is part of the experimental issue cleanup initiative we are currently trialing in a limited number of areas. Please share any feedback you might have in the linked issue.

@eiriktsarpalis eiriktsarpalis added wishlist Issue we would like to prioritize, but we can't commit we will get to it yet and removed no-recent-activity labels Oct 22, 2021
@eiriktsarpalis
Copy link
Member Author

Adding to #32291 (comment) we might also consider exposing a setting for tolerating unescaped control characters in string values.

@ClementeGao
Copy link

ClementeGao commented May 19, 2022

@eiriktsarpalis @GrabYourPitchforks When there are special characters in the JSON string, how can I ignore them and how can I configure them
see issue #69502

@p10tyr
Copy link

p10tyr commented Apr 4, 2023

Hello .
Is there any option for Missing quotes in property names - Because apparently that is a normal thing from Python or PHP. I am receiving JSON from a 3rd party like this and I'm stuck now.

I cant use string on API because the content type is JSON so it says failing to convert JSON to string.. yay not happy. I can use object as a last resort because it just tries to convert it to an object and getting "'t' is an invalid start of a property name. Expected a '\"'

Seems like I have to go back to Newtonsoft as I cant find any works arounds. Wasted 2 hours on this issue

@am11
Copy link
Member

am11 commented Nov 6, 2023

Just ran into this limitation while upgrading (~15yo) System.Web.Helpers.Json -based reader code to System.Text.Json.Nodes. I will switch to Newtonsoft.Json instead.

Having laxed parsing option in STJ, which encompasses Newtonsoft && S.W.H.Json behavior, is a definite improvement.

@huoyaoyuan
Copy link
Member

Hitting exactly the behaviors mentioned here. I'm extracting a JSON object from some JavaScript and I can confidently determine its start. It contains unquoted property names, trailing commas, and more JavaScript content after the object.

@CyrusNajmabadi
Copy link
Member

JavaScript should produce legal Json. How is your js producing the value?

@huoyaoyuan
Copy link
Member

The browser engine is just more tolerant about Javascript code.

@CyrusNajmabadi
Copy link
Member

@huoyaoyuan that didn't really answer my question. Can you provide a repro where you're producing a json object in JS using a browser engine and it's not generating legal json?

@huoyaoyuan
Copy link
Member

@CyrusNajmabadi the code I'm parsing is like this:

Object.defineProperty(object1, "p", {
    value: {
        foo : "bar",
        isEnabled : true,
        content : { "title": "title", "id": 123, "Text": "Text" }
    }
});

Yes it's inconsistent about property name, but browser and console just accepts it.

@CyrusNajmabadi
Copy link
Member

That's a JavaScript object, not Json. Json is a subset.

If you serialize that object to Json (using the actual js APIs for that purpose) you'll get Json that can be parsed.

--

To be as clear as possible, Json is not for parsing arbitrary JavaScript. It's for Json objects.

@SteveL-MSFT
Copy link
Contributor

Having options for the JSON document during parsing to be more lenient would be great for interactive users using PowerShell PowerShell/PowerShell#21338

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Text.Json wishlist Issue we would like to prioritize, but we can't commit we will get to it yet
Projects
None yet
Development

No branches or pull requests