Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose line number and byte position in line as public properties on Utf8JsonReader #28482

Open
ahsonkhan opened this issue Jan 23, 2019 · 21 comments
Assignees
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Text.Json partner-impact This issue impacts a partner who needs to be kept updated
Milestone

Comments

@ahsonkhan
Copy link
Member

We currently have line number and byte position in line as internal fields of the JsonReaderState and Utf8JsonReader that are used as part of creating the JsonReaderException, when reading invalid JSON.

    public sealed partial class JsonReaderException : System.Exception
    {
        public JsonReaderException(string message, long lineNumber, long bytePositionInLine) { }
        public long BytePositionInLine { get { throw null; } }
        public long LineNumber { get { throw null; } }
        public override void GetObjectData(System.Runtime.Serialization.SerializationInfo info, System.Runtime.Serialization.StreamingContext context) { }
    }

API addition:

public ref partial struct Utf8JsonReader
{
    public long LineNumber { get { throw null; } }
    public long BytePositionInLine { get { throw null; } }
}

If a user wants schema validation (rather than checking if the JSON is valid according to spec) and decides to throw their own exception, exposing such information would be useful.

One such scenario is M.E.DependencyModel, where the caller is throwing a FormatException (for example if an array is expected or the first token is expected to be a start of an object):

Question:

  • Should we expose it from the Utf8JsonReader, OR from the JsonReaderState, OR both? I don't see the need to expose it from the JsonReaderState, currently.

cc @eerhardt, @JamesNK

@CodeTherapist
Copy link
Contributor

I would like to do this :)

@ahsonkhan
Copy link
Member Author

ahsonkhan commented Jan 28, 2019

I would like to do this :)

Awesome!

This API change is waiting for API-review and approval (which happen on Tuesdays). We generally wait for that before opening up a PR.

However, if approved, the API shape will likely not change, so imo, feel free to get started on a PR if you like (marking it as blocked against this issue). Feel free to ping me for review or if you are blocked on anything. There are only two pending questions that need to be discussed in the review:

  1. Should we add the API at all?
    • Imo, almost certainly yes.
  2. Should we expose it from the Utf8JsonReader, OR from the JsonReaderState, OR both?
    • I would start with just adding it to Utf8JsonReader for now.

@CodeTherapist
Copy link
Contributor

Thanks for the comprehensive information about the API-Review part - I guess it's the session that is streamed over youtube.

Should we expose it from the Utf8JsonReader, OR from the JsonReaderState, OR both?

Accordingly to other similar API's (Json.NET, XmlReader), this information is exposed on the reader directly. On the other hand, the JsonReaderState is about state and would make sense to me - but not sure if my consideration is reasonable.

The JsonReaderState is already public exposed over the property CurrentState and would be accessible.

@terrajobst
Copy link
Member

terrajobst commented Feb 5, 2019

Video

This looks useful. We discussed whether we should expose this on the state as well so that consumers can use that information when invalid schema information was found (rather than when the reader detects invalid JSON).

The current design has the unfortunate behavior that the line/byte position represents where the next token will start reading. Note that this isn't even the start of next token (due to skipped whitespace). Since schema validation requires the token to be read, it's always at the end of the token, rather than at the beginning.

@ahsonkhan please explore what the documentation for these APIs would say to explaint their behavior. We can then review wether that's something developers will be able to reason about.

@ahsonkhan ahsonkhan self-assigned this Feb 14, 2019
@ahsonkhan
Copy link
Member Author

Though the change is relatively straightforward, the semantics of where line number and byte position in line is, for all cases, is nuanced. I think the current behavior is correct and makes sense, but requires more deliberation. Hence, moving this to 5.0, in case we want to change the semantics.

The remarks section capture the subtleties.

/// <summary>
/// The number of lines read so far within the JSON payload (starting at 0).
/// </summary>
/// <remarks>
/// This returns the current line number of the reader on which the last processed JSON 
/// token ends, not where it started.
/// For property names, that means after the colon (including whatever whitespace or new line 
/// characters might be in between the end quote and colon). So, the line number won't always 
/// coincide with the line where the end of the property name itself can be found, but rather 
/// where the colon was found.
/// Outside of multi-line comments, returning the end of any other token (rather than the start)
/// has no affect on line number.
/// See also <see cref="TokenStartIndex"/>.
/// </remarks>
public long LineNumber => _lineNumber;

/// <summary>
/// The number of bytes read so far within the current line of the JSON payload (starting at 0).
/// </summary>
/// <remarks>
/// This returns the current position of the reader which is at the end of the last processed 
/// JSON token, not where it started.
/// For property names, that means after the colon (including whatever whitespace or new line 
/// characters might be in between the end quote and colon). So, the position in line doesn't 
/// coincide with where the end of the property name itself can be found, but rather where the 
/// colon was found.
/// See also <see cref="TokenStartIndex"/>.
/// </remarks>
public long BytePositionInLine => _bytePositionInLine;
[Fact]
public static void LineNumberAndPositionSemantics()
{
    string str = "{  \"foo\"\r\n:  \"bar\"  }";
    Utf8JsonReader reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(str));

    // Generally speaking, white space after a token is not eagerly consumed (only what's before)
    reader.Read(); // Read the start object token
    Assert.Equal(0, reader.LineNumber);
    Assert.Equal(1, reader.BytePositionInLine);

    // However, an exception is whitespace between property names and the colon that separates 
    // name/value is consumed
    reader.Read(); 
    // Read the property name
    // Cursor is now at the end of the colon, not at the end of the property name)
    Assert.Equal(1, reader.LineNumber);
    Assert.Equal(1, reader.BytePositionInLine);

    reader.Read(); 
    // Read the value string (cursor is now at the end of the string token)
    Assert.Equal(1, reader.LineNumber);
    Assert.Equal(8, reader.BytePositionInLine);

    str = "  [  1 , 2  ]";
    reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(str));

    // Generally speaking, white space after a token is not eagerly consumed (only what's before)
    reader.Read(); // Read the start array token
    Assert.Equal(0, reader.LineNumber);
    Assert.Equal(3, reader.BytePositionInLine);

    reader.Read(); 
    // Read the first value, we don't read the comma, end right at the end of the number token
    Assert.Equal(0, reader.LineNumber);
    Assert.Equal(6, reader.BytePositionInLine);

    reader.Read(); // Read the second value (cursor is now at the end of the 2nd number token)
    Assert.Equal(0, reader.LineNumber);
    Assert.Equal(10, reader.BytePositionInLine);

    str = "  [  1/*.\r\n.*/ , 2  ]";
    var options = new JsonReaderOptions { CommentHandling = JsonCommentHandling.Skip };
    reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(str), options);

    // Generally speaking, white space after a token is not eagerly consumed (only what's before)
    reader.Read(); // Read the start array token
    Assert.Equal(0, reader.LineNumber);
    Assert.Equal(3, reader.BytePositionInLine);

    reader.Read(); 
    // Read the first value (we don't eagerly read the comment in any mode).
    // End right at the end of the number token
    Assert.Equal(0, reader.LineNumber);
    Assert.Equal(6, reader.BytePositionInLine);

    reader.Read(); 
    // Read the second value, skipping comments (cursor is now at the end of the 2nd number token)
    Assert.Equal(1, reader.LineNumber);
    Assert.Equal(7, reader.BytePositionInLine);

    str = "  [  1/*.\r\n.*/ , 2  ]";
    options = new JsonReaderOptions { CommentHandling = JsonCommentHandling.Allow };
    reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(str), options);

    // Generally speaking, white space after a token is not eagerly consumed (only what's before)
    reader.Read(); // Read the start array token
    Assert.Equal(0, reader.LineNumber);
    Assert.Equal(3, reader.BytePositionInLine);

    reader.Read(); 
    // Read the first value (we don't eagerly read the comment in any mode).
    // End right at the end of the number token
    Assert.Equal(0, reader.LineNumber);
    Assert.Equal(6, reader.BytePositionInLine);

    reader.Read(); // Read the comment (cursor is now at the end of the comment token)
    Assert.Equal(1, reader.LineNumber);
    Assert.Equal(3, reader.BytePositionInLine);
}

Here's the change as a starting point:
https://github.com/dotnet/corefx/compare/master...ahsonkhan:ExposeLNandBPInLine?expand=1

@jeffhandley
Copy link
Member

I'm closing this issue as it hasn't had much traction/interest recently, it's not going to make .NET 7, and its value is lower than many other proposals in the System.Text.Json backlog.

@ewoutkramer
Copy link

Maybe there wasn't much action because everyone is just waiting for this to be applied? Were there open issues that needed to be resolved before this can be applied?

@jeffhandley
Copy link
Member

Thanks, @ewoutkramer. I'm reopening the issue; it will remain in the Future milestone as it's not planned for inclusion in .NET 7.

@jeffhandley jeffhandley reopened this Apr 6, 2022
@jeffhandley jeffhandley modified the milestones: 7.0.0, Future Apr 6, 2022
@jam40jeff
Copy link

jam40jeff commented Apr 8, 2022

@jeffhandley That's bad news. Isn't this a very simple change? It would be nice to be able to include this information when throwing JsonException from a custom converter.

@jeffhandley
Copy link
Member

We agree that it would be nice, as such this feature will remain in our backlog. Issue #63762 captures what we're focusing on during .NET 7.

Thanks!

@ewoutkramer
Copy link

@jam40jeff - I wrote a set of extension methods to get to these properties using reflection. Yeah, that's definitely not something I like to do, but it works (for now ;-)).

@danmoseley
Copy link
Member

@jeffhandley is this a change that a community member could feasibly contribute? It's a little unclear from comments higher up whether there are subtleties that have to be considered, or perhaps exposing "good enough" values is straightforward and acceptable for now.

@jam40jeff
Copy link

Could we at least get a helper method like the internal JsonException ThrowHelper.GetJsonReaderException(...) (accepting a string error message rather than a resource) to create a JsonReaderException with the appropriate values set?

@jeffhandley
Copy link
Member

@danmoseley I don't want to set a community contributor up for frustration by inviting a pull request that we won't be able to promptly review.

As is mentioned in #63762, we are continually revising our scope for .NET 7 as we progress through the release. When we hit our next notable checkpoint on the work we have underway right now, we will reassess our availability for being able to champion a community contribution here or bring this back into scope for .NET 7. At the moment though, I do not want to jeopardize #63686 or #63747 in favor of committing to this one.

@danmoseley
Copy link
Member

Fair enough

@LostTime76
Copy link

LostTime76 commented May 6, 2022

I have not read through this entire thread, but it is related to a thread that I opened some time ago about wanting line / col information. In addition to exposing line / col information elsewhere, what would be useful is exposing it on JsonElements within JsonDocs. If I get a JsonElement from a doc, I want to know where it is within the byte stream. Heck. I don't even care if you give me line or col information, I just want to know where it is in the data. I can convert the location to a line & col by counting new lines and characters. This would allow for precise doc error location information when a property value is not valid within your use case.

I had assumed the reason for not giving line information on JsonElements had to do with not wanting to waste the extra bytes in the struct to lower memory size. However, I was wrong. You are already storing byte stream location inside the DB rows of the JsonDocument. for each token. Why then would I just not be able to call a method on JsonElement, such as JsonElement.GetLocation() to retrieve the location within the stream just like how you do lookups for all the other JsonElement methods using the _idx into the database? Wouldn't this be a very simple API addition? For now, I intend to hack my way into that field using reflection.

image

@olivier-spinelli
Copy link

A simple Utf8JsonReader.ThrowJsonException( string message ) (kind of @jam40jeff proposal) would definitely be great...

@rs-blade
Copy link

rs-blade commented Oct 2, 2023

I would find it very useful if JsonReaderState also had public LineNumber and BytePositionInLine properties (not only the Utf8JsonReader)!
Background: We have an assertion framework that takes a state and forwards it to the exception generation. The state is transported via a generic parameter...so the Utf8JsonReader cannot be used here (as it is a ref struct), but the JsonReaderState can.
I could imagine more cases where using the JsonReaderState and not the Utf8JsonReader is more feasible.

@tschodde
Copy link

tschodde commented Dec 8, 2023

Dear all,
what is the state of this extension? We use it as additional information for our users during the deserialization process, meaning, if we find errors in the structure which are NOT an error in json but for our models and, thus, we will not have the option to get an exception thrown by System.Text.Json.

So we either need the linenumber and position or an option on the reader to throw an exception, as stated by olivier-spinelli.

@eiriktsarpalis eiriktsarpalis added the partner-impact This issue impacts a partner who needs to be kept updated label May 13, 2024
@darrelmiller
Copy link

Now that we are migrating OpenAPI.NET https://github.com/microsoft/OpenAPI.NET over to use STJ it would be very helpful to be able to show line numbers of where validation errors occur. We also have other validator libraries that apply a similar pattern for Copilot related scenarios that could benefit from having line numbers for both Utf8JsonReader and JsonNodes that are loaded via JsonDocument.ParseAsync.

@ygoe
Copy link

ygoe commented Jul 26, 2024

Is anybody still interested in making this change? Here's the code, just add it to the file System.Text.Json/System/Text/Json/Reader/Utf8JsonReader.cs:

public long LineNumber => _lineNumber;
public long BytePositionInLine => _bytePositionInLine;

Is it that hard? When I face an invalid value while reading the source with Utf8JsonReader, I want to tell the user where the value can be found. At least approximately. Now I have nothing and the user has to guess and find it themselves. Very frustrating.

I've made my own reader for a YAML-like format, inspired by this JSON reader, and I just added those properties. Works wonderfully! And it sure can here, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Text.Json partner-impact This issue impacts a partner who needs to be kept updated
Projects
None yet
Development

No branches or pull requests