Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalizing SourceFile - adding line/column offsets etc #190

Open
timholy opened this issue Feb 10, 2023 · 4 comments
Open

Generalizing SourceFile - adding line/column offsets etc #190

timholy opened this issue Feb 10, 2023 · 4 comments

Comments

@timholy
Copy link
Member

timholy commented Feb 10, 2023

JuliaDebug/Cthulhu.jl#345 uses JuliaSyntax to represent source code with expression boundaries mapped to character positions within the source text. In that application, CodeTracking is used to extract source-text for specific methods which may be partway into a file.

Question: for "snippets" like these, should the starting line number lineno be added as a new field to SourceFile? Or do you need it to represent "a whole file"? If so, then adding it as a field seems incorrect, since of course a whole file will start at line 1. The other issue is whether functions like source_line should return the offset line or the line index; currently the two are the same. It's for reasons like these that I didn't want to add it without thinking about the consequences.

@c42f
Copy link
Member

c42f commented Feb 11, 2023

The main purpose of SourceFile is to map between a lines-and-columns based view of the text and byte offsets. So it would probably be fine to have a line offset and allow it to represent part of a file.

Related, we might need SourceFile to represent part of a file to support incremental reparsing with language server. I haven't tried to get that working yet but it might tie in here at some point.

timholy added a commit to timholy/JuliaSyntax.jl that referenced this issue Feb 11, 2023
timholy added a commit to timholy/JuliaSyntax.jl that referenced this issue Feb 11, 2023
timholy added a commit to timholy/JuliaSyntax.jl that referenced this issue Feb 11, 2023
timholy added a commit to timholy/JuliaSyntax.jl that referenced this issue Feb 11, 2023
timholy added a commit to timholy/JuliaSyntax.jl that referenced this issue Feb 11, 2023
@davidanthoff
Copy link
Contributor

Couple of points from the LS side of things:

  1. we sometimes pass code snippets with an associated line/column offset around in the language server. So, at some level we also sometimes have a need for a type that contains code + offset info for lines/columns, but we would need the ability to offset both the line and the column, not just the line.
  2. it seems to me that a type that includes this kind of offset info shouldn't be called SourceFile simply because it isn't representing an entire file. Maybe SourceSnippet might be better? Or like Roslyn SourceText?
  3. we do have a use-case for the line/column to index conversions in the LS, but the current SourceFile wouldn't work for us. We need uris instead of paths, we need additional versioning info in the file and we will (hopefully soon) have entirely different structures for notebooks. So I'm wondering whether a structure like the following would allow more code-reuse:
struct Position
  line::Int
  column::Int
end

struct SourceSnippet
  code::String
  first_position::Position
  line_starts::Vector{Int}
end

# LS would actually not use this
struct SourceFile
  code::SourceSnippet
  filename::Union{Nothing,String}
end

# LS would use this instead
struct TextDocument
  code::SourceSnippet
  uri::URI
  version::Int
end

struct NotebookCell
  code::SourceSnippet
  other fields...
end

I hope this doesn't derail the original issue too much, I guess my main point is that just adding a first line and nothing about column seems weird to me :)

@c42f
Copy link
Member

c42f commented Feb 14, 2023

Thanks David this is very useful context. I haven't been able to look at LS integration at all yet or what would be required for incremental parsing.

it seems to me that a type that includes this kind of offset info shouldn't be called SourceFile simply because it isn't representing an entire file

Agreed. SourceText seems good to me. It's shorter than SourceSnippet and I already use the term text in many places.

For now I'm happy with merging #191 to get what Tim needs, but we can go with something more complete in the future.

Related to source abstractions, the internal JuliaSyntax.ParseStream doesn't deal natively with SourceFile because it's trying not to require that the code be copied into a String before we can do anything with it (the C code might pass us a plain old buffer, for example). But maybe this is a bit of a pointless optimization and we could streamline the internals too. (Maybe also improve the Core._parse hook API in Base to make extra copying less usual.)

@c42f
Copy link
Member

c42f commented Feb 14, 2023

we can go with something more complete in the future

Let's keep this issue open to discuss that

@c42f c42f changed the title Add lineno to SourceFile? Generalizing SourceFile - adding line/column offsets etc Feb 14, 2023
c42f added a commit that referenced this issue Jun 16, 2023
When parsing source code fragments incrementally with

    * `Meta.parse(str, index)` or
    * `parsestmt(str, index)`

we must avoid scanning the rest of `str` for line numbers for
efficiency. In this mode, the user is expected to provide `first_line`
to "manually" specify which line number we're counting from.

Admittedly this is a bit clunky and should be integrated better with
SourceFile (which should also be renamed - see issue #190) but for now
seems to be the most consistent way to approach things here.
c42f added a commit that referenced this issue Jun 16, 2023
When parsing source code fragments incrementally with

    * `Meta.parse(str, index)` or
    * `parsestmt(str, index)`

we must avoid scanning the rest of `str` for line numbers for
efficiency. In this mode, the user is expected to provide `first_line`
to "manually" specify which line number we're counting from.

Admittedly this is a bit clunky and should be integrated better with
SourceFile (which should also be renamed - see issue #190) but for now
seems to be the most consistent way to approach things here.
c42f added a commit that referenced this issue Jun 16, 2023
When parsing source code fragments incrementally with

* `Meta.parse(str, index)` or
* `parsestmt(str, index)`

we must avoid scanning the rest of `str` for line numbers for
efficiency. In this mode, the user is expected to provide `first_line`
to "manually" specify which line number we're counting from.

Admittedly this is a bit clunky and should be integrated better with
SourceFile (which should also be renamed - see issue #190) but for now
seems to be the most consistent way to approach things here.

As part of the refactoring here, switch over to using `Vector{UInt8}`
for literal parsing which makes parsing to `ParseStream` and
`GreenNode` around 10% faster.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants