char_range() function for indexing into source strings #457
Comments
Why would indexing into a string with the |
Yeah, I also realized, I should probably be using
using JuliaSyntax
src = read(joinpath(homedir(), ".julia/dev/Sunny/test/test_tensors.jl"), String)
tree = parseall(SyntaxNode, src, filename="foo.jl")
code_range = range(tree[2][end][end])
src[code_range]
which also errors. |
If anyone else is facing this issue, a way to bypass it until there's a permanent fix is to find your lines ending in non-ASCII Unicode characters and make that no longer the case. |
Thanks @DaniGlez! Here's a regexp (VSCode flavour) for finding non-ASCII characters that end lines: [^\x00-\x7f]$ |
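For anyone who would rather check from the REPL than with an editor regexp, here is a minimal sketch of the same search in plain Julia. The function name `lines_ending_in_non_ascii` is made up for illustration and is not part of JuliaSyntax; it only uses `eachline`, `last` and `isascii` from Base.

```julia
# Hypothetical helper (not part of JuliaSyntax): return the numbers of all
# lines whose final character is non-ASCII.
function lines_ending_in_non_ascii(src::String)
    hits = Int[]
    # eachline strips the line terminator, so last(line) is the last visible character
    for (lineno, line) in enumerate(eachline(IOBuffer(src)))
        if !isempty(line) && !isascii(last(line))
            push!(hits, lineno)
        end
    end
    return hits
end

# Lines 2 and 4 end in a Greek letter; line 3 ends in an ASCII quote.
sample = "a = 1\nb = α\nc = \"β\"\nd = γ"
lines_ending_in_non_ascii(sample)  # [2, 4]
```

To scan a real file, pass `read(path, String)` as `src`.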
So Unicode generally works and there's a pile of tests for this in test/source_files.jl. However, there must be a bug in an edge case for non-ASCII characters at line end. |
Nice properties of byte ranges are that |
I think all I'm looking for is some function that returns a range that has valid string indices and can be used directly with the underlying string. Does that exist? If I understand @c42f's comment in my PR correctly, then

As a user of this, I would also say that at the level of SyntaxNodes, users are probably thinking in terms of string indices, not byte indices. I understand that byte stuff makes sense at the green tree level, but isn't that an abstraction leakage if I have to deal with this stuff when dealing with

For now I'm going to hack around all of this downstream, since we can't have this bug crash in the wild, but it would be much nicer if there was some "official" way to get these indices. |
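To make the request concrete, here is a rough sketch of the kind of conversion being asked for, written against Base only. The name `valid_index_range` is hypothetical and this is not how JuliaSyntax implements anything; it assumes the input is an inclusive byte range whose last byte may fall inside a multi-byte character.

```julia
# Hypothetical helper, not part of JuliaSyntax's API: turn an inclusive byte
# range into a range whose endpoints are both valid string indices.
function valid_index_range(src::AbstractString, first_byte::Integer, last_byte::Integer)
    # thisind maps any in-bounds code unit to the index where the character
    # containing that code unit starts.
    return thisind(src, first_byte):thisind(src, last_byte)
end

src = "a = β\nb = 2"   # 'β' occupies bytes 5 and 6
r = valid_index_range(src, 1, 6)
src[r]                 # "a = β"
```

The design choice here is to snap the end of the range back to the start of its character, since that is what `getindex` on a `String` expects.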
Not exactly. Here's what exists so far:
Correct. By design,
Ok. If you can point me at the place you're using this downstream, I'm sure we can work out the right design to unhack things in the future :-) |
My hack is https://github.com/julia-vscode/TestItemDetection.jl/blob/e19b96801a10718ca39b2c15cad18d0aae123993/src/packagedef.jl#L2. Essentially I just want that :) In terms of design, I generally just guess that "end-users" of this package that use I saw the |
Right - this is pretty much what JuliaSyntax.jl/src/source_files.jl Line 114 in a63e8bb
Except that I feel the core issue here is the question "what's a good representation of source code?" |
For now I guess a quick fix would be to add a For |
Yes, I think that would be great.
I think it would also be reasonable to say that the "char"-based interface is only available at the |
We have a crash from the VS Code extension that seems to originate from JuliaSyntax. Repro steps are:
pkg> dev Sunny
That crashes with
I assume that last_byte should always return a valid index, right? So this seems like a bug.

The rest of this issue is some speculation from my end, might all be wrong :)
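For readers less familiar with Julia string indexing, the snippet below sketches why a "last byte" offset is not necessarily a valid index. It uses only Base and a made-up one-line source string; it does not reproduce the actual crash.

```julia
s = "x = α"                    # 'α' is two bytes in UTF-8 (bytes 5 and 6)

last_byte_offset = ncodeunits(s)   # 6: offset of the final byte
last_char_index = lastindex(s)     # 5: index where the final character starts

# The final byte is a continuation byte, so it is not a valid string index.
byte_is_valid_index = isvalid(s, last_byte_offset)

# Indexing with a range that ends on that byte throws a StringIndexError.
threw_index_error = try
    s[1:last_byte_offset]
    false
catch err
    err isa StringIndexError
end
```

For pure ASCII text the two numbers coincide, which is presumably why the problem only shows up for non-ASCII characters.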
I started looking a bit around in the JuliaSyntax code, and there seems to be a fair bit of code that assumes 1-byte characters, which strikes me as incorrect? Or maybe this is carefully only done when there is a guarantee that only 1-byte codepoints can appear? An example is
JuliaSyntax.jl/src/parse_stream.jl
Line 519 in 1d95081
last_byte is only one code unit? Generally if I search the repo for - 1 or + 1 I see a fair bit of code where I would have assumed that a prevind or nextind would be needed?