Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

take empty tokens into account #51

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

vemoo
Copy link

@vemoo vemoo commented Jan 17, 2020

This is usefull if you are creating empty tokens to represent virtual tokens that were added to make the ast make sense.

Before, for the tokens ["A", "+", ""] calling token_at_offset(2) would return TokenAtOffset::Single("+")``. Now it returns TokenAtOffset::Between("+", "")`.

I've added a simple test, and I've also run the rust-analyzer tests with the modified rowan crate.

Something to take into account with this implementation is that if token_at_offset is called in a non root node with offset at the start or end of the node, TokenAtOffset::Between will be returned with a token from outside this node.

also implements token_at_offset TODO
and adds a test
Copy link
Collaborator

@jD91mZM2 jD91mZM2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it! I don't know exactly about the pros/cons of supporting empty tokens, I'll let @matklad decide on that :)

src/cursor.rs Outdated
}
} else {
left.token_at_offset(offset)
unreachable!()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should theoretically be able to be reached if there's a node with zero children... Right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for nodes without children that shouldn't be reached because of

rowan/src/cursor.rs

Lines 452 to 454 in be215ee

if range.is_empty() {
return TokenAtOffset::None;
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did find another issue related to empty nodes. I fixed it and added a couple of tests. 2bfdd9e

@matklad
Copy link
Member

matklad commented Jan 18, 2020

I am very hesitant about supporting empty tokens, as they open all kinds of nasty edge cases.

In particular, with empty tokens, TokenAtOffset type does not really make sense, as there may be arbitrary many empty tokens at a given position, and, if we fully support empty tokens, we should return a Vec or an Iterator here.

Initially (way back in fall) I've used empty tokens and nodes extensively (for example, syntax errors like "missing ;" were modeled as empty nodes), but that caused a lot of problems down the line, exectly because the API became non-intuitive due to all of edge cases.

I'd love to go further and even forbid empty internal nodes, but that unfortunately hits a really horrible edge-case: an empty file. I am even wondering if the API would be better if we had an explicit EMPTY_FILE token for this case....

@vemoo
Copy link
Author

vemoo commented Jan 18, 2020

But returning TokenAtOffset::Single in the presence of empty tokens also doesn't make sense.
For the example of tokens ["A", "+", "", "*", "C"] token_at_offset(2) before this change returns TokenAtOffset::Single("+") but why not TokenAtOffset::Between("+", "*") or TokenAtOffset::Single("*")?

I realise there could be arbitrary many empty tokens, but TokenAtOffset::Between("+", "") could be considered left biased in a way. With empty tokens the return type will always be somthing like TokenAtOffset::Between(<non_empty>, <empty>).

Other alternative would be to disallow them when building them.

@matklad
Copy link
Member

matklad commented Jan 18, 2020

Other alternative would be to disallow them when building them.

Yeah, we currently "soft-disalow" them, in that code assumes that tokens are non-empty. We don't have any active assetions though, because nothing actively breaks with empty tokens. If one has to have empty tokens, one can write custom algorithms which do something resonable

@CAD97
Copy link
Collaborator

CAD97 commented Jan 19, 2020

When I first learned indexes, I learned them like the cursor of a text editor:

|a|b|c|d|e|f|g|
^ ^ ^ ^ ^ ^ ^ ^
0 1 2 3 4 5 6 7

Per this understanding of the offset (cursor position), then TokenAtOffset::Between is definitely correct for when the cursor is next to a zero-width token.

For example, given the tokens ["a", "", "b"] and offset 1, the cursor could either be at a|␣b or a␣|b. So for this, TokenAtOffset::Between("a", "") or ::Between("", "a") would be correct, you could argue ::Between("a", "b") (but I'd argue against), and ::Single("a") is just incorrect (the position is not within the span of the a token, but after it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants