@CyrusNajmabadi CyrusNajmabadi commented Jul 1, 2025

Extracted from #79205 to make that PR simpler.

That PR makes a few changes that clean up lexing considerably:

  1. Responsibility for lexeme tracking moves from the text window to the lexer. The text window now concerns itself only with being a fast stream of characters from the original source text. This also simplifies a bunch of code.
  2. The text window gets less mutable state (with data showing why that is OK), simplifying lifetimes and array management.
  3. Because of (2), the text window can move to nicer abstractions (like ArraySegment/Span/etc.) to make segment-processing operations simpler. This helps prevent mistakes and makes the code simpler.
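
To make the intended split concrete, here is a minimal sketch of the shape described in the list above. It is not the Roslyn implementation: `CharWindow`, `MiniLexer`, `GetSpan`, and `ScanIdentifier` are hypothetical names invented for illustration, and the window here wraps a plain string rather than a sliding buffer.

```csharp
using System;

// Hypothetical stand-in for the text window: a forward stream of characters
// that knows nothing about lexemes.
sealed class CharWindow
{
    // Assumed sentinel returned when peeking past the end of the text.
    public const char InvalidCharacter = char.MaxValue;

    private readonly string _text;

    public CharWindow(string text) => _text = text;

    public int Position { get; private set; }

    public char PeekChar() => Position < _text.Length ? _text[Position] : InvalidCharacter;

    public void AdvanceChar() => Position++;

    // Segment access through a span instead of a raw array plus offsets.
    public ReadOnlySpan<char> GetSpan(int start, int length) => _text.AsSpan(start, length);
}

// Hypothetical lexer: it alone remembers where the current lexeme began.
sealed class MiniLexer
{
    private readonly CharWindow _window;
    private int _lexemeStartPosition;

    public MiniLexer(string text) => _window = new CharWindow(text);

    public int CurrentLexemeWidth => _window.Position - _lexemeStartPosition;

    public string ScanIdentifier()
    {
        // The lexer records the start; the window just streams characters.
        _lexemeStartPosition = _window.Position;
        while (char.IsLetterOrDigit(_window.PeekChar()))
            _window.AdvanceChar();
        return _window.GetSpan(_lexemeStartPosition, CurrentLexemeWidth).ToString();
    }
}
```

In this sketch the only mutable state left in the window is its position, which is the property the list is driving at.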

this.TextWindow = new SlidingTextWindow(text);
}

protected int LexemeStartPosition => this.TextWindow.LexemeStartPosition;
Member Author

The intent is to move LexemeStartPosition into the lexer, so that only the lexer cares about lexemes, and the text window only cares about being a fast streaming sequence of chars.

Member

What is a lexeme? Is that like a token?

Member Author

@CyrusNajmabadi CyrusNajmabadi Jul 1, 2025

Sort of, and I can probably document it. It's "the entity the lexer is currently producing". This is commonly the text of BOTH trivia AND tokens (for a token, without its trivia).

It's what you generally expect to get back if you ask the Token/Trivia for its .Text property (not .FullText, and not .ValueText).

Ignoring things like directives, the lexer generally is pointing at some position in the source. And it will 'start' lexing a 'lexeme' at that point. It consumes forward, based on certain rules about what it is currently consuming, until it 'finishes' that lexeme. At which point it generates a result (token or trivia in the majority case). That result is given a Kind, Text, and potentially other bits and bobs attached to it.
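
As a rough illustration of that start/consume/finish cycle, here is a toy lexer. It is a sketch under simplifying assumptions, not Roslyn code: `LexemeKind`, `LexemeDemo`, and the character rules are made up for the example, and real C# lexing has many more cases.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical kinds, just enough to show both trivia and tokens.
enum LexemeKind { WhitespaceTrivia, IdentifierToken, NumericLiteralToken, BadToken }

static class LexemeDemo
{
    // Splits the source into lexemes. Each lexeme starts where the previous
    // one finished, so concatenating the texts reproduces the input.
    public static List<(LexemeKind Kind, string Text)> Lex(string source)
    {
        var results = new List<(LexemeKind Kind, string Text)>();
        var position = 0;

        while (position < source.Length)
        {
            var lexemeStart = position; // 'start' the lexeme here
            var first = source[position];
            LexemeKind kind;

            if (char.IsWhiteSpace(first))
            {
                kind = LexemeKind.WhitespaceTrivia;
                while (position < source.Length && char.IsWhiteSpace(source[position]))
                    position++;
            }
            else if (char.IsLetter(first) || first == '_')
            {
                kind = LexemeKind.IdentifierToken;
                while (position < source.Length &&
                       (char.IsLetterOrDigit(source[position]) || source[position] == '_'))
                    position++;
            }
            else if (char.IsDigit(first))
            {
                kind = LexemeKind.NumericLiteralToken;
                while (position < source.Length && char.IsDigit(source[position]))
                    position++;
            }
            else
            {
                kind = LexemeKind.BadToken;
                position++; // always consume at least one character
            }

            // 'finish' the lexeme: its text runs from the start to the current position.
            results.Add((kind, source.Substring(lexemeStart, position - lexemeStart)));
        }

        return results;
    }
}
```

For example, `LexemeDemo.Lex("x 42")` yields an identifier token `x`, a whitespace trivia lexeme, and a numeric literal token `42`.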

The goal here is to make the sliding-text-window care absolutely not one whit about lexer concepts, and keep itself only in the domain of making character-retrieval efficient. So lexemes and the like move up entirely to the lexer. This actually simplifies a bunch, and makes it harder to get things wrong.

For example, in the last year there was a tweak to the sliding text window to allow it to look backwards. However, because the window itself was tracking lexemes, it could get into a corrupt state when it did that, leading to bad results being returned upwards in edge-case scenarios. This split would help avoid that.

Member

Thanks!

Member Author

TLDR:

It's the smallest piece of text the lexer grabs out as an individual string to jam into either a Token or Trivia. It is indivisible.

=> TextWindow.GetText(intern: false);

protected string GetInternedLexemeText()
=> TextWindow.GetText(intern: true);
Member Author

These helpers are here because GetText implicitly uses LexemeStartPosition. Once that is removed from the text window itself, it will need to be passed in (as the start position to read from, up to the text window's current position). With these helpers in place, only this one site needs to be updated instead of a huge number of call sites.
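
A sketch of what that follow-up shape might look like, assuming GetText gains an explicit start-position parameter. Everything here is hypothetical: `TextWindowSketch`, `LexerSketch`, `StartLexeme`, `Advance`, and the `GetText(int, bool)` signature are invented for illustration, and `string.Intern` is swapped in for whatever interning the real text window uses.

```csharp
using System;

// Hypothetical window: GetText no longer knows about lexemes, so the caller
// supplies the start position and the window reads up to its current position.
sealed class TextWindowSketch
{
    private readonly string _text;

    public TextWindowSketch(string text) => _text = text;

    public int Position { get; set; }

    public string GetText(int startPosition, bool intern)
    {
        var text = _text.Substring(startPosition, Position - startPosition);
        // Assumption: string.Intern stands in for the real interning mechanism.
        return intern ? string.Intern(text) : text;
    }
}

// Hypothetical lexer: the two helpers are the only places that pass the
// lexeme start, so call sites elsewhere do not change.
sealed class LexerSketch
{
    private readonly TextWindowSketch _window;
    private int _lexemeStartPosition;

    public LexerSketch(string text) => _window = new TextWindowSketch(text);

    public void StartLexeme() => _lexemeStartPosition = _window.Position;

    public void Advance(int count) => _window.Position += count;

    public string GetNonInternedLexemeText()
        => _window.GetText(_lexemeStartPosition, intern: false);

    public string GetInternedLexemeText()
        => _window.GetText(_lexemeStartPosition, intern: true);
}
```

The design point is that the signature change is absorbed by two one-line helpers rather than rippling through every scanning routine.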

Member Author

Note: I wanted all lexeme-oriented operations to have that in their name. It's not at all evident what "TextWindow.GetText" or "TextWindow.Width" even mean. Names like "CurrentLexemeWidth" make it much clearer that they refer to the length of the current token being lexed out.

  var atDotPosition = this.TextWindow.Position;
  if (atDotPosition >= 1 &&
-     atDotPosition == this.TextWindow.LexemeStartPosition)
+     atDotPosition == this.LexemeStartPosition)
Member Author

mechanical updates of TextWindow.LexemeStartPosition to this.LexemeStartPosition


  this.ScanToEndOfLine();
- info.Text = TextWindow.GetText(false);
+ info.Text = this.GetNonInternedLexemeText();
Member Author

mechanical update of TextWindow.GetText(true/false) to GetInternedLexemeText()/GetNonInternedLexemeText()

  //to valid UTF-16 characters. So if we get the SlidingTextWindow's sentinel value,
  //double check that it was not real user-code contents. This will be rare.
- Debug.Assert(TextWindow.Width > 0);
+ Debug.Assert(this.CurrentLexemeWidth > 0);
Member Author

mechanical update of TextWindow.Width to this.CurrentLexemeWidth

@CyrusNajmabadi CyrusNajmabadi marked this pull request as ready for review July 1, 2025 16:28
@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner July 1, 2025 16:28
@jcouv jcouv self-assigned this Jul 1, 2025
Member

@jcouv jcouv left a comment

LGTM (commit 26). Kind reminder to squash, thanks!

@CyrusNajmabadi CyrusNajmabadi merged commit f10909a into dotnet:main Jul 1, 2025
24 checks passed
@CyrusNajmabadi CyrusNajmabadi deleted the lexerHelpers branch July 1, 2025 23:58
@dotnet-policy-service dotnet-policy-service bot added this to the Next milestone Jul 1, 2025
333fred added a commit that referenced this pull request Jul 17, 2025
* Extract common lexer code into helpers (#79214)

@RikkiGibson RikkiGibson modified the milestones: Next, 18.0 P1 Aug 19, 2025
3 participants