Unblock more lexer inlining. (#3274)
The big change is to give the lexer helpers internal linkage, making
all of them easy to inline into their single call sites.
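
As a hedged sketch of the idea (not the Carbon code itself): a helper
with external linkage must keep an out-of-line definition in case other
translation units reference it, but a helper with internal linkage is
known to have only local callers, so a single-call-site helper can be
fully inlined and its standalone body dropped. The names below are
hypothetical; the PR applies `[[clang::internal_linkage]]` to the lexer
class, and an anonymous namespace gives the same linkage.

#include <cstdio>

namespace {
// Internal linkage via the anonymous namespace; `[[clang::internal_linkage]]`
// has the same effect when attached to a class or function.
auto CountDigits(const char* text, int size) -> int {
  int digits = 0;
  for (int i = 0; i < size; ++i) {
    digits += (text[i] >= '0' && text[i] <= '9');
  }
  return digits;
}
}  // namespace

auto main() -> int {
  const char input[] = "abc123";
  // The optimizer can inline CountDigits here and drop its out-of-line copy.
  std::printf("%d digits\n", CountDigits(input, sizeof(input) - 1));
  return 0;
}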

Looking at the profile showed several other cases of unfortunate
out-of-line functions. Two were due to the code size produced for checks
-- those are switched to `DCHECK`s to remove that code from optimized
builds. The loss of coverage seems minor.
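
As a hedged illustration of the difference (simplified macros, not
Carbon's actual CARBON_CHECK / CARBON_DCHECK implementation): a CHECK
keeps the condition and failure-reporting code in every build, while a
DCHECK compiles away outside debug builds, so the hot functions that
use it stay small enough to inline.

#include <cstdio>
#include <cstdlib>

// Active in every build: the condition, the message, and the abort path
// all add code size to optimized builds.
#define MY_CHECK(cond)                                   \
  do {                                                   \
    if (!(cond)) {                                       \
      std::fprintf(stderr, "check failed: %s\n", #cond); \
      std::abort();                                      \
    }                                                    \
  } while (false)

// Compiled out of optimized (NDEBUG) builds entirely.
#ifndef NDEBUG
#define MY_DCHECK(cond) MY_CHECK(cond)
#else
#define MY_DCHECK(cond) \
  do {                  \
  } while (false)
#endif

auto ClosingSymbolFor(int index) -> char {
  static constexpr char kClosing[] = {')', ']', '}'};
  // Contributes no code in an optimized build, keeping the function tiny.
  MY_DCHECK(index >= 0 && index < 3);
  return kClosing[index];
}

auto main() -> int {
  std::printf("%c\n", ClosingSymbolFor(1));
  return 0;
}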

A last one was closing open groups. It was surprising for this routine
to be hot, but the paths that discover "nothing to do here" were
intertwined with the rest of the code. This PR keeps those common
checks in the hot entry point and extracts the looping recovery path
into a separate function that it delegates to. This lets the hot path
inline easily.
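
A sketch of that split under hypothetical names (not the lexer's real
types): the cheap "nothing to do" checks live in a small function that
inlines at every call site, and the rare looping recovery is peeled
into a separate function marked noinline, mirroring the
CloseInvalidOpenGroups / CloseInvalidOpenGroupsSlow structure in the
diff below.

#include <vector>

class GroupTracker {
 public:
  void Push(char opener) { open_.push_back(opener); }

  // Hot path: only cheap early-outs, so this inlines at each call site.
  void CloseInvalidOpenGroups(char closer) {
    if (open_.empty() || open_.back() == Matching(closer)) {
      return;
    }
    CloseInvalidOpenGroupsSlow(closer);
  }

 private:
  static auto Matching(char closer) -> char {
    switch (closer) {
      case ')': return '(';
      case ']': return '[';
      default: return '{';
    }
  }

  // Cold path: the loop that actually recovers from mismatched groups.
  [[gnu::noinline]] void CloseInvalidOpenGroupsSlow(char closer) {
    do {
      if (open_.back() == Matching(closer)) {
        return;
      }
      open_.pop_back();  // Recover by discarding the mismatched opener.
    } while (!open_.empty());
  }

  std::vector<char> open_;
};

auto main() -> int {
  GroupTracker tracker;
  tracker.Push('(');
  tracker.Push('[');
  tracker.CloseInvalidOpenGroups(')');  // Pops '[' and stops at '('.
  return 0;
}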

At this point, for a large lexing benchmark I'm using, 50% of the time
is in the identifier hash table. The remaining improvements are to
actually make some of the hot routines, like symbol lexing and comment
lexing, faster.
chandlerc authored Oct 10, 2023
1 parent d1e3749 commit 6f5934a
Showing 2 changed files with 26 additions and 6 deletions.
4 changes: 2 additions & 2 deletions toolchain/lex/token_kind.h
@@ -52,7 +52,7 @@ class TokenKind : public CARBON_ENUM_BASE(TokenKind) {
   // The token kind must be an opening symbol.
   [[nodiscard]] auto closing_symbol() const -> TokenKind {
     auto result = ClosingSymbol[AsInt()];
-    CARBON_CHECK(result != Error) << "Only opening symbols are valid!";
+    CARBON_DCHECK(result != Error) << "Only opening symbols are valid!";
     return result;
   }

@@ -66,7 +66,7 @@ class TokenKind : public CARBON_ENUM_BASE(TokenKind) {
   // The token kind must be a closing symbol.
   [[nodiscard]] auto opening_symbol() const -> TokenKind {
     auto result = OpeningSymbol[AsInt()];
-    CARBON_CHECK(result != Error) << "Only closing symbols are valid!";
+    CARBON_DCHECK(result != Error) << "Only closing symbols are valid!";
     return result;
   }

28 changes: 24 additions & 4 deletions toolchain/lex/tokenized_buffer.cpp
@@ -214,7 +214,7 @@ static auto ScanForIdentifierPrefix(llvm::StringRef text) -> llvm::StringRef {
 // tokens by calling into this API. This class handles the state and breaks down
 // the different lexing steps that may be used. It directly updates the provided
 // tokenized buffer with the lexed tokens.
-class TokenizedBuffer::Lexer {
+class [[clang::internal_linkage]] TokenizedBuffer::Lexer {
  public:
   // Symbolic result of a lexing action. This indicates whether we successfully
   // lexed a token, or whether other lexing actions should be attempted.
@@ -565,11 +565,31 @@ class TokenizedBuffer::Lexer {
   // Closes all open groups that cannot remain open across the symbol `K`.
   // Users may pass `Error` to close all open groups.
   auto CloseInvalidOpenGroups(TokenKind kind) -> void {
-    if (!kind.is_closing_symbol() && kind != TokenKind::Error) {
+    // There are two common cases that result in nothing to close. Short circuit
+    // those here.
+    if ((!kind.is_closing_symbol() && kind != TokenKind::Error) ||
+        open_groups_.empty()) {
       return;
     }
 
-    while (!open_groups_.empty()) {
+    // Also check the first open group token to see if it matches this closing
+    // token, in which case there is nothing to do. This is redundant with the
+    // work inside the main loop, but we peel it out to allow inlining.
+    Token opening_token = open_groups_.back();
+    TokenKind opening_kind = buffer_->GetTokenInfo(opening_token).kind;
+    if (kind == opening_kind.closing_symbol()) {
+      return;
+    }
+
+    // Otherwise, delegate to a separate function to help with inlining.
+    CloseInvalidOpenGroupsSlow(kind);
+  }
+
+  [[gnu::noinline]] auto CloseInvalidOpenGroupsSlow(TokenKind kind) -> void {
+    CARBON_CHECK(kind.is_closing_symbol() || kind == TokenKind::Error);
+    CARBON_CHECK(!open_groups_.empty());
+
+    do {
       Token opening_token = open_groups_.back();
       TokenKind opening_kind = buffer_->GetTokenInfo(opening_token).kind;
       if (kind == opening_kind.closing_symbol()) {
@@ -598,7 +618,7 @@ class TokenizedBuffer::Lexer {
       TokenInfo& closing_token_info = buffer_->GetTokenInfo(closing_token);
       opening_token_info.closing_token = closing_token;
       closing_token_info.opening_token = opening_token;
-    }
+    } while (!open_groups_.empty());
   }
 
   auto GetOrCreateIdentifier(llvm::StringRef text) -> Identifier {
