Store documents in a memdb-backed table #771

Merged: 15 commits, Feb 22, 2022

@radeksimko radeksimko commented Jan 25, 2022

This is the first part of the refactoring per #719.


Why

Per the LSP contract, the language server is expected to maintain an up-to-date copy of each document sent by the client via text synchronization methods (e.g. didOpen or didChange). Previously we did this as part of the filesystem package, where we kept it within a map:

docMeta map[string]*documentMetadata
docMetaMu *sync.RWMutex

The filesystem package also served two use cases, which ended up not being as closely related to each other as we originally thought:

  • unified FS interface to pass e.g. to HCL parser so that it can parse either (virtual) document or the underlying file on the OS FS easily - previously solved largely via spf13/afero
  • direct access to the document metadata via read and write methods

While this was a reasonable implementation originally, I believe our needs have slowly outgrown it.

Much of the important state which the LS maintains is held in memdb, so it makes sense to move this data to memdb as well. First, we benefit from "data locality": we don't need to import another package like filesystem if we only need access to documents (which is the case for the majority of the RPC layer). Secondly, we can query and update data across memdb tables more easily. Lastly, the architecture becomes a bit easier to understand when data/state is kept in one place and stored the same way.

New document package

A new package is dedicated to:

  • The Document type describing the document itself (exactly how it is stored in memdb)
  • The decoupled concept of "handles", which was originally spread across the lsp and filesystem packages and elsewhere. The new package becomes the single place that deals with conversions between file, document and directory URIs as needed in various places.
  • Logic related to calculating changes between two versions of a document

Having a dedicated (smaller) package makes it easier to import throughout the codebase without running into import cycles.
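To illustrate the "handles" idea, here is a minimal sketch of splitting a document URI into a directory handle and a file name, and joining them back. Only the name DirHandle appears in this PR; Handle, HandleFromURI and FullURI are hypothetical names chosen for the example, and the real package may well handle more cases (for example URI validation and normalization).

```go
package main

import (
	"fmt"
	"strings"
)

// DirHandle identifies a directory by its URI (no trailing slash).
// The field layout is an assumption made for this sketch.
type DirHandle struct {
	URI string
}

// Handle is a hypothetical document handle: a directory plus a file name.
type Handle struct {
	Dir      DirHandle
	Filename string
}

// HandleFromURI splits a document URI at the last slash into
// a directory handle and a file name.
func HandleFromURI(docURI string) (Handle, error) {
	idx := strings.LastIndex(docURI, "/")
	if idx < 0 || idx == len(docURI)-1 {
		return Handle{}, fmt.Errorf("%q is not a document URI", docURI)
	}
	return Handle{
		Dir:      DirHandle{URI: docURI[:idx]},
		Filename: docURI[idx+1:],
	}, nil
}

// FullURI reassembles the document URI from its two parts.
func (h Handle) FullURI() string {
	return h.Dir.URI + "/" + h.Filename
}

func main() {
	h, err := HandleFromURI("file:///tmp/module/main.tf")
	if err != nil {
		panic(err)
	}
	fmt.Println(h.Dir.URI, h.Filename, h.FullURI())
}
```

Keeping the directory and the file name separate mirrors the Dir and Filename fields of the Document type, which would make it cheap to look up all documents belonging to one directory.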

New documents memdb table

type Document struct {
	Dir      DirHandle
	Filename string

	ModTime    time.Time
	LanguageID string
	Version    int

	// Text contains the document body stored as bytes.
	// It originally comes as string from the client via LSP
	// but bytes are accepted by HCL and io/fs APIs, hence preferred.
	Text []byte

	// Lines contains Text separated into lines to enable byte offset
	// computation for any position-based operations within HCL, such as
	// completion, hover, semantic token based highlighting, etc.
	// and to aid in calculating diff when formatting document.
	// LSP positions contain just line+column but hcl.Pos requires offset.
	Lines source.Lines
}
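The comment on Lines mentions that LSP positions carry only line+column while hcl.Pos needs a byte offset. Below is a rough sketch of that conversion using only the standard library. byteOffset is a hypothetical helper, not the source.Lines API, and it assumes zero-based lines and columns with columns counted in UTF-16 code units (the LSP default).

```go
package main

import (
	"bytes"
	"fmt"
)

// byteOffset converts a zero-based (line, col) position into a byte
// offset within text. col is counted in UTF-16 code units.
// This is a hypothetical helper written for illustration only.
func byteOffset(text []byte, line, col int) (int, error) {
	if col < 0 {
		return 0, fmt.Errorf("column %d out of range", col)
	}
	// SplitAfter keeps the trailing "\n" on each line, so summing the
	// lengths of the preceding lines gives the offset of the line start.
	lines := bytes.SplitAfter(text, []byte("\n"))
	if line < 0 || line >= len(lines) {
		return 0, fmt.Errorf("line %d out of range", line)
	}
	offset := 0
	for _, l := range lines[:line] {
		offset += len(l)
	}
	// Walk the runes of the target line, counting UTF-16 code units.
	units := 0
	for i, r := range string(lines[line]) {
		if units >= col {
			return offset + i, nil
		}
		if r >= 0x10000 {
			units += 2 // encoded as a surrogate pair in UTF-16
		} else {
			units++
		}
	}
	if units >= col {
		return offset + len(lines[line]), nil
	}
	return 0, fmt.Errorf("column %d out of range on line %d", col, line)
}

func main() {
	text := []byte("a = 1\nn\u00e4me = \"x\"\n")
	off, err := byteOffset(text, 1, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(off)
}
```

Pre-splitting the text into lines once, as the Lines field does, avoids repeating the split on every position lookup.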

Previously the filesystem package maintained an isOpen flag next to each document, indicating whether the document was open, but that flag was never used because we only ever tracked open documents and removed them when they were closed - see

func (fs *fsystem) CloseAndRemoveDocument(dh DocumentHandler) error {
	isOpen, err := fs.isDocumentOpen(dh)
	if err != nil {
		return err
	}
	if !isOpen {
		return &DocumentNotOpenErr{dh}
	}
	err = fs.memFs.Remove(dh.FullPath())
	if err != nil {
		return err
	}
	return fs.removeDocumentMetadata(dh)
}

It therefore made sense to just avoid that flag.
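The reasoning can be sketched with a plain map guarded by a mutex, rather than the actual memdb table: if closing a document deletes its entry, then "is open" is the same question as "is present" and needs no separate flag. The names store, Open, IsOpen, Close and ErrNotOpen are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNotOpen is returned when closing a document that is not tracked.
var ErrNotOpen = errors.New("document not open")

// store is a hypothetical stand-in for the documents table.
// Only open documents are ever present in docs.
type store struct {
	mu   sync.RWMutex
	docs map[string][]byte // keyed by document URI
}

func newStore() *store {
	return &store{docs: make(map[string][]byte)}
}

// Open starts tracking a document.
func (s *store) Open(uri string, text []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.docs[uri] = text
}

// IsOpen reports presence; no dedicated flag is needed.
func (s *store) IsOpen(uri string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.docs[uri]
	return ok
}

// Close stops tracking a document by removing its entry.
func (s *store) Close(uri string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.docs[uri]; !ok {
		return fmt.Errorf("%s: %w", uri, ErrNotOpen)
	}
	delete(s.docs, uri)
	return nil
}

func main() {
	s := newStore()
	s.Open("file:///tmp/module/main.tf", []byte("a = 1\n"))
	fmt.Println(s.IsOpen("file:///tmp/module/main.tf"))
	fmt.Println(s.Close("file:///tmp/module/main.tf"))
	fmt.Println(s.Close("file:///tmp/module/main.tf"))
}
```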

Removal of spf13/afero

After decoupling documents into memdb and re-plumbing the DocumentStore methods back to the filesystem FS interface, I realized that we would now get relatively little benefit from the external library. I decided to reimplement the small part that we actually need, which is the translation between document.Document and fs.FileInfo or fs.DirEntry. This ended up in internal/filesystem/document.go, internal/filesystem/inmem.go and internal/filesystem/os_fs.go - altogether relatively few LOC.
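As a rough idea of how small such a translation layer can be, here is a sketch of adapting a document to fs.FileInfo, with fs.DirEntry obtained through the standard library's fs.FileInfoToDirEntry. The document type is a trimmed-down stand-in for document.Document, and docInfo plus the fixed 0644 mode are assumptions made for the example, not the code in internal/filesystem.

```go
package main

import (
	"fmt"
	"io/fs"
	"time"
)

// document is a trimmed-down stand-in for document.Document.
type document struct {
	Filename string
	ModTime  time.Time
	Text     []byte
}

// docInfo adapts a document to fs.FileInfo. The name is hypothetical.
type docInfo struct {
	doc document
}

func (i docInfo) Name() string       { return i.doc.Filename }
func (i docInfo) Size() int64        { return int64(len(i.doc.Text)) }
func (i docInfo) Mode() fs.FileMode  { return 0o644 } // assumed: regular file
func (i docInfo) ModTime() time.Time { return i.doc.ModTime }
func (i docInfo) IsDir() bool        { return false }
func (i docInfo) Sys() interface{}   { return nil }

// Compile-time check that docInfo satisfies fs.FileInfo.
var _ fs.FileInfo = docInfo{}

func main() {
	fi := docInfo{doc: document{
		Filename: "main.tf",
		ModTime:  time.Now(),
		Text:     []byte("a = 1\n"),
	}}
	// The standard library can derive an fs.DirEntry from any fs.FileInfo.
	de := fs.FileInfoToDirEntry(fi)
	fmt.Println(de.Name(), fi.Size(), de.IsDir())
}
```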

Unfortunately we still depend on afero indirectly through mockery which we use in tests:

$ go mod why github.com/spf13/afero
# github.com/spf13/afero
github.com/hashicorp/terraform-ls/tools
github.com/vektra/mockery/v2
github.com/vektra/mockery/v2/cmd
github.com/spf13/viper
github.com/spf13/afero

but this is a "dev" dependency only.

@radeksimko radeksimko changed the title Move documents from in-mem filesystem into memdb table Move documents from in-mem filesystem into documents memdb table Jan 25, 2022
@radeksimko radeksimko force-pushed the f-fs-memdb-refactoring branch from efb972f to 2d40ec6 Compare January 25, 2022 19:34
@radeksimko radeksimko force-pushed the f-fs-memdb-refactoring branch from 2d40ec6 to 64ddf97 Compare February 3, 2022 21:58
@radeksimko radeksimko changed the title Move documents from in-mem filesystem into documents memdb table Store documents in a memdb-backed table Feb 3, 2022
@radeksimko radeksimko marked this pull request as ready for review February 4, 2022 18:01
@radeksimko radeksimko requested a review from a team February 4, 2022 19:09
…mentStore

Previously the filesystem package had two major use cases. First, it offered a unified io/fs.FS interface for e.g. parsing *.tf or *.tfvars, which was implemented mostly via an external library (spf13/afero). Secondly, it provided direct full access to the "in-memory layer" of the filesystem for RPC handlers (e.g. didOpen, didChange, didClose, ...).

These use cases rarely overlap throughout the codebase, so this led to unnecessary imports of the `filesystem` package in places where we only needed either the OS-level FS or in-mem FS, but almost never both.

This decoupling allows us to import `filesystem` or `state.DocumentStore` separately.

Also, as we no longer need the in-mem backend of afero, it makes more sense to just reimplement the small part of the 3rd party library that we actually use.
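The unified io/fs.FS use case mentioned above can be sketched as a small layered filesystem that prefers the in-memory copy of a document and falls back to the file on disk. This is only an illustration: layeredFS is a hypothetical name, and fstest.MapFS stands in for both layers so the example runs without touching the real filesystem (os.DirFS would be the obvious choice for the disk layer).

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"testing/fstest"
)

// layeredFS is a hypothetical fs.FS that consults in-memory documents
// first and falls back to the on-disk files.
type layeredFS struct {
	docs fs.FS // in-memory documents (unsaved editor state)
	disk fs.FS // files as stored on the OS filesystem
}

// Open implements fs.FS.
func (l layeredFS) Open(name string) (fs.File, error) {
	f, err := l.docs.Open(name)
	if errors.Is(err, fs.ErrNotExist) {
		return l.disk.Open(name)
	}
	return f, err
}

// newExampleFS builds a layered FS where main.tf has unsaved changes.
func newExampleFS() layeredFS {
	return layeredFS{
		docs: fstest.MapFS{
			"main.tf": {Data: []byte("unsaved")},
		},
		disk: fstest.MapFS{
			"main.tf":      {Data: []byte("on disk")},
			"variables.tf": {Data: []byte("on disk")},
		},
	}
}

// readFile reads a whole file through the generic fs.FS interface.
func readFile(fsys fs.FS, name string) string {
	b, err := fs.ReadFile(fsys, name)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fsys := newExampleFS()
	fmt.Println(readFile(fsys, "main.tf"))      // served from the in-memory layer
	fmt.Println(readFile(fsys, "variables.tf")) // served from the disk layer
}
```

A parser that accepts fs.FS does not need to know which layer a file came from, while RPC handlers can talk to the document store directly without importing any filesystem code.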
@radeksimko radeksimko force-pushed the f-fs-memdb-refactoring branch from 64ddf97 to 215ebe5 Compare February 9, 2022 09:15
@dbanck dbanck left a comment

Fantastic work and presentation! Thank you for taking the time to make the review as easy as possible.

I was able to follow through with all of your changes, and the language server still works ;)

I've added a few questions to better understand how memdb handles transactions.

@radeksimko radeksimko force-pushed the f-fs-memdb-refactoring branch 11 times, most recently from ccf8946 to 3a4d871 Compare February 21, 2022 14:40
@radeksimko radeksimko force-pushed the f-fs-memdb-refactoring branch from 3a4d871 to cce44ec Compare February 21, 2022 14:43
@radeksimko radeksimko force-pushed the f-fs-memdb-refactoring branch from cce44ec to 5af49be Compare February 21, 2022 16:10
@radeksimko

The refactored uri package now accounts for the edge case we discovered with Windows-style paths (C:\) and VS Code, as per microsoft/vscode#75027, and contains some tests for that as well.

I also moved some of the existing logic we had in place for case-insensitive comparison of URIs there and added some comments after researching the problem.
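To make the kind of problem concrete, here is a sketch of one form such a mismatch can take. It is an assumption for illustration, not a description of what the uri package does: two file URIs point at the same Windows file but differ in drive-letter case and in whether the colon is percent-encoded. normalizeFileURI is a hypothetical helper that only touches the drive letter and leaves the rest of the path case-sensitive.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeFileURI is a hypothetical helper. It decodes percent-encoding
// in the path and upper-cases a leading Windows drive letter, so that
// "file:///c%3A/x" and "file:///C:/x" normalize to the same string.
func normalizeFileURI(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme != "file" {
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	p := u.Path // already percent-decoded, so "%3A" has become ":"
	if len(p) >= 3 && p[0] == '/' && isASCIILetter(p[1]) && p[2] == ':' {
		p = "/" + strings.ToUpper(p[1:2]) + p[2:]
	}
	// Rebuild the URI so the path is re-encoded consistently.
	normalized := url.URL{Scheme: "file", Path: p}
	return normalized.String(), nil
}

func isASCIILetter(c byte) bool {
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

func main() {
	for _, raw := range []string{
		"file:///c%3A/Users/me/main.tf",
		"file:///C:/Users/me/main.tf",
	} {
		n, err := normalizeFileURI(raw)
		if err != nil {
			panic(err)
		}
		fmt.Println(n)
	}
}
```

Normalizing only the drive letter is a deliberately conservative choice in this sketch: the rest of the path may live on a case-sensitive filesystem, where folding case would conflate distinct files.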

There are a few problems which this PR does not address, but we have issues to track them separately.

@dbanck Do you mind reviewing the newly added commits? I intentionally left the old ones intact, so it should be easier to filter out.

I re-tested the PR in a Windows VM on VMware Fusion but I'd also appreciate any additional testing on any other Windows machine.

@radeksimko radeksimko requested a review from dbanck February 21, 2022 17:21
@dbanck dbanck left a comment

Thank you for answering all my questions and adding more tests!

Your refactorings around the handle/URI logic look solid, and the comments are helpful. I would never have guessed that VS Code is making paths so difficult.

Next, I'll test the changes on Windows.

@dbanck dbanck left a comment

🎉 🎉 🎉

Everything works fine for me on Windows, too.

@radeksimko radeksimko merged commit 9906c49 into main Feb 22, 2022
@radeksimko radeksimko deleted the f-fs-memdb-refactoring branch February 22, 2022 14:49
@radeksimko radeksimko added this to the v0.26.0 milestone Feb 23, 2022
@github-actions
This functionality has been released in v0.26.0 of the language server.
If you use the official Terraform VS Code extension, it will prompt you to upgrade to this version automatically upon next launch or within the next 24 hours.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

@github-actions
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 21, 2022