Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial pipeline completion support #530

Merged
merged 4 commits into from
Jan 25, 2021

Conversation

renkun-ken
Copy link
Member

@renkun-ken renkun-ken commented Jan 21, 2021

What problem did you solve?

Closes #323

This PR implements a simple function getPipelineCompletionItems() based on extendSelection so that the beginning symbol of the pipeline expression is extracted and its names() stored in globalenv.json is used to provide live completion items in user session.

Note that for subsequent pipeline expressions such as mutate() and summarize(), new variables may be introduced. REditorSupport/languageserver#324 should be helpful in this case to provide token completion for these newly introduced variables even if the code is not executed but those symbols already appear in the code.

(If you have)Screenshot

image

(If you do not have screenshot) How can I check this pull request?

In the following example, tbl1 is created and saved to disk. The language server should be able to capture all these symbols in tbl1 if they appear in the code rather than comments. After storing tbl1 to disk, we could comment out the tbl1 code and let session watcher completion provider work to capture tbl1 in the pipeline expression and provide its names in the whole expression.

We could test the completions in each mutate() and select() call.

library(dplyr)
library(tibble)

# tbl1 <- tibble(
#   Name = c("Product1", "Product2", "Product3"),
#   Quality = c("Good", "Good", "Best"),
#   Score = c(8, 9, 10),
#   Type = c("B", "B", "A")
# )
# saveRDS(tbl1, "tbl1.rds")

tbl1 <- readRDS("tbl1.rds")

# typical pipeline without assignment
tbl1 %>%
  mutate(Score_mean = mean(Score)) %>%
  select(Score)

# typical pipeline with assignment
tbl2 <- tbl1 %>%
  mutate(Score_mean = mean(Score)) %>%
  select(Score)

# one-lined pipeline
tpl3 <- tbl1 %>% mutate(Name2 = Name) %>% select(Score)

# mixed case
tpl3 <- tbl1 %>% mutate(Name2 = Name) %>% 
  select(Score)

@renkun-ken
Copy link
Member Author

getBracketCompletionItems and getPipelineCompletionItems may generate duplicate completion items. Looks like we need to remove those duplicates.

@renkun-ken
Copy link
Member Author

Also this does not handle the following

tbl2 <- tbl1 %>%
  select(Name, Score)

Looks like we need to extract the symbol before the pipe operator or after the assignment operator.

@renkun-ken renkun-ken marked this pull request as draft January 21, 2021 16:29
@ManuelHentschel
Copy link
Member

The example seems to work nicely for me (small typo: l is missing in # saveRDS(tb1...).

Have you considered checking the function name to determine if the column names are suggested? As far as I'm aware, the pipeline operator can also be used for other functions like lapply() which do not automatically attach the dataframe (and hence don't make the column names usable by themselves). On the other hand, functions like mutate, select etc. can also be called with normal syntax, e.g. tbl2 <- select(tbl1, Score), which does not trigger the current pipeline-completion.

The current approach is probably fine for most users, though, and still a significant improvement!

@renkun-ken renkun-ken marked this pull request as ready for review January 24, 2021 10:17
@renkun-ken
Copy link
Member Author

I change the algorithm to searching the pipe token before any pipe operator like %>%, |> in the first non-empty code line detected by extendSelection() so that the following cases can be handled properly:

# typical pipeline without assignment
tbl1 %>%
  mutate(Score_mean = mean(Score)) %>%
  select(Score)

# typical pipeline with assignment
tbl2 <- tbl1 %>%
  mutate(Score_mean = mean(Score)) %>%
  select(Score)

# one-lined pipeline
tpl3 <- tbl1 %>% mutate(Name2 = Name) %>% select(Score)

# mixed case
tpl3 <- tbl1 %>% mutate(Name2 = Name) %>% 
  select(Score)

@renkun-ken
Copy link
Member Author

Have you considered checking the function name to determine if the column names are suggested?

I guess we don't really need such accurate completions by detecting known NSE functions as RStudio does not seem to do this either. It's hard to know the accurate results unless we do more to make sure if these calls are from the expected namespaces (dplyr::filter() or stats::filter()?)

@ManuelHentschel
Copy link
Member

I guess we don't really need such accurate completions by detecting known NSE functions as RStudio does not seem to do this either. It's hard to know the accurate results unless we do more to make sure if these calls are from the expected namespaces (dplyr::filter() or stats::filter()?)

Sounds good :)

@renkun-ken renkun-ken merged commit 2d378dd into REditorSupport:master Jan 25, 2021
@danielbasso
Copy link
Contributor

danielbasso commented Jan 25, 2021

Used this feature a lot this weekend, and it is already way better than not having anything.

getBracketCompletionItems and getPipelineCompletionItems may generate duplicate completion items. Looks like we need to remove those duplicates.

This is the only "problem" I encountered so far. Sometimes the autocomplete can become a lot messy by the sheer number of duplicates:

image

I personally wouldn't mind don't having the 'abc' suggestions, just the object ones (blue outlined solid), even if this means that sometimes variables created by mutate() or not assigned dataframes won't show up. I think it would become cleaner.

Aside from that, it works great. The autocomplete doesn't need to be perfect, just need to help, as now it is already doing.

@renkun-ken
Copy link
Member Author

@danielbasso Thanks for testing!

In fact, these duplicates come from different sources. The "abc" items come from token completion in languageserver which means these tokens appear in your code (not necessarily executed) while blue-box items combe from vscode-R pipeline completion which detects the symbol being piped and find its names() from the global environment of user session provided by session watcher.

Currently, it seems we don't have a way to remove duplicates of completion items from difference sources, but I guess I could add an option to languageserver so that you could turn off token completion?

@danielbasso
Copy link
Contributor

Currently, it seems we don't have a way to remove duplicates of completion items from difference sources, but I guess I could add an option to languageserver so that you could turn off token completion?

@renkun-ken

Well, if it doesn't take much of your time, this would be nice! I just don't want to cause extra work for you.

The token completion from languageserver is cool, but sometimes it kinda gets out of control with 100+ suggestions. In my previous example, 8 lines of code produced 17 suggestions (Even "c" is suggested hahaha).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expand Session watcher to help auto complete with NSE (tidyverse syntax)
3 participants