Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for pandoc install without GitHub token? #30

Open
zkamvar opened this issue May 30, 2023 · 10 comments · May be fixed by #35
Open

Allow for pandoc install without GitHub token? #30

zkamvar opened this issue May 30, 2023 · 10 comments · May be fixed by #35
Assignees
Labels
feature a feature request or enhancement

Comments

@zkamvar
Copy link

zkamvar commented May 30, 2023

I tried to get a colleague to use this package, but she used SSH verification for Git and does not have a PAT. She got an error that said that she needed a PAT to continue. I suspect it comes from the {gh} package, but I was wondering if there was a way to have a fallback when no GitHub token is avaialable?

@cderv
Copy link
Owner

cderv commented May 31, 2023

We indeed use gh to get the available releases (especially the latest) and their download URL from API

pandoc/R/pandoc-install.R

Lines 278 to 288 in 638f1c6

fetch_gh_releases <- function(limit = Inf) {
gh_required()
rlang::inform(c(i = "Fetching Pandoc releases info from github..."))
gh::gh(
"GET /repos/:owner/:repo/releases",
owner = "jgm",
repo = "pandoc",
.limit = limit,
.progress = FALSE
)
}

I don't think the github API can be access with SSH only. So I am not sure how to offer a fallback when no Github Token.

Is setting a Github PAT too complicated and not possible here ?

I can add some guidance in doc or error message about how to setup a PAT. Like pointing toward usethis helpers: https://usethis.r-lib.org/articles/git-credentials.html#get-a-personal-access-token-pat

This PAT think with Github is one of the reason the recommandation from usethis is to use https and not ssh : https://usethis.r-lib.org/articles/git-credentials.html#https-vs-ssh

If you authenticate via SSH for “regular” Git work, you still have to set up a PAT for work that uses the REST API.

Theoretically there could be some solution (like scraping a release page using a url), but honestly using the Github API is the most reliable and easiest to maintain.

Internally there is a caching mechanism per session to store the queried the database, so shipping a cached version inside the package could be possible but

  • it would need to be very low size to be under CRAN limitation
  • it would not be up-to-date and would only allow installing version released before the latest CRAN release.

So not ideal.

Another idea would be to have free fallback mode where you pass a version number and we try() to download without checking for availability and URL validation querying the API first.

Not ideal either.

@zkamvar
Copy link
Author

zkamvar commented Jun 1, 2023

Thank you for the detailed response, Chris!

Is setting a Github PAT too complicated and not possible here ?

In some ways, yes. Two years ago, there was a big discussion in The Carpentries community around the use of SSH vs PAT in teaching novices, and the community decided that SSH offered fewer complications (see swcarpentry/git-novice#778 (comment) for an example of the cognative load offered by SSH vs PAT). I have even written guidance myself for this, but find that it's one of those things that people just find too complicated: https://carpentries.github.io/sandpaper-docs/github-pat.html

My goal was to be able to use the {pandoc} package as a method for helping people bootstrap The Carpentries Workbench directly from R.

I can add some guidance in doc or error message about how to setup a PAT. Like pointing toward usethis helpers: https://usethis.r-lib.org/articles/git-credentials.html#get-a-personal-access-token-pat

This PAT think with Github is one of the reason the recommandation from usethis is to use https and not ssh : https://usethis.r-lib.org/articles/git-credentials.html#https-vs-ssh

If you authenticate via SSH for “regular” Git work, you still have to set up a PAT for work that uses the REST API.

I think this is a very sensible solution.

Another idea would be to have free fallback mode where you pass a version number and we try() to download without checking for availability and URL validation querying the API first.

This is the method I was thinking about. It's not ideal or perfect, but at least it gives people the ability to install pandoc from a known version number.

Another method is to set up a GitHub account like https://github.com/testingjerry (as used in the R-universe docker container) and generate a few permission-less PATs, obfuscate them, and randomly select one when someone does not have a PAT set up

Another method could be to continously cache the database of pandoc release tags in this repository as a text file and then have a function that would locally store and update that text file on the user's machine in the appdir folder (which is a kind of generic functionality I'm hoping someone has already solved). This way, if the user does not have the PAT, you could query the local text file and then fetch the version. Again, not ideal because of lag and need for the user to run the function that updates the file, but it's better than the try() and hope.

@yihui
Copy link
Collaborator

yihui commented Jun 1, 2023

I don't think the github API can be access with SSH only. So I am not sure how to offer a fallback when no Github Token.

Without a Github PAT, I think we can still access the Github API. The only problem is that the rate limit (per IP) will be stricter, e.g., this is how to list the tags in the jgm/pandoc repo without a Github token:

xfun::github_releases('jgm/pandoc', pattern = '^[0-9.]+$')
# lower-level call
xfun:::rest_api_raw('https://api.github.com', '/repos/jgm/pandoc/tags', '', list(per_page = 100, page = 1))

Installing Pandoc from Github releases is very similar to installing Hugo in blogdown. For blogdown, I don't require Github PAT or SSH configurations (if a PAT is provided, it will be used). You can take a look at the source code of blogdown::hugo_installers, which fetches the download links of all release assets. I can factor out some of its code into a function that can be reused by other developers.

@cderv
Copy link
Owner

cderv commented Jun 1, 2023

Thanks both for your inputs! It seems we can do without a PAT after all. I'll have a look at this.

the community decided that SSH offered fewer complications

My two cents: As a Windows user, SSH was always a bit complicated to setup or used, as not that common in the Windows world. 😅

@cderv cderv self-assigned this Jun 1, 2023
@zkamvar
Copy link
Author

zkamvar commented Jun 1, 2023

My two cents: As a Windows user, SSH was always a bit complicated to setup or used, as not that common in the Windows world.

Honestly, I'm on your side of this, but the people have spoken and the cries for the old ways combined with the fear of megacorporations controlling everything won out since this is one of those processes where you set something once and you are done 🤷🏼

Thank you for considering the option!

@cderv cderv added the feature a feature request or enhancement label Aug 23, 2023
@cderv
Copy link
Owner

cderv commented Aug 23, 2023

@zkamvar would it be possible to share the exact error message that was encountered ?

Looking into this, I believe our current solution using gh::gh() works also with no token set

This means that

gh::gh(
    "GET /repos/:owner/:repo/releases",
    owner = "jgm",
    repo = "pandoc",
    .progress = FALSE
)

should work even if no GITHUB_PAT or GITHUB_TOKEN env var set

Otherwise, using

gh::gh(
    "GET /repos/:owner/:repo/releases",
    owner = "jgm",
    repo = "pandoc",
    .progress = FALSE,
    .token = ""
)

Insure no token use.

I am wondering if there is something happening with the fact that SSH if configured for GIT, or if the error is linked to API rate limitation anyway.

Thank you !

@zkamvar
Copy link
Author

zkamvar commented Aug 23, 2023

Ah good point. 😓 It turns out it was a 401 response; I think it genuinely was bad credentials. I seem to remember GitHub throwing this response during some operations when there were no credentials required.

Error in gh_process_response(raw) :
GitHub API error (401):
Message: Bad credentials
Read more at https://docs.github.com/rest

If I create a PR with a tryCatch for the 401 error to re-attempt without a token, would that be acceptable?

@cderv
Copy link
Owner

cderv commented Aug 24, 2023

Bad credential means potentially a wrong PAT string is being passed.

You get this from the above gh::gh() example I shared ? With both of them including the one with .token = "" ?

For the /releases endpoint, it works without credentials, and I get this when rate limit is reached (60 max when no PAT)

> gh::gh(
+ "GET /repos/:owner/:repo/releases",
+ owner = "jgm",
+ repo = "pandoc",
+ .progress = FALSE,
+ .token = ""
+ )
Error in `gh::gh()`:
! GitHub API error (403): API rate limit exceeded for
  146.70.194.99. (But here's the good news: Authenticated requests get
  a higher rate limit. Check out the documentation for more details.)
ℹ Read more at
  <https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting>
Run `rlang::last_trace()` to see where the error occurred.

So 403 error, not 401.

So I am curious with which call exactly you get this error, so that we could see why you get that.

If you want to check token, gh::gh_token() will show you what is seen by gh. Should be "" if you have really no PAT used.
Internally gh is using https://gitcreds.r-lib.org/ and gitcreds::gitcreds_get() is the function used.

Maybe the SSH credentials are messing up the logic for gh which expect a PAT. 🤷

@zkamvar
Copy link
Author

zkamvar commented Aug 24, 2023

If I had more information about the state of the token and ssh, I would gladly give it, but the only information I had was a screenshot from the user who has since updated their computer. The command that generated the error was pandoc::pandoc_install("2.19.2")

I did test the response with the empty token and I was able to successfully get the 200 status. The reason I suggest a 401 fallback is because for a lot of users, juggling tokens is difficult and they end up with a situation where they have an invalid token in their environment (or in their git setup) and no clue of how to remove it.

I think that people should be able to download pandoc, even if they have bad credentials. Yes, it's not a great state for them to be in, but that's not the problem this package is trying to solve. The implementation would look something like this:

res <- tryCatch(
  expr = {
    gh::gh(
      "GET /repos/:owner/:repo/releases",
      owner = "jgm",
      repo = "pandoc",
      .progress = FALSE,
      .token = "nope" # simulate bad credentials
    )
  },
  http_error_401 = function(e) {
    message(e$message) # show that the message is from a 401 call
    message("Bad credentials found; downloading without credentials")
    gh::gh(
      "GET /repos/:owner/:repo/releases",
      owner = "jgm",
      repo = "pandoc",
      .progress = FALSE,
      .token = "" # fallback from bad credentials:
    )
  }
)
#> GitHub API error (401):  Bad credentials
#> Bad credentials found; downloading without credentials
class(res)
#> [1] "gh_response" "list"
length(res)
#> [1] 30

Created on 2023-08-24 with reprex v2.0.2

Thank you for your patience with this issue!

@cderv
Copy link
Owner

cderv commented Aug 24, 2023

The reason I suggest a 401 fallback is because for a lot of users, juggling tokens is difficult and they end up with a situation where they have an invalid token in their environment (or in their git setup) and no clue of how to remove it.

I understand better what you mean ! I can probably do that yes. I don't how many users would it this, but it seems easy enough to do. Thanks for the idea!

zkamvar added a commit to zkamvar/pandoc that referenced this issue Aug 24, 2023
This allows users with bad credentials (they expired or were
misconfigured in the first place) to install pandoc using the limited
API without credentials.

This will fix cderv#30
@zkamvar zkamvar linked a pull request Aug 24, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants