Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT]: Allow streaming of the data #601

Closed
1 task done
Uzlopak opened this issue Jul 6, 2023 · 18 comments · Fixed by #602
Closed
1 task done

[FEAT]: Allow streaming of the data #601

Uzlopak opened this issue Jul 6, 2023 · 18 comments · Fixed by #602
Labels
released Type: Feature New feature or request

Comments

@Uzlopak
Copy link
Contributor

Uzlopak commented Jul 6, 2023

Describe the need

Currently octokit/request is not configurable to provide the stream of the response.body. This is an issue, if you want to load the tarball of a project and want to stream it to a tar stream and process the file content on the fly.

There should be an option, which results in passing the body as stream, so that we can utilitze it if needed.

SDK Version

No response

API Version

No response

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@Uzlopak Uzlopak added Status: Triage This is being looked at and prioritized Type: Feature New feature or request labels Jul 6, 2023
@github-project-automation github-project-automation bot moved this to 🆕 Triage in 🧰 Octokit Active Jul 6, 2023
@gr2m
Copy link
Contributor

gr2m commented Jul 6, 2023

For this use case, we recommend you use @octokit/endpoint with your own request method. You can access it at request.endpoint()

Like this

// https://docs.github.com/en/rest/repos/contents?apiVersion=2022-11-28#download-a-repository-archive-tar
const requestOptions = request.endpoint('GET /repos/{owner}/{repo}/tarball/{ref}', {
  owner: 'OWNER',
  repo: 'REPO',
  ref: 'REF'
})

// {
//   method: 'GET',
//   url: 'https://api.github.com/repos/OWNER/REPO/tarball/REF',
//   headers: {
//     accept: 'application/vnd.github.v3+json',
//     'user-agent': 'octokit-core.js/0.0.0-development Node.js/16.20.1 (darwin; arm64)'
//   },
// }

Note that you will have to set the authorization header yourself if you use an octokit instance. The resulting object of calling octokit.request.endpoint(...) will not set it. The authorization header is only set before sending an actual request

@gr2m gr2m added Type: Support Any questions, information, or general needs around the SDK or GitHub APIs and removed Type: Feature New feature or request Status: Triage This is being looked at and prioritized labels Jul 6, 2023
@Uzlopak
Copy link
Contributor Author

Uzlopak commented Jul 6, 2023

Tbh, this doesnt help me. I just used in our bot built in fetch, because there is no benefit in using a package, if I have to set everything manually. The only benefit of using octokit is the potential typesafety. If i have no typesafety then i dont need the package.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Jul 6, 2023

As i see that you added the "support" tags. If you need support to implement this feature, then could have a look, in providing a PR.

@gr2m
Copy link
Contributor

gr2m commented Jul 6, 2023

if I have to set everything manually. The only benefit of using octokit is the potential typesafety. If i have no typesafety then i dont need the package

that's okay, we can't make it work for all possible use cases, everything is a trade off. If this is the only request you need to send, I'd use native fetch, too.

@gr2m gr2m closed this as completed Jul 6, 2023
@github-project-automation github-project-automation bot moved this from 🆕 Triage to ✅ Done in 🧰 Octokit Active Jul 6, 2023
@Uzlopak
Copy link
Contributor Author

Uzlopak commented Jul 6, 2023

Sry but this is not what I expected. I did not expected that my issue gets demoted from feature request to a support request. I expected some feedback regarding the possibility to land this feature and maybe the invitation to provide a PR.

@gr2m
Copy link
Contributor

gr2m commented Jul 6, 2023

okay if you like to give it a go, you can certainly send a pull request and we can discuss it there.

Beware that we are currently discussing an introduction of options.request.fetchOptions which could be useful in this context.

How do you stream the response with a native fetch currently? Does it work across Node / Browsers / Deno?

@gr2m gr2m reopened this Jul 6, 2023
@gr2m gr2m added Type: Feature New feature or request and removed Type: Support Any questions, information, or general needs around the SDK or GitHub APIs labels Jul 6, 2023
@Uzlopak
Copy link
Contributor Author

Uzlopak commented Jul 6, 2023

I checked and i realized that i use node-fetch as our bots use still node 16.

This is what i do:

public processArchive (url: string): Promise<DependabotConfigEntry[]> {
    const headers = {
      Authorization: 'Bearer ' + this.git.octokit.authToken
    }
    return new Promise<DependabotConfigEntry[]>((resolve, reject) => {
      fetch(url, { method: 'GET', headers })
        .then(response => {
          if (!response.ok) {
            throw new Error(`Failed to fetch ${url}. Status: ${response.status}`)
          }

          const tarStream = tarList()
          const uniqueSet: Set<string> = new Set()
          const entries: DependabotConfigEntry[] = []

          tarStream.on('entry', function onentry (entry: ReadEntry) {
            let directory = dirname(entry.path).slice(entry.path.split(sep)[0].length) || sep
            const ecosystem = detectEcosystemByFilepath(basename(entry.path), directory)

            if (ecosystem) {
              // github-actions need to set the path to root
              if (ecosystem === 'github-actions') {
                directory = '/'
              }
              if (uniqueSet.has(ecosystem + '/' + directory) === false) {
                entries.push({
                  'package-ecosystem': ecosystem,
                  directory,
                  schedule: {
                    interval: 'daily'
                  }
                })
                uniqueSet.add(ecosystem + '/' + directory)
              }
            }
          })

          pipeline(response.body, tarStream, error => {
            error
              ? reject(error)
              : resolve(entries)
          })
        })
        .catch(e => {
          reject(e)
        })
    })
  }

With this implementation memory spikes and alerts are gone from the gcloud logs.

@gr2m
Copy link
Contributor

gr2m commented Jul 6, 2023

we cannot rely on node-fetch unfortunately, it has some APIs specific to Node. If we consider adding a streaming feature to Octokit, we can only use fetch APIs that are standardized and supported across Node 18+, modern browsers, and Deno.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Jul 6, 2023

Sure. Issue was that fetch is afaiknot in @types/node and we already used node-fetch in other parts of the code. I would try to provide a solution with native fetch.

@wolfy1339
Copy link
Member

wolfy1339 commented Jul 6, 2023

There is such a thing as read streams with the fetch API

https://developer.mozilla.org/en-US/docs/Web/API/Streams_API/Using_readable_streams

Writable streams also exist

@wolfy1339
Copy link
Member

This would only require a small modification to the getResponseData() function:

async function getResponseData(response: Response) {
const contentType = response.headers.get("content-type");
if (/application\/json/.test(contentType!)) {
return response.json();
}
if (!contentType || /^text\/|charset=utf-8$/.test(contentType)) {
return response.text();
}
return getBuffer(response);
}

Instead of doing anything with Response.body, simply return it and that is your stream

@wolfy1339 wolfy1339 moved this from ✅ Done to 🔥 Backlog in 🧰 Octokit Active Jul 6, 2023
@Uzlopak
Copy link
Contributor Author

Uzlopak commented Jul 7, 2023

I created a PR to discuss it further ;)

@github-project-automation github-project-automation bot moved this from 🔥 Backlog to ✅ Done in 🧰 Octokit Active Jul 11, 2023
@github-actions
Copy link

🎉 This issue has been resolved in version 8.1.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@amacneil
Copy link

Glad this was merged! Here's a complete example of streaming release assets from rest.js:

octokit/rest.js#12 (comment)

@AleksandrHovhannisyan
Copy link

Is there any documentation on this aside from the linked code above? I'm wondering if it's possible to stream the result of Octokit.issues.listComments, for example, but I'm not seeing anything in the docs about this.

@wolfy1339
Copy link
Member

You can stream any request you want. although for pure JSON endpoints like the list comments for issue endpoint. you should be using pagination

You can pass request options to all API methods

@AleksandrHovhannisyan
Copy link

AleksandrHovhannisyan commented Aug 15, 2024

This is very likely a skill issue on my part as I'm new to HTTP streaming as a concept. I don't want to pollute a closed issue with unnecessary notifications. At the risk of doing that, though, this is what my old code was doing:

const response = await octokit.paginate(octokit.issues.listComments, {
  owner: site.issues.owner,
  repo: site.issues.repo,
  issue_number: issueNumber,
  per_page: 100,
});

I tried using the above suggestion of request: { parseSuccessResponseBody: false } as well as octokit.paginate.iterator per these docs, but neither one seem to give me a readable stream; I always just get application/json with an array of all comments at once. I'm on @octokit/rest version 20.1.0. Edit: I'm also just now realizing this is the wrong repo, ugh sorry.

@wolfy1339
Copy link
Member

Feel free to open a discussion instead. We can carry on there
https://github.com/octokit/octokit.js/discussions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
released Type: Feature New feature or request
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants