staging put: stream file uploads instead of loading all to memory #197
@yunbodeng-db @andrefurlan-db @rcypher-databricks @jadewang-db any chance of a review on this? This is currently blocking us from using Databricks efficiently.
`os.ReadFile` reads the entire content of the file into a byte slice in memory, which can put heavy memory pressure on users uploading large files. An `os.File` is itself an `io.Reader`, so we can instead pass the file directly to `http.NewRequest`. The HTTP client then reads the file in chunks and uploads it as a stream, without ever holding the whole file in memory.
Thanks!!
@andrefurlan-db thanks! I can't merge the pull request myself. I think the lint and check jobs are failing for unrelated reasons: the lint errors are in other files and seem to complain because the imports are not recognised. Can the pull request be merged?
A team member will assist you shortly. Thanks for your patience.
@kravets-levko @yunbodeng-db @mdibaiee I get a 501 Not Implemented response with this change. I really think this should be reverted and properly tested against the backend. My assumption is that the file streaming itself is fine, but the backend doesn't accept a request body of unknown length, so the upload fails.
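One plausible mechanism behind the 501, offered as an assumption rather than something confirmed in this thread: `http.NewRequest` leaves `ContentLength` at 0 for an `*os.File` body, which Go treats as "unknown", so the HTTP/1.1 client sends the `PUT` with `Transfer-Encoding: chunked`. Some storage endpoints may reject chunked uploads. The self-contained sketch below sends the same temporary file to a local `httptest` server twice, once as-is and once with `ContentLength` set from `Stat`, and records what the server saw. The names `probeUpload` and `observation` are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
)

// observation records how the local test server saw one upload request.
type observation struct {
	ContentLength    int64    // -1 means the server did not know the length
	TransferEncoding []string // ["chunked"] when no length was sent
	BodyBytes        int64
}

// probeUpload is a hypothetical helper: it PUTs a small temporary file to a
// local httptest server and reports what the server received. When
// setLength is true, req.ContentLength is filled in from the file size.
func probeUpload(setLength bool) (observation, error) {
	seen := make(chan observation, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		n, _ := io.Copy(io.Discard, r.Body)
		seen <- observation{
			ContentLength:    r.ContentLength,
			TransferEncoding: r.TransferEncoding,
			BodyBytes:        n,
		}
	}))
	defer srv.Close()

	f, err := os.CreateTemp("", "staging-put-*")
	if err != nil {
		return observation{}, err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if _, err := f.WriteString("hello, staging"); err != nil {
		return observation{}, err
	}
	// Rewind so the request body starts at the beginning of the file.
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return observation{}, err
	}

	req, err := http.NewRequest(http.MethodPut, srv.URL, f)
	if err != nil {
		return observation{}, err
	}
	if setLength {
		info, err := f.Stat()
		if err != nil {
			return observation{}, err
		}
		req.ContentLength = info.Size()
	}

	resp, err := srv.Client().Do(req)
	if err != nil {
		return observation{}, err
	}
	resp.Body.Close()
	return <-seen, nil
}

func main() {
	for _, setLength := range []bool{false, true} {
		obs, err := probeUpload(setLength)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("setLength=%v: ContentLength=%d TransferEncoding=%v bytes=%d\n",
			setLength, obs.ContentLength, obs.TransferEncoding, obs.BodyBytes)
	}
}
```

If the backend is indeed rejecting chunked bodies, setting `req.ContentLength` from the file size would keep the streaming behaviour while restoring a known length, rather than reverting to `os.ReadFile`.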
…tabricks#197) `os.ReadFile` reads all of the content of the file into a byte array in memory, which can cause memory consumption pressure for users. Instead, an `os.File` instance is itself a byte reader, and we can provide the file directly to `http.NewRequest` so it can read the file in chunks and upload it as a stream, thus not holding the whole file in memory. Signed-off-by: Esdras Beleza <esdras@esdrasbeleza.com>
….5.3 to 1.5.4 (#61)

Bumps [github.com/databricks/databricks-sql-go](https://github.com/databricks/databricks-sql-go) from 1.5.3 to 1.5.4.

Release notes, v1.5.4 (2024-04-10):
- databricks/databricks-sql-go#189 (@rcypher-databricks)
- databricks/databricks-sql-go#197 (@mdibaiee)
- databricks/databricks-sql-go#205 (@candiduslynx)
- databricks/databricks-sql-go#207 (@candiduslynx)

Full Changelog: https://github.com/databricks/databricks-sql-go/compare/v1.5.3...v1.5.4

Commits:
- e82880f Prepare release v1.5.4 (#208)
- 7ac797b fix: Properly format `time.Time` values (#207)
- dea3e6d fix: Don't panic on `remove` op (#205)
- 00bc1c8 staging put: stream file uploads instead of loading all to memory (#197)
- bb10f7a Update Github workflows to fix deprecation warnings (#203)
- d70ab7c Added GCP cloud type for OAuth (#189)
- 5adddfc Update owners (#190)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>