-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate duckdb-wasm to query parquet/csv files on the UI (experimental) #4821
Conversation
From @AdiPolak's tweet :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome solution!
Have a few comments/suggestions and I think you can delete /components/repository/dataTable.jsx
, since it's empty
package.json
Outdated
@@ -0,0 +1 @@ | |||
{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can remove this and the package-lock.json
from the top level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/api/controller.go
Outdated
ctx := r.Context() | ||
c.LogAction(ctx, "head_object", r, repository, ref, "") | ||
|
||
_, err := c.Catalog.GetRepository(ctx, repository) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks to me like GetEntry
already handles this error state. Not sure this is needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
throw new Error(await extractError(response)); | ||
} | ||
|
||
return response.text() | ||
} | ||
|
||
async getWithHeaders(repoId, ref, path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getWithHeaders
is still used in ReadmeContainer
. Needs to be replaced there too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
@@ -28,7 +28,7 @@ import {RefTypeBranch, RefTypeCommit} from "../../../constants"; | |||
import {copyTextToClipboard} from "../controls"; | |||
import Modal from "react-bootstrap/Modal"; | |||
|
|||
const PREVIEW_SIZE_LIMIT = 4194304; // 4MB in bytes | |||
const PREVIEW_SIZE_LIMIT = 5.243e+8; // 4MB in bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const PREVIEW_SIZE_LIMIT = 5.243e+8; // 4MB in bytes | |
const PREVIEW_SIZE_LIMIT = 5.243e+8; // 524.3MB in bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, changed :)
onReady: (text: string) => JSX.Element; | ||
} | ||
|
||
export const DATA_FORMATS = ["parquet", "csv", "tsv"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're moving to TS, might as well make the most of it. I'd suggest replacing this with an enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! didn't know ts has enums. Changed.
return <ObjectTooLarge {...props}/> | ||
} | ||
// is this markdown? | ||
if (contentType === "text/markdown" || fileExtension === 'md') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is overly verbose vs. the previous implementation (look up a map of renders). With the map it's also much quicker and easier to understand what's supported or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I tried fitting the new logic into a map resolution interface and it ended up being complex. I took another stab at it. The difference being:
- We want duckdb to render data files even if they are above the size limit (duckdb does HTTP byte range requests, so it doesn't really pull all the data)
- We now render code and configuration files, there are A LOT of types and we need to somehow normalize the different content types and extensions to fit the list of table names.
Anyway - took another stab at it, I think it's more robust now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much more readable now IMHO
If we ever need more complex logic for some keys, we can change the values from FC<RendererComponent>
to a function or HoC
/> | ||
); | ||
|
||
export const DuckDBRenderer: FC<RendererComponentWithDataFormat> = (props) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd move DuckDBRenderer
to another file along with DataRow
, as it's getting a bit long. I should probably refactor to move all renderers out of this file. I should have done it this way to begin with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
value: any; | ||
} | ||
|
||
const DataRow: FC<RowProps> = ({ value }) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since RowProps
is a very minimal interface and it's probably not going to be reused, you can inline the type def.
const DataRow: FC<RowProps> = ({ value }) => { | |
const DataRow: FC<{ value: any }> = ({ value }) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done :)
export const guessLanguage = (extension: string | null, contentType: string | null) => { | ||
if (extension && SyntaxHighlighter.supportedLanguages.indexOf(extension) !== -1) { | ||
if (extension && SyntaxHighlighter.supportedLanguages.indexOf(extension) !== -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use Array.includes() instead of indexOf
It's the more modern approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Done.
} else { | ||
content = React.createElement(renderer, { content: rawContent, contentBlob: blobContent, }, []); | ||
} | ||
|
||
const repo = { | ||
id: repoId, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you changed the URL, I suggest adding the filename value to the download
prop of the download button, so you get the correct filename in the download
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Done
Thanks for the great review @eladlachmi - PTAL :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
api/swagger.yml
Outdated
default: | ||
description: internal server error | ||
410: | ||
description: object expired | ||
416: | ||
description: Requested Range Not Satisfiable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to keep them ordered
default: | |
description: internal server error | |
410: | |
description: object expired | |
416: | |
description: Requested Range Not Satisfiable | |
410: | |
description: object expired | |
416: | |
description: Requested Range Not Satisfiable | |
default: | |
description: internal server error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -2778,7 +2775,67 @@ paths: | |||
ETag: | |||
schema: | |||
type: string | |||
Content-Disposition: | |||
401: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes reflected in the diff of the docs swagger doesn't match the main api/swagger - run: make docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
docs/reference/.bsp/sbt.json
Outdated
@@ -0,0 +1 @@ | |||
{"name":"sbt","version":"1.8.0","bspVersion":"2.1.0-M1","languages":["scala"],"argv":["/opt/homebrew/Cellar/openjdk/19.0.1/libexec/openjdk.jdk/Contents/Home/bin/java","-Xms100m","-Xmx100m","-classpath","/opt/homebrew/Cellar/sbt/1.8.0/libexec/bin/sbt-launch.jar","-Dsbt.script=/opt/homebrew/Cellar/sbt/1.8.0/libexec/bin/sbt","xsbt.boot.Boot","-bsp"]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how did we got this one? I think it can be safely removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
pkg/api/controller.go
Outdated
rr := httptest.NewRecorder() | ||
if c.handleAPIError(ctx, w, err) { | ||
w.WriteHeader(rr.Code) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract the api error handle so you can capture the code and have the same result without using a test library to record the response write generate.
func (c *Controller) handleAPIErrorCallback(ctx context.Context, w http.ResponseWriter, err error, cb func(w http.ResponseWriter, code int, v interface{})) bool {
switch {
case errors.Is(err, catalog.ErrNotFound),
errors.Is(err, graveler.ErrNotFound),
errors.Is(err, actions.ErrNotFound),
errors.Is(err, auth.ErrNotFound),
errors.Is(err, kv.ErrNotFound):
cb(w, http.StatusNotFound, err)
case errors.Is(err, graveler.ErrDirtyBranch),
errors.Is(err, graveler.ErrCommitMetaRangeDirtyBranch),
errors.Is(err, catalog.ErrNoDifferenceWasFound),
errors.Is(err, graveler.ErrNoChanges),
errors.Is(err, permissions.ErrInvalidServiceName),
errors.Is(err, permissions.ErrInvalidAction),
errors.Is(err, model.ErrValidationError),
errors.Is(err, graveler.ErrInvalidRef),
errors.Is(err, graveler.ErrInvalidValue),
errors.Is(err, actions.ErrParamConflict):
cb(w, http.StatusBadRequest, err)
case errors.Is(err, graveler.ErrNotUnique),
errors.Is(err, graveler.ErrConflictFound),
errors.Is(err, graveler.ErrRevertMergeNoParent):
cb(w, http.StatusConflict, err)
case errors.Is(err, catalog.ErrFeatureNotSupported):
cb(w, http.StatusNotImplemented, err)
case errors.Is(err, graveler.ErrLockNotAcquired):
cb(w, http.StatusInternalServerError, "branch is currently locked, try again later")
case errors.Is(err, adapter.ErrDataNotFound):
cb(w, http.StatusGone, "No data")
case errors.Is(err, auth.ErrAlreadyExists):
cb(w, http.StatusConflict, "Already exists")
case errors.Is(err, graveler.ErrTooManyTries):
cb(w, http.StatusLocked, "Too many attempts, try again later")
case err != nil:
c.Logger.WithContext(ctx).WithError(err).Error("API call returned status internal server error")
cb(w, http.StatusInternalServerError, err)
default:
return false
}
return true
}
func (c *Controller) handleAPIError(ctx context.Context, w http.ResponseWriter, err error) bool {
return c.handleAPIErrorCallback(ctx, w, err, writeError)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! did the same for c.authorize
.
pkg/api/controller.go
Outdated
w.WriteHeader(rr.Code) | ||
return | ||
} | ||
c.Logger.Tracef("get repo %s ref %s path %s: %+v", repository, ref, params.Path, entry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tracing middleware should give you the same information. In case it doesn't - please use the logger's WithFields to capture the structured data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
c.LogAction(ctx, "head_object", r, repository, ref, "") | ||
|
||
// read the FS entry | ||
entry, err := c.Catalog.GetEntry(ctx, repository, ref, params.Path, catalog.GetEntryParams{ReturnExpired: true}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, not in this pr - but I don't think we currently supports ReturnExpired: true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Let's open an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks amazing
closes #4820