-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Export Hooks: Glue Catalog exporter #6653
Conversation
@nopcoder please notice that PR description added usage (TODO's will be done after this PR code approved) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name the file aws.go
as we do for other packages - name the main library as the package.
config.WithRegion(c.Region), | ||
config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(c.AccessKeyID, c.SecretAccessKey, "")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The credentioals are required, but the region can be empty - verify that LoadDefaultConfig
with empty region works. I think you should not pass it to the options in order to use the default configured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verified, works :)
table.remove(page, 1) | ||
else | ||
-- remove hidden files from the entry set | ||
table.remove(page, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we are doing the same - so you can just move table.remove
out of the both if scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
-- create a standard AWS Glue table input (i.e not Apache Iceberg), add input values to base input and configure the rest | ||
local function build_glue_create_table_input(base_input, descriptor, symlink_location, columns, partitions, action_info, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you see any real usage for this one in the near future? if not inline it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I see usage incase user want's to use opts.override_create_table_input
and change certain things in build_glue_create_table_input
output.
Actually I added now export for this function as well outside of the module.
local repo_id = action_info.repository_id | ||
local commit_id = action_info.commit_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not using repo and commit id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
-- create table | ||
local json_input = json.marshal(table_input) | ||
if opts.debug then | ||
print("Creating Glue Table - input: ", json_input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unless you concat the strings
print("Creating Glue Table - input: ", json_input) | |
print("Creating Glue Table - input:", json_input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
return { | ||
create_table_input=table_input | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not returning just table_input
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To leave room for other variables and make it a "result" without breaking function signature.
For example, in a more complex scenario where export will do version updates to the same table we will want the new latest version things like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to table_input
@@ -1,6 +1,18 @@ | |||
local url = require("net/url") | |||
local DEFAULT_SHORT_DIGEST_LEN=6 | |||
|
|||
local function clone_table(original_table) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest use the deepcopy (and call it deepcopy as it does deep copy for any type) from the formal lua wiki:
function deepcopy(orig)
local orig_type = type(orig)
local copy
if orig_type == 'table' then
copy = {}
for orig_key, orig_value in next, orig, nil do
copy[deepcopy(orig_key)] = deepcopy(orig_value)
end
setmetatable(copy, deepcopy(getmetatable(orig)))
else -- number, string, boolean, etc
copy = orig
end
return copy
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woha took me a while to compile that in my mind, but better for sure - updated thanks.
pkg/actions/lua/storage/aws/glue.go
Outdated
// parse table input JSON | ||
var tableInput types.TableInput | ||
err := json.Unmarshal([]byte(tableInputJSON), &tableInput) | ||
if err != nil { | ||
lua.Errorf(l, err.Error()) | ||
panic("unreachable") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move parse after the you finish the params extract - I think it will be readable and better structure of the function implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
pkg/actions/lua/storage/aws/glue.go
Outdated
retArgs := util.DeepPush(l, itemMap) | ||
l.PushBoolean(true) | ||
return retArgs + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange that DeepPush needs to return a number - it expected to already return 1.
Same for this function getTable
return - we must return 2 as we like the boolean to be on position 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, changed 👍
@nopcoder Thanks! Please re-review 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Closes #6575
TODO - will create additional PR and merge into current:
Usage
Pre-requisite
Write some table data to lakeFS either
parquet
orCSV
format, snippet of parquet added:Define table in
_lakefs_tables/employees.yaml
Create
_lakefs_action/employee_exporter.yaml
Create export hook in
scripts/glue_export.lua