Feature/allow direct insert #31

Merged
merged 20 commits into from
Oct 24, 2024

Conversation

yokofly
Collaborator

@yokofly yokofly commented Oct 18, 2024

Sorry for the big PR; I probably should have started splitting it up earlier.
Summary of changes:

  1. Support more data types:
- nullable(date)
- nullable(datetime)
- nullable(datetime64)
- nullable(fixed_string())
- map(string, string)
- array(string)
- array(int64)
  2. Direct insert support (when the target is Proton, we force direct writes). Reason: upsert-style queries such as insert ... select sometimes time out.
  3. Add a very strong restriction: in incremental mode, when writing to a target Proton database, the target table must already exist. Reason: we now write directly, which makes type conversion and inference quite hard, so we want users to create the table first. (We do not check whether the user-created table is itself buggy.)
  4. Refine the retry logic; it now uses exponential backoff.
  5. Refine the temp/internal CSV handling. If SLING_KEEP_TEMP is set, the file is kept; otherwise it is created in /app or in the local sling_transfer folder.

We plan to eliminate the timeout issue during partial inserts by directly writing to the target table instead of using a temporary table. However, when we directly write to the target table, handling type conversion and inference may be a bit tricky. I decided that users will need to create the target table themselves.

TODO:

  1. Improve performance. I used Linux perf to look for a bottleneck but found nothing.

yokofly and others added 9 commits October 16, 2024 19:26
…tetoDB

refactor writeto db into smaller functions
with direct insert, we need to support more detailed column append
otherwise, the column type will be changed to another type by inference, and we cannot append data through the go driver directly
the final table must exist
this is a very strong requirement, but it makes everything more stable/robust/easier
reason: type inference usually causes the insert to fail.
also enhance the retry logic in task_run.go
Collaborator Author

yokofly commented Oct 18, 2024

This does not work yet; let me refine it.

@yokofly yokofly linked an issue Oct 23, 2024 that may be closed by this pull request
@yokofly yokofly marked this pull request as ready for review October 23, 2024 07:19
Collaborator Author

yokofly commented Oct 23, 2024

It turns out that my local build was using an option that disables optimization for debugging. When I later turned that option off, I got a 3x speedup.

However, this is different from the release created by goreleaser, because the default goreleaser YAML is quite simple and only sets ldflags for the binary version.

reason:
the createtmp will assign a unique ID
@yokofly yokofly merged commit 31be7b2 into main Oct 24, 2024
Development

Successfully merging this pull request may close these issues.

failed to convert date types when copy date to tmp table
3 participants