Add support for Databend sink #15727
FWIW, Databend seems to suggest using the ClickHouse sink. We have been told about some performance limitations of that sink as-is, so it's worth investigating what a native sink might look like.
We intend to do a 3-step batch sink (see the sketch after this list):
1. Ask Databend for a presigned upload URL for a staging path.
2. Upload the encoded batch directly to that URL.
3. Run a copy/insert statement that loads the staged file into the target table.
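For concreteness, here is a minimal Rust sketch of such a 3-step flow. It assumes the reqwest crate (blocking feature) for the HTTP PUT and uses a hypothetical exec_sql helper standing in for whatever query API the sink ends up using (e.g. the Databend Rust driver); the stage path, table name, and the result layout of PRESIGN are illustrative assumptions, not the sink's actual implementation:

```rust
use std::error::Error;

// Hypothetical placeholder for the Databend query API; not a real function.
fn exec_sql(sql: &str) -> Result<Vec<Vec<String>>, Box<dyn Error>> {
    unimplemented!("send {sql} to Databend and return the result rows")
}

// One batch: presign -> PUT -> COPY INTO. Stage path and table name are
// illustrative assumptions.
fn upload_batch(batch_csv: Vec<u8>) -> Result<(), Box<dyn Error>> {
    let stage_path = "@~/vector/batch-0001.csv";

    // Step 1: ask Databend for a presigned upload URL for the stage path.
    // PRESIGN is Databend SQL; check the docs for the exact result columns.
    let rows = exec_sql(&format!("PRESIGN UPLOAD {stage_path}"))?;
    let presigned_url = rows[0].last().ok_or("empty presign result")?.clone();

    // Step 2: PUT the encoded batch straight to object storage; the
    // Databend server never sees the payload.
    reqwest::blocking::Client::new()
        .put(presigned_url.as_str())
        .body(batch_csv)
        .send()?
        .error_for_status()?;

    // Step 3: load the staged file into the target table.
    exec_sql(&format!(
        "COPY INTO mydb.logs FROM {stage_path} FILE_FORMAT = (TYPE = CSV)"
    ))?;
    Ok(())
}
```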
BTW, direct insert mode would also be supported, and configurable via settings.
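A hypothetical sketch of what such a setting could look like as a serde config struct; the option name, variants, and default are invented for illustration and are not the sink's actual configuration:

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
#[serde(rename_all = "snake_case")]
enum InsertMode {
    /// Stage the batch via a presigned URL, then COPY it in (the 3-step flow).
    PresignedUpload,
    /// Send the rows straight through an INSERT statement.
    Direct,
}

#[derive(Debug, Deserialize)]
struct DatabendSinkConfig {
    endpoint: String,
    table: String,
    #[serde(default = "default_mode")]
    insert_mode: InsertMode,
}

fn default_mode() -> InsertMode {
    InsertMode::PresignedUpload
}
```

Defaulting to the presigned path would keep the cheaper, cluster-parallel behavior while still letting users opt into direct inserts.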
Haha, I should have checked to see if you were part of @datafuselabs/databend before responding 😆 I'm definitely not familiar enough with the application to have strong opinions on how to implement it; I expect we'd lean on y'all's expertise for whichever approach is the most performant, reliable, and has whatever features we need, and go that way. Since object storage would be involved with the 3-step, it could be an opportunity to spike/explore using OpenDAL (#15715).
Actually, object storage is not directly involved in the 3-step flow: the client just does a plain HTTP PUT to a presigned URL, so Vector would not need object storage credentials or an SDK. ref: https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html
Oh, I see, that seems handy! I saw that databendlabs/databend#9448 included "Datadog Vector integrate with Rust-driver"; is this issue/work something Databend is considering contributing?
Yes, I'm working on this these days.
Hi @everpcpc! We've been taking a look at this request, and we're wondering if you could provide more details about the issues you're facing with Vector that require a new sink. Mainly, it would be great to have some background on what makes the HTTP or ClickHouse workarounds too complicated, and also some context on the exact bandwidth concerns the existing sinks face. We're happy to extend Vector to include new projects, but we're also careful about increasing the project's surface area when workarounds already exist. Sorry about chiming in so late in the game, and thanks in advance! 😸
hi @davidhuie-dd, with pre-signed insert we are able to do the insert with the whole cluster rather than the single instance that handles the insert statement, which could gain much more performance. Besides, for the later CSV sink format, neither the HTTP nor the ClickHouse sink would be a good fit.
@everpcpc For documentation purposes: since ingress bandwidth is free on AWS, is this about saving egress bandwidth costs? It seems like it would help when traffic is between a Databend client and server within the same region but in different AZs: uploading directly via the presigned URL would make the bandwidth cost free. Thanks.
@davidhuie-dd some additional notes:
Use Cases
Sink logs to Databend.
Attempted Solutions
The HTTP sink can do the trick, but it's too complicated.
Proposal
We need some further improvements for the cloud database, such as using presigned URLs for uploads to save most of the transfer fees.
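As a rough sketch of where the presigned upload could sit in a batch sink, reusing the hypothetical upload_batch from the earlier sketch and assuming the csv and serde_json crates; the column set is illustrative, not the sink's actual encoding:

```rust
use std::error::Error;

// Hypothetical flush path for a batch sink: encode the buffered events as
// CSV, then run the presign / upload / copy flow from the earlier sketch.
fn flush(events: &[serde_json::Value]) -> Result<(), Box<dyn Error>> {
    let mut buf = Vec::new();
    {
        let mut w = csv::Writer::from_writer(&mut buf);
        for event in events {
            // Two hard-coded fields for illustration; a real sink would
            // follow the configured encoding instead.
            w.write_record(&[
                event["timestamp"].as_str().unwrap_or_default(),
                event["message"].as_str().unwrap_or_default(),
            ])?;
        }
        w.flush()?;
    }
    upload_batch(buf) // presign -> PUT -> COPY INTO, as sketched earlier
}
```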
Version
0.26.0