feat(new sink): new postgres sink #22481
base: master
@pront We are fully aware of it and have analysed it. 😉 And yet we don't think it is the right way to do it. Please take a closer look at the code. We can add features, no worries. But we have a really enormous amount of data, and we need to get it into Postgres and TimescaleDB in large volumes. We specifically need this optimised for cloud usage (mem/CPU matters!). In the worst case you will have two sinks! 😆 Call it "lightweight PgSink".
Sure will do. It will take some time though so please bear with me.
Did you compare both implementations against some benchmarks?
Having two sinks doing the same thing is probably not what we want. I do like that #21248 has support for all telemetry data, Vector features such as ACKs, and good UX. And most importantly, a lot of testing. Again, I didn't dive into the differences and I need some time to do so. I wonder, since you looked at the existing PR, can you work on optimizing that after it lands?
Thanks!
In our defence, our day-one Chapter 1 is not a half-year Chapter 128 😛. We specifically focused on making it a zero-copy, no-dependencies, generic micro-sink. Adding features is not a problem; ACKs are coming, as they are a necessity.
We would definitely support and maintain ours, that's for sure, because it will go into production straight away. Alternatively, it can land in a "contrib" section: more options to choose from is always better. We are interested in bringing more sinks/transforms in the near future.
Hi @pront, with these changes we should have feature parity with the other PR, aside from configuration. Is there a nice way to do benchmarks? I looked at the benches directory but didn't really understand how to apply that to this use case.
Hi, I would like to drop my opinion on this.
I'm not really sure about this, and claiming performance improvements and optimizations without measuring them is a mistake. I see that you are not batching events, so every ingested event results in a network round trip. I would be surprised if this approach resulted in higher throughput than batching them. Taking a look at your implementation, I'm not sure it would work in the general case. For example, this prepared statement vector/src/sinks/postgres/mod.rs Line 94 in e9b0c8c
and then when inserting the column values vector/src/sinks/postgres/mod.rs Line 191 in e9b0c8c
Moreover, as you are loading the table's columns on the sink's startup vector/src/sinks/postgres/mod.rs Line 90 in e9b0c8c
Also, I'm not sure your implementation works for composite types (maybe it does, but I'm currently not sure). The implementations are not feature-wise equal, so I don't think a performance comparison makes sense in this case, whichever turns out to be the fastest.
Not allocating does not always mean being faster. It is generally faster not to allocate, but that does not follow automatically.
so does #21248. https://docs.rs/sqlx/latest/sqlx/fn.query.html
This is also a fallacy. From a new user's perspective, not having a single solution is actually worse, as users would struggle to decide which one to use, for example. Moreover, it is a maintenance overhead for maintainers to have multiple implementations of nearly the same thing. From my point of view, we should not be talking about what should be faster but actually measuring it. However, as I think this is not feature-wise equal to #21248, I don't know if it makes sense to just choose the fastest.
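To make the batching point above concrete, here is a minimal std-only sketch of how a multi-row INSERT with numbered placeholders could be assembled, so that a whole batch of events costs one round trip instead of one per event. It is not taken from either PR: the function name build_batch_insert and the table and column names are hypothetical, and identifiers are assumed to be trusted and already valid SQL.

```rust
/// Build one INSERT statement covering `rows` rows, with `$n` placeholders
/// numbered row by row. Returns None for an empty batch or an empty column
/// list, since either would produce invalid SQL.
/// NOTE: hypothetical helper for illustration; identifiers are not quoted.
fn build_batch_insert(table: &str, columns: &[&str], rows: usize) -> Option<String> {
    let width = columns.len();
    if width == 0 || rows == 0 {
        return None;
    }
    let tuples: Vec<String> = (0..rows)
        .map(|r| {
            // Placeholders for row `r` run from r*width+1 to (r+1)*width.
            let params: Vec<String> = (1..=width)
                .map(|c| format!("${}", r * width + c))
                .collect();
            format!("({})", params.join(", "))
        })
        .collect();
    Some(format!(
        "INSERT INTO {} ({}) VALUES {}",
        table,
        columns.join(", "),
        tuples.join(", ")
    ))
}

fn main() {
    // Three buffered events, two columns each: one statement, one round trip.
    if let Some(sql) = build_batch_insert("logs", &["host", "message"], 3) {
        println!("{sql}");
    }
}
```

Whether this is actually faster than per-event inserts for a given workload is exactly what a benchmark would have to show.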
Also, you state that
but clearly using a vector/src/sinks/postgres/mod.rs Line 192 in e9b0c8c
valgrind. Claiming that no allocations happen based purely on the code you wrote, without accounting for your dependencies' code (which must also be taken into account), is wrong.
Summary
A zero-copy postgres sink that requires no new dependencies (it adds one feature on tokio-postgres). The sink uses a prepared statement to insert the data in pure SQL instead of serializing the data to JSON and deserializing it in the database.
For now the sink can only handle Logs and Traces.
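As a rough illustration of the prepared-statement approach described above, the sketch below derives a parameterized INSERT from a list of column names, with one placeholder per column. It is a std-only sketch under stated assumptions, not the code in this PR: quote_ident, build_insert, and the table and column names are all hypothetical.

```rust
/// Double-quote an SQL identifier, doubling any embedded double quotes.
/// NOTE: hypothetical helper for illustration.
fn quote_ident(ident: &str) -> String {
    format!("\"{}\"", ident.replace('"', "\"\""))
}

/// Build a single-row INSERT with one `$n` placeholder per column.
/// The resulting text is what would be handed to the database driver
/// to prepare once and then execute for each event.
fn build_insert(table: &str, columns: &[&str]) -> String {
    let cols: Vec<String> = columns.iter().map(|c| quote_ident(c)).collect();
    let params: Vec<String> = (1..=columns.len()).map(|i| format!("${}", i)).collect();
    format!(
        "INSERT INTO {} ({}) VALUES ({})",
        quote_ident(table),
        cols.join(", "),
        params.join(", ")
    )
}

fn main() {
    let sql = build_insert("logs", &["timestamp", "host", "message"]);
    println!("{sql}");
}
```

Each event field is then bound to its placeholder as a typed parameter, so no JSON document has to be built on the client or unpacked in the database.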
Tests are still missing, but it can be E2E tested using this setup:
Change Type
Is this a breaking change?
How did you test this PR?
Does this PR include user facing changes?
Checklist
make check-all is a good command to run locally. This check is defined here. Some of these checks might not be relevant to your PR. For Rust changes, at the very least you should run:
cargo fmt --all
cargo clippy --workspace --all-targets -- -D warnings
cargo nextest run --workspace (alternatively, you can run cargo test --all)
(Cargo.lock), please run dd-rust-license-tool write to regenerate the license inventory and commit the changes (if any). More details here.
References