Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(new sink): Add sinks sftp support #18076

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Xuanwo
Copy link
Contributor

@Xuanwo Xuanwo commented Jul 25, 2023

This PR is part of #3382.

Signed-off-by: Xuanwo <github@xuanwo.io>
@Xuanwo Xuanwo requested a review from a team July 25, 2023 07:19
@netlify
Copy link

netlify bot commented Jul 25, 2023

Deploy Preview for vector-project canceled.

Name Link
🔨 Latest commit e3298a4
🔍 Latest deploy log https://app.netlify.com/sites/vector-project/deploys/64c47a99dbacde0008003b39

@netlify
Copy link

netlify bot commented Jul 25, 2023

Deploy Preview for vrl-playground canceled.

Name Link
🔨 Latest commit e3298a4
🔍 Latest deploy log https://app.netlify.com/sites/vrl-playground/deploys/64c47a99d5723800081c3c69

@github-actions github-actions bot added the domain: sinks Anything related to the Vector's sinks label Jul 25, 2023
@gaby
Copy link

gaby commented Jul 28, 2023

@Xuanwo The tokio-utils update has been merged:

Rel #18078

@jszwedko
Copy link
Member

Are you still seeing the compilation errors with io-lifetimes? I'm actually seeing different compilation errors related to the code in this branch.

Thanks for opening this to get the conversation started! We'll get a more thorough review soon.

Xuanwo added 2 commits July 29, 2023 10:22
Signed-off-by: Xuanwo <github@xuanwo.io>
@Xuanwo
Copy link
Contributor Author

Xuanwo commented Jul 29, 2023

I'm actually seeing different compilation errors related to the code in this branch.

Thanks for the remind from @gaby, all build issues have been fixed.


Thanks for opening this to get the conversation started! We'll get a more thorough review soon.

Let's confirm the correct direction before pushing any works to docs and configs.

@StephenWakely
Copy link
Contributor

@Xuanwo, just to confirm, if we merge this will you be willing to continue supporting this sink in future?

@Xuanwo
Copy link
Contributor Author

Xuanwo commented Aug 8, 2023

@Xuanwo, just to confirm, if we merge this will you be willing to continue supporting this sink in future?

This PR is far away from been merged. So I'm guessing we are talking the future: when this PR is ready and sftp is able to use, will I keep supporting sftp sink?

My answer is YES. As you may know, both webhdfs and sftp are OpenDAL based sinks. Most logic are vector internal and not related to sinks.

And sink related things breaking means OpenDAL is broken. In this case, I will make sure it fixed so that all opendal users will benefit from the fix too.

@Xuanwo
Copy link
Contributor Author

Xuanwo commented Aug 8, 2023

As you may know, both webhdfs and sftp are OpenDAL based sinks.

There is an ongoing discussion on adding a doc about how to implement and maintain sinks that based opendal.

#15715 (comment)

@StephenWakely
Copy link
Contributor

when this PR is ready and sftp is able to use, will I keep supporting sftp sink?

Yes. Thank you.

@dsmith3197 dsmith3197 self-assigned this Aug 21, 2023
Copy link
Contributor

@dsmith3197 dsmith3197 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks good to me once the comments are addressed.

I would like to see us add an integration test for this sink, similar to the webhdfs sink.

Also, can you run make generate-component-docs to update the generated cue docs to add the sftp to the user-facing documentation? We'll also want to add a note that this only works on unix systems, as mentioned in the crate's docs.

Comment on lines +1 to +15
//! `sftp` sink.
//!
//! `sftp` SFTP, or Secure File Transfer Protocol, is a network protocol used for
//! securely transferring files over the internet. It operates over the Secure
//! Shell (SSH) data stream, providing secure file transfer by both encrypting
//! the data and maintaining the integrity of the transfer. SFTP also supports
//! file management operations like moving and deleting files on the server, unlike FTP.
//!
//! For more information, please refer to:
//!
//! - [sftp(1) — Linux manual page](https://man7.org/linux/man-pages/man1/sftp.1.html)
//!
//! `sftp` is an OpenDal based services. This mod itself only provide config to build an
//! [`crate::sinks::opendal_common::OpenDalSink`]. All real implement are powered by
//! [`crate::sinks::opendal_common::OpenDalSink`].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding a module doc! I have a few small suggestions.

Suggested change
//! `sftp` sink.
//!
//! `sftp` SFTP, or Secure File Transfer Protocol, is a network protocol used for
//! securely transferring files over the internet. It operates over the Secure
//! Shell (SSH) data stream, providing secure file transfer by both encrypting
//! the data and maintaining the integrity of the transfer. SFTP also supports
//! file management operations like moving and deleting files on the server, unlike FTP.
//!
//! For more information, please refer to:
//!
//! - [sftp(1) — Linux manual page](https://man7.org/linux/man-pages/man1/sftp.1.html)
//!
//! `sftp` is an OpenDal based services. This mod itself only provide config to build an
//! [`crate::sinks::opendal_common::OpenDalSink`]. All real implement are powered by
//! [`crate::sinks::opendal_common::OpenDalSink`].
//! `sftp` sink.
//!
//! SFTP, or Secure File Transfer Protocol, is a network protocol used for
//! securely transferring files over the internet. It operates over the Secure
//! Shell (SSH) data stream, providing secure file transfer by both encrypting
//! the data and maintaining the integrity of the transfer. SFTP also supports
//! file management operations like moving and deleting files on the server, unlike FTP.
//!
//! For more information, please refer to:
//!
//! - [sftp(1) — Linux manual page](https://man7.org/linux/man-pages/man1/sftp.1.html)
//!
//! `sftp` is an OpenDal-based sink. This module itself only provide config to build an
//! [`crate::sinks::opendal_common::OpenDalSink`].

Comment on lines +4 to +21
use vector_config::configurable_component;
use vector_core::{
config::{AcknowledgementsConfig, DataType, Input},
sink::VectorSink,
};

use crate::{
codecs::{Encoder, EncodingConfigWithFraming, SinkType},
config::{GenerateConfig, SinkConfig, SinkContext},
sinks::{
opendal_common::*,
util::{
partitioner::KeyPartitioner, BatchConfig, BulkSizeBasedDefaultBatchSettings,
Compression,
},
Healthcheck,
},
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit - we can remove some of the common imports by including crate::sinks::prelude::*.

Suggested change
use vector_config::configurable_component;
use vector_core::{
config::{AcknowledgementsConfig, DataType, Input},
sink::VectorSink,
};
use crate::{
codecs::{Encoder, EncodingConfigWithFraming, SinkType},
config::{GenerateConfig, SinkConfig, SinkContext},
sinks::{
opendal_common::*,
util::{
partitioner::KeyPartitioner, BatchConfig, BulkSizeBasedDefaultBatchSettings,
Compression,
},
Healthcheck,
},
};
use vector_core::config::DataType;
use crate::{
codecs::{EncodingConfigWithFraming, SinkType},
sinks::{
prelude::*,
opendal_common::*,
util::{
partitioner::KeyPartitioner, BulkSizeBasedDefaultBatchSettings,
},
},
};

};

/// Configuration for the `sftp` sink.
#[configurable_component(sink("sftp", "Sftp."))]
Copy link
Contributor

@dsmith3197 dsmith3197 Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a more detailed description to this component (the second string)? This description ends up getting populated in the docs at vector.dev.

Suggested change
#[configurable_component(sink("sftp", "Sftp."))]
#[configurable_component(sink("sftp", "Secure File Transfer Protocol (SFTP)."))]

#[derive(Clone, Debug)]
#[serde(deny_unknown_fields)]
pub struct SftpConfig {
/// The root path for Sftp.
Copy link
Contributor

@dsmith3197 dsmith3197 Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest adopting the language from the opendal sftp docs here to make these easier for Vector users to configure.

Suggested change
/// The root path for Sftp.
/// The working directory for the backend. Defaults to the default directory
/// set by the remote SFTP server.

#[configurable(metadata(docs::templateable))]
pub prefix: String,

/// The endpoint to connect to sftp.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// The endpoint to connect to sftp.
/// The endpoint of the SFTP backend.

}

#[test]
fn Sftp_build_request() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn Sftp_build_request() {
fn sftp_build_request() {

Comment on lines +39 to +51
fn request_builder(sink_config: &SftpConfig) -> OpenDalRequestBuilder {
let transformer = sink_config.encoding.transformer();
let (framer, serializer) = sink_config
.encoding
.build(SinkType::MessageBased)
.expect("encoding must build with success");
let encoder = Encoder::<Framer>::new(framer, serializer);

OpenDalRequestBuilder {
encoder: (transformer, encoder),
compression: sink_config.compression,
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than re-implement this logic here, we should extract the corresponding logic out of SftpConfig::build_processor into a separate method so that we can test it in isolation.

}

#[test]
fn Sftp_generate_config() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn Sftp_generate_config() {
fn sftp_generate_config() {

Comment on lines +82 to +86
let req = build_request(Compression::None);
assert!(req.metadata.partition_key.ends_with(".log"));

let req = build_request(Compression::None);
assert!(req.metadata.partition_key.ends_with(".log"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines seem to be redundant.

Suggested change
let req = build_request(Compression::None);
assert!(req.metadata.partition_key.ends_with(".log"));
let req = build_request(Compression::None);
assert!(req.metadata.partition_key.ends_with(".log"));
let req = build_request(Compression::None);
assert!(req.metadata.partition_key.ends_with(".log"));

@@ -87,6 +87,8 @@ pub mod redis;
pub mod s3_common;
#[cfg(feature = "sinks-sematext")]
pub mod sematext;
#[cfg(feature = "sinks-sftp")]
Copy link
Contributor

@dsmith3197 dsmith3197 Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also gate this behind unix as the sftp opendal service states that it only works for unix.

Suggested change
#[cfg(feature = "sinks-sftp")]
#[cfg(all(unix, feature = "sinks-sftp"))]

@Xuanwo
Copy link
Contributor Author

Xuanwo commented Aug 22, 2023

Thanks for the review! I will come back next week.

@dsmith3197 dsmith3197 added the meta: awaiting author Pull requests that are awaiting their author. label Sep 1, 2023
@gaby
Copy link

gaby commented Nov 22, 2023

Thanks for the review! I will come back next week.

@Xuanwo Any updates on this?

@gaby
Copy link

gaby commented Feb 27, 2024

@Xuanwo Do you have plans for updating this PR?

@gaby gaby mentioned this pull request Mar 26, 2024
@gaby
Copy link

gaby commented Jul 3, 2024

@dsmith3197 This can probably be closed. The author hasnt responded in months.

@jszwedko
Copy link
Member

jszwedko commented Jul 3, 2024

I think we can leave it open until someone else picks up the torch so that they have a base to start from. Otherwise someone might start from scratch, not realizing some efforts have been made already. I'll mark it as draft though.

@jszwedko jszwedko marked this pull request as draft July 3, 2024 13:54
@jszwedko jszwedko mentioned this pull request Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: sinks Anything related to the Vector's sinks meta: awaiting author Pull requests that are awaiting their author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants