
feat: support data transfer protection in HDFS artifacts. Fixes #13482 #13483

Merged

Conversation

Looka149 (Contributor)

Fixes #13482

Motivation

As shown in https://github.com/argoproj/argo-workflows/blob/main/examples/hdfs-artifact.yaml, I am working on storing artifacts in HDFS. In my environment, the HDFS cluster has dfs.data.transfer.protection set to authentication. When I run the example, I encounter a read: connection reset by peer error.

Modifications

The HDFS client supports the data transfer protection option, which can be configured to authentication, integrity, privacy, or left empty, as detailed in this pull request. With this configuration, the issue described above is resolved.

This feature is available in versions v2.4.0, v2.3.0, v2.2.1, and v2.2.0 (see commit). Since we are using version v2.4.0 of the HDFS client (as specified here), we can utilize this feature.

Verification

Referring to the hdfs-artifact.yaml example, you can specify the value of artifacts.hdfs.dataTransferProtection as 'authentication', 'integrity', 'privacy', or leave it empty.

  - name: print-message
    inputs:
      artifacts:
      - name: message
        path: /tmp/message
        hdfs:
          dataTransferProtection: authentication

If an incorrect value is provided, the following error will occur.

$ kubectl get workflow
NAME                  STATUS   AGE     MESSAGE
hdfs-artifact-td82h   Failed   33m     invalid spec: templates.artifact-example.steps[1].consume-artifact templates.print-message.inputs.artifacts.message.hdfs.dataTransferProtection must be one of `authentication`, `integrity`, `privacy`, or empty
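The check behind this message can be sketched as a standalone helper (a hypothetical reconstruction for illustration, not the exact PR code, which uses the project's errors package; the function name and errPrefix parameter are assumptions):

```go
package main

import "fmt"

// validateDataTransferProtection mirrors the check described above:
// the value must be one of the three HDFS protection modes or empty.
// errPrefix stands in for the field path shown in the real error message.
func validateDataTransferProtection(errPrefix, value string) error {
	switch value {
	case "", "authentication", "integrity", "privacy":
		// Valid value; nothing to do.
		return nil
	default:
		return fmt.Errorf("%s.dataTransferProtection must be one of `authentication`, `integrity`, `privacy`, or empty", errPrefix)
	}
}

func main() {
	field := "templates.print-message.inputs.artifacts.message.hdfs"
	fmt.Println(validateDataTransferProtection(field, "authentication"))
	fmt.Println(validateDataTransferProtection(field, "bogus"))
}
```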

@Looka149 Looka149 marked this pull request as ready for review August 19, 2024 10:47
@agilgur5 agilgur5 added the area/artifacts S3/GCP/OSS/Git/HDFS etc label Aug 19, 2024
@Looka149 Looka149 force-pushed the enhancement/hdfs-data-transfer-protection branch from 6cd885f to cacc094 Compare August 20, 2024 08:28
Looka149 (Contributor Author)

@terrytangyuan @sarabala1979
Hi, is there anything else needed to proceed with the approval? Thanks.

	// Do nothing
	default:
		return errors.Errorf(errors.CodeBadRequest, "%s.dataTransferProtection must be one of 'authentication', 'integrity', 'privacy', or empty", errPrefix)
	}
juliev0 (Contributor) commented Sep 4, 2024
Thanks for adding the validation here, although I'm torn on whether we should validate this or not. On one hand, if the ClientOptions ever change for DataTransferProtection then we'd have to update this when the version in go.mod gets updated. I'm wondering if it may be easy for someone to update the go.mod and then forget to check this part - what do you think?

Looka149 (Contributor Author)

I agree that your concern is valid. I initially added validation based on the three modes specified in https://github.com/colinmarc/hdfs/blob/master/client.go#L24-L28. While I don't expect these options to change frequently, I agree that if they do change in the future, this validation might be easily overlooked.


juliev0 commented Sep 6, 2024

@Looka149 Your PR looks good. I can merge it. Only thing I'm thinking is that it may be better not to validate on these specific strings which could theoretically change, per my comment, but I may be convinced otherwise. Thanks.


Looka149 commented Sep 8, 2024

@juliev0 Thank you for reviewing my PR, I agreed with your feedback and have removed the validation. Thank you.

Signed-off-by: Injun Baeg <ij.baeg@navercorp.com>
…with future updates

Signed-off-by: Injun Baeg <ij.baeg@navercorp.com>
@Looka149 Looka149 force-pushed the enhancement/hdfs-data-transfer-protection branch from 68b0013 to ca34b16 Compare September 8, 2024 16:11

juliev0 commented Sep 9, 2024

/retest


juliev0 commented Sep 10, 2024

@Looka149 assuming the CI failures aren't related to your change, would you mind adding empty commits until the CI passes?


Looka149 commented Sep 10, 2024

@juliev0 Hi. I have merged the main branch, and the CI tests have completed successfully. Thanks.

@juliev0 juliev0 merged commit 1b0f008 into argoproj:main Sep 10, 2024
27 checks passed
@agilgur5 agilgur5 added this to the v3.6.0 milestone Sep 10, 2024
@agilgur5 agilgur5 changed the title feat: support data transfer protection in HDFS artifacts (#13482) feat: support data transfer protection in HDFS artifacts. Fixes #13482 Sep 10, 2024
	options := hdfs.ClientOptions{
		Addresses:              addresses,
		DataTransferProtection: dataTransferProtection,
agilgur5 (Contributor) commented Sep 10, 2024

To make artifacts more sustainable (other than plugins #5862), I've been arguing we should perhaps accept arbitrary YAML string output more often. E.g. in this case:

clientOptions: |
  dataTransferProtection: true

That way we don't have to add a new feature every time a user wants to configure something else

Although it doesn't apply in every case, to be fair
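The "arbitrary options block" idea above can be sketched with a hypothetical parser that maps flat key: value lines onto a local stand-in for the HDFS client's option struct (clientOptions and parseClientOptions are illustrative names, not Argo or hdfs-library APIs; a real implementation would unmarshal YAML with a library such as sigs.k8s.io/yaml):

```go
package main

import (
	"fmt"
	"strings"
)

// clientOptions is a local stand-in for the HDFS client's option struct
// (the real one is hdfs.ClientOptions from github.com/colinmarc/hdfs).
type clientOptions struct {
	DataTransferProtection string
}

// parseClientOptions maps flat `key: value` lines onto known option
// fields, so exposing a new client option would need no new Argo field.
func parseClientOptions(raw string) (clientOptions, error) {
	var opts clientOptions
	for _, line := range strings.Split(strings.TrimSpace(raw), "\n") {
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			return opts, fmt.Errorf("malformed option line: %q", line)
		}
		switch strings.TrimSpace(key) {
		case "dataTransferProtection":
			opts.DataTransferProtection = strings.TrimSpace(value)
		default:
			return opts, fmt.Errorf("unknown client option: %q", strings.TrimSpace(key))
		}
	}
	return opts, nil
}

func main() {
	opts, err := parseClientOptions("dataTransferProtection: authentication")
	fmt.Println(opts.DataTransferProtection, err)
}
```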

Successfully merging this pull request may close these issues.

Support data transfer protection in HDFS artifacts