Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HTTPS/certificate authentication support to file transfer service #2474

Merged
merged 24 commits into from
Nov 30, 2023

Conversation

jarhodes314
Copy link
Contributor

@jarhodes314 jarhodes314 commented Nov 21, 2023

Proposed changes

This adds support for HTTPS and certificate authentication to the file transfer service.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

#2475

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Signed-off-by: James Rhodes <jarhodes314@gmail.com>
Signed-off-by: James Rhodes <jarhodes314@gmail.com>
Signed-off-by: James Rhodes <jarhodes314@gmail.com>
Copy link

codecov bot commented Nov 21, 2023

Codecov Report

Merging #2474 (55b8577) into main (b216239) will decrease coverage by 0.1%.
The diff coverage is 70.7%.

Additional details and impacted files
Files Coverage Δ
crates/common/axum_tls/src/acceptor.rs 94.9% <100.0%> (+3.1%) ⬆️
crates/common/axum_tls/src/lib.rs 100.0% <ø> (ø)
crates/common/axum_tls/src/redirect_http.rs 20.4% <ø> (ø)
crates/common/download/src/error.rs 28.5% <ø> (ø)
crates/common/log_manager/src/error.rs 0.0% <ø> (ø)
crates/common/tedge_config/src/lib.rs 0.0% <ø> (ø)
crates/common/upload/src/error.rs 85.7% <ø> (ø)
...extensions/c8y_config_manager/src/plugin_config.rs 74.3% <ø> (ø)
...rates/extensions/tedge_config_manager/src/error.rs 0.0% <ø> (ø)
...rates/extensions/tedge_config_manager/src/tests.rs 89.1% <100.0%> (ø)
... and 25 more

... and 2 files with indirect coverage changes

Signed-off-by: James Rhodes <jarhodes314@gmail.com>
Signed-off-by: James Rhodes <jarhodes314@gmail.com>
Copy link
Contributor

@Bravo555 Bravo555 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have only one request - adding some documentation to the new axum_tls crate. I think it having doc comments for our crates especially (and sometimes for some complex modules) significantly helps with understanding and why the code is the way it is, and in this case, I think even a short summary would be helpful.

crates/core/tedge_agent/src/agent.rs Show resolved Hide resolved
crates/core/tedge_agent/src/agent.rs Outdated Show resolved Hide resolved
crates/core/tedge_agent/src/file_transfer_server/actor.rs Outdated Show resolved Hide resolved
crates/core/tedge_agent/src/file_transfer_server/error.rs Outdated Show resolved Hide resolved
crates/common/axum_tls/src/lib.rs Outdated Show resolved Hide resolved

Once HTTPS is enabled for the file-transfer service, certificate-based authentication can also be enabled.
The directory containing the certificates that the mapper will trust can be configured using `http.ca_path`,
and the agent can be configured to use a trusted certificate using the `http.client.auth.cert_file` and `http.client.auth.key_file`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and the agent can be configured to use a trusted certificate using the `http.client.auth.cert_file` and `http.client.auth.key_file`
and the mapper can be configured to use a trusted certificate using the `http.client.auth.cert_file` and `http.client.auth.key_file`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's true, but the "mapper" in the sentence above is incorrect (as you can probably tell, I copied and pasted it from the auth proxy docs since they're broadly the same, but forgot to change that bit).

Copy link
Contributor

@albinsuresh albinsuresh Nov 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh? There seems to be some misunderstanding about this setting then, as I see @didier-wenzek also suggesting a similar change. This is the client setting, right? Meant to be used by the clients of the agent: which are the mappers and other child devices.

Or these are the client settings to be used by the agent, when it is the client of other services like the c8y.proxy? If that's the case, then you may wanna stress that and also mention the settings to be used by clients of the tedge-agent like the mapper and child devices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've adjusted it to be in line with Didier's comment, as well as correcting the line above, which was also adding to my confusion as that was also wrong.

Comment on lines +36 to +37
## HTTPS and authenticated access
By default, the service is unauthenticated and does not support HTTPS connections.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## HTTPS and authenticated access
By default, the service is unauthenticated and does not support HTTPS connections.
## HTTPS and authenticated access
By default, the service is unauthenticated and does not support HTTPS connections.

docs/src/references/tedge-file-transfer-service.md Outdated Show resolved Hide resolved
crates/core/tedge_agent/src/agent.rs Outdated Show resolved Hide resolved
crates/core/tedge_agent/src/agent.rs Outdated Show resolved Hide resolved
…ent config generation

Signed-off-by: James Rhodes <jarhodes314@gmail.com>
crates/core/tedge_actors/src/errors.rs Outdated Show resolved Hide resolved
Comment on lines +628 to +631
/// The file that will be used as a the server certificate for the File Transfer Service
#[tedge_config(example = "/etc/tedge/device-certs/file_transfer_certificate.pem")]
#[doku(as = "PathBuf")]
cert_path: Utf8PathBuf,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take the time to write scripts and docs on how to setup all these certificates.

I suspect that the current set of settings could be made simpler. For instance, in practice, the tedge-agent running on the main device can use the same certificate to connect the c8y auth proxy and to serve files to child devices. Also, on a child device, a user might wrongly set http.cert_path instead of http.client.auth.cert_file.
It would be good to have a certificate per service instance and to have config settings that reflect that.
Writing scripts and docs will help me to clarify this.

Copy link
Contributor

@didier-wenzek didier-wenzek Nov 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is first raw version for a script (where all names are hard-coded!): https://gist.github.com/didier-wenzek/110b618eda6866814dc02fe4f168ff2d

Using this script, I can confirm that we can configure different levels of authentication (HTTP -> HTTPS with server authentication -> HTTPS with symmetric authentication) for c8y-auth-proxy and file-transfer. This flexibility adds complexity though.

Copy link
Contributor

@albinsuresh albinsuresh Nov 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have a certificate per service instance and to have config settings that reflect that.

I just hope per-service certificates won't make the setup too complicated for clients. Ideally, a child device should be able to connect to both the c8y-proxy and FTS using the same client certificate. At least for our docs/demos, having the same root CA sign all the service certificates as well as the client certificates should simplify things I guess. Just saw that your script is doing exactly that.

Writing scripts and docs will help me to clarify this.

That'd really help. With the agent being a client of the mapper for the proxy and the reverse for the file transfer service, I found the client cert setting twins of both the services really confusing to get my head around and made a lot of errors.

I feel that it'd be even more complicated when a secure tedge-agent on a child device wants to connect to a secure tedge-agent on the main device.

Here is first raw version for a script (where all names are hard-coded!): https://gist.github.com/didier-wenzek/110b618eda6866814dc02fe4f168ff2d

That really helped and that long list of config settings highlights the complexity as well. It all makes sense, but as a regular user I find it extremely cumbersome.

docs/src/references/tedge-file-transfer-service.md Outdated Show resolved Hide resolved
Signed-off-by: James Rhodes <jarhodes314@gmail.com>
Signed-off-by: James Rhodes <jarhodes314@gmail.com>
Signed-off-by: James Rhodes <jarhodes314@gmail.com>
Signed-off-by: James Rhodes <jarhodes314@gmail.com>
Signed-off-by: James Rhodes <jarhodes314@gmail.com>
@jarhodes314
Copy link
Contributor Author

Something is wrong. Could my certificate settings be the issue?
Triggering a log upload from the cloud, the final outcome is a 404 error: "Upload failed with Request(reqwest::Error { kind: Status(404), url: Url { scheme: https, cannot_be_a_base: false, username: \", password: None, host: Some(Domain(localhost)), port: Some(8000), path: /tedge/file-transfer/demo-device-777/log_upload/mosquitto-c8y-mapper-2023-11-24T16:45:19.758911928Z, query: None, fragment: None } }).
The error message could be clearer, but this is not my point. The log upload command is executed as expected up to the point where the c8y mapper has to GET the file from file transfer, leading to a 404 and indeed the file is not here: the log plugin (launched using the all-in-one tedge-agent-v1) pretends that that file have been successfully uploaded - but there is no file nor upload errors.

I haven't verified this, but I would imagine the issue is that the agent is not reuploading the file when it receives the 307 redirect response to the HTTPS URL after initially connecting via HTTP. I'll have a look into it on Monday.

So the thing that's happening here is the PUT request to upload the file cannot be retried, so the response to that request is a 307. As far as reqwest is concerned, this isn't an error, the request has succeeded since it's not a 4xx or 5xx status code. That explains why (a) the upload failed and (b) why the log plugin ignored the issue.

I plan to solve this by making a HEAD request to the URL we intend to upload to, and using the final URL from that request as the target to upload to. This is a bit of a hack at the moment since we don't actually implement the HEAD method in the file transfer service, but it's sufficient to generate the desired 307 response in the case where we are using HTTPS. @didier-wenzek does this sound like a reasonable approach to you? I've implemented it locally, and I'm just refining the error handling so we get clear error messages from the log plugin in the case of a failure.

@didier-wenzek
Copy link
Contributor

Something is wrong. Could my certificate settings be the issue?
Triggering a log upload from the cloud, the final outcome is a 404 error: "Upload failed with Request(reqwest::Error { kind: Status(404), url: Url { scheme: https, cannot_be_a_base: false, username: \", password: None, host: Some(Domain(localhost)), port: Some(8000), path: /tedge/file-transfer/demo-device-777/log_upload/mosquitto-c8y-mapper-2023-11-24T16:45:19.758911928Z, query: None, fragment: None } }).
The error message could be clearer, but this is not my point. The log upload command is executed as expected up to the point where the c8y mapper has to GET the file from file transfer, leading to a 404 and indeed the file is not here: the log plugin (launched using the all-in-one tedge-agent-v1) pretends that that file have been successfully uploaded - but there is no file nor upload errors.

I haven't verified this, but I would imagine the issue is that the agent is not reuploading the file when it receives the 307 redirect response to the HTTPS URL after initially connecting via HTTP. I'll have a look into it on Monday.

So the thing that's happening here is the PUT request to upload the file cannot be retried, so the response to that request is a 307. As far as reqwest is concerned, this isn't an error, the request has succeeded since it's not a 4xx or 5xx status code. That explains why (a) the upload failed and (b) why the log plugin ignored the issue.

I plan to solve this by making a HEAD request to the URL we intend to upload to, and using the final URL from that request as the target to upload to. This is a bit of a hack at the moment since we don't actually implement the HEAD method in the file transfer service, but it's sufficient to generate the desired 307 response in the case where we are using HTTPS. @didier-wenzek does this sound like a reasonable approach to you? I've implemented it locally, and I'm just refining the error handling so we get clear error messages from the log plugin in the case of a failure.

I'm okay to send a first HEAD request, even if not usual nor ideal. Could the client issue directly an HTTPS request, guessing the scheme from the config?

…n mapper

Signed-off-by: James Rhodes <jarhodes314@gmail.com>
@jarhodes314
Copy link
Contributor Author

Something is wrong. Could my certificate settings be the issue?
Triggering a log upload from the cloud, the final outcome is a 404 error: "Upload failed with Request(reqwest::Error { kind: Status(404), url: Url { scheme: https, cannot_be_a_base: false, username: \", password: None, host: Some(Domain(localhost)), port: Some(8000), path: /tedge/file-transfer/demo-device-777/log_upload/mosquitto-c8y-mapper-2023-11-24T16:45:19.758911928Z, query: None, fragment: None } }).
The error message could be clearer, but this is not my point. The log upload command is executed as expected up to the point where the c8y mapper has to GET the file from file transfer, leading to a 404 and indeed the file is not here: the log plugin (launched using the all-in-one tedge-agent-v1) pretends that that file have been successfully uploaded - but there is no file nor upload errors.

I haven't verified this, but I would imagine the issue is that the agent is not reuploading the file when it receives the 307 redirect response to the HTTPS URL after initially connecting via HTTP. I'll have a look into it on Monday.

So the thing that's happening here is the PUT request to upload the file cannot be retried, so the response to that request is a 307. As far as reqwest is concerned, this isn't an error, the request has succeeded since it's not a 4xx or 5xx status code. That explains why (a) the upload failed and (b) why the log plugin ignored the issue.
I plan to solve this by making a HEAD request to the URL we intend to upload to, and using the final URL from that request as the target to upload to. This is a bit of a hack at the moment since we don't actually implement the HEAD method in the file transfer service, but it's sufficient to generate the desired 307 response in the case where we are using HTTPS. @didier-wenzek does this sound like a reasonable approach to you? I've implemented it locally, and I'm just refining the error handling so we get clear error messages from the log plugin in the case of a failure.

I'm okay to send a first HEAD request, even if not usual nor ideal. Could the client issue directly an HTTPS request, guessing the scheme from the config?

It could issue an HTTPS request directly, but...

  • in manual testing of the feature, I found the HEAD request necessary to produce a clear error when certificates are required and the client is not providing them
  • if client certificates are not being used, we can only infer the values from the configuration on the main device, a child device shouldn't be configured with http.cert_path and http.key_path, so we would only save effort in certain cases
  • the code changes required in the uploader to support this are agnostic to the redirect, they simply perform the HEAD request and use the final URL of the request for the subsequent PUT request. If we were to possibly attempt the request using HTTPS first, this would add a bit of complexity (though this is a minor point and wouldn't dissuade me per se)

I think we're already on the verge of having too many configurations for this feature, I think making this "just work" at the cost of an additional HTTP request is worthwhile compared to adding another configuration.

Signed-off-by: James Rhodes <jarhodes314@gmail.com>
Copy link
Contributor

@albinsuresh albinsuresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

The unit test coverage is definitely good, but shouldn't we have some integration tests as well, which tests a secure setup end-to-end? May be just one test that validates secure access to the file-transfer service with curl or something.

But, to really make sure that our own components can work together when security is enabled, the following tests would also be required:

  1. tedge-log-plugin on child device connecting to secure tedge
  2. tedge-config-plugin on child device connecting to secure tedge
  3. tedge-agent on child device connecting to secure tedge

And the variations of the above cases as well: authenticated client and unauthenticated client. These tests can later be used as a reference for our docs on how to configure a secure deployment.

Since adding all those tests would be a fairly big task, it can be done in follow-up tasks and someone else should be able to support you with those tests.

crates/common/log_manager/src/error.rs Outdated Show resolved Hide resolved
@@ -214,8 +214,9 @@ impl CumulocityConverter {
let download = match download_result {
Err(err) => {
let smartrest_error = SmartRestSetOperationToFailed::new(
CumulocitySupportedOperations::C8yLogFileRequest,
format!("Upload failed with {:?}", err),
// TODO test I'm the right message type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks right. The previous op type: CumulocitySupportedOperations::C8yLogFileRequest was clearly wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised to see this was how we handle failed operations. We must already know this value is order to execute the operation, so all this function should need to provide is the failure reason, it shouldn't care what the operation is called. This is a particularly critical thing to handle correctly due to Cumulocity's assumption that devices always complete operations, and reproducing this failure is pretty difficult in order to test it (and expensive to test since it requires an integration test if you want more of a guarantee than "hopefully the test author wrote the correct value in their test").

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must already know this value is order to execute the operation, so all this function should need to provide is the failure reason, it shouldn't care what the operation is called.

It's partly due to the fact that the operation status transitions in SmartREST is generic and requires the operation type to be sent in that message. That same design was adopted here as well, in generic structures like SmartRestSetOperationToFailed which leads to such accidental errors. But, I agree that some sort of "current-operation-aware" warpper should have been used here to avoid such mistakes.

Signed-off-by: James Rhodes <jarhodes314@gmail.com>
@@ -30,7 +30,7 @@ Request with non-existing log type
... fragments={"c8y_LogfileRequest":{"dateFrom":"${start_timestamp}","dateTo":"${end_timestamp}","logFile":"example1","searchText":"first","maximumLines":10}}
Operation Should Be FAILED
... ${operation}
... failure_reason=.*No such file or directory for log type: example1
... failure_reason=.*No logs found for log type example1
Copy link
Contributor Author

@jarhodes314 jarhodes314 Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created (and will likely work on soon if no one objects) #2487 for including the quotes in the error message. It causes a bunch of unit test failures making the required change locally, so that will be a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objections. But, in the meantime you can also use single quotes or back quotes to get around that problem.

@didier-wenzek
Copy link
Contributor

didier-wenzek commented Nov 29, 2023

I think we're already on the verge of having too many configurations for this feature, I think making this "just work" at the cost of an additional HTTP request is worthwhile compared to adding another configuration.

I confirm that things just work.

  • Certificate-based authentication of c8y-auth-proxy, file-transfer and their clients
  • Software download and install
  • Log upload (using tedge-log-plugin or tedge-agent-v1 that includes log upload + TLS support).
  • Configuration file upload and download (using tedge-config-plugin or tedge-agent-v1 that includes config management + TLS support).

Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. Thank you for this quite involved and really useful PR.

@@ -187,6 +187,7 @@ impl CumulocityConverter {
download_result: DownloadResult,
fts_download: FtsDownloadOperationData,
) -> Result<Vec<Message>, ConversionError> {
dbg!(&download_result);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dbg!(&download_result);

@jarhodes314
Copy link
Contributor Author

@albinsuresh @didier-wenzek I've added some integration tests to the configuration plugin. There is currently one case where I believe I have found a genuine bug, where a configuration update is marked as successful in cloud but the file isn't actually updated on disk. This only appears to happen if a child device is configured to use client certificates, but the mapper is not configured to use client certificates. Taking a configuration fails as expected with a helpful error in this case, so it should be obvious that something is wrong.

I think this is a sufficiently niche error in a niche feature that it doesn't justify blocking the merging of this PR. I'll start to investigate the root cause now, but unless anyone objects, I will merge this PR when the tests are passing and @albinsuresh has approved.

@didier-wenzek
Copy link
Contributor

I think this is a sufficiently niche error in a niche feature that it doesn't justify blocking the merging of this PR. I'll start to investigate the root cause now, but unless anyone objects, I will merge this PR when the tests are passing and @albinsuresh has approved.

I'm okay with that.

Signed-off-by: James Rhodes <jarhodes314@gmail.com>
Copy link
Contributor

@albinsuresh albinsuresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. The integration test also seems pretty complete. The rest of the variations can be added later on (even by someone else).

@@ -238,7 +238,7 @@ impl ConfigManagerActor {
}
Err(err) => {
let error_message = format!(
"Failed uploading configuration snapshot to file-transfer service: {}",
"tedge-configuration-plugin failed uploading configuration snapshot: {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the name of the plugin directly here won't be a good idea, as this actor is reused in the c8y-config-plugin as well as the tedge-agent-v1. So, some generic reference like "config manager" would be better.

@jarhodes314 jarhodes314 merged commit ea254bf into thin-edge:main Nov 30, 2023
18 checks passed
@jarhodes314 jarhodes314 deleted the feat/file-transfer-tls branch December 27, 2023 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants