Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configuration: Update default web endpoints, and make them configurable #1718

Merged
merged 39 commits into from
Sep 13, 2022

Conversation

o0Ignition0o
Copy link
Contributor

@o0Ignition0o o0Ignition0o commented Sep 7, 2022

Fixes #1500

Configuration: Update metrics and healthcheck web endpoints, and make them configurable (#1500)

The web endpoints exposed by the router listen to 127.0.0.1 by default, and the ports and paths for health check and prometheus have changed.

Here's the list of the endpoints exposed by the router:

While you could previously only customize the path for these endpoints, you can now customize the full IP address, PORT and PATH.

In order to enable this new feature, various server attributes such as listen, graphql_path and landing_page moved to more relevant sections.
Likewise, introspection and preview_defer_support have moved from the server section to the supergraph section:

This previous configuration:

server:
  listen: 127.0.0.1:4000
  graphql_path: /graphql
  health_check_path: /health
  introspection: false
  preview_defer_support: true
  landing_page: true
telemetry:
  metrics:
    prometheus:
      enabled: true

Now becomes:

# landing_page configuration
sandbox: 
  listen: 127.0.0.1:4000
  path: /
  enabled: false # default
# graphql_path configuration
supergraph:
  listen: 127.0.0.1:4000
  path: /
  introspection: false
  preview_defer_support: true
# health_check_path configuration
health-check:
  listen: 127.0.0.1:9494
  path: /health
  enabled: true # default
# prometheus scraper configuration
telemetry:
  metrics:
    prometheus:
      listen: 127.0.0.1:9090
      path: /metrics
      enabled: true

By @o0Ignition0o in #1718

@github-actions

This comment has been minimized.

Copy link
Contributor

@garypen garypen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

I think the helm chart will need to be updated to reflect these changes. At least the health check stuff and I expect a lot of internal yaml attribute chasing will need to be updated. Let me know if you want me to work on the helm stuff.

docs/source/configuration/health-checks.mdx Outdated Show resolved Hide resolved
@o0Ignition0o o0Ignition0o marked this pull request as ready for review September 12, 2022 10:12
@o0Ignition0o
Copy link
Contributor Author

o0Ignition0o commented Sep 12, 2022

Not sure why the CI fails on linux, and just on linux, rerunning the job

Edit: it just passed Oo

@@ -120,52 +165,57 @@ where
})
}

fn ensure_listenaddrs_consistency(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could there be a comment here explaining the goal of this function?


#[tokio::test]
#[should_panic(
expected = "Failed to create server factory: DifferentListenAddrsOnSamePort(127.0.0.1, 0.0.0.0, 4000)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the reason behind this? Do we consider that a sandbox with a different IP but same port would be a misconfiguration? What about this:

  • graphql server listening on 0.0.0.0:4000
  • sandbox listening on <IP accessible only from the internal network>:4000

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want the test to fail if two different listenaddrs want to bind to the same port. This would be a security issue (binding to 127.0.0.1:4000 and 0.0.0.0:4000 would effectively mean the 127.0.0.1 endpoint is accessible from WAN)

The reason why this test fails this way is because we're using the default graphql endpoint (127.0.0.1:4000) and 0.0.0.0:4000 sandbox, which would "leak the graphql endpoint".

The example you have in mind is currently not supported (this would require more efforts like binding to 0.0.0.0:4000 and then manually "route" the connections according to the configuration, so we decided to not support it yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a security issue (binding to 127.0.0.1:4000 and 0.0.0.0:4000 would effectively mean the 127.0.0.1 endpoint is accessible from WAN)

It looks like I don't have the right understanding of this PR then. In which case could this happen? What would it look like in the configuration file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's pretend you wanna expose the graphql endpoint and the prometheus one on the same port:

sandbox:
  listen: 0.0.0.0:4000
  path: /
  enabled: true # NOT MANDATORY?
telemetry:
  metrics:
    prometheus:
      listen: 127.0.0.1:4000
      path: /metrics
      enabled: true

With this setup /metrics would be exposed to WAN AFAICT, which would be a security issue.

ensure_listenaddrs_consistency makes sure the configuration doesn't have this, and raises an error if it's the case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so the way I imagined the feature was more aligned with this:

The example you have in mind is currently not supported (this would require more efforts like binding to 0.0.0.0:4000 and then manually "route" the connections according to the configuration, so we decided to not support it yet.

But since it is not implemented yet we can end up with a security issue, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can’t, because the router explicitly refuses to start if that’s the case

#[serde(default = "default_defer_support")]
pub(crate) preview_defer_support: bool,
fn default_health_check_listen() -> ListenAddr {
SocketAddr::from_str("127.0.0.1:9090").unwrap().into()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasn't the 9090 port supposed to be used only by the prometheus endpoint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, the comment i ve worked from says otherwise

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally the health check was on the same port as the graphql server, that's why it was under the .well-known path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is an intentional breaking change, the goal is to not expose it to internet by default.

Copy link
Contributor

@garypen garypen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. I think this is a big improvement on what we had.

NEXT_CHANGELOG.md Outdated Show resolved Hide resolved
apollo-router/src/configuration/mod.rs Outdated Show resolved Hide resolved
apollo-router/src/plugins/csrf.rs Show resolved Hide resolved
Copy link
Contributor

@bnjjj bnjjj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be able to run cargo xtask test locally without killing my local router instance

apollo-router/src/axum_http_server_factory.rs Outdated Show resolved Hide resolved
apollo-router/src/axum_http_server_factory.rs Outdated Show resolved Hide resolved
apollo-router/src/axum_http_server_factory.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@BrynCooke BrynCooke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

@o0Ignition0o o0Ignition0o enabled auto-merge (squash) September 13, 2022 11:05
@o0Ignition0o o0Ignition0o merged commit 817ba70 into main Sep 13, 2022
@o0Ignition0o o0Ignition0o deleted the igni/graphql_config branch September 13, 2022 11:18
@bnjjj bnjjj restored the igni/graphql_config branch September 13, 2022 12:03
@abernix abernix deleted the igni/graphql_config branch May 16, 2024 10:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reconsider the organization of configuration sections
5 participants