-
To clarify, would the DPG be responsible for terminating TLS or would that happen inside the connectors? And if the latter, how does the connector get its certificate? I'm worried that we might run into per-domain rate limits fairly quickly if we use ACME, unless we're planning on running our own server for that...
-
Just wanted to update here after @jgraettinger and I discussed another detail. There's plenty of prior art for how to handle this, and I found Teleport's design particularly informative. Riffing on that a bit, I think we'd want something like
-
Another aspect I'd like to talk through is TLS certificates. Our current approach to TLS certificates is to use the The obvious solution here is to just use a wildcard certificate, such as for
-
IT'S ALIVE! ...finally

High-level guide / example

Start with a connector that listens on a port. To start with a very simple use case, I hacked up a
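As a rough, hypothetical stand-in for it (not the actual connector), the serving side of "a connector that listens on a port" can be as simple as a plain HTTP listener:

```go
// Hypothetical stand-in for a connector's serving side: a plain HTTP
// listener on a fixed port. The real connector also speaks the capture
// protocol over stdin/stdout; that part is omitted here.
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello from inside the connector container")
	})
	// Port 8080 is arbitrary; whatever port is used here must match the
	// port exposed in the catalog spec.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```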
Then publish a catalog that uses that connector and exposes the port, as in this example:
I'm then able to connect to the exposed port from outside the data plane.

Note that any port configured in the spec will be exposed to the public internet, without any form of authorization check. In the future, we should be able to build support for authorization checks in DPG using TLS client certificates, which might be desirable for certain scenarios. But I've considered that (far) out of scope for now. We'll want to be very clear in our documentation, though, because it would be easy for some naughty person to do all sorts of harm if they were able to connect to a debug port of a running connector, for example.

How it works

At build time: the task's shards are labeled with the information DPG needs for routing, namely the exposed ports and a hostname label derived from a hash of the task name.

At runtime (DPG): DPG uses the SNI hostname from the TLS client hello to figure out which shard a connection belongs to. This lookup is done using the shard labels that were populated at build time; specifically, DPG does a shard listing for any shards having labels that match the requested hostname. Once the TLS handshake is complete (more on that in the errata), it then starts a bi-directional streaming RPC with the reactor to which the shard is currently assigned. This RPC has an initial handshake that allows the reactor to validate that the connector is (still) assigned and running and allows accessing the given port. If this Proxy RPC handshake doesn't complete successfully, the connection is closed. If it's successful, then DPG just starts copying data from the client connection to the RPC and vice versa.

At runtime (reactor): Connector-init needs to know the port mappings up front, in order for it to translate from the named port in the proxy request to an actual port number that the connector is listening on. I added a way to pass the port mappings to connector-init when the container is started.

The plan from here
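To make that flow concrete, here's a rough, hypothetical sketch of DPG's per-connection routing. The ShardResolver and ReactorTunnel interfaces, the label handling, and the error handling are illustrative stand-ins, not the real APIs:

```go
// Hypothetical sketch of DPG's per-connection routing. The real
// implementation uses gazette's shard listing APIs and a bi-directional
// streaming gRPC Proxy RPC; these interfaces just stand in for them.
package dpg

import (
	"context"
	"crypto/tls"
	"fmt"
	"io"
	"strings"
)

// ShardResolver finds the shard whose labels match the requested hostname,
// returning the address of the reactor it's currently assigned to.
type ShardResolver interface {
	Resolve(ctx context.Context, hostLabel string) (reactorAddr, shardID string, err error)
}

// ReactorTunnel opens the Proxy RPC to a reactor and completes its handshake,
// returning a ReadWriteCloser that carries the proxied bytes.
type ReactorTunnel interface {
	Open(ctx context.Context, reactorAddr, shardID, port string) (io.ReadWriteCloser, error)
}

func proxyConn(ctx context.Context, conn *tls.Conn, port string, shards ShardResolver, tunnels ReactorTunnel) error {
	// Finish the TLS handshake first, so ConnectionState reports the SNI
	// hostname (see the errata for why this can't overlap the Proxy RPC).
	if err := conn.HandshakeContext(ctx); err != nil {
		return fmt.Errorf("TLS handshake: %w", err)
	}
	sni := conn.ConnectionState().ServerName

	// The first DNS label encodes the task (a hash today, a pet-name later),
	// and is matched against the labels applied at build time.
	hostLabel, _, _ := strings.Cut(sni, ".")

	reactorAddr, shardID, err := shards.Resolve(ctx, hostLabel)
	if err != nil {
		return fmt.Errorf("resolving shard for %q: %w", sni, err)
	}

	// The Proxy RPC handshake lets the reactor confirm the shard is still
	// assigned and running, and that the named port may be accessed.
	tunnel, err := tunnels.Open(ctx, reactorAddr, shardID, port)
	if err != nil {
		return fmt.Errorf("proxy handshake: %w", err)
	}
	defer tunnel.Close()

	// From here on, DPG is nothing more than a byte pipe in both directions.
	go func() { _, _ = io.Copy(tunnel, conn) }()
	_, err = io.Copy(conn, tunnel)
	return err
}
```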
UX questions

How should a user learn about the hostname that was generated for a given task? My proposal is that we add an HTTP endpoint to DPG that returns all of the hostnames for a given named task after verifying that the JWT authorizes the user to write to the task. This lets it work with our existing authz, and avoids the need for control-plane to even know about the actual hostnames. That's desirable because of shard splits, which the control plane is unaware of.

I'm thinking it may be good to support an additional, more succinct representation of

I could see an argument for putting

Errata

DPG can no longer handle HTTP/1.1 requests (edit: disregard)
Edit: It turns out that local builds were already not really working, and I needed to do some refactoring in order to get gRPC working locally, anyway. So I refactored the code to use a custom

Why is caching shard resolutions necessary?

Taken from the code comment:
HTTP and ALPN

Many use cases can likely be served without users needing to specify a protocol at all.
For the vast majority of cases, there shouldn't ever be a need to specify more than one ALPN protocol per port (a rough sketch of how per-port protocols could drive ALPN negotiation appears at the end of this comment). The one exception I can think of is

Why wait until the TLS handshake is complete before starting the proxy RPC?

It'd be faster if we could start the Proxy RPC handshake while the TLS handshake is completing. This turns out not to be possible, given how Go's TLS implementation works.

Handling split shards

I haven't yet done anything to handle split shards. If you try to connect to a host that has multiple shards, then DPG just picks one at random. I think this is actually fine for a first pass, but we'll eventually want to support connecting to specific shards of a task. This should be pretty easy once we settle on a way to represent the shard's key and rclock begin values as part of the domain name. Something like
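As promised above, here's a rough, hypothetical sketch of deriving DPG's ALPN offering from per-port protocol config. The PortConfig type and its fields are made up for illustration; they are not the actual catalog spec:

```go
// Hypothetical sketch of building a tls.Config whose ALPN protocol list is
// derived from the ports a task exposes.
package dpg

import "crypto/tls"

// PortConfig is a made-up representation of one exposed port.
type PortConfig struct {
	Name     string // e.g. "metrics"
	Number   uint16 // e.g. 8080
	Protocol string // ALPN protocol name, e.g. "http/1.1", "h2", "mqtt"
}

// tlsConfigFor offers exactly the protocols the task's ports declare, so
// ALPN negotiation can only select something DPG knows how to route. The
// returned map goes from negotiated protocol back to the target port.
func tlsConfigFor(cert tls.Certificate, ports []PortConfig) (*tls.Config, map[string]uint16) {
	byProto := make(map[string]uint16)
	var protos []string
	for _, p := range ports {
		if _, dup := byProto[p.Protocol]; dup {
			continue // assume at most one port per protocol, per the note above
		}
		byProto[p.Protocol] = p.Number
		protos = append(protos, p.Protocol)
	}
	cfg := &tls.Config{
		Certificates: []tls.Certificate{cert},
		NextProtos:   protos, // offered during the TLS handshake
	}
	return cfg, byProto
}
```

After the handshake, `ConnectionState().NegotiatedProtocol` on the `tls.Conn` selects the entry in the returned map, which is what DPG would pass along when it asks the reactor to connect to the container.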
-
I realized I'd forgotten to address an aspect of the security considerations here. We could add a salt (and perhaps use a different hash algo) to make the hostnames more "secret". But there would still be significant vulnerabilities remaining, which would make me think twice before exposing a debug port to the open internet without any authentication. DNS queries and the TLS client hello will both typically include the full plaintext hostname.

So I think at this point, our stance should be: don't expose any sensitive debug ports unless the connector itself is enforcing some sort of authentication, and generally treating its network traffic as "untrusted". This seems like a bit of a bummer, because as a connector developer I wouldn't want to pull in a ton of auth-enforcement code just for a debug endpoint. The dangers of exposing the port are easy to forget when the hostname seems to resemble an unknowable secret. You might think it's fine to just expose the port for a brief period of time, and then un-expose it when you're done debugging. And that might honestly be fine in some cases. But I don't think just using a "secret" domain label would ever be considered "secure". IANAE, so I'd be especially hesitant to rely on that approach, even on a short-term basis.

So I think we want to aim for having connectors all enforce authentication, even (especially) for things like debug ports (like for a debugger; pprof is maybe not as bad). It'd be nice if we could figure out some common code for dealing with that authentication, and re-use it across all our connectors. I'll leave that for a separate discussion, though.

TLS Client Authentication offers another possible path for exposing things like debug ports securely, while allowing authentication to be handled entirely by DPG. I've considered this very far out of scope for the short term, but I mention it here as a possible future solution if wrapping all connector endpoints with authentication becomes too onerous. The downside would be that control-plane would need to get involved with managing certificates, so the scope does not seem small.
-
I had a good conversation with @jgraettinger last week, and we've worked out some ways to simplify this feature and make it easier to use. The biggest change is that it now seems motivated to provide an HTTP proxy server. Doing so enables DPG to enforce authZ based on JWTs issued by the control plane. This in turn can enable easy and seamless integration of services running in connector containers with the Flow UI. It also gives us a way to have "private" ports, which are only accessible to authenticated users, without needing to implement and configure authentication in every single connector, at least for HTTP.

If we're able to provide authenticated access to ports, then there's also less of a reason to force users to configure the exposed ports up front. Put another way, if we're going to authenticate them first, why not just let them connect to any port (except for connector-init's port, of course)? Of course we can't ship a proxy for every protocol, so we'd say that anything other than HTTP would just be treated as plain old TCP. But I think this raises the bar for the UX a bit, and the framing of

So the new stance is that you don't need to have
From a user's perspective, there's no longer any need to explicitly configure ports that you want to be able to connect to, as long as those ports are for HTTP. For debug ports and such, you just get an auth token and connect. If you want to use another protocol besides HTTP, then you need to mark the port as
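As a sketch of what that could look like for HTTP (the TokenVerifier interface, the public flag, and the backend wiring are all hypothetical; the real check would validate the control plane's JWTs and authorization rules):

```go
// Hypothetical sketch of the HTTP-proxy behavior described above: public
// ports pass straight through, while private ports require a control-plane
// JWT that authorizes access to the task.
package dpg

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// TokenVerifier validates a control-plane JWT and reports whether it grants
// access to the named task.
type TokenVerifier interface {
	Authorizes(rawToken, taskName string) bool
}

type portProxy struct {
	task     string
	public   bool // ports marked public skip the auth check
	verifier TokenVerifier
	backend  *url.URL // where the tunneled connection for this port terminates
}

func (p *portProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if !p.public {
		token, ok := strings.CutPrefix(r.Header.Get("Authorization"), "Bearer ")
		if !ok || !p.verifier.Authorizes(token, p.task) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
	}
	// For HTTP traffic, DPG can act as a plain reverse proxy over the
	// connection tunnel to the connector container.
	httputil.NewSingleHostReverseProxy(p.backend).ServeHTTP(w, r)
}
```

Non-HTTP ports would bypass this handler entirely and be treated as plain TCP, as described above.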
-
Some thoughts on hashes for shard names and an eventual migration to pet-names.

Background: The labels are currently determined by this code in the assemble crate. The current process is to hash the task name and use the hexadecimal value. In the future, we'd like to transition to using "pet-names" instead. Pet names are "randomly" generated names that are intended to be human readable.

The question that @jgraettinger brought up was about the transition from the hashes to the pet names. I obviously don't have a detailed answer, since there are still many unknowns about how we'll implement pet names. But what I can say is that we will definitely need to know the pet name for each task at the point where we generate the shard labels. One way to approach that would be to have a migration that assigns the current hash as the pet name for existing tasks that expose public ports. The thing about pet names is that they need to be persisted by the control plane, and somehow passed in to the publication process to be turned into shard labels. They're also still just opaque ids, so it's fine if a few pre-existing tasks have pet names that are hexadecimal. Alternatively, we could just update the publish process to preserve any existing

I think for right now, it's enough to just know that we'll have some plausible options when it comes time to do the pet-name transition.
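For illustration only (the real code lives in the Rust assemble crate, and the hash algorithm and encoding details there may differ), the current scheme amounts to something like:

```go
// Illustration of today's hostname label: a truncated hex digest of the
// task name. Under the pet-name plan, this function would instead return
// whatever opaque name the control plane has persisted for the task, which
// is why a hex hash can keep serving as the "pet name" for existing tasks.
package assemble

import (
	"crypto/sha256"
	"encoding/hex"
)

// hostLabelForTask derives the DNS label used in the task's hostname.
func hostLabelForTask(taskName string) string {
	sum := sha256.Sum256([]byte(taskName))
	// Truncated for a readable subdomain; the exact length and hash choice
	// are details of the real implementation.
	return hex.EncodeToString(sum[:])[:16]
}
```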
-
@jgraettinger and I have had a side conversation going about push-based ingestion, and I'd like to get it more out in the open and invite feedback and contributions from the broader group.
The high-level idea is to reframe push-based ingestions as regular captures, by allowing TCP connections from the public internet to running connector containers. The connector container would stand up a web server and use the regular capture protocol to ingest documents in response to web requests. The DPG would act as an L4 load balancer that routes TCP traffic to the container. Working at L4 enables it to work with a wide variety of protocols (Kafka and MQTT would both be pretty appealing). The end goal is that we should be able to handle a new type of ingest protocol just by implementing a new connector for it. There are lots of other interesting possibilities that open up, though, once we allow arbitrary network traffic into containers!
Of course the big questions are around how to route traffic between DPG and the connector container. Here's what we're thinking.
The key enabler is SNI, which is a ubiquitous TLS extension that embeds the hostname of the server in the TLS client hello message. The DPG would use that hostname to map a connection to the proper shard. The hostname would need to embed both the shard id and the data-plane hostname. Something like
<pet-name>-<key-begin>-<rclock-begin>.<data-plane>.estuary.dev
seems decent, but we could also just use a hash of the shard id as the subdomain, or maybe something else.

Then DPG needs to somehow create a connection to the connector container. There are lots of different ways we could do this. We talked through a number of possible approaches, including using internal DNS, k8s Services, or even embedding another L4 balancer in flow-reactor. It doesn't feel like a simple problem, partly because there's still a lot of uncertainty around relying on k8s. At this point, we're thinking it makes sense to have flow-reactor provide a simple gRPC service that supports connection tunneling. I can expand on this in another comment if anyone wants more of the backstory.
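To make the hostname idea concrete, here's a hypothetical parse of that shape on the DPG side (assuming the `<pet-name>-<key-begin>-<rclock-begin>.<data-plane>.estuary.dev` format above; a hash-of-shard-id subdomain would obviously parse differently):

```go
// Hypothetical parsing of one candidate hostname format. This is just a
// sketch of the shape of the thing, not a settled scheme.
package dpg

import (
	"fmt"
	"strings"
)

type shardAddress struct {
	PetName     string
	KeyBegin    string // hex-encoded key range begin
	RClockBegin string // hex-encoded rclock range begin
	DataPlane   string
}

func parseSNI(serverName string) (shardAddress, error) {
	labels := strings.Split(serverName, ".")
	if len(labels) < 3 {
		return shardAddress{}, fmt.Errorf("unexpected hostname %q", serverName)
	}
	parts := strings.Split(labels[0], "-")
	if len(parts) < 3 {
		return shardAddress{}, fmt.Errorf("unexpected subdomain %q", labels[0])
	}
	// A pet-name may itself contain dashes, so take the last two segments
	// as the key and rclock begin values. Assumes the data-plane name is a
	// single DNS label.
	return shardAddress{
		PetName:     strings.Join(parts[:len(parts)-2], "-"),
		KeyBegin:    parts[len(parts)-2],
		RClockBegin: parts[len(parts)-1],
		DataPlane:   labels[1],
	}, nil
}
```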
ALPN would also be key to allowing connector containers to listen on multiple ports. An example would be having an HTTP server for a /debug/pprof endpoint on port 80, and also an MQTT server on port 1883. The DPG would need to know which protocols the connector supports, so it could complete ALPN negotiation. It would then specify the chosen protocol when connecting to the broker to create the connection tunnel. The broker uses this protocol name to determine which port to connect to.

With this approach, DPG would not be concerned with authentication of requests. The connector container would do that itself if it chose to. For example, we could have an HTTP ingestion connector that allows you to set a Bearer token as part of the endpoint config, and it would require that requests present that token.
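A minimal sketch of that connector-side check (the endpoint-config field and handler wiring are hypothetical, not an existing connector's config):

```go
// Hypothetical sketch of an HTTP ingestion connector requiring the bearer
// token configured in its endpoint config.
package connector

import (
	"crypto/subtle"
	"net/http"
	"strings"
)

// EndpointConfig is a made-up endpoint configuration with a required token.
type EndpointConfig struct {
	RequireToken string `json:"requireToken"`
}

// requireBearer rejects any request that doesn't present the configured token.
func requireBearer(cfg EndpointConfig, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got, ok := strings.CutPrefix(r.Header.Get("Authorization"), "Bearer ")
		// Constant-time compare so the token can't be guessed byte-by-byte.
		if !ok || subtle.ConstantTimeCompare([]byte(got), []byte(cfg.RequireToken)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```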
One downside to this approach would be that we couldn't give clients a reasonable error message if the shard's connector container is down. They'd just get something like "TLS handshake error: server offered no protocols". IMO that's not much to give up in the tradeoff, though.
That's pretty much the sketch. In terms of a concrete plan, I think it makes sense to try building a POC so we can kick the tires and evaluate next steps. That POC is probably not an immediate priority, but something we should get to within the next month or so, since it might cause us to re-evaluate our approach on a number of other features.