Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support deploying Arroyo to a nomad cluster #50

Merged
merged 6 commits into from
Apr 18, 2023
Merged

Support deploying Arroyo to a nomad cluster #50

merged 6 commits into from
Apr 18, 2023

Conversation

mwylde
Copy link
Member

@mwylde mwylde commented Apr 18, 2023

This PR adds a number of improvements that allow an arroyo cluster to be deployed to Nomad. There is a complementary docs PR (ArroyoSystems/arroyo-docs#7) and a new repo (https://github.com/ArroyoSystems/arroyo-nomad-pack) that contains the Nomad orchestration code.

The changes in this PR include:

  • Added the ability to inject the API endpoint into the HTML for the frontend, which is needed to host the API on arbitrary hosts/ports as occurs in a cluster scheduler like Nomad
  • The controller now checks for a proper postgres connection on startup, and fails if it cannot connect (addresses Controller continues in degraded state when psql connection fails #22)
  • The database port is now configurable
  • Simplifies S3 configuration; the env vars S3_REGION and S3_BUCKET now control all usages of S3
  • Richer configuration of ports for services; all services support ADMIN_PORT and GRPC_PORT configurations, and this can be overridden for each services with {SERVICE}_{ADMIN, GRPC}_PORT, for example API_ADMIN_PORT
  • Added Dockerfiles for arroyo-services (which runs arroyo-api and arroyo-controller) and arroyo-compiler
  • arroyo-compiler now has a batch mode that can be used to compile jobs without requiring a persistent service
  • Added just commands for building docker images

There is also a breaking change. In order to more seamlessly support using a single bucket for artifacts and checkpoints, checkpoint are now stored in {BUCKET}/{job_id}/checkpoints. This will break existing jobs that expect checkpoints to be in the former location.

@mwylde mwylde requested a review from jacksonrnewhouse April 18, 2023 00:57
Copy link
Contributor

@jacksonrnewhouse jacksonrnewhouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

let query = CompileQueryReq::decode(&*query).expect("Failed to decode query request");

let resp = service.compile(query).await.unwrap();
println!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be an info!()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's writing json to std out so that it can be consumed by whatever is calling it. If it were an info!, that would send it to the logging system instead.

@@ -447,6 +448,18 @@ impl ControllerServer {
.create_pool(Some(deadpool_postgres::Runtime::Tokio1), NoTls)
.unwrap();

// test that the DB connection is valid
let _ = pool.get().await.unwrap_or_else(|e| {
panic!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it standard to panic in these cases, rather than either returning an error or exiting?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Panicing will produce an error and exit. Eventually we'll probably want clearer error handling for users, but while most of our users are going to be developers working on the system I think it's more helpful to have the context that a panic gives you (like the line).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants