Melissa/log 783 production guide #426

Merged 4 commits on Jan 26, 2024
1 change: 1 addition & 0 deletions src/data/sidebar.ts
@@ -222,6 +222,7 @@ export const sidebarContent: ISidebarContent = [
makePage("Public API", "reference"),
makePage("Templates", "reference"),
makePage("Pricing", "reference"),
+makePage("Production Readiness Checklist", "reference"),
],
},
{
159 changes: 159 additions & 0 deletions src/docs/reference/production-readiness-checklist.md
@@ -0,0 +1,159 @@
---
title: Production Readiness Checklist
---

*Is your application ready for production?*

On this page, we'll explore key areas of production readiness and suggest actions to address each one:

- [Performance and Reliability](#performance-and-reliability)
- [Observability and Monitoring](#observability-and-monitoring)
- [Quality Assurance](#quality-assurance)
- [Security](#security)
- [Disaster Recovery](#disaster-recovery)

---

## Performance and Reliability

Ensuring your application is performant and reliable under changing conditions like load and external latency is critical for production readiness. Consider taking the following actions:

**✓ Serve your application from the right region**

- Deploying your application as close to your users as possible minimizes the number of network hops, reducing latency and improving performance.

Railway offers multiple [deployment regions](/reference/deployment-regions) around the globe.

You may also consider implementing a CDN to cache server responses on an edge network.

**✓ Use private networking between services**

- When communicating between services over the public network, latency is introduced by the network hops that requests must make to reach their destination.

To reduce latency, ensure communication between services in the same project and environment happens over the [private network](/reference/private-networking).
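As a rough illustration, a service can address a sibling service by its private hostname instead of its public domain. The sketch below assumes private hostnames take the form `<service>.railway.internal` and that the target serves plain HTTP on a known port; the helper `internalUrl` and the service name `api` are hypothetical, so confirm the hostname format in the private networking reference.

```typescript
// Hypothetical helper: builds a private-network URL for a sibling service.
// Assumption: private hostnames look like `<service>.railway.internal`.
// Assumption: the target service serves plain HTTP on the given port.
function internalUrl(serviceName: string, port: number, path: string = "/"): string {
  // Reject names that could not be valid hostname labels.
  if (!/^[a-z0-9-]+$/.test(serviceName)) {
    throw new Error(`invalid service name: ${serviceName}`);
  }
  return `http://${serviceName}.railway.internal:${port}${path}`;
}

// Example: address a hypothetical "api" service listening on port 8080.
const healthUrl: string = internalUrl("api", 8080, "/health");
```

Keeping the hostname logic in one helper makes it easy to switch a call site between public and private addresses while testing.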

**✓ Configure a restart policy**

- Services can crash for different reasons, frequently due to unhandled exceptions in code, and it is important to implement a strategy to mitigate performance degradation and user impact.

Ensure that you have properly configured your service's [restart policy](/guides/healthchecks-and-restarts#restart-policy).
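A restart policy is a safety net; it works best alongside code that handles the errors it can anticipate. As a minimal sketch, a handler can be wrapped so that a thrown error produces a fallback value instead of crashing the process. The `safely` wrapper and the `parsePort` example are hypothetical and not part of Railway; an async variant would `await` the handler inside the same `try` block.

```typescript
// Hypothetical wrapper: runs a handler and converts any thrown error
// into a fallback result instead of letting it crash the process.
function safely<In, Out>(
  handler: (input: In) => Out,
  fallback: (err: unknown) => Out
): (input: In) => Out {
  return (input: In): Out => {
    try {
      return handler(input);
    } catch (err) {
      // Errors that reach this point are handled; anything that still
      // escapes elsewhere is left to the restart policy.
      return fallback(err);
    }
  };
}

// Example: parse a port number, falling back to a default on bad input.
const parsePort = safely(
  (raw: string): number => {
    const port = parseInt(raw, 10);
    if (isNaN(port)) throw new Error(`not a number: ${raw}`);
    return port;
  },
  () => 3000
);
```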

**✓ Configure at least 2 replicas**

- If a service crashes or becomes unavailable due to a long-running request, your application could experience downtime or degraded performance.

Increase the [number of replicas](/guides/optimize-performance#configure-horizontal-scaling) to at least 2, so if one instance of your service crashes or becomes unavailable, there is another to continue handling requests.

**✓ Confirm your compute capacity**

- The vCPU and memory capacity of your services greatly impacts their ability to perform efficiently.

The compute allocation for your services is handled automatically by Railway, and the limits are determined by your chosen subscription [plan](/reference/pricing#plans). You should review your plan limits and consider if upgrading is necessary to achieve the desired compute.

---

## Observability and Monitoring

Observability and monitoring refer to tracking the health and performance of your application. Consider taking the following actions to ensure you are able to track your application's health:

**✓ Get familiar with the log explorer**

- When researching an application issue across multiple services, it can be disruptive and time-consuming to move between log views for each service individually.

Familiarize yourself with the [Log Explorer](/guides/logs#log-explorer) so you can query logs across all of your services in one place.

**✓ Set up webhooks and email notifications**

- Ensure you are alerted if the [deployment status](/reference/deployments#deployment-states) of your services changes.

Enable email notifications in your Account Settings to receive these alerts via email.

Set up [webhooks](/reference/deployments#deployment-states) to have the alerts sent to another system, like Slack or Discord.
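If you relay webhooks through your own service, the core of that relay is a small transformation from the incoming event to a chat message. The sketch below is illustrative only: the `DeploymentEvent` shape, its field names, and the set of alert-worthy statuses are assumptions, not the actual webhook payload, so check the webhook documentation for the real fields. The output uses a `text` field, which is what Slack incoming webhooks accept.

```typescript
// Hypothetical payload shape; the real webhook fields may differ.
interface DeploymentEvent {
  service: string;
  environment: string;
  status: string; // e.g. "SUCCESS", "FAILED", "CRASHED" (assumed values)
}

// Statuses that trigger an alert in this sketch (an assumption).
const ALERT_STATUSES: string[] = ["FAILED", "CRASHED"];

// Returns a Slack-style message for alert-worthy events, or null otherwise.
function toChatMessage(event: DeploymentEvent): { text: string } | null {
  if (ALERT_STATUSES.indexOf(event.status) === -1) {
    return null;
  }
  return {
    text: `Deployment of ${event.service} in ${event.environment} is ${event.status}`,
  };
}
```

Filtering before forwarding keeps routine successful deployments from drowning out the alerts that need attention.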

*What's next for observability features in Railway? We have a ton of ideas, but we would love to hear yours in our <a href="https://community.railway.app/feature-request/better-logging-support-1e6f5676" target="_blank">community forums</a>.*

---

## Quality Assurance

Quality assurance involves following practices to ensure changes to your application code meet quality standards before they are deployed to production. Consider the following actions to ensure you're set up for success:

**&check; Implement check suites**

- Common practice is to run a suite of tests, scans, or other automated jobs against your code before it is merged into production. You may want to configure your deployments to wait until those jobs have completed successfully before triggering a build.

Enable [check suites](/guides/github-autodeploys#check-suites) to have Railway wait for your GitHub workflows to complete successfully before triggering a deployment.

**&check; Use environments**

- Maintaining separate environments for production and development is good practice for controlling changes in a production environment.

Consider setting up [environments](/guides/environments) to properly test changes before merging to production.

Additionally, [PR environments](/guides/environments#enable-pr-environments) can be enabled to create environments when PRs are opened against your production branch.

**&check; Use config as code**

- Along with your source code, you can maintain your Railway configuration in a `json` or `toml` file, enabling you to keep track of changes, just as you do with your source code.

Take advantage of [config as code](/guides/config-as-code) to control and track changes to your Railway configuration.
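To give a sense of what such a file might hold, here is a sketch that models a small configuration as a typed TypeScript object and serializes it to JSON. The field names (`restartPolicyType`, `restartPolicyMaxRetries`, `numReplicas`), the `ON_FAILURE` value, and the `railway.json` file name are assumptions made for illustration; verify them against the config as code reference before relying on them.

```typescript
// Assumed subset of the configuration schema; confirm the field names
// against the config as code reference.
interface RailwayConfig {
  deploy: {
    restartPolicyType: string;
    restartPolicyMaxRetries: number;
    numReplicas: number;
  };
}

const config: RailwayConfig = {
  deploy: {
    restartPolicyType: "ON_FAILURE", // assumed value
    restartPolicyMaxRetries: 5,
    numReplicas: 2, // matches the "at least 2 replicas" item above
  },
};

// Serialize with two-space indentation; the result could be committed
// alongside the source code as (for example) railway.json.
const railwayJson: string = JSON.stringify(config, null, 2);
```

Once the file is in version control, changes to the restart policy or replica count show up in code review like any other change.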

**&check; Understand the deployment rollback feature**

- Introducing breaking changes to your application code is sometimes unavoidable, and it can be a headache reverting to previous commits.

Be sure to check out the [deployment rollback feature](/guides/deployment-actions#rollback), in case you need to roll back to a previous deployment.

---

## Security

Protecting your application and user data from malicious threats and vulnerabilities is mission-critical in production applications. Consider the following for peace of mind:

**&check; Use private networking**

- The easiest way to protect your services from malicious threats is to keep them unexposed to the public network.

Secure communication between services in the same project and environment by using the [private network](/reference/private-networking).

**&check; Add a WAF service**

- While Railway does have protections in place at the platform level, we do not currently offer a configurable firewall for users' services.

Consider using a service like Cloudflare to protect your application against attacks.

*In the future, we would love to offer a native WAF solution. If you agree, <a href="https://community.railway.app/feature-request/implement-a-waf-firewall-security-54fe2aaf" target="_blank">let us know</a>.*

---

## Disaster Recovery

Being prepared for major and unexpected issues helps minimize downtime and data loss. Consider taking the following actions to ensure you are prepared:

**&check; Set up an instance of your application in two regions**

- In the event of a major disaster, an entire region may become unavailable.

Using [deployment regions](/reference/deployment-regions), you can deploy an entire instance of your application in another region.

To save on the cost of running a separate instance of your application, use [App Sleep](/reference/app-sleeping) to turn down resource usage on the inactive services.

**&check; Regularly back up your data**

- Data is critical to preserve in many applications. You should ensure you have a backup strategy in place for your data.

Implement a [cron service](/guides/cron-jobs) to dump and store your data backups.

If you use Postgres, check out one of our popular templates - <a href="https://railway.app/template/I4zGrH" target="_blank">PostgreSQL S3 Backups</a>.
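If you write your own backup job for Postgres, two small pieces it will likely need are a timestamped file name and the argument list for `pg_dump`. The following is a minimal sketch rather than a complete job: the helper names are hypothetical, the connection string is assumed to be supplied by the caller, and running the command and uploading the file are left out.

```typescript
// Builds a timestamped file name so successive backups do not overwrite
// each other. Colons are replaced because some storage systems reject them.
function backupFileName(now: Date): string {
  const stamp = now
    .toISOString() // e.g. "2024-01-26T03:00:00.000Z"
    .replace(/\.\d{3}Z$/, "Z") // drop milliseconds
    .replace(/:/g, "-");
  return `backup-${stamp}.dump`;
}

// Returns pg_dump arguments for a custom-format dump written to outFile.
// databaseUrl is a Postgres connection string supplied by the caller.
function pgDumpArgs(databaseUrl: string, outFile: string): string[] {
  return [`--dbname=${databaseUrl}`, "--format=custom", `--file=${outFile}`];
}
```

A cron service could pass these arguments to `pg_dump` and then copy the resulting file to external storage.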

*We are exploring ways to implement a native solution for backing up your data. If you have any thoughts, we would love to hear from you in our <a href="https://community.railway.app/feature-request/native-database-backups-for-popular-data-8ec06824" target="_blank">community forums</a>.*

---

## Conclusion

Using a mix of native features and external tools, we hope you can feel confident that your applications on Railway meet the highest standards of performance, reliability, and security.

Remember, our team is always here to assist you with solutions. Reach out in <a href="https://discord.com/channels/713503345364697088/1006629907067064482" target="_blank">Discord</a> or over email at [team@railway.app](mailto:team@railway.app) for assistance.

Finally, as suggested in several sections above, we are working tirelessly to give you the best experience imaginable on Railway. If you have requests or suggestions, please <a href="https://community.railway.app" target="_blank">let us know</a>!
4 changes: 2 additions & 2 deletions src/docs/reference/regions.md
@@ -68,6 +68,6 @@ Note that this migration can take a while depending on the size of the volume, a

The same is true if you attach a detached volume to a service in a different region. It will need to be migrated to the new region, which can take a while and cause downtime.

-### Caveats
+## Support

-You can't deploy database services to different regions. We recommend deploying Official Database Templates instead.
+For information on how to deploy your services to different regions, refer to [this guide](/guides/optimize-performance#configure-a-region).