
Migrate 311-Data System Architecture #44

Closed
5 of 8 tasks
JasonEb opened this issue Aug 18, 2022 · 17 comments
Assignees
Labels
feature: Roadmap role: Site Reliability Engineer aka Infrastructure Engineer size: 13+pt Must be broken down into smaller issues

Comments

@JasonEb
Contributor

JasonEb commented Aug 18, 2022

Overview

The 311 Team needs to migrate ownership of their infrastructure management off their current servers. The endeavor and approach are outlined here:

https://docs.google.com/document/d/1hgJyxs45aZv1mVOPnbBm783KBiHVjI0zWIX1jOxz-3s/edit?usp=sharing

Action Items

  • Investigate 311 Infrastructure
  • Investigate current AWS account
  • Create roadmap
  • Determine how to structure new AWS (sub)account for 311 Data within H4LA's greater AWS setup (assigned to @JasonEb)
  • Test existing Terraform config using new AWS account. This will bring up the backend infra components only; frontend components are individually and manually configured on AWS.
  • Make any simplifications to the infra, if possible
  • Bring up frontend components
  • Point prod and dev 311 Data sites to new AWS services

Resources/Instructions

311 Data System
311 Repo

@JasonEb JasonEb added feature: missing size: 13+pt Must be broken down into smaller issues role: Site Reliability Engineer aka Infrastructure Engineer labels Aug 21, 2022
@JasonEb
Contributor Author

JasonEb commented Sep 2, 2022

9/1/2022

Met with Nicholas and Edwin. Reviewed the technologies involved and the documentation, and determined its potential path to migration.

We need to set up time with Bonnie to discuss its AWS account organization: should it go into Incubator, and/or get its own account?

@JasonEb
Contributor Author

JasonEb commented Oct 2, 2022

@nichhk May we have the roadmap here?

@ExperimentsInHonesty
Member

ExperimentsInHonesty commented Oct 2, 2022

  1. Does 311 really require its own instance, or could it be combined with Incubator?
    1. If it were combined with Incubator, what new services would have to be added to Incubator? (Those costs can be considered fixed.)
    2. What would the changes in charges be for the services that were combined (cost savings)?
  2. If it must remain its own instance:
    1. How do we prevent the ongoing duplication of charges that would come about from two 311 instances being live for more than 30 days?

What specifically do you need Bonnie to do right now?

@JasonEb JasonEb changed the title Adopt 311-Data System Architecture Migrate 311-Data System Architecture Oct 13, 2022
@ExperimentsInHonesty
Member

We talked about this at the following meetings
Ops 2022-10-12 at 6pm
special meeting #1 2022-10-13 at 7pm
and plan for a second special meeting
special meeting #2 2022-10-17 at 7pm

@joshuayhwu

joshuayhwu commented Oct 20, 2022

@JasonEb FYI

Special Meeting #1 2022-10-13 at 7pm:

Meeting Minutes:

Objective:

  • debug 311 infrastructure to prepare for data system migration

Summary:

  • Attempted to trace where the database grabs its secret (the env file doesn't contain it)
  • The previous configuration likely took the secret in manually instead of storing it somewhere in the cloud or in code
  • Need to keep looking for the secret, or else drop the entire database

Next Steps / potential alternatives:

  • Continue looking for the secrets
  • Re-create the entire database with the same configs as the old one
  • Try a passwordless login from the terminal
  • Check the RDS Configuration tab and try the master username
  • Dropping the DB is not the best solution; copy the database instead

@nichhk
Member

nichhk commented Oct 21, 2022

Thanks for the notes Josh! To expand further:

We were able to find the database username and password in AWS Systems Manager/Parameter Store within the DB_DSN. The username and password are not individually provided; they are fed to Terraform via the DB_DSN.

So with these values, we are now able to SSH through a new bastion server that we made (we didn't have the SSH key for the existing bastion server) and connect to the RDS instance.

We can now follow these steps:

  1. update the master password (we can do this safely since no one has the existing master password)
  2. ssh and connect to the db using psql and the master username and password
  3. create a dump of the dev db using pg_dump
  4. drop the prod db
  5. recreate the prod db (and ensure that the extant user has psql grant on it)
  6. restore the dump using pg_restore to the prod db

One caveat to this process is in step 4: the prod db also happens to be the default db, which makes this process more complicated. Our meeting ended here, but @darpham believes that we should be able to get around that caveat.
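The six steps above could be sketched as commands roughly like the following. Every hostname, database name, username, and instance identifier here is a placeholder, not the project's actual values, and the script assumes the AWS CLI and PostgreSQL client tools are available on the bastion host.

```shell
# Sketch only -- all identifiers below are assumed placeholders.
DB_HOST="311-data-db.example.rds.amazonaws.com"

# 1. Reset the master password (safe per the note above, since no one
#    has the existing master password).
aws rds modify-db-instance \
  --db-instance-identifier 311-data-db \
  --master-user-password "$NEW_MASTER_PASSWORD" \
  --apply-immediately

# 2-3. From the bastion server, connect with the master credentials and
#      dump the dev database in pg_restore's custom format.
pg_dump -h "$DB_HOST" -U postgres -Fc dev_db > dev_db.dump

# 4-5. Drop and recreate the prod database, re-granting access to the
#      existing app user. Note the caveat above: you cannot drop the
#      database you are connected to, so connect to a different one
#      (here, dev_db) while dropping prod.
psql -h "$DB_HOST" -U postgres -d dev_db \
  -c 'DROP DATABASE prod_db;' \
  -c 'CREATE DATABASE prod_db;' \
  -c 'GRANT ALL PRIVILEGES ON DATABASE prod_db TO app_user;'

# 6. Restore the dump into the recreated prod database.
pg_restore -h "$DB_HOST" -U postgres -d prod_db dev_db.dump
```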

@ExperimentsInHonesty
Member

ExperimentsInHonesty commented Oct 27, 2022

@nichhk is there a reason to drop the prod db, instead of standing up a new one and then changing where the app links after the new one is working? And if this is viable and reduces risk, what kind of overlap (days) are we talking about?

Also, how much bigger a lift would what I'm asking about be over the path you outlined in your previous comment?

@nichhk
Member

nichhk commented Oct 27, 2022

@ExperimentsInHonesty, I believe there is very little risk in dropping the prod db because it's completely unusable right now. I think the plan that you outlined will be a little more work (not much though). I could see it being riskier, though, because it would require us to reload our server with the new prod db info twice--once to test the new prod db, and then again after we rename the prod db to a permanent name.

With the original plan, I don't think we'd need to reload the server at all, since we can just drop the prod db and create it with the same name and credentials as before.

Edit: Your plan might end up being exactly what Darren suggests to get around the caveat that the prod db is the default db for our RDS instance, so we'll have to connect with him to see what he suggests.

@ExperimentsInHonesty
Member

ExperimentsInHonesty commented Nov 13, 2022

  • @JasonEb will upload the video and add the link above under resources

@joshuayhwu

  • Please provide a summary of what happened during the meeting
  • What next steps are.
  • Assign this issue to yourself

FYI @nichhk

@joshuayhwu joshuayhwu self-assigned this Nov 17, 2022
@joshuayhwu

Summary:

  • We followed the steps laid out in Nich's comment above:
  1. update the master password (we can do this safely since no one has the existing master password)
  2. ssh and connect to the db using psql and the master username and password
  3. create a dump of the dev db using pg_dump
  4. drop the prod db
  5. recreate the prod db (and ensure that the extant user has psql grant on it)
  6. restore the dump using pg_restore to the prod db
  • We tested that the new db works by calling the API directly.

Next Steps:

  1. Re-deploy frontend on 311 data via Github Action

@JasonEb
Contributor Author

JasonEb commented Jan 29, 2023

@joshuayhwu whenever you have time, let's revisit this issue and explore the next steps for this migration

@joshuayhwu

@JasonEb apologies for the delay - I've been a little overwhelmed with work lately. The v1 311 site is currently unavailable per stakeholder request, so there is an opportunity for us to migrate to Incubator simply by creating a simpler infrastructure to support the v2 website. We are discussing internally whether the migration is still worthwhile. Will keep you updated.

@chelseybeck
Member

chelseybeck commented Feb 23, 2023

Migration Roadmap (work in-progress)

Requirements

  • create AWS account and profile
  • IAM Role (ecsTaskExecution) already created, with SSMReadOnlyAccess AND ECSTaskExecutionRolePolicy applied
  • Create SSL/TLS certificate
  • create AWS user in incubator - current username is ec2-user (recommend changing to service account)
    • add secrets host and key to Repository -> Settings -> Secrets & Variables -> Actions
    • Note: if there are different environments (and it looks like that is how it is currently set up), we will need a separate user/sa per environment (Dev/Prod)
  • add S3 dev bucket to Terraform modules
  • add S3 prod bucket to Terraform modules
    See Resource aws_s3_bucket
  • run the terraform to build out the resources
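The last two items could be sketched as a standard Terraform workflow. The module path and bucket names below are assumptions, not the repo's actual values:

```shell
# Sketch only -- module path and bucket names are assumed placeholders.
# After adding the dev and prod aws_s3_bucket resources to the Terraform
# modules (see the resource docs linked above), build everything out:
cd terraform/                 # assumed location of the Terraform config
terraform init                # download providers and modules
terraform plan -out=tf.plan   # review the resources to be created
terraform apply tf.plan       # create the S3 buckets and other resources
```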

AWS Secrets

These are the secrets we will need to update in repo settings:

Client

AWS_CI_ACCESS_KEY_ID
AWS_CI_SECRET_ACCESS_KEY
S3_BUCKET_DEV
S3_BUCKET_PROD

Server

AWS_SSH_HOST_PROD
AWS_SSH_PEM_KEY
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
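Assuming these land in the hackforla/311-data repository (an assumption on my part), they could also be set from a terminal with the GitHub CLI rather than through the web UI; the values and the key filename below are placeholders:

```shell
# Hypothetical sketch using the GitHub CLI (gh); repo slug, environment
# variables, and the PEM filename are assumptions, not confirmed values.
REPO="hackforla/311-data"

gh secret set AWS_CI_ACCESS_KEY_ID     --repo "$REPO" --body "$AWS_CI_ACCESS_KEY_ID"
gh secret set AWS_CI_SECRET_ACCESS_KEY --repo "$REPO" --body "$AWS_CI_SECRET_ACCESS_KEY"
gh secret set S3_BUCKET_DEV            --repo "$REPO" --body "$S3_BUCKET_DEV"
gh secret set S3_BUCKET_PROD           --repo "$REPO" --body "$S3_BUCKET_PROD"
gh secret set AWS_SSH_HOST_PROD        --repo "$REPO" --body "$AWS_SSH_HOST_PROD"
gh secret set AWS_SSH_PEM_KEY          --repo "$REPO" < deploy-key.pem  # read from file via stdin
gh secret set AWS_ACCESS_KEY_ID        --repo "$REPO" --body "$AWS_ACCESS_KEY_ID"
gh secret set AWS_SECRET_ACCESS_KEY    --repo "$REPO" --body "$AWS_SECRET_ACCESS_KEY"
```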

Questions

  • Should the ops team move the Terraform modules to the incubator repo? This will require a few extra steps in the migration but is worth discussing.

@chelseybeck
Member

Update: I've updated the Terraform versions inside the modules to bring everything up to date.

Blocker: it looks like some of the Terraform modules rely on an Amazon Machine Image (AMI) that isn't built inside the codebase. We need someone with experience to build this through the UI, as it requires an image pipeline, recipe, etc.

@darpham

darpham commented Mar 31, 2023

Edit: Your plan might end up being exactly what Darren suggests to get around the caveat that the prod db is the default db for our RDS instance, so we'll have to connect with him to see what he suggests.

I can help with clearing up any questions, we can zoom if anyone is interested; I'm usually free after 5pm PT.

@chelseybeck
Member

Closing this issue, as we decided to move away from AWS to a more cost-friendly tech stack.

@ExperimentsInHonesty
Member

This issue is being moved to the new issue approval column so that a CoP lead can summarize all the notes necessary for a new person to take on this issue, add that summary to the top, and hide all the comments. The goal is to make the issue clear for a new person while taking advantage of all the work that has gone into it so far.
