O+M 2023-05-12 #4302

Closed
10 tasks
nickumia-reisys opened this issue May 8, 2023 · 4 comments
Assignees
Labels
Explore O&M Operations and maintenance tasks for the Data.gov platform

Comments

@nickumia-reisys
Contributor

As part of the day-to-day operation of Data.gov, there are many Operations and Maintenance (O&M) responsibilities. Rather than having the entire team watch notifications and risk some slipping through the cracks, we have created an O&M Triage role. One person on the team is assigned the Triage role, which rotates each sprint. This is not meant to be a 24/7 responsibility; it covers East Coast business hours only. If you will be unavailable, please note when in Slack and ask someone to take on the role for that time.

Miscs

Acceptance criteria

You are responsible for all O&M responsibilities this week. We've highlighted a few so they're not forgotten. You can copy each checklist into your daily report.

Daily Checklist

Check Production State/Actions

Note: Catalog Auto Tasks
You will need to update the chart values manually. Click the Action link in each issue, grab the values from the monitor task output, and check the runtime.

Weekly Checklist

@nickumia-reisys
Contributor Author

nickumia-reisys commented May 9, 2023

Day 1

  • Catalog solr follower 0 restarted 5/8 @ 3:11a (normal)
  • Catalog solr follower 1 restarted 5/8 @ 4:53a (normal)
  • DMARC Report (outlook): 42% compliant (187 total emails)
    • Email domains in question:
      • epa.gov
      • vsmtpx-e107-02.localdomain (although this does pass...)
    • Failures that seem to be false positives:
      • dkim: fail spf: softfail ses-513xxxx.ssb.data.gov ['ses-513xxxx.ssb.data.gov', 'amazonses.com'] mail.ses-513xxxx.ssb.data.gov
  • DMARC Report (google): 100% compliant (410 total emails)
  • Worked through open PRs.
  • Worked through Snyk triaging.
  • Identified issue with inventory build failing.

@nickumia-reisys
Contributor Author

nickumia-reisys commented May 10, 2023

Day 2 + 3

  • Fixed Catalog Solr Follower 2 (pairing with @FuhuXia and @btylerburton)

  • Fixed Inventory Build issue

  • Raised issue about catalog tracking update user visits
    image

  • Raised issue about increasing catalog db-solr-sync updates
    image

  • Added @rshewitt as a user for catalog, inventory, sysadmins-users

  • GitHub Actions has been down since 9:40a on 5/10 and is still down. Will continue to monitor status at https://www.githubstatus.com/
    image

@nickumia-reisys
Contributor Author

nickumia-reisys commented May 12, 2023

Day 4 + 5

  • GitHub remained degraded until end of day 5/11. On 5/12, GitHub Status showed all green.
    image
  • Backfilled the Tracking Update chart, which showed that the increased traffic started on 5/4 and has leveled off at around 230k per day since 5/9
    image
  • 5/12: Catalog Solr Follower 0 restarted @8:11a
  • 5/12: Catalog Solr Follower 1 restarted @8:12a
  • Finally got all of the open Inventory PRs through
  • Harvesting Status
    • Known broken harvest sources still broken.
    • State JSON harvest still broken.
    • FCC Data.json failed 5/12
      • ConnectionError getting json source: HTTPSConnectionPool(host='opendata.fcc.gov', port=443): Max retries exceeded with url: /data.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f30b6ec98b0>: Failed to establish a new connection: [Errno -2] Name or service not known')).
    • OEI Non-Geo Records
      • JSONDecodeError loading json. Invalid control character '\n' at: line 52 column 997 (char 3150)
  • Investigated Tracking Update anomalies. No clear explanation for the increased tracking update numbers.
    • Cloudfront requests
      image
    • NewRelic requests (only _tracking route -- catalog-proxy)
      image
    • NewRelic requests (all traffic -- catalog-proxy)
      image
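
For whoever picks up the two harvest failures above, here is a minimal Python sketch for reproducing each class of error locally. It is not the harvester's own code: the function names `host_resolves` and `load_json_lenient` and the sample string are hypothetical, and it assumes the source is parsed with something compatible with Python's standard `json` module. The first helper checks whether a hostname resolves in DNS (the FCC failure ends in "Name or service not known", which is a resolution error). The second retries parsing with `strict=False`, which tells the standard decoder to accept raw control characters such as newlines inside strings.

```python
import json
import socket


def host_resolves(hostname: str, port: int = 443) -> bool:
    """Return True if DNS can resolve the hostname, False otherwise."""
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        # Same class of failure as "[Errno -2] Name or service not known".
        return False


def load_json_lenient(text: str):
    """Parse JSON, retrying with strict=False if strict parsing fails.

    Returns a (data, was_lenient) tuple so the caller can log which
    sources only parse in lenient mode.
    """
    try:
        return json.loads(text), False
    except json.JSONDecodeError:
        # strict=False allows control characters (e.g. a raw newline)
        # inside JSON strings. Other syntax errors still raise here.
        return json.loads(text, strict=False), True


if __name__ == "__main__":
    # Hypothetical sample: a raw newline inside a string value.
    sample = '{"title": "line one\nline two"}'
    data, was_lenient = load_json_lenient(sample)
    print(was_lenient, repr(data["title"]))

    # Requires network access; the result depends on current DNS state.
    print(host_resolves("opendata.fcc.gov"))
```

If a source only parses in lenient mode, the cleaner fix is probably on the publisher's side (escaping the newline as `\n` in the JSON), so this is meant as a diagnostic rather than a workaround.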

@nickumia-reisys
Contributor Author

Didn't get much official O&M done last week, but I'll catch up this week while continuing in the O&M role.

@github-project-automation github-project-automation bot moved this from 🏗 In Progress [8] to ✔ Done in data.gov team board May 15, 2023
@nickumia-reisys nickumia-reisys mentioned this issue May 15, 2023
10 tasks
@hkdctol hkdctol moved this from ✔ Done to 🗄 Closed in data.gov team board May 25, 2023
@nickumia-reisys nickumia-reisys added O&M Operations and maintenance tasks for the Data.gov platform Explore labels Oct 9, 2023