Draft: Elasticsearch integration #421

Open
wants to merge 16 commits into
base: main

@matthinz matthinz commented Sep 10, 2021

Description of change

This PR begins integrating Elasticsearch into the application. It adds an Elasticsearch node that runs locally in Docker Compose, and the docs now include a high-level overview of what Elasticsearch is and how this integration is managed.

These steps remain before the implementation is ready and we can start building indices of Activity Reports and their attachments in production:

  • Finalize mappings describing how Elasticsearch should index data
  • Finalize pipelines describing how Elasticsearch should pre-process data before indexing
  • Bring up Elasticsearch in the TTA Hub cloud.gov environment
  • Fix failing unit tests and fill in any TODO unit tests
  • Get the ES-aware integration tests running in CI (using an actual ES node running in CircleCI)
  • Implement CLI scripts for working with indices (deleting them, rebuilding them, etc.)
  • Implement monitoring that measures how far out of sync Elasticsearch gets with the database over time. This will help us decide what additional tooling / monitoring is needed
  • Write an ADR describing why Elasticsearch was chosen
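
As a starting point for the "out-of-sync" monitoring item above, here is a minimal sketch of one possible drift metric. It is not code from this PR: the function name `sync_drift` and the idea of comparing plain ID sets are assumptions for illustration, and the sketch is written in Python for brevity rather than in the application's own language. It compares the report IDs stored in the database with the IDs present in the search index.

```python
def sync_drift(db_ids, indexed_ids):
    """Compare database IDs with indexed IDs (hypothetical helper).

    Returns the IDs missing from the index, the IDs that exist only in
    the index, and the share of all known IDs that disagree.
    """
    db = set(db_ids)
    indexed = set(indexed_ids)
    missing = db - indexed      # in the database, never indexed
    orphaned = indexed - db     # in the index, deleted from the database
    all_ids = db | indexed
    ratio = len(missing | orphaned) / len(all_ids) if all_ids else 0.0
    return {
        "missing": sorted(missing),
        "orphaned": sorted(orphaned),
        "drift_ratio": ratio,
    }
```

Recording `drift_ratio` on a schedule would show whether the index drifts slowly (a periodic rebuild might be enough) or quickly (indexing failures need alerting).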

How to test

After you bring up the local development environment, create Activity Reports and upload attachments as usual. I have been using Elasticvue to monitor the kinds of data being written to search indices. You can bring it up in Docker using the following command:

docker run -p 8081:8080 cars10/elasticvue

Then navigate to http://localhost:8081 in your browser.
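
If you prefer a scripted check over the Elasticvue UI, the sketch below lists each index with its document count using the standard `_cat/indices` REST endpoint. It assumes the Docker Compose setup publishes Elasticsearch on `http://localhost:9200`, which this PR does not state explicitly, and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: Elasticsearch is reachable from the host on this port.
ES_URL = "http://localhost:9200"


def summarize_indices(rows):
    """Map index name to document count from _cat/indices JSON rows."""
    summary = {}
    for row in rows:
        # docs.count is returned as a string, or null for closed indices.
        count = row.get("docs.count")
        summary[row["index"]] = int(count) if count is not None else 0
    return summary


def fetch_index_summary(base_url=ES_URL):
    """Fetch the index listing from a running node and summarize it."""
    with urlopen(f"{base_url}/_cat/indices?format=json", timeout=5) as response:
        return summarize_indices(json.load(response))
```

Calling `fetch_index_summary()` before and after creating an Activity Report should show the document count of the relevant index increasing.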

Issue(s)

Checklists

Every PR

  • Meets issue criteria
  • JIRA ticket status updated
  • Code is meaningfully tested
  • Meets accessibility standards (WCAG 2.1 Levels A, AA)
  • API Documentation updated
  • Boundary diagram updated
  • Logical Data Model updated
  • Architectural Decision Records written for major infrastructure decisions

Production Deploy

  • Staging smoke test completed

After merge/deploy

  • Update JIRA ticket status

matthinz and others added 16 commits September 10, 2021 11:25
- Add a single ES node at http://elasticsearch:9200
- Send ActivityReports to ES
- Beginnings of unit / integration test infra
- Still many things left to do
Add ES inside application boundary
This tells the ES client where to talk to ES.
Not all tests complete, but the basics are there.
- Start laying out mappings for ActivityRecord
- Introduce pipeline for AR's (with HTML stripping)
- Likely break a number of things ahead of demo Tuesday
Once we've pulled text content out of files, remove the base64-encoded version. This will hopefully help keep our indices smaller. The original files are still in S3 if they are needed.
Provide some detail on how ES works and how it is integrated.
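
The pipeline behavior described in the commit messages above (HTML stripping for Activity Reports, and dropping the base64-encoded file once its text has been extracted) could be expressed roughly as the ingest pipeline definitions below. This is a sketch, not the pipelines in this PR: the field names passed to the builders are hypothetical, and the `attachment` processor assumes the ingest-attachment plugin is available on the node. The `html_strip`, `attachment`, and `remove` processors themselves are standard Elasticsearch ingest processors.

```python
def build_report_pipeline(html_fields):
    """Pipeline body that strips HTML from each listed field."""
    return {
        "description": "Strip HTML from Activity Report fields",
        "processors": [
            {"html_strip": {"field": field, "ignore_missing": True}}
            for field in html_fields
        ],
    }


def build_attachment_pipeline(source_field, target_field="attachment"):
    """Pipeline body that extracts text, then drops the base64 source."""
    return {
        "description": "Extract attachment text and discard the base64 payload",
        "processors": [
            # Extract text and metadata from the base64-encoded file.
            {"attachment": {"field": source_field, "target_field": target_field}},
            # Remove the encoded payload so the index stays smaller.
            {"remove": {"field": source_field}},
        ],
    }
```

Each returned dictionary is the JSON body for a `PUT _ingest/pipeline/<name>` request. Processor order matters in the attachment pipeline: `remove` must run after `attachment`, otherwise there is nothing left to extract text from.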