Verify that demo database is in sync with production database. #96

Closed
dblodgett-usgs opened this issue Jan 4, 2023 · 12 comments

@dblodgett-usgs
Member

The demo database seems to be out of sync with production. We need to verify that the demo database is in sync with production and operates the same way.

@gzt5142
Collaborator

gzt5142 commented Jan 4, 2023

Received. Not able to assign myself... should I be a member of the org?

@dblodgett-usgs
Member Author

You are a member of the repo now.

@gzt5142
Collaborator

gzt5142 commented Jan 4, 2023

Characterizing the issue:

The existing crawler inserts geometry into the shape column; the demo database has both shape and location geometry columns. The production database, if it was created with the tables.sql script, has only a location column.
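
To confirm what each environment actually contains, the PostGIS geometry_columns view can be queried; a quick sketch, assuming PostGIS is installed and the feature tables live in the nldi_data schema (the host is a placeholder):

    psql --dbname=nldi --username=nldi_schema_owner --host=<demo-or-prod-host> -c \
        "SELECT f_table_name, f_geometry_column
           FROM geometry_columns
          WHERE f_table_schema = 'nldi_data';"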

@gzt5142
Collaborator

gzt5142 commented Jan 4, 2023

This has implications for porting the crawler to Python. I'm hoping to replicate the behavior I see in the Java version (where shape holds the geometry) and test it against the demo DB. Whether a successful demo test will also work in production is still an open question.

@gzt5142
Collaborator

gzt5142 commented Jan 25, 2023

OK... so I'm jammed up here on how the Docker images are used to create the demo database. It looks like the database is just restored from a snapshot taken at some point with the feature table populated. That's the part to redo.

So -- if this repo will be transitioning to another technology (i.e. Python + SQLAlchemy + Alembic) to automate database creation, we can just keep track of this detail for that work. But if we're sticking with Liquibase et al., then I will need to figure out how to push this change into the existing framework.
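
For reference, here is a minimal sketch of how a snapshot like that could be restored when the demo container initializes; the script name, path, and flags are assumptions, not the repo's actual setup:

    #!/bin/bash
    # Hypothetical init script, e.g. docker-entrypoint-initdb.d/99-restore-demo.sh
    set -e
    # Unpack the snapshot artifact and restore it into the demo database
    gunzip -c /docker-entrypoint-initdb.d/nldi_data.crawler_source.pgdump.gz \
        > /tmp/nldi_data.crawler_source.pgdump
    pg_restore --dbname=nldi --username=nldi_schema_owner --no-owner \
        /tmp/nldi_data.crawler_source.pgdump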

@dblodgett-usgs
Member Author

This is probably a topic to bring up with others working on USGS databases. I think Liquibase is probably what we are sticking with.

Can you just fix up and re-up the snapshot for now, and we can take up automating the demo database at a later time?

@gzt5142
Collaborator

gzt5142 commented Jan 26, 2023

Can you just fix up and re-up the snapshot for now, and we can take up automating the demo database at a later time?

Short answer is yes... I can create a snapshot to use in place of the current one. The Dockerfile specifies the location of the pgdump snapshots to use; for example:

https://github.com/internetofwater/nldi-db/releases/download/artifacts-1.0.0/nldi_data.crawler_source.pgdump.gz

We can either create a new release (1.0.1, say) or overwrite the existing release. I'm not finding any documentation in the repo from Ethan on the workflow for these artifacts. I'll need to do some digging -- so while I think the final answer is going to be easy, it likely won't be fast.
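
Whatever the documented workflow turns out to be, something like the GitHub CLI could publish a refreshed snapshot as a new release tag; a sketch only, not the established process:

    # Hypothetical: attach a refreshed snapshot to a new release tag
    gzip -k /tmp/nldi_data.crawler_source.pgdump
    gh release create artifacts-1.0.1 \
        /tmp/nldi_data.crawler_source.pgdump.gz \
        --repo internetofwater/nldi-db \
        --title "Database artifacts 1.0.1" \
        --notes "Refreshed crawler_source snapshot for the demo database"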

@gzt5142
Collaborator

gzt5142 commented Jan 26, 2023

Creating the dump is easy enough:

pg_dump  \
    --dbname=nldi \
    --host=172.18.0.1 \
    --port=5432 \
    --username=nldi_schema_owner \
    --table=nldi_data.crawler_source \
    --format=custom \
    --file /tmp/nldi_data.crawler_source.pgdump

... assuming that the demo database is spruced up the way we want it. The rest of the work is creating the release and artifacts without breaking anything else.
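
For the round trip, the matching restore into the demo database would look roughly like this (connection details copied from the dump command above; --clean/--if-exists are an assumption about wanting to replace the existing table):

    pg_restore \
        --dbname=nldi \
        --host=172.18.0.1 \
        --port=5432 \
        --username=nldi_schema_owner \
        --clean --if-exists \
        /tmp/nldi_data.crawler_source.pgdump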

@dblodgett-usgs
Member Author

@gzt5142 -- I think I am going to take care of this in #100

My plan is to dump the tables that need to be loaded for the demo database and load them using the same mechanism as you would for the production database. As part of that, I'm thinking I'll create one dump file that contains several tables, so that in demo we can load a number of demo feature sources and on prod we can load a different set.

Not quite sure how that's going to work yet, but it seems like the right concept. Any thoughts?
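
One possible shape for that multi-table dump, sketched with hypothetical feature table names (the actual set per environment is the open question here):

    # Dump several tables into one custom-format archive (table names are examples)
    pg_dump \
        --dbname=nldi \
        --username=nldi_schema_owner \
        --format=custom \
        --table=nldi_data.crawler_source \
        --table=nldi_data.feature_demo_a \
        --table=nldi_data.feature_demo_b \
        --file=/tmp/nldi_data.demo_load.pgdump

    # On demo, restore everything; on prod, pick only the tables that apply
    pg_restore \
        --dbname=nldi \
        --username=nldi_schema_owner \
        --schema=nldi_data \
        --table=crawler_source \
        /tmp/nldi_data.demo_load.pgdump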

@gzt5142
Collaborator

gzt5142 commented Apr 19, 2023

Agree conceptually... a pg_dump (or similar) of a developer's minimum load would be useful, along with the fully populated version of the same tables in a more production-like system, or production itself. I put some comments in another issue that I now see should have gone here. So in summary, dev needs:

  • The full crawler_source table
  • The full nhdplus dataset
  • A full feature table for one of the smallest sources
  • A minimal feature table for the other sources

The NHD pieces are needed to ensure that ingested features will be spatially matched to a comid from NHD. With a subset, we risk not getting a match for newly ingested features.
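
For the minimal feature tables, one way a trimmed copy could be produced before dumping; a sketch only, and the source and subset table names are guesses, not the actual schema:

    # Copy a small sample of features from a large source into a demo-sized table
    psql --dbname=nldi --username=nldi_schema_owner -c "
        CREATE TABLE nldi_data.feature_demo_subset AS
        SELECT * FROM nldi_data.feature_large_source LIMIT 100;"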

@dblodgett-usgs
Member Author

Right -- so the dump that contains several tables could accommodate the variation in requirements you have in your dot points. I'll at least get this roughed in as a pattern so we can iterate on the details.

dblodgett-usgs self-assigned this May 8, 2023
@dblodgett-usgs
Member Author

Should be good with #112
