Reorg0.7.0 #160

Merged
merged 16 commits into from
Feb 21, 2023

Conversation

@bitner (Collaborator) commented Feb 6, 2023

  • Reorganize code base to create clearer separation between pgstac sql code and pypgstac.
  • Move Python tooling to use hatch with all python project configuration in pyproject.toml
  • Rework the testing framework so that it does not rely on pypgstac or migrations. This allows tests to run on any code update without creating a version first. If a new version has been staged, the tests still run through all incremental migrations to make sure they pass as well.
  • Add pre-commit to run formatting as well as the tests appropriate for which files have changed.
  • Add a query queue to allow deferred processing of steps that do not change the ability to get results but do enhance performance. Tasks placed in the queue can be run by pg_cron or a similar scheduler.
  • Modify triggers to allow the use of the query queue for building indexes, adding constraints that are used solely for constraint exclusion, and updating partition and collection spatial and temporal extents. The use of the queue is controlled by the new configuration parameter "use_queue", which can be set as the pgstac.use_queue GUC or in the pgstac_settings table.
  • Reorganize how partitions are created and updated to maintain more metadata about partition extents and better tie the constraints to the actual temporal extent of a partition.
  • Add a "partitions" view that shows stats about the number of records, the partition range, constraint ranges, actual date range, and spatial extent of each partition.
  • Add the ability to automatically update the extent object on a collection using the partition metadata via triggers. This is controlled by the new configuration parameter "update_collection_extent", which can be set as the pgstac.update_collection_extent GUC or in the pgstac_settings table. This can be combined with "use_queue" to defer the processing.
  • Add many new tests.
  • Migrations now make sure that all objects in the pgstac schema are owned by the pgstac_admin role. Functions marked as "SECURITY DEFINER" have been moved to the lower level functions responsible for creating/altering partitions and adding records to the search/search_wheres tables. This should open the door for approaches to using Row Level Security.
  • In pypgstac, set search_path and application_name after the connection is established rather than as a kwargs parameter, for compatibility with RDS (fixes #156, "Make connection command-line parameters optional").
  • Allow the pypgstac loader to load data into pgstac databases that have the same major version even if the minor version differs. Fixes #162 ("Allow use of pypgstac loader within same major version"). Cherry-picked from #164 ("Allow use of pypgstac loader within same minor version"; thanks @drnextgis).
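The deferred-processing idea behind the query queue and "use_queue" can be sketched in a few lines of Python. This is only an in-memory illustration, not pgstac's implementation (which lives in SQL and is not shown in this PR description); the class name DeferredQueryQueue and the methods submit and run_queued are hypothetical names invented for the example.

```python
from collections import deque
from typing import Callable, Deque


class DeferredQueryQueue:
    """Toy stand-in for a query queue (hypothetical, not pgstac code)."""

    def __init__(self, execute: Callable[[str], None], use_queue: bool = False) -> None:
        self._execute = execute          # callable that actually runs a statement
        self.use_queue = use_queue       # mirrors the idea of the "use_queue" setting
        self._pending: Deque[str] = deque()

    def submit(self, query: str) -> None:
        """Run the statement now, or defer it when use_queue is enabled."""
        if self.use_queue:
            self._pending.append(query)
        else:
            self._execute(query)

    def run_queued(self) -> int:
        """Drain the queue in FIFO order; return how many statements ran."""
        count = 0
        while self._pending:
            self._execute(self._pending.popleft())
            count += 1
        return count

    def __len__(self) -> int:
        return len(self._pending)
```

With use_queue enabled, submit returns immediately and the work happens only when something calls run_queued, which is the role a scheduler such as pg_cron would play in the database.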

@bitner bitner mentioned this pull request Feb 6, 2023
@bitner bitner marked this pull request as ready for review February 7, 2023 14:55
@@ -86,7 +87,7 @@ def get_pool(self) -> ConnectionPool:
     num_workers=settings.db_num_workers,
     kwargs={
         "options": "-c search_path=pgstac,public"
-        " -c application_name=pypgstac"
+        " -c application_name=pypgstac",
Collaborator

@bitner could you please make PgstacDB more robust by not using command-line options? As @captaincoordinates pointed out, PgstacDB cannot work properly with Amazon RDS Proxy in its current state. One possible solution would be to remove those options and then, once the connection is established, execute:

self.connection.execute(
    "set search_path = pgstac,public; set application_name=pypgstac"
)
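As a slightly fuller sketch of the same suggestion, the helper below applies both settings on any object that exposes an execute(query) method, which psycopg 3 connections do. The names configure_session and SupportsExecute are hypothetical and are not part of pypgstac; the two SET statements are split only for readability.

```python
from typing import Any, Protocol


class SupportsExecute(Protocol):
    """Anything with an execute(query) method, e.g. a psycopg 3 connection."""

    def execute(self, query: str) -> Any: ...


def configure_session(conn: SupportsExecute) -> None:
    """Apply session settings after connecting instead of via startup options."""
    # Hypothetical helper: same settings as the "-c ..." options in the diff.
    conn.execute("SET search_path TO pgstac, public")
    conn.execute("SET application_name TO 'pypgstac'")
```

Because the helper only needs an execute method, it can be exercised with a small stub object in tests, without a running database.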

Collaborator Author

@drnextgis @captaincoordinates, I've made a change for this in the latest commit. Can you verify that this works with RDS Proxy?

@bitner thanks for adding this change. Unfortunately, it will be a little while before I can test this on RDS Proxy: I'm currently using pypgstac via stac-fastapi, and it doesn't support 0.7.0. I'll have to create a new project and some new AWS infrastructure for it to use before I can verify.

@bitner bitner requested a review from drnextgis February 20, 2023 17:47
@vincentsarago
Member

🥳

Successfully merging this pull request may close these issues.

Allow use of pypgstac loader within same major version.
Make connection command-line parameters optional
5 participants