feat: setup ETL proces to import GeoPackage into search index #3

Merged
rkettelerij merged 30 commits into master from import-gpkg
Nov 12, 2024

rkettelerij
Collaborator

@rkettelerij rkettelerij commented Oct 24, 2024

Description

Set up an ETL process to import a file into a search index. Currently only GeoPackage (gpkg) is supported as the source file type, and only PostgreSQL is supported as the target search index. The ETL process itself is agnostic to a specific source or target.

Optimized for performance by loading records in batches and using COPY to insert the records in Postgres.
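
The batching idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Record` type and the `batches` helper are hypothetical names, and the actual bulk insert (COPY into Postgres) is left out so the sketch only depends on the standard library. The page size plays the same role as the `--page-size` option.

```go
package main

import "fmt"

// Record is a hypothetical stand-in for one feature read from the source file.
type Record struct {
	FID  int64
	Name string
}

// batches splits records into consecutive slices of at most pageSize items.
// It returns nil when pageSize is not positive.
func batches(records []Record, pageSize int) [][]Record {
	if pageSize <= 0 {
		return nil
	}
	var out [][]Record
	for start := 0; start < len(records); start += pageSize {
		end := start + pageSize
		if end > len(records) {
			end = len(records)
		}
		out = append(out, records[start:end])
	}
	return out
}

func main() {
	records := make([]Record, 25)
	for i := range records {
		records[i] = Record{FID: int64(i + 1), Name: fmt.Sprintf("feature-%d", i+1)}
	}
	for i, b := range batches(records, 10) {
		// A real loader would write each batch to the target in one bulk
		// operation here (assumption: e.g. a single COPY per batch).
		fmt.Printf("batch %d: %d records\n", i, len(b))
	}
}
```

Working per batch keeps memory use bounded by the page size rather than by the size of the source file, at the cost of one round trip to the target per batch.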

How to use:

NAME:
   gomagpie import-file - Import file into search index

USAGE:
   gomagpie import-file [command options]

CATEGORY:
   etl

OPTIONS:
   --db-host value        (default: "localhost") [$DB_HOST]
   --db-port value        (default: 5432) [$DB_PORT]
   --db-name value        Connect to this database [$DB_NAME]
   --db-username value    (default: "postgres") [$DB_USERNAME]
   --db-password value    (default: "postgres") [$DB_PASSWORD]
   --db-ssl-mode value    (default: "disable") [$DB_SSL_MODE]
   --config-file value    reference to YAML configuration file [$CONFIG_FILE]
   --search-index value   Name of search index in which to import the given file (default: "search_index") [$SEARCH_INDEX]
   --file value           Path to (e.g GeoPackage) file to import [$FILE]
   --fid value            Name of feature ID field in file (default: "fid") [$FID]
   --geom value           Name of geometry field in file (default: "geom") [$GEOM]
   --feature-table value  Name of the table in given file to import [$FEATURE_TABLE]
   --page-size value      Page/batch size to use when extracting records from file (default: 10000) [$PAGE_SIZE]
   --help, -h             show help

Example config file:

---
version: 1.0.0
lastUpdated: "2024-10-22T12:00:00Z"
baseUrl: http://localhost:8080
availableLanguages:
  - nl
collections:
  - id: addresses
    metadata:
      title: Addresses
    search:
      fields:
        - foo
        - bar
      displayNameTemplate: "{{ .bar }} - {{ .foo }}"
      suggestTemplates:
        - "{{ .foo }} {{ .bar }}"
        - "{{ .bar }}, {{ .bar }} {{ .foo }}"
  - id: buildings
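
To illustrate how templates such as `displayNameTemplate` and `suggestTemplates` could be evaluated, here is a small sketch using Go's standard `text/template` package. The `render` helper and the sample field values are assumptions made for illustration; how this PR wires templates to records internally is not shown here.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render executes tmpl against the given field values. It is a hypothetical
// helper; "missingkey=error" makes a reference to an unknown field fail
// instead of silently rendering an empty value.
func render(tmpl string, fields map[string]string) (string, error) {
	t, err := template.New("field").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, fields); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Sample values for the "foo" and "bar" fields from the example config.
	fields := map[string]string{"foo": "Main Street", "bar": "Amsterdam"}

	templates := []string{
		"{{ .bar }} - {{ .foo }}",           // displayNameTemplate
		"{{ .foo }} {{ .bar }}",             // first suggest template
		"{{ .bar }}, {{ .bar }} {{ .foo }}", // second suggest template
	}
	for _, tmpl := range templates {
		out, err := render(tmpl, fields)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

With these sample values the display name renders as "Amsterdam - Main Street". Failing on missing keys is a design choice of this sketch; it surfaces a typo in a template field name at import time rather than producing incomplete suggestions.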

Type of change

  • New feature

Checklist:

  • I've double-checked the code in this PR myself
  • I've left the code better than before (boy scout rule)
  • The code is readable, comments are added that explain hard or non-obvious parts.
  • I've expanded/improved the (unit) tests, when applicable
  • I've run (unit) tests that prove my solution works
  • There's no sensitive information like credentials in my PR

# Conflicts:
#	README.md
#	cmd/main.go
… + suggestions. Using text/template package here.
@rkettelerij rkettelerij changed the title feat: import-gpkg feat: setup ETL proces to import GeoPackage into search index Nov 11, 2024
@rkettelerij rkettelerij marked this pull request as ready for review November 11, 2024 07:54
Review threads (all resolved):
config/collections.go (outdated)
internal/etl/extract/geopackage.go
internal/etl/transform/transform.go
internal/etl/load/postgres.go
@rkettelerij rkettelerij merged commit 59248e7 into master Nov 12, 2024
3 checks passed
@rkettelerij rkettelerij deleted the import-gpkg branch November 12, 2024 06:36