Commit

Update README
Fix .gitignore
mjanez committed Sep 12, 2023
1 parent e6905bd commit 205791b
Showing 2 changed files with 56 additions and 10 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -12,7 +12,7 @@ samples
.env
!data/README.md
!data/sample
data/xml
data/*
ogc2ckan/mappings/organizations/*
!ogc2ckan/mappings/organizations/README.md
!ogc2ckan/mappings/organizations/template-org.yaml
64 changes: 55 additions & 9 deletions README.md
@@ -7,6 +7,7 @@
<a href="#overview">Overview</a> •
<a href="#quick-start">Quick start</a> •
<a href="#debug">Debug</a> •
<a href="#additional-info">Additional info</a> •
<a href="#containers">Containers</a>
</p>

@@ -17,18 +18,18 @@
Docker Compose environment for ingesting metadata from different spatial/semantic/general metadata sources into CKAN.

* OGC harvester (WCS/WFS, WMS & WMTS services)
* CSW harvester (ISO 19115/19139 Metadata)
* CKAN API. - WIP
* Tabular data (CSV, TSV)
* CSW harvester (ISO 19115/19139 Metadata Catalogue Services)
* Spreadsheets (XLS/XLSX)
* Metadata files (XML ISO19139) - WIP
* Metadata files (XML ISO19139)
* CKAN API - WIP
* Semantic metadata files (RDF/TTL) - WIP
* Tabular data (CSV, TSV) - WIP

>**Note**<br>
> It can be tested with a CKAN-type open data portal such as [mjanez/ckan-docker](https://github.com/mjanez/ckan-docker)[^1]
## Quick start
First copy the `.env.example` template and configure by changing the `.env` file. Change `PYCSW_URL` and `CKAN_URL`, as well as the published port `PYCSW_PORT`, if needed.
First copy the `.env.example` template and configure by changing the `.env` file. Change `PYCSW_URL` and `CKAN_URL`, as well as the Harvester info `OGC2CKAN INFO`, if needed.

```bash
cp .env.example .env
@@ -43,8 +44,9 @@ Custom ennvars:
- `DEFAULT_LICENSE`: Default license for the harvested datasets. Open Data default: `http://creativecommons.org/licenses/by/4.0/`
- `DEFAULT_LICENSE_ID`: Default license ID for the harvested datasets, ID list: `{ckan_site_url}/api/3/action/license_list`. Open Data default: `cc-by-4.0`
- `PARALLELIZATION`: [WIP] Parallelization of the harvesters. Default: `False`
- `CKAN_DATASET_SCHEMA`: Dataset schema of the CKAN Endpoint. Default: `geodcatap`
- `CKAN_DATASET_SCHEMA`: Dataset schema of the CKAN Endpoint. Default: `geodcatap_eu`
- `SSL_UNVERIFIED_MODE`: If `SSL_UNVERIFIED_MODE=True`, the SSL certificate is downloaded from the host. Use this envvar to avoid SSL errors when the certificate is self-signed.
- `METADATA_DISTRIBUTIONS`: To create metadata distributions (GeoDCAT-AP/ISO19139) as CKAN resources, set `METADATA_DISTRIBUTIONS=True`. Default: `False`

>**Warning**<br>
> `SSL_UNVERIFIED_MODE=True` is not recommended for production environments. Update your certificate or use a valid one. **Check the container log if it fails, and put `True` in the `.env` file.**
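
The envvars above could be read with a few lines of Python. This is a minimal sketch, not the `ogc2ckan` implementation: it assumes the `.env` values are already present in the process environment, treats `True`/`true`/`1`/`yes` as true, and assumes `False` as the default for `SSL_UNVERIFIED_MODE`, which is not documented. `HarvesterSettings`, `load_settings`, `env_bool` and `is_known_license` are hypothetical names; only the `/api/3/action/license_list` endpoint comes from CKAN.

```python
import os
from dataclasses import dataclass
from typing import Any

# Strings accepted as "true" for boolean envvars (assumption: the README
# only documents the literal values True/False).
TRUE_VALUES = {"true", "1", "yes"}


def env_bool(name: str, default: bool = False) -> bool:
    """Read a boolean envvar such as SSL_UNVERIFIED_MODE=True."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in TRUE_VALUES


@dataclass
class HarvesterSettings:
    default_license: str
    default_license_id: str
    parallelization: bool
    ckan_dataset_schema: str
    ssl_unverified_mode: bool
    metadata_distributions: bool


def load_settings() -> HarvesterSettings:
    """Build settings from the environment, using the documented defaults."""
    return HarvesterSettings(
        default_license=os.environ.get(
            "DEFAULT_LICENSE", "http://creativecommons.org/licenses/by/4.0/"
        ),
        default_license_id=os.environ.get("DEFAULT_LICENSE_ID", "cc-by-4.0"),
        parallelization=env_bool("PARALLELIZATION"),
        ckan_dataset_schema=os.environ.get("CKAN_DATASET_SCHEMA", "geodcatap_eu"),
        # No default is documented for SSL_UNVERIFIED_MODE; False is assumed.
        ssl_unverified_mode=env_bool("SSL_UNVERIFIED_MODE"),
        metadata_distributions=env_bool("METADATA_DISTRIBUTIONS"),
    )


def license_list_url(ckan_site_url: str) -> str:
    """CKAN action API URL that lists the valid license IDs."""
    return ckan_site_url.rstrip("/") + "/api/3/action/license_list"


def is_known_license(license_list_response: dict[str, Any], license_id: str) -> bool:
    """Check an ID against the JSON body returned by license_list."""
    return any(
        lic.get("id") == license_id
        for lic in license_list_response.get("result", [])
    )
```

`load_settings()` re-reads the environment on every call, so values changed at runtime are picked up; caching the result once at startup would be an equally reasonable choice.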
@@ -101,7 +103,12 @@ pdm install --no-self
configure your custom `config.yaml`. Define the harvest servers and the CKAN DCAT default info.

```bash
cp ckan-ogc/conf/config.yaml.template ckan-ogc/conf/config.yaml
cp ckan-ogc/conf/config.yaml.template ./config.yaml
```

Remember to configure your `.env`:
```bash
cp .env.example .env
```

Run:
@@ -111,9 +118,48 @@ pdm run python ogc2ckan/ogc2ckan.py

## Debug
### VSCode
#### Python debugger with Docker
1. Build and run container.
2. Attach Visual Studio Code to container
3. Start debugging on `ogc2ckan.py` Python file (`Debug the currently active Python file`).
2. Attach Visual Studio Code to container.
3. Start debugging on `ogc2ckan.py` Python file (`Debug the currently active Python file`) in the container.

#### Python debugger without Docker
1. Update the previously created `.env` file in the root of the `ckan-ogc` repo and move it to [`/ogc2ckan`](/ogc2ckan).
2. Open [`ogc2ckan.py`](/ogc2ckan/ogc2ckan.py).
3. Start debugging on `ogc2ckan.py` Python file (`Debug the currently active Python file`).

>**Note**<br>
> By default, the Python extension looks for and loads a file named `.env` in the current workspace folder. More info about the Python debugger and the use of [environment variables](https://code.visualstudio.com/docs/python/environments#_environment-variables).
## Additional info
### CKAN Schemas
The CKAN output schemas are located in the [`ogc2ckan/ckan_datasets`](./ogc2ckan/ckan_datasets) folder. They map the metadata fields from the different sources to the CKAN dataset fields. The following schemas are currently available:
* `geodcatap`: Schema based on [GeoDCAT-AP Schema for CKAN](https://github.com/mjanez/ckanext-scheming_dcat).
* `base`: A DCAT schema with the basic fields.

You can create your own Schema.
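
As an illustration of what such a mapping does, here is a minimal sketch that renames source metadata keys to CKAN dataset fields. The source-side keys and `EXAMPLE_FIELD_MAP` are hypothetical, and the sketch does not reproduce the schemas in `ogc2ckan/ckan_datasets`; only `title`, `notes` and `tags` (a list of `{"name": ...}` dicts) are CKAN core dataset fields.

```python
from typing import Any

# Hypothetical mapping: source metadata key -> CKAN dataset field.
# "title", "notes" and "tags" are CKAN core fields; the source keys are illustrative.
EXAMPLE_FIELD_MAP = {
    "title": "title",
    "abstract": "notes",
    "keywords": "tags",
}


def map_to_ckan(source: dict[str, Any], field_map: dict[str, str]) -> dict[str, Any]:
    """Rename source keys to CKAN field names, dropping keys that are not mapped."""
    dataset: dict[str, Any] = {}
    for source_key, ckan_field in field_map.items():
        if source_key not in source:
            continue
        value = source[source_key]
        if ckan_field == "tags":
            # CKAN expects tags as a list of {"name": ...} dicts.
            value = [{"name": keyword} for keyword in value]
        dataset[ckan_field] = value
    return dataset
```

A custom schema would then amount to a different mapping (plus any field-specific conversions like the `tags` one).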

### Harvester
The harvesters are located in the [`ogc2ckan/harvesters`](./ogc2ckan/harvesters) folder. Each harvester is a Python script that harvests the metadata from a source and creates the datasets in CKAN.

The following harvesters are available:
* `csw`: Harvests the metadata from a CSW server using OWSLib.
* `table`: Harvests the metadata from an XLS/XLSX file that contains the metadata records in table form, using the CKAN `field_name` values of the [custom schemas](./ogc2ckan/mappings/ckan_fields) as column names.
* `ogc`: Harvests the metadata from an OGC server (WCS/WFS, WMS & WMTS services) using OWSLib.
* `xml`: Harvests the metadata from an XML file that contains the metadata records in ISO19139 format.

You can create your own Harvester.
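
The `table` harvester uses CKAN `field_name` values as column headers. A minimal sketch of that row-to-dataset mapping, assuming the sheet has already been read into a header row followed by data rows (for example with openpyxl's `Worksheet.iter_rows(values_only=True)`). `rows_to_datasets` and the sample field names are illustrative, not taken from `ogc2ckan`.

```python
from typing import Any, Iterable, Sequence


def rows_to_datasets(rows: Iterable[Sequence[Any]]) -> list[dict[str, Any]]:
    """Turn spreadsheet rows into CKAN-style dataset dicts.

    The first row is the header; each header cell is assumed to be a CKAN
    field_name from the dataset schema.
    """
    iterator = iter(rows)
    try:
        header = [str(cell).strip() if cell is not None else "" for cell in next(iterator)]
    except StopIteration:
        return []  # empty sheet: nothing to harvest

    datasets = []
    for row in iterator:
        # Skip completely blank rows, which spreadsheets often contain.
        if all(cell in (None, "") for cell in row):
            continue
        # Keep only cells that have both a header and a value.
        record = {
            field: value
            for field, value in zip(header, row)
            if field and value not in (None, "")
        }
        datasets.append(record)
    return datasets
```

Dropping empty cells lets CKAN (or the schema defaults) fill those fields instead of receiving empty strings.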

### Configuration file (`config.yaml`)
The configuration file template is located at [`ckan-ogc/conf/config.yaml.template`](./config.yaml.template). It is a YAML file that contains the configuration of the harvesters and the CKAN DCAT default info.

The required elements of each harvester are specified by the harvester schema in the [`ogc2ckan/model/harvest_schema.py`](./ogc2ckan/model/harvest_schema.py) file.

### Custom organizations
A custom organization is a YAML file that contains the custom metadata fields used to create the datasets in CKAN. Custom organization files are located in the [`ogc2ckan/mappings/organizations`](./ogc2ckan/mappings/organizations) folder.

To create your own custom organization YAML file, start from the [`template-org.yaml`](./ogc2ckan/mappings/organizations/template-org.yaml) file. The custom organization, specified by the `dataset_id`, will be used in the harvested datasets (if the organization exists in the CKAN instance) to create the datasets with the custom metadata fields.


## Containers
List of *containers*: