Update README

Fix .gitignore
mjanez · Sep 12, 2023 · 205791b · 205791b
1 parent e6905bd
commit 205791b
Showing 2 changed files with 56 additions and 10 deletions.
diff --git a/.gitignore b/.gitignore
@@ -12,7 +12,7 @@ samples
 .env
 !data/README.md
 !data/sample
-data/xml
+data/*
 ogc2ckan/mappings/organizations/*
 !ogc2ckan/mappings/organizations/README.md
 !ogc2ckan/mappings/organizations/template-org.yaml
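
One point worth checking in the `.gitignore` hunk above: within a single file, Git lets the last matching pattern decide, so the `!data/README.md` and `!data/sample` negations listed before `data/*` may be overridden for files that are not already tracked. The sketch below is a deliberately simplified model of that rule built on Python's `fnmatch`, not Git's real matcher. The `is_ignored` helper and the `reordered` list are hypothetical and only illustrate the ordering effect.

```python
from fnmatch import fnmatch


def is_ignored(path, patterns):
    """Simplified model: the last pattern that matches `path` decides."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatch(path, glob):
            # A plain pattern ignores the path, a negated one re-includes it.
            ignored = not negated
    return ignored


# Order as it appears in the hunk: negations first, then `data/*`.
committed = ["!data/README.md", "!data/sample", "data/*"]
# Hypothetical reordering: `data/*` first, then the negations.
reordered = ["data/*", "!data/README.md", "!data/sample"]

print(is_ignored("data/README.md", committed))
print(is_ignored("data/README.md", reordered))
print(is_ignored("data/xml", reordered))
```

To confirm the behavior against Git itself, `git check-ignore -v data/README.md` reports which pattern, if any, applies to the path.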

diff --git a/README.md b/README.md
@@ -7,6 +7,7 @@
     <a href="#overview">Overview</a> •
     <a href="#quick-start">Quick start</a> •
     <a href="#debug">Debug</a> •
+    <a href="#additional-info">Additional info</a> •
     <a href="#containers">Containers</a>
 </p>
 
@@ -17,18 +18,18 @@
 Docker Compose environment for ingesting metadata from different spatial/semantic/general metadata sources into CKAN.
 
 * OGC harvester (WCS/WFS, WMS & WMTS services)
-* CSW harvester (ISO 19115/19139 Metadata)
-* CKAN API. - WIP
-* Tabular data (CSV, TSV)
+* CSW harvester (ISO 19115/19139 Metadata Catalogue Services)
 * Spreadsheets (XLS/XLSX)
-* Metadata files (XML ISO19139) - WIP
+* Metadata files (XML ISO19139)
+* CKAN API - WIP
 * Semantic metadata files (RDF/TTL) - WIP
+* Tabular data (CSV, TSV) - WIP
 
 >**Note**<br>
 > It can be tested with a CKAN-based open data portal such as [mjanez/ckan-docker](https://github.com/mjanez/ckan-docker)[^1]
 
 ## Quick start
-First copy the `.env.example` template and configure by changing the `.env` file. Change `PYCSW_URL` and `CKAN_URL`,  as well as the published port `PYCSW_PORT`, if needed.
+First copy the `.env.example` template to `.env` and configure it. Change `PYCSW_URL` and `CKAN_URL`, as well as the harvester info `OGC2CKAN INFO`, if needed.
 
 ```bash
 cp .env.example .env
@@ -43,8 +44,9 @@ Custom ennvars:
 - `DEFAULT_LICENSE`: Default license for the harvested datasets. Open Data default: `http://creativecommons.org/licenses/by/4.0/`
 - `DEFAULT_LICENSE_ID`: Default license ID for the harvested datasets, ID list: `{ckan_site_url}/api/3/action/license_list`. Open Data default: `cc-by-4.0`
 - `PARALLELIZATION`: [WIP] Parallelization of the harvesters. Default: `False`
-- `CKAN_DATASET_SCHEMA`: Dataset schema of the CKAN Endpoint. Default: `geodcatap`
+- `CKAN_DATASET_SCHEMA`: Dataset schema of the CKAN Endpoint. Default: `geodcatap_eu`
 - `SSL_UNVERIFIED_MODE`: If `SSL_UNVERIFIED_MODE=True`, the SSL certificate is downloaded from the host. Use this envvar to avoid SSL errors when the certificate is self-signed.
+- `METADATA_DISTRIBUTIONS`: Set `METADATA_DISTRIBUTIONS=True` to create metadata distributions (GeoDCAT-AP/ISO19139) as CKAN resources. Default: `False`
 
     >**Warning**<br>
     > `SSL_UNVERIFIED_MODE=True` is not recommended for production environments. Update your certificate or use a valid one. **Check the container log if it fails, and put `True` in the `.env` file.**
@@ -101,7 +103,12 @@ pdm install --no-self
 configure your custom `config.yaml`. Define the harvest servers and the CKAN DCAT default info.
 
 ```bash
-cp ckan-ogc/conf/config.yaml.template ckan-ogc/conf/config.yaml
+cp ckan-ogc/conf/config.yaml.template ./config.yaml
+```
+
+Remember to configure your `.env`:
+```bash
+cp .env.example .env
 ```
 
 Run:
@@ -111,9 +118,48 @@ pdm run python ogc2ckan/ogc2ckan.py
 
 ## Debug
 ### VSCode
+#### Python debugger with Docker
 1. Build and run container.
-2. Attach Visual Studio Code to container
-3. Start debugging on `ogc2ckan.py` Python file (`Debug the currently active Python file`).
+2. Attach Visual Studio Code to container.
+3. Start debugging on `ogc2ckan.py` Python file (`Debug the currently active Python file`) in the container.
+
+#### Python debugger without Docker
+1. Update the previously created `.env` file in the root of the `ckan-ogc` repo and move it to: [`/ogc2ckan`](/ogc2ckan)
+2. Open [`ogc2ckan.py`](/ogc2ckan/ogc2ckan.py).
+3. Start debugging on `ogc2ckan.py` Python file (`Debug the currently active Python file`). 
+
+>**Note**<br>
+> By default, the Python extension looks for and loads a file named `.env` in the current workspace folder. More info about the Python debugger and [environment variables](https://code.visualstudio.com/docs/python/environments#_environment-variables).
+
+## Additional info
+### CKAN Schemas
+The CKAN output schemas are located in the [`ogc2ckan/ckan_datasets`](./ogc2ckan/ckan_datasets) folder. The schemas are used to map the metadata fields from the different sources to the CKAN dataset fields. The following schemas are currently available:
+   * `geodcatap`: Schema based on the [GeoDCAT-AP Schema for CKAN](https://github.com/mjanez/ckanext-scheming_dcat).
+   * `base`: A DCAT schema with the basic fields.
+
+You can create your own schema.
+
+### Harvester
+The harvester is located in the [`ogc2ckan/harvesters`](./ogc2ckan/harvesters) folder. The harvester is a Python script that harvests the metadata from the different sources and creates the datasets in CKAN.
+
+There are different harvesters:
+   * `csw`: Harvests the metadata from a CSW server using OWSLib.
+   * `table`: Harvests the metadata from an XLS/XLSX file that contains the metadata records in a table format using the CKAN `field_name` of the [custom schemas](./ogc2ckan/mappings/ckan_fields) as the column name.
+   * `ogc`: Harvests the metadata from an OGC server (WCS/WFS, WMS & WMTS services) using OWSLib.
+   * `xml`: Harvests the metadata from an XML file that contains the metadata records in ISO19139 format.
+
+You can create your own harvester.
+
+### Configuration file (`config.yaml`)
+The configuration file template is located at [`ckan-ogc/conf/config.yaml.template`](./config.yaml.template). It is a YAML file that contains the configuration of the harvesters and the CKAN DCAT default info.
+
+The required elements for each harvester are specified by the harvester schema in the [`ogc2ckan/model/harvest_schema.py`](./ogc2ckan/model/harvest_schema.py) file.
+
+### Custom organizations
+A custom organization is a YAML file that contains the custom metadata fields that will be used to create the dataset in CKAN. The custom organization is located in the [`ogc2ckan/mappings/organizations`](./ogc2ckan/mappings/organizations) folder.
+
+To create your own custom organization YAML file, start from [`template-org.yaml`](./ogc2ckan/mappings/organizations/template-org.yaml). The custom organization, specified by the `dataset_id`, is used for the harvested datasets (if the organization exists in the CKAN instance) to create each dataset with the custom metadata fields.
+
 
 ## Containers
 List of *containers*: