Commit

Update opendataharvest.md
srappel authored Jun 3, 2024
1 parent 130ed8a commit 9a67773
Showing 1 changed file with 70 additions and 48 deletions.
118 changes: 70 additions & 48 deletions docs/utils/opendataharvest.md
@@ -5,59 +5,28 @@ nav_order: 3
parent: GeoDiscovery Utilities
---

# OpenDataHarvest
# OpenDataHarvest Tool

[GitHub Repo](https://github.com/UWM-Libraries/GeoDiscovery-Utils/tree/main/opendataharvest)

## Basic crosswalk mapping:
### Overview
The OpenDataHarvest tool is a component of the GeoDiscovery-Utils repository. This tool is designed to facilitate the harvesting and processing of open geospatial data for integration into the GeoDiscovery portal, a platform aimed at providing access to a wide range of geospatial datasets.

Title:
DCAT: title
OGM Aardvark: dct_title_s
### Features
- **Automated Data Harvesting**: OpenDataHarvest automates the process of collecting geospatial data from various open data sources. This ensures that the GeoDiscovery portal is continually updated with the latest datasets available.
- **Data Transformation**: The tool includes functionalities to transform the harvested data into formats that are compatible with the GeoDiscovery portal, ensuring seamless integration.
- **Metadata Handling**: OpenDataHarvest handles metadata extraction and processing, ensuring that all datasets are accompanied by comprehensive metadata for better discoverability and usability.

Description:
DCAT: description
OGM Aardvark: dct_description_sm
### Usage
The tool can be integrated into workflows for regularly updating the GeoDiscovery portal with new and updated datasets. It is suitable for use by libraries, research institutions, and other organizations involved in managing geospatial data.

Keywords/Tags:
DCAT: keyword
OGM Aardvark: dct_subject_sm
### Integration
OpenDataHarvest is part of a broader set of utilities in the GeoDiscovery-Utils repository, all of which support the functionalities of the GeoDiscovery portal. The tool is designed to work in conjunction with other components to provide a robust geospatial data management and discovery solution.

Publisher:
DCAT: publisher
OGM Aardvark: dct_publisher_sm

Contact Point:
DCAT: contactPoint
OGM Aardvark: dct_contributor_sm

Access Rights:
DCAT: accessLevel
OGM Aardvark: dct_accessRights_s

Temporal Coverage:
DCAT: temporal
OGM Aardvark: dct_temporal_sm

Spatial Coverage:
DCAT: spatial
OGM Aardvark: dct_spatial_sm

Identifier:
DCAT: identifier
OGM Aardvark: dct_identifier_s

Rights:
DCAT: rights
OGM Aardvark: dct_rights_sm

Format:
DCAT: format
OGM Aardvark: dct_format_s

Landing Page:
DCAT: landingPage
OGM Aardvark: dct_isPartOf_sm
### Benefits
- **Efficiency**: Automates the repetitive task of data harvesting, saving time and resources.
- **Up-to-date Data**: Ensures that the GeoDiscovery portal remains current with the latest available geospatial data.
- **Enhanced Discoverability**: Through comprehensive metadata processing, the tool enhances the discoverability of datasets within the portal.

## The [config.yaml](https://github.com/UWM-Libraries/GeoDiscovery-Utils/blob/main/opendataharvest/config.yaml) file.

@@ -70,7 +39,7 @@ The opendataharvest tool gets both it's configuration parameters (e.g. where to
default values for fields,
and manifests of open data sites to harvest from.

You can see the configuration options at the top:
### Configuration Options

```yaml
CONFIG:
@@ -83,7 +52,7 @@ CONFIG:
SCHEMA: "https://raw.githubusercontent.com/UWM-Libraries/GeoDiscovery/main/schema/geoblacklight-schema-aardvark.json"
```
Next the default values are set in the "Localization" section:
### Localization and Default Values
```yaml
DEFAULT:
@@ -107,6 +76,8 @@ DEFAULT:
Following a small section of test sites, the rest of the file has nested records for each of the Hubs or portals we harvest from.
### Example of a data portal in the YAML file
Here is an example of a record for the Wisconsin Department of Health Services Data Portal DCAT-compliant portal:
```yaml
@@ -145,3 +116,54 @@ including ESRI basemaps.
We don't want to ingest these into our portal, so we add them to the skiplist.
There are some datasets that have other elements such as `DatasetPrefix` that are not being used at this time.
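
The skiplist idea described above can be sketched in Python as follows. This is only an illustration, not the tool's actual code: the function name `filter_skipped`, the choice to match on the DCAT `identifier` key, and the sample identifiers are all assumptions, since the document does not say how skiplist entries are matched.

```python
def filter_skipped(datasets, skiplist):
    """Return only the datasets whose identifier is not on the skiplist.

    Hypothetical helper: matching on the "identifier" key is an assumption.
    """
    skip_ids = set(skiplist)  # set membership keeps each lookup cheap
    return [d for d in datasets if d.get("identifier") not in skip_ids]


# Made-up sample data for illustration only.
datasets = [
    {"identifier": "parcels-2024", "title": "County Parcels"},
    {"identifier": "basemap-imagery", "title": "World Imagery Basemap"},
]
skiplist = ["basemap-imagery"]

kept = filter_skipped(datasets, skiplist)
print([d["title"] for d in kept])
```

Datasets with no `identifier` key are kept in this sketch, because `None` is never on the skiplist; a real harvester might prefer to reject them.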

## Basic crosswalk mapping:

Title:
DCAT: title
OGM Aardvark: dct_title_s

Description:
DCAT: description
OGM Aardvark: dct_description_sm

Keywords/Tags:
DCAT: keyword
OGM Aardvark: dct_subject_sm

Publisher:
DCAT: publisher
OGM Aardvark: dct_publisher_sm

Contact Point:
DCAT: contactPoint
OGM Aardvark: dct_contributor_sm

Access Rights:
DCAT: accessLevel
OGM Aardvark: dct_accessRights_s

Temporal Coverage:
DCAT: temporal
OGM Aardvark: dct_temporal_sm

Spatial Coverage:
DCAT: spatial
OGM Aardvark: dct_spatial_sm

Identifier:
DCAT: identifier
OGM Aardvark: dct_identifier_s

Rights:
DCAT: rights
OGM Aardvark: dct_rights_sm

Format:
DCAT: format
OGM Aardvark: dct_format_s

Landing Page:
DCAT: landingPage
OGM Aardvark: dct_isPartOf_sm
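
As a rough illustration, the crosswalk listed above could be expressed as a lookup table in Python. The field pairs are copied from the list; everything else is an assumption for the sake of the sketch: the `crosswalk` helper is hypothetical, DCAT values are taken to be plain strings or lists of strings (nested objects are not unpacked), and the `_s` / `_sm` suffixes are read as single-valued and multivalued fields respectively.

```python
# DCAT key -> OGM Aardvark field, as given in the crosswalk list.
DCAT_TO_AARDVARK = {
    "title": "dct_title_s",
    "description": "dct_description_sm",
    "keyword": "dct_subject_sm",
    "publisher": "dct_publisher_sm",
    "contactPoint": "dct_contributor_sm",
    "accessLevel": "dct_accessRights_s",
    "temporal": "dct_temporal_sm",
    "spatial": "dct_spatial_sm",
    "identifier": "dct_identifier_s",
    "rights": "dct_rights_sm",
    "format": "dct_format_s",
    "landingPage": "dct_isPartOf_sm",
}


def crosswalk(dcat_dataset):
    """Copy mapped DCAT values into an Aardvark-style dict (hypothetical helper)."""
    record = {}
    for dcat_key, aardvark_field in DCAT_TO_AARDVARK.items():
        if dcat_key not in dcat_dataset:
            continue  # leave unmapped or missing keys out of the record
        value = dcat_dataset[dcat_key]
        if aardvark_field.endswith("_sm"):
            # Assumption: "_sm" fields are multivalued, so wrap scalars in a list.
            record[aardvark_field] = list(value) if isinstance(value, list) else [value]
        else:
            # Assumption: "_s" fields hold one value; take the first item of a list.
            record[aardvark_field] = value[0] if isinstance(value, list) and value else value
    return record


# Made-up sample dataset for illustration only.
sample = {
    "title": "County Parcels",
    "keyword": ["parcels", "cadastral"],
    "accessLevel": "public",
    "modified": "2024-01-01",  # not in the crosswalk, so it is dropped
}
record = crosswalk(sample)
print(record)
```

Keeping the mapping in one dictionary means a change to the crosswalk only touches data, not the conversion logic.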
