Integrate harvesting with resource manager #227

Closed
ricardogsilva opened this issue Jun 20, 2021 · 1 comment
Labels
C182-SPC-2020-PACIFIC GeoNode This issue requires work on GeoNode Harvester

Comments


ricardogsilva commented Jun 20, 2021

The current harvesting implementation can extract relevant metadata from a remote GeoNode server, but it cannot yet generate local GeoNode resources from that metadata. This is to be implemented by interacting with the new GeoNode resource manager, which is being introduced via GNIP#89, as per:

GeoNode/geonode#7664

Briefly, the idea is that a harvester worker asks the resource manager to create a new local GeoNode resource and provides it with the relevant metadata (and the data too, but we will get to that later).


Harvesting workflow

On the harvesting side, the business logic is something like:

  • harvesting.tasks.harvesting_dispatcher() is called by the Celery beat scheduler (or on demand by the user) - this triggers a call to harvesting.tasks._harvest_resource()
  • _harvest_resource() instantiates the relevant harvester worker and calls its get_resource() and update_geonode_resource() methods sequentially:
    • get_resource() extracts metadata from the remote service
    • update_geonode_resource() takes that extracted metadata and creates a local GeoNode resource

This issue only deals with the implementation of the update_geonode_resource() method - the other parts of the workflow are already implemented.
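
The call sequence described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the real functions are Celery tasks inside GeoNode and their signatures are not shown in this issue, so the parameters used here (worker_factory, resource_ids), the SketchHarvesterWorker class and the fields of RecordDescription are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class RecordDescription:
    """Stand-in for the real class; these fields are assumed for illustration."""
    uuid: str
    title: str


class SketchHarvesterWorker:
    """Hypothetical worker that records the order in which it is called."""

    def __init__(self):
        self.calls = []

    def get_resource(self, resource_id, harvesting_session_id):
        # In the real worker this would extract metadata from the remote service.
        self.calls.append("get_resource")
        return RecordDescription(uuid=resource_id, title=f"Remote record {resource_id}")

    def update_geonode_resource(self, record, harvesting_session_id):
        # In the real worker this would create or update the local GeoNode resource.
        self.calls.append("update_geonode_resource")


def _harvest_resource(worker_factory, resource_id, harvesting_session_id):
    # Instantiate the relevant worker, then call its two methods sequentially.
    worker = worker_factory()
    record = worker.get_resource(resource_id, harvesting_session_id)
    worker.update_geonode_resource(record, harvesting_session_id)


def harvesting_dispatcher(worker_factory, resource_ids, harvesting_session_id):
    # Assumed shape: one _harvest_resource() call per resource to harvest.
    for resource_id in resource_ids:
        _harvest_resource(worker_factory, resource_id, harvesting_session_id)
```

In this sketch only the body of update_geonode_resource() is left as a placeholder, which matches the scope of this issue.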


Goal

Finish the implementation of the geonode.harvesting.harvesters.base.BaseHarvesterWorker.update_geonode_resource() method.

The default implementation of this update_geonode_resource() method shall receive an instance of RecordDescription and a harvesting_session_id as input parameters and then proceed to:

  • Ask the GeoNode resource manager to create or update the relevant GeoNode resource, providing it with the required details. These come from both the input RecordDescription and additional properties on the related models.Harvester (such as the default resource access permissions and the default owner)

  • Add additional information to the harvesting session

Child classes are free to either reuse this default implementation or reimplement the method if they require more complex behavior.
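
A minimal sketch of what such a default implementation might look like is shown below. It does not use the actual GNIP#89 resource manager API, which is not described in this issue: InMemoryResourceManager and its create()/update() methods, the fields on RecordDescription and Harvester, and the session_log dictionary are all assumptions made so that the example is self-contained.

```python
from dataclasses import dataclass


@dataclass
class RecordDescription:
    """Stand-in; the real class's fields are not shown in this issue."""
    uuid: str
    title: str
    abstract: str = ""


@dataclass
class Harvester:
    """Stand-in for models.Harvester with assumed default properties."""
    default_owner: str
    default_access_permissions: str = "public"


class InMemoryResourceManager:
    """Hypothetical resource manager that keeps resources in a dict."""

    def __init__(self):
        self.resources = {}

    def create(self, uuid, details):
        self.resources[uuid] = dict(details)
        return self.resources[uuid]

    def update(self, uuid, details):
        self.resources[uuid].update(details)
        return self.resources[uuid]


class BaseHarvesterWorker:
    def __init__(self, harvester, resource_manager):
        self.harvester = harvester
        self.resource_manager = resource_manager
        self.session_log = {}  # stand-in for the harvesting session record

    def update_geonode_resource(self, record, harvesting_session_id):
        # Combine details from the record with defaults from the harvester.
        details = {
            "title": record.title,
            "abstract": record.abstract,
            "owner": self.harvester.default_owner,
            "permissions": self.harvester.default_access_permissions,
        }
        # Create the resource if it is new, otherwise update it.
        if record.uuid in self.resource_manager.resources:
            resource = self.resource_manager.update(record.uuid, details)
            action = "updated"
        else:
            resource = self.resource_manager.create(record.uuid, details)
            action = "created"
        # Add information about what happened to the harvesting session.
        self.session_log.setdefault(harvesting_session_id, []).append(
            f"{action} {record.uuid}"
        )
        return resource
```

Passing the resource manager in through the constructor is a choice made for this sketch only, so that it can be exercised without a running GeoNode instance.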

Examples:

  • The default GeoNode harvester worker does not need any additional functionality. Therefore it shall use the base implementation.
  • The PDN harvester worker (for the Nexus project) will need to populate custom DB tables with the information extracted from the remote service. Therefore it shall reimplement the method.
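
The two examples above might translate into something like the following sketch. The class names GeoNodeHarvesterWorker and PdnHarvesterWorker, the custom_rows list (standing in for custom DB tables) and the use of a plain dict in place of RecordDescription are hypothetical. Calling super() before adding the extra step is one possible design; a child class could equally replace the method entirely.

```python
class BaseHarvesterWorker:
    """Reduced stand-in for the base worker, just enough to show overriding."""

    def __init__(self):
        self.local_resources = {}

    def update_geonode_resource(self, record, harvesting_session_id):
        # Placeholder for the default behavior: create or update a local resource.
        self.local_resources[record["uuid"]] = record["title"]


class GeoNodeHarvesterWorker(BaseHarvesterWorker):
    """Needs no additional functionality, so it inherits the base method."""


class PdnHarvesterWorker(BaseHarvesterWorker):
    """Hypothetical worker that also fills custom tables."""

    def __init__(self):
        super().__init__()
        self.custom_rows = []  # stand-in for custom DB tables

    def update_geonode_resource(self, record, harvesting_session_id):
        # Reuse the default behavior, then store the extra information.
        super().update_geonode_resource(record, harvesting_session_id)
        self.custom_rows.append(
            {"uuid": record["uuid"], "session": harvesting_session_id}
        )
```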

NOTES:

  • At this point it is not clear how we will signal to the resource manager that a certain resource should be bound to the harvester
  • Please submit this work as a PR against the harvesting_configuration branch on the ricardogsilva/geonode repo - that branch is currently being used for a PR on the initial harvesting functionality

Related to #176

@ricardogsilva

Closing this, as the initial integration has already been merged and PR geosolutions-it/geonode#864 continues the work.
