GNIP 92: non-spatial structured data as (pre)viewable FAIR datasets #8714
Comments
@gannebamm my +1 here, the proposal is actually very good. Of course we need to carefully choose how to convert the structured data into a "standard" format that GeoNode can use later on. |
Added the GNIP to the wiki page |
Thanks @afabiani for adding it to the wiki page!
Here you go: soil example dataset. You can switch the metadata language to English and use the BZE_LW English version. The site.xlsx is the spatial dataset we currently upload as a point layer. The other two xlsx files (LABORATORY_DATA, HORIZON_DATA) are examples of non-spatial datasets. I know that, in the end, everything is spatial somehow, since the lab and horizon datasets explicitly or implicitly reference a sample site. Nonetheless, we would like to publish those as non-spatial datasets and enable custom applications to fetch them in an accessible and interoperable way through an API. An example of this kind of custom application can be seen at soilgrids: if you click a coordinate, you get loads of additional data. However, most of our data is already stored in PostgreSQL databases. I know other research institutes also have working databases that could maybe just get integrated. If we use an ORM like sqlalchemy we could even open this up for a more diverse set of SQL-like data providers, as explained here. But maybe that one is out of scope and we should stay close to our current stack, which uses PostgreSQL. I will ask my colleagues to provide some more examples. |
My +1. thanks Florian |
@gannebamm this proposal is the natural continuation of the conceptual change we made from "layers" to "datasets" in GeoNode. Before starting the discussion about their presentation (web client, API, standards, whatever) I wonder where we imagine storing these datasets. The first option that comes to my mind is
I know that this goes against the general advice to keep the service models separate, and in theory we should only rely on OGC standard interfaces to query spatial datasets, but in the case of GeoNode and GeoServer:
|
As I had an offline discussion with @gannebamm on the topic, I am sneaking in on the discussion.
I think @gannebamm has a slightly different workflow in mind (correct me if I am wrong). The data would not be imported into a central data store, but managed as a reference to an existing database. I guess this is the most flexible and scalable approach, as otherwise you would need to make sure to preserve the structure in the On the other hand, this would bring in the requirement of some sort of default structure anyway, if you do not want to implement special visualizers for each dataset. Maybe it could also be a mappable structure, filled out by the user during import. Maybe also both scenarios (1. existing DB; 2. import into |
Maybe the https://github.com/rvinzent/django-dynamic-models technology evaluated and used (?) for the SOS integration (contrib module) can help for this feature request. See: https://github.com/GeoNode/geonode-contribs/issues/172 |
Dear @giohappy, together with @gannebamm @mwallschlaeger we have started to iterate the requirements and the concept behind a non-spatial dataset feature for GeoNode. We have started a small prototype by setting up a Django App / Contrib module. At the moment, uploading data is achieved by providing a CSV file with a sidecar JSON Tabular Data Resource that describes the schema and types of fields. @gannebamm pointed us to the new geonode-importer module and we were wondering if this would be a good fit for ingesting the data. It looks to be designed in a way that it would allow the addition of custom/new handlers. Do you think it would fit our purpose? |
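To illustrate the CSV plus sidecar idea from the comment above, here is a minimal sketch of how a Tabular Data Resource descriptor could drive type casting of CSV rows. The descriptor layout follows the Frictionless Data convention ("schema" with a list of "fields"); the file name, field names, and the helper read_typed_rows are hypothetical and not taken from the prototype, which may work quite differently.

```python
import csv
import io
import json

# Hypothetical sidecar descriptor; the resource name, path, and field names
# are illustrative assumptions, not the prototype's actual files.
DESCRIPTOR = json.loads("""
{
  "profile": "tabular-data-resource",
  "name": "laboratory-data",
  "path": "laboratory_data.csv",
  "schema": {
    "fields": [
      {"name": "site_id", "type": "integer"},
      {"name": "depth_cm", "type": "number"},
      {"name": "horizon", "type": "string"}
    ]
  }
}
""")

# Only a small subset of the Table Schema types is handled in this sketch.
CASTERS = {"integer": int, "number": float, "string": str}


def read_typed_rows(csv_text, descriptor):
    """Parse CSV text and cast each column to the type named in the schema."""
    fields = descriptor["schema"]["fields"]
    expected = [field["name"] for field in fields]
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != expected:
        raise ValueError(f"CSV header {reader.fieldnames} does not match schema {expected}")
    rows = []
    for raw in reader:
        rows.append({f["name"]: CASTERS[f["type"]](raw[f["name"]]) for f in fields})
    return rows


CSV_TEXT = "site_id,depth_cm,horizon\n1,10.5,Ah\n2,30,Bv\n"
rows = read_typed_rows(CSV_TEXT, DESCRIPTOR)
```

The typed rows could then be written to a database table whose column types are derived from the same schema, which is one reason a sidecar descriptor is attractive compared with guessing types from the CSV alone.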
Dear @matthesrieke sorry for the late reply. First of all, a GNIP for the new importer is on its way; we want to make it a community module asap. At the moment it's hosted under GeoSolutions' own repo. As you note, the new importer lets you implement specific handlers, and it can assume complete control of the lifecycle of a resource. For example, the handler is in charge of doing any housekeeping when a resource backed by specific data and tables is deleted. So, the primary use case here is to map a GeoNode resource to an external DB. If we generalize this, I'd say that this case isn't strictly related to non-spatial datasets. In our vision, a non-spatial dataset could still be served by GeoServer. That way we can benefit from all the services and WFS-based client tools that we already have. They can work for non-spatial data too.
IMHO we should agree on the first point first, which is the subject of this GNIP. |
I am not sure if I understand the two dimensions stated.
We care where the data is located and would like to ingest it into the PostgreSQL backend for later use.
I think that, because this is not the intended way to use WFS, tools like QGIS are likely to fail to understand the non-spatial data served through it. Did anyone test this successfully? @afabiani @matthesrieke @t-book @francbartoli -> Maybe we should schedule a talk to discuss this? It is getting quite complex, and I think it would help to dig deep into the pros and cons of the possible approaches and define our needs. Maybe @mattiagiupponi can provide a short intro to the importer and the non-spatial data serving capabilities of GeoServer, and @matthesrieke can describe the prototype he developed to test the approach. In the end, less complexity is always welcome. If other community developers are interested in coming by, I can host a public meeting. Scheduling this will be rough, though. |
I'm not saying that this isn't relevant. My point is to distinguish the two requirements:
We're happy to discuss this in a call. |
@giohappy @afabiani @matthesrieke @t-book @mattiagiupponi (and everyone else interested!) There are some open slots next week for me. Please fill out this poll: https://terminplaner4.dfn.de/FOKIDXEtIVBq8sQB |
@gannebamm It looks like today is the winner. Is the meeting happening? |
Hi @gannebamm |
Hi @mattiagiupponi thanks for adding the documentation, we will take a closer look. Would you be available for a short meeting to discuss possible technical approaches? Maybe this Thursday between 10-12am? You could also reply to me by mail ( |
Hi @matthesrieke |
thanks @mattiagiupponi ! Yes, 10am tomorrow is fine for me. I will be joined by @autermann and @ridoo |
@mattiagiupponi I would like to attend, too. |
@matthesrieke @gannebamm we're planning to complete the transition to the new importer very soon, and make it the default importer in 4.1.x. As you know, the new importer lacks a CSV handler. We were waiting to implement a solution to replace the upload steps that we have now, where the lat/lon column can be selected at upload time.
This solution would provide an alternative that's not too expensive and complex to implement, and gives the opportunity to remove the current upload system (at the moment it's still required only for CSV files). I'm not against the solution based on Tabular Data Resource and VSI. What's your opinion? |
@matthesrieke please take a look at @giohappy's comment. I do not see that as an issue. We will have two importer handlers, one for geospatial CSV and one dedicated to non-spatial CSV with TDR / VSI. What do you think? |
@gannebamm @giohappy We also see no problem. Both solutions can co-exist and serve different use cases (one for simple CSV uploads and one for whole datapackages). I like the "csvt solution" as well -- quite pragmatic. How would you communicate to the user the configured name pattern for columns containing geometry information? |
@ridoo @gannebamm unfortunately our experiments with the CSV driver options and the For the moment we have implemented the basic solution, where only a fixed set of column names is recognized. There's a PR ready for an internal review, but if you want to take a look and suggest improvements you're welcome! GeoNode/geonode-importer#157 |
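A rough sketch of what "a fixed set of column names" could mean in practice: the header of an uploaded CSV is checked against known latitude/longitude names, and the file is treated as plain tabular data when no pair is found. The name sets and both function names below are assumptions made for illustration; the names actually recognized by the importer are defined in the PR referenced above and may differ.

```python
import csv
import io

# Assumed name sets, for illustration only. The importer's real list
# lives in the referenced PR and is not reproduced here.
LAT_NAMES = {"lat", "latitude", "y"}
LON_NAMES = {"lon", "long", "longitude", "x"}


def find_coordinate_columns(header):
    """Return (lat_column, lon_column) if both are present, else None."""
    lat = next((c for c in header if c.strip().lower() in LAT_NAMES), None)
    lon = next((c for c in header if c.strip().lower() in LON_NAMES), None)
    if lat is not None and lon is not None:
        return lat, lon
    return None


def classify_csv(csv_text):
    """Classify a CSV as 'spatial' or 'tabular' based on its header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return "spatial" if find_coordinate_columns(header) else "tabular"
```

With such a check, one handler could pick up CSV files carrying coordinates while a second handler takes the remaining, purely tabular uploads.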
@giohappy @mattiagiupponi That is a pity to read. I only played around with it on the CLI, so I cannot tell much more on this. On our
Comment 1: I see that the style of a layer is mandatory during upload. To my understanding right now, I can decide to get errors either from the
For now I can ignore this, but for the future it would be nice to have a less hackish way of introducing tabular data. Update: The error happens when GeoNode tries to invalidate the GWC. GeoServer does not know the resources logs a Do you think this is the right way to pass a fake SLD file along with the upload of non-spatial/tabular data? Comment 2: During upload, the non-spatial/tabular data becomes of type VECTOR. Calling http://localhost/geoserver/rest/layers/laboratory_data.xml gives me
<layer>
<name>laboratory_data</name>
<type>VECTOR</type>
<resource class="featureType">
<name>geonode:laboratory_data</name>
<atom:link xmlns:atom="http://www.w3.org/2005/Atom" rel="alternate" href="http://localhost/geoserver/rest/workspaces/geonode/datastores/geonode_data/featuretypes/laboratory_data.xml" type="application/xml"/>
</resource>
<attribution>
<logoWidth>0</logoWidth>
<logoHeight>0</logoHeight>
</attribution>
<dateCreated>2023-03-02 15:49:59.20 UTC</dateCreated>
</layer>
It seems that The PR looks OK at first glance (I could not spend too much time on it, though). |
Hi @ridoo A possible approach is to use the custom_resource_manager provided by the importer. This manager is meant to override the default one in order to skip the usual communication with GeoServer during the create/copy/update phase of the resource. I guess in your case you also have to override the "create" method so that GeoNode does not try to create the SLD style, by adding something like this:
def create(self, uuid, **kwargs) -> ResourceBase:
    return ResourceBase.objects.get(uuid=uuid)
NOTE: the layer in GeoServer (as always) should be imported and published by the previous step.
Then override the handler:
def create_geonode_resource(
    self, layer_name: str, alternate: str, execution_id: str, resource_type: Dataset = Dataset, files=None
):
    .......
    saved_dataset = custom_resource_manager.create(
        None,
        resource_type=resource_type,
        defaults=dict(
            name=alternate,
            workspace=workspace,
            subtype="raster",
            alternate=f"{workspace}:{alternate}",
            dirty_state=True,
            title=layer_name,
            owner=_exec.user,
            files=list(set(list(_exec.input_params.get("files", {}).values()) or list(files))),
        ),
    )
    .......
    return saved_dataset
Regarding the second comment, I'm sure we talked about that: for now, GeoNode is not ready to handle non-spatial resources, and it will require some work to enable this. |
@mattiagiupponi thanks for the hint, I will bypass the importer's Yes, we have talked about the limitation regarding non-spatial/tabular data in GeoNode. However, I was unsure if you had thought further about possible pitfalls and/or ideas to overcome those :). |
@giohappy @mattiagiupponi |
@gannebamm I'm a bit lost. I don't see a PR connected to this issue, and I'm not sure if a solution has been implemented for the presentation of non-spatial datasets. |
@ridoo Giovanni is correct. Didn't we create a PR somewhere for this feature? |
@gannebamm we did create PR #10842 which was needed to keep all unpacked files from an uploaded zip-file. However, the actual work to support non-spatial (tabular) data is a bit distributed:
|
After integration with
The records are displayed, but the preview (GetFeatureRequest) no longer works. I will investigate further. |
Tabular Preview works again after re-adding the It also appeared that there was some orphaned code handling a faked thumbnail, which caused an "action not implemented, yet" error, as the To resolve the error, I added the Not sure what caused the GeoWebCache exception, actually. |
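For readers unfamiliar with the preview request mentioned above, this is a sketch of how a WFS 2.0 GetFeature URL for a tabular layer could be assembled. The key/value parameters are the standard WFS ones; the /geoserver/ows endpoint, the page size, and the helper name build_getfeature_url are assumptions, and the layer name is simply reused from the REST example earlier in the thread. How the GeoNode client actually builds its preview request is not shown in this discussion.

```python
from urllib.parse import urlencode


def build_getfeature_url(base_url, type_name, count=10):
    """Build a WFS 2.0 GetFeature URL that asks for GeoJSON output."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
        "outputFormat": "application/json",
        "count": count,  # limit the number of returned records
    }
    return f"{base_url}?{urlencode(params)}"


# Assumed local endpoint; adjust to the actual GeoServer location.
url = build_getfeature_url("http://localhost/geoserver/ows", "geonode:laboratory_data")
```

For a non-spatial layer the returned features would be expected to carry attribute values only, without a usable geometry, which a table-style preview can still render.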
GNIP 92 - non-spatial structured data as (pre)viewable FAIR datasets
Overview
We need to store structured non-spatial datasets alongside geodata as GeoNode resources. The non-spatial datasets shall provide a simple viewer as a preview and should be usable as part of dashboards. The datasets should be findable, accessible, and provided in an interoperable way, thereby complying with the FAIR principles.
Proposed By
Florian Hoedt, Thünen-Institute Centre for Information Management
Assigned to Release
This proposal is for GeoNode 4.0
State
Motivation
Status Quo: Non-spatial but structured datasets like CSV/Excel files can be uploaded as documents. As document objects, these datasets inherit the resource base metadata models but cannot be viewed in a meaningful way.
Our scientists often use PostgreSQL databases and tables to store and structure their research data. Currently, those datasets cannot be published in any way in GeoNode. As a research institute, we need to store/register structured non-spatial datasets alongside geodata as GeoNode datasets (in the meaning of a v4.0 dataset).
Objective: Implement a new category of ResourceBase for structured non-spatial datasets. Instead of using the GeoServer importer to ingest, e.g., shapefiles into the PostGIS-enabled backend, you should be able to define a connection string to the table to use [?]. The non-spatial datasets shall provide a simple viewer as a preview and should be usable as part of dashboards.
Proposal
How to realize the above-mentioned feature is still to be discussed.
As part of an internal discussion, we thought about using PostgREST as an accessible and interoperable tabular data provider. One major aspect is to synchronise authorization mechanisms with the new service. Currently, Django and GeoServer do synchronise their roles via GeoFence. Something similar should be implemented by the tabular service provider. There seem to be options to use JWT as part of the django-rest-framework to grant such authorization as explained here: https://gitter.im/begriffs/postgrest?at=61f06b40742c3d4b21b63843
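As a sketch of the JWT idea: PostgREST reads a "role" claim from a signed token sent as a Bearer token and switches to that database role for the request. The snippet below signs and verifies an HS256 token using only the standard library. The role name, the secret, and both helper names are hypothetical, and a real integration would more likely use an established JWT library on the Django side and share the secret with PostgREST through its configuration.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _sign(signing_input: str, secret: str) -> str:
    digest = hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()
    return _b64url(digest)


def make_jwt(claims: dict, secret: str) -> str:
    """Create an HS256-signed JWT from the given claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    ]
    signing_input = ".".join(parts)
    return f"{signing_input}.{_sign(signing_input, secret)}"


def verify_jwt(token: str, secret: str) -> dict:
    """Check the signature and return the claims; raise ValueError if invalid."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(signature, _sign(signing_input, secret)):
        raise ValueError("invalid signature")
    payload = signing_input.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# Hypothetical values: the role must exist in the database and the secret
# must match the one configured for the tabular service.
SECRET = "replace-with-a-shared-secret-of-32-chars-or-more"
token = make_jwt({"role": "geonode_reader", "exp": 9999999999}, SECRET)
```

A client would then send the token in an "Authorization: Bearer" header with each request to the tabular service. Expiry checking is left out of verify_jwt here to keep the sketch short.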
Apart from using PostgREST as a tabular data provider, we also considered the new OGC APIs, which may provide enough functionality for this GNIP. One example is OGC API - EDR (https://ogcapi.ogc.org/edr/).
Backwards Compatibility
It is not intended to backport this GNIP to 3.x
Future evolution
Explain which could be future evolutions.
Feedback
See discussion below...
Voting
Project Steering Committee:
Links
Remove unused links below.