Link bounding boxes to geographic coverage #7455

steeleworthy · 2020-12-03T19:11:12Z

Feature Request Summary

Users as well as indexes such as the Canadian [Geodisy](https://geo.frdr-dfdr.ca/) would benefit from having DV link bounding box metadata to geographic coverage metadata. Both fields are repeatable, but there is no relationship between them. This means that datasets with multiple boxes or coverage fields risk having unlinked and potentially messy geo metadata. Being able to link place names to a corresponding bounding box would add context for the user and enable further geospatial discovery down the line.

As an example, see this dataset: https://doi.org/10.5683/SP2/4E6LGS. The dataset features 30+ unique locations. Presently, if we were to add bounding boxes for each location, there would be no association to the free-text geographic coverage metadata fields. We would get only a long list of bounding boxes, and one list would not have the context that the other could provide. As a stopgap, we instead created one massive bounding box that covers all the place names, but that doesn't reflect the data well.

It would be great if we could associate a unique bounding box with each of these unique locations rather than using 1 large bounding box that covers all, as the vast space between the unique locations has almost no bearing on the data collections at the locations themselves.
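To make the "one large bounding box" problem concrete, here is a rough sketch that builds the envelope of several site boxes and estimates how little of it the sites actually occupy. The `Box` type, the function names, and the two sets of coordinates are made up for illustration (they are not taken from the dataset above), and the sketch assumes that no box crosses the date line and that the boxes do not overlap.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Hypothetical bounding box in decimal degrees."""
    west: float
    east: float
    north: float
    south: float


def envelope(boxes: List[Box]) -> Box:
    """Smallest single box that contains every box in the list."""
    return Box(
        west=min(b.west for b in boxes),
        east=max(b.east for b in boxes),
        north=max(b.north for b in boxes),
        south=min(b.south for b in boxes),
    )


def area_deg2(box: Box) -> float:
    """Width times height in square degrees (a crude proxy, not true area)."""
    return (box.east - box.west) * (box.north - box.south)


def covered_fraction(boxes: List[Box]) -> float:
    """Share of the envelope taken up by the individual boxes."""
    return sum(area_deg2(b) for b in boxes) / area_deg2(envelope(boxes))


# Two illustrative sites far apart from each other.
site_a = Box(west=-123.3, east=-123.0, north=49.4, south=49.2)
site_b = Box(west=-79.6, east=-79.2, north=43.9, south=43.6)

big_box = envelope([site_a, site_b])
fraction = covered_fraction([site_a, site_b])
print(big_box)
print(f"{fraction:.4%} of the envelope is actually covered by the sites")
```

With sites this far apart, well under one percent of the single large box corresponds to places where data was collected.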

Thanks,

djbrooke · 2020-12-03T19:39:00Z

Hey @steeleworthy - thanks for the suggestion. I'll mention @jggautier to get his thoughts as I'm not as well-versed in Geospatial metadata.

jggautier · 2020-12-04T16:27:10Z

Hi @steeleworthy. There's some discussion in #7108 about whether or not multiple bounding boxes make sense, and whether or not the DDI Codebook schema allows for that. We spoke with Wendy Thomas, who chairs DDI's technical committee, and she and others she spoke with didn't seem to think so, but perhaps the value of multiple bounding boxes wasn't explored enough. Thanks for pointing to https://doi.org/10.5683/SP2/4E6LGS. Is that an example that we could bring up with Wendy as the DDI community considers changes to Codebook?

I imagine that if multiple bounding boxes are allowed, that would really change the definition of the metadata field, and the committee might consider introducing one or more new fields instead to try to maintain backwards compatibility.

steeleworthy · 2020-12-07T15:28:25Z

Hi @jggautier, this all sounds good. Writing that, I figured there might be something within the DDI, let alone the DV coding, to consider. Anyway,

Yes, go ahead and share the DOI. I have a few others that have similar use cases. I'll look for them to see if they may help.
If any action is taken on this, maybe it would make sense to also reach out to some of the Geodisy developers?

Thanks, will be in touch.

jggautier · 2020-12-08T14:47:30Z

Thanks! Pinging @pdante-ubc and @markjwgoodwin who work on Geodisy. :)

In #7455 and in DDI's corresponding issue tracker at https://ddi-alliance.atlassian.net/browse/DDICODE-70, using "named geographic entities (e.g., "United States" and "Israel") when the data include non-contiguous areas in a single study" and using "bounding polygon" fields was mentioned as something that DDI Codebook already supports.

Of course Dataverse's geospatial metadatablock has fields for "named geographic entities" (like the "Nation" field). Would it be better if the geospatial metadatablock included repeatable bounding polygon fields? How might Geodisy and other search systems use bounding polygons to improve how people find and reuse data? And regarding https://doi.org/10.5683/SP2/4E6LGS, would associating each named geographic entity with a bounding polygon help?

pdante-ubc · 2020-12-09T23:32:31Z

Connecting bounding boxes/polygons to named geographic entities would be useful to Geodisy in that we could have our bounding box-based record pages list only the location referenced by the bounding box rather than listing every named geographic entity in the dataset for each bounding box. It will require a little programming on our part to deal with this change, but I think the benefits to both us and other groups probably outweigh short-term increases in development work for Geodisy.

On to the polygon question: Do researchers actually want to enter polygons into the metadata rather than just uploading a representative geospatial file? My initial thought is that if the area is much more complex than a box, researchers will probably tend to just upload a geospatial file rather than entering a bunch of points.

As for Geodisy and polygons: Geodisy only uses bounding boxes because that's what our front end (GeoBlacklight) uses for map-based search. We would actually have to convert polygons to boxes if we got them, so it would be more work for us without any actual benefit for our project. It would only be like 3-4 lines of code to implement if there wasn't the polygon challenge of knowing if they mean the inside or outside of the polygon. For example, if the polygon is defined as the border between the oceans and the continents, programmatically it would be challenging to know if the dataset is the land part or the ocean part. This also could be an issue in cases where the polygon crosses the international date line such that the farthest east points are farther west than the farthest west points. With the current bounding box setup the date line issue is obvious because the depositor is entering the edge values in the appropriate field, but with a freeform polygon we would increase ambiguity. You could say that polygon points should be entered clockwise (or counter-clockwise; I'm not sure what the GIS convention is), but that puts a greater burden on depositors and I think the likelihood of errors increases. Is there a better way of indicating what is inside vs outside with respect to a polygon other than the order in which the points are listed? As a stopgap solution Geodisy could just assume that the polygon being represented is the smaller of the two options, but that does open us up to a small percentage of errors. I may be making a mountain out of a molehill in terms of the polygon problem, as I have no idea how often this edge case would come up.
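The polygon-to-box conversion and the date line ambiguity described above can be sketched roughly as follows. This is not Geodisy's code: the `Box` type, the function names, and the sample coordinates are hypothetical, and the "smaller of the two options" rule is only applied to longitude. It does nothing to resolve the inside-versus-outside question.

```python
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees


class Box(NamedTuple):
    west: float
    east: float
    north: float
    south: float


def naive_bbox(points: List[Point]) -> Box:
    """Plain min/max conversion; wrong for polygons that cross the date line."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return Box(west=min(lons), east=max(lons), north=max(lats), south=min(lats))


def smaller_option_bbox(points: List[Point]) -> Box:
    """Pick the narrower of the two possible longitude spans.

    If the narrower span crosses the date line, the result has west > east,
    which mirrors how a depositor would fill in the four edge fields.
    """
    naive = naive_bbox(points)
    lons = [lon for lon, _ in points]
    if min(lons) >= 0 or max(lons) <= 0:
        # All points share a hemisphere, so the naive box is already the narrow one.
        return naive
    # Re-express longitudes in the range 0..360 so points on either side
    # of the date line sit next to each other.
    shifted = naive_bbox([(lon % 360, lat) for lon, lat in points])
    if shifted.east - shifted.west >= naive.east - naive.west:
        return naive

    def to_signed(lon: float) -> float:
        return lon - 360 if lon > 180 else lon

    return Box(
        west=to_signed(shifted.west),
        east=to_signed(shifted.east),
        north=shifted.north,
        south=shifted.south,
    )


# Illustrative polygon straddling the date line (values are made up).
polygon = [(177.0, -16.0), (-178.0, -16.0), (-178.0, -19.0), (177.0, -19.0)]
print(naive_bbox(polygon))           # spans almost the whole globe
print(smaller_option_bbox(polygon))  # narrow box with west > east
```

The naive result runs from -178 to 177, which is 355 degrees wide, while the heuristic returns the 5-degree-wide box from 177 east across the date line to -178. A dataset that really does span the wider option would be misrepresented, which is the small percentage of errors mentioned above.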

steeleworthy · 2020-12-10T01:58:40Z

I need to avoid some of the technical matter in this thread since it's out of my wheelhouse, but I have two general thoughts to add from my experience working with researchers in geographic and environmental sciences (both the social science and natural science end of things).

On metadata and precision. I've found that many users appreciate seeing place names and bounding box coordinates in the record. They know the difference between 49.045 and 49.945. For them, being given the opportunity to give a place name in repeatable fields but not the corresponding point/box data seems odd. As one said to me, "If you're going to be precise with one, why not the other?"
On increasing depositor burden. This is something I worry about as well! I will say that if a researcher brings a dataset to me anywhere near the end of the work - and most of the time that's what we see at my local - asking them to develop an additional shapefile to better represent the geographic scope/extent of their work is often going to be a tough one. However, asking them to identify bounding box coordinates is a fairly low bar.

I'm giving all this context with the understanding that we can only do what's possible within the DDI and the code. I get it if we can only push the platform so far. Thanks, everyone!

jggautier · 2021-01-26T21:20:59Z

Thanks @pdante-ubc and @steeleworthy. This is really helpful. I'm in the camp where we shouldn't necessarily be beholden to any specific metadata standard, like DDI Codebook here. Changes and additions have been made over the years to Dataverse's fields and the structure of those fields, which happen to deviate from Codebook, because they better support users' needs. If there's no way in DDI Codebook to express certain information, then we simply don't include that in the Codebook exports and we work with DDI to update the standard so that we can later (which is what's already happening).

Writing about this issue in DDI's ticketing system, Wendy quoted a colleague's advice that "most software systems (as far as I know) don't support search over multiple bounding boxes" but "that doesn't mean it's necessarily an unsound idea, especially if software does support multiple bounding boxes." So if Geodisy is an example of that software, what if after we work out how best to support multiple bounding boxes in Dataverse, we ask the DDI group to consider supporting it, too?

I'm assuming that Geodisy is using the Dataverse JSON exports. Is that right?

I think that the current bounding box compound field that Dataverse ships with should not be repeatable, so the solution to this GitHub issue should not be to associate the current bounding box fields with any particular geographic coverage fields. It doesn't make sense when considering the definition of the field, and letting it be repeatable seems more and more like an error that should've been corrected long ago.

Thankfully, I think there are very few datasets in the known Dataverse repositories that have repeating bounding boxes, so if we agree that the solution to #7108 is to make the field non-repeatable, correcting datasets that do have repeatable boxes shouldn't be too difficult.

I also think that it'll be better if adding new fields and structure for multiple bounding boxes that can be associated with each geographic coverage field is worked out along with the suggestions in #6713, which includes adding lat/long points.

I'm no expert in describing geospatial data, so I've made peace with how unable I am to reply helpfully to @pdante-ubc's questions about bounding polygons. Does anyone think it would be helpful if we could find and look into other systems that have metadata fields that map to DDI Codebook's bounding polygon fields?

pdante-ubc · 2021-01-26T23:01:58Z

@jggautier, Geodisy is using the Dataverse JSON export.

I'm not 100% sure I know what you mean by "search over multiple bounding boxes." Geodisy does allow a dataset to have multiple bounding boxes, but each bounding box is represented as a separate record in our system, so a dataset with multiple bounding boxes entered would be represented by, for example, "Example Dataset name (1 of 6)".
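As a rough illustration of the one-record-per-box approach, the sketch below generates record titles in the "(1 of 6)" style. It is not taken from Geodisy's codebase: the function name is hypothetical, and leaving the title unchanged for a dataset with zero or one box is an assumption.

```python
from typing import List, Sequence


def record_titles(dataset_title: str, boxes: Sequence[object]) -> List[str]:
    """Return one record title per bounding box entered on the dataset."""
    total = len(boxes)
    if total <= 1:
        # Assumption: a dataset with at most one box keeps its plain title.
        return [dataset_title]
    return [f"{dataset_title} ({i} of {total})" for i in range(1, total + 1)]


# Six placeholder boxes; only their count matters for the titles.
titles = record_titles("Example Dataset name", [None] * 6)
for title in titles:
    print(title)
```

If each box were linked to a place name, every one of these records could list only the location that its own box refers to.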

I guess the situation I can see where multiple bounding boxes could exist for a single place name would be something along the lines of multiple data collection areas spread around BC. Country: Canada, Province/State: BC would be the same for all of them, but the researcher might want to include bounding boxes for each site.

jggautier · 2021-01-28T18:13:35Z

Sorry, I didn't mean to suggest that I wrote "search over multiple bounding boxes." I was quoting someone else who wrote it - "most software systems (as far as I know) don't support search over multiple bounding boxes". I'm also not sure exactly what he means. Did you get a chance to review the ticket in the DDI's ticket system I mentioned?

I'd like to propose a metadata model based on what's been discussed here so far.

The current metadata model looks like this:

Geographic Coverage 1
- Nation
- State
- City
- Other
Geographic Coverage n
- Nation
- State
- City
- Other
Geographic Bounding Box 1
- West Longitude
- East Longitude
- North Latitude
- South Latitude
Geographic Bounding Box n
- West Longitude
- East Longitude
- North Latitude
- South Latitude

The model below makes the existing Geographic Bounding Box field non-repeatable and adds a new repeatable subfield, Geographic Bounding Box, with its own subfields for its four points, to the repeatable Geographic Coverage field:

Geographic Bounding Box (not repeating since it represents "the largest geographic extent of the Dataset's geographic coverage")
Geographic Coverage 1
- Nation
- State
- City
- Other
- Geographic Bounding Box 1
  - West Longitude
  - East Longitude
  - North Latitude
  - South Latitude
- Geographic Bounding Box n
  - West Longitude
  - East Longitude
  - North Latitude
  - South Latitude
Geographic Coverage n
- Nation
- State
- City
- Other
- Geographic Bounding Box 1
  - West Longitude
  - East Longitude
  - North Latitude
  - South Latitude
- Geographic Bounding Box n
  - West Longitude
  - East Longitude
  - North Latitude
  - South Latitude
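For anyone who finds code easier to read than nested lists, here is the same proposed model sketched as Python dataclasses. The class and attribute names are hypothetical and do not correspond to Dataverse's metadata block definitions or JSON export format, and the coordinates are invented. The example uses the earlier scenario of several collection sites within one province.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BoundingBox:
    west_longitude: float
    east_longitude: float
    north_latitude: float
    south_latitude: float


@dataclass
class GeographicCoverage:
    """Repeatable: one named place plus any number of boxes linked to it."""
    nation: Optional[str] = None
    state: Optional[str] = None
    city: Optional[str] = None
    other: Optional[str] = None
    bounding_boxes: List[BoundingBox] = field(default_factory=list)


@dataclass
class GeospatialMetadata:
    # Not repeating: the largest geographic extent of the dataset's coverage.
    bounding_box: Optional[BoundingBox] = None
    coverages: List[GeographicCoverage] = field(default_factory=list)


def boxes_for(metadata: GeospatialMetadata, nation: str) -> List[BoundingBox]:
    """Collect every box linked to coverage entries for the given nation."""
    return [
        box
        for coverage in metadata.coverages
        if coverage.nation == nation
        for box in coverage.bounding_boxes
    ]


# Two made-up collection sites that share one place name.
metadata = GeospatialMetadata(
    bounding_box=BoundingBox(-128.0, -120.0, 55.0, 49.0),
    coverages=[
        GeographicCoverage(
            nation="Canada",
            state="BC",
            bounding_boxes=[
                BoundingBox(-123.3, -123.0, 49.4, 49.2),
                BoundingBox(-127.8, -127.5, 54.9, 54.7),
            ],
        )
    ],
)
print(len(boxes_for(metadata, "Canada")))
```

Because each box lives inside its coverage entry, a consumer can go from a place name to its boxes, or from a box back to its place name, without guessing from list order.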

I didn't want to make and upload an image of a diagram since I'm hoping this hierarchical list will be easy for others to understand and edit. Is it clear? What do you all think? I think it's important we figure out what the model should be before tackling how to represent it in the UI.

mreekie · 2023-03-30T19:22:05Z

grooming:

reviewing the funded deliverables to make sure they are tagged.
saw this mentioned in: Mon, Jan 9, 10am EST meeting notes.
There is no tag, and I am not going to add one.

qqmyers mentioned this issue Dec 3, 2020

Should the Geographic Bounding Box allow optional coordinates and multiple boxes in the UI? #7091

Closed

jggautier added the Feature: Metadata label Dec 4, 2020

pdurbin added the Feature: Geospatial label Oct 9, 2022

mreekie added the pm.netcdf-hdf5.d All 3 aims are currently under this deliverable label Mar 30, 2023

pdurbin added Type: Suggestion an idea User Role: Depositor Creates datasets, uploads data, etc. labels Oct 9, 2023

pdurbin mentioned this issue Oct 19, 2023

Remove or move lat/long metadata from bounding box fields IQSS/dataverse.harvard.edu#66

Open

This was referenced Mar 20, 2024

Feature Request/Idea: ISO 19115 versus Dataverse Geospatial Metadata Block (Proposed Changes) #10398

Open

Geospatial metadata block > bounding box coordinates fields > tool tips are INCORRECT and mention use of commas #10397

Closed
