Link bounding boxes to geographic coverage #7455
Comments
Hey @steeleworthy - thanks for the suggestion. I'll mention @jggautier to get his thoughts as I'm not as well-versed in Geospatial metadata. |
Hi @steeleworthy. There's some discussion in #7108 about whether or not multiple bounding boxes make sense, and whether or not the DDI Codebook schema allows for that. We spoke with Wendy Thomas, who chairs DDI's technical committee, and she and others she spoke with didn't seem to think so, but perhaps the value of multiple bounding boxes wasn't explored enough. Thanks for pointing to https://doi.org/10.5683/SP2/4E6LGS. Is that an example that we could bring up with Wendy as the DDI community considers changes to Codebook? I imagine that if multiple bounding boxes are allowed, that would really change the definition of the metadata field, and the committee might consider introducing one or more new fields instead to try to maintain backwards compatibility. |
Hi @jggautier, this all sounds good. Writing that, I figured there might be something within the DDI, let alone the DV coding, to consider. Anyway,
Thanks, will be in touch. |
Thanks! Pinging @pdante-ubc and @markjwgoodwin who work on Geodisy. :) In #7455 and in DDI's corresponding issue tracker at https://ddi-alliance.atlassian.net/browse/DDICODE-70, using "named geographic entities (e.g., "United States" and "Israel") when the data include non-contiguous areas in a single study" and using "bounding polygon" fields was mentioned as something that DDI Codebook already supports. Of course Dataverse's geospatial metadatablock has fields for "named geographic entities" (like the "Nation" field). Would it be better if the geospatial metadatablock included repeatable bounding polygon fields? How might Geodisy and other search systems use bounding polygons to improve how people find and reuse data? And regarding https://doi.org/10.5683/SP2/4E6LGS, would associating each named geographic entity with a bounding polygon help? |
Connecting bounding boxes/polygons to named geographic entities would be useful to Geodisy in that we could have our bounding box-based record pages list only the location referenced by the bounding box rather than listing every named geographic entity in the dataset for each bounding box. It will require a little programming on our part to deal with this change, but I think the benefits to both us and other groups probably outweigh short-term increases in development work for Geodisy. On to the polygon question: Do researchers actually want to enter polygons into the metadata rather than just uploading a representative geospatial file? My initial thought is that if the area is much more complex than a box, researchers will probably tend to just upload a geospatial file rather than entering a bunch of points. As for Geodisy and polygons: Geodisy only uses bounding boxes because that's what our front end (GeoBlacklight) uses for map-based search. We would actually have to convert polygons to boxes if we got them, so it would be more work for us without any actual benefit for our project. It would only be like 3-4 lines of code to implement if there weren't the polygon challenge of knowing whether they mean the inside or outside of the polygon. For example, if the polygon is defined as the border between the oceans and the continents, programmatically it would be challenging to know if the dataset is the land part or the ocean part. This also could be an issue in cases where the polygon crosses the international dateline such that the farthest east points are farther west than the farthest west points.
With the current bounding box setup the dateline issue is obvious because the depositor is entering the edge values in the appropriate field, but with a freeform polygon we would increase ambiguity. You could say that polygon points should be entered clockwise (or counter-clockwise; I'm not sure what the GIS convention is), but that puts a greater burden on depositors and I think the likelihood of errors increases. Is there a better way of indicating what is inside vs outside with respect to a polygon other than the order in which the points are listed? As a stopgap solution Geodisy could just assume that the polygon being represented is the smaller of the two options, but that does open us up to a small percentage of errors. I may be making a mountain out of a molehill in terms of the polygon problem as I have no idea how often this edge case would come up. |
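To make the polygon-to-box conversion and the dateline case discussed above concrete, here is a minimal Python sketch. It is not Geodisy's code: the function name `polygon_to_bbox`, the `(longitude, latitude)` point order, and the 180-degree heuristic are all assumptions made for illustration. The heuristic is one way to implement the "assume the smaller option" stopgap, and it can guess wrong for polygons that really do span most of the globe.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees; assumed order


def polygon_to_bbox(points: List[Point]) -> Dict[str, float]:
    """Collapse polygon vertices into west/east/south/north edges.

    Assumption: if the plain min/max box would be wider than 180 degrees
    of longitude, the polygon is taken to cross the dateline, and the box
    that wraps across it is returned instead (so west > east).
    """
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    west, east = min(lons), max(lons)

    if east - west > 180:
        # Shift negative longitudes by 360 so the vertices sit side by side,
        # take min/max there, then map the east edge back into [-180, 180].
        shifted = [lon + 360 if lon < 0 else lon for lon in lons]
        west, east = min(shifted), max(shifted)
        if east > 180:
            east -= 360

    return {"west": west, "east": east, "south": min(lats), "north": max(lats)}
```

For a polygon around the dateline such as `[(170, -10), (-170, -10), (-170, 10), (170, 10)]`, this returns a west edge of 170 and an east edge of -170, which matches how a depositor would fill in the existing bounding box fields by hand.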
I need to avoid some of the technical matter in this thread since it's out of my wheelhouse, but I have two general thoughts to add from my experience working with researchers in geographic and environmental sciences (both the social science and natural science end of things).
I'm giving all this context with the understanding that we can only do what's possible within the DDI and the code. I get it if we can only push the platform so far. Thanks, everyone! |
Thanks @pdante-ubc and @steeleworthy. This is really helpful. I'm in the camp where we shouldn't necessarily be beholden to any specific metadata standard, like DDI Codebook here. Changes and additions have been made over the years to Dataverse's fields and the structure of those fields, which happen to deviate from Codebook, because they better support users' needs. If there's no way in DDI Codebook to express certain information, then we simply don't include that in the Codebook exports and we work with DDI to update the standard so that we can later (which is what's already happening). Writing about this issue in DDI's ticketing system, Wendy quoted a colleague's advice that "most software systems (as far as I know) don't support search over multiple bounding boxes" but "that doesn't mean it's necessarily an unsound idea, especially if software does support multiple bounding boxes." So if Geodisy is an example of that software, what if after we work out how best to support multiple bounding boxes in Dataverse, we ask the DDI group to consider supporting it, too? I'm assuming that Geodisy is using the Dataverse JSON exports. Is that right? I think that the current bounding box compound field that Dataverse ships with should not be repeatable, so the solution to this GitHub issue should not be to associate the current bounding box fields with any particular geographic coverage fields. It doesn't make sense when considering the definition of the field, and letting it be repeatable seems more and more like an error that should've been corrected long ago. Thankfully, I think there are very few datasets in the known Dataverse repositories that have repeating bounding boxes, so if we agree that the solution to #7108 is to make the field non-repeatable, correcting datasets that do have repeating boxes shouldn't be too difficult.
I also think that it'll be better if adding new fields and structure for multiple bounding boxes that can be associated with each geographic coverage field is worked out along with the suggestions in #6713, which include adding lat/long points. I'm no expert in describing geospatial data so I've made peace with how unable I am to reply helpfully to @pdante-ubc's questions about bounding polygons. Does anyone think it would be helpful if we could find and look into other systems that have metadata fields that map to DDI Codebook's bounding polygon fields? |
@jggautier, Geodisy is using the Dataverse JSON export. I'm not 100% sure I know what you mean by "search over multiple bounding boxes." Geodisy does allow a dataset to have multiple bounding boxes, but each bounding box is represented as a separate record in our system, so a dataset with multiple bounding boxes entered would be represented by, for example, "Example Dataset name (1 of 6)". The situation I can see where multiple bounding boxes could exist for a single place name would be something along the lines of multiple data collection areas spread around BC. Country: Canada, Province/State: BC would be the same for all of them, but the researcher might want to include bounding boxes for each site. |
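The "one record per bounding box" behaviour described above can be illustrated with a short sketch. This is not Geodisy's implementation: the function name `records_per_bbox` and the record shape (a dict with `title` and `bbox` keys) are hypothetical, and only the "(1 of 6)" labelling pattern comes from the comment.

```python
from typing import Dict, List


def records_per_bbox(title: str, bboxes: List[dict]) -> List[Dict[str, object]]:
    """Split one dataset into one record per bounding box.

    Records get an "(i of n)" suffix only when the dataset has more than
    one bounding box (an assumption; the thread only shows the n > 1 case).
    """
    total = len(bboxes)
    records = []
    for i, bbox in enumerate(bboxes, start=1):
        label = title if total == 1 else f"{title} ({i} of {total})"
        records.append({"title": label, "bbox": bbox})
    return records
```

With six boxes, the first record would be titled "Example Dataset name (1 of 6)". Because every such record currently lists all of the dataset's place names, linking each box to its own geographic coverage entry would let each record show only the matching location.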
Sorry, I didn't mean to suggest that I wrote "search over multiple bounding boxes." I was quoting someone else who wrote it - "most software systems (as far as I know) don't support search over multiple bounding boxes". I'm also not sure exactly what he means. Did you get a chance to review the ticket in the DDI's ticket system I mentioned? I'd like to propose a metadata model based on what's been discussed here so far. The current metadata model right now is like:
The model below makes the existing Geographic Bounding Box field non-repeatable and adds a new repeatable subfield, Geographic Bounding Box, with its own subfields for its four points, to the repeatable Geographic Coverage field:
I didn't want to make and upload an image of a diagram since I'm hoping this hierarchical list will be easy for others to understand and edit. Is it clear? What do you all think? I think it's important we figure out what the model should be before tackling how to represent it in the UI. |
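For readers who find a data structure easier to follow than a list, the proposed model as described in prose above (a single dataset-level bounding box plus repeatable bounding boxes nested under each repeatable Geographic Coverage entry) might be sketched like this. All class and attribute names are illustrative assumptions, not Dataverse's actual field names or schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BoundingBox:
    # The four edges of a box, in decimal degrees.
    west_longitude: float
    east_longitude: float
    north_latitude: float
    south_latitude: float


@dataclass
class GeographicCoverage:
    # Named geographic entity (subfield names are hypothetical).
    country: Optional[str] = None
    state_province: Optional[str] = None
    city: Optional[str] = None
    other: Optional[str] = None
    # Proposed addition: repeatable boxes tied to this place name.
    bounding_boxes: List[BoundingBox] = field(default_factory=list)


@dataclass
class GeospatialMetadata:
    # Geographic Coverage stays repeatable.
    coverages: List[GeographicCoverage] = field(default_factory=list)
    # The existing dataset-level box becomes non-repeatable.
    bounding_box: Optional[BoundingBox] = None


# Example: two collection sites under one place name.
# The coordinates are made up purely for illustration.
bc = GeographicCoverage(
    country="Canada",
    state_province="BC",
    bounding_boxes=[
        BoundingBox(-123.3, -123.0, 49.4, 49.2),
        BoundingBox(-120.5, -120.1, 50.8, 50.6),
    ],
)
metadata = GeospatialMetadata(coverages=[bc])
```

Nesting the boxes under the coverage entry is what gives each box its place-name context; a flat list of boxes beside a flat list of place names would leave them unlinked.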
Feature Request Summary
Users as well as indexes such as the Canadian [Geodisy](https://geo.frdr-dfdr.ca/) would benefit from having DV link bounding box metadata to geographic coverage metadata. Both fields are repeatable but there is no relationship between them. This means that datasets with multiple boxes or coverage fields risk having unlinked and potentially messy geo metadata. Being able to link place names to a corresponding bounding box would add context for the user and enable further geospatial discovery down the line.
As an example, see this dataset: https://doi.org/10.5683/SP2/4E6LGS. The dataset features 30+ unique locations. Presently, if we were to add bounding boxes for each location, there would be no association to the free-text geographic coverage metadata fields. We would get only a long list of bounding boxes, and one list would not have the context that the other could provide. As a stopgap, we instead created one massive bounding box that covers all the place names, but that doesn't reflect the data well.
It would be great if we could associate a unique bounding box with each of these unique locations rather than using 1 large bounding box that covers all, as the vast space between the unique locations has almost no bearing on the data collections at the locations themselves.
Thanks,