Generate watercourse_100mseg.gpkg #44

Merged
merged 31 commits into from
Jan 4, 2021

Conversation

florisvdh
Member

@florisvdh florisvdh commented Nov 27, 2020

Initial code added.

I plan to add/do:

  • code necessary to uniquely identify corresponding segments and endpoints with a common, unique code
  • steps to write / read intermediate output from GRASS as gpkg on GDrive (unneeded since data source has been entirely prepared in GRASS)
  • first steps to attach GRTSmaster_habitats address to points (chosen to leave this out of this processed data source since it is specific to the handling of the 3260 sampling frame, it would not be the generally recommendable practice to make unique (c.q. spatially balanced) IDs for these segments and points)
  • code to further clean variable names and simplify the data in some standardized way (unneeded since data source has been entirely prepared in GRASS)
  • code to write the result to disk (maybe as one GPKG)
  • first checks on the result
  • better referral to watercourses (consolidate raw data source) - see https://doi.org/10.5281/zenodo.4420905
  • provide a function read_watercourse_100mseg() in n2khab
  • consolidate the processed data source on Zenodo - see https://doi.org/10.5281/zenodo.4452578

@ToonHub I'll let you know when the processed data source is ready.

First impression of segments + points (from GRASS gui):

Screenshot

image

@florisvdh florisvdh changed the title Generate watercourse_segments & watercourse_segmentpoints Generate watercourse_100msegments & watercourse_100msegmentpoints Nov 27, 2020
@florisvdh florisvdh changed the title Generate watercourse_100msegments & watercourse_100msegmentpoints Generate watercourse_100mseg & watercourse_100msegpoints Nov 27, 2020
@florisvdh
Member Author

florisvdh commented Nov 30, 2020

Compiled bookdown: generating_watercourse_100mseg.html.zip

@ToonHub I generated a first version of this processed data source; can you have a look at it? It is derived from the raw data source watercourse_segments (version watercourse_segments_20180601), which still has to be referenced online (ideally on Zenodo); it currently sits below this GDrive link. It represents the 'VHAS' subdataset ('waterloopsegmenten') and is identical to the 'Wlas' shapefile in the VHA_201806 GDrive folder that you provided, except for the filename standardization (and the WOR file I think).

The dataset has two layers (100m segments and endpoints). The two attribute variables of both layers are explained in the text.

I propose not to add GRTSmaster_habitats addresses inside this data source. It would unnecessarily inflate the data source, while it is only done in the context of the 3260 sampling frame. Another reason is that we had better keep processed data sources 'multi-purpose' and therefore more generic; in this case the GRTSmaster_habitats approach would not lead to unique IDs without further tricks, and spatially balanced addresses for lines can also be assigned in other ways (methods for lines exist).

@florisvdh florisvdh marked this pull request as ready for review November 30, 2020 16:12
@florisvdh florisvdh changed the title Generate watercourse_100mseg & watercourse_100msegpoints Generate watercourse_100mseg.gpkg Nov 30, 2020
@ToonHub
Contributor

ToonHub commented Dec 2, 2020

I checked watercourse_100mseg.gpkg, both the points and lines. It looks fine to me, but there is one issue I am not sure about.
Currently, the selection of 100 m segments starts from the most upstream parts, and a segment stops where two watercourses come together. The end points of the created segments will be the starting points of the actual sampling units.
This sometimes results in very short segments where two watercourses come together. See the example below.
Wouldn't it be better to start selecting segments from the most downstream parts and select the starting points?

Another thing.
What behaviour do we want when two watercourses come together? Do we want two new segments to start (in the upstream direction)? That is the way it is now.
Or do we want the segment in the 'main' watercourse (for example a 1st order watercourse) to continue, and a new segment to start for the 2nd order watercourse that flows into the main watercourse?
See the example below. It shows a 1st order watercourse and two 2nd order watercourses. At one of the branches a very short segment starts.

image

@florisvdh
Member Author

florisvdh commented Dec 2, 2020

A roundup of new plans (after discussion):

  • consider starting from the watercourses dataset (Vhag.shp; pink below) instead of watercourse_segments (blue below). This will result in fewer segments shorter than 100 m.
  • define 100 m segments in the other direction, i.e. from linestring end towards linestring beginning. This will place the shorter segments at the upstream ('source') side. This can be done by first flipping the linestrings (v.edit tool=flip in GRASS) before running v.split.
  • as we want to retain the same relationship between points and (upstream) corresponding segment, the startpoints then have to be created instead of the endpoints.
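
The effect of flipping before splitting can be illustrated with a small pure-Python sketch. This is not the GRASS workflow itself (which relies on v.edit tool=flip and v.split); the helper names `split_from_end` and `length` are hypothetical, and the sketch assumes planar coordinates in metres and linestrings digitized from upstream to downstream.

```python
from math import hypot


def length(piece):
    """Planar length of a polyline given as a list of (x, y) tuples."""
    return sum(hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(piece, piece[1:]))


def split_from_end(coords, seg_len=100.0):
    """Split a polyline into pieces of seg_len, measured from its last vertex.

    The line is flipped first, so the pieces are returned in end -> beginning
    order and any remainder shorter than seg_len lies at the original
    beginning of the line (the assumed upstream side).
    """
    pts = list(reversed(coords))          # the 'flip' step
    pieces, current, room = [], [pts[0]], seg_len
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        d = hypot(x1 - x0, y1 - y0)       # length left on this edge
        while d >= room:                  # a cut point falls on this edge
            t = room / d
            x0, y0 = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
            current.append((x0, y0))
            pieces.append(current)
            current, d, room = [(x0, y0)], d - room, seg_len
        if d > 0:                         # rest of the edge joins the open piece
            current.append((x1, y1))
            room -= d
    if len(current) > 1:                  # leftover piece shorter than seg_len
        pieces.append(current)
    return pieces


# A 250 m watercourse digitized from source (0, 0) to mouth (250, 0).
pieces = split_from_end([(0.0, 0.0), (250.0, 0.0)])
startpoints = [piece[0] for piece in pieces]   # downstream end of each piece
```

In this sketch the first element of each piece is its startpoint in the flipped direction, i.e. the downstream end of the segment, so each point stays paired with the segment upstream of it, and the single short piece ends at the source.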

Notes: it appears that:

  • watercourses and watercourse_segments coincide for the directions. However, there are clear differences as well, as seen below. Each has unique features that the other hasn't, but especially watercourses has most features. So their coverage is different and the one of watercourse_segments seems much worse.
  • many shorter watercourse segments still exist in watercourses - the total number of lines is still 20894. This partly has to do with networked patterns, i.e. another watercourse that connects two locations of a watercourse, but seemingly also with unfinished cases, where merges would seem possible in one way or another (in order to achieve a longer 'main' watercourse).
    • a consequence is that there will be a gain by using watercourses, but it will still result in a lot of <100m segments (20894)
Screenshot 1: watercourses + watercourse_segments 2018

image

I also compared this version (1 Jun 2018) with the current version (7 Aug 2020):

  • the watercourses (26651 instead of 20894 lines) and watercourse_segments (61783 instead of 49465 lines) layers have been much extended. Also it seems that the above mentioned mismatches have been resolved in watercourse_segments
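
To put the line counts above in proportion, a quick calculation (a sketch; the dictionary keys simply reuse the layer names from this comment, and `relative_increase` is a hypothetical helper) shows that both layers grew by roughly a quarter between the two versions:

```python
# Line counts quoted above: (version 1 Jun 2018, version 7 Aug 2020)
counts = {
    "watercourses": (20894, 26651),
    "watercourse_segments": (49465, 61783),
}


def relative_increase(old, new):
    """Growth from old to new as a fraction of old."""
    return (new - old) / old


for name, (old, new) in counts.items():
    print(f"{name}: +{new - old} lines ({relative_increase(old, new):.1%})")
```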
Screenshot 2: watercourses + watercourse_segments 2020

image

My current conclusion is to proceed with watercourses (version 7 Aug 2020).

…ents *

See Generate watercourse_100m...: uses watercourses, not watercourse_segments
@florisvdh
Member Author

florisvdh commented Dec 17, 2020

@ToonHub I updated the processed data source. Can you have a look?

Compiled bookdown: generating_watercourse_100mseg.html.zip

It is now derived from the raw data source watercourses (version watercourses_20200807), which is still to be referenced online (ideally on Zenodo) at https://doi.org/10.5281/zenodo.4420905 and which currently also sits below this GDrive link. It represents the 'VHAG' subdataset ('waterlopen') of the VHA of 7 Aug 2020 at Geopunt.

For the rest previous comments still apply:

The dataset has two layers (100m segments and endpoints). The two attribute variables of both layers are explained in the text.

I propose not to add GRTSmaster_habitats addresses inside this data source. It would unnecessarily inflate the data source, while it is only done in the context of the 3260 sampling frame. Another reason is that we had better keep processed data sources 'multi-purpose' and therefore more generic; in this case the GRTSmaster_habitats approach would not lead to unique IDs without further tricks, and spatially balanced addresses for lines can also be assigned in other ways (methods for lines exist).

Screenshot

image

Contributor

@ToonHub ToonHub left a comment

I checked both layers in watercourse_100mseg. Looks fine!

@florisvdh florisvdh merged commit 6b1d8f7 into master Jan 4, 2021
@florisvdh florisvdh deleted the watercourse_segments branch January 4, 2021 18:08