Training preparation #2

Merged
merged 14 commits into from
Jan 14, 2025
8 changes: 4 additions & 4 deletions .flake8
Original file line number Diff line number Diff line change
@@ -9,8 +9,8 @@ exclude = .git
max-line-length = 87

per-file-ignores =
./m.neural_network.preparedata/m.neural_network.preparedata.py: F821
./m.neural_network.preparedata/m.neural_network.preparedata.py: E501
./m.neural_network.preparedata.worker_nullcells/m.neural_network.preparedata.worker_nullcells.py: F821
./m.neural_network.preparedata.worker_nullcells/m.neural_network.preparedata.worker_nullcells.py: E501
./m.neural_network.preparedata/m.neural_network.preparedata.py: E501,F821
./m.neural_network.preparedata.worker_nullcells/m.neural_network.preparedata.worker_nullcells.py: F821,E501
./m.neural_network.preparedata.worker_export/m.neural_network.preparedata.worker_export.py: E501
./m.neural_network.preparetraining/m.neural_network.preparetraining.py: E501,F821
./m.neural_network.preparetraining.worker/m.neural_network.preparetraining.worker.py: E501,F821
4 changes: 3 additions & 1 deletion m.neural_network.html
Original file line number Diff line number Diff line change
@@ -30,7 +30,8 @@ <h2>DESCRIPTION</h2>
<li><a href="m.neural_network.preparedata.html">m.neural_network.preparedata</a>: Prepares and exports tiles for the label process</li>
<li><a href="m.neural_network.preparedata.worker_export.html">m.neural_network.preparedata.worker_export</a>: Worker for parallel processing for exporting for <b>m.neural_network.preparedata</b></li>
<li><a href="m.neural_network.preparedata.worker_nullcells.html">m.neural_network.preparedata.worker_nullcells</a>: Worker to analyse the number of null cells in parallel for <b>m.neural_network.preparedata</b></li>
<li><a href="m.neural_network.preparetraining.html">m.neural_network.preparetraining</a>: Takes and reformats labeled tiles for the neural network training</li>
<li><a href="m.neural_network.preparetraining.html">m.neural_network.preparetraining</a>: Prepares imagery and labelled data for training and application of a neural network.</li>
<li><a href="m.neural_network.preparetraining.worker.html">m.neural_network.preparetraining.worker</a>: Worker to rasterize labelled data in parallel for <b>m.neural_network.preparetraining</b></li>
</ul>

<h2>REQUIREMENTS</h2>
@@ -39,6 +40,7 @@ <h2>REQUIREMENTS</h2>

<ul>
<li>grass-gis-helpers>=2.2.0</li>
<li>GDAL/OGR and Python bindings</li>
</ul>

<h2>AUTHORS</h2>
7 changes: 7 additions & 0 deletions m.neural_network.preparetraining.worker/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
MODULE_TOPDIR = ../..

PGM = m.neural_network.preparetraining.worker

include $(MODULE_TOPDIR)/include/Make/Script.make

default: script
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
<h2>DESCRIPTION</h2>

<em>m.neural_network.preparetraining.worker</em> is used within <em>m.neural_network.preparetraining</em> to rasterize label data in parallel.
<p>
<h2>SEE ALSO</h2>

<em>
<a href="g.region.html">g.region</a>,
<a href="r.mapcalc.html">r.mapcalc</a>,
<a href="v.to.rast.html">v.to.rast</a>
</em>

<h2>AUTHORS</h2>
<p>Guido Riembauer, <a href="https://www.mundialis.de/">mundialis GmbH &amp; Co. KG</a><br>
Original file line number Diff line number Diff line change
@@ -0,0 +1,221 @@
#!/usr/bin/env python3
"""############################################################################
#
# MODULE: m.neural_network.preparetraining.worker
# AUTHOR(S): Guido Riembauer
# PURPOSE: Worker module for m.neural_network.preparetraining to check
# and rasterize label data
# COPYRIGHT: (C) 2024 by mundialis GmbH & Co. KG and the GRASS Development
# Team.
#
# This program is free software under the GNU General Public
# License (v3). Read the file COPYING that comes with GRASS
# for details.
#
##############################################################################
"""

# %Module
# % description: Worker module for m.neural_network.preparetraining to check and rasterize label data
# % keyword: raster
# % keyword: statistics
# %end

# %option G_OPT_F_INPUT
# % required: yes
# % multiple: no
# % label: Path to the label vector file
# % guisection: Input
# %end

# %option G_OPT_F_INPUT
# % key: img_path
# % required: yes
# % multiple: no
# % label: Path to the corresponding imagery raster file
# % guisection: Input
# %end

# %option
# % key: class_column
# % type: string
# % required: yes
# % multiple: no
# % answer: class_number
# % label: Column of the label vector that holds the class number
# % guisection: Parameters
# %end

# %option
# % key: class_values
# % type: integer
# % required: yes
# % multiple: yes
# % answer: 2
# % label: Expected and output values for the class/es of interest
# % guisection: Parameters
# %end

# %option
# % key: no_class_value
# % type: integer
# % required: yes
# % multiple: no
# % answer: 1
# % label: Expected and output value for areas that do not belong to a class of interest
# % description: Can be understood as a "rest" class for a multiclass system and a "no-class" for a binary classification
# % guisection: Parameters
# %end

# %option G_OPT_F_OUTPUT
# % required: yes
# % multiple: no
# % label: Path to the output label raster file
# % guisection: Output
# %end

# %option
# % key: new_mapset
# % type: string
# % required: yes
# % multiple: no
# % label: Name of the new mapset to work in
# % guisection: Parameters
# %end

import atexit
import os
import shutil

import grass.script as grass
from grass_gis_helpers.mapset import switch_to_new_mapset
from osgeo import gdal

NEWGISRC = None
GISRC = None
ID = grass.tempname(8)
NEW_MAPSET = None


def cleanup():
    """Switch back to the original mapset and delete the new one."""
    # switch back to original mapset
    grass.utils.try_remove(NEWGISRC)
    os.environ["GISRC"] = GISRC
    # delete the new mapset (better safe than sorry)
    gisenv = grass.gisenv()
    gisdbase = gisenv["GISDBASE"]
    location = gisenv["LOCATION_NAME"]
    mapset_dir = os.path.join(gisdbase, location, NEW_MAPSET)
    if os.path.isdir(mapset_dir):
        shutil.rmtree(mapset_dir)


def main():
    """Run label rasterization."""
    global NEWGISRC, GISRC, NEW_MAPSET
    input = options["input"]
    img_file = options["img_path"]
    NEW_MAPSET = options["new_mapset"]
    class_values = options["class_values"].split(",")
    no_class_value = options["no_class_value"]
    class_col = options["class_column"]
    output = options["output"]

    # switch to the new mapset
    GISRC, NEWGISRC, old_mapset = switch_to_new_mapset(NEW_MAPSET)
    # get extent from reference img file
    info = gdal.Info(img_file, format="json")
    south = info["cornerCoordinates"]["lowerLeft"][1]
    west = info["cornerCoordinates"]["lowerLeft"][0]
    north = info["cornerCoordinates"]["upperRight"][1]
    east = info["cornerCoordinates"]["upperRight"][0]
    cols, rows = info["size"]
    # set the region
    grass.run_command(
        "g.region",
        n=north,
        s=south,
        e=east,
        w=west,
        rows=rows,
        cols=cols,
        quiet=True,
    )

    # import the label dataset
    labelvect = f"labelvect_{ID}"
    labelrast = f"labelrast_{ID}"
    grass.run_command("v.import", input=input, output=labelvect, quiet=True)

    # check the values of the vector
    dbselect = list(grass.parse_command("v.db.select", map=labelvect).keys())
    colnames = dbselect[0].split("|")
    rows = [item.split("|") for item in dbselect[1:]]
    try:
        idx = colnames.index(class_col)
    except ValueError:
        grass.fatal(_(f"File {input} has no column {class_col}"))
    class_numbers = [item[idx] for item in rows]
    class_num_set_ref = set([*class_values, no_class_value])
    difference = set(class_numbers).difference(class_num_set_ref)
    if len(difference) > 0:
        grass.fatal(
            _(
                f"Label file {input} has features with unexpected values"
                f" in column {class_col}: {difference}. Allowed values "
                f"are [{','.join(class_values)}, {no_class_value}].",
            ),
        )

    tile_empty = False
    if len(class_numbers) == 0 or set(class_numbers) == set([no_class_value]):
        grass.warning(
            _(
                f"Label file {input} contains no features with the "
                f"expected class values {class_values} in "
                f"column {class_col}. It is assumed that the classes "
                "do not occur in this tile.",
            ),
        )
        tile_empty = True

    # rasterize
    if tile_empty is True:
        grass.run_command(
            "r.mapcalc",
            expression=f"{labelrast}={no_class_value}",
            quiet=True,
        )
    else:
        labelrast_tmp = f"{labelrast}_tmp"
        grass.run_command(
            "v.to.rast",
            input=labelvect,
            output=labelrast_tmp,
            type="area",
            use="attr",
            attribute_column=class_col,
            quiet=True,
        )
        # if there is any nodata left in the label, this will be assigned
        # to the no-class class
        exp = f"{labelrast}=if(isnull({labelrast_tmp}),{no_class_value},{labelrast_tmp})"
        grass.run_command("r.mapcalc", expression=exp, quiet=True)

    grass.run_command(
        "r.out.gdal",
        input=labelrast,
        output=output,
        type="Byte",
        createopt="COMPRESS=LZW",  # no tiles or overviews required for the small tiles (?)
        flags="c",
        quiet=True,
    )


if __name__ == "__main__":
    options, flags = grass.parser()
    atexit.register(cleanup)
    main()
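The extent handling in the worker's `main()` can be illustrated without GDAL installed. The sketch below takes a dictionary shaped like the output of `gdal.Info(..., format="json")` and returns the arguments passed to `g.region`. The helper name `region_from_gdalinfo` and the sample values are assumptions made for this illustration; like the worker, it assumes a north-up raster without rotation, since only then do the lower-left and upper-right corners describe the full bounding box.

```python
def region_from_gdalinfo(info):
    """Derive g.region arguments from a gdal.Info(..., format="json") style dict.

    Hypothetical helper for illustration; assumes a north-up, unrotated raster.
    """
    corners = info["cornerCoordinates"]
    west, south = corners["lowerLeft"]
    east, north = corners["upperRight"]
    # "size" holds [width, height], i.e. columns first, then rows
    cols, rows = info["size"]
    return {"n": north, "s": south, "e": east, "w": west, "rows": rows, "cols": cols}


if __name__ == "__main__":
    # Made-up sample values, not taken from a real dataset
    sample = {
        "cornerCoordinates": {
            "lowerLeft": [100.0, 200.0],
            "upperRight": [150.0, 260.0],
        },
        "size": [500, 600],
    }
    print(region_from_gdalinfo(sample))
```

Keeping this logic in a small pure function would make it testable independently of a GRASS session, although the worker as submitted keeps it inline.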
7 changes: 7 additions & 0 deletions m.neural_network.preparetraining/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
MODULE_TOPDIR = ../..

PGM = m.neural_network.preparetraining

include $(MODULE_TOPDIR)/include/Make/Script.make

default: script
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
<h2>DESCRIPTION</h2>

<em>m.neural_network.preparetraining</em> prepares imagery and labelled data for training and application of a neural network.
<p>While <a href="m.neural_network.preparedata.html">m.neural_network.preparedata</a> initially provides a setup for labelling tiles of imagery,
<em>m.neural_network.preparetraining</em> rasterizes the vector labels and restructures the imagery data.

<h2>NOTES</h2>
All data are expected to follow the directory structure and naming format created by <a href="m.neural_network.preparedata.html">m.neural_network.preparedata</a>.
These data are passed to <em>m.neural_network.preparetraining</em> via the <em>input_traindir</em> and <em>input_applydir</em> parameters.
<em>m.neural_network.preparetraining</em> creates a new output directory with two subdirectories, <em>train</em> and <em>apply</em>. Each of these contains
the following directories/data:

<ul>
<li><em>train_images</em>: contains tilewise multiband .vrt files with all imagery bands and an nDSM band to be used for training. This directory is empty in the <em>apply</em> dir.</li>
<li><em>train_masks</em>: contains tilewise rasterized .tif label files to be used for training. This directory is empty in the <em>apply</em> dir.</li>
<li><em>val_images</em>: contains tilewise multiband .vrt files with all imagery bands and an nDSM band to be used for validation. This directory holds data in both the <em>train</em> and <em>apply</em> dirs. In the <em>train</em> dir, the data is used for validation during training, while in the <em>apply</em> dir, it holds all imagery used for prediction.</li>
<li><em>val_masks</em>: contains tilewise rasterized .tif label files to be used for validation. This directory is empty in the <em>apply</em> dir.</li>
<li><em>singleband_vrts</em>: contains singleband .vrt files for each imagery band of each tile. They are stored here as a basis for creating the tilewise multiband .vrt files.</li>
<li><em>tile_XX_YY.vrt</em> (only in the <em>train</em> dir): one multiband tile .vrt is stored here so that the NN model can read the number of bands.</li>
</ul>
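The layout listed above can be sketched with the Python standard library. This is only an illustration of the documented directory structure, not the module's actual implementation; the helper name `create_output_layout` and the temporary path are assumptions made for the example.

```python
import tempfile
from pathlib import Path

# Subdirectory names as listed in this manual page
SUBDIRS = (
    "train_images",
    "train_masks",
    "val_images",
    "val_masks",
    "singleband_vrts",
)


def create_output_layout(output_dir):
    """Create <output>/train and <output>/apply with the documented subdirs.

    Hypothetical helper; raises FileExistsError if the output directory
    already exists, mirroring the requirement stated in the notes.
    """
    output = Path(output_dir)
    output.mkdir(parents=True, exist_ok=False)
    created = []
    for top in ("train", "apply"):
        for sub in SUBDIRS:
            path = output / top / sub
            path.mkdir(parents=True)
            created.append(path)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_output_layout(Path(tmp) / "nn_data_structured"):
            print(path.relative_to(tmp))
```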
<p>
To save disk space, all imagery is stored as .vrt files that reference the original datasets (created by <a href="m.neural_network.preparedata.html">m.neural_network.preparedata</a>). These datasets should
therefore not be moved; if they are, <em>m.neural_network.preparetraining</em> has to be run again afterwards.
</p>
<p>
The user can indicate what percentage of the training tiles are used for validation (during training) with the <em>val_percentage</em> parameter.
</p>
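How a percentage-based split might look is sketched here. The manual does not state how tiles are selected for validation, so the seeded random shuffle, the rounding rule and the name `split_train_val` are all assumptions for illustration only.

```python
import random


def split_train_val(tiles, val_percentage, seed=0):
    """Split tile names into (train, val) lists.

    Hypothetical helper: roughly val_percentage percent of the tiles end up
    in the validation list.
    """
    if not 0 <= val_percentage <= 100:
        raise ValueError("val_percentage must be between 0 and 100")
    shuffled = sorted(tiles)
    # Assumption: a seeded shuffle; the module may choose tiles differently
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_percentage / 100)
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    tiles = [f"tile_{x:02d}_{y:02d}" for x in range(5) for y in range(4)]
    train, val = split_train_val(tiles, val_percentage=20)
    print(len(train), len(val))
```

Because any such split is only decided when the module runs, a second run could assign tiles differently, which is consistent with the requirement that the <em>output</em> directory must not exist beforehand.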
<p>
<em>m.neural_network.preparetraining</em> cannot be run repeatedly with the same <em>output</em> directory, because the training/validation split is made at runtime.
The module therefore expects that the <em>output</em> directory does not exist yet.
</p>
<p>
With the <em>class_values</em> and <em>no_class_value</em> parameters, the user defines the allowed values in the <em>class_column</em> of the labelled data. If
an unexpected value is found, an error is raised that names the affected tile.
</p>
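The value check can be expressed as a plain set comparison. The sketch below mirrors the logic of the worker module in this pull request but runs without GRASS; the function name `check_class_values` and the use of `ValueError` instead of `grass.fatal` are assumptions for illustration. Values are compared as strings, as they are in the worker, where both the attribute table and the module options yield strings.

```python
def check_class_values(found_values, class_values, no_class_value):
    """Validate class values found in a label file.

    Hypothetical helper. Returns True if the tile is "empty" (no features,
    or only the no-class value) and raises ValueError on unexpected values.
    """
    allowed = {*class_values, no_class_value}
    unexpected = set(found_values) - allowed
    if unexpected:
        raise ValueError(
            f"Unexpected class values {sorted(unexpected)}; "
            f"allowed values are {sorted(allowed)}"
        )
    # Empty tile: nothing was labelled with a class of interest
    return not found_values or set(found_values) == {no_class_value}


if __name__ == "__main__":
    # Tile with labelled features of class "2"
    print(check_class_values(["2", "1", "2"], ["2"], "1"))
    # Tile that only contains the no-class value
    print(check_class_values(["1"], ["2"], "1"))
```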
<p>
If a tile is not completely covered by features with either <em>class_values</em> or <em>no_class_value</em>, the unallocated areas are filled with <em>no_class_value</em> in the rasterized version.
</p>
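The fill rule corresponds to the <em>r.mapcalc</em> expression `if(isnull(x), no_class_value, x)` used by the worker. A small pure-Python sketch of the same rule follows, with `None` standing in for raster null cells; the function name `fill_nodata` and the nested-list raster are assumptions made for the example.

```python
def fill_nodata(label_rows, no_class_value):
    """Replace None cells (standing in for raster nulls) with no_class_value.

    Hypothetical helper operating on a raster given as a list of rows.
    """
    return [
        [no_class_value if cell is None else cell for cell in row]
        for row in label_rows
    ]


if __name__ == "__main__":
    tile = [
        [2, 2, None],
        [None, 1, 1],
    ]
    print(fill_nodata(tile, no_class_value=1))
```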

<h2>EXAMPLES</h2>

<div class="code"><pre>
m.neural_network.preparetraining input_traindir=nn_data_with_labels/train input_applydir=nn_data_with_labels/apply nprocs=6 class_column=class_number class_values=2 no_class_value=1 output=nn_data_structured
</pre></div>


<h2>SEE ALSO</h2>

<em>
<a href="https://grass.osgeo.org/grass-stable/manuals/v.import.html">v.import</a>,
<a href="https://grass.osgeo.org/grass-stable/manuals/g.region.html">g.region</a>,
<a href="https://grass.osgeo.org/grass-stable/manuals/r.mapcalc.html">r.mapcalc</a>,
<a href="https://grass.osgeo.org/grass-stable/manuals/v.to.rast.html">v.to.rast</a>
</em>

<h2>REQUIREMENTS</h2>
<ul>
<li>GDAL and OGR Python bindings</li>
<li><a href="https://pypi.org/project/grass-gis-helpers/">grass-gis-helpers</a> Python library >= 2.2.0</li>
</ul>

<h2>AUTHORS</h2>
Guido Riembauer, <a href="https://www.mundialis.de/">mundialis GmbH &amp; Co. KG</a><br>