Images_to_samples: Create binary labels from multi-class label data #167
Comments
Personally I prefer the first one.
I would call the approaches dynamic/static rather than online/offline. In an ideal world, where we can fetch training samples through the web, we would no longer need to duplicate our data into HDF5 files. In that respect, I would prefer the dynamic approach (#1).
This feature was broken by PR #208. It should apply to any subset of classes, as was done with the parameter target_ids. In previous implementations, the offline/static approach was chosen. The disadvantages of this approach can be dealt with by sampling to plain .tif and .geojson files rather than HDF5 files.
A solution is proposed in my 215-solaris-tiling branch. The vector ground truth is first tiled to vector chips (as geojson) alongside the imagery. The desired attribute values for a particular attribute field are then burned to raster. The advantage of this approach is that the geojson chips contain all the initial attribute information and only need to be created once. The burning of specific values can be requested multiple times and stored as separate copies of raster chips.
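To illustrate the selection step described in the comment above, here is a minimal sketch, not the code from the 215-solaris-tiling branch. It assumes a vector chip is a GeoJSON FeatureCollection already loaded as a Python dict; the function names `select_features` and `burn_pairs` and the attribute field `class` are hypothetical.

```python
def select_features(chip, attribute_field, attribute_values):
    """Return a copy of a GeoJSON FeatureCollection (as a dict) that keeps
    only the features whose `attribute_field` is in `attribute_values`.
    The input chip is left untouched, so it can be reused for other requests."""
    wanted = set(attribute_values)
    kept = [
        feature
        for feature in chip.get("features", [])
        # "properties" may be null in valid GeoJSON, hence the `or {}`
        if (feature.get("properties") or {}).get(attribute_field) in wanted
    ]
    return {**chip, "features": kept}


def burn_pairs(chip, attribute_field):
    """Build (geometry, value) pairs from the selected features.
    Assumption: a rasterizer that accepts such pairs is used for the burn."""
    return [
        (feature["geometry"], feature["properties"][attribute_field])
        for feature in chip["features"]
    ]
```

The same chip can be filtered several times with different `attribute_values`, each result being burned to its own raster chip. The (geometry, value) pairs are the shape input that a rasterizer such as `rasterio.features.rasterize` accepts, although whether the branch uses that call is not stated here.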
This issue will be solved with the merge of the branch 222-stac-item-input and the change made at line 471 in train_segmentation.py. This change makes the ground truth be read as binary if only one class is specified at attribute_values in the dataset yaml.
Problem
To train class-specific models, GDL cannot currently create samples with single-class labels if the label data provided is multi-class.
Solution
Two approaches seem feasible to implement this missing feature. First, let's look at the necessary developments, common to both approaches, that would let the user control this feature seamlessly:
1. Online approach
Description
Dynamically zero out (i.e. set as "background") irrelevant class values during training without modifying hdf5 files.
Advantages
Disadvantages
Implementation
The get_item method of the data loader would be in charge of zeroing out irrelevant values, returning a binary label.
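The dynamic zeroing could be sketched as follows. This is an illustration under assumptions, not GDL's actual data loader: the class `BinaryLabelDataset`, the helper `binarize_label`, and the choice of mapping the classes of interest to 1 are all hypothetical, and labels are assumed to be NumPy arrays of integer class ids.

```python
import numpy as np


def binarize_label(label, target_ids, background=0, foreground=1):
    """Return a binary copy of `label`: pixels whose class id is in
    `target_ids` become `foreground`, every other pixel becomes `background`."""
    label = np.asarray(label)
    mask = np.isin(label, list(target_ids))
    return np.where(mask, foreground, background).astype(label.dtype)


class BinaryLabelDataset:
    """Hypothetical dataset wrapper: the stored multi-class labels are never
    modified, the binary label is computed each time a sample is requested."""

    def __init__(self, images, labels, target_ids):
        self.images = images
        self.labels = labels
        self.target_ids = tuple(target_ids)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        label = binarize_label(self.labels[index], self.target_ids)
        return image, label
```

Because the stored labels stay multi-class, the same samples can serve several class-specific trainings simply by changing `target_ids`.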
2. Offline approach
Description
In images_to_samples.py, once the geopackage is read and rasterized, create hdf5 samples with binary values (i.e. "class of interest" and "non class of interest"). All irrelevant class values are then zeroed out to match the value of the "non class of interest" class. Training continues as usual.
Advantages
Disadvantages
Implementation
After filtering out undesired samples (i.e. those that do not meet the class_prop threshold and the minimum annotated percent), irrelevant class values are zeroed out before the final samples are written to hdf5, in the add_to_dataset function.
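The order of operations described above (filter first, then zero out, then write) can be sketched like this. It is a simplified illustration, not the code of images_to_samples.py: only a minimum annotated percent check is shown, the class_prop check and the hdf5 write are left out, and the names `annotated_percent`, `to_binary_sample`, and `min_annot_perc` are assumptions.

```python
import numpy as np


def annotated_percent(label, background=0):
    """Percentage of pixels in `label` that are not background."""
    label = np.asarray(label)
    return 100.0 * np.count_nonzero(label != background) / label.size


def to_binary_sample(label, class_of_interest, min_annot_perc=0.0):
    """Filter on the multi-class label, then binarize it.

    Returns None when the sample is rejected, otherwise a uint8 array where
    1 marks the class of interest and 0 marks everything else."""
    label = np.asarray(label)
    if annotated_percent(label) < min_annot_perc:
        return None  # sample rejected before any value is zeroed out
    return (label == class_of_interest).astype(np.uint8)
```

One design point worth checking with this ordering: a sample can pass the filter on the multi-class label yet contain no pixel of the class of interest once binarized, so the thresholds may need to be evaluated on the binary label instead.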
Dev effort estimate
The effort needed to implement either approach ranges from 1 to 2 workdays. This development seems fairly simple at first glance.
Useful resources and articles
Buildings extraction
of Building Segmentation Masks"
Road extraction
Road Extraction in VHR Remote Sensing Images"