-
Notifications
You must be signed in to change notification settings - Fork 378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SSL4EO-S12: add new dataset/datamodule #1151
Conversation
tests/data/ssl4eo/data.py
Outdated
|
||
if path.endswith("B1.tif") or path.endswith("B9.tif"): | ||
profile["width"] = profile["height"] = SIZE // 6 | ||
elif ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's probably a better way to do this...
torchgeo/datasets/ssl4eo.py
Outdated
directory = os.path.join(self.root, self.split, f"{index // self.times:07}") | ||
subdirs = sorted(os.listdir(directory)) | ||
directory = os.path.join(directory, subdirs[index % self.times]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is somewhat hard-coded, but is much faster than storing a list of every directory like we do in SeCo.
torchgeo/datasets/ssl4eo.py
Outdated
for band in self.metadata[self.split]["bands"]: | ||
filename = os.path.join(directory, f"{band}.tif") | ||
with rasterio.open(filename) as f: | ||
image = f.read(out_shape=(1, self.size, self.size)).astype(np.float32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each Sentinel band is in its original resolution (10–60 m) and needs to be resampled to the higher resolution.
torchgeo/datasets/ssl4eo.py
Outdated
a matplotlib Figure with the rendered sample | ||
""" | ||
if self.split == "s1": | ||
# See Sentinel1.plot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wish we could call the plot methods of Sentinel1 and Sentinel2 directly so we didn't have to copy the same logic into every dataset. I bet we could figure out a way to do this...
torchgeo/datasets/ssl4eo.py
Outdated
|
||
co_polarization = torch.clamp(co_polarization / 0.3, min=0, max=1) | ||
cross_polarization = torch.clamp(cross_polarization / 0.05, min=0, max=1) | ||
ratio = torch.clamp(ratio / 25, min=0, max=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RitwikGupta this plotting trick isn't working, all I see is a black image with red specs. I know you're not the biggest fan of RGB plotting, but this dataset presumably only has one type of image processing so it should be possible to reliably plot. Not sure how to tell if the image is power/decibel/amplitude or gamma/sigma and what that actually means for plotting. Any advice? You can download the 100 patch subset if you want to test the plotting method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamjstewart The Sentinel-1 imagery they pulled from GEE is in the decibel scale. Normally, Sentinel-1 data from Copernicus is in the power scale. The ratio you want to calculate should be in the power scale. This code will work to do visualize this specific data:
cross_pol = np.exp(cross_pol / 10)
co_pol = np.exp(co_pol / 10)
ratio = cross_pol / co_pol
fig, ax = plt.subplots(1, 4, figsize=(20, 10))
ax[0].imshow(cross_pol, cmap='bone')
ax[1].imshow(co_pol, cmap='bone')
ax[2].imshow(ratio, cmap='bone')
ax[3].imshow(np.concatenate([cross_pol, co_pol, ratio], axis=2))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the relevant GEE documentation that says they convert to decibel when pulling S1 data in GEE: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S1_GRD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I thought it was supposed to be [co_pol, cross_pol, ratio], not [cross_pol, co_pol, ratio]?
- Am I not supposed to do normalization (co_pol / 0.3, cross_pol / 0.05, ratio / 25)?
- Even with divide by 10, no normalization, swap co_pol/cross_pol, the images don't look great:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which image are you using? I can test with the same one.
- I think you can keep [co_pol, cross_pol, ratio].
- Are you doing
np.exp(x / 10)
or justx /10
? - You should use the log scale values in that post for viz purposes only. It's not required, but it may help. Do min-max norm to those ranges, not just division.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, I was using x / 10
instead of np.exp(x / 10)
. However, it still looks pretty bad. I also tried 10 ** (x / 10)
. I'm just plotting the first 20 images of the 100 patch subset if you want to try to tinker with it.
In the post, it's unclear to me how one is supposed to plot db data. If I immediately convert from db to power, would I use the linear values or the db values? Is linear different from power? What does it mean for values to range from -15 to 5 (for red)? Does that mean to convert values from [-15, 5] to [0, 1]? Do I perform np.exp(x / 10)
before or after all this normalization? Do I use co / cross
or co - cross
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you want to use power values to plot? In my paper, I simply plot the dB values (99% percentile and scale each one to [0,1])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to try to make an RGB plot (see #821 (comment) for an example) because it's prettier and in theory provides more useful information. Unfortunately it's also much harder to plot. We could plot 2 grayscale plots in the worst case scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then can also plot dB for each channel? e.g. [VV,VH,VV/VH]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's what I was doing originally and it didn't work either. I just copied the plotting code from our Sentinel1
dataset, which worked fine for data downloaded from ASF DAAC, but apparently not for data downloaded from GEE.
S1 plotting is definitely still broken, but I would be okay with merging this as is and fixing S1 plotting in a follow-up PR since I need this dataset soon. I'm planning on adding a datamodule for this, but will wait until #1168 is merged so I can properly test it. |
8bdad76
to
4b0774c
Compare
You can download the 100-patch subset for testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@calebrob6 this directly contradicts the change made in #1027. Should we revert that PR too? Given that:
- All of our datasets use a different scale factor, ignoring which satellite their images come from
- All images are normalized before they are returned from the dataset or datamodule, changing the appropriate scale factor
I'm starting to think we should use one of the default enhancements from QGIS:
- Min / max
- 2% / 98%
- ± 2 std dev
There could even be an option to select which of these you want to use. We can decide and implement this later, but I want to make sure we're not contradicting ourselves and won't revert #1027 multiple times until we come up with a uniform policy. This also relates to #496.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dividing by 10k obviously didn't work though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then why did we merge #1027?
c31d3af
to
651347b
Compare
I'm guessing the images from gee and other places are different
…On Wed, Mar 29, 2023, 8:04 PM Adam J. Stewart ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
On torchgeo/datasets/ssl4eo.py
<#1151 (comment)>:
Then why did we merge #1027
<#1027>?
—
Reply to this email directly, view it on GitHub
<#1151 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAIJUTS4JTDHT6ICPWN5GU3W6RTSJANCNFSM6AAAAAAVLIDS24>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Yep. We could just plot 2 grayscale images instead of one RGB image for now and fix this in the future when someone (me) has a better understanding of SAR imagery. I'd prefer not to merge this with broken plotting, but we could open an issue and promise to fix it before 0.5.0 is released. |
Sorry all. I was out on vacation and then sick for a bit. I will take a look at this again. GEE imagery is in the decibel scale instead of the power scale most Sentinel-1 imagery is released in. This is standard for analysis and visualization. |
Any updates @RitwikGupta? We might want to just merge this PR and fix S1 plotting at a later date. |
Yep, I would vote to make the plot show something and merge |
* SSL4EO-S12: add new dataset * Style fixes * 100% coverage * fix mypy * black fixes * mypy fix * Convert from db to power * Don't cast to numpy * Remove comments referring to SeCo * SSL4EO: add extraction time * Add RandomSeasonContrast * Fix axes indexing * Add datamodule * fix tests * mypy fixes * fix missing import * Fix tests * isort fix * Typo fix * s2c: add B10 * Update test channels * S2 plotting was broken * Fix plotting * Black fix * Rename conf files * Remove file introduced by bad merge * Fix pixel size of bands * black fix * Better S1 plotting --------- Co-authored-by: Caleb Robinson <calebrob6@gmail.com>
Adds the SSL4EO-S12 dataset.
Some things the dataset doesn't currently support because I don't need them, but could potentially be added someday:
Plotting seems broken at the moment, will upload plots once that's working better.