'DataFrame' object has no attribute 'cloud_pct' ? #137

Open
yhouali opened this issue Jan 12, 2024 · 12 comments

yhouali commented Jan 12, 2024

image

Collaborator

ejm714 commented Jan 12, 2024

This is due to an unhandled error case where there is no valid satellite imagery for the provided point. Out of curiosity, did you run cyfi predict-point or cyfi predict?

This bug was fixed in a recent PR (#136), but a new release hasn't been cut yet. In the meantime, you can install the latest version from GitHub. A release including this fix will be coming shortly!

Author

yhouali commented Jan 12, 2024

I tried both: predict-point for a single point and predict for a CSV. I get the same error each time!
I don't think it is due to the satellite imagery, because I searched for a cloud-free Sentinel-2 image before choosing the date, and all my points are within the same image. Also, I don't understand why it downloads 22 images (for the CSV) and 7 (for the single point) when all the points are within the same tile and date.

Collaborator

ejm714 commented Jan 14, 2024

Multiple images are downloaded because we use a 30 day look back period; images are downloaded for the full period and then the most recent image where the bounding box around the sample contains less than 5% cloud pixels is used.
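
As a rough illustration of the selection rule described above (not CyFi's actual implementation), the logic can be sketched with pandas. The column names `item_id` and `image_date`, the function name `select_image`, the inclusive window bounds, and the use of a 0 to 1 fraction for `cloud_pct` are all assumptions made for this example.

```python
import pandas as pd


def select_image(candidates, sample_date, lookback_days=30, max_cloud_pct=0.05):
    """Return the most recent candidate inside the look-back window whose
    cloud fraction is below the threshold, or None if nothing qualifies."""
    sample_date = pd.Timestamp(sample_date)
    window_start = sample_date - pd.Timedelta(days=lookback_days)

    # Keep only images acquired within the look-back window (assumed inclusive).
    in_window = candidates[
        (candidates["image_date"] >= window_start)
        & (candidates["image_date"] <= sample_date)
    ]
    # Keep only images whose bounding box is (almost) cloud free.
    clear = in_window[in_window["cloud_pct"] < max_cloud_pct]
    if clear.empty:
        return None
    return clear.sort_values("image_date").iloc[-1]


# Hypothetical candidates for one sample dated 2023-09-07.
candidates = pd.DataFrame(
    {
        "item_id": ["a", "b", "c", "d"],
        "image_date": pd.to_datetime(
            ["2023-08-01", "2023-08-20", "2023-09-02", "2023-09-07"]
        ),
        "cloud_pct": [0.00, 0.01, 0.03, 0.40],
    }
)

best = select_image(candidates, "2023-09-07")
print(best["item_id"])  # c
```

Here image `a` falls outside the 30 day window and image `d` is too cloudy, so the most recent acceptable image is `c`. This is also why several images are fetched for a single point and date.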

Can you share the input csv and point (lat, lon, date) that you used so I can reproduce the error? If the image is indeed cloud free, it's possible that no water pixels are detected, since we filter to the water pixels in the bounding box.

Author

yhouali commented Jan 15, 2024

Yes, I can provide the points once I am back at my office laptop. Meanwhile, you said that you mask the data to keep only water, which raises a question for me: are estuaries masked as well? It can be ambiguous whether to classify them as inland or sea water (as I understand it, a land-sea mask is used, right?). Sorry for all these questions, and thank you for your responsiveness!

Author

yhouali commented Jan 15, 2024

Here are the points:
latitude,longitude,date
52.442396,5.164413,2023-09-07
52.580301,5.120468,2023-09-07
52.576963,5.396499,2023-09-07
52.647008,5.500870,2023-09-07
52.757685,5.366287,2023-09-07
52.813516,5.496226,2023-09-07
52.866785,5.179400,2023-09-07
52.994759,5.288819,2023-09-07

Also, for the code, I think it will be better to check first if the images at the given dates are cloudy or not and (if yes) then search for the closest date of a cloud free image (a cloud scoring could be done also). These are just suggestions based on your feedback. I haven't checked the source code yet, so I don't know how the code works; forgive me if I got things wrong.

Collaborator

ejm714 commented Jan 19, 2024

@yhouali I'm not able to reproduce your error. Using cyfi version 1.1.2, I saved those points in a CSV called samples.csv and ran cyfi predict samples.csv.

This is the corresponding output file:

❯ cat preds.csv
sample_id,date,latitude,longitude,density_cells_per_ml,severity
8de38837a4e61dc62bc844b920a98b1c,2023-09-07,52.442396,5.164413,5934.0,low
c1d8469faf9cb46e63e57635be811601,2023-09-07,52.580301,5.120468,6618.0,low
c30a1f7c833107a7458fe9485fb59cfb,2023-09-07,52.576963,5.396499,8261.0,low
ecd1d5814c237bb75ccfd4cac81e8adf,2023-09-07,52.647008,5.50087,7230.0,low
3de64f676ebe0ecd0c5d66018c929e3d,2023-09-07,52.757685,5.366287,55418.0,moderate
e63f17533520707d1d6c54982f3beeed,2023-09-07,52.813516,5.496226,6759.0,low
96c056ebee3c1b89433fba0428be1686,2023-09-07,52.866785,5.1794,5951.0,low
229e458c4a1d82e294674e5c6008200f,2023-09-07,52.994759,5.288819,6046.0,low

Collaborator

ejm714 commented Jan 19, 2024

To answer your other questions:

are estuaries masked as well?

We use Sentinel-2's scene classification band to identify water pixels.
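
A minimal sketch of what filtering on the scene classification layer can look like, using NumPy on toy arrays. In Sentinel-2 L2A products the SCL class value 6 denotes water; whether CyFi applies exactly this rule is an assumption, and the function name `water_pixel_values` is made up for the example.

```python
import numpy as np

SCL_WATER = 6  # Sentinel-2 L2A scene classification value for water


def water_pixel_values(band, scl):
    """Return the band values at pixels that the SCL layer labels as water."""
    return band[scl == SCL_WATER]


# Toy 3x3 bounding box: 4 = vegetation, 5 = not vegetated, 6 = water,
# 8 and 9 = cloud (medium and high probability).
scl = np.array([[4, 6, 6], [5, 6, 9], [4, 4, 8]])
band = np.arange(9).reshape(3, 3)

water = water_pixel_values(band, scl)
print(water.tolist())  # [1, 2, 4]

# A box with no pixel labelled as water yields an empty selection.
no_water = water_pixel_values(band, np.full((3, 3), 4))
print(no_water.size)  # 0
```

In this sketch the mask depends only on the per-pixel class, so a cloud-free box can still produce no usable pixels if none of them are labelled as water.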

I think it will be better to check first if the images at the given dates are cloudy or not and (if yes) then search for the closest date of a cloud free image (a cloud scoring could be done also)

Since users can specify a different max cloud threshold, we download the imagery and save a numpy array of the bounding box around the point to disk before any cloud filtering is done. Because downloaded imagery is cached, it is then easy for a user to change the max cloud percent without having to re-download any imagery. You're correct that downloading all the imagery first is a design trade-off.
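
The caching idea can be sketched as follows. This is an illustration under stated assumptions rather than CyFi's code: the fake cloud masks, the `fetch_cloud_mask` stand-in for a real download, and the cache file layout are all invented for the example.

```python
import tempfile
from pathlib import Path

import numpy as np

# Fake per-item cloud masks (1 = cloud pixel), standing in for real imagery.
FAKE_CLOUD_MASKS = {
    "item_a": np.array([[0, 0], [0, 0]]),  # 0% cloud
    "item_b": np.array([[1, 0], [0, 0]]),  # 25% cloud
    "item_c": np.array([[1, 1], [1, 0]]),  # 75% cloud
}
download_calls = 0


def fetch_cloud_mask(item_id):
    """Stand-in for an expensive remote download."""
    global download_calls
    download_calls += 1
    return FAKE_CLOUD_MASKS[item_id]


def load_cloud_mask(item_id, cache_dir):
    """Download once, then serve the array from the on-disk cache."""
    path = Path(cache_dir) / f"{item_id}.npy"
    if not path.exists():
        np.save(path, fetch_cloud_mask(item_id))
    return np.load(path)


def items_within_threshold(item_ids, cache_dir, max_cloud_fraction):
    """Apply the cloud filter after the imagery is already cached."""
    return [
        item_id
        for item_id in item_ids
        if load_cloud_mask(item_id, cache_dir).mean() <= max_cloud_fraction
    ]


with tempfile.TemporaryDirectory() as cache_dir:
    strict = items_within_threshold(FAKE_CLOUD_MASKS, cache_dir, 0.05)
    relaxed = items_within_threshold(FAKE_CLOUD_MASKS, cache_dir, 0.30)

print(strict)          # ['item_a']
print(relaxed)         # ['item_a', 'item_b']
print(download_calls)  # 3
```

The second call with a relaxed threshold triggers no new downloads, which is the benefit of filtering after caching; the cost is that cloudy images are downloaded even if they end up unused.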

Collaborator

ejm714 commented Jan 19, 2024

I'm going to close this issue since I was able to generate predictions for your points without error using the latest available release. Feel free to re-open if you continue to encounter issues.

ejm714 closed this as completed Jan 19, 2024

Author

yhouali commented Jan 24, 2024

Dear Emily, thank you very much for your responsiveness, your help, and all the information and explanations! The points you sent me are useful for now. I will try to solve the problem afterwards. Best regards,

@NickSievert

@yhouali, did you ever resolve this issue?

I'm encountering the same problem. I have cyfi v 1.1.3 installed on my machine (Windows). I've tried using both my own points/dates and the examples provided in the quick start guide, but I receive the same error that was documented previously: "AttributeError: 'DataFrame' object has no attribute 'cloud_pct'".

I have also tried both 'cyfi predict-point' and 'cyfi predict' and get the same error.

The traceback does show that I'm able to successfully load the sample points and successfully download the associated satellite imagery.

Data input (from quickstart)
latitude,longitude,date
41.424144,-73.206937,2023-06-22
36.045,-79.0919415,2023-07-01
35.884524,-78.953997,2023-08-04

Traceback:
╭───────────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────────────────────────────────────────────╮
│ C:\Users\Sieven######\projects\hab\cyfi\venv\Lib\site-packages\cyfi\cli.py:166 in predict_point │
│ │
│ 163 │ samples.to_csv(samples_path, index=False) │
│ 164 │ │
│ 165 │ pipeline = CyFiPipeline.from_disk(DEFAULT_MODEL_PATH) │
│ ❱ 166 │ pipeline.run_prediction(samples_path, preds_path=None) │
│ 167 │ │
│ 168 │ # print out user-specified lat / lon │
│ 169 │ pipeline.output_df["latitude"] = [latitude] │
│ │
│ C:\Users\Sieven######\projects\hab\cyfi\venv\Lib\site-packages\cyfi\pipeline.py:342 in run_prediction │
│ │
│ 339 │ │
│ 340 │ def run_prediction(self, predict_csv, preds_path=None, debug=False): │
│ 341 │ │ self._prep_predict_data(predict_csv, debug) │
│ ❱ 342 │ │ self._prepare_predict_features() │
│ 343 │ │ self._predict_model() │
│ 344 │ │ if preds_path is not None: │
│ 345 │ │ │ self._write_predictions(preds_path) │
│ │
│ C:\Users######\projects\hab\cyfi\venv\Lib\site-packages\cyfi\pipeline.py:295 in _prepare_predict_features │
│ │
│ 292 │ │ self.predict_samples = samples │
│ 293 │ │
│ 294 │ def _prepare_predict_features(self): │
│ ❱ 295 │ │ self.predict_features = self._prepare_features(self.predict_samples, train_split │
│ 296 │ │
│ 297 │ def _predict_model(self): │
│ 298 │ │ preds = [] │
│ │
│ C:\Users######\projects\hab\cyfi\venv\Lib\site-packages\cyfi\pipeline.py:121 in _prepare_features │
│ │
│ 118 │ │ logger.info(f"Satellite imagery saved to {self.cache_dir}") │
│ 119 │ │ │
│ 120 │ │ ## Generate features │
│ ❱ 121 │ │ selected_image_meta, features = generate_all_features( │
│ 122 │ │ │ samples, satellite_meta, self.features_config, self.cache_dir │
│ 123 │ │ ) │
│ 124 │
│ │
│ C:\Users######\projects\hab\cyfi\venv\Lib\site-packages\cyfi\data\features.py:284 in generate_all_features │
│ │
│ 281 │ │ │ non-metadata feature are included in the features dataframe │
│ 282 │ """ │
│ 283 │ # Generate satellite features, only includes samples with imagery │
│ ❱ 284 │ satellite_features = calculate_satellite_features(satellite_meta, config, cache_dir) │
│ 285 │ │
│ 286 │ ct_with_satellite = satellite_features.index.nunique() │
│ 287 │ if ct_with_satellite == 0: │
│ │
│ C:\Users######\projects\hab\cyfi\venv\Lib\site-packages\cyfi\data\features.py:65 in calculate_satellite_features │
│ │
│ 62 │ # Drop rows where bounding box contained too many clouds │
│ 63 │ if config.max_cloud_percent is not None: │
│ 64 │ │ logger.info( │
│ ❱ 65 │ │ │ f"Dropping {(satellite_features.cloud_pct > config.max_cloud_percent).sum(): │
│ 66 │ │ ) │
│ 67 │ │ satellite_features = satellite_features[ │
│ 68 │ │ │ satellite_features.cloud_pct <= config.max_cloud_percent │
│ │
│ C:\Users######\projects\hab\cyfi\venv\Lib\site-packages\pandas\core\generic.py:6299 in __getattr__
│ │
│ 6296 │ │ │ and self._info_axis._can_hold_identifiers_and_holds_name(name) │
│ 6297 │ │ ): │
│ 6298 │ │ │ return self[name] │
│ ❱ 6299 │ │ return object.__getattribute__(self, name) │
│ 6300 │ │
│ 6301 │ @final
│ 6302 │ def __setattr__(self, name: str, value) -> None: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'DataFrame' object has no attribute 'cloud_pct'
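
For readers hitting the same message: the error appears whenever attribute-style access is used on a DataFrame that has no `cloud_pct` column, for example because no imagery features were produced. The snippet below reproduces the message with an empty frame and shows one possible guard; `filter_by_cloud` is a hypothetical helper written for this illustration, not part of cyfi.

```python
import pandas as pd

# An empty frame stands in for "no satellite features were generated".
satellite_features = pd.DataFrame()

try:
    satellite_features.cloud_pct
except AttributeError as err:
    message = str(err)

print(message)  # 'DataFrame' object has no attribute 'cloud_pct'


def filter_by_cloud(features, max_cloud_percent):
    """Hypothetical guard: fail with a clearer message if the column is absent."""
    if "cloud_pct" not in features.columns:
        raise ValueError(
            "No satellite features were generated; check the imagery download step."
        )
    return features[features["cloud_pct"] <= max_cloud_percent]
```

Checking for the column first turns an opaque `AttributeError` into a message that points at the likely cause.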

@klwetstone
Collaborator

@NickSievert Can you share a few more details of your setup so we can reproduce it? We have not yet been able to with cyfi 1.3.3 and python 3.10 or 3.12.

Which version of python are you using? How did you install cyfi (pip vs. conda-forge)?

ejm714 reopened this Oct 17, 2024

@NickSievert

Thank you so much for your prompt response. The issue appears to be a network security issue on my end rather than a problem with your package. I will follow up to confirm the resolution via a network security solution for others' reference, but I believe this issue can be closed.

I am using Python 3.12.
I installed cyfi via pip.

When attempting to download the satellite imagery via a manual script, I get the following error:
"rasterio._err.CPLE_HttpResponseError: CURL error: schannel: CertGetCertificateChain trust error CERT_TRUST_IS_PARTIAL_CHAIN"

I will work with my IT program to address this network issue.

I initially overlooked the satellite download step: although the log indicated that errors were raised during the download, it concluded with the message "2024-10-17 14:49:41.888 | SUCCESS | cyfi.pipeline:_prepare_features:117 - Downloaded satellite imagery".

Debug log from the satellite download:
PS C:\Users#########> cyfi predict-point --lat 41.2 --lon -73.2 --date 2023-09-14
2024-10-17 14:49:29.690 | SUCCESS | cyfi.pipeline:_prep_predict_data:288 - Loaded 1 sample points (unique combinations of date, latitude, and longitude) for prediction
2024-10-17 14:49:34.917 | PROGRESS | cyfi.data.satellite_data:download_satellite_data:438 - Downloading satellite imagery for 7 Sentinel-2 items.
0%| | 0/7 [00:00<?, ?it/s]2024-10-17 14:49:40.561 | DEBUG | cyfi.data.satellite_data:download_row:414 - rasterio.errors.RasterioIOError raised for sample ID c0847b54bba6a81b25c3b12ea8bee5e3, Sentinel-2 item ID S2B_MSIL2A_20230914T153909_R011_T18TXL_20230915T012705
2024-10-17 14:49:40.600 | DEBUG | cyfi.data.satellite_data:download_row:414 - rasterio.errors.RasterioIOError raised for sample ID c0847b54bba6a81b25c3b12ea8bee5e3, Sentinel-2 item ID S2A_MSIL2A_20230909T153821_R011_T18TXL_20230910T002731
2024-10-17 14:49:40.609 | DEBUG | cyfi.data.satellite_data:download_row:414 - rasterio.errors.RasterioIOError raised for sample ID c0847b54bba6a81b25c3b12ea8bee5e3, Sentinel-2 item ID S2B_MSIL2A_20230815T153819_R011_T18TXL_20230815T233300
2024-10-17 14:49:40.734 | DEBUG | cyfi.data.satellite_data:download_row:414 - rasterio.errors.RasterioIOError raised for sample ID c0847b54bba6a81b25c3b12ea8bee5e3, Sentinel-2 item ID S2B_MSIL2A_20230825T153819_R011_T18TXL_20230825T221503
2024-10-17 14:49:40.734 | DEBUG | cyfi.data.satellite_data:download_row:414 - rasterio.errors.RasterioIOError raised for sample ID c0847b54bba6a81b25c3b12ea8bee5e3, Sentinel-2 item ID S2A_MSIL2A_20230830T153821_R011_T18TXL_20230831T022152
2024-10-17 14:49:40.741 | DEBUG | cyfi.data.satellite_data:download_row:414 - rasterio.errors.RasterioIOError raised for sample ID c0847b54bba6a81b25c3b12ea8bee5e3, Sentinel-2 item ID S2B_MSIL2A_20230904T153819_R011_T18TXL_20230904T223211
2024-10-17 14:49:41.509 | DEBUG | cyfi.data.satellite_data:download_row:414 - rasterio.errors.RasterioIOError raised for sample ID c0847b54bba6a81b25c3b12ea8bee5e3, Sentinel-2 item ID S2A_MSIL2A_20230820T153821_R011_T18TXL_20230821T000845
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:06<00:00, 1.08it/s]
2024-10-17 14:49:41.888 | SUCCESS | cyfi.pipeline:_prepare_features:117 - Downloaded satellite imagery
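
The log above reports SUCCESS even though all 7 items raised an error. One way to make such a situation visible is to tally per-item outcomes and fail when nothing was downloaded. The sketch below is a suggestion only; `summarize_downloads` and the shape of `results` are hypothetical and do not reflect how cyfi tracks downloads.

```python
def summarize_downloads(results):
    """Count outcomes; `results` maps item ID to None (success) or an exception.

    Raises RuntimeError when every download failed, so the run cannot end
    with a misleading success message.
    """
    failures = {item_id: err for item_id, err in results.items() if err is not None}
    n_ok = len(results) - len(failures)
    if results and n_ok == 0:
        first_error = next(iter(failures.values()))
        raise RuntimeError(
            f"All {len(results)} imagery downloads failed; first error: {first_error!r}"
        )
    return n_ok, len(failures)


# OSError stands in for rasterio.errors.RasterioIOError in this example.
partial = {"item_1": None, "item_2": OSError("read failed")}
print(summarize_downloads(partial))  # (1, 1)
```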

Manual Sentinel-2 tile download script and traceback:
import rasterio
from rasterio import windows
from rasterio import features
from rasterio import warp

import numpy as np
from PIL import Image

# asset_href and area_of_interest are defined earlier in the script (not shown here)
with rasterio.open(asset_href) as ds:
    aoi_bounds = features.bounds(area_of_interest)
    warped_aoi_bounds = warp.transform_bounds("epsg:4326", ds.crs, *aoi_bounds)
    aoi_window = windows.from_bounds(transform=ds.transform, *warped_aoi_bounds)
    band_data = ds.read(window=aoi_window)

Traceback (most recent call last):
File "rasterio\_base.pyx", line 310, in rasterio._base.DatasetBase.__init__
File "rasterio\_base.pyx", line 221, in rasterio._base.open_dataset
File "rasterio\_err.pyx", line 359, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_HttpResponseError: CURL error: schannel: CertGetCertificateChain trust error CERT_TRUST_IS_PARTIAL_CHAIN
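
A quick way to check whether certificate verification fails from a given machine, independent of rasterio, is a verified TLS handshake with Python's standard library. This is only a rough diagnostic: Python uses OpenSSL while the error above comes from schannel, so the two may not agree. The function name `check_tls_handshake` is made up for this sketch, and the host to test is whatever host appears in the failing asset URL.

```python
import socket
import ssl


def check_tls_handshake(host, port=443, timeout=5.0):
    """Attempt a verified TLS handshake and return (ok, detail)."""
    context = ssl.create_default_context()  # verifies the chain and the host name
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version()
    except ssl.SSLCertVerificationError as err:
        # Typical when a proxy or firewall re-signs traffic with an untrusted CA.
        return False, f"certificate verification failed: {err.verify_message}"
    except OSError as err:
        # Covers DNS failures, refused connections, timeouts, other SSL errors.
        return False, f"{type(err).__name__}: {err}"
```

A result of `(False, "certificate verification failed: ...")` would be consistent with a network device intercepting HTTPS traffic, which is a matter for the local IT team rather than the package.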
