Replies: 6 comments
-
Hi @alronlam
-
Hi @alronlam, the Ookla dataset 'indonesia-ookla-2020-q1-fixed.csv' is missing as well.
-
Oh hi @butchtm, the link to the GDrive folder for these files is in the top-most part of the notebook.
-
Additional detail: I think I also ran into this RAM issue when aligning with the raw Ookla dataset (https://registry.opendata.aws/speedtest-global-performance/) while trying to use the latest fixed-line data from Ookla.
My workaround was to use an older, filtered version of the data that covered Indonesia only (the raw data covers the whole world). So one principle here is that we should always filter the feature datasets as much as we can before aligning them to the AOIs, to avoid such issues. In the case of HRSL, though, the data is already for Indonesia alone, so I'm not sure what else we can do to make it work for such big datasets (some kind of parallel processing?). Or in these cases, are we forced to use other tools like BQ?
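The "filter before aligning" principle can be sketched with plain pandas: stream a large point CSV in chunks and keep only rows inside the AOI's bounding box before any spatial join is attempted, so the full global dataset never sits in RAM at once. The column names (`latitude`, `longitude`, `avg_d_kbps`) and the Indonesia bounding box below are illustrative assumptions, not the real Ookla schema:

```python
import io
import pandas as pd

# Stand-in for the global Ookla CSV (assumption: the real file exposes
# tile centroids with similar lat/lon columns).
raw_csv = io.StringIO(
    "latitude,longitude,avg_d_kbps\n"
    "-6.2,106.8,25000\n"   # Jakarta  -> inside Indonesia bbox
    "40.7,-74.0,90000\n"   # New York -> outside
    "-8.65,115.2,18000\n"  # Bali     -> inside
)

# Rough Indonesia bounding box (illustrative values).
MIN_LAT, MAX_LAT = -11.0, 6.1
MIN_LON, MAX_LON = 95.0, 141.0

# Stream the file in chunks so the whole CSV never loads into memory.
filtered_chunks = []
for chunk in pd.read_csv(raw_csv, chunksize=1):
    mask = (
        chunk["latitude"].between(MIN_LAT, MAX_LAT)
        & chunk["longitude"].between(MIN_LON, MAX_LON)
    )
    filtered_chunks.append(chunk[mask])

indonesia_only = pd.concat(filtered_chunks, ignore_index=True)
print(len(indonesia_only))  # only the 2 Indonesian rows survive
```

Only the filtered subset then needs to be aligned to the AOIs, which is what the older Indonesia-only Ookla file effectively did ahead of time.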
-
Hi @alronlam, I'm trying to see if I can just convert the HRSL data (a 1.8 GB CSV file) to a GeoJSON file and load it as such, but even that crashes Colab. Colab might not be ideal for working with production-sized datasets; it's better suited for learning/exploring the modules.
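One way to avoid crashing on the conversion step itself is to stream it: read the CSV in chunks and write GeoJSON features out incrementally, instead of materializing the whole GeoDataFrame in memory first. A minimal sketch, assuming the HRSL CSV has `latitude`/`longitude`/`population` columns (the real schema may differ):

```python
import io
import json

import pandas as pd

# Stand-in for the 1.8 GB HRSL CSV (column names are assumptions).
csv_data = io.StringIO(
    "latitude,longitude,population\n"
    "-6.2,106.8,1200\n"
    "-7.8,110.4,800\n"
)

# Stand-in for open("hrsl.geojson", "w"); features are appended one at a
# time, so peak memory stays at roughly one chunk regardless of file size.
out = io.StringIO()
out.write('{"type": "FeatureCollection", "features": [\n')
first = True
for chunk in pd.read_csv(csv_data, chunksize=1):
    for row in chunk.itertuples(index=False):
        feature = {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # Cast numpy scalars to plain Python types for json.dumps.
                "coordinates": [float(row.longitude), float(row.latitude)],
            },
            "properties": {"population": int(row.population)},
        }
        if not first:
            out.write(",\n")
        out.write(json.dumps(feature))
        first = False
out.write("\n]}\n")

geojson = json.loads(out.getvalue())
print(len(geojson["features"]))  # 2
```

The trade-off is that the resulting GeoJSON still has to be loaded by whatever consumes it, so this only moves the memory problem unless the downstream step can also read lazily.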
-
Low prio; converting this into a discussion.
-
Colab notebook for testing:
https://colab.research.google.com/drive/147HWUgaBztsZuBPrI_HTckBrz_vl9l1l#scrollTo=wvLenjgDUgod
Scenario
Error
![gw_vzs_hrsl](https://user-images.githubusercontent.com/1049495/177068371-6908fbc4-7061-440f-8a25-4c99e45696ec.PNG)
Colab crashes due to exceeding the RAM limit.
Just creating this issue to check if there are straightforward ways to optimize. Otherwise, are there workarounds for handling such vector datasets that are relatively large?
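One generic workaround when the full alignment blows past RAM is batching: process the AOIs in slices so only one slice's join result is in memory at a time, then concatenate the per-batch aggregates. The sketch below is schematic, with a plain `merge` standing in for the real spatial join and toy tables standing in for the AOIs and HRSL features:

```python
import pandas as pd

# Toy stand-ins: AOIs and a large feature table keyed by a shared zone id.
aois = pd.DataFrame({"zone": range(10),
                     "name": [f"aoi_{i}" for i in range(10)]})
features = pd.DataFrame({"zone": [i % 10 for i in range(100)],
                         "value": range(100)})

BATCH = 3
results = []
for start in range(0, len(aois), BATCH):
    batch = aois.iloc[start:start + BATCH]
    # Stand-in for the real AOI alignment / spatial join, run on this
    # batch only so the intermediate join never covers all AOIs at once.
    joined = batch.merge(features, on="zone", how="left")
    results.append(joined.groupby("zone", as_index=False)["value"].sum())

aligned = pd.concat(results, ignore_index=True)
print(len(aligned))  # one aggregated row per AOI zone
```

The same loop structure also parallelizes naturally (one batch per worker), which may be one answer to the "some kind of parallel processing?" question above, at the cost of re-reading or re-filtering the feature data per batch.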