With the climate changing rapidly and humans having an increasing impact on global ecosystems, it is no longer sufficient to classify and analyse ecosystems based only on their geographical location and historical data. Here, we classify ecosystems based on their wider geoecological parameters. The goal is to create a classifier that allows us to classify ecosystems dynamically. In the future, such a classifier could be used to predict, e.g., expected species richness, species extinction rates, soil status, soil deterioration and, eventually, ecosystem collapse.
Minimum viable product (MVP): On EcoVerse we show the development of ecosystems over time.
Goal: Show trends in ecosystem development and migration over time.
Validation: Highlight renaturation efforts of the past decade, such as the Great Green Wall Initiative, the Chinese Loess Plateau Rehabilitation and the Netherlands’ Marker Wadden.
This project was designed and pursued as part of the Data Analytics Consulting Bootcamp 2024-2 by neuefische GmbH. In the bootcamp, a capstone project is designed for a four-week timeframe that covers setup, data retrieval, analysis and integration.
Collaborators:
Member | Role |
---|---|
Heiko Främbs | Communications, project management |
Soma Pasumarthy | Web integration, database maintenance |
Alexander Schmidt | Data acquisition, data handling |
Noah Kürtös | Conceptualization, scripts |
About the repository: Collaborators work in independent branches. Merges into main must be approved by at least one other collaborator. Team communication runs via this Miro board.
| Name | Description | Content | Dataset / access | Notes |
|---|---|---|---|---|
| EarthData | Library of accumulated satellite data | Radiation-based climate data (detailed here) | GLDAS Noah Land Surface Model L4 monthly 0.25 x 0.25 degree V2.1 (GLDAS_NOAH025_M) | From 2000-2024 |
| | | | GLDAS Noah Land Surface Model L4 monthly 0.25 x 0.25 degree V2.0 (GLDAS_NOAH025_M) | From 1948-2014 |
| | | Elevation data (relative) | ASTER Advanced Spaceborne Thermal Emission and Reflection Radiometer | From 2009, as GeoTIFF |
| | | Night-time illumination data | VNP46A1 - VIIRS/NPP Daily Gridded Day Night Band 500m Linear Lat Lon Grid Night | From 2012 onward, as GeoTIFF |
| | | Vegetation index (NDVI) | MOD13A2.061 Terra Vegetation Indices 16-Day Global 1km | Earth Engine snippets, as GeoTIFF |
| Copernicus | Sentinel mission satellite data; 90 m global resolution (TanDEM-X mission) | Elevation data (absolute) | Copernicus DEM - Global and European Digital Elevation Model | Mission: 2011-2015. Data available: 2019-2026. Earth Engine snippets, as GeoTIFF |
| DEPRECATED: Meteostat | Library of global weather stations | Weather data | Sourced through the Python library meteostat (documentation) | Representation bias towards the EU and North America |
| DEPRECATED: Open Elevation | Free API alternative to Google | Elevation data (absolute) | Sourced through the API via open_elevation_request.py | Open Elevation does not provide data for latitudes > 60° |
NASA's EarthData provides a wide range of global satellite data that can be used to infer climate parameters. These parameters are already widely used in local studies, which are featured in EarthData's "News" section. Moreover, the United States Geological Survey (USGS) provides a "Global Ecosystem Viewer" for global ecosystems (27.10.2016).
Hence, the focus of this project was to integrate and automate the classification of ecosystems using dynamically updated satellite data.
All data is acquired through EarthData or the Google Earth Engine. It is stored on an AWS PostgreSQL server, which is currently being migrated.
1.) VIIRS night time illumination
2.) MODIS vegetation index
3.) Copernicus elevation data
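The three datasets above come at different native resolutions: 500 m for VIIRS, 1 km for MODIS and 90 m for the Copernicus DEM. Before modelling, their values have to be brought onto one grid. The project does not document how it does this. Below is a minimal sketch of one possible approach, under two assumptions. First, the hypothetical grid uses 0.25° cells, which matches the GLDAS spacing. Second, each parameter is averaged over all point samples that fall into the same cell. The helpers `assign_cell` and `aggregate_per_cell` are illustrative and not part of the repository.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: 0.25-degree cells (the GLDAS grid spacing); the project's actual cell size may differ
CELL_DEG = 0.25


def assign_cell(lat, lon, cell_deg=CELL_DEG):
    """Snap coordinates to the lower-left corner of the grid cell that contains them."""
    cell_lat = np.floor(np.asarray(lat, dtype=float) / cell_deg) * cell_deg
    cell_lon = np.floor(np.asarray(lon, dtype=float) / cell_deg) * cell_deg
    return cell_lat, cell_lon


def aggregate_per_cell(samples: pd.DataFrame, cell_deg: float = CELL_DEG) -> pd.DataFrame:
    """Average every numeric parameter over all samples that fall into the same cell.

    ASSUMPTION (hypothetical): `samples` has 'lat' and 'lon' columns plus one numeric
    column per parameter, and the mean is the aggregation used.
    """
    df = samples.copy()
    df["cell_lat"], df["cell_lon"] = assign_cell(df["lat"], df["lon"], cell_deg)
    return (
        df.drop(columns=["lat", "lon"])
        .groupby(["cell_lat", "cell_lon"], as_index=False)
        .mean()
    )


# Toy point samples (illustrative values only, not project data)
samples = pd.DataFrame({
    "lat": [52.51, 52.60, 48.13],
    "lon": [13.39, 13.30, 11.58],
    "ndvi": [0.40, 0.60, 0.55],
    "night_light": [30.0, 10.0, 25.0],
    "elevation": [34.0, 40.0, 519.0],
})
cells = aggregate_per_cell(samples)
print(cells)
```

In this toy example, the first two samples fall into the same cell, which is anchored at (52.5, 13.25), so their values are averaged. With real rasters, resampling each image to the common grid would replace this point-based snapping.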
For modelling, all parameters were aggregated for each grid cell. A RandomForestClassifier from scikit-learn was then trained on 73 locations covering 15 different ecosystems:
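```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

label_encoder = LabelEncoder()
scaler = StandardScaler()

# Define model data: all aggregated parameters as features, the ecosystem as the target
X = df.drop(columns=["ecosystem", "name"])
y = label_encoder.fit_transform(df["ecosystem"])

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features (fit on the training set only, to avoid leaking test information)
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Modelling
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
```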
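The training code above stops after `model.fit`. Below is a self-contained sketch of how the fitted objects could then be used: to evaluate the model on the held-out cells and to classify a new, unlabelled cell. It runs on a synthetic toy table. The feature columns `ndvi` and `elevation`, the three ecosystem names and all values are hypothetical. They are not the project's data or results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Synthetic stand-in for the aggregated grid-cell table (toy values, NOT project data).
# The feature columns and ecosystem names are hypothetical.
rng = np.random.default_rng(0)
centers = {"desert": (0.1, 800.0), "forest": (0.7, 300.0), "wetland": (0.4, 5.0)}
rows = []
for eco, (ndvi, elevation) in centers.items():
    for i in range(20):
        rows.append({
            "name": f"{eco}_{i}",
            "ecosystem": eco,
            "ndvi": ndvi + rng.normal(0, 0.02),
            "elevation": elevation + rng.normal(0, 20.0),
        })
df = pd.DataFrame(rows)

# Same pipeline as the training code: encode labels, split, scale, fit
label_encoder = LabelEncoder()
scaler = StandardScaler()
X = df.drop(columns=["ecosystem", "name"])
y = label_encoder.fit_transform(df["ecosystem"])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out cells
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")

# Classify a new, unlabelled cell: reuse the fitted scaler, then decode the integer label
new_cell = pd.DataFrame({"ndvi": [0.68], "elevation": [310.0]})
predicted = label_encoder.inverse_transform(model.predict(scaler.transform(new_cell)))[0]
print(predicted)
```

A new cell must pass through the same fitted `scaler` and have the same column order as the training features. The predicted integer is mapped back to an ecosystem name with `label_encoder.inverse_transform`. Tree ensembles are insensitive to per-feature scaling, so the `StandardScaler` does no harm here but is not strictly required.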
For forecasting, a LinearRegression model from scikit-learn was fitted to each parameter of each grid cell individually:
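```python
import numpy as np
from sklearn.linear_model import LinearRegression

# df_pixel (one grid cell's yearly data), parameter and predict_year come from the enclosing per-cell loop
model = LinearRegression()

# Prepare data for the model: year as the single feature, the parameter as the target
X, y = np.array(df_pixel["year"]).reshape(-1, 1), np.array(df_pixel[parameter])

# Check if there is enough data to fit the model
if len(X) > 1:
    # Fit the model and calculate R-squared on the fitted data
    model.fit(X, y)
    r_squared = model.score(X, y)
    predict_val = model.predict(np.array([[predict_year]]))[0]  # Extract single value
```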
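The snippet above appears to be the body of a loop over grid cells and parameters, but that loop is not shown. Below is a minimal sketch of such a loop. It assumes a long-format table with a hypothetical `cell_id` column, a `year` column and one column per parameter. The function `forecast_per_cell` is illustrative and not the project's code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def forecast_per_cell(df: pd.DataFrame, parameters, predict_year: int) -> pd.DataFrame:
    """Fit one linear trend per (grid cell, parameter) and extrapolate it to predict_year.

    ASSUMPTION (hypothetical schema): columns 'cell_id', 'year', plus one column per parameter.
    """
    records = []
    for cell_id, df_pixel in df.groupby("cell_id"):
        for parameter in parameters:
            # Ignore years with missing values for this parameter
            sub = df_pixel[["year", parameter]].dropna()
            X = sub["year"].to_numpy().reshape(-1, 1)
            y = sub[parameter].to_numpy()
            if len(X) < 2:  # a line needs at least two points
                continue
            model = LinearRegression().fit(X, y)
            records.append({
                "cell_id": cell_id,
                "parameter": parameter,
                "slope_per_year": model.coef_[0],
                "r_squared": model.score(X, y),
                "predicted": model.predict(np.array([[predict_year]]))[0],
            })
    return pd.DataFrame(records)


# Toy history for two cells (illustrative values only, not project data)
history = pd.DataFrame({
    "cell_id": ["A"] * 4 + ["B"] * 4,
    "year": [2020, 2021, 2022, 2023] * 2,
    "ndvi": [0.50, 0.52, 0.54, 0.56, 0.30, 0.30, 0.30, 0.30],
})
result = forecast_per_cell(history, ["ndvi"], predict_year=2030)
print(result)
```

Fitting a fresh model per cell keeps each trend independent, and storing the slope next to R² makes the fits easy to compare. With exactly two years of data, R² is trivially 1. The `len(X) > 1` guard therefore admits fits that look perfect but carry little evidence.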