Description
Is your feature request related to a problem? Please describe.
Yes. The current pipeline relies on a static data/external/precipitation.csv file for weather information. This is a significant limitation because:
- The data is stale: It cannot provide the actual weather conditions for the specific time and location of each trip in the dataset.
- The features are limited: It only includes precipitation, while other critical factors like temperature, wind speed, humidity, and visibility, which heavily influence traffic, are missing.
This results in the model missing out on valuable predictive signals, limiting its accuracy.
Describe the solution you'd like
I propose replacing the static CSV file with a dynamic integration of a live weather API (e.g., OpenWeatherMap, which has a generous free tier).
The implementation would involve:
- Creating a new module (e.g., `src/features/weather_api.py`) to handle API calls.
- This module will contain a function that accepts a latitude, longitude, and timestamp, and returns a rich set of weather features like `temperature`, `humidity`, `wind_speed`, `visibility`, and `weather_condition`.
- Integrating this function into the main feature engineering pipeline, so it is called for each trip to generate the new features.
- Managing the API key securely through the `config.py` file or environment variables, with clear instructions in the `README.md` for new users.
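To make the proposal concrete, here is a minimal sketch of what such a module could look like. It is not a finished implementation. The environment variable name `WEATHER_API_KEY`, the trip field names (`pickup_latitude`, `pickup_longitude`, `pickup_datetime`), and every function name are hypothetical. The nested response keys are an assumption modelled on OpenWeatherMap-style JSON and would need checking against whichever provider is chosen. The HTTP call is passed in as a `fetch` callable, so the sketch stays provider-agnostic and can be exercised without network access.

```python
import os
from datetime import datetime
from typing import Any, Callable, Dict, List, Optional

# Hypothetical variable name; the issue only says the key should come from
# config.py or the environment.
API_KEY_ENV_VAR = "WEATHER_API_KEY"

# A fetcher takes (lat, lon, timestamp, api_key) and returns the decoded JSON.
WeatherFetcher = Callable[[float, float, datetime, str], Dict[str, Any]]


def load_api_key() -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(API_KEY_ENV_VAR)
    if not key:
        raise RuntimeError(f"Set the {API_KEY_ENV_VAR} environment variable first.")
    return key


def parse_weather(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a provider response into the features named in this issue.

    The nested keys are assumed, not confirmed; missing fields become None.
    """
    conditions = payload.get("weather") or [{}]
    return {
        "temperature": payload.get("main", {}).get("temp"),
        "humidity": payload.get("main", {}).get("humidity"),
        "wind_speed": payload.get("wind", {}).get("speed"),
        "visibility": payload.get("visibility"),
        "weather_condition": conditions[0].get("main"),
    }


def get_weather_features(
    lat: float,
    lon: float,
    timestamp: datetime,
    fetch: WeatherFetcher,
    api_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Return weather features for one location and time."""
    key = api_key if api_key is not None else load_api_key()
    return parse_weather(fetch(lat, lon, timestamp, key))


def add_weather_features(
    trips: List[Dict[str, Any]], fetch: WeatherFetcher, api_key: str
) -> List[Dict[str, Any]]:
    """Attach weather features to each trip, reusing results per area and hour."""
    cache: Dict[tuple, Dict[str, Any]] = {}
    enriched = []
    for trip in trips:
        lat, lon = trip["pickup_latitude"], trip["pickup_longitude"]
        ts = trip["pickup_datetime"]
        # Round coordinates and truncate to the hour so nearby trips share a call.
        cache_key = (
            round(lat, 2),
            round(lon, 2),
            ts.replace(minute=0, second=0, microsecond=0),
        )
        if cache_key not in cache:
            cache[cache_key] = get_weather_features(lat, lon, ts, fetch, api_key)
        enriched.append({**trip, **cache[cache_key]})
    return enriched
```

Caching by rounded coordinates and hour is one possible way to keep "one lookup per trip" within a free tier's rate limits. The rounding granularity is a guess and would need tuning against the real dataset.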
Describe alternatives you've considered
An alternative would be to find a larger, more comprehensive historical weather dataset. However, this approach still relies on a static file, which may not have data for the exact times or locations needed and can become outdated. A live API integration is a far more robust, flexible, and scalable solution that brings the project closer to a real-world application.
Mockups
N/A. This is a backend and data pipeline enhancement, so no visual mockups are required.
Potential Impact
- Improved Model Accuracy: Giving the models more relevant, trip-specific weather features is likely to improve their predictive power, which should show up as a lower RMSE on the held-out data.
- More Robust Pipeline: The system will no longer be dependent on a single, static external file.
- Enhanced Real-World Viability: This feature is a key step in moving the project from a static data science experiment to a system capable of making real-time predictions.