[FEATURE]: Enhance Weather Data with a Live API Integration #47

@AyushAnand413

Description

Is your feature request related to a problem? Please describe.

Yes. The current pipeline relies on a static data/external/precipitation.csv file for weather information. This is a significant limitation because:

  1. The data is stale: It cannot provide the actual weather conditions for the specific time and location of each trip in the dataset.
  2. The features are limited: It only includes precipitation, while other critical factors like temperature, wind speed, humidity, and visibility, which heavily influence traffic, are missing.

This results in the model missing out on valuable predictive signals, limiting its accuracy.

Describe the solution you'd like

I propose replacing the static CSV file with a dynamic integration of a live weather API (e.g., OpenWeatherMap, which has a generous free tier).

The implementation would involve:

  1. Creating a new module (e.g., src/features/weather_api.py) to handle API calls.
  2. Adding a function to this module that accepts a latitude, longitude, and timestamp, and returns a richer set of weather features: temperature, humidity, wind_speed, visibility, and weather_condition.
  3. Integrating this function into the main feature engineering pipeline, so it is called for each trip to generate the new features.
  4. Managing the API key securely through the config.py file or environment variables, with clear instructions in the README.md for new users.
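
To make the steps above concrete, here is a minimal sketch of what src/features/weather_api.py could look like. It is an illustration under stated assumptions, not a finished design: the names parse_weather, get_weather_features, and the OPENWEATHER_API_KEY environment variable are hypothetical, and the URL and JSON field names assume OpenWeatherMap's current-weather endpoint. That endpoint does not accept a timestamp, so looking up conditions for past trips would need a historical endpoint instead, which may not be covered by the free tier.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Optional

# Assumption: OpenWeatherMap's current-weather endpoint. It returns conditions
# for "now", so the trip timestamp cannot be honoured by this sketch.
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def parse_weather(payload: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Flatten a current-weather style JSON payload into model features.

    Missing fields become None so the pipeline can impute them later.
    """
    main = payload.get("main", {})
    wind = payload.get("wind", {})
    conditions = payload.get("weather") or [{}]
    return {
        "temperature": main.get("temp"),
        "humidity": main.get("humidity"),
        "wind_speed": wind.get("speed"),
        "visibility": payload.get("visibility"),
        "weather_condition": conditions[0].get("main"),
    }


def _http_get_json(url: str) -> Dict[str, Any]:
    """Default fetcher: a plain HTTP GET that decodes the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def get_weather_features(
    lat: float,
    lon: float,
    timestamp: int,
    fetch: Callable[[str], Dict[str, Any]] = _http_get_json,
) -> Dict[str, Optional[Any]]:
    """Return weather features for a location.

    `timestamp` is accepted to match the proposed signature but is unused
    here; see the note on the endpoint above.
    """
    # The key is read from the environment so it never lands in the repo.
    api_key = os.environ.get("OPENWEATHER_API_KEY")  # hypothetical variable name
    if not api_key:
        raise RuntimeError("Set OPENWEATHER_API_KEY before calling the weather API")
    query = urllib.parse.urlencode(
        {"lat": lat, "lon": lon, "appid": api_key, "units": "metric"}
    )
    return parse_weather(fetch(f"{BASE_URL}?{query}"))
```

Passing the fetcher in as a parameter keeps the parsing logic testable without network access. Since the pipeline would call this once per trip, caching results by rounded coordinates and hour is probably worth considering to stay within the API's rate limits.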

Describe alternatives you've considered

An alternative would be to find a larger, more comprehensive historical weather dataset. However, this approach still relies on a static file, which may not have data for the exact times or locations needed and can become outdated. A live API integration is a far more robust, flexible, and scalable solution that brings the project closer to a real-world application.

Mockups

N/A. This is a backend and data pipeline enhancement, so no visual mockups are required.

Potential Impact

  • Improved Model Accuracy: Giving the models more relevant, trip-specific weather features is expected to improve their predictive power, which should show up as a lower RMSE on the validation set.
  • More Robust Pipeline: The system will no longer be dependent on a single, static external file.
  • Enhanced Real-World Viability: This feature is a key step in moving the project from a static data science experiment to a system capable of making real-time predictions.
