
Conversation

@dlindenbaum

This commit adds comprehensive support for vision-language models (VLMs) using vLLM, enabling multimodal analysis of satellite and aerial imagery with natural language.

Features:

  • New VLLMInferer class in solaris/nets/vllm_infer.py for multimodal inference (see the sketch after this list)
  • CLI command solaris_vllm_infer for easy command-line usage
  • Support for any vLLM-compatible vision model (LLaVA, Qwen-VL, etc.)
  • Batch processing and multi-GPU inference via tensor parallelism
  • Geospatial metadata preservation (projection, geotransform)
  • Configurable prompts and sampling parameters
  • JSON output format for structured results
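
For orientation, here is a minimal sketch of multimodal inference with vLLM on a single image chip. It calls the vLLM API directly rather than the new VLLMInferer class; the model name, prompt template, and sampling values are illustrative and assume a recent vLLM release:

```python
# Illustrative sketch only; prompt templates and supported models vary by vLLM version.
from PIL import Image
from vllm import LLM, SamplingParams

# Load a vLLM-compatible vision-language model (LLaVA shown as an example).
llm = LLM(model="llava-hf/llava-1.5-7b-hf", tensor_parallel_size=1)

# A single RGB chip; a GeoTIFF would first be converted to 8-bit RGB.
image = Image.open("chip_0001.png").convert("RGB")

# LLaVA-style prompt with an <image> placeholder; other models use different templates.
prompt = "USER: <image>\nDescribe the land use visible in this overhead image. ASSISTANT:"

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.2, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```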

New files:

  • solaris/nets/vllm_infer.py - Core vLLM inference module
  • solaris/bin/solaris_vllm_infer.py - CLI entry point
  • requirements-vllm.txt - vLLM-compatible dependencies (Python 3.8+)
  • environment-vllm.yml - Conda environment with vLLM support
  • VLLM_MULTIMODAL_GUIDE.md - Comprehensive usage guide
  • examples/vllm/ - Example configurations and scripts

Modified files:

  • setup.py - Added solaris_vllm_infer CLI entry point
  • requirements.txt - Added note about vLLM optional dependencies

Use cases:

  • Generate descriptions of overhead imagery
  • Land use classification with natural language
  • Building and infrastructure detection
  • Change detection and analysis
  • Visual question answering on satellite images

Note: vLLM requires Python 3.8+, PyTorch 2.0+, and CUDA 11.8+. Use environment-vllm.yml or requirements-vllm.txt for installation.
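
For example, either route should work (standard conda / pip invocations; adjust paths as needed):
conda env create -f environment-vllm.yml
# or
pip install -r requirements-vllm.txt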

🤖 Generated with Claude Code

Thank you for submitting your PR. Please read the template below, fill it out as appropriate, and make additional changes to your code as needed. Please feel free to submit your PR even if it doesn't satisfy all of the requirements below - simply prepend [WIP] to the PR title until it is ready for review by a maintainer. If you need assistance or review from a maintainer, add the label Status: Help Needed or Status: Review Needed respectively. After review, a maintainer will add the label Status: Revision Needed if further work is required for the PR to be merged.

Description

Please include a summary of the change and which issue is resolved. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected - these changes will not be merged until major releases!)

How Has This Been Tested?

Please describe tests that you added to the pytest codebase (if applicable).

Checklist:

  • My PR has a descriptive title
  • My code follows PEP8
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new errors
  • I have added tests that prove my fix is effective or that my feature works
  • My PR passes Travis CI tests
  • My PR does not reduce coverage in Codecov

If your PR does not fulfill all of the requirements in the checklist above, that's OK! Just prepend [WIP] to the PR title until they are all satisfied. If you need help, @-mention a maintainer and/or add the Status: Help Needed label.

Major refactoring to use LiteLLM instead of vLLM, focusing on US-based
and NVIDIA vision-language models for multimodal satellite imagery analysis.

Key Changes:
- Replaced vLLM with LiteLLM for unified API access to multiple providers (see the sketch after this list)
- Primary focus on Google Gemini (US-based) and NVIDIA models
- Simplified deployment - no local GPU required for inference
- API-based approach for better scalability and maintenance
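
As a rough sketch of what that unified, API-based access looks like (the model id and prompt are illustrative; LiteLLM reads the provider API key from the environment):

```python
# Illustrative sketch: switching providers only means changing the model string.
import base64
import litellm

with open("chip_0001.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Classify the dominant land use in this overhead image."},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
    ],
}]

# Google Gemini via LiteLLM (expects GEMINI_API_KEY in the environment).
response = litellm.completion(model="gemini/gemini-1.5-flash", messages=messages)
print(response.choices[0].message.content)

# Calling an NVIDIA-hosted vision model would change only the model string;
# see the LiteLLM provider docs for the exact prefix and model ids.
```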

New Implementation:
- solaris/nets/litellm_infer.py - LiteLLM-based multimodal inference (see the sketch after this list)
  * Supports Google Gemini (1.5-flash, 1.5-pro, pro-vision)
  * Supports NVIDIA vision models via API catalog
  * Preserves geospatial metadata (projection, geotransform)
  * Base64 image encoding for API calls
  * Robust error handling and logging
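
The general pattern, roughly: read the raster, rescale it to an 8-bit PNG for the API payload, and carry the CRS and geotransform through to the output record. The helper below is a sketch under those assumptions, not the actual litellm_infer.py interface:

```python
# Sketch of base64-encoding a GeoTIFF chip for an API call while preserving georeferencing.
import base64
import io
import json

import numpy as np
import rasterio
from PIL import Image

def encode_chip(path):
    """Return (base64 PNG string, geospatial metadata dict) for a raster chip."""
    with rasterio.open(path) as src:
        arr = src.read([1, 2, 3]).astype(np.float32)  # first three bands as RGB
        lo, hi = float(arr.min()), float(arr.max())
        arr = (255 * (arr - lo) / max(hi - lo, 1e-6)).astype(np.uint8)
        meta = {
            "crs": str(src.crs),
            "geotransform": list(src.transform)[:6],
            "bounds": list(src.bounds),
        }
    buf = io.BytesIO()
    Image.fromarray(np.moveaxis(arr, 0, -1)).save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode("utf-8"), meta

img_b64, geo_meta = encode_chip("scene_chip.tif")
result = {"image": "scene_chip.tif", "geo": geo_meta, "response": "<model output goes here>"}
print(json.dumps(result, indent=2))
```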

Updated CLI:
- solaris/bin/solaris_vllm_infer.py - Updated to use LiteLLM
  * Help text shows Gemini and NVIDIA model examples
  * Environment variable support (GEMINI_API_KEY, NVIDIA_API_KEY)
  * Maintains backward compatibility with existing command name

Dependencies:
- requirements-litellm.txt - LiteLLM and Google Generative AI packages
- environment-litellm.yml - Conda environment for LiteLLM
- Lighter weight than vLLM (no local model hosting required)

Documentation:
- LITELLM_MULTIMODAL_GUIDE.md - Comprehensive guide for LiteLLM usage
  * Google Gemini setup instructions
  * API key configuration
  * Model comparison (Flash vs Pro)
  * Cost estimation and optimization tips
  * Security and privacy considerations

Example Configurations:
- examples/vllm/litellm_config_gemini.yml - Gemini Flash (fast, efficient)
- examples/vllm/litellm_config_gemini_pro.yml - Gemini Pro (most capable)
- examples/vllm/litellm_config_nvidia.yml - NVIDIA models template
- examples/vllm/example_litellm_inference.py - Python API examples

Removed Files:
- Old vLLM-specific files (vllm_infer.py, vllm configs, vllm guide)
- No longer needed with API-based approach

Benefits:
- US-based models for security and compliance
- No local GPU/CUDA requirements for inference
- Production-ready APIs from Google and NVIDIA
- Cost-effective (especially Gemini Flash)
- Easier deployment and scaling
- Regular model updates from providers

Supported Use Cases:
- Land use classification
- Building and infrastructure detection
- Environmental monitoring
- Change detection analysis
- General overhead imagery description

API Key Setup:
export GEMINI_API_KEY='your-key'  # Get from https://makersuite.google.com/app/apikey
export NVIDIA_API_KEY='your-key'  # Get from https://build.nvidia.com/

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>