Commit

fix: restructure dataflow code and container setup
- Move functions to module level for worker access
- Update imports and function structure
- Fix Airflow DAG to use GCS path for script
- Update Dockerfile.processing to properly package code
martincollignon committed Dec 1, 2024
1 parent f719cef commit 167d5e0
Showing 2 changed files with 8 additions and 6 deletions.
12 changes: 6 additions & 6 deletions backend/Dockerfile.processing
@@ -13,14 +13,14 @@ RUN apt-get update && apt-get install -y \
ENV CPLUS_INCLUDE_PATH=/usr/include/gdal
ENV C_INCLUDE_PATH=/usr/include/gdal

# Install Apache Beam with GCP support first (as recommended in the docs)
RUN pip install 'apache-beam[gcp]==2.60.0'

# Copy and install additional requirements
# Copy requirements first for better caching
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt

# Copy Dataflow scripts
COPY dataflow/validate_geometries.py /app/validate_geometries.py
# Copy the entire package structure
COPY dataflow /app/dataflow
COPY setup.py /app/setup.py

# Install the package in development mode
WORKDIR /app
RUN pip install -e .
2 changes: 2 additions & 0 deletions backend/dataflow/validate_geometries.py
@@ -2,6 +2,8 @@
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
import logging
import geopandas as gpd
import pandas as pd

class ValidateGeometriesOptions(PipelineOptions):
    @classmethod
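Note on "Move functions to module level for worker access": the diff does not show the moved functions, so the following is only a minimal sketch of the general principle, not the commit's actual code. A function defined at module level can be serialized by reference to its importable name, while a function defined inside another function cannot be located that way. The sketch uses the standard library `pickle`; Beam ships its own pickling layer, which handles more cases, so treat this as an illustration of the by-reference idea rather than of Beam's exact behavior. The names `is_valid_record`, `make_nested_validator` and `can_pickle` are hypothetical.

```python
import pickle


def is_valid_record(record):
    """Module-level function: pickled by reference to its importable name."""
    return record.get("geometry") is not None


def make_nested_validator():
    """Returns a function defined in a local scope (hypothetical 'before' shape)."""

    def nested_validator(record):
        return record.get("geometry") is not None

    return nested_validator


def can_pickle(obj):
    """Report whether stdlib pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# The module-level function is looked up by name in its defining module.
module_level_ok = can_pickle(is_valid_record)

# The nested function has no importable name, so stdlib pickle rejects it.
nested_ok = can_pickle(make_nested_validator())

print("module-level function picklable:", module_level_ok)
print("nested function picklable:", nested_ok)
```

By-reference serialization also explains the Dockerfile change: a worker can only resolve a function by name if the module that defines it is importable there, which is presumably why the image now copies the whole `dataflow` package and installs it with `pip install -e .`.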
