Local Docker ETL with local inputs/outputs #1606
Should we actually use Tox (and yet another layer of virtual Python environments), or should we have a separate script (which would duplicate a lot of what Tox is doing and thus be at risk of getting out of sync)? |
I've got it running and able to do the equivalent of |
A good overview of docker logging. They outline several available logging strategies. It seems like using a docker logging driver is probably the right option for us, and there is a dedicated Google Cloud Logging driver. In theory the docker logs pudl_etl either when the tests are running or after they've completed. I wonder if this might be affected by the fact that tox/pytest are sitting between the process and the logging? Maybe I should try running |
In order to get some direct logging output (not going through tox/pytest) and also to test whether the ETL scripts can write to the mounted PUDL_OUT directory I decided to run the ETL scripts rather than the tests in the For some reason the scripts run fine when they're writing to a directory inside the container, but when an external directory is mounted they have trouble writing there. SQLAlchemy complains that it can't open the ferc1.sqlite database. However, log files do get written into the PUDL_OUT directory, and the scripts can successfully create the
|
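One way to narrow down a symptom like this (log files appear in the mounted directory, but the SQLite database cannot be opened) is to probe the directory with the standard-library `sqlite3` module, independent of SQLAlchemy and the ETL scripts. The following is only a minimal diagnostic sketch: the function name and the probe file name are hypothetical, and it assumes the directory of interest is the one `PUDL_OUT` points to.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def can_create_sqlite(directory):
    """Try to create, write to, and remove a SQLite DB in `directory`.

    Returns (ok, detail), where detail is "ok" or the sqlite3 error message.
    """
    # Hypothetical probe file name; not something the ETL itself creates.
    db_path = Path(directory) / "write_probe.sqlite"
    try:
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("CREATE TABLE IF NOT EXISTS probe (x INTEGER)")
            conn.execute("INSERT INTO probe VALUES (1)")
            conn.commit()
        finally:
            conn.close()
    except sqlite3.Error as err:
        return False, str(err)
    finally:
        # Clean up the probe file if it was created.
        db_path.unlink(missing_ok=True)
    return True, "ok"


if __name__ == "__main__":
    # Assumption: fall back to the system temp dir when PUDL_OUT is unset.
    target = os.environ.get("PUDL_OUT", tempfile.gettempdir())
    print(target, can_create_sqlite(target))
```

Running this inside the container, once against a container-local directory and once against the mounted one, would show whether the failure is specific to SQLite on the mount rather than to file writes in general.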
Logs are still being buffered and not getting output until the container stops. I've tried:
And I still get the same behavior no matter what: no logs are output until the container shuts down. I'm tailing the logs using
It turns out that
    services:
      pudl-etl:
        environment:
          - PYTHONUNBUFFERED=1
          - API_KEY_EIA
        image: catalystcoop/pudl-etl:hello-docker
        container_name: pudl_etl
        logging:
          driver: local
        command: bash -c "for i in `ls`; do echo $i; sleep 1; done"
However, if I set The So it seems like:
|
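The compose file above sets `PYTHONUNBUFFERED=1`. As a rough check that a Python process whose stdout is a pipe (not a terminal) actually picks that setting up, the sketch below launches a child interpreter with and without the variable and reports `sys.stdout.write_through`. It assumes CPython 3.7 or later, where unbuffered mode makes the text layer of stdout write-through; the helper name is hypothetical, and the check says nothing about buffering added by wrappers sitting between the process and Docker.

```python
import os
import subprocess
import sys

# The child reports whether its stdout text layer writes through immediately.
CHILD = "import sys; print(sys.stdout.write_through)"


def child_stdout_write_through(unbuffered):
    """Run a child Python with piped stdout; return its write_through flag."""
    env = dict(os.environ)
    # Start from a clean slate so the parent's setting does not leak in.
    env.pop("PYTHONUNBUFFERED", None)
    if unbuffered:
        env["PYTHONUNBUFFERED"] = "1"
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    print("without PYTHONUNBUFFERED:", child_stdout_write_through(False))
    print("with PYTHONUNBUFFERED=1: ", child_stdout_write_through(True))
```

If the flag comes back true inside the container and logs still arrive only at shutdown, the buffering is probably happening in a layer outside the Python interpreter.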
A two-year-long thread about possibly related environment variable issues: docker/compose#7423 Note: I'm using |
However, if I run
|
I think your bash for loop and
I was able to remove the variable warning by adding an additional $ sign. |
Ahhh, well. I guess that's one less thing to be confused by! |
I think conda is the culprit for holding back the logs. When I run:
in the container, it outputs the logs in real time. When I run the same code in the conda env the logs get held back:
|
Adding |
How did none of my searching find this. |
Okay, the problem with not being able to write to |
Given a Docker container with our CI environment (#1605):
PUDL_IN and PUDL_OUT. Maybe with docker compose?
tox -e ci while reading & writing data on the local volume.
tox -e nuke: all CI, full ETL, and data validation, reading & writing data on the local volume.