GCSFUSE OOM when running in cloudrun #2908

Open
loveeklund-osttra opened this issue Jan 17, 2025 · 0 comments
Labels
p2 P2 question Customer Issue: question about how to use tool

Comments

@loveeklund-osttra

Describe the issue
I'm running GCSfuse in cloudrun by installing and starting it myself in my docker image. I originally tried mounting a bucket using the "managed GCSfuse", but experienced the same issues, so for better debugging and being able to run locally I just run it myself in the container.
I can run the same docker image locally with same settings ( memory and cpu) and it works as expected, meaning it writes file to cloudstorage and when a file is written there is no memory impact of that file in my container. But when I run in cloudrun it gets OOM after a while. How many files/total bytes it can write before it goes OOM differs between runs. Sometimes it can write out 2gb worth of data on a cloudrun with 512 mb of memory, which makes me be believe it partially works. But other times it writes out 300 mb and then dies with OOM issue.

When looking at the logs I see the following (see full logs attached):

DEFAULT 2025-01-17T15:54:00.600122Z {"seconds":1737129240,"nanos":597263725},"severity":"TRACE","message":"fuse_debug: Op 0x00000306 connection.go:420] <- FlushFile (inode 5, PID 29)"}
DEFAULT 2025-01-17T15:54:02.076610Z {"seconds":1737129242,"nanos":74016326},"severity":"TRACE","message":"fuse_debug: Op 0x00000306 connection.go:513] -> OK ()"}
DEFAULT 2025-01-17T15:54:02.077044Z {"seconds":1737129242,"nanos":74182085},"severity":"TRACE","message":"fuse_debug: Op 0x00000308 connection.go:420] <- ReleaseFileHandle (PID 0, handle 2)"}

So I assume the file is being flushed and released as it should be.

System & Version (please complete the following information):

  • OS: debian:10 (from python:3.11-buster)
  • Platform: GCP Cloud Run
  • Version: 2.7.0

Steps to reproduce the behavior with following information:
Mount command:
gcsfuse --config-file=/usr/src/gcsfuse_config.yaml $GCS_BUCKET_NAME /pipeline_data
Config file:

logging:
  severity: trace
implicit-dirs: true
gcs-connection:
  max-conns-per-host: 1
  max-idle-conns-per-host: 1
  sequential-read-size-mb: 20
gcs-retries:
  max-retry-sleep: 0
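
For reference, the container entrypoint starts gcsfuse roughly like this before launching the pipeline (a minimal sketch, not my exact entrypoint; the helper name and the `--foreground` supervision pattern are illustrative):

```python
import os
import subprocess

CONFIG_FILE = "/usr/src/gcsfuse_config.yaml"
MOUNT_POINT = "/pipeline_data"

def gcsfuse_cmd(bucket: str) -> list[str]:
    # --foreground keeps gcsfuse attached to this process instead of
    # daemonizing, so the entrypoint notices if it exits.
    return ["gcsfuse", "--foreground",
            f"--config-file={CONFIG_FILE}", bucket, MOUNT_POINT]

if __name__ == "__main__":
    os.makedirs(MOUNT_POINT, exist_ok=True)
    proc = subprocess.Popen(gcsfuse_cmd(os.environ["GCS_BUCKET_NAME"]))
    # ... run the data pipeline here, then unmount and wait for exit ...
    proc.wait()
```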

See attached for logs. I have removed some data from them: I can't seem to copy-paste everything from the console, and I don't have log export enabled.
gcs_fuse_logs.txt

Additional context
My script is in Python; it generates batches of JSONL data and writes them to files.

I've tried running it both through GCS fuse and with code like the snippet below, which uses the Python Cloud Storage library; the snippet below works fine. So I don't think it's necessarily a Cloud Run issue, but probably some problem with the combination of Cloud Run and gcsfuse.

import json
import logging
import time

from google.cloud import storage

storage_client = storage.Client(project_id)
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(filename[1:])  # strip the leading "/" to get the object path
with blob.open("w") as f:  # use blob.open() to write directly to GCS
    logging.info(f"Writing to GCS: gs://{bucket_name}{filename}")
    row_num = 0
    for row in row_generator:
        json.dump(row, f)
        f.write("\n")
        row_num += 1
        if row_num >= batch_size:
            logging.info("Finished writing batch to GCS. Sleeping...")
            if sleep_time:  # sleep_time may be unset
                time.sleep(float(sleep_time))
            break
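
For comparison, the equivalent write path through the gcsfuse mount is just ordinary file I/O (a sketch, not my actual script; `write_batch_via_fuse` and the `mount` parameter are illustrative names):

```python
import json
import logging
import time

def write_batch_via_fuse(filename, row_generator, batch_size,
                         sleep_time=None, mount="/pipeline_data"):
    # filename is the same "/..."-style path used for the blob above;
    # under the mount it is just a regular file path.
    path = f"{mount}{filename}"
    with open(path, "w") as f:
        for row_num, row in enumerate(row_generator, start=1):
            json.dump(row, f)
            f.write("\n")
            if row_num >= batch_size:
                logging.info("Finished writing batch. Sleeping...")
                break
    # Closing the file triggers the FlushFile / ReleaseFileHandle ops
    # visible in the trace logs, which is when gcsfuse uploads the object.
    if sleep_time:
        time.sleep(float(sleep_time))
```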

Other notes:
I have tried multiple file sizes, ranging from 10 MB to 200 MB; none seem to work.
I've tried setting the env variables GOMEMLIMIT=100MiB and GOGC=70, without seeing any change.
I've tried different memory (512 MB to 2 GB) and CPU (1-4 CPUs) allocations, but had the same issues.

Please let me know if you need anything else or have any further questions.

@loveeklund-osttra loveeklund-osttra added p2 P2 question Customer Issue: question about how to use tool labels Jan 17, 2025