Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File system access fails on read-only file systems #198

Open
2 tasks done
theory opened this issue Jan 27, 2025 · 8 comments · Fixed by #200
Open
2 tasks done

File system access fails on read-only file systems #198

theory opened this issue Jan 27, 2025 · 8 comments · Fixed by #200
Assignees
Labels
bug Something isn't working

Comments

@theory
Copy link

theory commented Jan 27, 2025

What happens?

I attempted to create a foreign table following the Tembo tutorial, using v0.3.0, and was surprised by this error:

postgres=# CREATE FOREIGN TABLE trips ()
  SERVER parquet_server
  OPTIONS (files 's3://tembo-demo-bucket/yellow_tripdata_2024-01.parquet');
ERROR:  Extension Autoloading Error: An error occurred while trying to automatically install the required extension 'httpfs':
Failed to create directory "/var/lib/postgresql/.duckdb/": Read-only file system

Tembo runs its containers with read-only file systems. The issue does not occur with v0.2.3. I wonder if it'd be possible to either:

  1. Prevent DuckDB from writing to the file system; or
  2. Point it at some other directory, for a persistent volume, for example, perhaps via a GUC

To Reproduce

OS:

Linux

ParadeDB Version:

pg_anaytics 0.3.0

Are you using ParadeDB Docker, Helm, or the extension(s) standalone?

ParadeDB pg_analytics Extension

Full Name:

David Wheeler

Affiliation:

Tembo

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include the code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configurations (e.g., CPU architecture, PostgreSQL version, Linux distribution) to reproduce the issue?

  • Yes, I have
@theory theory added the bug Something isn't working label Jan 27, 2025
theory added a commit to tembo-io/tembo that referenced this issue Jan 27, 2025
v0.3.0 tries to create files on the file system, which fails in our
read-only containers. v0.2.3 does not appear to exhibit the same
pattern. Follow paradedb/pg_analytics#198 for potential fixes.
theory added a commit to tembo-io/tembo that referenced this issue Jan 27, 2025
v0.3.0 tries to create files on the file system, which fails in our
read-only containers. v0.2.3 does not appear to exhibit the same
pattern. Follow paradedb/pg_analytics#198 for potential fixes.
@theory
Copy link
Author

theory commented Jan 27, 2025

For anyone who runs into permission issues rather than a read-only file system, you can grant the appropriate permissions like so:

RUN mkdir .duckdb/ && chmod -R a+rwX .duckdb/ && \
    mkdir /var/lib/postgresql/.duckdb/ && \
    chmod -R a+rwX /var/lib/postgresql/.duckdb/

@philippemnoel
Copy link
Collaborator

Thank you for reporting. Issue found, we'll push v0.3.1 tomorrow (Friday Jan 31).

@wasd171
Copy link

wasd171 commented Jan 31, 2025

@philippemnoel Thanks for pushing the v0.3.1 release! However, I am not sure whether it has actually fixed the issue:

  1. I've installed pg_analytics@0.3.1 from https://github.com/paradedb/pg_analytics/releases/download/v0.3.1/postgresql-17-pg-analytics_0.3.1-1PARADEDB-bookworm_arm64.deb with ghcr.io/cloudnative-pg/postgresql:17.2-33-bookworm as a base image
  2. I am trying to run select * from parquet_describe('s3://**********/file.parquet');
  3. I get: Extension Autoloading Error: An error occurred while trying to automatically install the required extension 'httpfs': Failed to create directory "/var/lib/postgresql/.duckdb/": Read-only file system

I've also tried to CREATE FOREIGN TABLE ..., this gives a similar error Extension Autoloading Error: An error occurred while trying to automatically install the required extension 'aws': Failed to create directory "/var/lib/postgresql/.duckdb/": Read-only file system

local=> \dx
                                            List of installed extensions
        Name        | Version |   Schema   |                              Description
--------------------+---------+------------+------------------------------------------------------------------------
 pg_analytics       | 0.3.1   | public     | pg_analytics: Postgres for analytics, powered by DuckDB
 pg_search          | 0.14.1  | paradedb   | pg_search: Full text search for PostgreSQL using BM25
 pg_stat_statements | 1.11    | public     | track planning and execution statistics of all SQL statements executed
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 vector             | 0.8.0   | public     | vector data type and ivfflat and hnsw access methods

@philippemnoel philippemnoel reopened this Jan 31, 2025
@philippemnoel
Copy link
Collaborator

That's surprising. I will investigate further.

In the meantime, you can get around this issue by doing:

SET extension_directory = '/path/to/your/extension/directory';

using this syntax https://docs.paradedb.com/integrations/configuration/settings

and setting extension_directory to the PGDATA directory

@philippemnoel
Copy link
Collaborator

@philippemnoel Thanks for pushing the v0.3.1 release! However, I am not sure whether it has actually fixed the issue:

  1. I've installed pg_analytics@0.3.1 from https://github.com/paradedb/pg_analytics/releases/download/v0.3.1/postgresql-17-pg-analytics_0.3.1-1PARADEDB-bookworm_arm64.deb with ghcr.io/cloudnative-pg/postgresql:17.2-33-bookworm as a base image
  2. I am trying to run select * from parquet_describe('s3://**********/file.parquet');
  3. I get: Extension Autoloading Error: An error occurred while trying to automatically install the required extension 'httpfs': Failed to create directory "/var/lib/postgresql/.duckdb/": Read-only file system

I've also tried to CREATE FOREIGN TABLE ..., this gives a similar error Extension Autoloading Error: An error occurred while trying to automatically install the required extension 'aws': Failed to create directory "/var/lib/postgresql/.duckdb/": Read-only file system

local=> \dx
                                            List of installed extensions
        Name        | Version |   Schema   |                              Description
--------------------+---------+------------+------------------------------------------------------------------------
 pg_analytics       | 0.3.1   | public     | pg_analytics: Postgres for analytics, powered by DuckDB
 pg_search          | 0.14.1  | paradedb   | pg_search: Full text search for PostgreSQL using BM25
 pg_stat_statements | 1.11    | public     | track planning and execution statistics of all SQL statements executed
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 vector             | 0.8.0   | public     | vector data type and ivfflat and hnsw access methods

Hi Konstantin. Could you please share a full reproduction step-by-step with every command executed that I can follow? We have integrated pg_analytics v0.3.1 within our testing framework for our deployment platform and are not facing this issue anymore.

@wasd171
Copy link

wasd171 commented Jan 31, 2025

Hi Konstantin. Could you please share a full reproduction step-by-step with every command executed that I can follow? We have integrated pg_analytics v0.3.1 within our testing framework for our deployment platform and are not facing this issue anymore.

Sure! Here is my Dockerfile:

# See https://cloudnative-pg.io/blog/creating-container-images/ for building the image
# See https://github.com/cloudnative-pg/postgres-containers/pkgs/container/postgresql for the available versions
ARG CNPG_VERSION="17.2-33-bookworm"
FROM "ghcr.io/cloudnative-pg/postgresql:${CNPG_VERSION}"

ARG POSTGRESQL_VERSION="17"
ARG TARGETARCH

# To install any package we need to be root
USER root

# We do this to avoid installing curl into the image
ARG DEBIAN_VERSION_CODENAME="bookworm"
RUN <<EOF
set -xe
VERSION_CODENAME=$(cat /etc/*release | grep VERSION_CODENAME | awk -F"=" '{print $2}')
if [ ! "${DEBIAN_VERSION_CODENAME}" = "${VERSION_CODENAME}" ]; then
  echo "This container uses ${VERSION_CODENAME}, not ${DEBIAN_VERSION_CODENAME}"
  exit 1
fi
EOF

ARG PG_SEARCH_VERSION="0.14.1"
ARG PG_ANALYTICS_VERSION="0.3.1"
ADD "https://github.com/paradedb/paradedb/releases/download/v${PG_SEARCH_VERSION}/postgresql-${POSTGRESQL_VERSION}-pg-search_${PG_SEARCH_VERSION}-1PARADEDB-${DEBIAN_VERSION_CODENAME}_${TARGETARCH}.deb" /tmp/
ADD "https://github.com/paradedb/pg_analytics/releases/download/v${PG_ANALYTICS_VERSION}/postgresql-${POSTGRESQL_VERSION}-pg-analytics_${PG_ANALYTICS_VERSION}-1PARADEDB-${DEBIAN_VERSION_CODENAME}_${TARGETARCH}.deb" /tmp/

# See https://docs.paradedb.com/deploy/third-party-extensions
# We install a subset: pg_search, pg_analytics, pgvector
RUN <<EOF
set -xe
apt-get update
apt-get install --yes --no-install-recommends \
	/tmp/postgresql-${POSTGRESQL_VERSION}-pg-search_${PG_SEARCH_VERSION}-1PARADEDB-${DEBIAN_VERSION_CODENAME}_${TARGETARCH}.deb \
	/tmp/postgresql-${POSTGRESQL_VERSION}-pg-analytics_${PG_ANALYTICS_VERSION}-1PARADEDB-${DEBIAN_VERSION_CODENAME}_${TARGETARCH}.deb \
	"postgresql-${POSTGRESQL_VERSION}-pgvector"
rm -rf /tmp/*
rm -rf /var/lib/apt/lists/*
EOF

# Change to the uid of postgres (26)
USER 26

Then I launch this container via CNPG operator and execute the following commands as a root:

CREATE DATABASE test TEMPLATE template0;
\c test;
CREATE EXTENSION pg_analytics;
CREATE FOREIGN DATA WRAPPER parquet_wrapper HANDLER parquet_fdw_handler VALIDATOR parquet_fdw_validator;
CREATE SERVER parquet_server FOREIGN DATA WRAPPER parquet_wrapper;
CREATE USER MAPPING FOR public
SERVER parquet_server OPTIONS (
  type 'S3',
  provider 'CREDENTIAL_CHAIN',
  CHAIN 'env',
  ENDPOINT 'eu-central-2'
);

SELECT * FROM parquet_describe('s3://###/#.parquet');

And this gives back the

ERROR:  Extension Autoloading Error: An error occurred while trying to automatically install the required extension 'httpfs':
Failed to create directory "/var/lib/postgresql/.duckdb/": Read-only file system

After running your suggestion everything seems to work

SELECT duckdb_execute($$SET extension_directory = '/var/lib/postgresql/data/pgdata/.duckdb'$$);
SELECT * FROM parquet_describe('s3://###/#.parquet');
   column_name   | column_type | null | key | default | extra
-----------------+-------------+------+-----+---------+-------
 title           | VARCHAR     | YES  |     |         |
 link            | VARCHAR     | YES  |     |         |
 activity_status | VARCHAR     | YES  |     |         |
 profile_id      | VARCHAR     | YES  |     |         |
 id              | BIGINT      | YES  |     |         |

@philippemnoel
Copy link
Collaborator

Hi Konstantin. Could you please share a full reproduction step-by-step with every command executed that I can follow? We have integrated pg_analytics v0.3.1 within our testing framework for our deployment platform and are not facing this issue anymore.

Sure! Here is my Dockerfile:

See https://cloudnative-pg.io/blog/creating-container-images/ for building the image

See https://github.com/cloudnative-pg/postgres-containers/pkgs/container/postgresql for the available versions

ARG CNPG_VERSION="17.2-33-bookworm"
FROM "ghcr.io/cloudnative-pg/postgresql:${CNPG_VERSION}"

ARG POSTGRESQL_VERSION="17"
ARG TARGETARCH

To install any package we need to be root

USER root

We do this to avoid installing curl into the image

ARG DEBIAN_VERSION_CODENAME="bookworm"
RUN <<EOF
set -xe
VERSION_CODENAME=$(cat /etc/*release | grep VERSION_CODENAME | awk -F"=" '{print $2}')
if [ ! "${DEBIAN_VERSION_CODENAME}" = "${VERSION_CODENAME}" ]; then
echo "This container uses ${VERSION_CODENAME}, not ${DEBIAN_VERSION_CODENAME}"
exit 1
fi
EOF

ARG PG_SEARCH_VERSION="0.14.1"
ARG PG_ANALYTICS_VERSION="0.3.1"
ADD "https://github.com/paradedb/paradedb/releases/download/v${PG_SEARCH_VERSION}/postgresql-${POSTGRESQL_VERSION}-pg-search_${PG_SEARCH_VERSION}-1PARADEDB-${DEBIAN_VERSION_CODENAME}_${TARGETARCH}.deb" /tmp/
ADD "https://github.com/paradedb/pg_analytics/releases/download/v${PG_ANALYTICS_VERSION}/postgresql-${POSTGRESQL_VERSION}-pg-analytics_${PG_ANALYTICS_VERSION}-1PARADEDB-${DEBIAN_VERSION_CODENAME}_${TARGETARCH}.deb" /tmp/

See https://docs.paradedb.com/deploy/third-party-extensions

We install a subset: pg_search, pg_analytics, pgvector

RUN <<EOF
set -xe
apt-get update
apt-get install --yes --no-install-recommends
/tmp/postgresql-${POSTGRESQL_VERSION}-pg-search_${PG_SEARCH_VERSION}-1PARADEDB-${DEBIAN_VERSION_CODENAME}${TARGETARCH}.deb
/tmp/postgresql-${POSTGRESQL_VERSION}-pg-analytics
${PG_ANALYTICS_VERSION}-1PARADEDB-${DEBIAN_VERSION_CODENAME}_${TARGETARCH}.deb
"postgresql-${POSTGRESQL_VERSION}-pgvector"
rm -rf /tmp/*
rm -rf /var/lib/apt/lists/*
EOF

Change to the uid of postgres (26)

USER 26
Then I launch this container via CNPG operator and execute the following commands as a root:

CREATE DATABASE test TEMPLATE template0;
\c test;
CREATE EXTENSION pg_analytics;
CREATE FOREIGN DATA WRAPPER parquet_wrapper HANDLER parquet_fdw_handler VALIDATOR parquet_fdw_validator;
CREATE SERVER parquet_server FOREIGN DATA WRAPPER parquet_wrapper;
CREATE USER MAPPING FOR public
SERVER parquet_server OPTIONS (
  type 'S3',
  provider 'CREDENTIAL_CHAIN',
  CHAIN 'env',
  ENDPOINT 'eu-central-2'
);

SELECT * FROM parquet_describe('s3://###/#.parquet');

And this gives back the

ERROR:  Extension Autoloading Error: An error occurred while trying to automatically install the required extension 'httpfs':
Failed to create directory "/var/lib/postgresql/.duckdb/": Read-only file system

After running your suggestion everything seems to work

SELECT duckdb_execute($$SET extension_directory = '/var/lib/postgresql/data/pgdata/.duckdb'$$);
SELECT * FROM parquet_describe('s3://###/#.parquet');
   column_name   | column_type | null | key | default | extra
-----------------+-------------+------+-----+---------+-------
 title           | VARCHAR     | YES  |     |         |
 link            | VARCHAR     | YES  |     |         |
 activity_status | VARCHAR     | YES  |     |         |
 profile_id      | VARCHAR     | YES  |     |         |
 id              | BIGINT      | YES  |     |         |

You did not create a FOREIGN TABLE, as per our tutorial: https://docs.paradedb.com/integrations/overview. Is there a specific reason behind that?

@wasd171
Copy link

wasd171 commented Jan 31, 2025

You did not create a FOREIGN TABLE, as per our tutorial: https://docs.paradedb.com/integrations/overview. Is there a specific reason behind that?

Yes, to provide the minimal example. Using CREATE FOREIGN TABLE does not seem to change much

CREATE FOREIGN TABLE trips ()
SERVER parquet_server
OPTIONS (files 's3://paradedb-benchmarks/yellow_tripdata_2024-01.parquet');
ERROR:  Extension Autoloading Error: An error occurred while trying to automatically install the required extension 'aws':
Failed to create directory "/var/lib/postgresql/.duckdb/": Read-only file system

Setting extension_directory manually seems to semi-fix it

SELECT duckdb_execute($$SET extension_directory = '/var/lib/postgresql/data/pgdata/.duckdb'$$);
CREATE FOREIGN TABLE trips ()
SERVER parquet_server
OPTIONS (files 's3://paradedb-benchmarks/yellow_tripdata_2024-01.parquet');
ERROR:  HTTP Error: HTTP GET error on 'https://paradedb-benchmarks.s3.amazonaws.com/yellow_tripdata_2024-01.parquet' (HTTP 400)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
3 participants