The test of reading the data from the bucket located in Minio container failed #1408

Closed
Anna050689 opened this issue May 29, 2023 · 4 comments

Anna050689 commented May 29, 2023

Environment

Delta-rs version:
0.6.3

Binding:

Environment:

  • Cloud provider: Minio

Bug

What happened:
The test failed:

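import json

import boto3
import pandas as pd
import pytest
from botocore.client import Config
from deltalake import DeltaTable
from docker import DockerClient
from pandas.testing import assert_frame_equal


@pytest.fixture(autouse=True, scope="session")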
def minio_server():
    docker_client = DockerClient.from_env()
    minio_container = docker_client.containers.run(
        "minio/minio",
        "server /data",
        detach=True,
        ports={"9000": 9000},
        environment=[
            f"MINIO_ACCESS_KEY={AWS_ACCESS_KEY_ID}",
            f"MINIO_SECRET_KEY={AWS_SECRET_ACCESS_KEY}",
            f"MINIO_REGION={MINIO_REGION}"
        ],
    )
    try:
        session = boto3.session.Session()
        s3_client = session.client(
            "s3",
            endpoint_url=ENDPOINT_URL,
            aws_access_key_id=AWS_ACCESS_KEY_ID,
            aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
            region_name=MINIO_REGION,
            config=Config(signature_version="s3v4", region_name=MINIO_REGION)
        )
        public_bucket_name = "public-bucket"
        s3_client.create_bucket(Bucket=public_bucket_name)


        public_bucket_policy = {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {"AWS": "*"},
                    "Action": ["s3:GetObject"],
                    "Resource": [f"arn:aws:s3:::{public_bucket_name}/*"],
                }
            ],
        }
        s3_client.put_bucket_policy(
            Bucket=public_bucket_name, Policy=json.dumps(public_bucket_policy)
        )
        path_to_fixtures = "./tests/unit/data_loaders/fixtures"
        upload_files_to_bucket(path_to_fixtures, public_bucket_name, s3_client)

        private_bucket_name = "private-bucket"
        s3_client.create_bucket(Bucket=private_bucket_name)
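        upload_files_to_bucket(path_to_fixtures, private_bucket_name, s3_client)

        # Hand control to the tests while the container keeps running
        yield minio_container
    finally:
        # Stop and remove the container once the test session ends
        minio_container.stop()
        minio_container.remove()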

def test_load_data_in_delta_format_located_in_public_bucket_in_aws_s3_with_valid_credentials(minio_server, rp_logger):
    rp_logger.info("Loading data in Delta format from the public bucket in AWS S3 with valid credentials")

    table = DeltaTable(
        "s3://public-bucket/delta_tables/table_with_data/",
        storage_options={
            "AWS_ACCESS_KEY_ID": AWS_ACCESS_KEY_ID,
            "AWS_SECRET_ACCESS_KEY": AWS_SECRET_ACCESS_KEY,
            "AWS_REGION": "eu-central-1",
            "AWS_ENDPOINT_URL": "http://localhost:9000"
            })
    df = table.to_pandas()

    assert assert_frame_equal(
        df, pd.DataFrame(
            {
                "gender": [0, 1, 0, 1],
                "height": [157.18518021548246, 166.7731072622863, 162.91821942384928, 173.51448996432848],
                "id": [925, 84, 821, 383]
            }
        )
    ) is None

The error:

deltalake.PyDeltaTableError: Failed to load checkpoint: Failed to read checkpoint content: Generic S3 error: Error performing get request delta_tables/table_with_data/_delta_log/_last_checkpoint: response error "request error", after 0 retries: builder error for URL (http://localhost:9000/public-bucket/delta_tables/table_with_data/_delta_log/_last_checkpoint): URL scheme is not allowed

What you expected to happen:
The test should pass.
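
The fixture above calls upload_files_to_bucket, which is not shown in the report. The following is only a hypothetical sketch of what such a helper might look like, assuming it walks the fixtures directory and uploads each file with boto3's upload_file, using the path relative to the fixtures directory as the object key. The key layout and the returned list of keys are assumptions made for illustration, not the reporter's actual implementation.

```python
import os


def upload_files_to_bucket(path_to_fixtures, bucket_name, s3_client):
    """Hypothetical helper: upload every file under path_to_fixtures.

    The object key is the file path relative to path_to_fixtures,
    written with forward slashes (an assumption about the layout).
    """
    uploaded = []
    for root, _dirs, files in os.walk(path_to_fixtures):
        for name in files:
            local_path = os.path.join(root, name)
            relative_path = os.path.relpath(local_path, path_to_fixtures)
            key = relative_path.replace(os.sep, "/")
            # boto3 S3 clients expose upload_file(Filename, Bucket, Key)
            s3_client.upload_file(local_path, bucket_name, key)
            uploaded.append(key)
    return sorted(uploaded)
```

With this layout, a fixture stored at ./tests/unit/data_loaders/fixtures/delta_tables/table_with_data/ would end up under the delta_tables/table_with_data/ prefix that the test reads from.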

Anna050689 added the bug label on May 29, 2023

roeap commented May 29, 2023

Hi @Anna050689, thanks for reporting.

It's not well documented, but you have to explicitly allow non-HTTPS URLs. Try passing "allow_http": "true" as an additional storage option; it should hopefully work.
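
A minimal sketch of that suggestion, assuming the option is passed through storage_options alongside the credentials from the report. The helper name minio_storage_options is hypothetical, and adding the flag only for http:// endpoints is a design choice made here, not something the library requires.

```python
from urllib.parse import urlparse


def minio_storage_options(access_key, secret_key, region, endpoint_url):
    """Build storage options for a table behind an S3-compatible endpoint.

    Hypothetical helper: the key names mirror the ones used in the report,
    plus the "allow_http" option suggested in this thread.
    """
    options = {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "AWS_REGION": region,
        "AWS_ENDPOINT_URL": endpoint_url,
    }
    # Plain-HTTP endpoints have to be allowed explicitly
    if urlparse(endpoint_url).scheme == "http":
        options["allow_http"] = "true"
    return options


# Usage in the failing test (requires deltalake and a running MinIO container):
#
#     table = DeltaTable(
#         "s3://public-bucket/delta_tables/table_with_data/",
#         storage_options=minio_storage_options(
#             AWS_ACCESS_KEY_ID,
#             AWS_SECRET_ACCESS_KEY,
#             "eu-central-1",
#             "http://localhost:9000",
#         ),
#     )
```

Keeping the flag conditional means the same helper can be reused against an HTTPS endpoint without loosening the scheme check there.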

roeap commented May 29, 2023

Also, I recommend using a more recent release if at all possible. We are regularly squashing bugs at every iteration :).

@Anna050689

Thank you very much for your help :) It works.

roeap commented May 31, 2023

Great, I'll close the issue then.

roeap closed this as completed on May 31, 2023