[Bug] [Connector v2] Iceberg sink to MinIO fails: org.apache.iceberg.aws.s3.S3FileIO does not implement FileIO #8585

Open
larryloi opened this issue Jan 23, 2025 · 2 comments

@larryloi

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

The Iceberg sink connector fails when writing to the spark-iceberg environment, with:

Cannot initialize FileIO, org.apache.iceberg.aws.s3.S3FileIO does not implement FileIO

The Spark/Iceberg environment comes from https://github.com/tabular-io/docker-spark-iceberg, with:

ENV SPARK_VERSION=3.5.2
ENV SPARK_MAJOR_VERSION=3.5
ENV ICEBERG_VERSION=1.6.0

The SeaTunnel cluster is built with the Docker Compose service below (master node shown):
image: apache/seatunnel:2.3.9
container_name: seatunnel-master
environment:
  - TZ=Asia/Macau
  - ST_DOCKER_MEMBER_LIST=172.19.0.11,172.19.0.12,172.19.0.13
entrypoint: >
  /bin/sh -c "
  /opt/seatunnel/bin/seatunnel-cluster.sh -r master -DJvmOption="-Xms2G -Xmx2G"
  "
ports:
  - "5701:5701"
volumes:
  - ./config:/opt/seatunnel/config
networks:
  integration:
    ipv4_address: 172.19.0.11

(The two worker nodes are configured the same way.)

SeaTunnel Version

Tried both the 2.3.8 and 2.3.9 Docker images.

SeaTunnel Config

env {
  job.name = "mssql_starrocks.inventory.INV.orders__ODS_startup_latest"
  parallelism = 1
  job.mode = "STREAMING"
  checkpoint.interval = 5000
}

source {
  SqlServer-CDC {
    base-url = "jdbc:sqlserver://db01:1433;databaseName=inventory"
    plugin_output = "orders_demo00"
    username = "seatunnel_src"
    password = ""
    startup.mode="latest"
    database-names = ["inventory"]
    table-names = ["inventory.INV.orders_demo00"]
    table-names-config = [
      {
        table = "inventory.INV.orders_demo00"
        primaryKeys = ["id"]
      }
    ]
  }
}


sink {
  Iceberg {
    catalog_name = "ods"
    iceberg.catalog.config = {
      "type" = "rest"
      "uri" = "http://iceberg-rest:8181"
      "warehouse" = "s3://warehouse/"
      "io-impl" = "org.apache.iceberg.aws.s3.S3FileIO"
      "s3.endpoint" = "http://minio:9000"
      "s3.access-key-id" = "admin"
      "s3.secret-access-key" = "password"
      "s3.path-style-access" = true
    }
    namespace = "ods_namespace"
    table = "iceberg_sink_orders_demo00"
    iceberg.table.write-props = {
      write.format.default = "parquet"
      write.target-file-size-bytes = 536870912
    }
    iceberg.table.primary-keys = ["id"]
    iceberg.table.partition-keys = ["f_datetime"]
    iceberg.table.upsert-mode-enabled = true
    iceberg.table.schema-evolution-enabled = true
    case_sensitive = true
  }
}

Running Command

./bin/seatunnel.sh --config ./JOBS/mssql_iceberg/inventory.INV.orders__ODS_startup_latest.yaml

Error Exception

The job fails with the error below. I have already tried different versions of iceberg-aws-bundle-<version>.jar and iceberg-spark-runtime-<SPARK_MAJOR_VERSION>_2.12-<ICEBERG_VERSION>.jar, but the problem stays the same. Does org.apache.iceberg.aws.s3.S3FileIO really not implement org.apache.iceberg.io.FileIO?

Caused by: java.lang.IllegalArgumentException: Cannot initialize FileIO, org.apache.iceberg.aws.s3.S3FileIO does not implement FileIO.
        at org.apache.iceberg.CatalogUtil.loadFileIO(CatalogUtil.java:320)
        at org.apache.iceberg.rest.RESTSessionCatalog.newFileIO(RESTSessionCatalog.java:827)
        at org.apache.iceberg.rest.RESTSessionCatalog.initialize(RESTSessionCatalog.java:204)
        at org.apache.iceberg.rest.RESTCatalog.initialize(RESTCatalog.java:72)
        at org.apache.iceberg.CatalogUtil.loadCatalog(CatalogUtil.java:239)
        at org.apache.iceberg.CatalogUtil.buildIcebergCatalog(CatalogUtil.java:284)
        at org.apache.seatunnel.connectors.seatunnel.iceberg.IcebergCatalogLoader.loadCatalog(IcebergCatalogLoader.java:61)
        at org.apache.seatunnel.connectors.seatunnel.iceberg.catalog.IcebergCatalog.open(IcebergCatalog.java:91)
        at org.apache.seatunnel.api.sink.DefaultSaveModeHandler.open(DefaultSaveModeHandler.java:78)
        at org.apache.seatunnel.engine.server.master.JobMaster.handleSaveMode(JobMaster.java:523)
        ... 21 more
Caused by: java.lang.ClassCastException: org.apache.iceberg.aws.s3.S3FileIO cannot be cast to org.apache.iceberg.io.FileIO
        at org.apache.iceberg.CatalogUtil.loadFileIO(CatalogUtil.java:317)
        ... 30 more
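S3FileIO does implement org.apache.iceberg.io.FileIO, so a ClassCastException between the two usually means the FileIO that S3FileIO was linked against is a different Class object from the FileIO that CatalogUtil casts to, i.e. two copies of the Iceberg classes are visible through different classloaders. A small diagnostic like the following sketch can print which loader and which jar each class came from. The class and method names (`FileIoClassLoaderCheck`, `check`, `findSupertypeByName`) are hypothetical, and using the thread context classloader in `main` is only an assumption; pass whichever loader the failing code actually uses.

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class FileIoClassLoaderCheck {

    /** Walks superclasses and interfaces breadth-first; returns the supertype with this name, or null. */
    public static Class<?> findSupertypeByName(Class<?> type, String name) {
        Deque<Class<?>> queue = new ArrayDeque<>();
        Set<Class<?>> seen = new HashSet<>();
        queue.add(type);
        while (!queue.isEmpty()) {
            Class<?> current = queue.poll();
            if (!seen.add(current)) continue;
            if (current.getName().equals(name)) return current;
            if (current.getSuperclass() != null) queue.add(current.getSuperclass());
            for (Class<?> iface : current.getInterfaces()) queue.add(iface);
        }
        return null;
    }

    /** Describes which loader defined a class and, if known, the jar or directory it came from. */
    public static String origin(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        URL location = src == null ? null : src.getLocation();
        return c.getName() + " loaded by " + c.getClassLoader() + " from " + location;
    }

    /** Reports whether implName is assignable to ifaceName when both are resolved through `loader`. */
    public static String check(String ifaceName, String implName, ClassLoader loader) {
        Class<?> iface;
        Class<?> impl;
        try {
            iface = Class.forName(ifaceName, false, loader);
            impl = Class.forName(implName, false, loader);
        } catch (ClassNotFoundException e) {
            return "not on classpath: " + e.getMessage();
        }
        StringBuilder sb = new StringBuilder();
        sb.append("assignable=").append(iface.isAssignableFrom(impl)).append('\n');
        sb.append("  ").append(origin(iface)).append('\n');
        sb.append("  ").append(origin(impl)).append('\n');
        // The interface copy the implementation was actually linked against, if it differs.
        Class<?> seenByImpl = findSupertypeByName(impl, ifaceName);
        if (seenByImpl != null && seenByImpl != iface) {
            sb.append("  implementation was linked against ").append(origin(seenByImpl)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the context loader is the one the connector uses; replace as needed.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        System.out.print(check("org.apache.iceberg.io.FileIO",
                "org.apache.iceberg.aws.s3.S3FileIO", loader));
    }
}
```

If the output shows `assignable=false` together with an "implementation was linked against" line, the two FileIO copies and their jar locations point at where the duplicate comes from.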

Zeta or Flink or Spark Version

ENV SPARK_VERSION=3.5.2
ENV SPARK_MAJOR_VERSION=3.5
ENV ICEBERG_VERSION=1.6.0

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@larryloi larryloi added the bug label Jan 23, 2025
@litiliu
Contributor

litiliu commented Jan 24, 2025

I think it's related to the classloader or classpath: it seems org.apache.iceberg.aws.s3.S3FileIO and org.apache.iceberg.io.FileIO were loaded by different classloaders. Have you tried the Flink or Spark engine?
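To illustrate the classloader explanation, here is a self-contained sketch (not SeaTunnel's actual plugin loading). The hypothetical `Greeter` stands in for `FileIO` and `Hello` for `S3FileIO`; a hypothetical `IsolatingLoader` defines its own copy of both from the same `.class` bytes. The foreign `Hello` has the same class name, but casting it to the application's `Greeter` fails with a ClassCastException of the same kind as in the report.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Set;

public class ClassLoaderIsolationDemo {

    // Hypothetical stand-ins: Greeter plays the role of FileIO, Hello the role of S3FileIO.
    public interface Greeter { String greet(); }

    public static class Hello implements Greeter {
        public String greet() { return "hi"; }
    }

    /** Child-first loader: defines its own copy of the listed classes from the parent's .class bytes. */
    static final class IsolatingLoader extends ClassLoader {
        private final Set<String> isolated;

        IsolatingLoader(ClassLoader parent, Set<String> isolated) {
            super(parent);
            this.isolated = isolated;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!isolated.contains(name)) {
                return super.loadClass(name, resolve); // normal parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    String path = name.replace('.', '/') + ".class";
                    try (InputStream in = getParent().getResourceAsStream(path)) {
                        if (in == null) throw new ClassNotFoundException(name);
                        byte[] bytes = in.readAllBytes();
                        c = defineClass(name, bytes, 0, bytes.length); // second, distinct copy
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                if (resolve) resolveClass(c);
                return c;
            }
        }
    }

    /** Returns a Hello defined by a separate loader, so its Greeter is a different Class object. */
    public static Object foreignHello() throws Exception {
        ClassLoader app = ClassLoaderIsolationDemo.class.getClassLoader();
        IsolatingLoader loader = new IsolatingLoader(app,
                Set.of(Greeter.class.getName(), Hello.class.getName()));
        Class<?> c = loader.loadClass(Hello.class.getName());
        return c.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Object foreign = foreignHello();
        System.out.println("same name:  " + foreign.getClass().getName().equals(Hello.class.getName()));
        System.out.println("same class: " + (foreign.getClass() == Hello.class));
        System.out.println("instanceof Greeter: " + (foreign instanceof Greeter));
        try {
            Greeter g = (Greeter) foreign; // analogous to the failing cast in CatalogUtil.loadFileIO
            System.out.println(g.greet());
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

Only the two listed classes are loaded child-first; everything else, including `java.lang.Object`, still goes to the parent, which is the minimal setup needed to get two distinct copies of an interface. By analogy, the usual remedy is to make sure only one copy of the Iceberg classes is visible to the connector; which jars are duplicated in this particular setup is an assumption, not something established in this thread.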


@larryloi
Author

Does this mean I cannot use the "rest" catalog type to connect to iceberg-rest and the MinIO endpoint directly, and need to use the Spark engine to access MinIO instead?
