[#4514] improvement(iceberg): shrink catalog-lakehouse-iceberg libs size from 184MB to 56MB, IcebergRESTServer size from 162M to 70M #4548

@LiuQhahah LiuQhahah commented Aug 15, 2024

What changes were proposed in this pull request?

Remove unnecessary dependencies from the Iceberg REST server and the Iceberg catalog.

Why are the changes needed?

Fix: #4514 #4696

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI passed

@jerryshao jerryshao requested a review from FANNG1 August 16, 2024 01:24
@jerryshao

@FANNG1 can you please help to review this?

FANNG1 commented Aug 29, 2024

@LiuQhahah thanks for the PR. I compiled the package with ./gradlew clean compileDistribution -x test -x rat -x web -x lintOpenAPI and found some extra jars in distribution/package/catalogs/lakehouse-iceberg/libs/. Could you try to exclude them?

hadoop-yarn-common-2.10.2.jar
hadoop-yarn-api-2.10.2.jar
curator-client-2.13.0.jar
derby-10.10.2.0.jar
leveldbjni-all-1.8.jar
rocksdbjni-7.10.2.jar
sqlite-jdbc-3.42.0.0.jar
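
One way to confirm whether these jars are still packaged after a rebuild is a small shell check. This is only a sketch and not part of the PR: check_extra_jars is a hypothetical helper name, and the prefixes are the jar names listed above with their versions stripped.

```shell
# Hypothetical helper (not from the PR): print any of the unwanted jars
# that are still present in the given libs directory.
check_extra_jars() {
  libs_dir="$1"
  for prefix in hadoop-yarn-common hadoop-yarn-api curator-client \
                derby leveldbjni-all rocksdbjni sqlite-jdbc; do
    # Match "<prefix>-<version>.jar", where the version starts with a digit.
    for jar in "$libs_dir"/"$prefix"-[0-9]*.jar; do
      if [ -e "$jar" ]; then
        basename "$jar"
      fi
    done
  done
}
```

Running check_extra_jars distribution/package/catalogs/lakehouse-iceberg/libs after compileDistribution should print nothing once all of the listed jars are excluded.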

implementation(project(":core")) {
exclude("*")
}
implementation(project(":iceberg:iceberg-common")) {

Some jars in :iceberg:iceberg-common are critical, so we can't exclude all of its dependencies. Instead, we could exclude the extra jars in the build file of :iceberg:iceberg-common.

@jerryshao

@FANNG1 @LiuQhahah can we please move fast on this? This is an important PR. @FANNG1, can you please help @LiuQhahah get it working soon?

FANNG1 commented Sep 4, 2024

@LiuQhahah is there anything blocking you?

@LiuQhahah

I have updated the code. Please check whether there are any issues with it.

FANNG1 commented Sep 5, 2024

You exclude all dependencies from :iceberg:iceberg-common, which are necessary for catalog-iceberg; the better way is to exclude specific dependencies explicitly.

  implementation(project(":iceberg:iceberg-common")) {
      exclude("*")
  }

@LiuQhahah

Hi @FANNG1

I tried to update the file.
However, judging from the result, I can't exclude rocksdbjni-7.10.2.jar.
Can you help with some ideas?

thanks

$ ls distribution/package/catalogs/lakehouse-iceberg/libs  | grep rock
rocksdbjni-7.10.2.jar
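
When a jar like this survives an exclude, Gradle's built-in dependencyInsight task can show which dependency path pulls it in. The wrapper below is a sketch under assumptions this thread does not confirm: the Gradle project path :catalogs:catalog-lakehouse-iceberg is a guess, and so is the idea that the packaged libs come from the runtimeClasspath configuration. The name why_is_jar_packaged is hypothetical.

```shell
# Hypothetical wrapper around Gradle's built-in dependencyInsight task.
# Run it from the repository root, where ./gradlew lives.
why_is_jar_packaged() {
  project_path="$1"   # assumed, e.g. :catalogs:catalog-lakehouse-iceberg
  dependency="$2"     # e.g. rocksdbjni
  # Report every path through which the dependency reaches the configuration.
  ./gradlew "${project_path}:dependencyInsight" \
    --configuration runtimeClasspath \
    --dependency "$dependency"
}
```

For example, why_is_jar_packaged :catalogs:catalog-lakehouse-iceberg rocksdbjni would report the modules that depend on rocksdbjni, which suggests where an explicit exclude needs to go.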

FANNG1 commented Sep 5, 2024

Could I continue excluding some jars based on your PR?

@LiuQhahah

[#4778] feat(iceberg) remove server-common depencences from Iceberg catalog #4863

No Problem.
thank you

@FANNG1 FANNG1 changed the title [#4514] improvement(catalog-lakehouse-iceberg): reduce catalog-lakehouse-iceberg libs size from 184MB to 38MB [#4514] improvement(catalog-lakehouse-iceberg): shrink catalog-lakehouse-iceberg libs size from 184MB to 57MB, IcebergRESTServer size from 162M to 71M Sep 9, 2024
@FANNG1 FANNG1 force-pushed the #4514-Shrink-the-Iceberg-catalog-binary-package-size branch from f2c401e to fb6ebe7 Compare September 9, 2024 11:23
@FANNG1 FANNG1 marked this pull request as draft September 9, 2024 13:49

FANNG1 commented Sep 10, 2024

du -sh distribution/package/catalogs/lakehouse-iceberg/libs/* | sort -h | tail -n 15
836K	distribution/package/catalogs/lakehouse-iceberg/libs/httpcore5-5.2.4.jar
844K	distribution/package/catalogs/lakehouse-iceberg/libs/httpclient5-5.3.1.jar
892K	distribution/package/catalogs/lakehouse-iceberg/libs/caffeine-2.9.3.jar
896K	distribution/package/catalogs/lakehouse-iceberg/libs/hive-serde-2.3.9.jar
1.0M	distribution/package/catalogs/lakehouse-iceberg/libs/commons-compress-1.22.jar
1.4M	distribution/package/catalogs/lakehouse-iceberg/libs/htrace-core4-4.1.0-incubating.jar
1.5M	distribution/package/catalogs/lakehouse-iceberg/libs/commons-math3-3.1.1.jar
1.5M	distribution/package/catalogs/lakehouse-iceberg/libs/jackson-databind-2.14.2.jar
1.6M	distribution/package/catalogs/lakehouse-iceberg/libs/hadoop-mapreduce-client-core-2.10.2.jar
1.7M	distribution/package/catalogs/lakehouse-iceberg/libs/iceberg-bundled-guava-1.5.2.jar
1.8M	distribution/package/catalogs/lakehouse-iceberg/libs/iceberg-core-1.5.2.jar
3.8M	distribution/package/catalogs/lakehouse-iceberg/libs/hadoop-common-2.10.2.jar
4.1M	distribution/package/catalogs/lakehouse-iceberg/libs/hadoop-hdfs-client-2.10.2.jar
5.0M	distribution/package/catalogs/lakehouse-iceberg/libs/hadoop-hdfs-2.10.2.jar
7.8M	distribution/package/catalogs/lakehouse-iceberg/libs/hive-metastore-2.3.9.jar

FANNG1 commented Sep 10, 2024

du -sh distribution/package/iceberg-rest-server/libs/* | sort -h | tail -n 15
1.0M	distribution/package/iceberg-rest-server/libs/commons-compress-1.22.jar
1.2M	distribution/package/iceberg-rest-server/libs/jersey-common-2.41.jar
1.4M	distribution/package/iceberg-rest-server/libs/htrace-core4-4.1.0-incubating.jar
1.5M	distribution/package/iceberg-rest-server/libs/commons-math3-3.1.1.jar
1.5M	distribution/package/iceberg-rest-server/libs/jackson-databind-2.15.2.jar
1.6M	distribution/package/iceberg-rest-server/libs/hadoop-mapreduce-client-core-2.10.2.jar
1.7M	distribution/package/iceberg-rest-server/libs/iceberg-bundled-guava-1.5.2.jar
1.8M	distribution/package/iceberg-rest-server/libs/iceberg-core-1.5.2.jar
1.8M	distribution/package/iceberg-rest-server/libs/log4j-core-2.22.0.jar
1.8M	distribution/package/iceberg-rest-server/libs/protobuf-java-3.24.4.jar
2.9M	distribution/package/iceberg-rest-server/libs/guava-32.1.3-jre.jar
3.8M	distribution/package/iceberg-rest-server/libs/hadoop-common-2.10.2.jar
4.1M	distribution/package/iceberg-rest-server/libs/hadoop-hdfs-client-2.10.2.jar
5.0M	distribution/package/iceberg-rest-server/libs/hadoop-hdfs-2.10.2.jar
7.8M	distribution/package/iceberg-rest-server/libs/hive-metastore-2.3.9.jar
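
The totals quoted in the PR title can be re-checked by summing the jar sizes in each libs directory. This is a rough sketch rather than the method used in this PR: it adds up file sizes in bytes with wc -c, so the result can differ slightly from du, which reports disk usage in blocks. The name total_jar_bytes is hypothetical.

```shell
# Hypothetical helper (not from the PR): sum the sizes, in bytes, of all
# jars directly inside the given directory.
total_jar_bytes() {
  libs_dir="$1"
  total=0
  for jar in "$libs_dir"/*.jar; do
    # Skip the unexpanded pattern when the directory has no jars.
    [ -e "$jar" ] || continue
    size=$(wc -c < "$jar")
    total=$((total + size))
  done
  echo "$total"
}
```

Dividing the result by 1024 twice gives a figure in MiB that can be compared with the sizes in the title.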

@FANNG1 FANNG1 changed the title [#4514] improvement(catalog-lakehouse-iceberg): shrink catalog-lakehouse-iceberg libs size from 184MB to 57MB, IcebergRESTServer size from 162M to 71M [#4514] improvement(catalog-lakehouse-iceberg): shrink catalog-lakehouse-iceberg libs size from 184MB to 56MB, IcebergRESTServer size from 162M to 70M Sep 10, 2024
@yuqi1129

1.6M distribution/package/iceberg-rest-server/libs/hadoop-mapreduce-client-core-2.10.2.jar

The following jars can also be excluded:

  1. 2.9M distribution/package/iceberg-rest-server/libs/guava-32.1.3-jre.jar
  2. 1.8M distribution/package/iceberg-rest-server/libs/log4j-core-2.22.0.jar
  3. 1.6M distribution/package/iceberg-rest-server/libs/hadoop-mapreduce-client-core-2.10.2.jar

FANNG1 commented Sep 10, 2024

  1. 2.9M distribution/package/iceberg-rest-server/libs/guava-32.1.3-jre.jar
  2. 1.8M distribution/package/iceberg-rest-server/libs/log4j-core-2.22.0.jar

The Iceberg REST server can run in standalone mode, where it can't reuse the jars from the Gravitino server, so these two jars seem necessary.

  1. 1.6M distribution/package/iceberg-rest-server/libs/hadoop-mapreduce-client-core-2.10.2.jar

There is a runtime dependency on the MapReduce client:

Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
    	at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:4051)
    	at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:4019)
    	at org.apache.iceberg.hive.HiveClientPool.<init>(HiveClientPool.java:55)
    	at org.apache.iceberg.hive.CachedClientPool.lambda$clientPool$0(CachedClientPool.java:96)
    	at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2406)
    	at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916)
    	at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2404)
    	at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2387)
    	at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
    	at com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:62)
    	at org.apache.iceberg.hive.CachedClientPool.clientPool(CachedClientPool.java:96)
    	at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122)
    	at org.apache.iceberg.hive.HiveCatalog.loadNamespaceMetadata(HiveCatalog.java:443)
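
To see which packaged jar supplies a class named in a NoClassDefFoundError like the one above, the jars can be scanned for the class file. This is a sketch that assumes unzip is installed; find_class_jar is a hypothetical helper, not something from this PR.

```shell
# Hypothetical helper (not from the PR): print the jars in a directory that
# contain the class file for a fully qualified class name.
find_class_jar() {
  libs_dir="$1"
  # org.apache.hadoop.mapred.JobConf -> org/apache/hadoop/mapred/JobConf.class
  entry="$(printf '%s' "$2" | tr '.' '/').class"
  for jar in "$libs_dir"/*.jar; do
    # unzip -l lists the archive entries; -F makes grep match the path literally.
    if unzip -l "$jar" 2>/dev/null | grep -qF "$entry"; then
      basename "$jar"
    fi
  done
}
```

For example, find_class_jar distribution/package/iceberg-rest-server/libs org.apache.hadoop.mapred.JobConf lists the jars that would need to stay on the classpath for HiveConf to initialize.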

@FANNG1 FANNG1 marked this pull request as ready for review September 10, 2024 02:33

FANNG1 commented Sep 10, 2024

@jerryshao @LiuQhahah @yuqi1129 @caican00 please help to review again

config.put("catalog.jdbc_backend.warehouse", "/tmp/usr/jdbc/warehouse");
config.put("catalog.jdbc_backend.jdbc.password", "gravitino");
config.put("catalog.jdbc_backend.jdbc.user", "gravitino");
config.put("catalog.jdbc_backend.jdbc-driver", "org.h2.Driver");
config.put("catalog.jdbc_backend.jdbc-driver", "org.sqlite.JDBC");

Why do we use Sqlite instead of H2?

No special reason; it has a SQLite dependency.

@jerryshao

Can you please list all the jars for the Iceberg catalog and Iceberg rest service after you excluded jars?

Besides, I think #4696 is also addressed here in this PR, can you please update the PR title and description?

exclude(group = "org.junit.jupiter")
}
testImplementation(libs.bundles.jersey)
testImplementation(libs.bundles.jetty)

Why do we need jetty and jersey here?

removed

exclude("org.mortbay.jetty")
}
// use hdfs-default.xml
implementation(libs.hadoop2.hdfs) {

Do we still need this, is hdfs-client enough?

I failed to exclude the hadoop-hdfs jar because we depend on hdfs-default.xml to initialize the related environment.

FANNG1 commented Sep 11, 2024

Can you please list all the jars for the Iceberg catalog and Iceberg rest service after you excluded jars?

Besides, I think #4696 is also addressed here in this PR, can you please update the PR title and description?

du -sh distribution/package/catalogs/lakehouse-iceberg/libs/* | sort -h
4.0K	distribution/package/catalogs/lakehouse-iceberg/libs/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
8.0K	distribution/package/catalogs/lakehouse-iceberg/libs/failureaccess-1.0.1.jar
8.0K	distribution/package/catalogs/lakehouse-iceberg/libs/jcip-annotations-1.0-1.jar
 12K	distribution/package/catalogs/lakehouse-iceberg/libs/gravitino-catalog-common-0.7.0-incubating-SNAPSHOT.jar
 12K	distribution/package/catalogs/lakehouse-iceberg/libs/hive-shims-2.3.9.jar
 16K	distribution/package/catalogs/lakehouse-iceberg/libs/hive-shims-scheduler-2.3.9.jar
 16K	distribution/package/catalogs/lakehouse-iceberg/libs/jta-1.1.jar
 16K	distribution/package/catalogs/lakehouse-iceberg/libs/xmlenc-0.52.jar
 20K	distribution/package/catalogs/lakehouse-iceberg/libs/api-asn1-api-1.0.0-M20.jar
 20K	distribution/package/catalogs/lakehouse-iceberg/libs/error_prone_annotations-2.21.1.jar
 20K	distribution/package/catalogs/lakehouse-iceberg/libs/java-xmlbuilder-0.4.jar
 20K	distribution/package/catalogs/lakehouse-iceberg/libs/jsr305-3.0.2.jar
 20K	distribution/package/catalogs/lakehouse-iceberg/libs/kerb-identity-2.0.3.jar
 20K	distribution/package/catalogs/lakehouse-iceberg/libs/opencsv-2.3.jar
 20K	distribution/package/catalogs/lakehouse-iceberg/libs/token-provider-2.0.3.jar
 24K	distribution/package/catalogs/lakehouse-iceberg/libs/kerb-simplekdc-2.0.3.jar
 28K	distribution/package/catalogs/lakehouse-iceberg/libs/iceberg-aliyun-1.5.2.jar
 32K	distribution/package/catalogs/lakehouse-iceberg/libs/kerby-config-2.0.3.jar
 32K	distribution/package/catalogs/lakehouse-iceberg/libs/kerby-xdr-2.0.3.jar
 36K	distribution/package/catalogs/lakehouse-iceberg/libs/iceberg-common-1.5.2.jar
 36K	distribution/package/catalogs/lakehouse-iceberg/libs/kerb-util-2.0.3.jar
 40K	distribution/package/catalogs/lakehouse-iceberg/libs/gravitino-iceberg-common-0.7.0-incubating-SNAPSHOT.jar
 40K	distribution/package/catalogs/lakehouse-iceberg/libs/kerby-util-2.0.3.jar
 44K	distribution/package/catalogs/lakehouse-iceberg/libs/apacheds-i18n-2.0.0-M15.jar
 44K	distribution/package/catalogs/lakehouse-iceberg/libs/asm-3.1.jar
 44K	distribution/package/catalogs/lakehouse-iceberg/libs/commons-cli-1.2.jar
 48K	distribution/package/catalogs/lakehouse-iceberg/libs/hadoop-annotations-2.10.2.jar
 48K	distribution/package/catalogs/lakehouse-iceberg/libs/wildfly-client-config-1.0.1.Final.jar
 56K	distribution/package/catalogs/lakehouse-iceberg/libs/hive-shims-0.23-2.3.9.jar
 64K	distribution/package/catalogs/lakehouse-iceberg/libs/commons-logging-1.2.jar
 64K	distribution/package/catalogs/lakehouse-iceberg/libs/gravitino-catalog-lakehouse-iceberg-0.7.0-incubating-SNAPSHOT.jar
 68K	distribution/package/catalogs/lakehouse-iceberg/libs/jboss-logging-3.3.1.Final.jar
 68K	distribution/package/catalogs/lakehouse-iceberg/libs/okio-1.6.0.jar
 72K	distribution/package/catalogs/lakehouse-iceberg/libs/kerb-common-2.0.3.jar
 76K	distribution/package/catalogs/lakehouse-iceberg/libs/iceberg-hive-metastore-1.5.2.jar
 76K	distribution/package/catalogs/lakehouse-iceberg/libs/jackson-annotations-2.14.2.jar
 80K	distribution/package/catalogs/lakehouse-iceberg/libs/api-util-1.0.0-M20.jar
 84K	distribution/package/catalogs/lakehouse-iceberg/libs/kerb-server-2.0.3.jar
 96K	distribution/package/catalogs/lakehouse-iceberg/libs/commons-pool-1.5.4.jar
100K	distribution/package/catalogs/lakehouse-iceberg/libs/jsp-api-2.1.jar
100K	distribution/package/catalogs/lakehouse-iceberg/libs/kerb-admin-2.0.3.jar
100K	distribution/package/catalogs/lakehouse-iceberg/libs/kerby-asn1-2.0.3.jar
112K	distribution/package/catalogs/lakehouse-iceberg/libs/bonecp-0.8.0.RELEASE.jar
116K	distribution/package/catalogs/lakehouse-iceberg/libs/kerb-client-2.0.3.jar
116K	distribution/package/catalogs/lakehouse-iceberg/libs/kerb-crypto-2.0.3.jar
120K	distribution/package/catalogs/lakehouse-iceberg/libs/hive-shims-common-2.3.9.jar
124K	distribution/package/catalogs/lakehouse-iceberg/libs/hadoop-auth-2.10.2.jar
144K	distribution/package/catalogs/lakehouse-iceberg/libs/commons-digester-1.8.jar
160K	distribution/package/catalogs/lakehouse-iceberg/libs/commons-dbcp-1.4.jar
164K	distribution/package/catalogs/lakehouse-iceberg/libs/antlr-runtime-3.5.2.jar
164K	distribution/package/catalogs/lakehouse-iceberg/libs/jboss-threads-2.3.6.Final.jar
180K	distribution/package/catalogs/lakehouse-iceberg/libs/hive-storage-api-2.4.0.jar
188K	distribution/package/catalogs/lakehouse-iceberg/libs/gson-2.2.4.jar
192K	distribution/package/catalogs/lakehouse-iceberg/libs/iceberg-aws-1.5.2.jar
192K	distribution/package/catalogs/lakehouse-iceberg/libs/stax2-api-4.2.1.jar
196K	distribution/package/catalogs/lakehouse-iceberg/libs/kerby-pkix-2.0.3.jar
200K	distribution/package/catalogs/lakehouse-iceberg/libs/jdo-api-3.0.1.jar
220K	distribution/package/catalogs/lakehouse-iceberg/libs/checker-qual-3.37.0.jar
220K	distribution/package/catalogs/lakehouse-iceberg/libs/kerb-core-2.0.3.jar
224K	distribution/package/catalogs/lakehouse-iceberg/libs/gravitino-api-0.7.0-incubating-SNAPSHOT.jar
228K	distribution/package/catalogs/lakehouse-iceberg/libs/jackson-core-asl-1.9.13.jar
232K	distribution/package/catalogs/lakehouse-iceberg/libs/httpcore5-h2-5.2.4.jar
232K	distribution/package/catalogs/lakehouse-iceberg/libs/libthrift-0.9.3.jar
244K	distribution/package/catalogs/lakehouse-iceberg/libs/commons-beanutils-1.9.4.jar
252K	distribution/package/catalogs/lakehouse-iceberg/libs/aircompressor-0.26.jar
268K	distribution/package/catalogs/lakehouse-iceberg/libs/commons-net-3.1.jar
276K	distribution/package/catalogs/lakehouse-iceberg/libs/cglib-2.2.jar
276K	distribution/package/catalogs/lakehouse-iceberg/libs/jsch-0.1.55.jar
280K	distribution/package/catalogs/lakehouse-iceberg/libs/commons-lang-2.6.jar
280K	distribution/package/catalogs/lakehouse-iceberg/libs/wildfly-common-1.5.4.Final.jar
292K	distribution/package/catalogs/lakehouse-iceberg/libs/commons-configuration-1.6.jar
308K	distribution/package/catalogs/lakehouse-iceberg/libs/libfb303-0.9.3.jar
324K	distribution/package/catalogs/lakehouse-iceberg/libs/httpcore-4.4.13.jar
324K	distribution/package/catalogs/lakehouse-iceberg/libs/okhttp-2.7.5.jar
328K	distribution/package/catalogs/lakehouse-iceberg/libs/commons-codec-1.11.jar
328K	distribution/package/catalogs/lakehouse-iceberg/libs/reload4j-1.2.19.jar
332K	distribution/package/catalogs/lakehouse-iceberg/libs/gravitino-common-0.7.0-incubating-SNAPSHOT.jar
388K	distribution/package/catalogs/lakehouse-iceberg/libs/javolution-5.5.1.jar
428K	distribution/package/catalogs/lakehouse-iceberg/libs/hive-common-2.3.9.jar
440K	distribution/package/catalogs/lakehouse-iceberg/libs/RoaringBitmap-1.0.1.jar
452K	distribution/package/catalogs/lakehouse-iceberg/libs/jackson-core-2.14.2.jar
492K	distribution/package/catalogs/lakehouse-iceberg/libs/commons-io-2.15.0.jar
512K	distribution/package/catalogs/lakehouse-iceberg/libs/woodstox-core-5.3.0.jar
524K	distribution/package/catalogs/lakehouse-iceberg/libs/protobuf-java-2.5.0.jar
528K	distribution/package/catalogs/lakehouse-iceberg/libs/jets3t-0.9.0.jar
568K	distribution/package/catalogs/lakehouse-iceberg/libs/iceberg-api-1.5.2.jar
572K	distribution/package/catalogs/lakehouse-iceberg/libs/xnio-api-3.8.8.Final.jar
576K	distribution/package/catalogs/lakehouse-iceberg/libs/commons-collections-3.2.2.jar
608K	distribution/package/catalogs/lakehouse-iceberg/libs/joda-time-2.8.1.jar
632K	distribution/package/catalogs/lakehouse-iceberg/libs/avro-1.11.3.jar
644K	distribution/package/catalogs/lakehouse-iceberg/libs/commons-lang3-3.14.0.jar
676K	distribution/package/catalogs/lakehouse-iceberg/libs/apacheds-kerberos-codec-2.0.0-M15.jar
736K	distribution/package/catalogs/lakehouse-iceberg/libs/commons-collections4-4.4.jar
744K	distribution/package/catalogs/lakehouse-iceberg/libs/gravitino-core-0.7.0-incubating-SNAPSHOT.jar
744K	distribution/package/catalogs/lakehouse-iceberg/libs/nimbus-jose-jwt-9.30.1.jar
764K	distribution/package/catalogs/lakehouse-iceberg/libs/httpclient-4.5.13.jar
764K	distribution/package/catalogs/lakehouse-iceberg/libs/jackson-mapper-asl-1.9.13.jar
836K	distribution/package/catalogs/lakehouse-iceberg/libs/httpcore5-5.2.4.jar
844K	distribution/package/catalogs/lakehouse-iceberg/libs/httpclient5-5.3.1.jar
892K	distribution/package/catalogs/lakehouse-iceberg/libs/caffeine-2.9.3.jar
896K	distribution/package/catalogs/lakehouse-iceberg/libs/hive-serde-2.3.9.jar
1.0M	distribution/package/catalogs/lakehouse-iceberg/libs/commons-compress-1.22.jar
1.4M	distribution/package/catalogs/lakehouse-iceberg/libs/htrace-core4-4.1.0-incubating.jar
1.5M	distribution/package/catalogs/lakehouse-iceberg/libs/commons-math3-3.1.1.jar
1.5M	distribution/package/catalogs/lakehouse-iceberg/libs/jackson-databind-2.14.2.jar
1.7M	distribution/package/catalogs/lakehouse-iceberg/libs/iceberg-bundled-guava-1.5.2.jar
1.8M	distribution/package/catalogs/lakehouse-iceberg/libs/iceberg-core-1.5.2.jar
3.8M	distribution/package/catalogs/lakehouse-iceberg/libs/hadoop-common-2.10.2.jar
4.1M	distribution/package/catalogs/lakehouse-iceberg/libs/hadoop-hdfs-client-2.10.2.jar
5.0M	distribution/package/catalogs/lakehouse-iceberg/libs/hadoop-hdfs-2.10.2.jar
7.8M	distribution/package/catalogs/lakehouse-iceberg/libs/hive-metastore-2.3.9.jar

FANNG1 commented Sep 11, 2024

du -sh distribution/package/iceberg-rest-server/libs/* | sort -h
4.0K	distribution/package/iceberg-rest-server/libs/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
4.0K	distribution/package/iceberg-rest-server/libs/simpleclient_tracer_common-0.16.0.jar
8.0K	distribution/package/iceberg-rest-server/libs/failureaccess-1.0.1.jar
8.0K	distribution/package/iceberg-rest-server/libs/jcip-annotations-1.0-1.jar
8.0K	distribution/package/iceberg-rest-server/libs/metrics-annotation-4.2.25.jar
8.0K	distribution/package/iceberg-rest-server/libs/simpleclient_common-0.16.0.jar
8.0K	distribution/package/iceberg-rest-server/libs/simpleclient_tracer_otel-0.16.0.jar
8.0K	distribution/package/iceberg-rest-server/libs/simpleclient_tracer_otel_agent-0.16.0.jar
 12K	distribution/package/iceberg-rest-server/libs/gravitino-catalog-common-0.7.0-incubating-SNAPSHOT.jar
 12K	distribution/package/iceberg-rest-server/libs/hive-shims-2.3.9.jar
 12K	distribution/package/iceberg-rest-server/libs/j2objc-annotations-2.8.jar
 12K	distribution/package/iceberg-rest-server/libs/simpleclient_servlet-0.16.0.jar
 12K	distribution/package/iceberg-rest-server/libs/slf4j-reload4j-1.7.36.jar
 16K	distribution/package/iceberg-rest-server/libs/hive-shims-scheduler-2.3.9.jar
 16K	distribution/package/iceberg-rest-server/libs/jta-1.1.jar
 16K	distribution/package/iceberg-rest-server/libs/profiler-1.1.1.jar
 16K	distribution/package/iceberg-rest-server/libs/simpleclient_servlet_common-0.16.0.jar
 16K	distribution/package/iceberg-rest-server/libs/xmlenc-0.52.jar
 20K	distribution/package/iceberg-rest-server/libs/api-asn1-api-1.0.0-M20.jar
 20K	distribution/package/iceberg-rest-server/libs/error_prone_annotations-2.21.1.jar
 20K	distribution/package/iceberg-rest-server/libs/jakarta.inject-2.6.1.jar
 20K	distribution/package/iceberg-rest-server/libs/java-xmlbuilder-0.4.jar
 20K	distribution/package/iceberg-rest-server/libs/jsr305-3.0.2.jar
 20K	distribution/package/iceberg-rest-server/libs/kerb-identity-2.0.3.jar
 20K	distribution/package/iceberg-rest-server/libs/metrics-json-4.2.25.jar
 20K	distribution/package/iceberg-rest-server/libs/opencsv-2.3.jar
 20K	distribution/package/iceberg-rest-server/libs/osgi-resource-locator-1.0.3.jar
 20K	distribution/package/iceberg-rest-server/libs/simpleclient_dropwizard-0.16.0.jar
 20K	distribution/package/iceberg-rest-server/libs/token-provider-2.0.3.jar
 24K	distribution/package/iceberg-rest-server/libs/kerb-simplekdc-2.0.3.jar
 24K	distribution/package/iceberg-rest-server/libs/metrics-healthchecks-4.2.25.jar
 24K	distribution/package/iceberg-rest-server/libs/metrics-jersey2-4.2.25.jar
 24K	distribution/package/iceberg-rest-server/libs/metrics-jmx-4.2.25.jar
 24K	distribution/package/iceberg-rest-server/libs/metrics-servlets-4.2.25.jar
 28K	distribution/package/iceberg-rest-server/libs/aopalliance-repackaged-2.6.1.jar
 28K	distribution/package/iceberg-rest-server/libs/iceberg-aliyun-1.5.2.jar
 28K	distribution/package/iceberg-rest-server/libs/jakarta.annotation-api-1.3.5.jar
 28K	distribution/package/iceberg-rest-server/libs/jetty-continuation-9.4.51.v20230217.jar
 28K	distribution/package/iceberg-rest-server/libs/log4j-slf4j2-impl-2.22.0.jar
 28K	distribution/package/iceberg-rest-server/libs/metrics-jvm-4.2.25.jar
 32K	distribution/package/iceberg-rest-server/libs/kerby-config-2.0.3.jar
 32K	distribution/package/iceberg-rest-server/libs/kerby-xdr-2.0.3.jar
 36K	distribution/package/iceberg-rest-server/libs/gravitino-iceberg-rest-server-0.7.0-incubating-SNAPSHOT.jar
 36K	distribution/package/iceberg-rest-server/libs/iceberg-common-1.5.2.jar
 36K	distribution/package/iceberg-rest-server/libs/jackson-datatype-jdk8-2.15.2.jar
 36K	distribution/package/iceberg-rest-server/libs/jackson-module-jaxb-annotations-2.15.2.jar
 36K	distribution/package/iceberg-rest-server/libs/kerb-util-2.0.3.jar
 40K	distribution/package/iceberg-rest-server/libs/gravitino-iceberg-common-0.7.0-incubating-SNAPSHOT.jar
 40K	distribution/package/iceberg-rest-server/libs/kerby-util-2.0.3.jar
 44K	distribution/package/iceberg-rest-server/libs/apacheds-i18n-2.0.0-M15.jar
 44K	distribution/package/iceberg-rest-server/libs/asm-3.1.jar
 44K	distribution/package/iceberg-rest-server/libs/commons-cli-1.2.jar
 44K	distribution/package/iceberg-rest-server/libs/jersey-container-jetty-http-2.41.jar
 48K	distribution/package/iceberg-rest-server/libs/hadoop-annotations-2.10.2.jar
 48K	distribution/package/iceberg-rest-server/libs/wildfly-client-config-1.0.1.Final.jar
 52K	distribution/package/iceberg-rest-server/libs/gravitino-server-common-0.7.0-incubating-SNAPSHOT.jar
 56K	distribution/package/iceberg-rest-server/libs/hive-shims-0.23-2.3.9.jar
 64K	distribution/package/iceberg-rest-server/libs/commons-logging-1.2.jar
 64K	distribution/package/iceberg-rest-server/libs/slf4j-api-2.0.9.jar
 68K	distribution/package/iceberg-rest-server/libs/jboss-logging-3.3.1.Final.jar
 68K	distribution/package/iceberg-rest-server/libs/jetty-util-ajax-9.4.51.v20230217.jar
 68K	distribution/package/iceberg-rest-server/libs/jetty-xml-9.4.51.v20230217.jar
 68K	distribution/package/iceberg-rest-server/libs/okio-1.6.0.jar
 72K	distribution/package/iceberg-rest-server/libs/jersey-container-servlet-core-2.41.jar
 72K	distribution/package/iceberg-rest-server/libs/kerb-common-2.0.3.jar
 72K	distribution/package/iceberg-rest-server/libs/protobuf-java-util-3.24.4.jar
 76K	distribution/package/iceberg-rest-server/libs/iceberg-hive-metastore-1.5.2.jar
 76K	distribution/package/iceberg-rest-server/libs/jackson-annotations-2.15.2.jar
 80K	distribution/package/iceberg-rest-server/libs/api-util-1.0.0-M20.jar
 80K	distribution/package/iceberg-rest-server/libs/jersey-hk2-2.41.jar
 80K	distribution/package/iceberg-rest-server/libs/jersey-media-json-jackson-2.41.jar
 84K	distribution/package/iceberg-rest-server/libs/jersey-entity-filtering-2.41.jar
 84K	distribution/package/iceberg-rest-server/libs/kerb-server-2.0.3.jar
 92K	distribution/package/iceberg-rest-server/libs/jakarta.validation-api-2.0.2.jar
 92K	distribution/package/iceberg-rest-server/libs/simpleclient-0.16.0.jar
 96K	distribution/package/iceberg-rest-server/libs/commons-pool-1.5.4.jar
 96K	distribution/package/iceberg-rest-server/libs/javax.servlet-api-3.1.0.jar
100K	distribution/package/iceberg-rest-server/libs/jsp-api-2.1.jar
100K	distribution/package/iceberg-rest-server/libs/kerb-admin-2.0.3.jar
100K	distribution/package/iceberg-rest-server/libs/kerby-asn1-2.0.3.jar
108K	distribution/package/iceberg-rest-server/libs/jetty-servlets-9.4.51.v20230217.jar
112K	distribution/package/iceberg-rest-server/libs/bonecp-0.8.0.RELEASE.jar
112K	distribution/package/iceberg-rest-server/libs/gravitino-client-java-0.7.0-incubating-SNAPSHOT.jar
116K	distribution/package/iceberg-rest-server/libs/jakarta.xml.bind-api-2.3.3.jar
116K	distribution/package/iceberg-rest-server/libs/jetty-security-9.4.51.v20230217.jar
116K	distribution/package/iceberg-rest-server/libs/kerb-client-2.0.3.jar
116K	distribution/package/iceberg-rest-server/libs/kerb-crypto-2.0.3.jar
120K	distribution/package/iceberg-rest-server/libs/hive-shims-common-2.3.9.jar
124K	distribution/package/iceberg-rest-server/libs/hadoop-auth-2.10.2.jar
124K	distribution/package/iceberg-rest-server/libs/jackson-datatype-jsr310-2.15.2.jar
128K	distribution/package/iceberg-rest-server/libs/metrics-core-4.2.25.jar
132K	distribution/package/iceberg-rest-server/libs/hk2-utils-2.6.1.jar
140K	distribution/package/iceberg-rest-server/libs/jakarta.ws.rs-api-2.1.6.jar
140K	distribution/package/iceberg-rest-server/libs/jetty-webapp-9.4.51.v20230217.jar
144K	distribution/package/iceberg-rest-server/libs/commons-digester-1.8.jar
144K	distribution/package/iceberg-rest-server/libs/jetty-servlet-9.4.51.v20230217.jar
160K	distribution/package/iceberg-rest-server/libs/commons-dbcp-1.4.jar
164K	distribution/package/iceberg-rest-server/libs/antlr-runtime-3.5.2.jar
164K	distribution/package/iceberg-rest-server/libs/jboss-threads-2.3.6.Final.jar
180K	distribution/package/iceberg-rest-server/libs/hive-storage-api-2.4.0.jar
180K	distribution/package/iceberg-rest-server/libs/jetty-io-9.4.51.v20230217.jar
192K	distribution/package/iceberg-rest-server/libs/iceberg-aws-1.5.2.jar
192K	distribution/package/iceberg-rest-server/libs/stax2-api-4.2.1.jar
196K	distribution/package/iceberg-rest-server/libs/hk2-api-2.6.1.jar
196K	distribution/package/iceberg-rest-server/libs/kerby-pkix-2.0.3.jar
200K	distribution/package/iceberg-rest-server/libs/hk2-locator-2.6.1.jar
200K	distribution/package/iceberg-rest-server/libs/jdo-api-3.0.1.jar
220K	distribution/package/iceberg-rest-server/libs/checker-qual-3.37.0.jar
220K	distribution/package/iceberg-rest-server/libs/kerb-core-2.0.3.jar
224K	distribution/package/iceberg-rest-server/libs/gravitino-api-0.7.0-incubating-SNAPSHOT.jar
228K	distribution/package/iceberg-rest-server/libs/jackson-core-asl-1.9.13.jar
232K	distribution/package/iceberg-rest-server/libs/httpcore5-h2-5.2.4.jar
232K	distribution/package/iceberg-rest-server/libs/jetty-http-9.4.51.v20230217.jar
232K	distribution/package/iceberg-rest-server/libs/libthrift-0.9.3.jar
244K	distribution/package/iceberg-rest-server/libs/commons-beanutils-1.9.4.jar
252K	distribution/package/iceberg-rest-server/libs/aircompressor-0.26.jar
256K	distribution/package/iceberg-rest-server/libs/gson-2.8.9.jar
268K	distribution/package/iceberg-rest-server/libs/commons-net-3.1.jar
276K	distribution/package/iceberg-rest-server/libs/cglib-2.2.jar
276K	distribution/package/iceberg-rest-server/libs/jsch-0.1.55.jar
280K	distribution/package/iceberg-rest-server/libs/commons-lang-2.6.jar
280K	distribution/package/iceberg-rest-server/libs/wildfly-common-1.5.4.Final.jar
292K	distribution/package/iceberg-rest-server/libs/commons-configuration-1.6.jar
296K	distribution/package/iceberg-rest-server/libs/jersey-client-2.41.jar
308K	distribution/package/iceberg-rest-server/libs/libfb303-0.9.3.jar
320K	distribution/package/iceberg-rest-server/libs/commons-io-2.11.0.jar
324K	distribution/package/iceberg-rest-server/libs/httpcore-4.4.13.jar
324K	distribution/package/iceberg-rest-server/libs/okhttp-2.7.5.jar
328K	distribution/package/iceberg-rest-server/libs/commons-codec-1.11.jar
328K	distribution/package/iceberg-rest-server/libs/log4j-api-2.22.0.jar
328K	distribution/package/iceberg-rest-server/libs/reload4j-1.2.19.jar
332K	distribution/package/iceberg-rest-server/libs/gravitino-common-0.7.0-incubating-SNAPSHOT.jar
348K	distribution/package/iceberg-rest-server/libs/log4j-1.2-api-2.22.0.jar
388K	distribution/package/iceberg-rest-server/libs/javolution-5.5.1.jar
428K	distribution/package/iceberg-rest-server/libs/hive-common-2.3.9.jar
440K	distribution/package/iceberg-rest-server/libs/RoaringBitmap-1.0.1.jar
512K	distribution/package/iceberg-rest-server/libs/woodstox-core-5.3.0.jar
528K	distribution/package/iceberg-rest-server/libs/jets3t-0.9.0.jar
540K	distribution/package/iceberg-rest-server/libs/jackson-core-2.15.2.jar
568K	distribution/package/iceberg-rest-server/libs/iceberg-api-1.5.2.jar
572K	distribution/package/iceberg-rest-server/libs/jetty-util-9.4.51.v20230217.jar
572K	distribution/package/iceberg-rest-server/libs/xnio-api-3.8.8.Final.jar
576K	distribution/package/iceberg-rest-server/libs/commons-collections-3.2.2.jar
608K	distribution/package/iceberg-rest-server/libs/joda-time-2.8.1.jar
632K	distribution/package/iceberg-rest-server/libs/avro-1.11.3.jar
644K	distribution/package/iceberg-rest-server/libs/commons-lang3-3.14.0.jar
676K	distribution/package/iceberg-rest-server/libs/apacheds-kerberos-codec-2.0.0-M15.jar
720K	distribution/package/iceberg-rest-server/libs/jetty-server-9.4.51.v20230217.jar
736K	distribution/package/iceberg-rest-server/libs/commons-collections4-4.4.jar
744K	distribution/package/iceberg-rest-server/libs/gravitino-core-0.7.0-incubating-SNAPSHOT.jar
744K	distribution/package/iceberg-rest-server/libs/nimbus-jose-jwt-9.30.1.jar
764K	distribution/package/iceberg-rest-server/libs/httpclient-4.5.13.jar
764K	distribution/package/iceberg-rest-server/libs/jackson-mapper-asl-1.9.13.jar
776K	distribution/package/iceberg-rest-server/libs/javassist-3.29.2-GA.jar
836K	distribution/package/iceberg-rest-server/libs/httpcore5-5.2.4.jar
844K	distribution/package/iceberg-rest-server/libs/httpclient5-5.3.1.jar
892K	distribution/package/iceberg-rest-server/libs/caffeine-2.9.3.jar
896K	distribution/package/iceberg-rest-server/libs/hive-serde-2.3.9.jar
932K	distribution/package/iceberg-rest-server/libs/jersey-server-2.41.jar
1.0M	distribution/package/iceberg-rest-server/libs/commons-compress-1.22.jar
1.2M	distribution/package/iceberg-rest-server/libs/jersey-common-2.41.jar
1.4M	distribution/package/iceberg-rest-server/libs/htrace-core4-4.1.0-incubating.jar
1.5M	distribution/package/iceberg-rest-server/libs/commons-math3-3.1.1.jar
1.5M	distribution/package/iceberg-rest-server/libs/jackson-databind-2.15.2.jar
1.7M	distribution/package/iceberg-rest-server/libs/iceberg-bundled-guava-1.5.2.jar
1.8M	distribution/package/iceberg-rest-server/libs/iceberg-core-1.5.2.jar
1.8M	distribution/package/iceberg-rest-server/libs/log4j-core-2.22.0.jar
1.8M	distribution/package/iceberg-rest-server/libs/protobuf-java-3.24.4.jar
2.9M	distribution/package/iceberg-rest-server/libs/guava-32.1.3-jre.jar
3.8M	distribution/package/iceberg-rest-server/libs/hadoop-common-2.10.2.jar
4.1M	distribution/package/iceberg-rest-server/libs/hadoop-hdfs-client-2.10.2.jar
5.0M	distribution/package/iceberg-rest-server/libs/hadoop-hdfs-2.10.2.jar
7.8M	distribution/package/iceberg-rest-server/libs/hive-metastore-2.3.9.jar
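
A listing like the one above can be reproduced without `du`, which helps when comparing the libs directory before and after an exclusion change. The following is only a minimal sketch: the class name `JarSizeReport` and its helper methods are hypothetical and not part of the Gravitino build, and the default path is simply the directory shown in the listing. It reports file lengths rather than disk blocks, so the numbers can differ slightly from `du -h`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the project: lists jar sizes in a libs directory.
public class JarSizeReport {

  // Returns (file name, size in bytes) for every *.jar directly under libsDir,
  // sorted by size ascending and then by name, like `du ... | sort -h`.
  static List<Map.Entry<String, Long>> listJars(Path libsDir) throws IOException {
    List<Map.Entry<String, Long>> jars = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(libsDir, "*.jar")) {
      for (Path p : stream) {
        if (Files.isRegularFile(p)) {
          jars.add(new AbstractMap.SimpleEntry<>(p.getFileName().toString(), Files.size(p)));
        }
      }
    }
    jars.sort(
        (a, b) -> {
          int bySize = Long.compare(a.getValue(), b.getValue());
          return bySize != 0 ? bySize : a.getKey().compareTo(b.getKey());
        });
    return jars;
  }

  // Sums the sizes of all listed jars, in bytes.
  static long totalBytes(List<Map.Entry<String, Long>> jars) {
    long total = 0;
    for (Map.Entry<String, Long> jar : jars) {
      total += jar.getValue();
    }
    return total;
  }

  // Converts bytes to whole KiB, rounding up.
  static long toKiB(long bytes) {
    return (bytes + 1023) / 1024;
  }

  public static void main(String[] args) throws IOException {
    // The default path is the directory from the listing above; pass another as args[0].
    Path libsDir =
        Paths.get(args.length > 0 ? args[0] : "distribution/package/iceberg-rest-server/libs");
    List<Map.Entry<String, Long>> jars = listJars(libsDir);
    for (Map.Entry<String, Long> jar : jars) {
      System.out.println(toKiB(jar.getValue()) + "K\t" + libsDir.resolve(jar.getKey()));
    }
    System.out.println(toKiB(totalBytes(jars)) + "K\ttotal (" + jars.size() + " jars)");
  }
}
```

Running it once on the old package and once on the new one, then diffing the two outputs, shows which jars were dropped and how much each contributed.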

@FANNG1 FANNG1 changed the title [#4514] improvement(catalog-lakehouse-iceberg): shrink catalog-lakehouse-iceberg libs size from 184MB to 56MB, IcebergRESTServer size from 162M to 70M [#4514] improvement(iceberg): shrink catalog-lakehouse-iceberg libs size from 184MB to 56MB, IcebergRESTServer size from 162M to 70M Sep 11, 2024
@FANNG1 FANNG1 force-pushed the #4514-Shrink-the-Iceberg-catalog-binary-package-size branch from ffd5c83 to 0f05dd3 Compare September 11, 2024 14:09
@@ -138,6 +138,7 @@ hive2-common = { group = "org.apache.hive", name = "hive-common", version.ref =
hive2-jdbc = { group = "org.apache.hive", name = "hive-jdbc", version.ref = "hive2"}
hadoop2-auth = { group = "org.apache.hadoop", name = "hadoop-auth", version.ref = "hadoop2" }
hadoop2-hdfs = { group = "org.apache.hadoop", name = "hadoop-hdfs", version.ref = "hadoop2" }
hadoop2-hdfs-client = { group = "org.apache.hadoop", name = "hadoop-hdfs-client", version.ref = "hadoop2" }

Did you also update license.bin?

@FANNG1 FANNG1 Sep 13, 2024

license.bin already contained Apache Hadoop HDFS Client but was missing Apache Hadoop HDFS; updated.

@@ -83,6 +105,7 @@ dependencies {
annotationProcessor(libs.lombok)
compileOnly(libs.lombok)

testImplementation(project(":server-common"))

Why do we require "server-common" here?

Because IcebergConfig overrides the default HTTP port, we added a test that checks the override takes effect in JettyServerConfig, for example:

  @Test
  public void testIcebergHttpPort() {
    // With no explicit properties, the Iceberg REST defaults should be used.
    Map<String, String> properties = ImmutableMap.of();
    IcebergConfig icebergConfig = new IcebergConfig(properties);
    JettyServerConfig jettyServerConfig = JettyServerConfig.fromConfig(icebergConfig);
    Assertions.assertEquals(
        JettyServerConfig.DEFAULT_ICEBERG_REST_SERVICE_HTTP_PORT, jettyServerConfig.getHttpPort());
    Assertions.assertEquals(
        JettyServerConfig.DEFAULT_ICEBERG_REST_SERVICE_HTTPS_PORT,
        jettyServerConfig.getHttpsPort());
  }

@FANNG1 FANNG1 force-pushed the #4514-Shrink-the-Iceberg-catalog-binary-package-size branch from 4cc8f34 to 220a4d6 Compare September 13, 2024 01:07
@jerryshao

@FANNG1 can you make the CI pass?

@FANNG1 FANNG1 force-pushed the #4514-Shrink-the-Iceberg-catalog-binary-package-size branch from 220a4d6 to 8259c85 Compare September 13, 2024 07:03
@jerryshao jerryshao merged commit 93e0502 into apache:main Sep 13, 2024
26 checks passed
@LiuQhahah LiuQhahah deleted the #4514-Shrink-the-Iceberg-catalog-binary-package-size branch September 14, 2024 01:51
Development

Successfully merging this pull request may close these issues.

[Subtask] Shrink the Iceberg catalog binary package size
5 participants