
Commit 76b4eae

HADOOP-19696. module hadoop-cloud-storage-dist for the distribution
This ensures that anything done dependency-wise for packaging doesn't impact the hadoop-cloud-storage module and any downstream uses.

- separate profile for each component to pull in all dependencies
- hadoop-azure is always included, hadoop-aws *except* bundle.jar
- hadoop-gcp and hadoop-tos are complete iff shaded
- updated BUILDING.txt

This is enough to let anyone cut a release with their choice of functional cloud connectors.
1 parent 9425f9e commit 76b4eae

6 files changed: +111 -36 lines changed

BUILDING.txt

Lines changed: 45 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -388,6 +388,51 @@ Create a local staging version of the website (in /tmp/hadoop-site)
388388

389389
Note that the site needs to be built in a second pass after other artifacts.
390390

391+
----------------------------------------------------------------------------------
392+
Including Cloud Connector Dependencies in Distributions:
393+
394+
Hadoop distributions include the hadoop modules to work with data and services
395+
on cloud infrastructure.
396+
397+
However, dependencies are omitted for all cloud connectors except hadoop-azure
398+
(abfs:// and wasb://) and possibly hadoop-gcp (gs://) and hadoop-tos (tos://).
399+
For the latter two modules, it depends on shading options.
400+
401+
For hadoop-aws the AWS SDK bundle.jar is omitted, but everything else is included.
402+
403+
* This keeps binary release size below the limit of apache distributions
404+
* Reduces download and size overhead in docker usage.
405+
* Reduces the CVE attack surface
406+
* Reduces the risk of classpath conflict.
407+
408+
To produce a build with the specific desired dependencies, the build must be executed
409+
with the relevant profile of ${module}-package.
410+
411+
For example, to build with the hadoop-aws and hadoop-azure-datalake dependencies,
412+
run with
413+
414+
mvn package -Pdist -DskipTests -Dhadoop-aws-package -Dhadoop-azure-datalake-package
415+
416+
Available package profiles:
417+
hadoop-aliyun-package
418+
hadoop-aws-package
419+
hadoop-azure-datalake-package
420+
hadoop-cos-package
421+
hadoop-gcp-package
422+
hadoop-huaweicloud-package
423+
hadoop-tos-package
424+
425+
To build a complete distribution with all cloud dependencies included:
426+
427+
mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true \
428+
-Phadoop-aliyun-package,hadoop-aws-package,hadoop-azure-datalake-package \
429+
-Phadoop-cos-package,hadoop-gcp-package,hadoop-tos-package
430+
431+
The resulting tar file will be too large to be distributable through ASF infrastructure.
432+
433+
The hadoop-gcp and hadoop-tos artifacts include their dependencies unless the distribution
434+
is built with -DskipShade.
435+
391436
----------------------------------------------------------------------------------
392437
Installing Hadoop
393438
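
The BUILDING.txt hunk above names one profile per connector, `${module}-package`, and this commit activates each one by a property of the same name. A minimal sketch of how the flags for a chosen set of connectors could be assembled, assuming that naming convention holds for every module listed; `package_flags` is a hypothetical helper, not part of the Hadoop build, and the command is only printed, not run:

```shell
# Hypothetical helper: turn module names into the -D<module>-package
# properties that activate the matching packaging profiles.
# The function body runs in a subshell so its variables stay private.
package_flags() (
  flags=""
  for module in "$@"; do
    # Append with a separating space once the first flag is present.
    flags="${flags:+$flags }-D${module}-package"
  done
  printf '%s\n' "$flags"
)

# Print (not execute) the build command for two connectors.
echo "mvn package -Pdist -DskipTests $(package_flags hadoop-aws hadoop-azure-datalake)"
```

Printing the command first makes it easy to review which connector dependencies a release build will pull in before starting a long Maven run.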

LICENSE-binary

Lines changed: 4 additions & 3 deletions
@@ -214,11 +214,12 @@ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/data
214214
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/checker/TimeoutFuture.java
215215

216216
ch.qos.reload4j:reload4j:1.2.22
217+
com.aliyun:aliyun-java-core:0.2.11-beta
217218
com.aliyun:aliyun-java-sdk-core:4.5.10
218219
com.aliyun:aliyun-java-sdk-kms:2.11.0
219220
com.aliyun:aliyun-java-sdk-ram:3.1.0
220221
com.aliyun:aliyun-java-sdk-sts:3.0.0
221-
com.aliyun:java-trace-api:0.2.11-beta.jar
222+
com.aliyun:java-trace-api:0.2.11-beta
222223
com.aliyun.oss:aliyun-sdk-oss:3.13.2
223224
com.cedarsoftware:java-util:1.9.0
224225
com.cedarsoftware:json-io:2.5.1
@@ -274,7 +275,7 @@ com.google.j2objc:j2objc-annotations:3.0.0
274275
com.google.oauth-client:google-oauth-client:1.37.0
275276
com.huaweicloud:esdk-obs-java:3.20.4.2
276277
com.jamesmurty.utils:java-xmlbuilder-1.2.jar
277-
com.microsoft.azure:azure-storage:7.0.0
278+
com.microsoft.azure:azure-storage:7.0.1
278279
com.nimbusds:nimbus-jose-jwt:10.4
279280
com.squareup.okhttp3:okhttp:jar:3.14.2
280281
com.squareup.okio:okio:jar:1.17.2
@@ -556,7 +557,7 @@ com.microsoft.azure:azure-cosmosdb:2.4.5
556557
com.microsoft.azure:azure-cosmosdb-commons:2.4.5
557558
com.microsoft.azure:azure-cosmosdb-direct:2.4.5
558559
com.microsoft.azure:azure-cosmosdb-gateway:2.4.5
559-
com.microsoft.azure:azure-data-lake-store-sdk:2.3.3
560+
com.microsoft.azure:azure-data-lake-store-sdk:2.3.9
560561
com.microsoft.azure:azure-keyvault-core:1.0.0
561562
com.microsoft.sqlserver:mssql-jdbc:6.2.1.jre7
562563
org.bouncycastle:bcpkix-jdk18on:1.78.1

dev-support/bin/dist-layout-stitching

Lines changed: 1 addition & 1 deletion
@@ -132,7 +132,7 @@ run copy "${ROOT}/hadoop-common-project/hadoop-nfs/target/hadoop-nfs-${VERSION}"
132132
run copy "${ROOT}/hadoop-common-project/hadoop-registry/target/hadoop-registry-${VERSION}" .
133133

134134
# cloud connectors go into common
135-
run copy "${ROOT}/hadoop-cloud-storage-project/hadoop-cloud-storage/target/hadoop-cloud-storage-${VERSION}" .
135+
run copy "${ROOT}/hadoop-cloud-storage-project/hadoop-cloud-storage-dist/target/hadoop-cloud-storage-dist-${VERSION}" .
136136

137137
run copy "${ROOT}/hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-${VERSION}" .
138138
run copy "${ROOT}/hadoop-hdfs-project/hadoop-hdfs-nfs/target/hadoop-hdfs-nfs-${VERSION}" .

hadoop-cloud-storage-project/hadoop-cloud-storage-dist/pom.xml

Lines changed: 56 additions & 27 deletions
@@ -37,15 +37,34 @@
3737
the Jars.
3838
3939
By default, while hadoop-* artifacts are all included, dependencies
40-
are omitted for nearly everything.
41-
* keeps size down
42-
* keeps CVE attack surface down
40+
are omitted for all cloud connectors except hadoop-azure and
41+
possibly hadoop-gcp and hadoop-tos modules.
42+
For hadoop-aws the AWS SDK bundle.jar is omitted, but everything else is included.
43+
44+
* This keeps binary release size below the limit of apache distributions
45+
* Reduces download and size overhead in docker usage.
46+
* Reduces the CVE attack surface
47+
* Reduces the risk of classpath conflict.
48+
4349
To produce a build with the specific desired dependencies, the build must be executed
44-
with the relevant profile of ${module}-dependencies.
50+
with the relevant profile of ${module}-package.
4551
4652
For example, a build with the hadoop-aws and hadoop-azure-datalake dependencies,
47-
run with
48-
-Phadoop-aws-dependencies -Phadoop-azure-datalake-dependencies
53+
build with -Dhadoop-aws-package -Dhadoop-azure-datalake-package
54+
55+
Available package profiles:
56+
hadoop-aliyun-package
57+
hadoop-aws-package
58+
hadoop-azure-datalake-package
59+
hadoop-cos-package
60+
hadoop-gcp-package
61+
hadoop-huaweicloud-package
62+
hadoop-tos-package
63+
64+
To build a complete distribution:
65+
mvn package -Pdist -DskipTests -Phadoop-aliyun-package,hadoop-aws-package,hadoop-azure-datalake-package \
66+
-Phadoop-cos-package,hadoop-gcp-package,hadoop-tos-package
67+
4968
-->
5069
<properties>
5170
<hadoop.component>cloud-storage</hadoop.component>
@@ -130,10 +149,6 @@
130149
<groupId>org.apache.hadoop</groupId>
131150
<artifactId>hadoop-gcp</artifactId>
132151
<scope>compile</scope>
133-
<!--
134-
Exclude transitive dependencies to prevent dependency convergence
135-
problems. hadoop-gcp is a self-contained shaded jar.
136-
-->
137152
<exclusions>
138153
<exclusion>
139154
<groupId>*</groupId>
@@ -202,12 +217,11 @@
202217
</build>
203218
</profile>
204219

205-
206220
<!-- Pull in aliyun -->
207221
<profile>
208-
<id>hadoop-aliyun-dependencies</id>
222+
<id>hadoop-aliyun-package</id>
209223
<activation>
210-
<activeByDefault>false</activeByDefault>
224+
<property><name>hadoop-aliyun-package</name></property>
211225
</activation>
212226
<dependencies>
213227
<dependency>
@@ -220,9 +234,9 @@
220234

221235
<!-- Pull in the AWS SDK -->
222236
<profile>
223-
<id>hadoop-aws-dependencies</id>
237+
<id>hadoop-aws-package</id>
224238
<activation>
225-
<activeByDefault>false</activeByDefault>
239+
<property><name>hadoop-aws-package</name></property>
226240
</activation>
227241
<dependencies>
228242
<dependency>
@@ -233,41 +247,41 @@
233247
</dependencies>
234248
</profile>
235249

236-
<!-- Pull in all the cos -->
250+
<!-- Pull in ADLS gen1 support -->
237251
<profile>
238-
<id>hadoop-cos-dependencies</id>
252+
<id>hadoop-azure-datalake-package</id>
239253
<activation>
240-
<activeByDefault>false</activeByDefault>
254+
<property><name>hadoop-azure-datalake-package</name></property>
241255
</activation>
242256
<dependencies>
243257
<dependency>
244258
<groupId>org.apache.hadoop</groupId>
245-
<artifactId>hadoop-cos</artifactId>
259+
<artifactId>hadoop-azure-datalake</artifactId>
246260
<scope>compile</scope>
247261
</dependency>
248262
</dependencies>
249263
</profile>
250264

251-
<!-- Pull in ADLS gen1 -->
265+
<!-- Pull in all the hadoop-cos dependencies -->
252266
<profile>
253-
<id>hadoop-azure-datalake-dependencies</id>
267+
<id>hadoop-cos-package</id>
254268
<activation>
255-
<activeByDefault>false</activeByDefault>
269+
<property><name>hadoop-cos-package</name></property>
256270
</activation>
257271
<dependencies>
258272
<dependency>
259273
<groupId>org.apache.hadoop</groupId>
260-
<artifactId>hadoop-azure-datalake</artifactId>
274+
<artifactId>hadoop-cos</artifactId>
261275
<scope>compile</scope>
262276
</dependency>
263277
</dependencies>
264278
</profile>
265279

266280
<!-- Pull in the huaweicloud dependencies -->
267281
<profile>
268-
<id>hadoop-huaweicloud-dependencies</id>
282+
<id>hadoop-huaweicloud-package</id>
269283
<activation>
270-
<activeByDefault>false</activeByDefault>
284+
<property><name>hadoop-huaweicloud-package</name></property>
271285
</activation>
272286
<dependencies>
273287
<dependency>
@@ -284,11 +298,26 @@
284298
</dependencies>
285299
</profile>
286300

301+
<!-- Pull in the gcp dependencies -->
302+
<profile>
303+
<id>hadoop-gcp-package</id>
304+
<activation>
305+
<property><name>hadoop-gcp-package</name></property>
306+
</activation>
307+
<dependencies>
308+
<dependency>
309+
<groupId>org.apache.hadoop</groupId>
310+
<artifactId>hadoop-gcp</artifactId>
311+
<scope>compile</scope>
312+
</dependency>
313+
</dependencies>
314+
</profile>
315+
287316
<!-- Pull in Volcano TOS -->
288317
<profile>
289-
<id>hadoop-tos-dependencies</id>
318+
<id>hadoop-tos-package</id>
290319
<activation>
291-
<activeByDefault>false</activeByDefault>
320+
<property><name>hadoop-tos-package</name></property>
292321
</activation>
293322
<dependencies>
294323
<dependency>

hadoop-cloud-storage-project/hadoop-tos/pom.xml

Lines changed: 2 additions & 2 deletions
@@ -184,9 +184,9 @@
184184
<profiles>
185185
<!-- Generate a shaded artifact -->
186186
<profile>
187-
<id>shade-tos</id>
187+
<id>shade</id>
188188
<activation>
189-
<activeByDefault>false</activeByDefault>
189+
<property><name>!skipShade</name></property>
190190
</activation>
191191
<build>
192192
<plugins>
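
The hadoop-tos hunk above switches the shade profile to a negated property activation, `!skipShade`. In Maven, a negated property name activates the profile only when the property is not defined at all, so `-DskipShade=false` would still turn shading off. A small sketch that mimics that rule over a list of command-line arguments; `shade_profile_active` is a hypothetical illustration and does not call Maven:

```shell
# Hypothetical illustration of Maven's "!skipShade" activation rule:
# the profile is active only when the skipShade property is absent.
shade_profile_active() {
  case " $* " in
    # Defined without a value, or with any value (even "false"): inactive.
    *" -DskipShade "*|*" -DskipShade="*) echo no ;;
    *) echo yes ;;
  esac
}

shade_profile_active -Pdist -DskipTests
shade_profile_active -Pdist -DskipTests -DskipShade
shade_profile_active -Pdist -DskipTests -DskipShade=false
```

To check a real build, the Maven Help Plugin's `help:active-profiles` goal reports which profiles are active for a given set of arguments.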

hadoop-project/pom.xml

Lines changed: 3 additions & 3 deletions
@@ -2181,9 +2181,9 @@
21812181
<version>2.4.4</version>
21822182
</dependency>
21832183
<dependency>
2184-
<groupId>com.google.cloud</groupId>
2185-
<artifactId>google-cloud-storage</artifactId>
2186-
<version>2.52.0</version>
2184+
<groupId>com.google.cloud</groupId>
2185+
<artifactId>google-cloud-storage</artifactId>
2186+
<version>2.52.0</version>
21872187
</dependency>
21882188
</dependencies>
21892189
</dependencyManagement>
