Contributor

@morningman morningman commented Jul 22, 2020

Proposed changes

This CL mainly changes:

  1. Add 2 new FE modules

    1. fe-common

      Holds the common classes shared by other modules; currently it contains only jmockit.

    2. spark-dpp

      The Spark DPP application for Spark Load. I moved all DPP-related classes into this module, including their unit tests.

  2. Change the build.sh

    Add a new param --spark-dpp to compile the spark-dpp module alone. --fe compiles all FE modules.

    The output of the spark-dpp module is spark-dpp-1.0.0-jar-with-dependencies.jar, which is installed to output/fe/spark-dpp/.
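
A rough sketch of how the two flags could map onto the BUILD_FE and BUILD_SPARK_DPP variables used in build.sh. This is an illustration under assumptions, not the actual parser in build.sh: the functions `parse_build_flags` and `modules_to_build` are hypothetical names introduced here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; the real build.sh may parse
# its options differently.
BUILD_FE=0
BUILD_SPARK_DPP=0

# Reset both switches, then set them from the command-line style arguments.
parse_build_flags() {
    BUILD_FE=0
    BUILD_SPARK_DPP=0
    for arg in "$@"; do
        case "${arg}" in
            --fe)        BUILD_FE=1 ;;
            --spark-dpp) BUILD_SPARK_DPP=1 ;;
            *)           echo "unknown option: ${arg}" >&2; return 1 ;;
        esac
    done
}

# Describe what would be compiled: --fe covers every FE module,
# --spark-dpp on its own covers only the spark-dpp module.
modules_to_build() {
    if [ "${BUILD_FE}" -eq 1 ]; then
        echo "all FE modules"
    elif [ "${BUILD_SPARK_DPP}" -eq 1 ]; then
        echo "spark-dpp"
    else
        echo "none"
    fi
}

parse_build_flags --spark-dpp
echo "--spark-dpp builds: $(modules_to_build)"
parse_build_flags --fe
echo "--fe builds: $(modules_to_build)"
```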

Types of changes

  • Code refactor

Checklist

  • Compiling and unit tests pass locally with my changes

Further comments

After this PR is merged, the Spark Load feature will NOT work until a follow-up PR changes the way the new spark-dpp.jar is deployed.

@morningman morningman added the kind/refactor (Issues or PRs to refactor code) and area/spark-load (Issues or PRs related to the spark load) labels Jul 22, 2020
@morningman morningman self-assigned this Jul 22, 2020
build.sh Outdated
cp -r -p ${DORIS_HOME}/webroot/* ${DORIS_OUTPUT}/fe/webroot/
# Copy Frontend and Backend
if [ ${BUILD_FE} -eq 1 -o ${BUILD_SPARK_DPP} -eq 1 ]; then
if [ ${BUILD_FE} -eq 1]; then
Member

Suggested change
- if [ ${BUILD_FE} -eq 1]; then
+ if [ ${BUILD_FE} -eq 1 ]; then
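
The space matters because `[` is an ordinary command that expects `]` as its own final argument. Written as `1]`, the bracket becomes part of the last operand, so the test reports a missing `]` and returns a failure status. A minimal sketch of the difference, with BUILD_FE set by hand here rather than by build.sh:

```shell
#!/usr/bin/env bash
BUILD_FE=1

# Broken form: "1]" is a single word, so `[` never sees its closing `]`.
# The error message is silenced here; the command still fails,
# so the else branch is taken.
if [ ${BUILD_FE} -eq 1] 2>/dev/null; then
    broken_result="ran"
else
    broken_result="skipped"
fi

# Fixed form: `]` is a separate argument and the comparison is evaluated.
if [ ${BUILD_FE} -eq 1 ]; then
    fixed_result="ran"
else
    fixed_result="skipped"
fi

echo "broken: ${broken_result}, fixed: ${fixed_result}"
```

With the broken form the FE copy step would be skipped even though BUILD_FE is 1.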

</dependencies>

<build>
<finalName>spark-dpp-${version}</finalName>
Member

I think it would be better to name it without the version, like "palo-fe.jar", for example "spark-dpp.jar", because we will have a DppVersion in FeConstant.

Contributor Author

Without the version in the name, it is difficult to tell which version the file is just by looking at it.


This module is used to store some common classes of other modules.

# spark-dpp
Contributor

It would be better to add an explanation of what DPP is.

morningman pushed a commit that referenced this pull request Jul 27, 2020
…4163)

### Summary
When users use Spark Load, they have to upload the dependent jars to HDFS every time.
This CL adds a self-generated repository under the working_dir folder in HDFS for saving the dependencies of the Spark DPP program and the Spark platform.
Note that the dependencies we upload to the repository include:
1. `spark-dpp.jar`
2. `spark2x.zip`

Item 1 is the DPP library built from the spark-dpp submodule. See pr #4146 for details about the spark-dpp submodule.
Item 2 is the Spark 2.x.x platform library, which contains all jars in $SPARK_HOME/jars.

**The repository structure** will be like this:

```
__spark_repository__/
    |-__archive_1_0_0/
    |        |-__lib_990325d2c0d1d5e45bf675e54e44fb16_spark-dpp.jar
    |        |-__lib_7670c29daf535efe3c9b923f778f61fc_spark-2x.zip
    |-__archive_2_2_0/
    |        |-__lib_64d5696f99c379af2bee28c1c84271d5_spark-dpp.jar
    |        |-__lib_1bbb74bb6b264a270bc7fca3e964160f_spark-2x.zip
    |-__archive_3_2_0/
    |        |-...
```

The following conditions will force FE to upload the dependencies:
1. FE finds that its dppVersion is absent from the repository.
2. The MD5 value of the remote file does not match that of the local file.

Before FE uploads the dependencies, it creates an archive directory named `__archive_{dppVersion}` under the repository.
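
The two upload conditions can be sketched as a small shell check. This is an illustration under assumptions, not the FE implementation: a local temp directory stands in for the HDFS repository, the remote MD5 is assumed to be read from the `__lib_{md5}_` file name prefix shown in the listing above, dppVersion is passed in the underscore form used by that listing, and the function names `archive_dir_name`, `lib_file_name` and `upload_decision` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: a local temp directory stands in for the HDFS repository,
# and every function name below is hypothetical.

# $1 = dppVersion in the form used by the repository listing, e.g. 1_0_0
archive_dir_name() {
    echo "__archive_${1}"
}

# $1 = MD5 hex digest of the local file, $2 = library file name
lib_file_name() {
    echo "__lib_${1}_${2}"
}

# $1 = repository dir, $2 = dppVersion, $3 = local MD5, $4 = library file name
# Prints "upload" when either condition holds, otherwise "skip".
upload_decision() {
    archive="${1}/$(archive_dir_name "${2}")"
    if [ ! -d "${archive}" ]; then
        echo "upload"    # condition 1: dppVersion absent from the repository
    elif [ ! -f "${archive}/$(lib_file_name "${3}" "${4}")" ]; then
        echo "upload"    # condition 2: no remote file carries the local MD5
    else
        echo "skip"
    fi
}

# Build a tiny stand-in repository holding one archived library.
repo="$(mktemp -d)/__spark_repository__"
mkdir -p "${repo}/__archive_1_0_0"
touch "${repo}/__archive_1_0_0/__lib_990325d2c0d1d5e45bf675e54e44fb16_spark-dpp.jar"

# Same version and same MD5: nothing to do.
upload_decision "${repo}" 1_0_0 990325d2c0d1d5e45bf675e54e44fb16 spark-dpp.jar
# Version missing from the repository.
upload_decision "${repo}" 2_2_0 64d5696f99c379af2bee28c1c84271d5 spark-dpp.jar
# Version present but the MD5 differs.
upload_decision "${repo}" 1_0_0 64d5696f99c379af2bee28c1c84271d5 spark-dpp.jar
```

The three calls print `skip`, `upload` and `upload` in that order.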
Member

xy720 commented Jul 29, 2020

LGTM

Member

@yangzhg yangzhg left a comment

LGTM

@morningman morningman merged commit 0e79f69 into apache:master Jul 29, 2020
@EmmyMiao87 EmmyMiao87 mentioned this pull request Aug 17, 2020