@wyb commented May 28, 2020

After a user creates a Spark load job, whose initial state is PENDING, FE schedules the job and submits the Spark ETL job:

  1. Begin a transaction
  2. Create a SparkLoadPendingTask to submit the ETL job
    2.1 Create the ETL job configuration according to Spark load interface #3010 (comment)
    2.2 Upload the configuration file and the job jar to HDFS through the broker
    2.3 Submit the ETL job to the Spark cluster
    2.4 Wait for the ETL job submission result
  3. If the ETL job is submitted successfully, update the job state to ETL and log the job update info

#3433
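
The steps in the description can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SparkLoadSubmitSketch`, `EtlSubmitter`, `Job`, and the placeholder config path are hypothetical names, and leaving the job in PENDING on failure is an assumption, since the description only covers the success path.

```java
import java.util.ArrayList;
import java.util.List;

class SparkLoadSubmitSketch {
    enum JobState { PENDING, ETL }

    /** Hypothetical stand-in for the client that submits to the Spark cluster. */
    interface EtlSubmitter {
        String submit(String configPath) throws Exception; // returns an app id
    }

    static class Job {
        JobState state = JobState.PENDING;
        String appId;
        final List<String> log = new ArrayList<>();
    }

    static void runPendingTask(Job job, EtlSubmitter submitter) {
        job.log.add("begin transaction");                 // step 1
        try {
            // Steps 2.1 and 2.2: placeholder path, not the real layout on HDFS.
            String configPath = "configs/jobconfig.json";
            // Steps 2.3 and 2.4: submit and wait for the submission result.
            job.appId = submitter.submit(configPath);
            // Step 3: only a successful submission moves the job to ETL.
            job.state = JobState.ETL;
            job.log.add("job updated: state=ETL, appId=" + job.appId);
        } catch (Exception e) {
            // Assumption: on failure the state is left untouched and the error is recorded.
            job.log.add("submit failed: " + e.getMessage());
        }
    }
}
```

Keeping the state change after the submit call means a failed submission can never leave the job marked as ETL.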

@imay added the area/load and kind/feature labels May 29, 2020
@imay self-assigned this May 29, 2020
public class Pair<F, S> {
public static PairComparator<Pair<?, Comparable>> PAIR_VALUE_COMPARATOR = new PairComparator<>();

@SerializedName(value = "first")
Contributor

I'm not sure this is OK, because there is no guarantee that the F and S objects can also be serialized by Gson.

Contributor Author

Shouldn't users guarantee this when they use the Pair class, as with Map and List?

Contributor

I checked; this does not work.

Contributor Author

I added a comment:
When using Pair for persistence, users need to guarantee that F and S can be serialized through Gson.

private static final String CONFIG_FILE_NAME = "jobconfig.json";
private static final String APP_RESOURCE_LOCAL_PATH = PaloFe.DORIS_HOME_DIR + "/lib/" + APP_RESOURCE_NAME;
private static final String JOB_CONFIG_DIR = "configs";
private static final String MAIN_CLASS = "org.apache.doris.load.loadv2.etl.SparkEtlJob";
Contributor

How about getting it from SparkEtlJob.class.getXXX()?

Contributor Author

OK, I added a comment and will replace the string once the SparkEtlJob class is merged.
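
The reviewer left the accessor unspecified. One candidate is `Class.getName()`, which returns the fully qualified name that a main-class setting normally expects. The sketch below uses a JDK class as a stand-in because SparkEtlJob is not part of this PR; `MainClassNameSketch` and `mainClassOf` are hypothetical names.

```java
class MainClassNameSketch {
    // Derive the main-class string from the class literal instead of hard-coding it,
    // so a rename or package move is caught at compile time.
    static String mainClassOf(Class<?> cls) {
        return cls.getName();
    }

    public static void main(String[] args) {
        // Stand-in for SparkEtlJob.class, which is not merged yet.
        System.out.println(mainClassOf(java.util.ArrayList.class)); // prints java.util.ArrayList
    }
}
```

The trade-off is a compile-time dependency on the ETL job class from the FE code, which the hard-coded string avoids.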

throw new LoadException(errMsg + "spark app state: " + state.toString());
}
if (retry >= GET_APPID_MAX_RETRY_TIMES) {
throw new LoadException(errMsg + "wait too much time for getting appid. spark app state: "
Contributor

Suggested change
throw new LoadException(errMsg + "wait too much time for getting appid. spark app state: "
throw new LoadException(errMsg + " wait too much time for getting appid. spark app state: "

Contributor Author

errMsg already has a space at the end.
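
For context, the check under discussion is a bounded retry while waiting for the Spark app id. A minimal sketch of that pattern follows, under stated assumptions: the retry limit, the errMsg text, the polling `Supplier`, and the use of `IllegalStateException` in place of LoadException are all hypothetical, chosen only to keep the example self-contained.

```java
import java.util.Optional;
import java.util.function.Supplier;

class AppIdPollSketch {
    static final int GET_APPID_MAX_RETRY_TIMES = 3; // assumed value

    static String waitForAppId(Supplier<Optional<String>> poll, long sleepMs) {
        // Hypothetical prefix; it ends with a space, so appended text needs no leading one.
        String errMsg = "start spark app failed. error: ";
        int retry = 0;
        while (true) {
            Optional<String> appId = poll.get();
            if (appId.isPresent()) {
                return appId.get();
            }
            if (retry >= GET_APPID_MAX_RETRY_TIMES) {
                throw new IllegalStateException(errMsg + "wait too much time for getting appid.");
            }
            retry++;
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IllegalStateException(errMsg + "interrupted while waiting for appid.");
            }
        }
    }
}
```

Because the prefix carries its own trailing space, the concatenated message reads correctly without the extra space proposed in the suggestion.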

@wyb changed the title from "[Spark load] Fe submit spark etl job" to "[Spark load] [Fe 4/5] Fe submit spark etl job" May 30, 2020
@wyb changed the title from "[Spark load] [Fe 4/5] Fe submit spark etl job" to "[Spark load][Fe 4/5] Fe submit spark etl job" May 30, 2020
@wyb changed the title from "[Spark load][Fe 4/5] Fe submit spark etl job" to "[Spark load][Fe 5/6] Fe submit spark etl job" Jun 10, 2020
@wyb force-pushed the spark_load_fe_submit_etl_job branch 2 times, most recently from 6d0ef4e to 20df6f2 June 13, 2020 13:06

+ ", msg=" + tReadResponse.getOpStatus().getMessage());
}
failed = false;
return tReadResponse.getData();
Contributor

The broker's pread() method currently does not guarantee that it reads the specified length of data.
#3881 is trying to solve this problem. Just a reminder.

Contributor Author

ok
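
Until a fix such as #3881 lands, a caller that needs the full length can loop over short reads. The sketch below shows that general pattern only; `PositionalReader` and `readFully` are hypothetical and do not reflect the broker's actual Thrift interface.

```java
import java.util.Arrays;

class ReadFullySketch {
    /** Hypothetical reader: may return fewer bytes than requested; an empty array means EOF. */
    interface PositionalReader {
        byte[] pread(long offset, int length);
    }

    static byte[] readFully(PositionalReader reader, long offset, int length) {
        byte[] out = new byte[length];
        int filled = 0;
        while (filled < length) {
            // Ask only for what is still missing, starting where the last read stopped.
            byte[] chunk = reader.pread(offset + filled, length - filled);
            if (chunk.length == 0) {
                break; // EOF before the requested length was reached
            }
            int n = Math.min(chunk.length, length - filled); // guard against over-long replies
            System.arraycopy(chunk, 0, out, filled, n);
            filled += n;
        }
        // Return exactly the bytes that were read, which may be fewer than requested at EOF.
        return filled == length ? out : Arrays.copyOf(out, filled);
    }
}
```

A caller can compare the returned array's length with the requested length to detect a truncated file.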

@morningman previously approved these changes Jun 18, 2020

@morningman left a comment

LGTM

@imay previously approved these changes Jun 18, 2020
@wyb dismissed stale reviews from imay and morningman via ee27b34 June 18, 2020 08:51
@imay previously approved these changes Jun 18, 2020

@imay left a comment

LGTM

@wyb force-pushed the spark_load_fe_submit_etl_job branch from ee27b34 to efcc796 June 19, 2020 02:44

@morningman left a comment

LGTM

@morningman added the approved label Jun 19, 2020
@morningman merged commit 532d15d into apache:master Jun 19, 2020
@EmmyMiao87 mentioned this pull request Sep 1, 2020