Contributor

@wyb wyb commented May 28, 2020

  1. Users create a spark load job through the MySQL client.
    Spark load interface #3010 (comment)
LOAD LABEL db_name.label_name 
(
  DATA INFILE ("/tmp/file1") INTO TABLE table_name, ...
)
WITH RESOURCE resource_name
[(key1 = value1, ...)]
[PROPERTIES (key2 = value2, ... )]

Spark configurations specified in the load statement override the corresponding settings in the resource, for temporary use.
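A minimal sketch of how such an override could behave, assuming both the resource and the load statement expose their Spark settings as plain string maps. The class `SparkConfigOverride` and its `merge` method are hypothetical names used for illustration and are not taken from this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the PR: statement-level Spark configs
// shadow resource-level ones for a single load job.
final class SparkConfigOverride {
    private SparkConfigOverride() {}

    // Returns a new map. Neither input is modified, so the resource keeps
    // its stored configuration after the job has been created.
    static Map<String, String> merge(Map<String, String> resourceConfigs,
                                     Map<String, String> stmtConfigs) {
        Map<String, String> merged = new LinkedHashMap<>(resourceConfigs);
        merged.putAll(stmtConfigs); // keys from the load statement win
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> resource = new LinkedHashMap<>();
        resource.put("spark.master", "yarn");
        resource.put("spark.executor.memory", "1g");

        Map<String, String> stmt = new LinkedHashMap<>();
        stmt.put("spark.executor.memory", "4g");

        System.out.println("job config:      " + merge(resource, stmt));
        System.out.println("resource config: " + resource);
    }
}
```

Copying into a fresh map rather than mutating the resource is what would keep the override temporary in this sketch.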

  2. Fe analyzes LoadStmt and creates a SparkLoadJob in LoadManager.

  3. Abstract a base class BulkLoadJob that contains the code shared between BrokerLoadJob and SparkLoadJob.

  4. Users cancel a spark load job through the MySQL client.

CANCEL LOAD WHERE LABEL = 'label0'
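The BulkLoadJob abstraction described above can be pictured roughly as follows. This is a simplified sketch under stated assumptions, not the code in this PR: the `*Sketch` class names, the `submit`/`submitTasks` methods, and the fields shown are all hypothetical, and the real classes carry far more state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the shared base class.
abstract class BulkLoadJobSketch {
    protected final String label;
    protected final List<String> dataFiles = new ArrayList<>();

    protected BulkLoadJobSketch(String label, List<String> dataFiles) {
        this.label = label;
        this.dataFiles.addAll(dataFiles);
    }

    // Shared code path: common validation first, then the type-specific step.
    final String submit() {
        if (dataFiles.isEmpty()) {
            throw new IllegalStateException("no data files for label " + label);
        }
        return submitTasks();
    }

    // Each job type supplies its own way of launching the load.
    protected abstract String submitTasks();
}

final class BrokerLoadJobSketch extends BulkLoadJobSketch {
    BrokerLoadJobSketch(String label, List<String> dataFiles) {
        super(label, dataFiles);
    }

    @Override
    protected String submitTasks() {
        return "broker load " + label + ": " + dataFiles.size() + " file(s)";
    }
}

final class SparkLoadJobSketch extends BulkLoadJobSketch {
    private final String resourceName; // assumed to name the Spark resource

    SparkLoadJobSketch(String label, List<String> dataFiles, String resourceName) {
        super(label, dataFiles);
        this.resourceName = resourceName;
    }

    @Override
    protected String submitTasks() {
        return "spark load " + label + " on resource " + resourceName;
    }
}

final class BulkLoadJobDemo {
    public static void main(String[] args) {
        List<String> files = Arrays.asList("/tmp/file1");
        System.out.println(new BrokerLoadJobSketch("label0", files).submit());
        System.out.println(new SparkLoadJobSketch("label0", files, "resource_name").submit());
    }
}
```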

#3433

@wyb wyb changed the title from "[Spark load] fe create job" to "[Spark load] Fe create job" May 28, 2020
@imay imay added the area/load, kind/feature, and api-review labels May 29, 2020
private static final Logger LOG = LogManager.getLogger(BulkLoadJob.class);

// input params
protected BrokerDesc brokerDesc;
Contributor

I think brokerDesc should be in the subclasses of BulkLoadJob.
Although both broker load and spark load currently need a broker, spark load may not require one in the future.

Contributor Author

After discussion with @morningman, I will improve this later, including persistence with JSON.

private static final Logger LOG = LogManager.getLogger(SparkLoadJob.class);

// for global dict
public static final String BITMAP_DATA_PROPERTY = "bitmap_data";
Contributor

This property is hard to understand and is coupled to the implementation details of the global dict.
How about changing it to a more abstract noun?

Contributor Author

This is for temporary use. I am investigating loading from Hive tables, and I will update it soon.

// TODO(wyb): spark-load
//handler.killEtlJob(sparkAppHandle, appId, id, sparkResource);
} catch (Exception e) {
LOG.warn("kill etl job failed. id: {}, state: {}", id, state, e);
Contributor

Should we save the error message somewhere so the user can retrieve it?

Contributor Author

I don't think it's necessary, because clearing the job is only a best-effort attempt to kill the ETL job.
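A rough sketch of what a best-effort kill of this kind might look like. The `EtlKiller` interface and the `tryKill` method are hypothetical and do not appear in the PR, and `java.util.logging` is used here only to keep the example self-contained (the snippet under review uses a different logger).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of best-effort cleanup, not the PR's implementation.
final class BestEffortCleanup {
    private static final Logger LOG = Logger.getLogger(BestEffortCleanup.class.getName());

    private BestEffortCleanup() {}

    // Assumed stand-in for whatever component actually kills the ETL job.
    interface EtlKiller {
        void kill(String appId) throws Exception;
    }

    // Returns true if the kill call succeeded. A failure is logged and
    // swallowed so that the rest of the job cleanup can still proceed.
    static boolean tryKill(EtlKiller killer, String appId, long jobId) {
        try {
            killer.kill(appId);
            return true;
        } catch (Exception e) {
            LOG.log(Level.WARNING, "kill etl job failed. id: " + jobId + ", appId: " + appId, e);
            return false;
        }
    }

    public static void main(String[] args) {
        // A killer that always fails, to show that cleanup continues anyway.
        boolean killed = tryKill(appId -> {
            throw new IllegalStateException("cluster unreachable");
        }, "app_0001", 42L);
        System.out.println("killed = " + killed + ", cleanup continues");
    }
}
```

Returning a boolean leaves room for a caller to record the failure later, should the error need to be surfaced to users.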

@wyb wyb changed the title from "[Spark load] Fe create job" to "[Spark load][Fe 3/5] Fe create job" May 30, 2020
@wyb wyb force-pushed the spark_load_fe_create_job branch from 0d78b0c to edfa668 Compare June 3, 2020 13:29
Contributor

@morningman morningman left a comment

LGTM

:}
;

opt_cluster ::=
Contributor

Did you just remove this grammar?

Contributor Author

@wyb wyb Jun 9, 2020

Hadoop uses opt_system; opt_cluster is no longer used.

@imay imay added the approved label Jun 9, 2020
@morningman morningman merged commit 4fa9d8c into apache:master Jun 9, 2020
@EmmyMiao87 EmmyMiao87 mentioned this pull request Sep 1, 2020