[Spark load][Fe 5/6] Fe submit spark etl job #3716
Conversation
public class Pair<F, S> {
    public static PairComparator<Pair<?, Comparable>> PAIR_VALUE_COMPARATOR = new PairComparator<>();

    @SerializedName(value = "first")
I'm not sure this is OK, because there is no guarantee that the F and S objects can also be serialized by Gson.
Users guarantee this when they use the Pair class, just like with Map and List?
I checked; this does not work.
I added a comment: when using Pair for persistence, users need to guarantee that F and S can be serialized through Gson.
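The constraint in that comment can be illustrated with a minimal sketch (not the actual Doris class): the fields are tagged for Gson persistence, so serialization only works when the concrete F and S types map cleanly to JSON.

```java
import com.google.gson.Gson;
import com.google.gson.annotations.SerializedName;

// Minimal sketch, not the actual Doris Pair class: fields are annotated for
// Gson persistence, so F and S must themselves be Gson-serializable.
public class PairDemo {
    static class Pair<F, S> {
        @SerializedName(value = "first")
        F first;
        @SerializedName(value = "second")
        S second;

        Pair(F first, S second) {
            this.first = first;
            this.second = second;
        }
    }

    public static void main(String[] args) {
        // Works because String and Long map cleanly to JSON values.
        String json = new Gson().toJson(new Pair<>("label", 42L));
        System.out.println(json); // {"first":"label","second":42}
    }
}
```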
private static final String CONFIG_FILE_NAME = "jobconfig.json";
private static final String APP_RESOURCE_LOCAL_PATH = PaloFe.DORIS_HOME_DIR + "/lib/" + APP_RESOURCE_NAME;
private static final String JOB_CONFIG_DIR = "configs";
private static final String MAIN_CLASS = "org.apache.doris.load.loadv2.etl.SparkEtlJob";
How about getting it from SparkEtlJob.class.getXXX()?
OK, I added a comment and will replace it once the SparkEtlJob class is merged.
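The suggestion could look like the sketch below (SparkEtlJob is stubbed here for illustration). Deriving the string from the class object turns a rename into a compile-time error instead of a silently stale constant.

```java
// Sketch of the reviewer's suggestion: derive MAIN_CLASS from the class
// object instead of a hard-coded string literal. SparkEtlJob is a stand-in
// stub for org.apache.doris.load.loadv2.etl.SparkEtlJob.
public class MainClassDemo {
    static class SparkEtlJob {}

    // With the real class this would yield
    // "org.apache.doris.load.loadv2.etl.SparkEtlJob".
    static final String MAIN_CLASS = SparkEtlJob.class.getName();

    public static void main(String[] args) {
        System.out.println(MAIN_CLASS);
    }
}
```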
    throw new LoadException(errMsg + "spark app state: " + state.toString());
}
if (retry >= GET_APPID_MAX_RETRY_TIMES) {
    throw new LoadException(errMsg + "wait too much time for getting appid. spark app state: "
Suggested change:
- throw new LoadException(errMsg + "wait too much time for getting appid. spark app state: "
+ throw new LoadException(errMsg + " wait too much time for getting appid. spark app state: "
errMsg already has a space at the end.
Force-pushed from 6d0ef4e to 20df6f2
        + ", msg=" + tReadResponse.getOpStatus().getMessage());
}
failed = false;
return tReadResponse.getData();
Currently, the broker's pread() method does not guarantee that it reads the specified length of data. But #3881 is trying to solve this problem. Just a reminder.
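Until that lands, a caller can defend against short reads with a fill loop. This is a generic sketch over java.io, not the broker client API: it keeps reading until the buffer is full or EOF.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Generic sketch of handling a pread()-style call that may return fewer
// bytes than requested: loop until the buffer is full or EOF is reached.
public class FullReadDemo {
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // EOF before the requested length was read
            }
            total += n;
        }
        return total; // may be < buf.length only at EOF
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("hello broker".getBytes());
        byte[] buf = new byte[5];
        System.out.println(readFully(in, buf)); // 5
    }
}
```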
ok
morningman
left a comment
LGTM
imay
left a comment
LGTM
Force-pushed from ee27b34 to efcc796
morningman
left a comment
LGTM
After a user creates a spark load job whose status is PENDING, the FE will schedule and submit the spark etl job:
2.1 Create etl job configuration according to Spark load interface #3010 (comment)
2.2 Upload the configuration file and job jar to HDFS with broker
2.3 Submit etl job to spark cluster
2.4 Wait for etl job submission result
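Step 2.4 above amounts to a poll-with-retry loop: after submission, the FE waits for the Spark application id to appear. The sketch below mirrors the GET_APPID_MAX_RETRY_TIMES constant from the reviewed snippet, but pollAppId() and the retry count are hypothetical stand-ins, not the actual Doris code.

```java
// Hypothetical sketch of step 2.4: poll for the Spark application id until
// it appears or a retry limit is exceeded. pollAppId() simulates querying
// the real spark launcher handle.
public class WaitAppIdDemo {
    static final int GET_APPID_MAX_RETRY_TIMES = 3;

    // Simulated handle: the appid only becomes visible on the third poll.
    static String pollAppId(int attempt) {
        return attempt >= 2 ? "application_1234_0001" : null;
    }

    static String waitForAppId() throws Exception {
        String errMsg = "spark etl job submit failed. ";
        for (int retry = 0; retry < GET_APPID_MAX_RETRY_TIMES; retry++) {
            String appId = pollAppId(retry);
            if (appId != null) {
                return appId;
            }
            Thread.sleep(10); // back off before polling again
        }
        throw new Exception(errMsg + "wait too much time for getting appid");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(waitForAppId()); // application_1234_0001
    }
}
```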
#3433