Minerkasch/spark #2407
Conversation
'totalDuration': ('spark.executor.total_duration', HISTOGRAM),
'totalInputBytes': ('spark.executor.total_input_bytes', HISTOGRAM),
'totalShuffleRead': ('spark.executor.total_shuffle_read', HISTOGRAM),
'totalShuffleWrite': ('spark.executor.HISTOGRAM', HISTOGRAM),
typo in the metric name here
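Judging by the naming pattern of the neighboring entries, the intended mapping is presumably the one below. This is only a sketch, with the HISTOGRAM constant stubbed in so the snippet stands on its own:

```python
# Presumed correction, following the naming pattern of the other executor
# metrics. HISTOGRAM is stubbed here only to keep the snippet self-contained;
# in the check it is whatever metric-type constant the mapping already uses.
HISTOGRAM = 'histogram'

SPARK_EXECUTOR_METRICS_FIXED = {
    'totalShuffleRead': ('spark.executor.total_shuffle_read', HISTOGRAM),
    'totalShuffleWrite': ('spark.executor.total_shuffle_write', HISTOGRAM),
}
```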
Thanks @zachradtka! I've added a few comments here and there, but overall the check looks good! Once you've addressed them, we'll test the check on our own Spark setup; I may have more feedback for you then.
@olivielpeau, Thanks for the change suggestions! All have been rectified.
Instead of: … this should be called: …
@buryat: Thanks for the feedback. What version of Spark and what version of YARN are you using? The URL that you gave me is a bit interesting, because it uses … Also, could you possibly run Zeppelin again and give me the output of …? With that I should be able to fix the errors you are seeing. Thanks!
@zachradtka yeah, I'm sorry, I made a typo when I was redacting the hostname. I've corrected my original comment. I use AWS EMR-4.3.0, which runs Hadoop 2.7.1 and Spark 1.6.0.
@buryat Thanks for the response and for the output. I updated the integration to get the Spark application IDs from the Spark REST interface. That should fix the problems you were seeing before. I also updated the unit tests to reflect my code changes. Please let me know how it goes!
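For context, here is a minimal sketch (not the check's actual code) of how running application IDs can be read from Spark's REST interface; the base URL in the usage note is a made-up example:

```python
import requests

def get_spark_app_ids(spark_base_url):
    # Spark's monitoring REST API lists applications under /api/v1/applications;
    # each entry in the returned JSON list carries an 'id' field.
    response = requests.get('%s/api/v1/applications' % spark_base_url, timeout=5)
    response.raise_for_status()
    return [app['id'] for app in response.json()]

# Hypothetical usage against a local Spark UI or history server:
# print(get_spark_app_ids('http://localhost:4040'))
```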
@olivielpeau
I added the following metrics to easily get the job, stage, and executor counts: …
I also added the ability to add custom tags in the YAML file and required users to enter a … Let me know if there are any more requests.
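As a rough illustration (not the PR's actual code), the instance-level custom tags and the cluster name from the YAML config could be folded into the tags sent with each metric along these lines; the function name and the cluster_name tag key are assumptions:

```python
def build_instance_tags(instance):
    # `instance` is one parsed entry from the `instances:` section of the
    # YAML config. Custom tags are optional; the cluster name, if present,
    # is appended as one more tag.
    tags = list(instance.get('tags', []))
    cluster_name = instance.get('cluster_name')
    if cluster_name:
        tags.append('cluster_name:%s' % cluster_name)
    return tags

# Example:
# build_instance_tags({'resourcemanager_uri': 'http://localhost:8088',
#                      'cluster_name': 'MySparkCluster',
#                      'tags': ['env:dev']})
# -> ['env:dev', 'cluster_name:MySparkCluster']
```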
self.service_check(SPARK_SERVICE_CHECK,
                   AgentCheck.OK,
                   tags=['url:%s' % am_address],
                   message='Connection to ApplicationMaster "%s" was successful' % rm_address)
You probably want am_address instead of rm_address here.
I added the …
When I specify …

Config file:

init_config:

instances:
  - resourcemanager_uri: http://ip-10-154-242-131.ec2.internal:8088
  - cluster_name: MySparkCluster
I just checked my setup and I was able to replicate your error and fix it by removing the leading - before cluster_name. Try the following config:

init_config:

instances:
  - resourcemanager_uri: http://ip-10-154-242-131.ec2.internal:8088
    cluster_name: MySparkCluster

I am not the best at YAML, but I believe the leading - starts a new list item, so cluster_name was being read as a separate instance.
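A quick way to see the difference, as a standalone sketch (assuming PyYAML is available; the hostname is shortened to localhost): with the leading -, cluster_name is parsed as a second, separate instance rather than as a key of the first one.

```python
import yaml  # PyYAML

broken = yaml.safe_load("""
instances:
  - resourcemanager_uri: http://localhost:8088
  - cluster_name: MySparkCluster
""")

fixed = yaml.safe_load("""
instances:
  - resourcemanager_uri: http://localhost:8088
    cluster_name: MySparkCluster
""")

# Two separate instance dicts, the second one missing resourcemanager_uri:
print(broken['instances'])
# [{'resourcemanager_uri': 'http://localhost:8088'}, {'cluster_name': 'MySparkCluster'}]

# One instance dict carrying both keys:
print(fixed['instances'])
# [{'resourcemanager_uri': 'http://localhost:8088', 'cluster_name': 'MySparkCluster'}]
```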
@zachradtka oh, I'm so sorry, that was completely my mistake.
@buryat has some additional feedback on the … To address this, the check could send … @zachradtka, what do you think?
@olivielpeau It would probably be best to tag the metric with the state. I am not sure what complexities that will add. I will look into it though.
- resourcemanager_uri: http://localhost:8088

# An optional friendly name can be specified for the cluster.
# cluster_name: MySparkCluster
nitpick: could you add two spaces of indentation on these two lines (given that they're at the same level as the tags, i.e. at the instance level)?
@olivielpeau I made a few updates:
self._set_metrics_from_json(tags, job, SPARK_JOB_METRICS)

if len(response):
    self._set_metric('spark.job.count', INCREMENT, len(response), tags)
The tags that are passed here are the tags of the last job of the loop above, so the status tag won't be accurate.

Instead we should either:

1. increment this count by 1 for every job, and with the status tag (in order to have a count per status)
2. increment this count by len(response) only once per app, and without the status tag (in order to have only a global count for all statuses)

I would go with option 1 since having a count per status sounds useful to me, but let me know what you think.

This applies to the spark.stage.count metric too.

Also, if possible it'd be good to have tests that reflect how different statuses on the jobs and stages should be counted.
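As a standalone sketch of option 1 (not the check's actual code: the function name and the status tag format below are illustrative), the increments could be grouped per status before being submitted as spark.job.count:

```python
from collections import defaultdict

def per_status_job_counts(jobs_response, base_tags):
    # `jobs_response` is assumed to be the parsed JSON list returned by the
    # Spark REST API's /applications/<app-id>/jobs endpoint, where each job
    # carries a 'status' field (e.g. SUCCEEDED, RUNNING, FAILED).
    counts = defaultdict(int)
    for job in jobs_response:
        status = str(job.get('status', 'UNKNOWN')).lower()
        counts[tuple(base_tags + ['status:%s' % status])] += 1
    return counts

# Each entry would then be sent as one spark.job.count increment, e.g.:
# for tags, count in per_status_job_counts(jobs, ['cluster_name:MySparkCluster']).items():
#     print('spark.job.count', count, list(tags))
```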
@zachradtka I've added one comment on your latest changes.
@olivielpeau Thanks for the feedback; I should have caught the count with the incorrect tag. That has been fixed and tests have been added.
@zachradtka Looks good to me, thanks! Could you squash your commits into one?
224971b to 2f96b4a
@olivielpeau Commits squashed!
Merging! 🎉
An AgentCheck that gathers metrics for Apache Spark
The metrics collected are: