Latest flinkx master fails to submit jobs in yarn mode #51

Closed
yelijun opened this issue May 9, 2019 · 12 comments

Comments

@yelijun

yelijun commented May 9, 2019

flink version: 1.5.6
flinkx version: GitHub master
Job submission fails.
image

@yangsishu
Contributor

Is there a more detailed error log? This error alone is not enough to troubleshoot.

@yangsishu
Contributor

Also, please post the command you used to submit the job.

@yelijun
Author

yelijun commented May 9, 2019

Submit command: bin/flinkx -mode yarn -job ./config-json/yarn-sale_rate_sale_rate.json -plugin /data/software/flinkx/plugins -flinkconf /data/software/flink-1.5.6/conf -yarnconf /etc/hadoop/conf
nohup.out.log

I have attached the nohup log.

@yelijun
Author

yelijun commented May 9, 2019

15:33:53.346 [flink-akka.actor.default-dispatcher-2] DEBUG akka.remote.transport.netty.NettyTransport - Remote connection to [test-ai-etl-c1-1/10.3.8.49:30736] was disconnected because of [id: 0x1edde4f2, /10.3.8.49:6385 :> test-ai-etl-c1-1/10.3.8.49:30736] DISCONNECTED
15:33:53.349 [flink-akka.actor.default-dispatcher-2] DEBUG akka.remote.transport.ProtocolStateActor - Association between local [tcp://flink@test-ai-etl-c1-1:6385] and remote [tcp://flink@test-ai-etl-c1-1:30736] was disassociated because the ProtocolStateActor failed: Unknown
15:33:53.353 [flink-akka.actor.default-dispatcher-2] WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@test-ai-etl-c1-1:30736] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@test-ai-etl-c1-1:30736]] Caused by: [The remote system explicitly disassociated (reason unknown).]
I suspect the failure is related to these messages, but I don't know how to fix it.
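
For anyone triaging a similar nohup log, here is a minimal sketch (not part of flinkx) that pulls the remote address and gate duration out of akka `ReliableDeliverySupervisor` warnings like the one quoted above. The function name and the regular expression are illustrative assumptions derived from that single sample line, so other akka versions may word the message differently.

```python
import re

# Pattern assumed from the one WARN line quoted in this thread.
ASSOC_FAILED = re.compile(
    r"Association with remote system \[(?P<remote>[^\]]+)\] has failed, "
    r"address is now gated for \[(?P<gate_ms>\d+)\] ms"
)


def find_failed_associations(lines):
    """Return a list of (remote_address, gate_ms) for each matching warning."""
    results = []
    for line in lines:
        match = ASSOC_FAILED.search(line)
        if match:
            results.append((match.group("remote"), int(match.group("gate_ms"))))
    return results


if __name__ == "__main__":
    # Hypothetical usage: scan the attached log file line by line.
    with open("nohup.out.log", encoding="utf-8", errors="replace") as handle:
        for remote, gate_ms in find_failed_associations(handle):
            print(remote, gate_ms)
```

The extracted address only shows which endpoint the client lost contact with; it does not by itself explain why the remote side disassociated.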

@yangsishu
Contributor

Did your Flink yarn session start normally?

@yelijun
Author

yelijun commented May 11, 2019

It started normally, using yarn-session.sh -d.
It is visible in the YARN resource manager.

@lijiangbo
Contributor

Could you post a screenshot of the running yarn session from the YARN UI?

@yelijun
Author

yelijun commented May 11, 2019

image
image
Will this do?

@lijiangbo
Contributor

Did the TaskManager not come up? Please check how the JobManager started.

@yelijun
Author

yelijun commented May 11, 2019

2019-05-09 16:11:45,653 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2019-05-09 16:11:45,654 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered UNIX signal handlers for [TERM, HUP, INT]
2019-05-09 16:11:45,657 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - YARN daemon is running as: root Yarn client user obtainer: root
2019-05-09 16:11:45,659 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.port, 8081
2019-05-09 16:11:45,659 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: internal.cluster.execution-mode, NORMAL
2019-05-09 16:11:45,659 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2019-05-09 16:11:45,660 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.cluster-id, application_1544768545362_1516
2019-05-09 16:11:45,660 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, 10.3.8.49
2019-05-09 16:11:45,660 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024
2019-05-09 16:11:45,660 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2019-05-09 16:11:45,660 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 1024
2019-05-09 16:11:45,660 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2019-05-09 16:11:45,674 INFO org.apache.flink.runtime.clusterframework.BootstrapTools - Setting directories for temporary files to: /data/yarn/nm/usercache/root/appcache/application_1544768545362_1516,/data1/yarn/nm/usercache/root/appcache/application_1544768545362_1516,/data2/yarn/nm/usercache/root/appcache/application_1544768545362_1516,/data3/yarn/nm/usercache/root/appcache/application_1544768545362_1516
2019-05-09 16:11:45,686 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint.
2019-05-09 16:11:45,686 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Install default filesystem.
2019-05-09 16:11:45,736 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to root (auth:SIMPLE)
2019-05-09 16:11:45,750 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Initializing cluster services.
2019-05-09 16:11:45,755 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Trying to start actor system at test-ai-etl-c1-2:20698
2019-05-09 16:11:46,218 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2019-05-09 16:11:46,269 INFO akka.remote.Remoting - Starting remoting
2019-05-09 16:11:46,422 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink@test-ai-etl-c1-2:20698]
2019-05-09 16:11:46,431 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Actor system started at akka.tcp://flink@test-ai-etl-c1-2:20698
2019-05-09 16:11:46,452 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /data3/yarn/nm/usercache/root/appcache/application_1544768545362_1516/blobStore-3e28573d-d50a-4740-9d62-e321a6053cdf
2019-05-09 16:11:46,453 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:1599 - max concurrent requests: 50 - max backlog: 1000
2019-05-09 16:11:46,468 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - No metrics reporter configured, no metrics will be exposed/reported.
2019-05-09 16:11:46,472 INFO org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore - Initializing FileArchivedExecutionGraphStore: Storage directory /data/yarn/nm/usercache/root/appcache/application_1544768545362_1516/executionGraphStore-90531c01-f565-4b1e-8249-5d9004ee0d1c, expiration time 3600000, maximum cache size 52428800 bytes.
2019-05-09 16:11:46,495 INFO org.apache.flink.runtime.blob.TransientBlobCache - Created BLOB cache storage directory /data3/yarn/nm/usercache/root/appcache/application_1544768545362_1516/blobStore-fdba6d30-bd71-443a-8822-d7f4621efb7e
2019-05-09 16:11:46,504 WARN org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Upload directory /tmp/flink-web-e2e9a8e7-e5e7-444d-b571-683487e9fd1f/flink-web-upload does not exist, or has been deleted externally. Previously uploaded files are no longer available.
2019-05-09 16:11:46,504 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Created directory /tmp/flink-web-e2e9a8e7-e5e7-444d-b571-683487e9fd1f/flink-web-upload for file uploads.
2019-05-09 16:11:46,507 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Starting rest endpoint.
2019-05-09 16:11:46,758 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of main cluster component log file: /data2/yarn/container-logs/application_1544768545362_1516/container_1544768545362_1516_01_000001/jobmanager.log
2019-05-09 16:11:46,758 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of main cluster component stdout file: /data2/yarn/container-logs/application_1544768545362_1516/container_1544768545362_1516_01_000001/jobmanager.out
2019-05-09 16:11:46,837 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Rest endpoint listening at test-ai-etl-c1-2:15911
2019-05-09 16:11:46,837 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - http://test-ai-etl-c1-2:15911 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
2019-05-09 16:11:46,837 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Web frontend listening at http://test-ai-etl-c1-2:15911.
2019-05-09 16:11:46,892 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.yarn.YarnResourceManager at akka://flink/user/resourcemanager .
2019-05-09 16:11:46,933 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
2019-05-09 16:11:46,967 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at test-ai-etl-c1-3/10.3.8.48:8030
2019-05-09 16:11:47,199 INFO org.apache.flink.yarn.YarnResourceManager - Recovered 0 containers from previous attempts ([]).
2019-05-09 16:11:47,201 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - yarn.client.max-cached-nodemanagers-proxies : 0
2019-05-09 16:11:47,203 INFO org.apache.flink.yarn.YarnResourceManager - ResourceManager akka.tcp://flink@test-ai-etl-c1-2:20698/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000
2019-05-09 16:11:47,203 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Starting the SlotManager.
2019-05-09 16:11:47,216 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Dispatcher akka.tcp://flink@test-ai-etl-c1-2:20698/user/dispatcher was granted leadership with fencing token 00000000-0000-0000-0000-000000000000
2019-05-09 16:11:47,218 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Recovering all persisted jobs.
2019-05-10 14:22:29,706 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Updating with new AMRMToken
2019-05-10 14:22:34,709 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Updating with new AMRMToken
2019-05-10 14:22:39,710 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Updating with new AMRMToken
2019-05-10 14:22:44,712 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Updating with new AMRMToken
2019-05-10 14:22:49,713 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Updating with new AMRMToken
2019-05-10 14:22:54,715 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Updating with new AMRMToken

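It has always looked like this for me: when no job has been submitted the TM count is 0, and once a job is submitted the corresponding number of TMs appears.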
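
As a rough aid for reading a JobManager log like the one above, the following sketch (an illustration, not a flinkx or Flink utility) collects the `Loading configuration property: key, value` lines into a dictionary. The function name and regular expression are assumptions based on the log format shown in this comment. One possible use is comparing the values the session actually loaded against the files in the directory passed via `-flinkconf`.

```python
import re

# Pattern assumed from the GlobalConfiguration lines in the log above.
CONFIG_LINE = re.compile(
    r"Loading configuration property: (?P<key>[^,\s]+), (?P<value>.*)$"
)


def loaded_properties(lines):
    """Map each configuration key reported in the log to its value."""
    props = {}
    for line in lines:
        match = CONFIG_LINE.search(line.rstrip("\r\n"))
        if match:
            props[match.group("key")] = match.group("value").strip()
    return props


if __name__ == "__main__":
    # Hypothetical usage: jobmanager.log is the file named in the log above.
    with open("jobmanager.log", encoding="utf-8", errors="replace") as handle:
        for key, value in sorted(loaded_properties(handle).items()):
            print(f"{key} = {value}")
```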

@lijiangbo
Contributor

I see you are running Flink 1.5.6. The master branch is based on Flink 1.5.4 and has not been tested against 1.5.6.

@yelijun
Author

yelijun commented May 11, 2019

Yang Sishu said that any 1.5.x version should work.
In that case I will try Flink 1.5.4.
