Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solution For Connectivity Problem When Submitting Work From Master Node To Worker(s) #8

Open
Thelin90 opened this issue Sep 12, 2020 · 9 comments

Comments

@Thelin90
Copy link

Thelin90 commented Sep 12, 2020

I was inspired by this repository, and continue to build on it.

However, I also got the issue faced here: #1

I was getting:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I bashed my head around this for 2 nights, not being an expert in K8S I first thought something was wrong with how I started it up.

Either way, this is how I reproduced the problem:

1)

I checked my resources, and I made the following config:

spark-defaults.conf

spark.master spark://sparkmaster:7077
spark.driver.host sparkmaster
spark.driver.bindAddress sparkmaster
spark.executor.cores 1
spark.executor.memory 512m
spark.driver.extraLibraryPath /opt/hadoop/lib/native
spark.app.id KubernetesSpark

2)

And I ran minikube with:

minikube start --memory 8192 --cpus 4 --vm=true

3)

These were my spark-master and spark-worker scripts:

spark-worker.sh

#!/bin/bash

. /common.sh

getent hosts sparkmaster

if ! getent hosts sparkmaster; then
  sleep 5
  exit 0
fi

/usr/local/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://sparkmaster:7077 --webui-port 8081 --memory 2g

### Note I put 2g here just to be 100% confident I was not using to much resources.

spark-worker.sh

#!/bin/bash

. /common.sh

echo "$(hostname -i) sparkmaster" >> /etc/hosts

/usr/local/spark/bin/spark-class org.apache.spark.deploy.master.Master --host sparkmaster --port 7077 --webui-port 8080

4)

I then ran:

kubectl exec <master-pod-name> -it -- pyspark
>>>
sc.parallelize([1,2,3,4]).collect()
>>>
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

And error occurred!

I made sure to get access to 8081 and 4040 to investigate logs further:

kubectl port-forward <spark-worker-pod> 8081:8081
kubectl port-forward <spark-master-pod> 4040:4040

I then went in and:

http://localhost:8081/ --> Find my executor --> stderr (`http://localhost:8081/logPage/?appId=<APP-ID>&executorId=<EXECUTOR-ID>&logType=stderr`) ->

5)

I scratched my head, and I knew! I have enough resources, why does this not work!

And I could see:

Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: sparkmaster/10.101.97.213:41607
Caused by: java.net.ConnectException: Connection timed out

I then thought well, I done this right:

spark.driver.host sparkmaster
spark.driver.bindAddress sparkmaster

The docs mention that it can be either HOST or IP, I am good I thought. I saw the possible solution of:

sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy

Well this was not a problem for me, actually I had no iptables to resolve at all.

So I then verified the master IP with:

kubectl get pods -o wide

I then took the MASTER-IP and added it directly:

pyspark --conf spark.driver.bindAddress=<MASTER-POD-IP> --conf spark.driver.host=<MASTER-POD-IP>
>>> ....

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.0.0
      /_/

Using Python version 3.7.9 (default, Sep 10 2020 17:42:58)
SparkSession available as 'spark'.
>>> sc.parallelize([1,2,3,4,5,6]).collect()
[1, 2, 3, 4, 5, 6] <---- BOOOOOOM!!!!!!!!!!!!!!!

6)

SOLUTION:

spark-defaults.conf

spark.master spark://sparkmaster:7077
spark.executor.cores 1
spark.executor.memory 512m
spark.driver.extraLibraryPath /opt/hadoop/lib/native
spark.app.id KubernetesSpark

And add the IPs correctly:

spark-worker.sh

#!/bin/bash

. /common.sh

echo "$(hostname -i) sparkmaster" >> /etc/hosts

# We must set the IP address to the executors of the master pod, othewerwise we will get the error
# inside the worker trying to connect to master:
#
# 20/09/12 15:56:55 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your
# cluster UI to ensure that workers are registered and have sufficient resources
#
# When investigating the worker we can see:
# Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: s
#   parkmaster/10.101.97.213:34881
# Caused by: java.net.ConnectException: Connection timed out
#
# This means that when the spark-class ran, it was able to create the connection at init stage, but
# when pushing the spark-submit, it failed.
echo "spark.driver.host $(hostname -i)" >> /usr/local/spark/conf/spark-defaults.conf
echo "spark.driver.bindAddress $(hostname -i)" >> /usr/local/spark/conf/spark-defaults.conf

/usr/local/spark/bin/spark-class org.apache.spark.deploy.master.Master --host sparkmaster --port 7077 --webui-port 8080

In this case my SPARK_HOME is /usr/local/spark

My Dockerfile

FROM python:3.7-slim-stretch

# PATH
ENV PATH /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# Spark
ENV SPARK_VERSION 3.0.0
ENV SPARK_HOME /usr/local/spark
ENV SPARK_LOG_DIR /var/log/spark
ENV SPARK_PID_DIR /var/run/spark
ENV PYSPARK_PYTHON /usr/local/bin/python
ENV PYSPARK_DRIVER_PYTHON /usr/local/bin/python
ENV PYTHONUNBUFFERED 1
ENV HADOOP_COMMON org.apache.hadoop:hadoop-common:2.7.7
ENV HADOOP_AWS org.apache.hadoop:hadoop-aws:2.7.7
ENV SPARK_MASTER_HOST sparkmaster

# Java
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/

# Install curl
RUN apt-get update && apt-get install -y curl

# Install procps
RUN apt-get install -y procps

# Install coreutils
RUN apt-get install -y coreutils

# https://github.com/geerlingguy/ansible-role-java/issues/64
RUN apt-get update && mkdir -p /usr/share/man/man1 && apt-get install -y openjdk-8-jdk && \
    apt-get install -y ant && apt-get clean && rm -rf /var/lib/apt/lists/ && \
    rm -rf /var/cache/oracle-jdk8-installer;

# Download Spark, enables full functionality for spark-submit against docker container
RUN curl http://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz | \
        tar -zx -C /usr/local/ && \
        ln -s spark-${SPARK_VERSION}-bin-hadoop2.7 ${SPARK_HOME}

# add scripts and update spark default config
ADD tools/docker/spark/common.sh tools/docker/spark/spark-master.sh tools/docker/spark/spark-worker.sh /
ADD tools/docker/spark/example_spark.py /

RUN chmod +x /common.sh /spark-master.sh /spark-worker.sh

ADD tools/docker/spark/spark-defaults.conf ${SPARK_HOME}/conf/spark-defaults.conf
ENV PATH $PATH:${SPARK_HOME}/bin

Currently bulding a streaming platform in this repo:

https://github.com/Thelin90/deiteo

@leehuwuj
Copy link

@Thelin90 thank you so much, your 2 nights save my first night :D

@leehuwuj
Copy link

@Thelin90 When I config as your, I can submit task from inside master pod but still have "Initial job has not accepted any resources" issue when submit task from other client pods.

@Thelin90
Copy link
Author

Thelin90 commented Nov 26, 2020

@leehuwuj it might be that you are not adding the IP correctly, make sure to check these steps again:

kubectl get pods -o wide

Go inside the pod as I describe above:

kubectl exec -it <pod> -n <namespace> -- /bin/bash

Take the IP, and run within the pod as I describe above:

pyspark --conf spark.driver.bindAddress=<MASTER-POD-IP> --conf spark.driver.host=<MASTER-POD-IP>

And then run:

sc.parallelize([1,2,3,4,5,6]).collect()

That should work.

Alternatively, I recommend you fork my repository:

https://github.com/Thelin90/deiteo

And try to run it with my instructions there, you might find if you done something different from me there.

I have automated the whole process with Makefile so should just be you pressing enter more or less, and if that does not work you can make an issue in the repo.

@Thelin90
Copy link
Author

@leehuwuj also are you running on Ubuntu or Mac?

@leehuwuj
Copy link

leehuwuj commented Nov 26, 2020

@Thelin90 I'm running on Mac. Run Submit command inside master node is OK but I can not run submit task from other pod.

@Thelin90
Copy link
Author

@leehuwuj I think you will need to be more specific with your issue, similar to what I have done here, I have described my problem in very great detail, and how i solved it, if you have a new problem you must try to put down step by step, what you are doing, and what is going wrong. I can't guess based on what you have written unfortunately.

But, I would recommend you to have a look in my own repo and try that, if that works for you, you might find something you are not doing right in your own code.

@Thelin90
Copy link
Author

Dear Mr. I just learned K8s not long ago. In my case, I need to process data in hdfs, so I may need to build a hadoop+spark cluster. But I heard that hadoop(hdfs) is not suitable for running on Kubernets, do you have any insights?

------------------ 原始邮件 ------------------ 发件人: "testdrivenio/spark-kubernetes" <notifications@github.com>; 发送时间: 2020年11月30日(星期一) 晚上10:22 收件人: "testdrivenio/spark-kubernetes"<spark-kubernetes@noreply.github.com>; 抄送: "tbabm"<1454088456@qq.com>;"Comment"<comment@noreply.github.com>; 主题: Re: [testdrivenio/spark-kubernetes] Solution For Connectivity Problem When Submitting Work From Master Node To Worker(s) (#8) Hello, sir. Excuse me, have you tried Kubernetes+hdfs+spark? @Thelin90 No since the only reason to use HDFS would be if you need to run operations on disk, hence ~10 TB+, which I never had to deal with, so no. In memory has worked just fine so far for my case. — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

Please stick to the context of this issue, your question has nothing to do with what is being discussed here.

@ashu2012
Copy link

ashu2012 commented Jan 7, 2021

spark-shell is fine from inside pod but spark submit for PI example is not succesfull causing some websocket closed connection issue

` spark-submit --name sparkpi-1 \

--master k8s://http://spark-master:7077
--deploy-mode cluster
--conf spark.kubernetes.driver.pod.name=$DRIVER_NAME
--conf spark.kubernetes.container.image=$DOCKER_IMAGE
--conf spark.kubernetes.container.image.pullPolicy=Never
--conf spark.driver.host=172.17.0.6
--conf spark.kubernetes.kerberos.enabled=false
--class org.apache.spark.examples.SparkPi
--conf spark.executor.instances=2
local:///opt/spark-3.0.0-bin-hadoop2.7/examples/jars/spark-examples_2.12-3.0.0.jar 10000000
21/01/06 15:32:20 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
21/01/06 15:32:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/01/06 15:32:21 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
21/01/06 15:32:21 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
21/01/06 15:32:21 WARN WatchConnectionManager: Exec Failure
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:209)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at okio.Okio$2.read(Okio.java:140)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:354)
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:226)
at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:215)
at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:134)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:109)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "kubernetes-dispatcher-0" Exception in thread "main" java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@6cd5d189 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@25cdb1c6[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326)
at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533)
at java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:632)
at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:678)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.scheduleReconnect(WatchConnectionManager.java:305)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$800(WatchConnectionManager.java:50)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onFailure(WatchConnectionManager.java:218)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:221)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:211)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onFailure(WatchConnectionManager.java:209)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:221)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:211)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Suppressed: java.lang.Throwable: waiting here
at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:144)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:341)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:755)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:739)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:70)
at org.apache.spark.deploy.k8s.submit.Client.$anonfun$run$1(KubernetesClientApplication.scala:129)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2538)
at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:129)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:221)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:215)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:215)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:188)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:209)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at okio.Okio$2.read(Okio.java:140)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:354)
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:226)
at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:215)
at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:134)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:109)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
... 4 more
21/01/06 15:32:21 INFO ShutdownHookManager: Shutdown hook called
21/01/06 15:32:21 INFO ShutdownHookManager: Deleting directory /tmp/spark-caec2177-42af-4c26-830a-a2a720e0da43
`

@saadzimat430
Copy link

saadzimat430 commented Apr 12, 2021

@leehuwuj it might be that you are not adding the IP correctly, make sure to check these steps again:

kubectl get pods -o wide

Go inside the pod as I describe above:

kubectl exec -it <pod> -n <namespace> -- /bin/bash

Take the IP, and run within the pod as I describe above:

pyspark --conf spark.driver.bindAddress=<MASTER-POD-IP> --conf spark.driver.host=<MASTER-POD-IP>

And then run:

sc.parallelize([1,2,3,4,5,6]).collect()

That should work.

Alternatively, I recommend you fork my repository:

https://github.com/Thelin90/deiteo

And try to run it with my instructions there, you might find if you done something different from me there.

I have automated the whole process with Makefile so should just be you pressing enter more or less, and if that does not work you can make an issue in the repo.

This also worked.
MASTER-POD is the Spark master pod name
MASTER-IP is the Spark master IP found in kubectl get pods -o wide

kubectl exec <MASTER-POD> -it -- pyspark --conf spark.driver.bindAddress=<MASTER-IP> --conf spark.driver.host=<MASTER-IP>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants