
Align the scaling strategy of the Ansible-based setup with the other setups #201

Merged — julienrf merged 3 commits into scylladb:master from align-ansible-setup on Aug 27, 2024

Conversation

julienrf (Collaborator) commented on Aug 22, 2024:

Fixes #192.

julienrf (Collaborator, Author) commented:
I tested these changes by creating two Ubuntu containers running an SSH server, plus a local DynamoDB instance, using the following Compose file:

```yaml
services:
  master:
    build: dockerfiles/ansible
  worker:
    build: dockerfiles/ansible

  dynamodb:
    command: "-jar DynamoDBLocal.jar -sharedDb -inMemory"
    image: "amazon/dynamodb-local:latest"
    expose:
      - 8000
    ports:
      - "8000:8000"
    working_dir: /home/dynamodblocal
```

where `dockerfiles/ansible/Dockerfile` is the following:

```dockerfile
FROM ubuntu

RUN apt-get update && apt-get install -y openssh-server sudo software-properties-common iproute2

RUN mkdir /var/run/sshd

# Create an "ubuntu" user with password "aaaaaa" and passwordless sudo
RUN useradd -ms /bin/bash ubuntu \
    && echo 'ubuntu:aaaaaa' | chpasswd \
    && adduser ubuntu sudo \
    && echo "ubuntu ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/ubuntu

# Allow password-based SSH logins
RUN echo "PasswordAuthentication yes" >> /etc/ssh/sshd_config \
    && echo "PermitRootLogin yes" >> /etc/ssh/sshd_config

EXPOSE 22

CMD ["/usr/sbin/sshd", "-D"]
```
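As a sketch, the whole environment can then be brought up as follows (the Compose project name `scylla-migrator` is an assumption, inferred from the container names used in the `docker inspect` commands below):

```sh
# Build the images and start the master, worker, and DynamoDB containers
docker compose -p scylla-migrator up -d --build
```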

I noted the IP addresses of the Spark master and worker nodes with the following commands:

```sh
docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' scylla-migrator-worker-1
docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' scylla-migrator-master-1
```

I then used those IP addresses in the Ansible inventory.
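For illustration, a minimal inventory could look like the following sketch (the group names and IP addresses are hypothetical; the credentials come from the Dockerfile above):

```ini
# Hypothetical inventory file; replace the IPs with the ones reported by docker inspect
[spark_master]
172.18.0.2 ansible_user=ubuntu ansible_password=aaaaaa

[spark_worker]
172.18.0.3 ansible_user=ubuntu ansible_password=aaaaaa
```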

Then I ran `ansible-playbook scylla-migrator.yml` to set up the Migrator on both the Spark master and worker nodes.
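For example (the inventory file path is an assumption):

```sh
ansible-playbook -i inventory/hosts.ini scylla-migrator.yml
```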

Afterwards, I opened a terminal on both nodes to run `start-spark.sh` and `start-slave.sh`.
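A sketch of that step, assuming the playbook installs the scripts into the `ubuntu` user's home directory (the actual locations depend on the playbook):

```sh
# Run the startup scripts as the ubuntu user on each container
docker compose -p scylla-migrator exec master su - ubuntu -c './start-spark.sh'
docker compose -p scylla-migrator exec worker su - ubuntu -c './start-slave.sh'
```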

I created a DynamoDB table and put an item in it. Then, I edited the file `dynamodb.config.yml` to configure a migration from this table. Finally, I executed the migration with `submit-alternator-job.sh`.
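For example, a table and an item can be created against the local DynamoDB instance with the AWS CLI (the table name and item are illustrative):

```sh
aws dynamodb create-table \
  --endpoint-url http://localhost:8000 \
  --table-name TestTable \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST

aws dynamodb put-item \
  --endpoint-url http://localhost:8000 \
  --table-name TestTable \
  --item '{"id": {"S": "1"}}'
```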

guy9 (Collaborator) commented on Aug 25, 2024:

@pdbossman please review

- Use `{start,stop}-worker.sh` instead of the deprecated `{start,stop}-slave.sh`
- Use `{start,stop}-mesos-shuffle-service.sh` instead of `{start,stop}-shuffle-service.sh`

According to the Spark documentation, the configuration property `spark.driver.memory` has no effect in our case (we use the “client” deploy mode):

> In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-memory command line option or in your default properties file.

https://spark.apache.org/docs/latest/configuration.html
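A sketch of the two alternatives the quote mentions (the `4g` value is illustrative):

```sh
# Option 1: pass the driver memory on the spark-submit command line
spark-submit --driver-memory 4g ...

# Option 2: set it in conf/spark-defaults.conf instead:
#   spark.driver.memory 4g
```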
julienrf force-pushed the align-ansible-setup branch from c0a2e06 to 84f8e9b on August 27, 2024.
julienrf marked this pull request as ready for review.
julienrf merged commit 04aa85c into scylladb:master on Aug 27, 2024 (3 checks passed).
julienrf deleted the align-ansible-setup branch.
Development

Successfully merging this pull request may close these issues:

- The Ansible-based approach does not document precisely how to configure the Spark cluster (#192)