
Move the documentation to the website and complete it #166

Merged · 9 commits · Jul 14, 2024
Changes from 7 commits
131 changes: 9 additions & 122 deletions README.md
@@ -1,129 +1,12 @@
# Ansible deployment
# ScyllaDB Migrator

An Ansible playbook is provided in the `ansible` folder. The playbook installs the prerequisites and Spark on the master and workers listed in the `ansible/inventory/hosts` file. The Scylla Migrator is installed on the Spark master node.
1. Update the `ansible/inventory/hosts` file with the master and worker instances.
2. Update `ansible/ansible.cfg` with the location of your private key, if necessary.
3. The `ansible/templates/spark-env-master-sample` and `ansible/templates/spark-env-worker-sample` files contain the environment variables that determine the number of workers, CPUs per worker, and memory allocations, as well as considerations for setting them.
4. Run `ansible-playbook scylla-migrator.yml`.
5. On the Spark master node:
   `cd scylla-migrator`
   `./start-spark.sh`
6. On the Spark worker nodes:
   `./start-slave.sh`
7. Open the Spark web console:
   - Ensure networking is configured to allow access to the Spark master node on ports 8080 and 4040.
   - Visit http://<spark-master-hostname>:8080
8. Review and modify `config.yaml` based on whether you're performing a migration to CQL or Alternator:
   - If you're migrating to the Scylla CQL interface (from Cassandra, Scylla, or another CQL source), make a copy of `config.yaml.example`, review the comments in it, and edit as directed.
   - If you're migrating to Alternator (from DynamoDB or another Scylla Alternator instance), make a copy of `config.dynamodb.yml`, review the comments in it, and edit as directed.
9. As part of the Ansible deployment, sample submit jobs were created. You may edit and use them:
   - For a CQL migration: edit `scylla-migrator/submit-cql-job.sh` and change the line `--conf spark.scylla.config=config.yaml \` to point to whatever you named the configuration file in the previous step.
   - For an Alternator migration: edit `scylla-migrator/submit-alternator-job.sh` and change the line `--conf spark.scylla.config=/home/ubuntu/scylla-migrator/config.dynamodb.yml \` to reference the configuration file you created and modified in the previous step.
10. Ensure the table has been created in the target environment.
11. Submit the migration by running the appropriate job:
- CQL migration: `./submit-cql-job.sh`
- Alternator migration: `./submit-alternator-job.sh`
12. You can monitor progress by observing the Spark web console you opened in step 7. Additionally, after the job has started, you can track progress via http://<spark-master-hostname>:4040.
Note: when no Spark jobs are actively running, the Spark progress page at port 4040 is unavailable; it only renders while a job is in progress.
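Put together, the happy path on the Spark master node boils down to the sketch below. This is only an illustration of steps 5–11 for a CQL migration; the working directory and the copy of `config.yaml.example` are assumptions, not something the playbook enforces.

```shell
cd scylla-migrator
./start-spark.sh                      # step 5: start the Spark master (run ./start-slave.sh on each worker, step 6)
cp config.yaml.example config.yaml    # step 8: start from the sample CQL config and edit it
./submit-cql-job.sh                   # step 11: submit the CQL migration job
```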
The ScyllaDB Migrator is a Spark application that migrates data to ScyllaDB from CQL-compatible or DynamoDB-compatible databases.

# Configuring the Migrator
## Documentation

Create a `config.yaml` for your migration using the template `config.yaml.example` in the repository root. Read the comments throughout carefully.
See https://migrator.docs.scylladb.com.

# Running on a live Spark cluster

The Scylla Migrator is built against Spark 3.5.1, so you'll need to run that version on your cluster.

Download the latest [release](https://github.com/scylladb/scylla-migrator/releases) of the migrator:

~~~ sh
wget https://github.com/scylladb/scylla-migrator/releases/latest/download/scylla-migrator-assembly.jar
~~~

Alternatively, you can [build](#building) a custom version of the migrator.

Copy the jar `scylla-migrator-assembly.jar` and the `config.yaml` you've created to the Spark master server.
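One hypothetical way to copy them over (the user, host, and destination path are placeholders to adapt to your environment):

```shell
scp scylla-migrator-assembly.jar config.yaml ubuntu@<spark-master-hostname>:/home/ubuntu/scylla-migrator/
```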

Start the Spark master and workers. On the master instance:
`cd scylla-migrator`
`./start-spark.sh`

On worker instances:
`./start-slave.sh`

Configure and confirm networking between:
- the source and the Spark servers
- the target and the Spark servers
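
A quick way to confirm this connectivity from the Spark nodes, assuming netcat is available (9042 and 8000 are the default CQL and Alternator ports; adjust to your setup):

```shell
nc -zv <source-hostname> 9042          # CQL source
nc -zv <target-scylla-hostname> 9042   # CQL target
nc -zv <target-scylla-hostname> 8000   # Alternator target, if applicable
```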

Create the schema on the target server.
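For a CQL target this can be done with `cqlsh`; the keyspace, table, and replication settings below are purely illustrative:

```shell
cqlsh <target-scylla-hostname> -e "
  CREATE KEYSPACE IF NOT EXISTS demo WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3};
  CREATE TABLE IF NOT EXISTS demo.users (id uuid PRIMARY KEY, name text);"
```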

Then, run this command on the Spark master server:
```shell
spark-submit --class com.scylladb.migrator.Migrator \
--master spark://<spark-master-hostname>:7077 \
--conf spark.scylla.config=<path to config.yaml> \
<path to scylla-migrator-assembly.jar>
```

If you need to pass a truststore file or other SSL-related files, use the `--files` option:
```shell
spark-submit --class com.scylladb.migrator.Migrator \
--master spark://<spark-master-hostname>:7077 \
--conf spark.scylla.config=<path to config.yaml> \
--files truststorefilename \
<path to scylla-migrator-assembly.jar>
```
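If you need to ship more than one file (for example a truststore and a keystore; the file names below are made up), `--files` accepts a comma-separated list:

```shell
spark-submit --class com.scylladb.migrator.Migrator \
--master spark://<spark-master-hostname>:7077 \
--conf spark.scylla.config=<path to config.yaml> \
--files truststore.jks,keystore.jks \
<path to scylla-migrator-assembly.jar>
```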

# Running the validator

This project also includes an entrypoint for comparing the source
table and the target table. You can launch it like so (after performing
the previous steps):

```shell
spark-submit --class com.scylladb.migrator.Validator \
--master spark://<spark-master-hostname>:7077 \
--conf spark.scylla.config=<path to config.yaml> \
<path to scylla-migrator-assembly.jar>
```

# Running locally

To run in the local Docker-based setup:

1. First start the environment:
```shell
docker compose up -d
```

2. Launch `cqlsh` in Cassandra's container and create a keyspace and a table with some data (a made-up example follows this list):
```shell
docker compose exec cassandra cqlsh
<create stuff>
```

3. Launch `cqlsh` in Scylla's container and create the destination keyspace and table with the same schema as the source table (see the example after this list):
```shell
docker compose exec scylla cqlsh
<create stuff>
```

4. Edit the `config.yaml` file; note the comments throughout.

5. Run `build.sh`.

6. Then, launch `spark-submit` in the master's container to run the job:
```shell
docker compose exec spark-master /spark/bin/spark-submit --class com.scylladb.migrator.Migrator \
--master spark://spark-master:7077 \
--conf spark.driver.host=spark-master \
--conf spark.scylla.config=/app/config.yaml \
/jars/scylla-migrator-assembly.jar
```
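
As a concrete, made-up illustration of steps 2 and 3 above, the source keyspace and table (with one row) and the matching destination schema could also be created non-interactively:

```shell
# step 2: toy source schema and data in Cassandra
docker compose exec cassandra cqlsh -e "
  CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
  CREATE TABLE test.items (id int PRIMARY KEY, name text);
  INSERT INTO test.items (id, name) VALUES (1, 'example');"

# step 3: matching destination schema in Scylla
docker compose exec scylla cqlsh -e "
  CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
  CREATE TABLE test.items (id int PRIMARY KEY, name text);"
```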

The `spark-master` container mounts the `./migrator/target/scala-2.13` dir on `/jars` and the repository root on `/app`. To update the jar with new code, just run `build.sh` and then run `spark-submit` again.
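In practice, the edit/rebuild loop for the local setup is just the following (same `spark-submit` invocation as in step 6):

```shell
./build.sh   # rebuilds migrator/target/scala-2.13/scylla-migrator-assembly.jar, mounted at /jars
docker compose exec spark-master /spark/bin/spark-submit --class com.scylladb.migrator.Migrator \
--master spark://spark-master:7077 \
--conf spark.driver.host=spark-master \
--conf spark.scylla.config=/app/config.yaml \
/jars/scylla-migrator-assembly.jar
```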

# Building
## Building

To test a custom version of the migrator that has not been [released](https://github.com/scylladb/scylla-migrator/releases), you can build it yourself by cloning this Git repository and following the steps below:

@@ -132,3 +15,7 @@
JDK installation.
3. Run `build.sh`.
4. This will produce the .jar file to use in the `spark-submit` command at path `migrator/target/scala-2.13/scylla-migrator-assembly.jar`.
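Put together, a from-source build might look like the sketch below (the clone URL is the project's public GitHub repository; the elided steps above cover the JDK prerequisite):

```shell
git clone https://github.com/scylladb/scylla-migrator.git
cd scylla-migrator
./build.sh
# the assembly to pass to spark-submit:
ls migrator/target/scala-2.13/scylla-migrator-assembly.jar
```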

## Contributing

Please refer to the file [CONTRIBUTING.md](/CONTRIBUTING.md).
2 changes: 1 addition & 1 deletion ansible/templates/spark-env-master-sample
@@ -8,7 +8,7 @@
# MEMORY is used in the spark-submit job and allocates the memory per executor.
# You can have one or more executors per worker.
#
# By using multiple workers on an instance, we can control the velocit of the migration.
# By using multiple workers on an instance, we can control the velocity of the migration.
#
# Eg.
# Target system is 3 x i4i.4xlarge (16 vCPU, 128G)
2 changes: 1 addition & 1 deletion ansible/templates/spark-env-worker-sample
@@ -8,7 +8,7 @@
# MEMORY is used in the spark-submit job and allocates the memory per executor.
# You can have one or more executors per worker.
#
# By using multiple workers on an instance, we can control the velocit of the migration.
# By using multiple workers on an instance, we can control the velocity of the migration.
#
# Eg.
# Target system is 3 x i4i.4xlarge (16 vCPU, 128G)
3 changes: 1 addition & 2 deletions config.yaml.example
@@ -268,8 +268,7 @@ renames: []
# create a savepoint file with this filled.
skipTokenRanges: []

# Configuration section for running the validator. The validator is run manually (see README)
# and currently only supports comparing a Cassandra source to a Scylla target.
# Configuration section for running the validator. The validator is run manually (see README).
validation:
# Should WRITETIMEs and TTLs be compared?
compareTimestamps: true
25 changes: 3 additions & 22 deletions docker-compose.yaml
@@ -1,23 +1,4 @@
version: '3'

services:
scylla:
image: scylladb/scylla:latest
networks:
- scylla
volumes:
- ./data/scylla:/var/lib/scylla
ports:
- "8000:8000"
command: "--smp 2 --memory 2048M --alternator-port 8000 --alternator-write-isolation always_use_lwt"

cassandra:
image: cassandra:latest
networks:
- scylla
volumes:
- ./data/cassandra:/var/lib/cassandra

spark-master:
build: dockerfiles/spark
command: master
@@ -26,7 +7,7 @@ services:
environment:
SPARK_PUBLIC_DNS: spark-master
networks:
- scylla
- spark
expose:
- 7001
- 7002
@@ -58,7 +39,7 @@ services:
SPARK_WORKER_WEBUI_PORT: 8081
SPARK_PUBLIC_DNS: spark-worker
networks:
- scylla
- spark
expose:
- 7012
- 7013
@@ -75,4 +56,4 @@
- spark-master

networks:
scylla:
spark:
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -109,7 +109,7 @@
"hide_feedback_buttons": "false",
"github_issues_repository": "scylladb/scylla-migrator",
"github_repository": "scylladb/scylla-migrator",
"site_description": "Migrate data extract using Spark to Scylla, normally from Cassandra.",
"site_description": "Migrate data using Spark from Cassandra or DynamoDB to Scylla.",
"hide_version_dropdown": [],
"zendesk_tag": "",
"versions_unstable": UNSTABLE_VERSIONS,