5/cassandra-analytics

This branch is 1 commit ahead of, 23 commits behind apache/cassandra-analytics:trunk.

Latest commit

66b5935 · Aug 19, 2024

Cassandra Analytics

Cassandra Spark Bulk Reader

This is the open-source repository for the Cassandra Spark Bulk Reader. The library integrates Cassandra with Spark, letting users run arbitrary Spark jobs against a Cassandra cluster securely and consistently.

This project contains the necessary open-source implementations to connect to a Cassandra cluster and read the data into Spark.

For example usage, see the example repository. A sample read looks like this:

import org.apache.cassandra.spark.sparksql.CassandraDataSource
import org.apache.spark.sql.SparkSession

// Reuse the active Spark session, or create one if none exists
val sparkSession = SparkSession.builder.getOrCreate()
// Load the Cassandra table into a DataFrame through the bulk reader data source
val df = sparkSession.read.format("org.apache.cassandra.spark.sparksql.CassandraDataSource")
                          .option("sidecar_contact_points", "localhost,localhost2,localhost3") // comma-separated hosts
                          .option("keyspace", "sbr_tests")
                          .option("table", "basic_test")
                          .option("DC", "datacenter1")
                          .option("createSnapshot", true)
                          .option("numCores", 4)
                          .load()
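
When the same reader settings are shared across several jobs, it can help to collect them in one place. The following is a minimal sketch of that idea in plain Scala, not part of the library: `ReaderOptions`, `toMap`, and `ReaderOptionsExample` are hypothetical names introduced here for illustration, and the default values simply mirror the sample above. Only the option keys come from the sample.

```scala
// Hypothetical helper: ReaderOptions, toMap and ReaderOptionsExample are
// illustrative names, not part of the Cassandra Analytics library.
final case class ReaderOptions(
    sidecarContactPoints: Seq[String],
    keyspace: String,
    table: String,
    dc: String,
    createSnapshot: Boolean = true, // default mirrors the sample above
    numCores: Int = 4               // default mirrors the sample above
) {
  require(sidecarContactPoints.nonEmpty, "at least one sidecar contact point is required")
  require(numCores > 0, "numCores must be positive")

  // Render every setting as a string, keyed by the option names used in the sample above.
  def toMap: Map[String, String] = Map(
    "sidecar_contact_points" -> sidecarContactPoints.mkString(","),
    "keyspace"               -> keyspace,
    "table"                  -> table,
    "DC"                     -> dc,
    "createSnapshot"         -> createSnapshot.toString,
    "numCores"               -> numCores.toString
  )
}

object ReaderOptionsExample {
  // Same values as the sample read above.
  val options: Map[String, String] = ReaderOptions(
    sidecarContactPoints = Seq("localhost", "localhost2", "localhost3"),
    keyspace = "sbr_tests",
    table = "basic_test",
    dc = "datacenter1"
  ).toMap
}
```

The resulting map can be passed to Spark's `DataFrameReader.options(...)` in place of the chained `.option(...)` calls, for example `sparkSession.read.format("org.apache.cassandra.spark.sparksql.CassandraDataSource").options(ReaderOptionsExample.options).load()`. Spark stores option values as strings, so the boolean and integer settings are converted before being handed over.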

Cassandra Spark Bulk Writer

The Cassandra Spark Bulk Writer provides high-speed data ingestion into Cassandra clusters running Cassandra 3.0 or 4.0.

Developers interested in contributing to the Analytics library should see the DEV-README.

Getting Started

For example usage, see the example repository. The example covers setting up Cassandra 4.0 and Apache Sidecar, and running both a Spark Bulk Reader job and a Spark Bulk Writer job.
