1/15/2018

markthebault/importCSVSparkCassandra

by John Doe

README.md

This short example shows how easy it is to import CSV files from your AWS S3 buckets into Cassandra using Spark.

Setting up your application

Clone this repository:

git clone http://gitlab.ippon.fr/mthebault/simplecsvexportspark.git

Then open the file 'src/main/ressources/project.conf' and change your settings.

You need to change the following values:

  • Cassandra
    • host
    • port
    • keyspace
    • table
  • AWS
    • accessKey
    • secretKey
    • bucket
    • fileName
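As an illustration only, the settings listed above could be laid out as follows. This sketch assumes that project.conf uses Typesafe Config (HOCON) syntax, which is common for .conf files in Scala projects; the exact key names, nesting, and casing in the repository may differ, and every value shown is a hypothetical placeholder.

```
# Hypothetical layout; check the real project.conf for the exact keys.
cassandra {
  host     = "127.0.0.1"      # placeholder: address of a Cassandra node
  port     = 9042             # Cassandra's default native transport port
  keyspace = "my_keyspace"    # placeholder keyspace name
  table    = "my_table"       # placeholder table name
}

aws {
  accessKey = "YOUR_ACCESS_KEY"   # placeholder
  secretKey = "YOUR_SECRET_KEY"   # placeholder
  bucket    = "YOUR_BUCKET"       # bucket that holds the CSV file
  fileName  = "data.csv"          # placeholder: CSV object to import
}
```

Since this file holds AWS credentials, it is probably best kept out of version control.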

Build a jar

To build the Jar of your application, run the following command:

sbt clean assembly

Deploy the Jar on a spark cluster

To deploy a jar on a Spark cluster, make sure that port 7077 (the default port of a Spark standalone master) is accessible from the outside. Then push the Jar to a public S3 bucket:

aws s3 cp ./target/scala-2.10/ImportCSV.jar s3://YOUR_BUCKET/ImportCSV.jar

Once you have done that, run the spark-submit command as follows:

$SPARK_HOME/bin/spark-submit \
	--verbose \
	--master spark://IP_SPARK_MASTER:PORT \
	--deploy-mode cluster \
	--driver-class-path /spark/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar \
	--class Application \
	https://s3-eu-west-1.amazonaws.com/YOUR_BUCKET/ImportCSV.jar
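The object uploaded with aws s3 cp and the URL passed to spark-submit have to name the same file. A minimal sketch of one way to keep them in sync is to derive both from the same variables. The variable names here are hypothetical, and the region is simply the one that appears in the example URL above.

```shell
# Hypothetical placeholders: replace with your own values.
BUCKET='YOUR_BUCKET'
REGION='eu-west-1'            # region used in the example URL of this README
JAR_NAME='ImportCSV.jar'

# Local path of the jar produced by `sbt clean assembly`, as given in this README.
LOCAL_JAR="./target/scala-2.10/$JAR_NAME"

# Derive both locations from the same variables so they cannot drift apart.
S3_URI="s3://$BUCKET/$JAR_NAME"
HTTPS_URL="https://s3-$REGION.amazonaws.com/$BUCKET/$JAR_NAME"

# Only print the upload command and the URL so they can be reviewed first.
echo "aws s3 cp $LOCAL_JAR $S3_URI"
echo "$HTTPS_URL"
```

The printed URL is the value to use as the last argument of spark-submit.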

Note: Here I am using a public S3 bucket for the jars. If you want to use a private bucket, you can use a link of the following form:

http://AWS_S3_ACCESS_KEY:AWS_S3_SECRET_KEY@YOUR_BUCKET/ImportCSV.jar

Keep in mind that this HTTP link has to be URL-encoded; you can use an online URL encoder for this.
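The encoding can also be done locally, which avoids pasting credentials into a website. Below is a minimal sketch in POSIX shell. The urlencode helper and the key values are hypothetical and not part of this project, and the helper assumes ASCII input. AWS secret keys often contain characters such as / and +, which is why the encoding matters.

```shell
# Percent-encode a string for the user:password part of a URL.
# Assumes ASCII input; every character outside A-Z a-z 0-9 . _ ~ - is encoded.
# The body runs in a subshell, so its variables do not leak to the caller.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;   # e.g. "/" becomes %2F
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Made-up example credentials; never hard-code real keys in a script.
ACCESS_KEY='AKIAEXAMPLEKEY'
SECRET_KEY='abc/def+ghi='

JAR_URL="http://$(urlencode "$ACCESS_KEY"):$(urlencode "$SECRET_KEY")@YOUR_BUCKET/ImportCSV.jar"
echo "$JAR_URL"
# prints http://AKIAEXAMPLEKEY:abc%2Fdef%2Bghi%3D@YOUR_BUCKET/ImportCSV.jar
```

Only the two credential parts are encoded; the rest of the link keeps the form shown in the note above.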

Next

If you want to contribute to this project, feel free to do so. If you see a mistake, please open an issue.
