Cassandra.Link

10/15/2018

Reading time: 1 min

Spark SQL Cassandra delete records

by John Doe

Is there a way to delete some records based on a select query?

I have this query:

Select min(id) from ID having count(*)>1

which will show the duplicates. I need to get those ids and delete them. How can I do it in Spark SQL?
asked Apr 28 '16 at 4:48 by ashK

 1 Answer

Spark SQL does not support DELETE.

If the number of ids to delete is small, you can do it using the Cassandra driver instead of through Spark:

import scala.collection.JavaConverters._
import com.datastax.driver.core.{BatchStatement, Cluster}
import com.datastax.driver.core.querybuilder.QueryBuilder

// host_ip, keyspace and table are placeholders for your own values
val cluster = Cluster.builder().addContactPoint(host_ip).build()
val session = cluster.connect(keyspace)
val idsToDelete = ... // perform your query and collect the ids
// build one DELETE statement per id
val queries = idsToDelete.map { id => QueryBuilder.delete().from(keyspace, table).where(QueryBuilder.eq("id", id)) }
// send all the deletes to Cassandra in a single batch
val batch = new BatchStatement().addAll(queries.asJava)
session.execute(batch)
cluster.close()
answered Oct 27 '16 at 15:18 by Didier (edited Oct 27 '16 at 16:34)
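
The step "perform your query and collect the ids" is left open in the answer. As a rough sketch only, here is one way the ids could be picked out in plain Scala once the rows have been collected to the driver. The `Record` case class, its `groupKey` field (the question does not say which column defines a duplicate), and the chunk size are all hypothetical names chosen for illustration, not part of the original post or of any library.

```scala
// Hypothetical row shape: `groupKey` stands in for whatever column(s)
// define a duplicate; the original question does not name them.
final case class Record(id: Long, groupKey: String)

object DuplicateIds {
  // For every groupKey that occurs more than once, return the smallest id.
  // This mirrors the intent of `SELECT min(id) ... HAVING count(*) > 1`.
  def minIdOfDuplicates(rows: Seq[Record]): Seq[Long] =
    rows.groupBy(_.groupKey).values
      .filter(_.size > 1)
      .map(group => group.map(_.id).min)
      .toList
      .sorted

  // Split the ids into fixed-size chunks so that each batch stays small.
  def chunk(ids: Seq[Long], size: Int): Seq[Seq[Long]] = {
    require(size > 0, "chunk size must be positive")
    ids.grouped(size).toList
  }
}

val rows = Seq(
  Record(1, "a"), Record(2, "a"),
  Record(3, "b"),
  Record(4, "c"), Record(5, "c"), Record(6, "c")
)
val idsToDelete = DuplicateIds.minIdOfDuplicates(rows)
val chunks = DuplicateIds.chunk(idsToDelete, 100)
```

Each chunk could then be turned into its own batch in the same way as in the answer above, which keeps any single batch small when there are more than a handful of ids.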
 
