
Cassandra.Link

The best knowledge base on Apache Cassandra®

Helping platform leaders, architects, engineers, and operators build scalable real time data platforms.

12/2/2020

Reading time: 2 min

Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

by Spark Summit



  1. Spark and Cassandra: An amazing Apache love story. Patrick McFadin (@PatrickMcFadin), Chief Evangelist, DataStax. ©2013 DataStax Confidential. Do not distribute without consent.
  2. Store a ton of data; Analyze a ton of data
  3. Community Response?
  4. Cassandra Only DC
  5. Cassandra Only DC; Cassandra + Spark DC; Spark Jobs
  6. Cassandra Only DC; Cassandra + Spark DC; Spark Jobs; Spark Streaming
  7. Worker; Worker; Worker; Worker; Analytics Workload; Transactional Workload
  8. DataStax Enterprise
  9. DataStax Enterprise
  10. • 10T of high frequency event data daily • Constant increasing volume. “The web server that powers the interface can query both datacenters, depending on which the user is closest to,” “A small set of signals tend to double every eight months. So we needed a model that can scale linearly.” - Arun Jayandra, Microsoft
  11. REST API; O365 Event Hub; Ingestion Worker (Azure worker role using DataStax C# driver); C* Analytics; REST API; O365; Kafka; C*/Spark Streaming Analytics; G4 – Local SSD; Kafka: G4 – Data Disk; ZooKeeper: A7 – Data Disk; PaaS Small; G4 – Local SSD; Cluster 1:; Cluster 2:; 20k – 50k events/sec; 200k+ events/sec
  12. Data Protection • Maximilian Schrems v Data Protection Commissioner • No longer OK to ship EU data to US under “Safe Harbour”. Product_Catalog RF=3; Product_Catalog RF=3; Customer_Data RF=3; Customer_Data RF=0; Product_Catalog RF=3; Customer_Data RF=3
  13. • 300k customers • Report on energy usage • Predict boiler failure. “We’re dealing largely with time series data, and Spark is 10 to 100 times quicker as it is operating on data in-memory…Cassandra delivers what we need today and if you look at the Internet of Things space; that is what is really useful right now.” - Jim Anning, British Gas. Hive Active Heating™
  14. Cassandra Only DC; Cassandra + Spark DC; Spark Jobs; Spark Streaming; Home Data Center; Hive Active Heating™
  15. Store a ton of data; Analyze a ton of data. Thank you!
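
Slide 12 lists a replication factor (RF) per keyspace, with Customer_Data at RF=0 in one position. One plausible reading is per-datacenter replication, where customer data has no replicas in a datacenter outside the EU. The Python sketch below renders CQL for that reading. The datacenter names (`eu_west`, `eu_central`, `us_east`) and the assignment of RF values to them are assumptions, since the slide text does not name the datacenters. It relies on Cassandra's `NetworkTopologyStrategy`, where a datacenter left out of the replication map receives no replicas.

```python
from typing import Mapping


def replication_map(rf_by_dc: Mapping[str, int]) -> str:
    """Render a NetworkTopologyStrategy replication map as CQL text.

    Datacenters with RF 0 are left out of the map, which means
    no replicas are placed there.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in rf_by_dc.items():
        if rf < 0:
            raise ValueError(f"negative replication factor for {dc}: {rf}")
        if rf > 0:
            parts.append(f"'{dc}': {rf}")
    return "{" + ", ".join(parts) + "}"


def create_keyspace(name: str, rf_by_dc: Mapping[str, int]) -> str:
    """Build a CREATE KEYSPACE statement for the given per-DC RFs."""
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH replication = {replication_map(rf_by_dc)};"
    )


# Hypothetical datacenter names and RF assignment (not from the slides).
PRODUCT_CATALOG = {"eu_west": 3, "eu_central": 3, "us_east": 3}
CUSTOMER_DATA = {"eu_west": 3, "eu_central": 3, "us_east": 0}

if __name__ == "__main__":
    print(create_keyspace("product_catalog", PRODUCT_CATALOG))
    print(create_keyspace("customer_data", CUSTOMER_DATA))
```

With these assumed inputs, the `customer_data` statement names only the two EU datacenters, so no customer rows would be stored in `us_east`, while `product_catalog` is replicated everywhere.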

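The Microsoft quote on slide 10 says a small set of signals tends to double every eight months and that the model therefore had to scale linearly. As a rough worked example of what that growth rate implies, the sketch below computes the growth factor after a number of months and the node count needed if capacity scales linearly with node count. The starting node count is hypothetical, and applying the doubling rate to a whole cluster is an assumption; the slide only attributes it to some signals.

```python
import math


def growth_factor(months: float, doubling_months: float = 8.0) -> float:
    """Multiplier on volume after `months`, given a fixed doubling period."""
    return 2.0 ** (months / doubling_months)


def nodes_needed(current_nodes: int, months: float,
                 doubling_months: float = 8.0) -> int:
    """Nodes required after `months`, assuming capacity scales
    linearly with node count (an assumption, not a measured result)."""
    return math.ceil(current_nodes * growth_factor(months, doubling_months))


if __name__ == "__main__":
    # Hypothetical starting point: a 6-node cluster.
    for m in (8, 16, 24):
        print(m, growth_factor(m), nodes_needed(6, m))
```

Under these assumptions, two years of growth (24 months, three doublings) is an 8x increase, which is why a linearly scalable store matters for this kind of workload.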