
Cassandra.Link


5/29/2020


DSE 6.8 - Recommended production settings

by John Doe


Configure the chunk cache based on your workload type to increase performance.

Beginning in DataStax Enterprise (DSE) 6.0, the amount of native memory used by the DSE process has increased significantly.

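The main reason for this increase is the chunk cache (or file cache), which plays a role similar to the OS page cache. The following sections provide additional information:

  • See Chunk cache history for a historical description of the chunk cache, and how it is calculated in DSE 6.0 and later.
  • See Chunk cache differences from OS page cache to understand key differences between the chunk cache and the OS page cache.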

Consider the following recommendations depending on workload type for your cluster.

DSE recommendations

When choosing the max direct memory and file cache size for DSE, take the following into account:

  • Total server memory size
  • Adequate memory for the OS and other applications
  • Adequate memory for the Java heap size
  • Adequate memory for native raw memory (such as bloom filters and off-heap memtables)

For 64 GB servers, the default settings are typically adequate. For larger servers, increase the max direct memory (-XX:MaxDirectMemorySize), but leave approximately 15-20% of memory for the OS and other in-memory structures. The chunk cache size (file_cache_size_in_mb) is automatically set to half of the MaxDirectMemorySize value. This calculated default, however, might not result in optimal performance. If the cache hit rate is too low and memory is still available on the server, try increasing file_cache_size_in_mb in your development environment by setting it explicitly in cassandra.yaml, up to 90% of the MaxDirectMemorySize value.
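The sizing guidance above can be turned into simple arithmetic. The following is a minimal sketch, not DSE's own algorithm: it assumes all sizes are in MiB, that the 15-20% reserve is taken from total server memory, and that whatever remains after the reserve and the Java heap becomes MaxDirectMemorySize. The helper names `plan_direct_memory` and `config_lines` are hypothetical.

```python
"""Direct-memory and chunk cache sizing sketch for a large server.

Illustrative assumptions (not DSE's own algorithm): sizes are in MiB,
the 15-20% reserve is taken from total server memory, and whatever is
left after the reserve and the Java heap becomes MaxDirectMemorySize.
"""
from typing import Dict, List


def plan_direct_memory(total_mb: int, heap_mb: int, reserve_percent: int = 20) -> Dict[str, int]:
    if not 0 <= reserve_percent < 100:
        raise ValueError("reserve_percent must be in [0, 100)")
    # Memory kept back for the OS and other in-memory structures.
    reserve_mb = total_mb * reserve_percent // 100
    max_direct_mb = total_mb - reserve_mb - heap_mb
    if max_direct_mb <= 0:
        raise ValueError("heap plus reserve leave no room for direct memory")
    return {
        "max_direct_mb": max_direct_mb,
        # Default: the chunk cache is half of MaxDirectMemorySize.
        "default_chunk_cache_mb": max_direct_mb // 2,
        # Ceiling for an explicit file_cache_size_in_mb: 90% of MaxDirectMemorySize.
        "max_chunk_cache_mb": max_direct_mb * 9 // 10,
    }


def config_lines(plan: Dict[str, int], requested_chunk_cache_mb: int) -> List[str]:
    # Clamp the requested chunk cache size to the 90% ceiling.
    chunk_mb = min(requested_chunk_cache_mb, plan["max_chunk_cache_mb"])
    return [
        f"-XX:MaxDirectMemorySize={plan['max_direct_mb']}m",  # JVM option
        f"file_cache_size_in_mb: {chunk_mb}",  # cassandra.yaml entry
    ]


# Hypothetical 256 GiB node with a 24 GiB heap and a 20% reserve.
plan = plan_direct_memory(262144, 24576)
print(config_lines(plan, 200000))
# ['-XX:MaxDirectMemorySize=185140m', 'file_cache_size_in_mb: 166626']
```

Any value produced this way is only a starting point; as the text recommends, raise file_cache_size_in_mb gradually in a development environment while watching the cache hit rate.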

Disabling asynchronous I/O (AIO) and explicitly setting the chunk cache size (file_cache_size_in_mb) improves performance for most DSE Search workloads. With AIO disabled, SSTables and Lucene segments, as well as other minor off-heap elements, reside in the OS page cache and are managed by the kernel.

A potential drawback of disabling AIO is measurably higher read latency when DSE goes to disk, in cases where the dataset is larger than available memory.

To disable AIO and set the chunk cache size, see Disable AIO.

DSE Analytics recommendations

DSE Analytics relies heavily on memory for performance. Because Apache Spark™ manages its own memory through the Apache Spark application settings, you must determine how much memory the Apache Spark applications receive, and therefore how to divide memory between the chunk cache and Apache Spark applications. As with DSE Search, you can disable AIO and lower the chunk cache size to give Apache Spark more memory.
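The trade-off between the chunk cache and Spark can be illustrated with a rough budget. This is a minimal sketch under a deliberately simplified model (an assumption, not how DSE or Spark account for memory): whatever is not reserved for the OS, the DSE heap, the chunk cache, or other native memory is treated as available to Spark applications. The function name `spark_budget_mb` and all sizes are hypothetical, in MiB.

```python
"""Rough memory budget for a DSE Analytics node.

Simplified model (an assumption, not how DSE or Spark actually account
for memory): whatever is not reserved for the OS, the DSE heap, the chunk
cache, or other native memory is treated as available to Spark applications.
All sizes are in MiB.
"""


def spark_budget_mb(
    total_mb: int,
    os_reserve_mb: int,
    dse_heap_mb: int,
    chunk_cache_mb: int,
    other_native_mb: int = 0,
) -> int:
    remaining = total_mb - os_reserve_mb - dse_heap_mb - chunk_cache_mb - other_native_mb
    if remaining < 0:
        raise ValueError("allocations exceed total server memory")
    return remaining


# Hypothetical 128 GiB node: halving a 32 GiB chunk cache frees 16 GiB for Spark.
before = spark_budget_mb(131072, 19660, 24576, 32768)
after = spark_budget_mb(131072, 19660, 24576, 16384)
print(before, after)  # 54068 70452
```

Every MiB taken from the chunk cache moves one-for-one into the Spark budget in this model; in practice, the Spark application settings decide how much of that headroom Spark actually uses.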

DSE Graph recommendations

Because DSE Graph relies heavily on several different workloads, follow the previous recommendations for each workload you use. If you use DSE Search or DSE Analytics with DSE Graph, lower the chunk cache and disable AIO for the best performance. If you use DSE Graph only on top of Apache Cassandra, increase the chunk cache gradually, leaving 15-20% of memory available for other processes.

Chunk cache differences from OS page cache

There are several differences between the chunk cache and the OS page cache; a full description is outside the scope of this document. However, the following differences are relevant to DSE:

  • Because the OS page cache is sized dynamically by the operating system, it can grow and shrink depending on the available server memory. The chunk cache must be sized statically.

    If the chunk cache is too small, the remaining server memory goes unused, which is wasteful on servers with large amounts of memory (50 GB or more). If the chunk cache is too large, available memory on the server can drop low enough that the OS kills the DSE process to avoid running out of memory.

    Note: At the time of writing, the chunk cache cannot be resized dynamically, so the DSE process must be restarted to change its size.

  • Restarting the DSE process destroys the chunk cache, so the chunk cache is cold after every process restart. The OS page cache becomes cold only after a server restart.
  • The memory used by the file cache is part of the DSE process memory and is therefore seen by the OS as user memory, whereas OS page cache memory is seen as buffer memory.
  • The chunk cache uses mostly NIO direct memory, storing file chunks into NIO byte buffers. However, NIO does have an on-heap footprint, which DataStax is working to reduce.

Chunk cache history

The chunk cache is not new to Apache Cassandra, and was originally intended to cache small parts (chunks) of SSTable files to make read operations faster. However, the default file access mode was memory mapped until DSE 5.1, so the chunk cache had a secondary role and its size was limited to 512 MB.
Note: The default setting of 512 MB was configured by the file_cache_size_in_mb parameter in cassandra.yaml.

In DSE 6.0 and later, the chunk cache has increased relevance, not just because it replaces the OS page cache for database read operations, but because it is a central component of the asynchronous thread-per-core (TPC) architecture.

By default, the chunk cache is configured to use the following portion of the max direct memory:
  • One-half (½) of the max direct memory for the DSE process
  • One-fourth (¼) of the max direct memory for tools
The max direct memory is calculated as one-half (½) of the difference between the system memory and the JVM heap size:
Max direct memory = (system memory - JVM heap size) / 2

You can explicitly configure the max direct memory by setting the JVM MaxDirectMemorySize (-XX:MaxDirectMemorySize) parameter. Alternatively, you can override the chunk cache size that would otherwise be derived from the max direct memory by explicitly configuring the file_cache_size_in_mb parameter in cassandra.yaml.
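The default calculation can be worked through in a few lines. This is a minimal sketch of the formulas above, not DSE source code; it assumes sizes in MiB, integer (floor) division for rounding, and that an explicit file_cache_size_in_mb simply replaces the derived value. The function names are hypothetical.

```python
"""Sketch of the default chunk cache sizing described above.

Assumptions (not taken from DSE source code): all sizes are in MiB,
integer division is used for rounding, and an explicit
file_cache_size_in_mb simply replaces the derived value.
"""
from typing import Optional


def default_max_direct_memory_mb(system_memory_mb: int, heap_mb: int) -> int:
    # Max direct memory = (system memory - JVM heap size) / 2
    return (system_memory_mb - heap_mb) // 2


def chunk_cache_mb(
    system_memory_mb: int,
    heap_mb: int,
    for_tools: bool = False,
    max_direct_memory_mb: Optional[int] = None,  # -XX:MaxDirectMemorySize, if set
    file_cache_size_in_mb: Optional[int] = None,  # explicit cassandra.yaml value, if set
) -> int:
    # An explicit file_cache_size_in_mb takes precedence over the derived default.
    if file_cache_size_in_mb is not None:
        return file_cache_size_in_mb
    if max_direct_memory_mb is None:
        max_direct_memory_mb = default_max_direct_memory_mb(system_memory_mb, heap_mb)
    # One-half of max direct memory for the DSE process, one-fourth for tools.
    divisor = 4 if for_tools else 2
    return max_direct_memory_mb // divisor


if __name__ == "__main__":
    # Hypothetical 64 GiB node with a 16 GiB heap.
    print(default_max_direct_memory_mb(65536, 16384))   # 24576
    print(chunk_cache_mb(65536, 16384))                  # 12288
    print(chunk_cache_mb(65536, 16384, for_tools=True))  # 6144
```

On this hypothetical node, the DSE process would get a 12 GiB chunk cache by default; setting -XX:MaxDirectMemorySize or file_cache_size_in_mb changes that result as shown by the optional parameters.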
