
Cache

To improve performance, JuiceFS supports caching at multiple levels to reduce latency and increase throughput, including metadata caching, data caching, and cache sharing among a group of clients.

Metadata Cache

JuiceFS caches metadata in the kernel and in client memory to improve performance.

Metadata Cache in Kernel

Three kinds of metadata can be cached in the kernel: attributes, entries (files) and direntries (directories). The timeouts are configurable through the following parameters:

--attrcacheto=ATTRCACHETO
attribute cache timeout, default 1s
--entrycacheto=ENTRYCACHETO
file entry cache timeout, default 1s
--direntrycacheto=DIRENTRYCACHETO
directory entry cache timeout, default 1s

Attributes, entries and direntries are cached for 1 second by default to speed up lookup and getattr operations.
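For read-mostly workloads where metadata rarely changes, longer timeouts reduce lookup and getattr traffic to the meta server. A minimal sketch, assuming a volume named `myjfs`, a mount point `/jfs`, and that the timeouts are given in seconds as the defaults above suggest:

```shell
# Keep attributes, file entries and directory entries cached in the
# kernel for 10 seconds instead of the default 1 second.
juicefs mount myjfs /jfs \
  --attrcacheto=10 \
  --entrycacheto=10 \
  --direntrycacheto=10
```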

Metadata Cache in Client Memory

In cases where the client needs to list and query the meta server frequently, the client can cache all the dentries of visited directories in memory, which can be enabled with the following parameter:

--metacache         cache metadata in client memory

When enabled, visited directories are cached in client memory for 5 minutes. The cache is invalidated upon any change to the directory to guarantee consistency. The metadata cache improves performance for operations such as lookup, getattr, access and open.
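A minimal sketch of enabling it, assuming a volume named `myjfs` and a mount point `/jfs`:

```shell
# Cache the dentries of visited directories in client memory
# (5-minute lifetime, invalidated on change) to speed up
# lookup, getattr, access and open.
juicefs mount myjfs /jfs --metacache
```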

Besides that, the client also caches symbolic links. Since symbolic links are immutable (a new one is created when overwriting an existing one), the cached content never expires.

Consistency

If only one client is connected, the cached metadata is invalidated automatically upon modification, so there is no impact on consistency.

When there are multiple clients, the only way to invalidate the metadata cache in the kernel is to wait for the timeout. The metadata cache in client memory is invalidated automatically on modification, but asynchronously.

In extreme conditions, it is possible that a modification made by client A is not visible to client B within a short time window.

Data Cache

JuiceFS also provides data caching to improve performance, including the page cache in the kernel and a local cache on the client host.

Data Cache in Kernel

The kernel automatically caches the content of recently accessed files. When a file is reopened, its content can be fetched directly from the kernel cache for best performance.

The JuiceFS meta server tracks a list of recently opened files. If a file is opened again by the same client, the meta server checks the modification history to tell the client whether its kernel cache is still valid, so that the client always reads the latest data.

Repeatedly reading the same file in JuiceFS is extremely fast, with millisecond latency and gigabytes-per-second throughput.

The write cache in the kernel is not enabled in the current JuiceFS client; all write operations from the application are passed from FUSE to the client directly. Starting from Linux kernel 3.15, FUSE supports "writeback-cache mode", which means the write() syscall can often complete very quickly. You can enable writeback-cache mode with the -o writeback_cache option when running the juicefs mount command. Enabling it is recommended when writing very small data (e.g. 100 bytes) frequently.
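A minimal sketch of enabling it at mount time, assuming Linux kernel 3.15 or later, a volume named `myjfs` and a mount point `/jfs`:

```shell
# Let the kernel buffer small writes (FUSE writeback-cache mode) so
# frequent tiny write() calls complete without a FUSE round trip.
juicefs mount myjfs /jfs -o writeback_cache
```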

Read Cache in Client

The client automatically performs prefetching and caching according to the application's read pattern to improve sequential read performance.

Data is cached in the local file system. Any local file system backed by HDD, SSD or memory works.

Local cache can be configured with the following parameters:

--cache-dir=CACHEDIR
cache directory, default is /var/jfsCache
--cache-size=CACHESIZE
cache size, default 1 GiB
--free-space-ratio=<free_space_ratio>
minimum free space ratio on the cache disk, default is 0.2
--cache-partial-only
only cache the blocks for small reads, default is false

The JuiceFS client writes data downloaded from object storage (as well as newly uploaded data) into the cache directory, uncompressed and unencrypted. Since JuiceFS generates a unique key for all data written to object storage and all objects are immutable, cached data never expires. When the cache grows beyond the size limit (or the disk is full), it is cleaned up automatically. The current eviction rule is based on creation time and file size: older and larger files are cleaned first.

The local cache effectively improves random read performance. It is recommended to use faster storage and a larger cache size to accelerate applications that require high random read performance, e.g. MySQL, Elasticsearch and ClickHouse.
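A minimal sketch of such a mount, assuming a volume named `myjfs`, a mount point `/jfs`, an SSD-backed directory `/ssd/jfsCache`, and that --cache-size is given in MiB:

```shell
# Put the local cache on an SSD, let it grow to about 100 GiB
# (102400 MiB) and keep at least 20% of the disk free.
juicefs mount myjfs /jfs \
  --cache-dir=/ssd/jfsCache \
  --cache-size=102400 \
  --free-space-ratio=0.2
```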

Write Cache in Client

The client caches data written by the application in memory, and flushes it to object storage when a chunk is full, when the application forces it with close()/fsync(), or after a short period. When an application calls fsync() or close(), the client does not return until the data has been uploaded to object storage and the meta server has been notified, ensuring data integrity. Asynchronous uploading may improve performance if local storage is reliable: in that case, close() is not blocked while data is uploaded to object storage; instead it returns immediately once the data is written to the local cache directory.

Asynchronous upload can be enabled with the following parameter:

--writeback         Upload data asynchronously after written to local cache

When lots of small files need to be written in a short period, --writeback is recommended to improve write performance; after the job is done, remove this option and remount to disable it. For scenarios with massive random writes (for example, MySQL incremental backup), --writeback is also recommended.

:::warning
When --writeback is enabled, never delete content in /var/jfsCache/<fs-name>/rawstaging; otherwise data will be lost.
:::

Note that when --writeback is enabled, the reliability of data writes depends on the reliability of the cache storage. It should be used with caution when data reliability is critical.

--writeback is disabled by default.
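A minimal sketch of mounting with asynchronous upload for a bulk small-file job, assuming a volume named `myjfs` and a mount point `/jfs`; remount without the option once the job is done:

```shell
# Writes land in the local cache directory first and are uploaded to
# object storage asynchronously; close() returns once the local copy
# is written.
juicefs mount myjfs /jfs --writeback
```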

Group Cache Sharing

When clients in the same cluster need to access the same dataset repeatedly, e.g. repeated training on the same dataset in machine learning, JuiceFS provides cache sharing to improve performance. It can be enabled with the following parameter:

--cache-group=CACHEGROUP
Share cached data among clients in the same group

For clients in the same LAN that mount the same file system, when a cache group is enabled, each client reports a random listening port on the intranet to the meta server, discovers the other clients in the same group, and communicates with them.

When a client needs to access a data block, it asks the node that owns the block and reads from that node's cache (or the node reads from object storage and writes to its cache). The clients in one cache group form a consistent-hashing ring, similar to Memcached. When a new client joins or an old client leaves the group, only a small portion of the cached data is affected.

Group cache is suitable for deep learning scenarios on GPU clusters. By caching the training dataset in the memory of all nodes, access performance becomes high enough that data access is no longer a bottleneck for the GPUs.
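A minimal sketch of enabling a shared cache group on every node of such a cluster, assuming a volume named `myjfs`, a mount point `/jfs`, a group name `mygroup`, a tmpfs path used to keep the cache in memory, and MiB units for --cache-size:

```shell
# Run the same command on every node; nodes with the same group name
# form one consistent-hashing cache ring. A tmpfs cache directory
# keeps the hot dataset in memory (about 50 GiB here).
juicefs mount myjfs /jfs \
  --cache-group=mygroup \
  --cache-dir=/dev/shm/jfscache \
  --cache-size=51200
```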

Independent Cache Cluster

If the computing cluster is dynamically created and scaled, the object storage can be accelerated by creating an independent cache cluster. This builds on Group Cache Sharing and is achieved by adding the mount parameter --no-sharing on the computing cluster. Compute nodes that add the --no-sharing parameter and have the same --cache-group configuration do not participate in building the cache cluster, but only read data from it.

Suppose there is a dynamic computing cluster A and a cluster B dedicated to caching. Both need to mount with the same --cache-group=CACHEGROUP parameter to form a cache group, and cluster A nodes additionally need the --no-sharing parameter when mounting. Cluster B needs to be configured with enough cache disks (SSD is recommended) and sufficient network bandwidth.
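A minimal sketch of the two mounts, assuming a volume named `myjfs`, a mount point `/jfs`, a group name `mygroup`, an SSD-backed cache directory on the cache nodes, and MiB units for --cache-size (the size is illustrative):

```shell
# Cluster B (dedicated cache nodes): join the cache group and serve
# cached blocks from local SSDs.
juicefs mount myjfs /jfs \
  --cache-group=mygroup \
  --cache-dir=/ssd/jfsCache \
  --cache-size=1048576

# Cluster A (dynamic compute nodes): same group name, but --no-sharing
# means these nodes only read from the cache cluster and do not serve
# cache to others.
juicefs mount myjfs /jfs --cache-group=mygroup --no-sharing
```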

When an application in cluster A reads data and there is no cached data in the memory or on the cache disk of the current node, it selects a node in cluster B according to consistent hashing and reads the data from it.

This results in three levels of cache: the system cache of the compute node, the disk cache of the compute node, and the disk cache of a node in cache cluster B (whose system cache is also effective). The cache media and cache size at each level can be configured according to the access characteristics of the application.

When you need to access a fixed dataset, you can use juicefs warmup to warm it up in advance, which improves performance when the data is accessed for the first time.
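A minimal sketch, assuming the dataset lives under `/jfs/train-data` on an already mounted file system:

```shell
# Pre-populate the cache with the dataset so the first pass does not
# have to wait on object storage.
juicefs warmup /jfs/train-data
```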

When new data is written, the node that writes it uploads it directly to the underlying object storage.

Frequently Asked Questions

Why is 60 GiB of disk space occupied when I set the cache size to 50 GiB?

It is difficult to calculate the exact disk space used by cached data in the local file system. Currently, JuiceFS estimates it by accumulating the sizes of all cached objects plus a fixed overhead (4 KiB) per object, which may differ from the result reported by the du command.

When free space on the file system where the cache directory is located runs low, JuiceFS removes cached objects to avoid filling it up.