Cache

JuiceFS comes with a powerful cache design, read more in JuiceFS Community Edition, JuiceFS Cloud Service. This chapter introduces cache related settings and best practices in CSI Driver.

With CSI Driver, you can use either a host directory or a PVC for cache storage. Their main differences are in isolation level and data locality rather than performance. Here is a breakdown:

Host directories (hostPath) are easy to use. Cache data is stored directly on local cache disks, so observation and management are fairly straightforward. However, if Mount Pods (with application Pods) get scheduled to different nodes, all cache content will be lost, leaving residual data that might need to be cleaned up in this process (read sections below on cache cleanup). If you have no special requirements on isolation or data locality, use this method.
If all worker nodes are used to run JuiceFS Mount Pods, and they host similar cache content (similar situation if you use distributed caching), Pod migration is not really a problem, and you can still use host directories for cache storage.
When using a PVC for cache storage, different JuiceFS PVs can isolate cache data. If the Mount Pod is migrated to another node, the PVC reference remains the same. This ensures that the cache is unaffected.

Using host path (`hostPath`)

By default, CSI Driver uses the standard JuiceFS Client cache directory /var/jfsCache on the host, if you intend to use data disk as cache storage, make sure the correct path is configured, otherwise cache can drain the system disk.

Specify --cache-dir in mount options, preferably in ConfigMap, and then CSI Driver will handle the mounts accordingly:

values-mycluster.yaml
...
globalConfig:
  enabled: true
  mountPodPatch:
    - mountOptions:
      - cache-dir=/data/cache
      - cache-size=10T

When Mount Pod starts, it will include the hostPath mounts:

...
    volumeMounts:
    ...
    - mountPath: /data/cache
      name: cachedir-0
  volumes:
  ...
  - hostPath:
      path: /data/cache
      type: DirectoryOrCreate
    name: cachedir-0

If you need to further customize cache related options, check out the option list in JuiceFS Community Edition and JuiceFS Cloud Service.

note

In CSI Driver, cache-dir parameter does not support wildcard character, if you need to use multiple disks as storage devices, specify multiple directories joined by the : character.
For scenarios that involve intensive small writes, we usually recommend users to temporarily enable client write cache, but due to its inherent risks, this is advised against when using CSI Driver, because Pod lifecycle is significantly more unstable, and can cause data loss if Pod exists unexpectedly.

Using ConfigMap

Demostrated in the above code snippets.

Define in PV (deprecated)

Since CSI Driver v0.25.1, cache directories are supported in ConfigMap. Please refer to the previous section to manage all PV settings in a centralized place. The following practice of defining cache directories in PVs is deprecated.

Static provisioning:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: juicefs-pv
  labels:
    juicefs-name: ten-pb-fs
spec:
  capacity:
    storage: 10Pi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - cache-dir=/dev/vdb1
    - cache-size=204800
  csi:
    driver: csi.juicefs.com
    volumeHandle: juicefs-pv
    fsType: juicefs
    nodePublishSecretRef:
      name: juicefs-secret
      namespace: default

Dynamic provisioning:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: juicefs-sc
provisioner: csi.juicefs.com
parameters:
  csi.storage.k8s.io/provisioner-secret-name: juicefs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/node-publish-secret-name: juicefs-secret
  csi.storage.k8s.io/node-publish-secret-namespace: default
mountOptions:
  - cache-dir=/dev/vdb1
  - cache-size=204800

Use PVC as cache path

If you have higher demands for cache isolation, or cannot use hostPath for cache due to other reasons, consider using PVC as cache storage.

PVC should be created in advance, and if you are using one of the following service providers, you can refer to their manual:

tip

For custom volumes, make sure mountPath and hostPath doesn't contain duplicates, to avoid conflicts.

Assuming a PVC named jfs-cache-pvc is already created in the same namespace as the Mount Pod (which defaults to kube-system), use the following example to set this PVC as the cache directory for JuiceFS CSI Driver.

Using ConfigMap

The minimum required version is CSI Driver v0.25.1. Upon modification, application Pods need to be re-created for changes to take effect.

When multiple cache directories are used, make sure all items have the same available capacity, and then set --cache-size to the sum.

  - cacheDirs:
      - type: PVC
        name: jfs-cache-pvc
      - type: HostPath
        path: /var/jfsCache
      # emptyDir is supported in v0.26.2 and above
      - type: EmptyDir
        sizeLimit: 1024Mi
        medium: Memory
    mountOptions:
      - cache-size=204800
      - free-space-ratio=0.01
    # Optional, used when you need to customize individual PVCs
    pvcSelector:
      matchLabels:
        need-cachedirs: "true"

Define in PV (deprecated)

Static provisioning:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: juicefs-pv
  labels:
    juicefs-name: ten-pb-fs
spec:
  capacity:
    storage: 10Pi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: csi.juicefs.com
    volumeHandle: juicefs-pv
    fsType: juicefs
    nodePublishSecretRef:
      name: juicefs-secret
      namespace: default
    volumeAttributes:
      juicefs/mount-cache-pvc: "jfs-cache-pvc"

Dynamic provisioning:

Reference jfs-cache-pvc in StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: juicefs-sc
provisioner: csi.juicefs.com
parameters:
  csi.storage.k8s.io/provisioner-secret-name: juicefs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/node-publish-secret-name: juicefs-secret
  csi.storage.k8s.io/node-publish-secret-namespace: default
  juicefs/mount-cache-pvc: "jfs-cache-pvc"

Cache warm-up

The JuiceFS client runs inside the Mount Pod, so cache warm-up must be performed inside the Mount Pod. Use the commands below to enter the Mount Pod and carry out the warm-up operation:

# application Pod information will be used in below commands, save them as environment variables.
APP_NS=default  # application Pod namespace
APP_POD_NAME=example-app-xxx-xxx

# Enter the Mount Pod using a single command
kubectl -n kube-system exec -it $(kubectl -n kube-system get po --field-selector spec.nodeName=$(kubectl -n $APP_NS get po $APP_POD_NAME -o jsonpath='{.spec.nodeName}') -l app.kubernetes.io/name=juicefs-mount -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | grep $(kubectl get pv $(kubectl -n $APP_NS get pvc $(kubectl -n $APP_NS get po $APP_POD_NAME -o jsonpath='{..persistentVolumeClaim.claimName}' | awk '{print $1}') -o jsonpath='{.spec.volumeName}') -o jsonpath='{.spec.csi.volumeHandle}')) -- bash

# Locate the JuiceFS mount point inside pod
df -h | grep JuiceFS

# Run warmup command
juicefs warmup /jfs/pvc-48a083ec-eec9-45fb-a4fe-0f43e946f4aa/data

For dedicated cache cluster scenarios, if you need to automate the warmup process, consider using Kubernetes Job:

warmup-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: warmup
  labels:
    app.kubernetes.io/name: warmup
spec:
  backoffLimit: 0
  activeDeadlineSeconds: 3600
  ttlSecondsAfterFinished: 86400
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: warmup
        app.kubernetes.io/name: warmup
    spec:
      serviceAccountName: default
      containers:
        - name: warmup
          command:
            - bash
            - -c
            - |
              # Below shell code is only needed in on-premise environments, which unpacks JSON and set its key-value pairs as environment variables
              for keyval in $(echo $ENVS | sed -e 's/": "/=/g' -e 's/{"//g' -e 's/", "/ /g' -e 's/"}//g' ); do
                echo "export $keyval"
                eval export $keyval
              done

              # Authenticate and mount JuiceFS, all environment variables comes from the volume credentials within the Kubernetes Secret
              # ref: https://juicefs.com/docs/cloud/getting_started#create-file-system
              /usr/bin/juicefs auth --token=${TOKEN} --access-key=${ACCESS_KEY} --secret-key=${SECRET_KEY} ${VOL_NAME}

              # Mount with --no-sharing to avoid download cache data to local container storage
              # Replace CACHEGROUP with actual cache group name
              /usr/bin/juicefs mount $VOL_NAME /mnt/jfs --cache-size=0 --cache-group=CACHEGROUP

              # Check if warmup succeeds, by default, if any of the data blocks fails to download, the command fails, and client log needs to be check for troubleshooting
              /usr/bin/juicefs warmup /mnt/jfs
              code=$?
              if [ "$code" != "0" ]; then
                cat /var/log/juicefs.log
              fi
              exit $code
          image: juicedata/mount:ee-5.0.2-69f82b3
          securityContext:
            privileged: true
          env:
            - name: VOL_NAME
              valueFrom:
                secretKeyRef:
                  key: name
                  name: juicefs-secret
            - name: ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: access-key
                  name: juicefs-secret
            - name: SECRET_KEY
              valueFrom:
                secretKeyRef:
                  key: secret-key
                  name: juicefs-secret
            - name: TOKEN
              valueFrom:
                secretKeyRef:
                  key: token
                  name: juicefs-secret
            - name: ENVS
              valueFrom:
                secretKeyRef:
                  key: envs
                  name: juicefs-secret
      restartPolicy: Never

Cache and Pod memory usage

In some Kubernetes environments, reading log cache data can increase pagecache usage and potentially cause OOM kills (read this issue for more). When this happens, increasing limits.memory should be your first option.

If your system cannot allocate more memory, you can add the JFS_DROP_OSCACHE=1 environment variable to the JuiceFS client. This setting prompts the client to actively mark the cache data state, so that Kernel evicts cache more aggressively, thereby reducing memory usage. Evidently, this affects cache hit ratio and can hinder performance when cache data needs to be read repeatedly.

apiVersion: v1
kind: Secret
metadata:
  name: juicefs-secret
  namespace: default
type: Opaque
stringData:
  ...
  envs: "{JFS_DROP_OSCACHE: 1}"

Cache cleanup

Local cache can be a precious resource, especially when dealing with large scale data. For this reason, JuiceFS CSI Driver does not delete cache by default when the Mount Pod exits. If this behavior does not fit your needs, you can configure it to clear the local cache when the Mount Pod exits.

Static provisioning

Modify volumeAttributes in PV definition, add juicefs/clean-cache: "true":

apiVersion: v1
kind: PersistentVolume
metadata:
  name: juicefs-pv
  labels:
    juicefs-name: ten-pb-fs
spec:
  capacity:
    storage: 10Pi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: csi.juicefs.com
    volumeHandle: juicefs-pv
    fsType: juicefs
    nodePublishSecretRef:
      name: juicefs-secret
      namespace: default
    volumeAttributes:
      juicefs/clean-cache: "true"

Dynamic provisioning

Configure parameters in StorageClass definition, add juicefs/clean-cache: "true":

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: juicefs-sc
provisioner: csi.juicefs.com
parameters:
  csi.storage.k8s.io/provisioner-secret-name: juicefs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/node-publish-secret-name: juicefs-secret
  csi.storage.k8s.io/node-publish-secret-namespace: default
  juicefs/clean-cache: "true"

Dedicated cache cluster

note

Dedicated cache cluster is only supported in JuiceFS Cloud Service & Enterprise Edition, Community Edition is not supported.

Kubernetes containers are usually ephemeral, a distributed cache cluster built on top of ever-changing containers is unstable, which really hinders cache utilization. For this type of situation, you can deploy a dedicated cache cluster to achieve a stable cache service.

There are currently two ways to deploy a distributed cache cluster in Kubernetes:

For most scenarios, it can be deployed through "Cache Group Operator";
For scenarios that require flexible customization of deployment configuration, you can deploy it through "Write your own YAML configuration file".

Using host path (hostPath)​

Using ConfigMap​

Define in PV (deprecated)​

Use PVC as cache path​

Using ConfigMap​

Define in PV (deprecated)​

Cache warm-up​

Cache and Pod memory usage​

Cache cleanup​

Static provisioning​

Dynamic provisioning​

Dedicated cache cluster​

Using host path (`hostPath`)

Using ConfigMap

Define in PV (deprecated)

Use PVC as cache path

Using ConfigMap

Define in PV (deprecated)

Cache warm-up

Cache and Pod memory usage

Cache cleanup

Static provisioning

Dynamic provisioning

Dedicated cache cluster