juicefs.cache-dir | memory | Local cache directory; defaults to process memory. Multiple directories can be specified, separated by `:`, and wildcards (`*`) are supported. When using local directories, create them in advance and grant them `0777` permissions so that different components can share cached data. Same meaning as `--cache-dir`. |
juicefs.cache-size | 100 | Cache capacity in MiB. The default is small because the Hadoop SDK uses memory as the default cache location. Same meaning as `--cache-size`. |
juicefs.cache-replica | 1 | Number of nodes a block can be scheduled on. Hadoop applications achieve data locality scheduling by checking each block's `BlockLocation` attribute, so a higher replica count lets a block be placed on more nodes, which increases compute task concurrency. Block size is controlled by `juicefs.block.size`. |
juicefs.cache-group | | Cache group name for distributed cache; disabled by default. Nodes within the same group share their cache. Recommended for applications like Spark, where perfect data locality isn't achievable. |
juicefs.no-sharing | false | When inside a cache group, only fetch cached data from other members and never share this node's own cache. Use this option on ephemeral mount points (such as Kubernetes Pods). |
juicefs.cache-full-block | true | Cache full-sized data blocks. Disable this when you frequently access the same set of small files, or when disk throughput is lower than object storage throughput. This option is the opposite of `--cache-partial-only`. |
juicefs.memory-size | 300 | Maximum memory for the read/write buffer, in MiB. Same meaning as `--buffer-size`. |
juicefs.auto-create-cache-dir | true | Whether to create cache directories automatically. When set to `false`, non-existent cache directories are ignored, which effectively disables caching for them. |
juicefs.free-space | 0.2 | Minimum free space ratio; defaults to 20%. When free disk space falls below this ratio, cached data is cleared to free up space. Same meaning as `--free-space-ratio`. |
juicefs.metacache | true | Enable metadata cache. |
juicefs.discover-nodes-url | | Node discovery API; the node list is refreshed every 10 minutes. The node list also serves as a whitelist for the cache group: only nodes in this list can join it. Use this to prevent clients outside the computing cluster from joining the cache group and degrading its performance (see cache group troubleshooting for details).<ul><li>All nodes: `all`. This mode disables auto discovery, so data locality scheduling isn't available because there's no way to generate `BlockLocation`.</li><li>YARN: `yarn`</li><li>Spark Standalone: `http://spark-master:web-ui-port/json/`</li><li>Spark ThriftServer: `http://thrift-server:4040/api/v1/applications/`</li><li>Presto: `http://coordinator:discovery-uri-port/v1/service/presto/`</li><li>File system: `jfs://{VOLUME}/etc/nodes`. You need to create this file manually and write the hostnames of the nodes into it, one per line.</li></ul>For Kerberos clusters, only the "All nodes" and "File system" configurations are supported. |
juicefs.hflush-delay | 0 | Delay `hflush` operations by this many milliseconds so that writes are consolidated. This results in fewer object storage PUT requests and higher overall throughput. Typically used to speed up HBase WAL writes. |
juicefs.write-group-cache | false | Cache newly written blocks in the distributed cache group. Same meaning as `--fill-group-cache`. |
juicefs.cache-priority (added in v5.0.14) | 0 | Priority of cached blocks. Valid values are 0, 1, 2, and 3; a larger number means a higher priority. During cache eviction, lower-priority data is evicted first. |
juicefs.entry-cache | 0.0 | File entry cache timeout in seconds. |
juicefs.dir-entry-cache | 0.0 | Directory entry cache timeout in seconds. |
juicefs.attr-cache | 0.0 | File attribute cache timeout in seconds. |
juicefs.block.size | dfs.blocksize or 128MB | Logical block size for the Hadoop SDK. It controls how much data each task processes in applications like Spark. |
juicefs.cache-group-size | 4 * juicefs.block.size | Because the JuiceFS Client performs readahead and prefetching, for files smaller than this size the client tries to schedule all of the file's data blocks onto a single node to maximize cache utilization. |
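To make the interplay of these defaults concrete, here is a minimal, stdlib-only Java sketch that resolves a few effective values from a map of Hadoop-style properties, following the rules in the table above. The class `JuiceFsSettingsSketch`, its methods, and the `CachedBlock` record are hypothetical illustrations. This is not how the JuiceFS client is implemented. The sketch assumes size values are plain byte counts. Hadoop itself also accepts suffixed sizes such as `128m` for `dfs.blocksize`, which this sketch does not parse. It also does not expand `*` wildcards in `juicefs.cache-dir`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: mirrors the documented defaults, NOT the JuiceFS client code.
// Assumption: size properties are plain byte counts (no "128m"-style suffixes).
class JuiceFsSettingsSketch {
    static final long MIB = 1024L * 1024L;

    // A cached block tagged with its juicefs.cache-priority value (0 to 3).
    record CachedBlock(String id, int priority) {
        CachedBlock {
            if (priority < 0 || priority > 3) {
                throw new IllegalArgumentException("priority must be 0..3");
            }
        }
    }

    // juicefs.cache-dir: directories separated by ':'; wildcards are not expanded here.
    static List<String> cacheDirs(Map<String, String> props) {
        return Arrays.asList(props.getOrDefault("juicefs.cache-dir", "memory").split(":"));
    }

    // juicefs.block.size falls back to dfs.blocksize, then to 128 MiB.
    static long blockSize(Map<String, String> props) {
        String v = props.getOrDefault("juicefs.block.size", props.get("dfs.blocksize"));
        return v == null ? 128 * MIB : Long.parseLong(v);
    }

    // juicefs.cache-group-size defaults to 4 * juicefs.block.size.
    static long cacheGroupSize(Map<String, String> props) {
        String v = props.get("juicefs.cache-group-size");
        return v == null ? 4 * blockSize(props) : Long.parseLong(v);
    }

    // Files smaller than cache-group-size are preferably kept on a single node.
    static boolean prefersSingleNode(long fileSize, Map<String, String> props) {
        return fileSize < cacheGroupSize(props);
    }

    // juicefs.free-space: clear cache once free/total drops below the ratio (default 0.2).
    static boolean shouldEvict(long freeBytes, long totalBytes, Map<String, String> props) {
        double ratio = Double.parseDouble(props.getOrDefault("juicefs.free-space", "0.2"));
        return (double) freeBytes / totalBytes < ratio;
    }

    // Lower juicefs.cache-priority values are evicted first; the sort is stable.
    static List<CachedBlock> evictionOrder(List<CachedBlock> blocks) {
        List<CachedBlock> sorted = new ArrayList<>(blocks);
        sorted.sort(Comparator.comparingInt(CachedBlock::priority));
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of(
                "juicefs.cache-dir", "/data1/jfscache:/data2/jfscache",
                "dfs.blocksize", String.valueOf(256 * MIB));
        System.out.println(cacheDirs(props));                    // [/data1/jfscache, /data2/jfscache]
        System.out.println(cacheGroupSize(props) / MIB);         // 1024
        System.out.println(prefersSingleNode(512 * MIB, props)); // true
        System.out.println(shouldEvict(10, 100, props));         // true
        System.out.println(evictionOrder(List.of(
                new CachedBlock("a", 2), new CachedBlock("b", 0), new CachedBlock("c", 1))));
    }
}
```

In a real job, these keys would be set in `core-site.xml` or on the Hadoop `Configuration` object rather than in a plain `Map`. The map simply keeps the sketch free of external dependencies.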