Shared Block Device
Due to JuiceFS' decoupled architecture, reads and writes generally involve both metadata access and object storage access, and the latter often has higher latency. As a result, workloads dominated by small files or by large numbers of random reads/writes perform significantly worse in JuiceFS than sequential reads/writes. If you need high read/write performance and your infrastructure supports shared block storage, such as Amazon EBS Multi-Attach, you can use the shared block device support introduced in JuiceFS 5.0.
A shared block device is a block device that can be mounted on multiple cloud hosts. Typical local file systems cannot be used directly on a shared block device, because concurrent access from multiple hosts requires a clustered file system. JuiceFS supports shared block devices, allowing you to use a shared block device (an unformatted raw disk) as file system storage and to configure different storage policies, such as:
- Set the slice size threshold (a slice is a continuous write, with a length ranging from 0 to 64MB. For details, see our architecture). Slices below this threshold are written to the block device; otherwise, they are uploaded directly to object storage.
- Set the retention time on the block device. All data stored for longer than the specified time is automatically transferred to object storage.
- Set the slice size threshold for transferring to object storage. Slices smaller than the threshold are always stored on the block device.
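The three policies above can be read as two decisions: where a new slice is first written, and whether a slice already on the block device should later move to object storage. The following is a minimal sketch of that logic, not JuiceFS code. The class, field, and function names are hypothetical and do not correspond to actual JuiceFS options, and the way the retention time and the transfer threshold combine is an assumption.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class BlockDevicePolicy:
    # All field names are hypothetical; they are not JuiceFS options.
    write_threshold: int       # slices below this size are written to the block device
    retention_seconds: float   # data older than this is eligible for transfer
    transfer_threshold: int    # slices below this size always stay on the block device


def initial_placement(slice_size: int, policy: BlockDevicePolicy) -> str:
    """Decide where a newly written slice goes."""
    if slice_size < policy.write_threshold:
        return "block_device"
    return "object_storage"


def should_transfer(slice_size: int, age_seconds: float,
                    policy: BlockDevicePolicy) -> bool:
    """Decide whether a slice on the block device moves to object storage.

    Assumption: the size rule takes precedence, so small slices stay on
    the block device even after the retention time has passed.
    """
    if slice_size < policy.transfer_threshold:
        return False
    return age_seconds > policy.retention_seconds


# Example policy values, chosen only for illustration.
policy = BlockDevicePolicy(
    write_threshold=4 * MIB,
    retention_seconds=3600,
    transfer_threshold=1 * MIB,
)
```

With this example policy, a 64 KiB slice is first written to the block device and never leaves it, while a 2 MiB slice is written to the block device and becomes eligible for transfer once it is more than an hour old.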
Although the feature is named "shared block device," JuiceFS supports any unformatted block device, not only multi-mount shared block devices. You can therefore also use a regular single-mounted cloud disk with this feature. Because such a disk is attached to only one host, other clients cannot see the files stored on it, so this setup is suitable only when JuiceFS is accessed through a single mount point.
Typical scenarios for using shared block devices in JuiceFS include:
- Intensive small file writes / random writes, such as exploratory data analysis (EDA), data processing before computer vision (CV) training, and performing tasks in Elasticsearch and ClickHouse.
- Workloads that require low latency for both reads and writes, where block devices are used for data storage or even configured as permanent storage.
- Replacing write cache (`--writeback`). Shared block devices offer good write performance and support multi-node mounts, which addresses the main pain point of write cache: pending data is not visible to other clients. To use the feature this way, mount the same shared block device on multiple client nodes, leave write caching disabled, and write data normally. The block device's own write performance is good enough that write caching is unnecessary, and data written directly to the shared device is visible to other clients.
Note that:
- You do not need to format the file system. JuiceFS' metadata engine directly manages raw block devices.
- You can scale out your shared block device online by adding disks or increasing capacity of existing disks.
- All clients must mount the disk at the same path.
- Currently, the multi-zone approach is not supported. In a single zone, the storage limit is 500 million files.
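Because every client must see the disk at the same path, it can help to run a small pre-flight check on each node before mounting. The sketch below only confirms that a path exists and is a block device, using the Python standard library. It is not part of JuiceFS, and the device path shown is a hypothetical placeholder.

```python
import os
import stat


def is_block_device(path: str) -> bool:
    """Return True if path exists and refers to a block device."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing path or insufficient permissions: treat as not usable.
        return False
    return stat.S_ISBLK(mode)


if __name__ == "__main__":
    # Hypothetical path; use the path agreed on for all clients.
    device_path = "/dev/nvme1n1"
    if is_block_device(device_path):
        print(f"{device_path}: block device found")
    else:
        print(f"{device_path}: missing or not a block device")
```

Running the same script with the same path on every client gives a quick signal that the nodes agree on where the disk is attached.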
The support for shared block devices is currently in the public testing phase. If you need it, contact Juicedata engineers for assistance.