JuiceFS consists of three parts:
JuiceFS Client: All file I/O happens in the JuiceFS Client, including background jobs such as data compaction and trash file expiration. The client therefore talks to both the object storage and the metadata service. A variety of implementations are supported:
- FUSE: the JuiceFS file system can be mounted on a host in a POSIX-compatible manner, so that massive cloud storage can be used like local storage.
- Hadoop Java SDK: JuiceFS can replace HDFS, providing massive storage for Hadoop at a significantly lower cost.
- Kubernetes CSI Driver: use the JuiceFS CSI Driver to provide shared storage for containers in Kubernetes.
- S3 Gateway: applications that use S3 as their storage layer can access the JuiceFS file system directly, and tools such as AWS CLI, s3cmd, and MinIO client can be used to access it at the same time.
- WebDAV Server: files in JuiceFS can be operated on directly over HTTP.
Data Storage: File data is split into chunks and stored in object storage. You can use object storage provided by a public cloud service or a self-hosted one; JuiceFS supports virtually all types of object storage, including typical self-hosted ones such as OpenStack Swift, Ceph, and MinIO.
Metadata Engine: a high-performance metadata engine developed in-house by Juicedata. It stores file metadata, which includes:
- Common file system metadata: file name, size, permissions, creation and modification time, directory structure, file attributes, symbolic links, and file locks.
- JuiceFS-specific metadata: file inodes, chunk and slice mappings, client sessions, etc.
Juicedata has already deployed the metadata service in most public cloud regions. As a Cloud Service user, you access the metadata service over the public internet (within the same region). If you use JuiceFS at a larger scale and require even lower access latency, contact the JuiceFS team to set up private network access via VPC peering.
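To make the two categories of metadata above more concrete, here is a minimal sketch of what a per-file metadata record might hold. The class and field names (`FileMeta`, `SliceRef`, `chunks`, `parent_inode`, and so on) are hypothetical and chosen only for illustration; they are not the Metadata Engine's actual schema, and client sessions and file locks are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SliceRef:
    """Hypothetical pointer to one slice that makes up part of a chunk."""
    slice_id: int  # identifier of the slice
    offset: int    # where the slice starts within its chunk, in bytes
    length: int    # how many bytes of the chunk the slice covers


@dataclass
class FileMeta:
    """Hypothetical metadata record for one file (illustrative, not the real schema)."""
    # JuiceFS-specific: the file's inode number
    inode: int
    # Common file system metadata
    name: str
    size: int                      # file size in bytes
    mode: int                      # permission bits, e.g. 0o644
    ctime: float                   # creation time, seconds since the epoch
    mtime: float                   # modification time, seconds since the epoch
    parent_inode: int              # position in the directory structure
    symlink_target: Optional[str] = None
    # JuiceFS-specific: chunk index -> slices recorded for that chunk
    chunks: Dict[int, List[SliceRef]] = field(default_factory=dict)

    def slices_in_chunk(self, chunk_index: int) -> List[SliceRef]:
        """Return the slices recorded for a chunk (empty if nothing was written there)."""
        return self.chunks.get(chunk_index, [])
```

For example, `FileMeta(inode=2, name="a.log", size=10, mode=0o644, ctime=0.0, mtime=0.0, parent_inode=1)` describes a small file; after `meta.chunks[0] = [SliceRef(1001, 0, 10)]`, its first chunk maps to a single slice. Note that none of the file's data lives in this record, only the mapping needed to find it in object storage.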
How JuiceFS Stores Files
A file system acts as the medium through which users interact with a hard drive, allowing files to be stored on it properly. As is widely known, FAT32 and NTFS are commonly used on Windows, while Ext4, XFS, and Btrfs are commonly used on Linux. Each file system has its own way of organizing and managing files, which determines features such as storage capacity and performance.
The strong consistency and high performance of JuiceFS are attributed to its distinctive file management model. Traditional file systems store both file data and metadata on local disks, whereas JuiceFS formats the data first and then stores it in object storage, while the corresponding metadata is stored in a dedicated metadata engine.
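The separation of data and metadata described above can be illustrated with a toy model: one dictionary stands in for an object storage bucket, another for the metadata engine. This is a deliberately simplified sketch, not how the JuiceFS Client is implemented; the `ToyFS` class, the `objects/<n>` key names, and the flat "path to list of keys" mapping are all hypothetical.

```python
from typing import Dict, List


class ToyFS:
    """Toy model: file data goes to an 'object store', metadata to a separate 'metadata engine'."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        self.object_store: Dict[str, bytes] = {}  # stands in for an object storage bucket
        self.metadata: Dict[str, List[str]] = {}  # stands in for the metadata engine: path -> object keys
        self._next_id = 1

    def write(self, path: str, data: bytes) -> None:
        """Split data into fixed-size pieces, upload each one, then record only the mapping."""
        keys = []
        for start in range(0, len(data), self.block_size):
            key = f"objects/{self._next_id}"  # hypothetical numbered key, unrelated to the file name
            self._next_id += 1
            self.object_store[key] = data[start:start + self.block_size]
            keys.append(key)
        # Old objects of an overwritten path are not cleaned up in this toy.
        self.metadata[path] = keys

    def read(self, path: str) -> bytes:
        """Look up the mapping in the metadata store, then fetch and join the objects."""
        return b"".join(self.object_store[key] for key in self.metadata[path])
```

With `fs = ToyFS(block_size=4)`, calling `fs.write("/a.txt", b"hello world")` produces three objects (`objects/1` to `objects/3`), none of which is named after `/a.txt`; only the metadata store knows how to reassemble the file.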
Each file stored in JuiceFS is split into fixed-size "Chunks", with a default upper limit of 64 MiB. Each Chunk is composed of one or more "Slices", whose length varies depending on how the file is written. Each Slice is in turn composed of fixed-size "Blocks", 4 MiB by default. These Blocks are what is eventually stored in object storage, while JuiceFS stores the metadata of the file and its Chunks, Slices, and Blocks in the metadata engine.
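The Chunk and Block sizes above make it straightforward to work out where a given byte of a file ends up. The sketch below assumes the simplest case: the file was written sequentially in one pass, so each Chunk holds exactly one Slice that starts at the beginning of the Chunk. With overwrites or random writes, a Chunk can hold several overlapping Slices and this arithmetic no longer applies directly. The function names `locate` and `layout` are hypothetical.

```python
from typing import List, Tuple

MiB = 1024 * 1024
CHUNK_SIZE = 64 * MiB  # default upper limit of a Chunk
BLOCK_SIZE = 4 * MiB   # default Block size


def locate(offset: int, chunk_size: int = CHUNK_SIZE,
           block_size: int = BLOCK_SIZE) -> Tuple[int, int, int]:
    """Map a file offset to (chunk index, block index, offset inside that block).

    Assumes each Chunk holds a single Slice starting at offset 0 of the Chunk.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    chunk_index, offset_in_chunk = divmod(offset, chunk_size)
    block_index, offset_in_block = divmod(offset_in_chunk, block_size)
    return chunk_index, block_index, offset_in_block


def layout(file_size: int, chunk_size: int = CHUNK_SIZE,
           block_size: int = BLOCK_SIZE) -> List[Tuple[int, List[int]]]:
    """List (chunk index, block sizes) for a file written sequentially in one pass."""
    result = []
    for chunk_index, chunk_start in enumerate(range(0, file_size, chunk_size)):
        chunk_len = min(chunk_size, file_size - chunk_start)
        # Every Block is block_size bytes except possibly the last one in the Chunk.
        block_sizes = [min(block_size, chunk_len - b) for b in range(0, chunk_len, block_size)]
        result.append((chunk_index, block_sizes))
    return result
```

For a 150 MiB file, `layout(150 * MiB)` yields three Chunks (64, 64, and 22 MiB) made of 16, 16, and 6 Blocks respectively, the last Block being 2 MiB. Likewise, `locate(100 * MiB + 10)` returns `(1, 9, 10)`: byte 10 of the tenth Block in the second Chunk.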
Because JuiceFS splits files into Chunks before storing them in object storage, you will not find the original files directly in the object storage bucket. Instead, you will only see a directory of chunks and a number of numbered directories and files. Don't panic! That's exactly what makes JuiceFS a high-performance file system.
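To see why the bucket holds numbered directories instead of your original file names, here is a sketch that derives object keys from Slice IDs. It assumes the key format used by the open-source JuiceFS Community Edition, chunks/<id // 10^6>/<id // 10^3>/<id>_<block index>_<block size>; the Cloud Service may name its objects differently, so treat this as illustrative only. The helper names `block_key` and `slice_keys` are hypothetical.

```python
from typing import List


def block_key(slice_id: int, index: int, size: int) -> str:
    """Object key for one Block, in the assumed Community Edition-style format."""
    # Two levels of numbered directories keep any single directory from growing too large.
    return f"chunks/{slice_id // 1000 // 1000}/{slice_id // 1000}/{slice_id}_{index}_{size}"


def slice_keys(slice_id: int, slice_len: int, block_size: int = 4 * 1024 * 1024) -> List[str]:
    """Keys of all Blocks in a Slice; only the last Block may be shorter than block_size."""
    return [
        block_key(slice_id, index, min(block_size, slice_len - start))
        for index, start in enumerate(range(0, slice_len, block_size))
    ]
```

Under this assumption, a 10 MiB Slice with ID 1234567 would be stored as three objects: `chunks/1/1234/1234567_0_4194304`, `chunks/1/1234/1234567_1_4194304`, and `chunks/1/1234/1234567_2_2097152`. Nothing in the key refers to the file name; that association lives only in the metadata engine.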