GlusterFS vs. JuiceFS

2023-09-21
Juicedata Team

GlusterFS is an open-source software-defined distributed storage solution. It can support data storage of PiB levels within a single cluster.

JuiceFS is an open-source, high-performance distributed file system designed for the cloud. It delivers massive, elastic, and high-performance storage at low cost.

This article compares the key attributes of JuiceFS and GlusterFS in a table and then explores them in detail, offering insights to aid your team in the technology selection process. You can easily see their main differences in the table below and delve into specific topics you're interested in within this article.

A quick summary of GlusterFS vs. JuiceFS

The table below provides a quick overview of the differences between GlusterFS and JuiceFS:

Aspect GlusterFS JuiceFS
Metadata Purely distributed Independent database
Data storage Self-managed Relies on object storage
Large file handling Doesn't split files Splits large files
Redundancy protection Replication, erasure coding Relies on object storage
Data compression Partial support Supported
Data encryption Partial support Supported
POSIX compatibility Full Full
NFS protocol Not directly supported Not directly supported
CIFS protocol Not directly supported Not directly supported
S3 protocol Supported (but not updated) Supported
HDFS compatibility Supported (but not updated) Supported
CSI Driver Supported Supported
POSIX ACLs Supported Not supported
Cross-cluster replication Supported Relies on external service
Directory quotas Supported Supported
Snapshots Supported Not supported (but supports cloning)
Trash Supported Supported
Primary maintainer Red Hat, Inc Juicedata, Inc
Development language C Go
Open source license GPLV2 and LGPLV3+ Apache License 2.0

System architecture comparison

GlusterFS’ architecture

GlusterFS employs a fully distributed architecture without centralized nodes. A GlusterFS cluster consists of the server and the client. The server side manages and stores data, often referred to as the Trusted Storage Pool. This pool comprises a set of server nodes, each running two types of processes:

  • glusterd: One per node, which manages and distributes configuration.
  • glusterfsd: One per brick (storage unit), which handles data requests and interfaces with the underlying file system.

All files on each brick can be considered a subset of GlusterFS. File content accessed directly through the brick or via GlusterFS clients is typically consistent. If GlusterFS experiences an exception, users can partially recover original data by integrating content from multiple bricks. Additionally, for fault tolerance during deployment, data is often redundantly protected. In GlusterFS, multiple bricks form a redundancy group, protecting data through replication or erasure coding. When a node experiences a failure, recovery can only be performed within the redundancy group, which may result in longer recovery times. When scaling a GlusterFS cluster, the scaling is typically performed on a redundancy group basis.

The client side, which mounts GlusterFS, presents a unified namespace to applications. The architecture diagram is as follows (source: GlusterFS Architecture):

GlusterFS architecture
GlusterFS architecture

JuiceFS’ architecture

JuiceFS adopts an architecture that separates its data and metadata storage. File data is split and stored in object storage systems like Amazon S3, while metadata is stored in a user-selected database like Redis or MySQL. By sharing the same database and object storage, JuiceFS achieves a strongly consistent distributed file system with features like full POSIX compatibility and high performance. For details about JuiceFS architecture, see its document.

JuiceFS architecture
JuiceFS architecture

Metadata management comparison

GlusterFS: Metadata in GlusterFS is purely distributed, lacking a centralized metadata service. Clients use file name hashing to determine the associated brick. When requests require access across multiple bricks, for example, mv and ls, the client is responsible for coordination. While this design is simple, it can lead to performance bottlenecks as the system scales. For instance, listing a large directory might require accessing multiple bricks, and any latency in one brick can slow down the entire request. Additionally, ensuring metadata consistency when performing cross-brick modifications in the event of failures can be challenging, and severe failures may lead to split-brain scenarios, requiring manual data recovery to achieve a consistent version.

JuiceFS: JuiceFS metadata is stored in an independent database, which is called the metadata engine. Clients transform file metadata operations into transactions within this database, leveraging its transactional capabilities to ensure operation atomicity. This design simplifies JuiceFS implementation but places higher demands on the metadata engine. JuiceFS currently supports three categories of transactional databases. For details, see the metadata engine document.

Data management comparison

GlusterFS stores data by integrating multiple server nodes' bricks (typically built on local file systems like XFS). Therefore, it provides certain data management features, including distribution management, redundancy protection, fault switching, and silent error detection.

JuiceFS, on the other hand, does not use physical disks directly but manages data through integration with various object storage systems. Most of its features rely on the capabilities of its object storage.

Large file splitting

In distributed systems, splitting large files into smaller chunks and storing them on different nodes is a common optimization technique. This often leads to higher concurrency and bandwidth when applications access such files. - GlusterFS does not split large files (although it used to support Striped Volumes for large files, this feature is no longer supported). - JuiceFS splits files into 64 MiB chunks by default, and each chunk is further divided into 4 MiB blocks based on the write pattern. For details, see How JuiceFS stores files.

Redundancy protection

GlusterFS supports both replication (Replicated Volume) and erasure coding (Dispersed Volume). JuiceFS relies on the redundancy capabilities of the underlying object storage it uses.

Data compression

GlusterFS:

  • Supports only transport-layer compression. Files are compressed by clients, transmitted to the server, and decompressed by the bricks.
  • Does not implement storage-layer compression but depends on the underlying file system used by the bricks, such as ZFS.

JuiceFS supports both transport-layer and storage-layer compression. Data compression and decompression are performed on the client side.

Data encryption

GlusterFS: - Supports only transport-layer encryption, relying on SSL/TLS. - Previously supported storage-layer encryption, but it is no longer supported.

JuiceFS supports both transport-layer and storage-layer encryption. Data encryption and decryption are performed on the client side.

Access protocols

POSIX compatibility

Both GlusterFS and JuiceFS offer POSIX compatibility.

NFS protocol

GlusterFS previously had embedded support for NFSv3 but now it is no longer recommended. Instead, it is suggested to export the mount point using an NFS server. JuiceFS does not provide direct support for NFS and requires mounting followed by export via another NFS server.

CIFS protocol

GlusterFS embeds support for Windows, Linux Samba clients, and macOS CLI access (excluding macOS Finder). However, it is recommended to use Samba for exporting mount points. JuiceFS does not offer direct support for CIFS and requires mounting followed by export via Samba.

S3 protocol

GlusterFS supports S3 through the gluster-swift project, but the project hasn't seen recent updates since November 2017. JuiceFS supports S3 through integration with the MinIO S3 gateway.

HDFS compatibility

GlusterFS offers HDFS compatibility through the glusterfs-hadoop project, but the project hasn't seen recent updates since May 2015. JuiceFS provides full compatibility with the HDFS API.

CSI Driver

GlusterFS previously supported CSI Driver but the latest version was released in November 2018, and the repository is marked as DEPRECATED. JuiceFS supports CSI Driver. For details, see the document.

Extended features

POSIX ACLs

In Linux, file access permissions are typically controlled by three entities: the file owner, the group owner, and others. However, when more complex requirements arise, such as the need to assign specific permissions to a particular user within the others category, this standard mechanism does not work. POSIX Access Control Lists (ACLs) offer enhanced permission management capabilities, allowing you to assign permissions to any user or user group as needed.

GlusterFS supports ACLs, including access ACLs and default ACLs. JuiceFS does not support POSIX ACLs.

Cross-cluster replication

Cross-cluster replication indicates replicating data between two independent clusters, often used for geographically distributed disaster recovery. GlusterFS supports one-way asynchronous incremental replication but requires both sides to use the same version of Gluster cluster. JuiceFS depends on the capabilities of the metadata engine and the object storage, allowing one-way replication.

Directory quotas

Both GlusterFS and JuiceFS support directory quotas, including capacity and/or file count limits.

Snapshots

GlusterFS supports volume-level snapshots and requires all bricks to be deployed on LVM thinly provisioned volumes. JuiceFS does not support snapshots but offers directory-level cloning.

Trash

GlusterFS supports the trash functionality, which is disabled by default. JuiceFS supports the trash functionality, which is enabled by default.

Related Posts

How Clobotics Overcame Multi-Cloud and Massive File Storage Challenges

2024-09-11
Clobotics, a global leader in computer vision technology, enhanced its storage infrastructure with …

vivo Migrated from GlusterFS to a Distributed File System for AI Training

2024-07-18
Learn why vivo's AI training platform switched from GlusterFS to a cloud-native high-performance fi…

Exploring Distributed File Systems: Concepts, Design, and Leading Solutions

2023-06-16 caitin
This article introduces the basic concepts of distributed file systems, key design principles behin…

Migrating massive cold data to the cloud with JuiceFS- a use case from Trip.com

2022-09-19 Miaocheng&Xiaofeng
By using JuiceFS, Trip.com achieves separation of storage and computing. Additionally, the migratio…