What is JuiceFS?

JuiceFS is a high-performance shared file system designed for cloud environments, released under the AGPL v3.0 license. It provides full POSIX compatibility, so massive, low-cost cloud storage can be used like a local disk and can be mounted and accessed by multiple hosts at the same time.

When you store data in JuiceFS, the data itself is persisted in object storage (e.g. Amazon S3), and the corresponding metadata is persisted in a database of your choice, such as Redis, MySQL, or SQLite.

JuiceFS provides a rich set of APIs, so big data, machine learning, artificial intelligence, and other application platforms that are already in production can connect to it without code changes and gain massive, elastic, low-cost, high-performance storage.

Core Features

  1. POSIX compatible: Use it like a local file system; it integrates seamlessly with existing applications and requires no changes to business logic;
  2. HDFS compatible: Fully compatible with the HDFS API, ideal for big data clusters that separate storage from compute;
  3. S3 compatible: Provides an S3 Gateway that exposes an S3-compatible interface;
  4. Cloud native: JuiceFS can be used easily in Kubernetes through the Kubernetes CSI Driver;
  5. Sharing: The same file system can be mounted on thousands of servers at the same time for high-performance concurrent reads and writes and data sharing;
  6. Strong consistency: A confirmed modification is immediately visible on all servers that mount the same file system;
  7. Strong performance: Millisecond latency and nearly unlimited throughput (depending on the scale of the object storage);
  8. Data security: Supports encryption in transit and encryption at rest;
  9. File lock: Supports BSD locks (flock) and POSIX locks (fcntl);
  10. Data compression: Supports LZ4 or Zstandard compression to save storage space.
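
To make the file lock item above concrete, here is a minimal Python sketch that takes a BSD lock (flock) and then a POSIX byte-range lock (fcntl) on the same file. It uses only the standard `fcntl` module and runs against a temporary file; the function name `lock_demo` and the file name are illustrative assumptions, and on a real deployment you would pass a path under your own JuiceFS mount point instead.

```python
import fcntl
import os
import tempfile


def lock_demo(path):
    """Acquire and release a BSD lock, then a POSIX lock, on `path`.

    `path` is a hypothetical file; point it at a file under your own
    mount point to exercise the locks on a mounted file system.
    """
    events = []
    with open(path, "a+") as f:
        # BSD-style lock (flock): applies to the whole file.
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        events.append("flock acquired")
        fcntl.flock(f, fcntl.LOCK_UN)

        # POSIX lock (fcntl): here a byte-range lock on the first 10 bytes.
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB, 10, 0, os.SEEK_SET)
        events.append("posix lock acquired")
        fcntl.lockf(f, fcntl.LOCK_UN, 10, 0, os.SEEK_SET)
    return events


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(lock_demo(os.path.join(tmp, "demo.lock")))
```

Both calls use `LOCK_NB`, so the sketch raises an error instead of blocking if another process already holds a conflicting lock.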

Tip: JuiceFS is suitable for managing, analyzing, archiving, and backing up data in any file format. In particular, it supports the data storage needs of big data analysis and machine learning. It's also a scalable replacement for NAS.

POSIX compatibility makes migration seamless. You can replace your existing solutions (such as local disk, NFS, or HDFS) with JuiceFS with little to no migration effort, and you no longer need to maintain the storage system yourself. Your team can focus exclusively on product development, free of the hassle of managing a large-scale storage system.

Architecture

A JuiceFS file system consists of three parts:

  • JuiceFS Client: Coordinates the object storage and the metadata engine, and implements file system interfaces such as POSIX, Hadoop, Kubernetes, and S3 gateway.
  • Data Storage: Stores the data itself; local disk and object storage are supported.
  • Metadata Engine: Stores the metadata corresponding to the data; multiple engines such as Redis, MySQL, and TiKV are supported.

As a file system, JuiceFS handles data and its corresponding metadata separately: the data is stored in object storage, and the metadata is stored in the metadata engine.
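
The separation described above can be illustrated with a toy in-memory model. This is only a sketch of the idea, not JuiceFS's actual on-disk layout: the class `ToyFileSystem`, the 4-byte `BLOCK_SIZE`, and the `path#index` key scheme are all assumptions made up for the example. One dictionary stands in for object storage and another for the metadata engine.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical and deliberately tiny, to keep the example readable


@dataclass
class ToyFileSystem:
    objects: dict = field(default_factory=dict)   # stands in for object storage
    metadata: dict = field(default_factory=dict)  # stands in for the metadata engine

    def write(self, path: str, data: bytes) -> None:
        """Split data into blocks, store the blocks, then record metadata."""
        keys = []
        for offset in range(0, len(data), BLOCK_SIZE):
            key = f"{path}#{offset // BLOCK_SIZE}"  # hypothetical key scheme
            self.objects[key] = data[offset:offset + BLOCK_SIZE]
            keys.append(key)
        self.metadata[path] = {"size": len(data), "blocks": keys}

    def read(self, path: str) -> bytes:
        """Look up the block list in metadata, then fetch and join the blocks."""
        meta = self.metadata[path]
        return b"".join(self.objects[key] for key in meta["blocks"])


if __name__ == "__main__":
    fs = ToyFileSystem()
    fs.write("/a.txt", b"hello world")
    print(fs.metadata["/a.txt"])
    print(fs.read("/a.txt"))
```

In this model the object store only ever sees opaque blocks, while file names, sizes, and block order live entirely in the metadata store, which is why the two can be backed by different systems.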

For data storage, JuiceFS supports almost all public cloud object storage services, as well as self-hosted object storage such as OpenStack Swift, Ceph, and MinIO.

For metadata storage, JuiceFS adopts a multi-engine design. It currently supports Redis, TiKV, MySQL/MariaDB, PostgreSQL, and SQLite as metadata engines, and more engines will be implemented over time. You are welcome to submit an issue to tell us what you need.

The file system interfaces are implemented as follows:

  • With FUSE, a JuiceFS file system can be mounted on a server in a POSIX-compatible manner, so massive cloud storage can be used directly as local storage.
  • With the Hadoop Java SDK, a JuiceFS file system can directly replace HDFS, providing Hadoop with low-cost mass storage.
  • With the Kubernetes CSI Driver, a JuiceFS file system can directly provide mass storage for Kubernetes.
  • Through the S3 Gateway, applications that use S3 as their storage layer can access the JuiceFS file system directly, and tools such as the AWS CLI, s3cmd, and the MinIO client can be used as well.
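
As a rough illustration of the POSIX path described in the first item above, the sketch below uses nothing but ordinary Python file operations (create a directory, write, rename, read). The function name `roundtrip` and the file names are assumptions for the example; it runs against a temporary directory here, and the same code would apply unchanged to a directory under a FUSE mount point.

```python
import os
import tempfile
from pathlib import Path


def roundtrip(mount_point: str) -> str:
    """Write, rename, and read back a file using plain POSIX-style calls.

    `mount_point` is any directory; pass your own mount path to run the
    same operations against a mounted file system.
    """
    target = Path(mount_point) / "reports" / "hello.txt"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("stored via plain POSIX calls\n")

    # Rename the file, as many applications do after writing a temporary copy.
    final = target.with_name("hello-final.txt")
    os.replace(target, final)
    return final.read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(roundtrip(tmp), end="")
```

Nothing in the code refers to a storage backend, which is the point: an application written against the local file API needs no storage-specific changes.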

Use Cases

JuiceFS is designed for massive data storage. It can replace many distributed and network file systems, especially in the following scenarios:

  • Big Data Analytics: JuiceFS is accessed through the same interface as local files, so your application is not bound to external APIs. It also works seamlessly with popular distributed computing frameworks such as Apache Spark, Hadoop, and Hive. Storage space can grow without a fixed limit, and you do not need to maintain the service yourself. High concurrency and high throughput let JuiceFS meet the performance needs of data analytics.
  • Shared Workspace: JuiceFS has no VPC restrictions, so you can mount it on any machine. There is no limit on concurrent reads and writes. The POSIX API is compatible with all your existing data streams and scripts.
  • Shared Volume Storage Between Container Clusters: JuiceFS meets the need for persistent storage of container volumes, and it is independent of the container life cycle. Strong consistency ensures the correctness of your data. JuiceFS also makes it easier to create stateless services.
  • Backup: POSIX is the interface engineers are most familiar with, so managing backups is as easy as managing files on local disks. The storage space can be expanded seamlessly to any size you need. Replication across regions and clouds helps you build a global infrastructure. Its flexible requirements allow JuiceFS to be blended into your existing architecture without compromises. Snapshots can be used to recover and validate your data.

Privacy

JuiceFS is open source software that uses object storage and a database to store data and its corresponding metadata, respectively. The data you store is split into blocks according to certain rules and stored in the object storage you specify, and the corresponding metadata is stored in the database you specify. If necessary, you can read the JuiceFS source code to further understand and audit how data privacy is handled.

Note: If you use the official JuiceFS hosted service, your data is still stored in the object storage you specify, but the corresponding metadata is stored in the high-availability metadata storage cluster provided by JuiceFS.