JuiceFS v0.15 released: metadata backup and restore, plus greatly improved performance

2021-07-07

Juicedata

After more than a month of development, JuiceFS v0.15 has been released. The new version includes more than 60 changes, the most notable of which is support for metadata backup, restore, and migration.

This release also significantly improves performance for read-heavy and write-heavy workloads by making use of the kernel page cache and FUSE's writeback-cache mode.

Special note: JuiceFS v0.15 is backward-compatible, and older versions can be safely upgraded to it.

Let us take a look at the changes in the new version:

Metadata export, import and migration

JuiceFS is a filesystem backed by a database and an object storage. When a file is stored, its data goes to the object storage and its metadata goes to the database.

Compared with operating on the object storage directly, retrieving and processing metadata in an independent database performs better, and the gain is especially significant when processing large-scale data.

However, in previous versions, the metadata engine selected when the filesystem was created could not be changed afterwards. In addition, an engine such as Redis offers strong performance, but its reliability is a concern.

To solve these problems, v0.15 introduces two subcommands, dump and load. The former exports metadata to a standard JSON file, which can be used to back up or migrate metadata. The latter imports a JSON backup into a database.

Back up metadata to a JSON file

The following command exports the metadata in the Redis database to a file named meta.json:

$ juicefs dump redis://192.168.1.6:6379 meta.json

Import metadata from a JSON file

The following command restores the metadata in the meta.json file to the Redis database:

$ juicefs load redis://192.168.1.6:6379 meta.json
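For regular backups, the dump command can be wrapped in a small script. The following is a minimal sketch rather than an official tool: the function name backup_meta, its arguments, and the meta-<timestamp>.json naming scheme are all illustrative assumptions. Only the juicefs dump invocation itself comes from the command shown above.

```shell
# backup_meta META_URL BACKUP_DIR
# Hypothetical helper: dump metadata to a date-stamped JSON file and print
# the path of the file that was written. The naming scheme is an assumption.
backup_meta() {
    meta_url=$1
    backup_dir=$2
    out="$backup_dir/meta-$(date +%Y%m%d-%H%M%S).json"

    mkdir -p "$backup_dir" || return 1
    # Send any output from juicefs to stderr so stdout carries only the path.
    juicefs dump "$meta_url" "$out" >&2 || return 1
    # Do not report success for a missing or empty dump file.
    [ -s "$out" ] || return 1
    echo "$out"
}
```

A call such as backup_meta redis://192.168.1.6:6379 /var/backups/juicefs (the directory is a placeholder) could then be scheduled with cron or a similar tool.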

Metadata migration

All metadata engines in JuiceFS can recognize JSON backup files, and JSON backup files exported from any metadata engine can be restored to any other engine.

For example, the following two commands first export the metadata from Redis and then import it into an empty MySQL database, completing the migration of metadata between the two databases:

$ juicefs dump redis://192.168.1.6:6379 meta.json
$ juicefs load mysql://user:password@(192.168.1.6:3306)/juicefs meta.json
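When scripting a migration like this, it may be worth making sure the export succeeded before touching the target database. The sketch below chains the two commands with that guard. The function name migrate_meta and its three arguments are hypothetical, and the non-empty-file check is only a basic sanity test, not a validation of the JSON content.

```shell
# migrate_meta SRC_URL DST_URL DUMP_FILE
# Hypothetical helper: export metadata from one engine and import it into
# another, stopping before the import if the export did not succeed.
migrate_meta() {
    src_url=$1
    dst_url=$2
    dump_file=$3

    juicefs dump "$src_url" "$dump_file" || { echo "dump failed" >&2; return 1; }
    # Basic sanity check only: the dump file must exist and be non-empty.
    [ -s "$dump_file" ] || { echo "dump file is empty" >&2; return 1; }
    juicefs load "$dst_url" "$dump_file"
}
```

The dump file is deliberately left in place afterwards, so it can double as a backup if the import has to be repeated.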

For more information on metadata management, please see the documentation.

Kernel read cache optimization

When a file is read through the kernel, the kernel automatically keeps the data that has been read in its page cache. The cache is retained even after the file is closed, so repeated reads of the same data are very fast.

JuiceFS v0.15 adds the ability to reuse the page cache in the kernel. When a file is opened, JuiceFS checks whether its mtime has changed. If it has not, the existing page cache in the kernel can be reused.

The following is a simple test.

First, use the dd command to generate a 1GB file on JuiceFS:

$ dd if=/dev/urandom of=/jfs/test.bin bs=1G count=1 iflag=fullblock

Then read the file generated in the previous step with the dd command:

$ time dd if=/jfs/test.bin of=/dev/null bs=128K count=8192
8192+0 records in
8192+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 14.8655 s, 72.2 MB/s

real    0m14.870s
user    0m0.001s
sys     0m0.404s

It can be seen that the read throughput is 72.2MB/s, and the total time is 14.87 seconds. Then run the same command again:

$ time dd if=/jfs/test.bin of=/dev/null bs=128K count=8192
8192+0 records in
8192+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.121848 s, 8.8 GB/s

real    0m0.126s
user    0m0.000s
sys     0m0.123s

This time, the read throughput has increased to 8.8GB/s, roughly 120 times the throughput of the first read (0.12 seconds versus 14.87 seconds)!

For scenarios where it is known that the data will not change, the --open-cache N option can be used to specify the interval at which a file is checked for updates, which further reduces the overhead of checking mtime on every open. This can be very useful in read-only scenarios such as AI training.
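The ratio between the two runs can be checked directly from the elapsed times that dd reported. The helper below is only an illustration: the function name speedup is made up for this example, and it simply divides the first elapsed time by the second.

```shell
# speedup FIRST SECOND
# Print how many times faster the second run was, given the elapsed
# time of each run in seconds.
speedup() {
    awk -v first="$1" -v second="$2" 'BEGIN { printf "%.1f\n", first / second }'
}

# Elapsed times reported by dd for the cold and the cached read above.
speedup 14.8655 0.121848   # prints 122.0
```

Note that dd reports decimal units (1 GB = 1000 MB), so comparing the two throughput figures gives the same result: 8800 MB/s divided by 72.2 MB/s is about 122.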

Optimization for frequent small writes

For scenarios where small amounts of data are written frequently (such as 128 bytes at a time), the write performance of JuiceFS is poor by default. The default write-through mode requires every write to be passed through FUSE to the JuiceFS client, and the overhead of FUSE then limits write performance. Starting from v0.15, JuiceFS supports FUSE's "writeback-cache mode". When this mode is turned on, FUSE first aggregates the written data in the kernel and only passes it to the JuiceFS client after a certain amount (such as 128KB) has accumulated, which can significantly reduce the number of FUSE requests and improve write performance.

Note that the writeback-cache mode of FUSE requires Linux kernel 3.15 or later, and it must be enabled manually with the -o writeback_cache option when mounting the JuiceFS file system.

The following is a simple write test. Using the dd command, 128 bytes are written at a time, for a total of 10MB of data.
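Because the option depends on the kernel version, a mount script may want to check the kernel before adding it. The following sketch shows one way to do that. The function name writeback_opts is hypothetical, and the parsing assumes a release string of the usual major.minor form printed by uname -r.

```shell
# writeback_opts RELEASE
# Print "-o writeback_cache" when the kernel release string (as printed by
# `uname -r`) is 3.15 or newer; print nothing otherwise.
writeback_opts() {
    release=$1
    major=${release%%.*}        # text before the first dot
    rest=${release#*.}          # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the remainder
    if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 15 ]; }; then
        echo "-o writeback_cache"
    fi
}

# Example (not run here); META_URL and MOUNT_POINT are placeholders:
# juicefs mount $(writeback_opts "$(uname -r)") "$META_URL" "$MOUNT_POINT"
```

On an older kernel the function prints nothing, so the same mount line falls back to the default write-through mode.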

Let's first look at the write performance without enabling writeback-cache mode:

$ time dd if=/dev/zero of=/jfs/small-write.bin bs=128 count=81920
81920+0 records in
81920+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 2.66483 s, 3.9 MB/s

real 0m2.669s
user 0m0.105s
sys 0m0.951s

It can be seen that the write throughput is 3.9MB/s, and the total time is 2.669 seconds. Then remount JuiceFS with the writeback-cache mode turned on, and run the same command again:

$ time dd if=/dev/zero of=/jfs/small-write.bin bs=128 count=81920
81920+0 records in
81920+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.297128 s, 35.3 MB/s

real 0m0.304s
user 0m0.050s
sys 0m0.062s

This time, the write throughput is 35.3MB/s, about 9 times the write throughput without writeback-cache mode!

Other changes

Added command auto-completion.
Added the -p option to the bench command to support parallel benchmarking.
Added PostgreSQL as a metadata engine.
Added WebDAV as an object storage.
Added the --read-only option for read-only mounts.
Added the --subdir option to mount a subdirectory.
Added the --log mount option, which writes logs to the specified file when running in background mode.