JuiceFS v1.0 RC1 Released, with Major Optimizations to the Metadata Migration and Backup Tool

Juicedata 2022.06.17

JuiceFS v1.0 RC1 is officially released today. The most notable change in this version is the optimization of dump/load, the metadata migration and backup tool. The request came from a heavy community user who ran into high memory usage when migrating metadata for hundreds of millions of files from Redis to TiKV. After receiving this feedback, we optimized both commands, ultimately reducing the memory needed by dump by 95% and by load by 80%.

Here’s a detailed explanation of the main changes in JuiceFS v1.0 RC1.

Optimization of the metadata migration and backup tool: the dump/load commands

Previously, the dump command first loaded all the data in the metadata engine into the memory of the JuiceFS client, creating a read-only snapshot, and then wrote the data to the specified file following the tree structure of the file system. v1.0 RC1 optimizes the dump process when Redis is used as the metadata engine: instead of taking a full snapshot, it spawns several threads that read ahead while data is being written to the output file, which cuts memory usage by 95% and doubles the speed. When backing up metadata from SQL databases and TiKV, a single transaction is used to read the data, ensuring consistency across the file system.
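To illustrate the read-ahead idea, here is a minimal Python sketch rather than the JuiceFS implementation (which is written in Go). It assumes a hypothetical in-memory dict standing in for the metadata engine; `fetch_entry`, `dump_with_readahead`, and the JSON-lines output are illustrative only, and the real dump format differs. Worker threads fetch entries ahead of the writer, and a bounded window of in-flight requests caps memory use instead of snapshotting everything first.

```python
import io
import json
from collections import deque
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the metadata engine: inode -> attributes.
METADATA = {i: {"inode": i, "name": f"file-{i}", "size": i * 10} for i in range(1, 101)}

def fetch_entry(inode):
    """Simulates one round trip to the metadata engine."""
    return METADATA[inode]

def dump_with_readahead(inodes, out, workers=4, window=16):
    """Write entries to `out` in order, keeping at most `window` fetches in flight.

    Memory use is bounded by `window`, not by the total number of files.
    """
    pending = deque()
    written = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for inode in inodes:
            # Read ahead: start fetching before the writer needs this entry.
            pending.append(pool.submit(fetch_entry, inode))
            if len(pending) >= window:
                # The oldest request is written first, so output order is preserved.
                out.write(json.dumps(pending.popleft().result()) + "\n")
                written += 1
        while pending:
            out.write(json.dumps(pending.popleft().result()) + "\n")
            written += 1
    return written

buf = io.StringIO()
count = dump_with_readahead(sorted(METADATA), buf)
```

The window size trades memory for throughput: a larger window hides more round-trip latency but keeps more entries in memory at once.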

Previously, the load command read the entire metadata backup into memory before importing it concurrently into the metadata engine. v1.0 RC1 optimizes this for all metadata engines so that metadata is loaded as a stream, which again reduces memory usage by 80% and improves speed by 25%.
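Streaming loading can be sketched in Python as follows. This is an illustration under stated assumptions, not JuiceFS code: `FakeEngine` and `load_streaming` are hypothetical, the backup is assumed to be line-delimited JSON (the real JuiceFS backup is a single JSON document parsed incrementally), and batches are inserted sequentially rather than concurrently. The point is that only one batch is held in memory at a time.

```python
import io
import json
from itertools import islice

class FakeEngine:
    """Hypothetical metadata engine that accepts batched writes."""
    def __init__(self):
        self.rows = {}
        self.batches = 0

    def insert_batch(self, entries):
        self.batches += 1
        for e in entries:
            self.rows[e["inode"]] = e

def load_streaming(stream, engine, batch_size=1000):
    """Parse one entry per line and flush every `batch_size` entries.

    Peak memory is one batch, not the whole backup file.
    """
    # A generator: lines are parsed lazily as batches are pulled from it.
    entries = (json.loads(line) for line in stream if line.strip())
    total = 0
    while True:
        batch = list(islice(entries, batch_size))
        if not batch:
            break
        engine.insert_batch(batch)
        total += len(batch)
    return total

backup = io.StringIO("".join(
    json.dumps({"inode": i, "name": f"file-{i}"}) + "\n" for i in range(1, 2501)))
engine = FakeEngine()
loaded = load_streaming(backup, engine, batch_size=1000)
```

With 2,500 entries and a batch size of 1,000, the engine receives three batches (1,000, 1,000 and 500 entries).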

Let’s look at an example. With Redis as the metadata engine, the performance of v1.0 Beta3 compared to v1.0 RC1 when dumping and loading 10 million files is as follows.

Many users start out with Redis as their metadata engine and, as their data grows, may need to migrate the metadata to TiKV or an SQL database. These optimizations keep metadata migration efficient even when the metadata covers hundreds of millions of files.

We will publish the technical details of this dump/load optimization soon, so stay tuned.

New object storage testing tool: objbench

Object storage is where JuiceFS stores its data. When JuiceFS users run into problems, they are often unsure whether the cause lies in JuiceFS or in the object storage. So v1.0 RC1 adds the objbench command, which helps users verify whether an object storage is supported by JuiceFS and measure its performance when used with JuiceFS. Please refer to the documentation for details.
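Conceptually, a tool like objbench combines a functional check with a throughput measurement. The Python sketch below shows that idea only; it is not how objbench is implemented, and the `InMemoryStore` class and `check_and_bench` function are hypothetical names with a minimal S3-like put/get/delete interface assumed for illustration. For the actual command and its options, see the JuiceFS documentation.

```python
import time

class InMemoryStore:
    """Hypothetical object store with a minimal S3-like interface."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]  # raises KeyError if the object is missing

    def delete(self, key):
        self._objects.pop(key, None)

def check_and_bench(store, count=50, size=4096):
    """Run a put/get/delete round trip, then time `count` puts and gets."""
    payload = b"x" * size

    # Functional check: data must round-trip and disappear after delete.
    store.put("probe", payload)
    functional = store.get("probe") == payload
    store.delete("probe")
    try:
        store.get("probe")
        functional = False  # object should be gone after delete
    except KeyError:
        pass

    # Performance check: simple sequential throughput.
    start = time.perf_counter()
    for i in range(count):
        store.put(f"bench/{i}", payload)
    put_secs = time.perf_counter() - start

    start = time.perf_counter()
    for i in range(count):
        store.get(f"bench/{i}")
    get_secs = time.perf_counter() - start

    for i in range(count):
        store.delete(f"bench/{i}")  # clean up benchmark objects
    return {
        "functional": functional,
        "put_ops_per_sec": count / put_secs if put_secs else float("inf"),
        "get_ops_per_sec": count / get_secs if get_secs else float("inf"),
    }

report = check_and_bench(InMemoryStore())
```

Separating the two checks mirrors the question users face: first whether the storage behaves correctly at all, then whether it is fast enough.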

Support for interfacing with Pyroscope, a continuous profiling platform

Inspired by a blog post on observability practices, we came up with the idea of integrating a continuous profiling tool. Until this version, problems in JuiceFS could only be investigated with pprof.

JuiceFS now integrates with Pyroscope, an open-source continuous profiling platform, which lets users record and analyze how JuiceFS runs over a period of time, including details such as CPU time and the size of object allocations.

For how to use Pyroscope with JuiceFS, please refer to the documentation.

Other new features

  1. Support for SQL databases and etcd as data storage; please refer to the documentation.
  2. Support for finding the full path of a file from its inode with the juicefs info command.
    • Note: For files created before v1.0 RC1, the path may not be found or may be incomplete.
  3. Added progress bars to the juicefs rmr and juicefs warmup commands, and allowed these operations to be interrupted.
    • Note: When the v1.0 RC1 client operates on a mount point created by a client older than v1.0 RC1, the progress bar will never advance, but the command still executes successfully.
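The note on juicefs info makes sense if path lookup works by walking parent pointers from an inode up to the root. The Python sketch below assumes exactly that; the `NODES` table, `ROOT`, and `path_of` are hypothetical, and the orphan entry only models a file whose parent pointer is missing (for example, one written by an older client). It returns the path along with a flag saying whether the walk reached the root.

```python
# Hypothetical directory tree: inode -> (parent inode, name). Inode 1 is the root.
ROOT = 1
NODES = {
    1: (None, ""),
    2: (1, "data"),
    3: (2, "logs"),
    4: (3, "app.log"),
    5: (None, "orphan.txt"),  # parent pointer missing, e.g. an older entry
}

def path_of(inode, nodes=NODES):
    """Walk parent pointers up to the root; return (path, complete)."""
    parts = []
    current = inode
    seen = set()
    while current != ROOT:
        if current not in nodes or current in seen:
            # Unknown inode or a cycle: return whatever was collected so far.
            return "/".join(reversed(parts)), False
        seen.add(current)
        parent, name = nodes[current]
        parts.append(name)
        if parent is None:
            # No parent recorded: the path is only partial.
            return "/".join(reversed(parts)), False
        current = parent
    return "/" + "/".join(reversed(parts)), True
```

Here `path_of(4)` yields `("/data/logs/app.log", True)`, while `path_of(5)` yields only `("orphan.txt", False)`, matching the "not found or incomplete" caveat.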

Other Adjustments

  1. The stability of the SQL metadata engine has been significantly improved. During stress testing of the JuiceFS S3 gateway, we found several issues with the SQL metadata engine under high load, including bugs in the ORM framework, which were fixed and reported upstream.
  2. The number of items cleaned up in a single pass from the recycle bin and the file cache is now limited, improving stability in large-scale deployments.
  3. Support for running the juicefs warmup command inside containers.
  4. Improved the performance of the juicefs rmr command and reduced its memory usage.
  5. Enhanced the juicefs sync command, improving the experience of copying data over SSH and fixing several bugs.
  6. Support for dynamically changing the Access Key and Secret Key of the data storage with the juicefs config command.
  7. Numerous improvements to error log messages.
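Item 2 above, capping the number of cleanups per pass, can be sketched in Python. This is a hypothetical illustration, not JuiceFS code: `cleanup_round`, the `trash` dict mapping keys to creation timestamps, and the limits are all assumed. Each round deletes at most `max_per_round` expired entries, so a huge backlog is drained over several bounded rounds instead of one long, resource-hungry sweep.

```python
from itertools import islice

def cleanup_round(entries, now, ttl, max_per_round=1000):
    """Delete at most `max_per_round` expired entries; return how many were deleted."""
    # Lazily find expired keys and stop after the cap is reached.
    expired = (k for k, created in entries.items() if now - created > ttl)
    batch = list(islice(expired, max_per_round))
    for key in batch:
        del entries[key]
    return len(batch)

# 2,500 expired items and 100 fresh ones (timestamps in seconds).
trash = {f"old-{i}": 0 for i in range(2500)}
trash.update({f"new-{i}": 90_000 for i in range(100)})

rounds = []
while True:
    n = cleanup_round(trash, now=100_000, ttl=86_400, max_per_round=1000)
    if n == 0:
        break
    rounds.append(n)
```

In a real background job, the rounds would typically be spaced out over time so that cleanup never competes heavily with foreground operations.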

Bug Fixes

  • Fixed an issue where juicefs sync did not print an error log when reading the source file failed.
  • Fixed an issue where read-only clients could not execute warmup.
  • Fixed frequent transaction conflicts caused by Slice 0 when deleting a large number of files.
  • Fixed an issue where some operations were not properly locked when an SQL database was used as the metadata engine.
  • Fixed a JuiceFS client panic caused by nil connections when TiKV was used as the metadata engine.
  • Fixed a JuiceFS client panic caused by a failure to list metadata backups.

Upgrading Suggestions

  • If you use any SQL database (such as MySQL, SQLite, or PostgreSQL) as the metadata engine of JuiceFS: upgrading is HIGHLY recommended.
  • Others: upgrading is recommended.

Please click HERE to download this new version.

Thanks to our contributors:

Helix-loop (new contributor), Zhouaoe (new contributor), solracsf, showjason, Davies Liu, Zhou Cheng, Sandy Xu, zhijian, sanwan, Changjian Gao, tangyoupeng, Herald Yu, chnliyong, Rui Su, Ray, JennyA, Fuyang Liu