Single Node Benchmark

Testing Approach

Perform a single-node benchmark on JuiceFS with fio.

Testing Tool

The following tests were performed with fio 3.1.

Fio has many parameters that can affect test results. To provide a useful performance reference, the tests below follow one principle: use the default values of parameters wherever possible and do not tune for any specific system, hardware, etc. The following parameters are used:

--name: Name of the job.
--directory: Prefix filenames with this directory. In these tests it is the JuiceFS mount point; we use /jfs here.
--rw: Type of I/O pattern. We use read (sequential reads) and write (sequential writes) here.
--bs: The block size in bytes used for I/O units.
--size: The total size of file I/O for each thread of this job.
--filesize: Individual file size; used in the small file sequential read and write tests.
--numjobs: Number of concurrent jobs; fio uses a separate process per job by default.
--nrfiles: Number of files to use for this job.
--refill_buffers: By default, fio reuses the data buffers generated at the beginning of each job. If this option is given, fio refills the I/O buffers on every submit, which ensures the generated test data is sufficiently random.
--file_service_type: Defines how fio decides which file from a job to service next; one of random, roundrobin, or sequential. It is used in the small file sequential read and write tests so that fio reads or writes files one after another, without parallel file operations.

Since JuiceFS uses Zstandard to compress data before writing it to object storage, --refill_buffers is used in every test below so that the data generated by fio is as random and irregular as possible, yielding a low compression ratio and simulating a worst-case scenario. In other words, in a production environment JuiceFS will usually perform better than in these tests.

Testing Environment

Performance will vary based on differences in cloud service providers, object storage, virtual machine types, and operating systems.

In the following test results, the JuiceFS file system was created on S3 in the AWS us-west-2 region (see Getting Started for how to create one), and all fio tests were run on a c5d.18xlarge EC2 instance (72 vCPUs, 144 GB RAM) with Ubuntu 18.04 LTS (kernel 4.15.0).

The c5d.18xlarge instance type was chosen for its 25 Gbit network and its support for the Elastic Network Adapter (ENA), which raises EC2-to-S3 bandwidth to 25 Gbps and ensures that the network is not the bottleneck.

Unless otherwise noted, the following tests mount JuiceFS with its default configuration (see Getting Started).
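
For reference, a minimal mount sketch is shown below. The argument before the mount point (a volume name or metadata address) and the available flags depend on your deployment, so treat it as a placeholder and follow Getting Started for the exact command:

# Hypothetical sketch: mount JuiceFS with default options at /jfs.
# Replace <volume-or-meta-url> with whatever your Getting Started setup uses.
juicefs mount <volume-or-meta-url> /jfs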

Big File Read and Write Test

In many scenarios, such as log collection, data backup, and big data analysis, large file sequential reading and writing is essential. This is also a typical scenario to use JuiceFS.

The JuiceFS block size is the main factor affecting sequential read and write throughput: the larger the block size, the higher the sequential read and write throughput, as the test results below show.

Note: You need to create and mount JuiceFS file systems with different block sizes in advance, and replace --directory in the test script with the corresponding JuiceFS mount point.
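
As a sketch of that preparation (assuming the community-edition juicefs CLI, where the block size is chosen at format time; a hosted volume sets it at creation instead), a 1 MiB block-size file system could be prepared like this:

# Hypothetical sketch, community-edition syntax; your setup may differ.
# Assumption: --block-size is given in KiB, so 1024 means 1 MiB blocks.
juicefs format --storage s3 \
    --bucket https://<bucket>.s3.us-west-2.amazonaws.com \
    --block-size 1024 \
    redis://localhost:6379/1 jfs-1m
juicefs mount -d redis://localhost:6379/1 /jfs-1m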

Big File Sequential Read

[Benchmark result chart: big-file-seq-read-2019]

fio --name=big-file-sequential-read \
--directory=/jfs \
--rw=read --refill_buffers \
--bs=256k --size=4G

Big File Sequential Write

[Benchmark result chart: big-file-seq-write-2019]

fio --name=big-file-sequential-write \
--directory=/jfs \
--rw=write --refill_buffers \
--bs=256k --size=4G

Big File Concurrent Read

[Benchmark result chart: big-file-multi-read-2019]

fio --name=big-file-multi-read \
--directory=/jfs \
--rw=read --refill_buffers \
--bs=256k --size=4G \
--numjobs={1, 2, 4, 8, 16}
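
The braces above indicate a parameter sweep rather than literal fio syntax. One straightforward way to run it (shown here for the read case; the concurrent write test below works the same way) is a shell loop:

for n in 1 2 4 8 16; do
  fio --name=big-file-multi-read \
      --directory=/jfs \
      --rw=read --refill_buffers \
      --bs=256k --size=4G \
      --numjobs=$n --group_reporting  # --group_reporting aggregates the per-job results
done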

Big File Concurrent Write

[Benchmark result chart: big-file-multi-write-2019]

fio --name=big-file-multi-write \
--directory=/jfs \
--rw=write --refill_buffers \
--bs=256k --size=4G \
--numjobs={1, 2, 4, 8, 16}

Concurrent writes on a single machine already reach the bandwidth limit between AWS EC2 and S3 at about 600 MB/s. To further exploit the advantages of JuiceFS, a multi-machine parallel approach is recommended, especially in big data computing scenarios.

Big File Random Read

[Benchmark result chart: big-file-rand-read-2019]

fio --name=big-file-rand-read \
--directory=/jfs \
--rw=randread --refill_buffers \
--size=4G --filename=randread.bin \
--bs={4k, 16k, 64k, 256k} --pre_read=1

sync && echo 3 > /proc/sys/vm/drop_caches

fio --name=big-file-rand-read \
--directory=/jfs \
--rw=randread --refill_buffers \
--size=4G --filename=randread.bin \
--bs={4k, 16k, 64k, 256k}

To accurately test large file random read performance, we first pre-read the file with fio, then drop the kernel caches (page cache, dentries, and inodes), and then run the random read test with fio.
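
Putting the steps together, a sketch of the full run (pre-read once, then drop the caches before each block size; it must be run as root because of drop_caches) could look like this:

# Pre-read the test file once so it is fully laid out.
fio --name=big-file-rand-read --directory=/jfs \
    --rw=randread --refill_buffers \
    --size=4G --filename=randread.bin \
    --bs=256k --pre_read=1

for bs in 4k 16k 64k 256k; do
  sync && echo 3 > /proc/sys/vm/drop_caches  # drop page cache, dentries and inodes
  fio --name=big-file-rand-read --directory=/jfs \
      --rw=randread --refill_buffers \
      --size=4G --filename=randread.bin \
      --bs=$bs
done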

If you frequently read large files randomly, it is recommended to set the cache size in the mount options larger than the files to be read for better performance.
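
For example (a hedged sketch: it assumes --cache-size is given in MiB, and the argument before the mount point depends on your deployment), allowing an 8 GiB local cache comfortably covers the 4 GiB test file:

# Hypothetical example: allow up to 8 GiB of local data cache.
juicefs mount --cache-size=8192 <volume-or-meta-url> /jfs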

Big File Random Write

[Benchmark result chart: big-file-rand-write-2019]

fio --name=big-file-random-write \
--directory=/jfs \
--rw=randwrite --refill_buffers \
--size=4G --bs={4k, 16k, 64k, 256k}

Small File Read and Write Test

Small File Sequential Read

JuiceFS enables a 1GB local data cache by default when mounted, which can considerably improve the IOPS of small file reads. For comparison, you can turn the cache off by adding --cache-size=0 when mounting JuiceFS. The performance comparison is shown below.
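
The no-cache run simply remounts with that flag (a sketch; the argument before the mount point depends on your deployment):

# Default mount: local data cache enabled (1GB by default)
juicefs mount <volume-or-meta-url> /jfs
# Remount with the local cache disabled for the comparison run
juicefs mount --cache-size=0 <volume-or-meta-url> /jfs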

[Benchmark result chart: small-file-seq-read-2019]

fio --name=small-file-seq-read \
--directory=/jfs \
--rw=read --file_service_type=sequential \
--bs={file_size} --filesize={file_size} --nrfiles=1000
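
Here {file_size} is a placeholder for the small-file size under test. A sweep over a few example sizes (the exact set behind the chart above may differ) can be scripted as:

for fs in 4k 16k 64k 256k; do
  fio --name=small-file-seq-read \
      --directory=/jfs \
      --rw=read --file_service_type=sequential \
      --bs=$fs --filesize=$fs --nrfiles=1000
done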

Small File Sequential Write

Add --writeback when mounting JuiceFS to enable the client-side write cache (for details, see cache), which can considerably improve small file sequential write performance. See the following comparison test.
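
A sketch of such a mount (only the --writeback flag here is taken from this document; the rest depends on your deployment):

# Enable the client-side write cache before the small file sequential write test
juicefs mount --writeback <volume-or-meta-url> /jfs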

By default, fio leaves the file-closing operation to the end of the job, which on a distributed file system may lead to data loss caused by factors such as network anomalies. We therefore pass --file_service_type=sequential so that fio finishes one file completely (flush and close, writing all of its data to the object storage) before moving on to the next file in the test.

[Benchmark result chart: small-file-seq-write-2019]

fio --name=small-file-seq-write \
--directory=/jfs \
--rw=write --file_service_type=sequential \
--bs={file_size} --filesize={file_size} --nrfiles=1000

Small File Concurrent Read

[Benchmark result chart: small-file-multi-read-2019]

fio --name=small-file-multi-read \
--directory=/jfs \
--rw=read --file_service_type=sequential \
--bs=4k --filesize=4k --nrfiles=1000 \
--numjobs={1, 2, 4, 8, 16}

Small File Concurrent Write

[Benchmark result chart: small-file-multi-write-2019]

fio --name=small-file-multi-write \
--directory=/jfs \
--rw=write --file_service_type=sequential \
--bs=4k --filesize=4k --nrfiles=1000 \
--numjobs={1, 2, 4, 8, 16}

Because the fio test tasks use one process per job, performance scales roughly linearly with the number of concurrent processes, with a small amount of attenuation as the process count increases.